date:20250714

Re: [PATCH 1/1] aarch64: AND/BIC combines for unpacked SVE FP comparisons

2025-07-14 Thread Richard Sandiford

Spencer Abson  writes:
> This patch extends the splitting patterns for combining FP comparisons
> with predicated logical operations such that they cover all of SVE_F.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-sve.md (*fcm_and_combine):
>   Extend from SVE_FULL_F to SVE_F.
>   (*fcmuo_and_combine): Likewise.
>   (*fcm_bic_combine): Likewise.
>   (*fcm_nor_combine): Likewise.
>   (*fcmuo_bic_combine): Likewise.
>   (*fcmuo_nor_combine): Likewise.  Move the comment here to
>   above fcmuo_bic_combine, since it applies to both patterns.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/unpacked_fcm_combines_1.c: New test.
>   * gcc.target/aarch64/sve/unpacked_fcm_combines_2.c: Likewise.

OK.  Thanks for catching the extra optimisations.

Richard

> ---
>  gcc/config/aarch64/aarch64-sve.md | 26 +++---
>  .../aarch64/sve/unpacked_fcm_combines_1.c | 17 +
>  .../aarch64/sve/unpacked_fcm_combines_2.c | 35 +++
>  3 files changed, 65 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcm_combines_1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcm_combines_2.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index 6b5113eb70f..10aecf1f190 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -8690,8 +8690,8 @@
> (unspec:
>   [(match_operand: 1)
>(const_int SVE_KNOWN_PTRUE)
> -  (match_operand:SVE_FULL_F 2 "register_operand" "w, w")
> -  (match_operand:SVE_FULL_F 3 "aarch64_simd_reg_or_zero" "Dz, w")]
> +  (match_operand:SVE_F 2 "register_operand" "w, w")
> +  (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero" "Dz, w")]
>   SVE_COND_FP_CMP_I0)
> (match_operand: 4 "register_operand" "Upl, Upl")))]
>"TARGET_SVE"
> @@ -8713,8 +8713,8 @@
> (unspec:
>   [(match_operand: 1)
>(const_int SVE_KNOWN_PTRUE)
> -  (match_operand:SVE_FULL_F 2 "register_operand" "w")
> -  (match_operand:SVE_FULL_F 3 "register_operand" "w")]
> +  (match_operand:SVE_F 2 "register_operand" "w")
> +  (match_operand:SVE_F 3 "register_operand" "w")]
>   UNSPEC_COND_FCMUO)
> (match_operand: 4 "register_operand" "Upl")))]
>"TARGET_SVE"
> @@ -8740,8 +8740,8 @@
> (unspec:
>   [(match_operand: 1)
>(const_int SVE_KNOWN_PTRUE)
> -  (match_operand:SVE_FULL_F 2 "register_operand" "w")
> -  (match_operand:SVE_FULL_F 3 "aarch64_simd_reg_or_zero" "wDz")]
> +  (match_operand:SVE_F 2 "register_operand" "w")
> +  (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero" "wDz")]
>   SVE_COND_FP_CMP_I0))
>   (match_operand: 4 "register_operand" "Upa"))
> (match_dup: 1)))
> @@ -8777,8 +8777,8 @@
> (unspec:
>   [(match_operand: 1)
>(const_int SVE_KNOWN_PTRUE)
> -  (match_operand:SVE_FULL_F 2 "register_operand" "w")
> -  (match_operand:SVE_FULL_F 3 "aarch64_simd_reg_or_zero" "wDz")]
> +  (match_operand:SVE_F 2 "register_operand" "w")
> +  (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero" "wDz")]
>   SVE_COND_FP_CMP_I0))
>   (not:
> (match_operand: 4 "register_operand" "Upa")))
> @@ -8808,6 +8808,7 @@
>  }
>  )
>  
> +;; Same for unordered comparisons.
>  (define_insn_and_split "*fcmuo_bic_combine"
>[(set (match_operand: 0 "register_operand" "=Upa")
>   (and:
> @@ -8816,8 +8817,8 @@
> (unspec:
>   [(match_operand: 1)
>(const_int SVE_KNOWN_PTRUE)
> -  (match_operand:SVE_FULL_F 2 "register_operand" "w")
> -  (match_operand:SVE_FULL_F 3 "register_operand" "w")]
> +  (match_operand:SVE_F 2 "register_operand" "w")
> +  (match_operand:SVE_F 3 "register_operand" "w")]
>   UNSPEC_COND_FCMUO))
>   (match_operand: 4 "register_operand" "Upa"))
> (match_dup: 1)))
> @@ -8843,7 +8844,6 @@
>  }
>  )
>  
> -;; Same for unordered comparisons.
>  (define_insn_and_split "*fcmuo_nor_combine"
>[(set (match_operand: 0 "register_operand" "=Upa")
>   (and:
> @@ -8852,8 +8852,8 @@
> (unspec:
>   [(match_operand: 1)
>(const_int SVE_KNOWN_PTRUE)
> -  (match_operand:SVE_FULL_F 2 "register_operand" "w")
> -  (match_operand:SVE_FULL_F 3 "register_operand" "w")]
> +  (match_operand:SVE_F 2 "register_operand" "w")
> +  (match_operand:SVE_F 3 "register_operand" "w")]
>   UNSPEC_COND_FCMUO))
>   (not:
> (match_operand: 4 "register_operand" "Upa")))
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unpacked_fcm_c

Re: [PATCH 0/3] fortran: Reallocation on assignment tweaks

2025-07-14 Thread Harald Anlauf


Hi Mikael,

Am 10.07.25 um 22:01 schrieb Mikael Morin:

From: Mikael Morin 

Hello,

here are three patches as follow-up to this message.
These started as an attempt to remove the PR fortran/108889 workaround,
which I didn't understand.
I had to keep it in the end but this is what I could save from that
failed attempt.

Regression tested on x86_64-pc-linux-gnu.
OK for master?

Mikael Morin (3):
   fortran: Generate array reallocation out of loops
   fortran: Delay evaluation of array bounds after reallocation
   fortran: Amend descriptor bounds init if unallocated

  gcc/fortran/gfortran.h |   4 -
  gcc/fortran/trans-array.cc | 229 +++--
  gcc/fortran/trans-expr.cc  |  35 +++---
  3 files changed, 165 insertions(+), 103 deletions(-)



this seems to make a lot of sense and looks good to me.  OK for trunk.

Thanks for the patch!

Harald

[committed] amdgcn: Don't clobber VCC if we don't need to

2025-07-14 Thread Andrew Stubbs

This is a hold-over from GCN3 where v_add always wrote to the condition
register, whether you wanted it or not.  This hasn't been true since GCN5, and
we dropped support for GCN3 a little while ago, so let's fix it.

There was actually a latent bug here because some other post-reload splitters
were generating v_add instructions without declaring the VCC clobber (at least
mul did this), so this should fix some wrong-code bugs also.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (add3): Rename ...
(add3): ... to this, remove the clobber, and change the
instruction from v_add_co_u32 to v_add_u32.
(add3_dup): Rename ...
(add3_dup): ... to this, and likewise.
(sub3): Rename ...
(sub3): ... to this, and likewise.
* config/gcn/gcn.md (addsi3): Remove the DI clobber, and change the
instruction from v_add_co_u32 to v_add_u32.
(addsi3_scc): Likewise.
(subsi3): Likewise, but for v_sub_co_u32.
(muldi3): Likewise.
---
 gcc/config/gcn/gcn-valu.md | 23 ++-
 gcc/config/gcn/gcn.md  | 28 +++-
 2 files changed, 21 insertions(+), 30 deletions(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 4b21302e82c..f49c1ed0b6d 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -1455,28 +1455,26 @@ (define_insn "@dpp_distribute_odd"
 ;; }}}
 ;; {{{ ALU special case: add/sub
 
-(define_insn "add3"
+(define_insn "add3"
   [(set (match_operand:V_INT_1REG 0 "register_operand")
(plus:V_INT_1REG
  (match_operand:V_INT_1REG 1 "register_operand")
- (match_operand:V_INT_1REG 2 "gcn_alu_operand")))
-   (clobber (reg:DI VCC_REG))]
+ (match_operand:V_INT_1REG 2 "gcn_alu_operand")))]
   ""
   {@ [cons: =0, %1, 2; attrs: type, length]
-  [v,v,vSvA;vop2,4] v_add_co_u32\t%0, vcc, %2, %1
+  [v,v,vSvA;vop2,4] {v_add_u32|v_add_nc_u32}\t%0, %2, %1
   [v,v,vSvB;vop2,8] ^
   })
 
-(define_insn "add3_dup"
+(define_insn "add3_dup"
   [(set (match_operand:V_INT_1REG 0 "register_operand")
(plus:V_INT_1REG
  (vec_duplicate:V_INT_1REG
(match_operand: 2 "gcn_alu_operand"))
- (match_operand:V_INT_1REG 1 "register_operand")))
-   (clobber (reg:DI VCC_REG))]
+ (match_operand:V_INT_1REG 1 "register_operand")))]
   ""
   {@ [cons: =0, 1, 2; attrs: type, length]
-  [v,v,SvA;vop2,4] v_add_co_u32\t%0, vcc, %2, %1
+  [v,v,SvA;vop2,4] {v_add_u32|v_add_nc_u32}\t%0, %2, %1
   [v,v,SvB;vop2,8] ^
   })
 
@@ -1551,16 +1549,15 @@ (define_insn "addc3"
   [(set_attr "type" "vop2,vop3b")
(set_attr "length" "4,8")])
 
-(define_insn "sub3"
+(define_insn "sub3"
   [(set (match_operand:V_INT_1REG 0 "register_operand"  "=  v,   v")
(minus:V_INT_1REG
  (match_operand:V_INT_1REG 1 "gcn_alu_operand" "vSvB,   v")
- (match_operand:V_INT_1REG 2 "gcn_alu_operand" "   v,vSvB")))
-   (clobber (reg:DI VCC_REG))]
+ (match_operand:V_INT_1REG 2 "gcn_alu_operand" "   v,vSvB")))]
   ""
   "@
-   v_sub_co_u32\t%0, vcc, %1, %2
-   v_subrev_co_u32\t%0, vcc, %2, %1"
+   {v_sub_u32|v_sub_nc_u32}\t%0, %1, %2
+   {v_subrev_u32|v_subrev_nc_u32}\t%0, %2, %1"
   [(set_attr "type" "vop2")
(set_attr "length" "8,8")])
 
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 2ce2e054fbf..9193461ed49 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1136,14 +1136,13 @@ (define_insn "addsi3"
   [(set (match_operand:SI 0 "register_operand" "= Sg, Sg, Sg,   v")
 (plus:SI (match_operand:SI 1 "gcn_alu_operand" "%SgA,  0,SgA,   v")
 (match_operand:SI 2 "gcn_alu_operand" " SgA,SgJ,  B,vBSv")))
-   (clobber (match_scratch:BI 3   "= cs, cs, cs,   X"))
-   (clobber (match_scratch:DI 4   "=  X,  X,  X,  cV"))]
+   (clobber (match_scratch:BI 3   "= cs, cs, cs,   X"))]
   ""
   "@
s_add_i32\t%0, %1, %2
s_addk_i32\t%0, %2
s_add_i32\t%0, %1, %2
-   v_add_co_u32\t%0, vcc, %2, %1"
+   {v_add_u32|v_add_nc_u32}\t%0, %2, %1"
   [(set_attr "type" "sop2,sopk,sop2,vop2")
(set_attr "length" "4,4,8,8")])
 
@@ -1151,8 +1150,7 @@ (define_expand "addsi3_scc"
   [(parallel [(set (match_operand:SI 0 "register_operand")
   (plus:SI (match_operand:SI 1 "gcn_alu_operand")
(match_operand:SI 2 "gcn_alu_operand")))
- (clobber (reg:BI SCC_REG))
- (clobber (scratch:DI))])]
+ (clobber (reg:BI SCC_REG))])]
   ""
   {})
 
@@ -1332,14 +1330,13 @@ (define_insn "subsi3"
   [(set (match_operand:SI 0 "register_operand"  "=Sg, Sg,v,   v")
(minus:SI (match_operand:SI 1 "gcn_alu_operand" "SgA,SgA,v,vBSv")
  (match_operand:SI 2 "gcn_alu_operand" "SgA,  B, vBSv,   v")))
-   (clobber (match_scratch:BI 3 "=cs, cs,X,   X"))
-   (clobber (match_scratch:DI 4

Re: [PATCH] libstdc++: library side of C++26 P2786R13 - Trivial Relocatability [PR119064]

2025-07-14 Thread Tomasz Kaminski

I have tried using an array inside an anonymous union instead of a local char
buffer, but was blocked by the following issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121068

The idea here would be to have a function:
template<typename T, size_t N>
void star_array_lifetime(T(&arr)[N])
{
   new(arr) T[N];
   for (size_t i = 0; i < N; ++i)
     arr[i].~T();
}

And then in the test, instead of operating on a char buffer, use an array
inside an anonymous union:
union { S a[20]; };
star_array_lifetime(a);

After that, the remaining placement new code should work correctly once
PR121068 is fixed.

On Mon, Jul 14, 2025 at 3:30 PM Tomasz Kaminski  wrote:

>
>
> On Tue, Jun 17, 2025 at 1:15 PM Jakub Jelinek  wrote:
>
>> Hi!
>>
>> Here is a new version of the library side of the C++26 P2786R13 paper.
>> For if constexpr the patch uses __builtin_constant_p trick to figure
>> out if __result is non-equality comparable with __first, it adds recursion
>> for the is_array_v cases, adds qualification on several calls and rewrites
>> the testcase, such that it is hopefully valid and also tests the constant
>> evaluation.
>>
>> 2025-06-17  Jakub Jelinek  
>>
>> PR c++/119064
>> * include/bits/version.def (trivially_relocatable): New.
>> * include/bits/version.h: Regenerate.
>> * include/std/type_traits (std::is_trivially_relocatable,
>> std::is_nothrow_relocatable, std::is_replaceable): New traits.
>> std::is_trivially_relocatable_v, std::is_nothrow_relocatable_v,
>> std::is_replaceable_v): New trait variable templates.
>> * include/std/memory (__glibcxx_want_trivially_relocatable):
>> Define
>> before including bits/version.h.
>> (std::trivially_relocate): New template function.
>> (std::relocate): Likewise.
>> * testsuite/std/memory/relocate/relocate.cc: New test.
>>
>> --- libstdc++-v3/include/bits/version.def.jj2025-06-12
>> 20:19:52.367395730 +0200
>> +++ libstdc++-v3/include/bits/version.def   2025-06-16
>> 22:10:09.415721974 +0200
>> @@ -2012,6 +2012,15 @@ ftms = {
>>};
>>  };
>>
>> +ftms = {
>> +  name = trivially_relocatable;
>> +  values = {
>> +v = 202502;
>> +cxxmin = 26;
>> +extra_cond = "__cpp_trivial_relocatability >= 202502L";
>> +  };
>> +};
>> +
>>  // Standard test specifications.
>>  stds[97] = ">= 199711L";
>>  stds[03] = ">= 199711L";
>> --- libstdc++-v3/include/bits/version.h.jj  2025-06-12
>> 20:19:52.367395730 +0200
>> +++ libstdc++-v3/include/bits/version.h 2025-06-16 22:10:09.416721960
>> +0200
>> @@ -2253,4 +2253,14 @@
>>  #endif /* !defined(__cpp_lib_sstream_from_string_view) &&
>> defined(__glibcxx_want_sstream_from_string_view) */
>>  #undef __glibcxx_want_sstream_from_string_view
>>
>> +#if !defined(__cpp_lib_trivially_relocatable)
>> +# if (__cplusplus >  202302L) && (__cpp_trivial_relocatability >=
>> 202502L)
>> +#  define __glibcxx_trivially_relocatable 202502L
>> +#  if defined(__glibcxx_want_all) ||
>> defined(__glibcxx_want_trivially_relocatable)
>> +#   define __cpp_lib_trivially_relocatable 202502L
>> +#  endif
>> +# endif
>> +#endif /* !defined(__cpp_lib_trivially_relocatable) &&
>> defined(__glibcxx_want_trivially_relocatable) */
>> +#undef __glibcxx_want_trivially_relocatable
>> +
>>  #undef __glibcxx_want_all
>> --- libstdc++-v3/include/std/type_traits.jj 2025-06-12
>> 00:20:03.898666479 +0200
>> +++ libstdc++-v3/include/std/type_traits2025-06-16
>> 22:10:09.416721960 +0200
>> @@ -4245,6 +4245,60 @@ template>
>>  #endif // C++2a
>>
>> +#if __glibcxx_trivially_relocatable >= 202502L // C++ >= 26 &&
>> __cpp_trivial_relocatability >= 202502
>> +  /// True if the type is a trivially relocatable type.
>> +  /// @since C++26
>> +
>> +  template
>> +struct is_trivially_relocatable
>> +# if __has_builtin(__builtin_is_trivially_relocatable)
>> +: bool_constant<__builtin_is_trivially_relocatable(_Tp)>
>> +# else
>> +: bool_constant<__builtin_is_cpp_trivially_relocatable(_Tp)>
>> +# endif
>> +{ };
>> +
>> +  template
>> +struct is_nothrow_relocatable
>> +# if _GLIBCXX_USE_BUILTIN_TRAIT(__builtin_is_nothrow_relocatable)
>> +: bool_constant<__builtin_is_nothrow_relocatable(_Tp)>
>> +# else
>> +: public __or_,
>> +
>> __and_>,
>> +
>>  is_nothrow_destructible>>>::type
>> +# endif
>> +{ };
>> +
>> +  template
>> +struct is_replaceable
>> +: bool_constant<__builtin_is_replaceable(_Tp)>
>> +{ };
>> +
>> +  /// @ingroup variable_templates
>> +  /// @since C++26
>> +  template
>> +inline constexpr bool is_trivially_relocatable_v
>> +# if __has_builtin(__builtin_is_trivially_relocatable)
>> +  = __builtin_is_trivially_relocatable(_Tp);
>> +# else
>> +  = __builtin_is_cpp_trivially_relocatable(_Tp);
>> +# endif
>> +
>> +  template
>> +inline constexpr bool is_nothrow_relocatable_v
>> +# if _GLIBCXX_USE_BUILTIN_TRAIT(__builtin_is_nothrow_relocatable)
>> +  = __builtin_is_nothrow_relocatable(_Tp);
>> +# else
>> +  = (is_trivially_relocat

[PATCH v4] RISC-V: Mips P8700 Conditional Move Support.

2025-07-14 Thread Umesh Kalappa

Fixed the command-line typo in the testcase; no regressions were found with a
"runtest --tool gcc --target_board='riscv-sim/-mtune=mips-p8700 ' riscv.exp"
run.

gcc/ChangeLog:

*config/riscv/riscv-cores.def(RISCV_CORE): Updated the supported march.
*config/riscv/riscv-ext-mips.def(DEFINE_RISCV_EXT):
New file added for mips conditional mov extension.
*config/riscv/riscv-ext.def: Likewise.
*config/riscv/t-riscv: Generates riscv-ext.opt
*config/riscv/riscv-ext.opt: Generated file.
*config/riscv/riscv.cc(riscv_expand_conditional_move): Updated for mips cmov
and outlined some code that handles arch cond move.
*config/riscv/riscv.md(movcc): updated expand for MIPS CCMOV.
*config/riscv/mips-insn.md: New file for mips-p8700 ccmov insn.
*gcc/doc/riscv-ext.texi: Updated for mips cmov.

gcc/testsuite/ChangeLog:

*testsuite/gcc.target/riscv/mipscondmov.c: Test file for mips.ccmov insn.

---
 gcc/config/riscv/mips-insn.md|  36 +++
 gcc/config/riscv/riscv-cores.def |   3 +-
 gcc/config/riscv/riscv-ext-mips.def  |  35 ++
 gcc/config/riscv/riscv-ext.def   |   1 +
 gcc/config/riscv/riscv-ext.opt   |   4 +
 gcc/config/riscv/riscv.cc| 107 +--
 gcc/config/riscv/riscv.md|   3 +-
 gcc/config/riscv/t-riscv |   3 +-
 gcc/doc/riscv-ext.texi   |   4 +
 gcc/testsuite/gcc.target/riscv/mipscondmov.c |  28 +
 10 files changed, 187 insertions(+), 37 deletions(-)
 create mode 100644 gcc/config/riscv/mips-insn.md
 create mode 100644 gcc/config/riscv/riscv-ext-mips.def
 create mode 100644 gcc/testsuite/gcc.target/riscv/mipscondmov.c

diff --git a/gcc/config/riscv/mips-insn.md b/gcc/config/riscv/mips-insn.md
new file mode 100644
index 000..de53638d587
--- /dev/null
+++ b/gcc/config/riscv/mips-insn.md
@@ -0,0 +1,36 @@
+;; Machine description for MIPS custom instructions.
+;; Copyright (C) 2025 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_insn "*movcc_bitmanip"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (if_then_else:GPR
+ (any_eq:X (match_operand:X 1 "register_operand" "r")
+(match_operand:X 2 "const_0_operand" "J"))
+(match_operand:GPR 3 "reg_or_0_operand" "rJ")
+(match_operand:GPR 4 "reg_or_0_operand" "rJ")))]
+  "TARGET_XMIPSCMOV"
+{
+  enum rtx_code code = ;
+  if (code == NE)
+return "mips.ccmov\t%0,%1,%z3,%z4";
+  else
+return "mips.ccmov\t%0,%1,%z4,%z3";
+}
+[(set_attr "type" "condmove")
+ (set_attr "mode" "")])
diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 2096c0095d4..98f347034fb 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -169,7 +169,6 @@ RISCV_CORE("xiangshan-kunminghu",   "rv64imafdcbvh_sdtrig_sha_shcounterenw_"
  "zvfhmin_zvkt_zvl128b_zvl32b_zvl64b",
  "xiangshan-kunminghu")
 
-RISCV_CORE("mips-p8700",   "rv64imafd_zicsr_zmmul_"
- "zaamo_zalrsc_zba_zbb",
+RISCV_CORE("mips-p8700",  "rv64imfd_zicsr_zifencei_zalrsc_zba_zbb",
  "mips-p8700")
 #undef RISCV_CORE
diff --git a/gcc/config/riscv/riscv-ext-mips.def b/gcc/config/riscv/riscv-ext-mips.def
new file mode 100644
index 000..5d7836d5999
--- /dev/null
+++ b/gcc/config/riscv/riscv-ext-mips.def
@@ -0,0 +1,35 @@
+/* MIPS extension definition file for RISC-V.
+   Copyright (C) 2025 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.
+
+Please run `make riscv-regen` in build folder to mak

Re: [PATCH v2] c++: P2036R3 - Change scope of lambda trailing-return-type [PR102610]

2025-07-14 Thread Jason Merrill


On 7/11/25 5:49 PM, Marek Polacek wrote:

On Thu, Jul 10, 2025 at 02:13:06PM -0400, Jason Merrill wrote:

On 7/9/25 4:27 PM, Marek Polacek wrote:

On Tue, Jul 08, 2025 at 12:15:03PM -0400, Jason Merrill wrote:

On 7/7/25 4:52 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch is an attempt to implement P2036R3 along with P2579R0, fixing
build breakages caused by P2036R3.

The simplest example is:

 auto counter1 = [j=0]() mutable -> decltype(j) {
 return j++;
 };

which currently doesn't compile because the 'j' in the capture isn't
visible in the trailing return type.  With these proposals, the 'j'
will be in a lambda scope which spans the trailing return type, so
this test will compile.

This oughtn't be difficult but decltype and other issues made this patch
much more challenging.

We have to push the explicit captures before going into _lambda_declarator_opt
because that is what parses the trailing return type.  Yet we can't build
any captures until after _lambda_body -> start_lambda_function which
creates the lambda's operator(), without which we can't build a proxy,
but _lambda_body happens only after parsing the declarator.  This patch
works around it by creating a fake operator() in make_dummy_lambda_op.


I was thinking that we could build the real operator() earlier, before the
trailing return type, so that it's there for the above uses, and then splice
in the trailing return type to the already-built function declaration,
perhaps with apply_deduced_return_type.


Ah, I see what you mean.  But it's not just the return type that we don't
have at the point where we have to have the operator(): it's also tx_qual,
exception_spec, std_attrs, and trailing_requires_clause.  Especially the
requires clause seems to be awkward to set post grokmethod; it seems I'd
have to replicate the flag_concepts block in grokfndecl?

Maybe I could add (by that I mean add it to the lambda via
finish_member_declaration) a bare bones operator() for the purposes of
parsing the return type/noexcept/requires, then after parsing them
construct a real operator(), then find a slot of the bare bones op(),
and replace it with the complete one.  I'm not sure if that makes sense
to do though.


I was hoping to avoid building more than one op().  But really, why do you
need an op() at all for building the proxies?  Could you use
build_dummy_object instead of DECL_ARGUMENTS of some fake op()?


The problem is that we need operator() to be the var's DECL_CONTEXT
for is_capture_proxy:

   && LAMBDA_FUNCTION_P (DECL_CONTEXT (decl)));


Maybe we could set their DECL_CONTEXT to the closure type and adjust 
is_capture_proxy to handle that case as well?



Another thing is that in "-> decltype(j)" we don't have the right
current_function_decl yet, so I've added the in_lambda_declarator_p flag
to be used in finish_decltype_type so that we know this decltype appertains
to a lambda -- then current_lambda_expr should give us the right lambda,
which has another new flag tracking whether mutable was seen.


The flag to finish_decltype_type seems unneeded; we should be able to tell
from the proxy that it belongs to a lambda.  And I would think that the new
handling in finish_decltype_type seems right in general; always refer to
current_lambda_expr instead of current_function_decl, etc.


Good point.  I've removed the flag and simplified the patch quite a bit.
However:
- to honor [expr.prim.id.unqual]/4, I have to know if the decltype is
   in the lambda's parameter-declaration-clause or not:

 [=]() -> decltype((x))  // float const&

 [=](decltype((x)) y)// float&


   so I'm using LAMBDA_EXPR_CONST_QUAL_P for that.


Makes sense.


- if we want to handle nested lambdas correctly:

[=](decltype((x)) y) {}  // float&

[=] {
  [](decltype((x)) y) {};  // float const&
}

   we probably will need a new flag for decltype.


Hmm?  Since the inner lambda has no capture-default, it doesn't qualify 
under https://eel.is/c++draft/expr#prim.id.unqual-4.3 , so we look to 
the outer lambda instead.


I would believe that we need to improve finish_decltype_type to handle 
this properly (see also PR112926) but I don't see the need for a 
decltype flag.
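
For reference, the [expr.prim.id.unqual]/4 rule discussed above already
applies inside a lambda body, independently of P2036R3.  Below is a minimal
sketch of that existing behaviour, not code from the patch.  The helper names
are made up for illustration, and the results assume a conforming compiler.

```cpp
#include <cassert>      // for the assertions shown in the usage note
#include <type_traits>

// Hypothetical helper: in a non-mutable by-copy lambda, decltype((x)) is
// treated as an access to the const closure member, i.e. float const&.
bool parenthesized_is_const_ref ()
{
  float x = 1.0f;
  auto l = [=] {
    float copy = x;   // odr-use x so that it is really captured
    (void) copy;
    return std::is_same<decltype((x)), const float&>::value;
  };
  return l ();
}

// Hypothetical helper: with 'mutable' the closure member is not const,
// so decltype((x)) is float&.
bool parenthesized_is_mutable_ref ()
{
  float x = 1.0f;
  auto l = [=] () mutable {
    x += 1.0f;        // modifies the lambda's own copy
    return std::is_same<decltype((x)), float&>::value;
  };
  return l ();
}

// Hypothetical helper: without the extra parentheses, decltype(x) is the
// declared type of x, i.e. float.
bool unparenthesized_is_float ()
{
  float x = 1.0f;
  auto l = [=] {
    float copy = x;
    (void) copy;
    return std::is_same<decltype(x), float>::value;
  };
  return l ();
}
```

Each helper should return true, e.g. assert (parenthesized_is_const_ref ());
the open question in this thread is only how far the same rule extends into
the trailing return type and parameter-declaration-clause.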



@@ -3351,8 +3351,12 @@ check_local_shadow (tree decl)
}
/* Don't complain if it's from an enclosing function.  */
else if (DECL_CONTEXT (old) == current_function_decl
-  && TREE_CODE (decl) != PARM_DECL
-  && TREE_CODE (old) == PARM_DECL)
+  && ((TREE_CODE (decl) != PARM_DECL
+   && TREE_CODE (old) == PARM_DECL)
+  || (is_capture_proxy (old)
+  && current_lambda_expr ()
+  && DECL_CONTEXT (old)
+ == lambda_function (current_lambda_expr ()


What case is this handling?  Doesn't the previous if already deal with
parm/capture collision?


The proposal say

Re: [PATCH v2 1/2] Match: Refine the widen mul check for SAT_MUL pattern

2025-07-14 Thread Richard Biener

On Sat, Jul 12, 2025 at 10:59 AM  wrote:
>
> From: Pan Li 
>
> A widening multiply takes N-bit source operands and produces a
> 2N-bit result.  The previous check only compared against the
> precision of HOST_WIDE_INT, so it did not work for QI => HI,
> HI => SI and SI => DI.  Thus, refine the widen mul precision
> check to require that the dest has twice the bits of the inputs.

OK.

> gcc/ChangeLog:
>
> * match.pd: Make sure widen mul has twice bitsize
> of the inputs in SAT_MUL pattern.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 67b33eee5f7..7f84d5149f4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3605,11 +3605,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>unsigned widen_prec = TYPE_PRECISION (TREE_TYPE (@3));
>unsigned cvt5_prec = TYPE_PRECISION (TREE_TYPE (@5));
>unsigned cvt6_prec = TYPE_PRECISION (TREE_TYPE (@6));
> -  unsigned hw_int_prec = sizeof (HOST_WIDE_INT) * 8;
>wide_int c2 = wi::to_wide (@2);
>wide_int max = wi::mask (prec, false, widen_prec);
>bool c2_is_max_p = wi::eq_p (c2, max);
> -  bool widen_mult_p = cvt5_prec == cvt6_prec && hw_int_prec == cvt5_prec;
> +  bool widen_mult_p = cvt5_prec == cvt6_prec && widen_prec == cvt6_prec * 2;
>   }
>   (if (widen_prec > prec && c2_is_max_p && widen_mult_p)
>  )
> --
> 2.43.0
>
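
To illustrate the "dest has twice the bits of the inputs" condition at the
source level, here is a minimal sketch of an unsigned saturating multiply
written with a widening multiply.  It is not taken from match.pd or from the
testsuite; the names sat_mul, sat_mul_u8 and sat_mul_u32 are hypothetical, and
the static_assert only mirrors the idea of the refined precision check.

```cpp
#include <cassert>   // for the assertions shown in the usage note
#include <cstdint>
#include <limits>

// Hypothetical helper: multiply in a type with twice the precision and
// clamp the result to the maximum of the narrow type.
template <typename Narrow, typename Wide>
Narrow sat_mul (Narrow a, Narrow b)
{
  // Same idea as the refined check: the wide type has exactly 2N bits.
  static_assert (std::numeric_limits<Wide>::digits
                 == 2 * std::numeric_limits<Narrow>::digits,
                 "Wide must have twice the precision of Narrow");

  // Convert both operands first so the product cannot overflow.
  Wide product = static_cast<Wide> (static_cast<Wide> (a)
                                    * static_cast<Wide> (b));
  Wide max = std::numeric_limits<Narrow>::max ();
  return product > max ? static_cast<Narrow> (max)
                       : static_cast<Narrow> (product);
}

// QI => HI: 8-bit inputs, 16-bit widening multiply.
std::uint8_t sat_mul_u8 (std::uint8_t a, std::uint8_t b)
{
  return sat_mul<std::uint8_t, std::uint16_t> (a, b);
}

// SI => DI: 32-bit inputs, 64-bit widening multiply.
std::uint32_t sat_mul_u32 (std::uint32_t a, std::uint32_t b)
{
  return sat_mul<std::uint32_t, std::uint64_t> (a, b);
}
```

For example, sat_mul_u8 (16, 16) saturates to 255 while sat_mul_u8 (10, 20)
returns 200.  The 8-bit case is exactly the kind of QI => HI widening that a
check against the HOST_WIDE_INT precision would have rejected.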

Re: [PATCH v3] x86: Update MMX moves to support all 1s vectors

2025-07-14 Thread Uros Bizjak

On Mon, Jul 14, 2025 at 5:32 AM Uros Bizjak  wrote:
>
> On Mon, Jul 14, 2025 at 2:14 AM H.J. Lu  wrote:
> >
> > On Sat, Jul 12, 2025 at 7:51 PM Uros Bizjak  wrote:
> > >
> > > On Sat, Jul 12, 2025 at 1:41 PM H.J. Lu  wrote:
> > > >
> > > > On Sat, Jul 12, 2025 at 5:58 PM Uros Bizjak  wrote:
> > > > >
> > > > > On Sat, Jul 12, 2025 at 11:52 AM H.J. Lu  wrote:
> > > > > >
> > > > > > On Sat, Jul 12, 2025 at 5:03 PM Uros Bizjak  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > commit 77473a27bae04da99d6979d43e7bd0a8106f4557
> > > > > > > > Author: H.J. Lu 
> > > > > > > > Date:   Thu Jun 26 06:08:51 2025 +0800
> > > > > > > >
> > > > > > > > x86: Also handle all 1s float vector constant
> > > > > > > >
> > > > > > > > replaces
> > > > > > > >
> > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) 
> > > > > > > > [0  S8 A64])) 2031
> > > > > > > >  {*movv2sf_internal}
> > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated x2
> > > > > > > > ])
> > > > > > > > (nil)))
> > > > > > > >
> > > > > > > > with
> > > > > > > >
> > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > (const_vector:V8QI [
> > > > > > > > (const_int -1 [0x]) repeated x8
> > > > > > > > ])) -1
> > > > > > > >  (nil))
> > > > > > > > ...
> > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > (subreg:V2SF (reg:V8QI 112) 0)) 2031 {*movv2sf_internal}
> > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated x2
> > > > > > > > ])
> > > > > > > > (nil)))
> > > > > > > >
> > > > > > > > which leads to
> > > > > > > >
> > > > > > > > pr121015.c: In function ‘render_result_from_bake_h’:
> > > > > > > > pr121015.c:34:1: error: unrecognizable insn:
> > > > > > > >34 | }
> > > > > > > >   | ^
> > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > (const_vector:V8QI [
> > > > > > > > (const_int -1 [0x]) repeated x8
> > > > > > > > ])) -1
> > > > > > > >  (expr_list:REG_EQUIV (const_vector:V8QI [
> > > > > > > > (const_int -1 [0x]) repeated x8
> > > > > > > > ])
> > > > > > > > (nil)))
> > > > > > > > during RTL pass: ira
> > > > > > > >
> > > > > > > > 1. Add vector_const0_or_m1_operand for vector 0 or integer 
> > > > > > > > vector -1.
> > > > > > > > 2. Add nonimm_or_vector_const0_or_m1_operand for nonimmediate, 
> > > > > > > > vector 0
> > > > > > > > or integer vector -1 operand.
> > > > > > > > 3. Add BX constraint for MMX vector constant all 0s/1s operand.
> > > > > > > > 4. Update MMXMODE:*mov_internal to support integer all 1s 
> > > > > > > > vectors.
> > > > > > > > Replace  with  to generate
> > > > > > > >
> > > > > > > > pcmpeqd %xmm0, %xmm0
> > > > > > > >
> > > > > > > > for
> > > > > > > >
> > > > > > > > (set (reg/i:V8QI 20 xmm0)
> > > > > > > >  (const_vector:V8QI [(const_int -1 [0x]) 
> > > > > > > > repeated x8]))
> > > > > > > >
> > > > > > > > NB: The upper 64 bits in XMM0 are all 1s, instead of all 0s.
> > > > > > >
> > > > > > > Actually, we don't want this, we should keep the top 64 bits zero,
> > > > > > > especially for floating point, where the pattern represents NaN.
> > > > > > >
> > > > > > > So, I think the correct way is to avoid the transformation for
> > > > > > > narrower modes in the first place.
> > > > > > >
> > > > > >
> > > > > > How does your latest patch handle this?
> > > > > >
> > > > > > typedef char __v8qi __attribute__ ((__vector_size__ (8)));
> > > > > >
> > > > > > __v8qi
> > > > > > m1 (void)
> > > > > > {
> > > > > >   return __extension__(__v8qi){-1, -1, -1, -1, -1, -1, -1, -1};
> > > > > > }
> > > > >
> > > > > No, my patch is also not appropriate, because it also introduces
> > > > > "pcmpeq %xmm, %xmm". We should not generate 8-byte all-ones load using
> > > > > pcmpeq, because upper 64 bits are also all 1s.
> > > > >
> > > > > The correct way is to avoid generating 64 bit all-ones, because this
> > > > > > constant is not supported and standard_sse_constant_p () correctly
> > > > > reports this.
> > > >
> > > > We can generate
> > > >
> > > > pcmpeqd %xmm0, %xmm0
> > > > movq %xmm0, %xmm0
> > > >
> > > > for V8QI and
> > > >
> > > > pcmpeqd %xmm0, %xmm0
> > > > movd %xmm0, %xmm0
> > > >
> > > > for V4QI.
> > >
> > > I don't think this is better than skipping the transformation for
> > > instructions that we in fact emulate altogether. While loading
> > > all-zero is OK in any mode, loading all-one is not OK for narrow
> > > modes. So, this transformation should simply be skipped for all-one

Re: [PATCH] aarch64: PR target/120999: Avoid movprfx for NBSL implementation of NOR

2025-07-14 Thread Richard Sandiford

Kyrylo Tkachov  writes:
>> On 11 Jul 2025, at 16:48, Richard Sandiford  
>> wrote:
>>> Shall I backport this for GCC 15.2 as well?
>>> The test case uses C operators which were enabled in GCC 15, though I 
>>> suppose one could construct a pure ACLE intrinsics testcase too.
>> 
>> Sounds good to me.  It's fixing wrong code, even if the gas warning
>> makes it somewhat noisy wrong code.
>> 
>
> Looks like there’s a simple merge conflict due to trunk also having 
> http://gcc.gnu.org/g:f260146bc05f6fba7b2a67a62063c770588b769d
> Author: Richard Earnshaw 
> Date:   Mon Apr 14 16:41:16 2025 +0100
>
> aarch64: Fix up commutative and early-clobber markers on compact insns
>
> I’d like to backport that commit as well as it looks like a low-risk cleanup.
> Both commits bootstrap and test cleanly on the branch.
> Ok?

Ok from my POV, thanks.

Richard

[PATCH v2] RISC-V: Support RVVDImode for avg3_floor auto vect

2025-07-14 Thread pan2 . li

From: Pan Li 

The avg3_floor pattern leverages the add and shift RTL
with the DOUBLE_TRUNC mode iterator.  That is, the RVVDImode
iterator generates avg3rvvsimode_floor, so only the element
sizes QI, HI and SI are allowed.

Thus, this patch adds support for DImode under the
standard name, using the iterator V_VLSI_D.

The following test suites pass for this patch series.
* The rv64gcv full regression test.
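
As a scalar illustration of what the pattern computes, here is a minimal
sketch of the floor average written as widen, add, shift right by one and
truncate.  It is not code from the patch or from avg.h; the function names are
hypothetical, and the 64-bit variant assumes the compiler provides __int128
(otherwise it falls back to a different, overflow-free identity).

```cpp
#include <cassert>   // for the assertions shown in the usage note
#include <cstdint>

// Hypothetical reference: floor average of two 32-bit values through a
// 64-bit intermediate.  The right shift of a negative value is assumed to
// be arithmetic (guaranteed since C++20, and what GCC does).
std::int32_t avg_floor_i32 (std::int32_t a, std::int32_t b)
{
  std::int64_t wide = static_cast<std::int64_t> (a) + b;
  return static_cast<std::int32_t> (wide >> 1);
}

// Hypothetical reference for the DImode case: the intermediate needs
// 128 bits, which is why the tests mention an int128 type.
std::int64_t avg_floor_i64 (std::int64_t a, std::int64_t b)
{
#if defined(__SIZEOF_INT128__)
  __extension__ typedef __int128 wide_t;
  wide_t wide = static_cast<wide_t> (a) + b;
  return static_cast<std::int64_t> (wide >> 1);
#else
  // Swapped-in fallback, not from the patch:
  // a + b == 2 * (a & b) + (a ^ b), so no wider type is needed.
  return (a & b) + ((a ^ b) >> 1);
#endif
}
```

For example, avg_floor_i64 (-3, 0) gives -2 (rounding towards negative
infinity), and averaging INT64_MAX with itself gives INT64_MAX without
overflowing.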

gcc/ChangeLog:

* config/riscv/autovec.md (avg3_floor): Add new
pattern of avg3_floor for rvv DImode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/avg.h: Add int128 type when
xlen == 64.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i32.c:
Suppress __int128 warning for run test.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i32-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_data.h: Fix one incorrect
test data.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i16-from-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i16-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i32-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i64-from-i128.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i64-from-i128.c: New 
test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md  | 13 +
 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h |  5 +
 .../rvv/autovec/avg_ceil-run-1-i16-from-i32.c|  2 +-
 .../rvv/autovec/avg_ceil-run-1-i16-from-i64.c|  2 +-
 .../rvv/autovec/avg_ceil-run-1-i32-from-i64.c|  2 +-
 .../rvv/autovec/avg_ceil-run-1-i8-from-i16.c |  2 +-
 .../rvv/autovec/avg_ceil-run-1-i8-from-i32.c |  2 +-
 .../rvv/autovec/avg_ceil-run-1-i8-from-i64.c |  2 +-
 .../gcc.target/riscv/rvv/autovec/avg_data.h  |  2 +-
 .../rvv/autovec/avg_floor-1-i64-from-i128.c  | 12 
 .../rvv/autovec/avg_floor-run-1-i16-from-i32.c   |  2 +-
 .../rvv/autovec/avg_floor-run-1-i16-from-i64.c   |  2 +-
 .../rvv/autovec/avg_floor-run-1-i32-from-i64.c   |  2 +-
 .../rvv/autovec/avg_floor-run-1-i64-from-i128.c  | 16 
 .../rvv/autovec/avg_floor-run-1-i8-from-i16.c|  2 +-
 .../rvv/autovec/avg_floor-run-1-i8-from-i32.c|  2 +-
 .../rvv/autovec/avg_floor-run-1-i8-from-i64.c|  2 +-
 17 files changed, 59 insertions(+), 13 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-1-i64-from-i128.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i64-from-i128.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 94a61bdc5cf..2e86826f286 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2499,6 +2499,19 @@ (define_expand "avg3_floor"
   }
 )
 
+(define_expand "avg3_floor"
+ [(match_operand:V_VLSI_D 0 "register_operand")
+  (match_operand:V_VLSI_D 1 "register_operand")
+  (match_operand:V_VLSI_D 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+insn_code icode = code_for_pred (UNSPEC_VAADD, mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP_VXRM_RDN,
+  operands);
+DONE;
+  }
+)
+
 (define_expand "avg3_ceil"
  [(set (match_operand: 0 "register_operand")
(truncate:
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h
index 4aeb637bba7..2de7d7c49df 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h
@@ -3,6 +3,11 @@
 
 #include 
 
+#if __riscv_xlen == 64
+typedef unsigned __int128 uint128_t;
+typedef signed __int128 int128_t;
+#endif
+
 #define DEF_AVG_0(NT, WT, NAME) \
 __attribute__((noinline))   \
 void\
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i32.c
index 1fa080b3933..3d872a8a4b5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i32.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { riscv_v } } } */
-/* { dg-additional-options "-std=c99 -O3" } */
+/* { dg-additional-options "-std=c99 -O3 -Wno-p

[PATCH 1/2] libstdc++: Add missing initializers for __maybe_present_t members [PR119962]

2025-07-14 Thread Patrick Palka

Tested on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps
15?  Not sure if this corner case is worth backporting any further.

Can we just use direct-list-initialization via {} instead of '= T()'
here?  I wasn't sure so I went with the latter to more closely mirror
the standard.

-- >8 --

Data members of type __maybe_present_t where the conditionally present
type might be an aggregate or fundamental type need to be explicitly
value-initialized (rather than implicitly default-initialized) to ensure
that default-initialization of the containing class results in a
completely initialized object.
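
A minimal sketch of the difference, using made-up stand-in types (Agg, Before, After) rather than the real __maybe_present_t members:

```cpp
// Stand-in for an aggregate iterator type with no default member initializer.
struct Agg { int* p; };

struct Before {
  Agg member;              // default-initialized: member.p is indeterminate
  int* other = nullptr;
};

struct After {
  Agg member = Agg();      // value-initialized: member.p is a null pointer
  int* other = nullptr;
};

// constexpr Before b;     // rejected: b.member.p would be left uninitialized
constexpr After a;         // accepted: every subobject is initialized
static_assert(a.member.p == nullptr, "value-initialization zeroes the pointer");
```

The new tests rely on the same effect: a constexpr variable without an initializer is only accepted when default-initialization leaves no subobject uninitialized.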

PR libstdc++/119962

libstdc++-v3/ChangeLog:

* include/std/ranges (join_view::_Iterator::_M_outer): Initialize.
(lazy_split_view::_OuterIter::_M_current): Initialize.
(join_with_view::_Iterator::_M_outer_it): Initialize.
* testsuite/std/ranges/adaptors/join.cc (test15): New test.
* testsuite/std/ranges/adaptors/join_with/1.cc (test05): New test.
* testsuite/std/ranges/adaptors/lazy_split.cc (test13): New test.
---
 libstdc++-v3/include/std/ranges  | 9 ++---
 libstdc++-v3/testsuite/std/ranges/adaptors/join.cc   | 8 
 .../testsuite/std/ranges/adaptors/join_with/1.cc | 8 
 libstdc++-v3/testsuite/std/ranges/adaptors/lazy_split.cc | 8 
 4 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 3a6710bd0ae1..efe62969d657 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -3022,7 +3022,8 @@ namespace views::__adaptor
  { _M_satisfy(); }
 
  [[no_unique_address]]
-   __detail::__maybe_present_t, _Outer_iter> 
_M_outer;
+   __detail::__maybe_present_t, _Outer_iter> 
_M_outer
+ = decltype(_M_outer)();
  optional<_Inner_iter> _M_inner;
  _Parent* _M_parent = nullptr;
 
@@ -3376,7 +3377,8 @@ namespace views::__adaptor
 
  [[no_unique_address]]
__detail::__maybe_present_t,
-   iterator_t<_Base>> _M_current;
+   iterator_t<_Base>> _M_current
+ = decltype(_M_current)();
  bool _M_trailing_empty = false;
 
public:
@@ -7400,7 +7402,8 @@ namespace views::__adaptor
 
 _Parent* _M_parent = nullptr;
 [[no_unique_address]]
-  __detail::__maybe_present_t, _OuterIter> 
_M_outer_it;
+  __detail::__maybe_present_t, _OuterIter> _M_outer_it
+   = decltype(_M_outer_it)();
 variant<_PatternIter, _InnerIter> _M_inner_it;
 
 constexpr _OuterIter&
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/join.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/join.cc
index 2861115c22a0..a9395b489919 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/join.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/join.cc
@@ -233,6 +233,13 @@ test14()
   VERIFY( ranges::equal(v | views::join, (int[]){1, 2, 3}) );
 }
 
+void
+test15()
+{
+  // PR libstdc++/119962 - __maybe_present_t misses initialization
+  constexpr decltype(views::join(views::single(views::single(0))).begin()) it;
+}
+
 int
 main()
 {
@@ -250,4 +257,5 @@ main()
   test12();
   test13();
   test14();
+  test15();
 }
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/join_with/1.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/join_with/1.cc
index 8ab30a5277da..4d55c9d3be78 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/join_with/1.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/join_with/1.cc
@@ -94,6 +94,13 @@ test04()
   return true;
 }
 
+void
+test05()
+{
+  // PR libstdc++/119962 - __maybe_present_t misses initialization
+  constexpr decltype(views::join_with(views::single(views::single(0)), 
0).begin()) it;
+}
+
 int
 main()
 {
@@ -105,4 +112,5 @@ main()
 #else
   VERIFY(test04());
 #endif
+  test05();
 }
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/lazy_split.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/lazy_split.cc
index 81fc60b362a8..321ae271bf2b 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/lazy_split.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/lazy_split.cc
@@ -232,6 +232,13 @@ test12()
   return true;
 }
 
+void
+test13()
+{
+  // PR libstdc++/119962 - __maybe_present_t misses initialization
+  constexpr decltype(views::lazy_split(views::single(0), 0).begin()) it;
+}
+
 int
 main()
 {
@@ -247,4 +254,5 @@ main()
   test10();
   test11();
   static_assert(test12());
+  test13();
 }
-- 
2.50.1.271.gd30e120486

Re: ACCESS_WITH_SIZE for pointers Re: [PATCH] tree-optimization/120929: Limit MEM_REF handling to .ACCESS_WITH_SIZE

2025-07-14 Thread Richard Biener

On Mon, Jul 14, 2025 at 10:58 PM Qing Zhao  wrote:
>
>
> > On Jul 7, 2025, at 13:07, Qing Zhao  wrote:
> >
> > As I mentioned in my latest reply to this thread, the original
> > implementation of counted_by for pointers did not use the additional
> > indirection.
> > But that implementation showed a fundamental bug during testing, so I
> > changed the implementation to the current one.
> >
> > I need to spend a little more time to find the details of that fundamental
> > bug with the original implementation.
> >
> > If the current bug is urgent to fix and you are not comfortable with
> > the simple patch Sid provided, then I am okay with backing it out now and
> > pushing it back, with the fix for this current bug, at a later time after
> > everyone is comfortable with the current implementation.
> >
> > Thanks a lot!
> >
> > Qing
>
>
> Hi,  this is an update on the above fundamental issue I mentioned previously. 
> (I finally located this issue and recorded it here)
>
> 1. Based on the previous discussion on how to resolve PR120929, we agreed on
> the following solution:
>
> struct S {
>   int n;
>   int *p __attribute__((counted_by(n)));
> } *f;
>
> when generating a call to .ACCESS_WITH_SIZE for f->p, instead of generating
>  *.ACCESS_WITH_SIZE (&f->p, &f->n,...)
>
> We should generate
>  .ACCESS_WITH_SIZE (f->p, &f->n,...)
>
> i.e.,
> the return type and the type of the first argument of the call is the
>original pointer type in this version,
>instead of the pointer to the original pointer type in the 7th version;
>
> 2. I implemented this new .ACCESS_WITH_SIZE generation for pointers in my
> local workspace. It looked fine in the beginning.
> However, during testing, I finally located the _fundamental issue_ with this
> design.
>
> This issue can be shown clearly with the following simple test case:
> (Note, the numbers on the left in the following test case are the line numbers)
>
> $ cat t1.c
>   1 struct annotated {
>   2   int b;
>   3   int *c __attribute__ ((counted_by (b)));
>   4 } *p_array_annotated;
>   5
>   6 void __attribute__((__noinline__)) setup (int annotated_count)
>   7 {
>   8   p_array_annotated
>   9 = (struct annotated *) __builtin_malloc (sizeof (struct annotated));
>  10   p_array_annotated->c = (int *) __builtin_malloc (annotated_count * 
> sizeof (int));
>  11   p_array_annotated->c[2] = 10;
>  12   p_array_annotated->b = annotated_count;

But isn't this bogus since you access c[2] while its counted_by value
is still uninitialized?
I'd say by using counted_by you now invoke UB here.

Richard.

>  13   return;
>  14 }
>  15
>  16 int main(int argc, char *argv[])
>  17 {
>  18   setup (10);
>  19   return 0;
>  20 }
>
> $ my-gcc t1.c -O0 -g -o ./t1.exe -fdump-tree-gimple
> $ ./t1.exe
> Segmentation fault (core dumped)
>
> 3. As I debugged, the segmentation fault happened at line 11:
> p_array_annotated->c[2] = 10;
> since the value of the pointer “p_array_annotated->c” is 0x0.
>
> 4. The gimple dump t1.c.007t.gimple looks as follows:
>
>   1 __attribute__((noinline))
>   2 void setup (int annotated_count)
>   3 {
>   4   int * D.2969;
>   5
>   6   _1 = __builtin_malloc (16);
>   7   p_array_annotated = _1;
>   8   _2 = (long unsigned int) annotated_count;
>   9   _3 = _2 * 4;
>  10   p_array_annotated.0_4 = p_array_annotated;
>  11   _5 = p_array_annotated.0_4->c;
>  12   p_array_annotated.1_6 = p_array_annotated;
>  13   _7 = &p_array_annotated.1_6->b;
>  14   D.2969 = .ACCESS_WITH_SIZE (_5, _7, 0B, 4);
>  15   _8 = __builtin_malloc (_3);
>  16   D.2969 = _8;
>  17   p_array_annotated.2_10 = p_array_annotated;
>  18   _11 = p_array_annotated.2_10->c;
>  19   p_array_annotated.3_12 = p_array_annotated;
>  20   _13 = &p_array_annotated.3_12->b;
>  21   _9 = .ACCESS_WITH_SIZE (_11, _13, 0B, 4);
>  22   _14 = _9 + 8;
>  23   *_14 = 10;
>  24   p_array_annotated.4_15 = p_array_annotated;
>  25   p_array_annotated.4_15->b = annotated_count;
>  26   return;
>  27 }
>
> We can see that the root cause of this problem is that we passed the _value_
> of “p_array_annotated->c” instead of the _address_ of
> “p_array_annotated->c” to .ACCESS_WITH_SIZE:
>
> At line 11, the value of “p_array_annotated.0_4->c” is 0x0 when it was 
> assigned to “_5”;
> At line 14, the value of “_5” is passed to the call to .ACCESS_WITH_SIZE, 
> which also is 0x0;
>
> Later, when we expand .ACCESS_WITH_SIZE (_5, _7, 0B, 4), we replace it
> with its first argument “_5”.
> As a result, the IL after the expansion will look like the following:
>
> 11   _5 = p_array_annotated.0_4->c;
> 14  D.2969 = _5;
> 15  _8 = __builtin_malloc (_3);
> 16  D.2969 = _8;
>
> We can clearly see that the above IL is wrong: p_array_annotated->c is 
> initialized as 0x0, and this value is passed
> to the pointer D.2969, which is 0x0 too.
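
The store problem described above can be modelled in plain C++.  This is only a sketch: access_by_value and access_by_address are hypothetical helpers standing in for the two .ACCESS_WITH_SIZE forms after expansion, where the call is replaced by its first argument.

```cpp
struct annotated { int b; int* c; };

// Models .ACCESS_WITH_SIZE (f->p, &f->n, ...): expansion yields the first
// argument, which is a copy of the pointer value.
inline int* access_by_value(int* p, const int* /*count*/) { return p; }

// Models *.ACCESS_WITH_SIZE (&f->p, &f->n, ...): expansion yields the
// address of the field, so a store through it reaches the field itself.
inline int** access_by_address(int** pp, const int* /*count*/) { return pp; }

// Returns true if the store became visible in s.c.
inline bool store_via_value(annotated& s, int* fresh)
{
  int* tmp = access_by_value(s.c, &s.b);  // tmp holds the old s.c (null)
  tmp = fresh;                            // only the temporary changes
  (void)tmp;
  return s.c == fresh;
}

inline bool store_via_address(annotated& s, int* fresh)
{
  *access_by_address(&s.c, &s.b) = fresh; // the field itself is updated
  return s.c == fresh;
}
```

With a zeroed struct, store_via_value leaves s.c null, mirroring the D.2969 temporary in the dump, while store_via_address updates the field.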
>
> 5.  This is exactly the fundamental issue I met in the very beginning during 
> my implementation of the counted_by for
> pointe

[PATCH] i386: Decouple AMX-AVX512 from AVX10.2 and imply AVX512F

2025-07-14 Thread Haochen Jiang

Hi all,

In ISE058, AMX-AVX512 no longer implies AVX10.2.  This prompts a
reconsideration of what AMX-AVX512 should imply.

Since it uses zmm registers, and zmm registers only, it needs to
imply at least AVX512F.  AVX512VL is not needed.

On the other hand, if AMX-AVX512 implied AVX10.1, -mno-avx10.1
would disable AMX-AVX512.  This would be a surprise for users.

For these two reasons, the patch decouples AMX-AVX512 from AVX10.2
and makes it imply AVX512F.
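
As a rough illustration of the intended behaviour, the SET/UNSET relationship can be modelled with a single flag word.  This is a simplified sketch: the bit values and helper names are invented, and GCC actually splits these masks across x_ix86_isa_flags and x_ix86_isa_flags2.

```cpp
#include <cstdint>

// Hypothetical bit assignments; the real OPTION_MASK_ISA* values differ.
enum : std::uint32_t {
  AVX512F    = 1u << 0,
  AVX10_1    = 1u << 1,
  AVX10_2    = 1u << 2,
  AMX_TILE   = 1u << 3,
  AMX_AVX512 = 1u << 4
};

// SET masks: the feature plus everything it implies.
constexpr std::uint32_t AVX512F_SET    = AVX512F;
constexpr std::uint32_t AVX10_1_SET    = AVX512F_SET | AVX10_1;
constexpr std::uint32_t AVX10_2_SET    = AVX10_1_SET | AVX10_2;
constexpr std::uint32_t AMX_AVX512_SET = AMX_TILE | AMX_AVX512 | AVX512F_SET;

// UNSET masks: the feature plus everything that depends on it.
constexpr std::uint32_t AMX_AVX512_UNSET = AMX_AVX512;
constexpr std::uint32_t AVX10_2_UNSET    = AVX10_2;
constexpr std::uint32_t AVX10_1_UNSET    = AVX10_1 | AVX10_2_UNSET;
constexpr std::uint32_t AVX512F_UNSET    = AVX512F | AVX10_1_UNSET
                                           | AMX_AVX512_UNSET;

constexpr std::uint32_t enable_isa(std::uint32_t flags, std::uint32_t set)
{
  return flags | set;
}

constexpr std::uint32_t disable_isa(std::uint32_t flags, std::uint32_t unset)
{
  return flags & ~unset;
}
```

In this model, enabling AMX-AVX512 turns on AVX512F but not AVX10.2, disabling AVX10.1 leaves AMX-AVX512 alone, and disabling AVX512F turns AMX-AVX512 off.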

Ok for trunk and backport to GCC15?

Thx,
Haochen

gcc/ChangeLog:

* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AMX_AVX512_SET): Do not set AVX10.2.
(OPTION_MASK_ISA2_AVX10_2_UNSET): Remove AMX-AVX512 unset.
(OPTION_MASK_ISA2_AVX512F_UNSET): Unset AMX-AVX512.
(ix86_handle_option): Imply AVX512F for AMX-AVX512.

gcc/testsuite/ChangeLog:

* gcc.target/i386/amxavx512-cvtrowd2ps-2.c: Add -mavx512fp16 to
use FP16 related intrins for convert.
* gcc.target/i386/amxavx512-cvtrowps2bf16-2.c: Ditto.
* gcc.target/i386/amxavx512-cvtrowps2ph-2.c: Ditto.
* gcc.target/i386/amxavx512-movrow-2.c: Ditto.
---
 gcc/common/config/i386/i386-common.cc   | 13 ++---
 .../gcc.target/i386/amxavx512-cvtrowd2ps-2.c|  2 +-
 .../gcc.target/i386/amxavx512-cvtrowps2bf16-2.c |  2 +-
 .../gcc.target/i386/amxavx512-cvtrowps2ph-2.c   |  2 +-
 gcc/testsuite/gcc.target/i386/amxavx512-movrow-2.c  |  2 +-
 5 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index dfcd4e9a727..9e807e4b8f6 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -131,8 +131,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_AVX10_2_SET \
   (OPTION_MASK_ISA2_AVX10_1_SET | OPTION_MASK_ISA2_AVX10_2)
 #define OPTION_MASK_ISA2_AMX_AVX512_SET \
-  (OPTION_MASK_ISA2_AMX_TILE_SET | OPTION_MASK_ISA2_AVX10_2_SET \
-   | OPTION_MASK_ISA2_AMX_AVX512)
+  (OPTION_MASK_ISA2_AMX_TILE_SET | OPTION_MASK_ISA2_AMX_AVX512)
 #define OPTION_MASK_ISA2_AMX_TF32_SET \
   (OPTION_MASK_ISA2_AMX_TILE_SET | OPTION_MASK_ISA2_AMX_TF32)
 #define OPTION_MASK_ISA2_AMX_TRANSPOSE_SET \
@@ -328,8 +327,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_USER_MSR_UNSET OPTION_MASK_ISA2_USER_MSR
 #define OPTION_MASK_ISA2_AVX10_1_UNSET \
   (OPTION_MASK_ISA2_AVX10_1 | OPTION_MASK_ISA2_AVX10_2_UNSET)
-#define OPTION_MASK_ISA2_AVX10_2_UNSET \
-  (OPTION_MASK_ISA2_AVX10_2 | OPTION_MASK_ISA2_AMX_AVX512_UNSET)
+#define OPTION_MASK_ISA2_AVX10_2_UNSET OPTION_MASK_ISA2_AVX10_2
 #define OPTION_MASK_ISA2_AMX_AVX512_UNSET OPTION_MASK_ISA2_AMX_AVX512
 #define OPTION_MASK_ISA2_AMX_TF32_UNSET OPTION_MASK_ISA2_AMX_TF32
 #define OPTION_MASK_ISA2_AMX_TRANSPOSE_UNSET OPTION_MASK_ISA2_AMX_TRANSPOSE
@@ -379,7 +377,8 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_AVX512F_UNSET \
   (OPTION_MASK_ISA2_AVX512BW_UNSET \
| OPTION_MASK_ISA2_AVX512VP2INTERSECT_UNSET \
-   | OPTION_MASK_ISA2_AVX10_1_UNSET)
+   | OPTION_MASK_ISA2_AVX10_1_UNSET \
+   | OPTION_MASK_ISA2_AMX_AVX512_UNSET)
 #define OPTION_MASK_ISA2_GENERAL_REGS_ONLY_UNSET \
   OPTION_MASK_ISA2_SSE_UNSET
 #define OPTION_MASK_ISA2_AVX_UNSET \
@@ -1374,8 +1373,8 @@ ix86_handle_option (struct gcc_options *opts,
{
  opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AMX_AVX512_SET;
  opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AMX_AVX512_SET;
- opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX10_1_SET;
- opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX10_1_SET;
+ opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX512F_SET;
+ opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_SET;
}
   else
{
diff --git a/gcc/testsuite/gcc.target/i386/amxavx512-cvtrowd2ps-2.c 
b/gcc/testsuite/gcc.target/i386/amxavx512-cvtrowd2ps-2.c
index cfd5644c5bb..c9a2d19a726 100644
--- a/gcc/testsuite/gcc.target/i386/amxavx512-cvtrowd2ps-2.c
+++ b/gcc/testsuite/gcc.target/i386/amxavx512-cvtrowd2ps-2.c
@@ -1,6 +1,6 @@
 /* { dg-do run { target { ! ia32 } } } */
 /* { dg-require-effective-target amx_avx512 } */
-/* { dg-options "-O2 -march=x86-64-v3 -mamx-avx512" } */
+/* { dg-options "-O2 -march=x86-64-v3 -mamx-avx512 -mavx512fp16" } */
 #define AMX_AVX512
 #define DO_TEST test_amx_avx512_cvtrowd2ps
 void test_amx_avx512_cvtrowd2ps();
diff --git a/gcc/testsuite/gcc.target/i386/amxavx512-cvtrowps2bf16-2.c 
b/gcc/testsuite/gcc.target/i386/amxavx512-cvtrowps2bf16-2.c
index acd5f76c96c..2014ec6f811 100644
--- a/gcc/testsuite/gcc.target/i386/amxavx512-cvtrowps2bf16-2.c
+++ b/gcc/testsuite/gcc.target/i386/amxavx512-cvtrowps2bf16-2.c
@@ -1,6 +1,6 @@
 /* { dg-do run { target { ! ia32 } } } */
 /* { dg-require-effective-target amx_avx512 } */
-/* { dg-options "-O2 -march=x86-64-v3 -mamx-avx512" } */
+/* { dg-

Re: [PATCH 1/2] libstdc++: Ensure std::hash<__int128> is defined [PR96710]

2025-07-14 Thread Tomasz Kaminski

On Mon, Jul 14, 2025 at 10:34 PM Jonathan Wakely  wrote:

> This is a follow-up to r16-2190-g4faa42ac0dee2c which ensures that
> std::hash is always enabled for signed and unsigned __int128. The
> standard requires std::hash to be enabled for all arithmetic types.
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/96710
> * include/bits/functional_hash.h (hash<__int128>): Define for
> strict modes.
> (hash): Likewise.
> * testsuite/20_util/hash/int128.cc: New test.
> ---
>
> Tested x86_64-linux.
>
LGTM.

>
> Truncating the result to size_t is unfortunate, but maybe too late to
> change. I've opened PR 121071 for that, as it also affect long long on
> 32-bit targets.
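
To make the truncation concern concrete, here is a small sketch.  It does not use std::hash itself; truncating_hash is a made-up stand-in for a trivial hash that simply converts the value to size_t.

```cpp
#include <cstddef>

#ifdef __SIZEOF_INT128__
__extension__ typedef unsigned __int128 u128;

// Stand-in for a trivial hash: keep only the low bits that fit in size_t.
constexpr std::size_t truncating_hash(u128 v)
{
  return static_cast<std::size_t>(v);
}

// Two distinct 128-bit values that differ only in bit 100 hash identically.
constexpr bool upper_bits_collide()
{
  return truncating_hash(u128(1)) == truncating_hash((u128(1) << 100) | 1);
}
#endif
```

Any two values that agree in the low size_t bits collide under such a hash, which is what the PR mentioned above is about.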
>
>  libstdc++-v3/include/bits/functional_hash.h   |  9 +
>  libstdc++-v3/testsuite/20_util/hash/int128.cc | 20 +++
>  2 files changed, 29 insertions(+)
>  create mode 100644 libstdc++-v3/testsuite/20_util/hash/int128.cc
>
> diff --git a/libstdc++-v3/include/bits/functional_hash.h
> b/libstdc++-v3/include/bits/functional_hash.h
> index e84c9ee04be2..8456089f768d 100644
> --- a/libstdc++-v3/include/bits/functional_hash.h
> +++ b/libstdc++-v3/include/bits/functional_hash.h
> @@ -199,6 +199,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>_Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_3 unsigned)
>  #endif
>
> +#if defined __STRICT_ANSI__ && defined __SIZEOF_INT128__
> +  // In strict modes __GLIBCXX_TYPE_INT_N_0 is not defined for __int128,
> +  // but we want to always treat signed/unsigned __int128 as integral
> types.
>
I was going to ask you to add this comment, after
reading r16-2190-g4faa42ac0dee2c,
and then realized I missed it.

> +  __extension__
> +  _Cxx_hashtable_define_trivial_hash(__int128)
> +  __extension__
> +  _Cxx_hashtable_define_trivial_hash(__int128 unsigned)
> +#endif
> +
>  #undef _Cxx_hashtable_define_trivial_hash
>
>struct _Hash_impl
> diff --git a/libstdc++-v3/testsuite/20_util/hash/int128.cc
> b/libstdc++-v3/testsuite/20_util/hash/int128.cc
> new file mode 100644
> index ..7c3a1baa0ec6
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/20_util/hash/int128.cc
> @@ -0,0 +1,20 @@
> +// { dg-do run { target c++11 } }
> +// { dg-add-options strict_std }
> +
> +#include 
> +#include 
> +
> +int main()
> +{
> +#ifdef __SIZEOF_INT128__
> +  std::hash<__int128> h;
> +  __int128 i = (__int128)0x123456789;
> +  VERIFY( h(i) == i );
> +  VERIFY( h(-i) == (std::size_t)-i );
> +  VERIFY( h(~i) == (std::size_t)~i );
> +  std::hash hu;
> +  unsigned __int128 u = i;
> +  VERIFY( hu(u) == u );
> +  VERIFY( hu(~u) == (std::size_t)~u );
> +#endif
> +}
> --
> 2.50.1
>
>

Re: [PATCH 2/2] libstdc++: Ensure std::make_unsigned works for 128-bit enum

2025-07-14 Thread Tomasz Kaminski

On Mon, Jul 14, 2025 at 10:35 PM Jonathan Wakely  wrote:

> libstdc++-v3/ChangeLog:
>
> * include/std/type_traits (__make_unsigned_selector): Add
> unsigned __int128 to type list.
> * testsuite/20_util/make_unsigned/int128.cc: New test.
> ---
>
> Tested x86_64-linux.
>
LGTM.

>
>  libstdc++-v3/include/std/type_traits   |  7 ++-
>  .../testsuite/20_util/make_unsigned/int128.cc  | 14 ++
>  2 files changed, 20 insertions(+), 1 deletion(-)
>  create mode 100644 libstdc++-v3/testsuite/20_util/make_unsigned/int128.cc
>
> diff --git a/libstdc++-v3/include/std/type_traits
> b/libstdc++-v3/include/std/type_traits
> index 78a5ee8c0eb4..ff23544fbf03 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -1992,8 +1992,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  : __make_unsigned_selector_base
>  {
>// With -fshort-enums, an enum may be as small as a char.
> +  __extension__
>using _UInts = _List -  unsigned long, unsigned long long>;
> +  unsigned long, unsigned long long
> +#ifdef __SIZEOF_INT128__
> +  , unsigned __int128
> +#endif
> + >;
>
>using __unsigned_type = typename __select _UInts>::__type;
>
> diff --git a/libstdc++-v3/testsuite/20_util/make_unsigned/int128.cc
> b/libstdc++-v3/testsuite/20_util/make_unsigned/int128.cc
> new file mode 100644
> index ..46c07b7669e5
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/20_util/make_unsigned/int128.cc
> @@ -0,0 +1,14 @@
> +// { dg-do compile { target c++11 } }
> +// { dg-add-options strict_std }
> +
> +#include 
> +
> +#ifdef __SIZEOF_INT128__
> +enum E : __int128 { };
> +using U = std::make_unsigned::type;
> +static_assert( std::is_integral::value, "type is an integer" );
> +static_assert( sizeof(U) == sizeof(E), "width of type is 128 bits" );
> +using I = std::make_signed::type;
> +static_assert( std::is_integral::value, "type is an integer" );
> +static_assert( sizeof(I) == sizeof(E), "width of type is 128 bits" );
> +#endif
> --
> 2.50.1
>
>

Ping: [PATCH, V3] Add -mcpu=future to the PowerPC

2025-07-14 Thread Michael Meissner

Ping patch:

| Date: Tue, 1 Jul 2025 12:14:32 -0400
| From: Michael Meissner 
| Subject: [PATCH, V3] Add -mcpu=future to the PowerPC
| Message-ID: 
https://gcc.gnu.org/pipermail/gcc-patches/2025-July/688251.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

[COMMITTED] fortran: Fix indentation

2025-07-14 Thread Filip Kastl

Move a block of code two spaces to the left.  Committing as obvious.

gcc/fortran/ChangeLog:

* resolve.cc (resolve_select_type): Fix indentation.

Signed-off-by: Filip Kastl 
---
 gcc/fortran/resolve.cc | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 93df5d014fa..c33bd17da2d 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -11014,16 +11014,16 @@ resolve_select_type (gfc_code *code, gfc_namespace 
*old_ns)
 that does precisely this here (instead of using the
 'global' one).  */
 
-   /* First check the derived type import status.  */
-   if (gfc_current_ns->import_state != IMPORT_NOT_SET
-   && (c->ts.type == BT_DERIVED || c->ts.type == BT_CLASS))
- {
-   st = gfc_find_symtree (gfc_current_ns->sym_root,
-  c->ts.u.derived->name);
-   if (!check_sym_import_status (c->ts.u.derived, st, NULL, old_code,
- gfc_current_ns))
- error++;
- }
+  /* First check the derived type import status.  */
+  if (gfc_current_ns->import_state != IMPORT_NOT_SET
+ && (c->ts.type == BT_DERIVED || c->ts.type == BT_CLASS))
+   {
+ st = gfc_find_symtree (gfc_current_ns->sym_root,
+c->ts.u.derived->name);
+ if (!check_sym_import_status (c->ts.u.derived, st, NULL, old_code,
+   gfc_current_ns))
+   error++;
+   }
 
   const char * var_name = gfc_var_name_for_select_type_temp (orig_expr1);
   if (c->ts.type == BT_CLASS)
-- 
2.49.0

[PATCH] ipa: Remove assertion in dump_possible_polymorphic_call_targets [PR107044]

2025-07-14 Thread Feng Xue OS

The type inheritance graph built for one translation unit with
build_type_inheritance_graph() might be incomplete, because the frontend
does not emit the vtable for a class whose key virtual method is only
declared.  Therefore, given a virtual call, some of its possible targets
might be missed.  This breaks the assumption that the speculative devirt
targets are a subset of the possible targets, that is, the assertion
targets.length () <= len may not be satisfied.  For example:

   class A {
   public:
 virtual ~A();
 virtual int f() = 0;
   };

   class B : public A {
   public:
 virtual ~B();
 virtual int f();
   };

   void foo(B *b)
   {
 A *a = b;

 delete a;
   }

The relation that "B" is a derived class of "A" is not recorded by
build_type_inheritance_graph().  The first call to
possible_polymorphic_call_targets(), which is based on the odr type of
"A", therefore returns an empty result.  The other call, for speculative
devirt, is based on the odr type of "B", and its result is the
destructor of "B".

So this patch removes the assertion:
 gcc_assert (symtab->state < IPA_SSA || targets.length () <= len); 

Regards,
Feng

From 818c951742414bddf7c38a01d937ff12c5691a54 Mon Sep 17 00:00:00 2001
From: Feng Xue 
Date: Tue, 15 Jul 2025 14:26:08 +0800
Subject: [PATCH] ipa/107044 - Remove assertion in
 dump_possible_polymorphic_call_targets

The type inheritance graph built for one translation unit with
build_type_inheritance_graph() might be incomplete, because the frontend
does not emit the vtable for a class whose key virtual method is only
declared.  Therefore, given a virtual call, some of its possible targets
might be missed.  This breaks the assumption that the speculative devirt
targets are a subset of the possible targets, that is, the assertion
targets.length () <= len may not be satisfied.

2025-07-15  Feng Xue  

gcc/
	PR tree-optimization/107044
	* ipa-devirt.cc (dump_possible_polymorphic_call_targets): Remove
	assertion.

gcc/testsuite/
	PR tree-optimization/107044
	* g++.dg/ipa/pr107044.C: New test.
---
 gcc/ipa-devirt.cc   | 41 -
 gcc/testsuite/g++.dg/ipa/pr107044.C | 21 +++
 2 files changed, 55 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr107044.C

diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
index 18cb5a82195..c0ccc87e34c 100644
--- a/gcc/ipa-devirt.cc
+++ b/gcc/ipa-devirt.cc
@@ -3460,13 +3460,40 @@ dump_possible_polymorphic_call_targets (FILE *f,
   fprintf (f, "  Speculative targets:");
   dump_targets (f, targets, verbose);
 }
-  /* Ugly: during callgraph construction the target cache may get populated
- before all targets are found.  While this is harmless (because all local
- types are discovered and only in those case we devirtualize fully and we
- don't do speculative devirtualization before IPA stage) it triggers
- assert here when dumping at that stage also populates the case with
- speculative targets.  Quietly ignore this.  */
-  gcc_assert (symtab->state < IPA_SSA || targets.length () <= len);
+
+  /* Type inheritance graph built for one translation unit with
+ build_type_inheritance_graph() might be incomplete, in that vtable would
+ not be emitted for a class by frontend, if its key virtual method is just
+ a declaration. Therefore, given a virtual call, some of all its possible
+ target set might be missed, and this would break the assumption that
+ speculative devirt targets is subset of the former, that is, here
+ assertion targets.length () <= len may not be satisfied. For example:
+
+   class A {
+   public:
+ virtual ~A();
+ virtual int f() = 0;
+   };
+
+   class B : public A {
+   public:
+ virtual ~B();
+ virtual int f();
+   };
+
+   void foo(B *b)
+   {
+ A *a = b;
+
+ delete a;
+   }
+
+ The relation that "B" is derived class of "A" is not recorded by
+ build_type_inheritance_graph().  And the result of the first call to
+ possible_polymorphic_call_targets() would be empty, which is based
+ on odr type of "A".  The other is for speculative devirt, which is
+ based on odr type of "B", the result target is destructor of "B".  */
+
   fprintf (f, "\n");
 }
 
diff --git a/gcc/testsuite/g++.dg/ipa/pr107044.C b/gcc/testsuite/g++.dg/ipa/pr107044.C
new file mode 100644
index 000..67a68626f49
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr107044.C
@@ -0,0 +1,21 @@
+// { dg-do compile }
+// { dg-options "-O3 -fdump-tree-all" }
+
+class A {
+public:
+   virtual ~A();
+   virtual int f() = 0;
+};
+
+class B : public A {
+public:
+virtual ~B();
+virtual int f();
+};
+
+void foo(B *b)
+{
+  A *a = b;
+
+  delete a;
+}
-- 
2.17.1

Re: [PATCH] testsuite: Fix overflow in gcc.dg/vect/pr116125.c

2025-07-14 Thread Richard Biener

On Fri, Jul 11, 2025 at 8:27 PM Siddhesh Poyarekar  wrote:
>
> The test ends up writing a byte beyond bounds of the buffer, which gets
> trapped on some targets when the test is run with
> -fstack-protector-strong.
>
> testsuite/ChangeLog:
>
> * gcc.dg/vect/pr116125.c (mem_overlap): Reduce iteration count
> to 8.
>
> Signed-off-by: Siddhesh Poyarekar 
> ---
> OK for trunk and backport to gcc-15?

Can you instead make the buffers larger?  Otherwise we might no longer
be testing what we did.

>
>  gcc/testsuite/gcc.dg/vect/pr116125.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr116125.c 
> b/gcc/testsuite/gcc.dg/vect/pr116125.c
> index eab9efdc061..2f45ac3edc1 100644
> --- a/gcc/testsuite/gcc.dg/vect/pr116125.c
> +++ b/gcc/testsuite/gcc.dg/vect/pr116125.c
> @@ -8,7 +8,7 @@ struct st
>  void __attribute__((noipa))
>  mem_overlap (struct st *a, struct st *b)
>  {
> -  for (int i = 0; i < 9; i++)
> +  for (int i = 0; i < 8; i++)
>  a[i].num = b[i].num + 1;
>  }
>
> --
> 2.50.0
>

Re: [PATCH v3] x86: Update MMX moves to support all 1s vectors

2025-07-14 Thread Uros Bizjak

On Mon, Jul 14, 2025 at 9:11 AM Uros Bizjak  wrote:
>
> On Mon, Jul 14, 2025 at 5:32 AM Uros Bizjak  wrote:
> >
> > On Mon, Jul 14, 2025 at 2:14 AM H.J. Lu  wrote:
> > >
> > > On Sat, Jul 12, 2025 at 7:51 PM Uros Bizjak  wrote:
> > > >
> > > > On Sat, Jul 12, 2025 at 1:41 PM H.J. Lu  wrote:
> > > > >
> > > > > On Sat, Jul 12, 2025 at 5:58 PM Uros Bizjak  wrote:
> > > > > >
> > > > > > On Sat, Jul 12, 2025 at 11:52 AM H.J. Lu  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Sat, Jul 12, 2025 at 5:03 PM Uros Bizjak  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > commit 77473a27bae04da99d6979d43e7bd0a8106f4557
> > > > > > > > > Author: H.J. Lu 
> > > > > > > > > Date:   Thu Jun 26 06:08:51 2025 +0800
> > > > > > > > >
> > > > > > > > > x86: Also handle all 1s float vector constant
> > > > > > > > >
> > > > > > > > > replaces
> > > > > > > > >
> > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) 
> > > > > > > > > [0  S8 A64])) 2031
> > > > > > > > >  {*movv2sf_internal}
> > > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated x2
> > > > > > > > > ])
> > > > > > > > > (nil)))
> > > > > > > > >
> > > > > > > > > with
> > > > > > > > >
> > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > (const_vector:V8QI [
> > > > > > > > > (const_int -1 [0x]) repeated 
> > > > > > > > > x8
> > > > > > > > > ])) -1
> > > > > > > > >  (nil))
> > > > > > > > > ...
> > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > (subreg:V2SF (reg:V8QI 112) 0)) 2031 
> > > > > > > > > {*movv2sf_internal}
> > > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated x2
> > > > > > > > > ])
> > > > > > > > > (nil)))
> > > > > > > > >
> > > > > > > > > which leads to
> > > > > > > > >
> > > > > > > > > pr121015.c: In function ‘render_result_from_bake_h’:
> > > > > > > > > pr121015.c:34:1: error: unrecognizable insn:
> > > > > > > > >34 | }
> > > > > > > > >   | ^
> > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > (const_vector:V8QI [
> > > > > > > > > (const_int -1 [0x]) repeated 
> > > > > > > > > x8
> > > > > > > > > ])) -1
> > > > > > > > >  (expr_list:REG_EQUIV (const_vector:V8QI [
> > > > > > > > > (const_int -1 [0x]) repeated 
> > > > > > > > > x8
> > > > > > > > > ])
> > > > > > > > > (nil)))
> > > > > > > > > during RTL pass: ira
> > > > > > > > >
> > > > > > > > > 1. Add vector_const0_or_m1_operand for vector 0 or integer 
> > > > > > > > > vector -1.
> > > > > > > > > 2. Add nonimm_or_vector_const0_or_m1_operand for 
> > > > > > > > > nonimmediate, vector 0
> > > > > > > > > or integer vector -1 operand.
> > > > > > > > > 3. Add BX constraint for MMX vector constant all 0s/1s 
> > > > > > > > > operand.
> > > > > > > > > 4. Update MMXMODE:*mov_internal to support integer all 
> > > > > > > > > 1s vectors.
> > > > > > > > > Replace  with  to generate
> > > > > > > > >
> > > > > > > > > pcmpeqd %xmm0, %xmm0
> > > > > > > > >
> > > > > > > > > for
> > > > > > > > >
> > > > > > > > > (set (reg/i:V8QI 20 xmm0)
> > > > > > > > >  (const_vector:V8QI [(const_int -1 [0x]) 
> > > > > > > > > repeated x8]))
> > > > > > > > >
> > > > > > > > > NB: The upper 64 bits in XMM0 are all 1s, instead of all 0s.
> > > > > > > >
> > > > > > > > Actually, we don't want this, we should keep the top 64 bits 
> > > > > > > > zero,
> > > > > > > > especially for floating point, where the pattern represents NaN.
> > > > > > > >
> > > > > > > > So, I think the correct way is to avoid the transformation for
> > > > > > > > narrower modes in the first place.
> > > > > > > >
> > > > > > >
> > > > > > > How does your latest patch handle this?
> > > > > > >
> > > > > > > typedef char __v8qi __attribute__ ((__vector_size__ (8)));
> > > > > > >
> > > > > > > __v8qi
> > > > > > > m1 (void)
> > > > > > > {
> > > > > > >   return __extension__(__v8qi){-1, -1, -1, -1, -1, -1, -1, -1};
> > > > > > > }
> > > > > >
> > > > > > No, my patch is also not appropriate, because it also introduces
> > > > > > "pcmpeq %xmm, %xmm". We should not generate 8-byte all-ones load 
> > > > > > using
> > > > > > pcmpeq, because upper 64 bits are also all 1s.
> > > > > >
> > > > > > The correct way is to avoid generating 64 bit all-ones, because this
> > > > > > constant is not supported and   standard_sse_constant_p () correctly
> > > > > > reports this.
> > > > >
> > > > > We can generate
> > > > >
> > > > > pcmpeqd %xmm0, %xmm0
> > > > > movq %
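
The RTL dumps quoted above show the same 64 bits printed as integer -1 in
V8QI and as -QNaN in V2SF.  A small sketch of why, assuming a 32-bit IEEE
float; float_from_bits and all_ones_is_negative_nan are hypothetical helper
names, not GCC functions.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert (sizeof (float) == sizeof (std::uint32_t),
               "sketch assumes a 32-bit float");

// Reinterpret a 32-bit pattern as a float without violating aliasing rules.
float
float_from_bits (std::uint32_t bits)
{
  float f;
  std::memcpy (&f, &bits, sizeof f);
  return f;
}

// All ones means: sign bit set, exponent all ones, mantissa non-zero.
// That is a NaN with the sign bit set.
bool
all_ones_is_negative_nan ()
{
  float f = float_from_bits (0xffffffffu);
  return std::isnan (f) && std::signbit (f);
}
```

This only illustrates the bit pattern; it says nothing about which
instruction sequence the backend should use to materialize it.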

Re: [PATCH v3] x86: Update MMX moves to support all 1s vectors

2025-07-14 Thread H.J. Lu

On Mon, Jul 14, 2025 at 3:11 PM Uros Bizjak  wrote:

Ping^5: [PATCH v4] get source line for diagnostic from preprocessed file [PR preprocessor/79106]

2025-07-14 Thread Bader, Lucas

Gentle ping for https://gcc.gnu.org/pipermail/gcc-patches/2025-March/676875.html

Re: [PATCH v3] x86: Update MMX moves to support all 1s vectors

2025-07-14 Thread H.J. Lu

On Mon, Jul 14, 2025 at 3:34 PM Uros Bizjak  wrote:
>
> On Mon, Jul 14, 2025 at 9:11 AM Uros Bizjak  wrote:
> >
> > On Mon, Jul 14, 2025 at 5:32 AM Uros Bizjak  wrote:
> > >
> > > On Mon, Jul 14, 2025 at 2:14 AM H.J. Lu  wrote:
> > > >
> > > > On Sat, Jul 12, 2025 at 7:51 PM Uros Bizjak  wrote:
> > > > >
> > > > > On Sat, Jul 12, 2025 at 1:41 PM H.J. Lu  wrote:
> > > > > >
> > > > > > On Sat, Jul 12, 2025 at 5:58 PM Uros Bizjak  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Sat, Jul 12, 2025 at 11:52 AM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Sat, Jul 12, 2025 at 5:03 PM Uros Bizjak  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu  
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > commit 77473a27bae04da99d6979d43e7bd0a8106f4557
> > > > > > > > > > Author: H.J. Lu 
> > > > > > > > > > Date:   Thu Jun 26 06:08:51 2025 +0800
> > > > > > > > > >
> > > > > > > > > > x86: Also handle all 1s float vector constant
> > > > > > > > > >
> > > > > > > > > > replaces
> > > > > > > > > >
> > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > > (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 
> > > > > > > > > > 0x2]) [0  S8 A64])) 2031
> > > > > > > > > >  {*movv2sf_internal}
> > > > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated x2
> > > > > > > > > > ])
> > > > > > > > > > (nil)))
> > > > > > > > > >
> > > > > > > > > > with
> > > > > > > > > >
> > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > > (const_vector:V8QI [
> > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > repeated x8
> > > > > > > > > > ])) -1
> > > > > > > > > >  (nil))
> > > > > > > > > > ...
> > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > > (subreg:V2SF (reg:V8QI 112) 0)) 2031 
> > > > > > > > > > {*movv2sf_internal}
> > > > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated x2
> > > > > > > > > > ])
> > > > > > > > > > (nil)))
> > > > > > > > > >
> > > > > > > > > > which leads to
> > > > > > > > > >
> > > > > > > > > > pr121015.c: In function ‘render_result_from_bake_h’:
> > > > > > > > > > pr121015.c:34:1: error: unrecognizable insn:
> > > > > > > > > >34 | }
> > > > > > > > > >   | ^
> > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > > (const_vector:V8QI [
> > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > repeated x8
> > > > > > > > > > ])) -1
> > > > > > > > > >  (expr_list:REG_EQUIV (const_vector:V8QI [
> > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > repeated x8
> > > > > > > > > > ])
> > > > > > > > > > (nil)))
> > > > > > > > > > during RTL pass: ira
> > > > > > > > > >
> > > > > > > > > > 1. Add vector_const0_or_m1_operand for vector 0 or integer 
> > > > > > > > > > vector -1.
> > > > > > > > > > 2. Add nonimm_or_vector_const0_or_m1_operand for 
> > > > > > > > > > nonimmediate, vector 0
> > > > > > > > > > or integer vector -1 operand.
> > > > > > > > > > 3. Add BX constraint for MMX vector constant all 0s/1s 
> > > > > > > > > > operand.
> > > > > > > > > > 4. Update MMXMODE:*mov_internal to support integer 
> > > > > > > > > > all 1s vectors.
> > > > > > > > > > Replace  with  to generate
> > > > > > > > > >
> > > > > > > > > > pcmpeqd %xmm0, %xmm0
> > > > > > > > > >
> > > > > > > > > > for
> > > > > > > > > >
> > > > > > > > > > (set (reg/i:V8QI 20 xmm0)
> > > > > > > > > >  (const_vector:V8QI [(const_int -1 
> > > > > > > > > > [0x]) repeated x8]))
> > > > > > > > > >
> > > > > > > > > > NB: The upper 64 bits in XMM0 are all 1s, instead of all 0s.
> > > > > > > > >
> > > > > > > > > Actually, we don't want this, we should keep the top 64 bits 
> > > > > > > > > zero,
> > > > > > > > > especially for floating point, where the pattern represents 
> > > > > > > > > NaN.
> > > > > > > > >
> > > > > > > > > So, I think the correct way is to avoid the transformation for
> > > > > > > > > narrower modes in the first place.
> > > > > > > > >
> > > > > > > >
> > > > > > > > How does your latest patch handle this?
> > > > > > > >
> > > > > > > > typedef char __v8qi __attribute__ ((__vector_size__ (8)));
> > > > > > > >
> > > > > > > > __v8qi
> > > > > > > > m1 (void)
> > > > > > > > {
> > > > > > > >   return __extension__(__v8qi){-1, -1, -1, -1, -1, -1, -1, -1};
> > > > > > > > }
> > > > > > >
> > > > > > > No, my patch is also not appropriate, because it also introduces
> > > > > > > "pcmpeq %xmm, %xmm". We should not generate 8-byte all-ones load 
> > > > > > > using
> > > > > > > pcmpeq, because upper

Re: [PATCH v3] x86: Update MMX moves to support all 1s vectors

2025-07-14 Thread Uros Bizjak

On Mon, Jul 14, 2025 at 9:41 AM H.J. Lu  wrote:
>
> On Mon, Jul 14, 2025 at 3:34 PM Uros Bizjak  wrote:
> >
> > On Mon, Jul 14, 2025 at 9:11 AM Uros Bizjak  wrote:
> > >
> > > On Mon, Jul 14, 2025 at 5:32 AM Uros Bizjak  wrote:
> > > >
> > > > On Mon, Jul 14, 2025 at 2:14 AM H.J. Lu  wrote:
> > > > >
> > > > > On Sat, Jul 12, 2025 at 7:51 PM Uros Bizjak  wrote:
> > > > > >
> > > > > > On Sat, Jul 12, 2025 at 1:41 PM H.J. Lu  wrote:
> > > > > > >
> > > > > > > On Sat, Jul 12, 2025 at 5:58 PM Uros Bizjak  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Sat, Jul 12, 2025 at 11:52 AM H.J. Lu  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Sat, Jul 12, 2025 at 5:03 PM Uros Bizjak 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > commit 77473a27bae04da99d6979d43e7bd0a8106f4557
> > > > > > > > > > > Author: H.J. Lu 
> > > > > > > > > > > Date:   Thu Jun 26 06:08:51 2025 +0800
> > > > > > > > > > >
> > > > > > > > > > > x86: Also handle all 1s float vector constant
> > > > > > > > > > >
> > > > > > > > > > > replaces
> > > > > > > > > > >
> > > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > > > (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 
> > > > > > > > > > > 0x2]) [0  S8 A64])) 2031
> > > > > > > > > > >  {*movv2sf_internal}
> > > > > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated 
> > > > > > > > > > > x2
> > > > > > > > > > > ])
> > > > > > > > > > > (nil)))
> > > > > > > > > > >
> > > > > > > > > > > with
> > > > > > > > > > >
> > > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > > > (const_vector:V8QI [
> > > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > > repeated x8
> > > > > > > > > > > ])) -1
> > > > > > > > > > >  (nil))
> > > > > > > > > > > ...
> > > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > > > (subreg:V2SF (reg:V8QI 112) 0)) 2031 
> > > > > > > > > > > {*movv2sf_internal}
> > > > > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated 
> > > > > > > > > > > x2
> > > > > > > > > > > ])
> > > > > > > > > > > (nil)))
> > > > > > > > > > >
> > > > > > > > > > > which leads to
> > > > > > > > > > >
> > > > > > > > > > > pr121015.c: In function ‘render_result_from_bake_h’:
> > > > > > > > > > > pr121015.c:34:1: error: unrecognizable insn:
> > > > > > > > > > >34 | }
> > > > > > > > > > >   | ^
> > > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > > > (const_vector:V8QI [
> > > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > > repeated x8
> > > > > > > > > > > ])) -1
> > > > > > > > > > >  (expr_list:REG_EQUIV (const_vector:V8QI [
> > > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > > repeated x8
> > > > > > > > > > > ])
> > > > > > > > > > > (nil)))
> > > > > > > > > > > during RTL pass: ira
> > > > > > > > > > >
> > > > > > > > > > > 1. Add vector_const0_or_m1_operand for vector 0 or 
> > > > > > > > > > > integer vector -1.
> > > > > > > > > > > 2. Add nonimm_or_vector_const0_or_m1_operand for 
> > > > > > > > > > > nonimmediate, vector 0
> > > > > > > > > > > or integer vector -1 operand.
> > > > > > > > > > > 3. Add BX constraint for MMX vector constant all 0s/1s 
> > > > > > > > > > > operand.
> > > > > > > > > > > 4. Update MMXMODE:*mov_internal to support integer 
> > > > > > > > > > > all 1s vectors.
> > > > > > > > > > > Replace  with  to generate
> > > > > > > > > > >
> > > > > > > > > > > pcmpeqd %xmm0, %xmm0
> > > > > > > > > > >
> > > > > > > > > > > for
> > > > > > > > > > >
> > > > > > > > > > > (set (reg/i:V8QI 20 xmm0)
> > > > > > > > > > >  (const_vector:V8QI [(const_int -1 
> > > > > > > > > > > [0x]) repeated x8]))
> > > > > > > > > > >
> > > > > > > > > > > NB: The upper 64 bits in XMM0 are all 1s, instead of all 
> > > > > > > > > > > 0s.
> > > > > > > > > >
> > > > > > > > > > Actually, we don't want this, we should keep the top 64 
> > > > > > > > > > bits zero,
> > > > > > > > > > especially for floating point, where the pattern represents 
> > > > > > > > > > NaN.
> > > > > > > > > >
> > > > > > > > > > So, I think the correct way is to avoid the transformation 
> > > > > > > > > > for
> > > > > > > > > > narrower modes in the first place.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > How does your latest patch handle this?
> > > > > > > > >
> > > > > > > > > typedef char __v8qi __attribute__ ((__vector_size__ (8)));
> > > > > > > > >
> > > > > > > > > __v8qi
> > > > >

Re: [PATCH] c, c++: Extend -Wunused-but-set-* warnings [PR44677]

2025-07-14 Thread Jason Merrill


On 7/11/25 9:09 AM, Jakub Jelinek wrote:

On Thu, Jul 10, 2025 at 04:35:49PM -0400, Jason Merrill wrote:

@@ -211,8 +211,27 @@ mark_use (tree expr, bool rvalue_p, bool
}
  return expr;
}
-  gcc_fallthrough();
+  gcc_fallthrough ();
   CASE_CONVERT:
+  if (VOID_TYPE_P (TREE_TYPE (expr)))
+   switch (TREE_CODE (TREE_OPERAND (expr, 0)))
+ {
+ case PREINCREMENT_EXPR:
+ case PREDECREMENT_EXPR:
+ case POSTINCREMENT_EXPR:
+ case POSTDECREMENT_EXPR:


Why is this specific to these codes?  I would think we would want consistent
handling of (void) here and in mark_exp_read.


The second/third levels of the warnings (i.e. the new ones) want
++op0
--op0
op0++
op0--
not to be treated as uses of op0 (but op1 = ++op0 etc. counts as use).
Which is why it checks for these special cases.  The cast to void is there
to just ignore the pre/post inc/decrements alone when their value isn't
used.
The above mark_exp_read changes wouldn't be needed if
 case PREINCREMENT_EXPR:
 case PREDECREMENT_EXPR:
 case POSTINCREMENT_EXPR:
 case POSTDECREMENT_EXPR:
   mark_exp_read (TREE_OPERAND (exp, 0));
   break;
weren't added, but that is needed so that x = ++op0; etc. are treated as
use of op0, while ++op0; on its own is not.


Coming back to this comment, it still seems to me that we can generalize 
this and ignore anything cast to void here, as in the below; after 
something has been cast to void, it can no longer be read.  I don't get 
any regressions from this simplification, either.


We might generalize to anything of void type, but I haven't tested that.
commit adcf4220b73a9b7f44a35728f60aa5b351ef51d8
Author: Jason Merrill 
Date:   Mon Jul 14 18:29:17 2025 -0400

void

diff --git a/gcc/cp/expr.cc b/gcc/cp/expr.cc
index 8b5a098ecb3..e4a7cfd7bec 100644
--- a/gcc/cp/expr.cc
+++ b/gcc/cp/expr.cc
@@ -214,24 +214,7 @@ mark_use (tree expr, bool rvalue_p, bool read_p,
   gcc_fallthrough ();
 CASE_CONVERT:
   if (VOID_TYPE_P (TREE_TYPE (expr)))
-	switch (TREE_CODE (TREE_OPERAND (expr, 0)))
-	  {
-	  case PREINCREMENT_EXPR:
-	  case PREDECREMENT_EXPR:
-	  case POSTINCREMENT_EXPR:
-	  case POSTDECREMENT_EXPR:
-	tree op0;
-	op0 = TREE_OPERAND (TREE_OPERAND (expr, 0), 0);
-	STRIP_ANY_LOCATION_WRAPPER (op0);
-	if ((VAR_P (op0) || TREE_CODE (op0) == PARM_DECL)
-		&& !DECL_READ_P (op0)
-		&& (VAR_P (op0) ? warn_unused_but_set_variable
-: warn_unused_but_set_parameter) > 1)
-	  read_p = false;
-	break;
-	  default:
-	break;
-	  }
+	read_p = false;
   recurse_op[0] = true;
   break;
 
@@ -382,16 +365,7 @@ mark_exp_read (tree exp)
   break;
 CASE_CONVERT:
   if (VOID_TYPE_P (TREE_TYPE (exp)))
-	switch (TREE_CODE (TREE_OPERAND (exp, 0)))
-	  {
-	  case PREINCREMENT_EXPR:
-	  case PREDECREMENT_EXPR:
-	  case POSTINCREMENT_EXPR:
-	  case POSTDECREMENT_EXPR:
-	return;
-	  default:
-	break;
-	  }
+	return;
   /* FALLTHRU */
 case ARRAY_REF:
 case COMPONENT_REF:
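
A short sketch of the distinction discussed in this message, assuming the
behaviour described in the thread for the new, higher warning levels: an
increment or decrement whose value is discarded does not count as a use of
its operand, while one whose value is consumed does.  The function names
are hypothetical, and the comments describe the intended diagnostics rather
than the output of any particular compiler release.

```cpp
// 'n' is only ever modified; its value is never consumed.  Under the
// extended warning described above this is the case that should be
// reported as "set but not used", including the form cast to void.
int
bump_only ()
{
  int n = 0;
  ++n;
  n++;
  (void) --n;
  return 0;
}

// Here the result of ++n is assigned to 'm', so 'n' counts as read.
int
bump_and_read ()
{
  int n = 0;
  int m = ++n;
  return m;
}
```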

[PATCH] LoongArch: Fix wrong code generated by TARGET_VECTORIZE_VEC_PERM_CONST [PR121064]

2025-07-14 Thread Xi Ruoyao

When TARGET_VECTORIZE_VEC_PERM_CONST is called, target may be the
same pseudo as op0 and/or op1.  Loading the selector into target
would clobber the input, producing wrong code like

vld $vr0, $t0
vshuf.w $vr0, $vr0, $vr1

So don't load the selector into d->target, use a new pseudo to hold the
selector instead.  The reload pass will load the pseudo for selector and
the pseudo for target into the same hard register (following our
constraint '0' on the shuf instructions) anyway.
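
The hazard can be sketched with a simplified model.  This is not the exact
vshuf semantics: Vec4, shuffle_in_place, expand_buggy and expand_fixed are
hypothetical names, and the index convention (0-3 select from op0, 4-7 from
op1) is an assumption made for the illustration.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<int, 4>;

// Model of a shuffle whose destination also carries the selector on entry,
// mirroring the '0' constraint mentioned above.
void
shuffle_in_place (Vec4 &sel_and_dest, const Vec4 &op0, const Vec4 &op1)
{
  Vec4 result{};
  for (std::size_t i = 0; i < 4; ++i)
    {
      int idx = sel_and_dest[i] & 7;
      result[i] = idx < 4 ? op0[idx] : op1[idx - 4];
    }
  sel_and_dest = result;
}

// Old scheme: the selector is loaded into target first.  If target and
// op0 are the same object, op0 is overwritten before it is read.
void
expand_buggy (Vec4 &target, const Vec4 &op0, const Vec4 &op1,
              const Vec4 &perm)
{
  target = perm;
  shuffle_in_place (target, op0, op1);
}

// New scheme: a fresh temporary holds the selector, so the inputs are
// still intact when the shuffle reads them.
void
expand_fixed (Vec4 &target, const Vec4 &op0, const Vec4 &op1,
              const Vec4 &perm)
{
  Vec4 sel = perm;
  shuffle_in_place (sel, op0, op1);
  target = sel;
}
```

With target aliasing op0 = {10, 20, 30, 40} and perm = {3, 2, 1, 0}, the
fixed variant yields {40, 30, 20, 10}, while the buggy variant shuffles the
selector with itself and yields {0, 1, 2, 3}.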

gcc/ChangeLog:

PR target/121064
* config/loongarch/lsx.md (lsx_vshuf_): Add '@' to
generate a mode-aware helper.  Use  as the mode of the
operand 1 (selector).
* config/loongarch/lasx.md (lasx_xvshuf_): Likewise.
* config/loongarch/loongarch.cc
(loongarch_try_expand_lsx_vshuf_const): Create a new pseudo for
the selector.  Use the mode-aware helper to simplify the code.
(loongarch_expand_vec_perm_const): Likewise.

gcc/testsuite/ChangeLog:

PR target/121064
* gcc.target/loongarch/pr121064.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk and
14/15?

 gcc/config/loongarch/lasx.md  |   4 +-
 gcc/config/loongarch/loongarch.cc | 126 +-
 gcc/config/loongarch/lsx.md   |   4 +-
 gcc/testsuite/gcc.target/loongarch/pr121064.c |  38 ++
 4 files changed, 73 insertions(+), 99 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/pr121064.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 43e3ab0026a..3d71f30a54b 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -2060,9 +2060,9 @@ (define_insn "lasx_xvssub_u_"
   [(set_attr "type" "simd_int_arith")
(set_attr "mode" "")])
 
-(define_insn "lasx_xvshuf_"
+(define_insn "@lasx_xvshuf_"
   [(set (match_operand:LASX_DWH 0 "register_operand" "=f")
-   (unspec:LASX_DWH [(match_operand:LASX_DWH 1 "register_operand" "0")
+   (unspec:LASX_DWH [(match_operand: 1 "register_operand" "0")
  (match_operand:LASX_DWH 2 "register_operand" "f")
  (match_operand:LASX_DWH 3 "register_operand" "f")]
UNSPEC_LASX_XVSHUF))]
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 0129108d0d3..036e6859d31 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -8388,7 +8388,7 @@ static bool
 loongarch_try_expand_lsx_vshuf_const (struct expand_vec_perm_d *d)
 {
   int i;
-  rtx target, op0, op1, sel, tmp;
+  rtx target, op0, op1;
   rtx rperm[MAX_VECT_LEN];
 
   if (GET_MODE_SIZE (d->vmode) == 16)
@@ -8407,47 +8407,23 @@ loongarch_try_expand_lsx_vshuf_const (struct 
expand_vec_perm_d *d)
   for (i = 0; i < d->nelt; i += 1)
  rperm[i] = GEN_INT (d->perm[i]);
 
-  if (d->vmode == E_V2DFmode)
-   {
- sel = gen_rtx_CONST_VECTOR (E_V2DImode, gen_rtvec_v (d->nelt, rperm));
- tmp = simplify_gen_subreg (E_V2DImode, d->target, d->vmode, 0);
- emit_move_insn (tmp, sel);
-   }
-  else if (d->vmode == E_V4SFmode)
-   {
- sel = gen_rtx_CONST_VECTOR (E_V4SImode, gen_rtvec_v (d->nelt, rperm));
- tmp = simplify_gen_subreg (E_V4SImode, d->target, d->vmode, 0);
- emit_move_insn (tmp, sel);
-   }
+  machine_mode sel_mode = related_int_vector_mode (d->vmode)
+   .require ();
+  rtvec sel_v = gen_rtvec_v (d->nelt, rperm);
+
+  /* Despite vshuf.* (except vshuf.b) needs sel == target, we cannot
+load sel into target right now: here we are dealing with
+pseudo regs, and target may be the same pseudo as one of op0
+or op1.  Then we'd clobber the input.  Instead, we use a new
+pseudo reg here.  The reload pass will look at the constraint
+of vshuf.* and move sel into target first if needed.  */
+  rtx sel = force_reg (sel_mode,
+  gen_rtx_CONST_VECTOR (sel_mode, sel_v));
+
+  if (d->vmode == E_V16QImode)
+   emit_insn (gen_lsx_vshuf_b (target, op1, op0, sel));
   else
-   {
- sel = gen_rtx_CONST_VECTOR (d->vmode, gen_rtvec_v (d->nelt, rperm));
- emit_move_insn (d->target, sel);
-   }
-
-  switch (d->vmode)
-   {
-   case E_V2DFmode:
- emit_insn (gen_lsx_vshuf_d_f (target, target, op1, op0));
- break;
-   case E_V2DImode:
- emit_insn (gen_lsx_vshuf_d (target, target, op1, op0));
- break;
-   case E_V4SFmode:
- emit_insn (gen_lsx_vshuf_w_f (target, target, op1, op0));
- break;
-   case E_V4SImode:
- emit_insn (gen_lsx_vshuf_w (target, target, op1, op0));
- break;
-   case E_V8HImode:
- emit_insn (gen_lsx_vshuf_h (target, target, op1, op0));
- break;
-   case E_V16QImode:
- emit_insn (gen_lsx_vshuf_b (target, op1, op0

[PATCH 2/2] libstdc++: Conditionalize LWG 3569 changes to join_view

2025-07-14 Thread Patrick Palka

Tested on x86_64-pc-linux-gnu, does this look OK for trunk only
(since it impacts ABI)?

-- >8 --

LWG 3569 adjusted join_view's iterator to handle adapting
non-default-constructible (input) iterators by wrapping the
corresponding data member with std::optional, and we followed suit in
r13-2649-g7aa80c82ecf3a3.

But this wrapping is unnecessary for iterators that are already
default-constructible.  Rather than unconditionally using std::optional
here, which introduces time/space overhead, this patch conditionalizes
our LWG 3569 changes on the iterator in question being
non-default-constructible.
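
A minimal sketch of the idea, outside of libstdc++: store the object
directly when it can be default-constructed and fall back to std::optional
otherwise, with one accessor hiding the difference.  MaybeWrapped and
NoDefault are hypothetical names.  The patch itself uses the
default_initializable concept and __conditional_t; the sketch substitutes
std::is_default_constructible_v and std::conditional_t so that it also
builds as C++17.

```cpp
#include <optional>
#include <type_traits>
#include <utility>

template<typename T>
class MaybeWrapped
{
  static constexpr bool direct = std::is_default_constructible_v<T>;

  // Either T itself or optional<T>, value-initialized in both cases.
  std::conditional_t<direct, T, std::optional<T>> storage_{};

public:
  static constexpr bool stored_directly = direct;

  void
  set (T value)
  {
    if constexpr (direct)
      storage_ = std::move (value);
    else
      storage_.emplace (std::move (value));
  }

  // Counterpart of _M_get_inner: callers never see the optional.
  T &
  get ()
  {
    if constexpr (direct)
      return storage_;
    else
      return *storage_;
  }
};

// A type without a default constructor, which forces the optional path.
struct NoDefault
{
  int v;
  explicit NoDefault (int x) : v (x) {}
};
```

For a type such as int the wrapper stores nothing beyond the int itself,
which is the space saving the patch is after.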

libstdc++-v3/ChangeLog:

* include/std/ranges (join_view::_Iterator::_M_satisfy):
Adjust to handle non-std::optional _M_inner as per before LWG 3569.
(join_view::_Iterator::_M_get_inner): New.
(join_view::_Iterator::_M_inner): Don't wrap in std::optional if
the iterator is already default constructible.  Initialize.
(join_view::_Iterator::operator*): Use _M_get_inner instead
of *_M_inner.
(join_view::_Iterator::operator++): Likewise.
(join_view::_Iterator::iter_move): Likewise.
(join_view::_Iterator::iter_swap): Likewise.
---
 libstdc++-v3/include/std/ranges | 49 +
 1 file changed, 37 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index efe62969d657..799fa7611ce2 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -2971,7 +2971,12 @@ namespace views::__adaptor
  }
 
if constexpr (_S_ref_is_glvalue)
- _M_inner.reset();
+ {
+   if constexpr (default_initializable<_Inner_iter>)
+ _M_inner = _Inner_iter();
+   else
+ _M_inner.reset();
+ }
  }
 
  static constexpr auto
@@ -3011,6 +3016,24 @@ namespace views::__adaptor
  return *_M_parent->_M_outer;
  }
 
+ constexpr _Inner_iter&
+ _M_get_inner()
+ {
+   if constexpr (default_initializable<_Inner_iter>)
+ return _M_inner;
+   else
+ return *_M_inner;
+ }
+
+ constexpr const _Inner_iter&
+ _M_get_inner() const
+ {
+   if constexpr (default_initializable<_Inner_iter>)
+ return _M_inner;
+   else
+ return *_M_inner;
+ }
+
  constexpr
  _Iterator(_Parent* __parent, _Outer_iter __outer) requires 
forward_range<_Base>
: _M_outer(std::move(__outer)), _M_parent(__parent)
@@ -3024,7 +3047,9 @@ namespace views::__adaptor
  [[no_unique_address]]
__detail::__maybe_present_t, _Outer_iter> 
_M_outer
  = decltype(_M_outer)();
- optional<_Inner_iter> _M_inner;
+ __conditional_t,
+ _Inner_iter, optional<_Inner_iter>> _M_inner
+   = decltype(_M_inner)();
  _Parent* _M_parent = nullptr;
 
public:
@@ -3048,7 +3073,7 @@ namespace views::__adaptor
 
  constexpr decltype(auto)
  operator*() const
- { return **_M_inner; }
+ { return *_M_get_inner(); }
 
  // _GLIBCXX_RESOLVE_LIB_DEFECTS
  // 3500. join_view::iterator::operator->() is bogus
@@ -3056,7 +3081,7 @@ namespace views::__adaptor
  operator->() const
requires __detail::__has_arrow<_Inner_iter>
  && copyable<_Inner_iter>
- { return *_M_inner; }
+ { return _M_get_inner(); }
 
  constexpr _Iterator&
  operator++()
@@ -3067,7 +3092,7 @@ namespace views::__adaptor
  else
return *_M_parent->_M_inner;
}();
-   if (++*_M_inner == ranges::end(__inner_range))
+   if (++_M_get_inner() == ranges::end(__inner_range))
  {
++_M_get_outer();
_M_satisfy();
@@ -3097,9 +3122,9 @@ namespace views::__adaptor
  {
if (_M_outer == ranges::end(_M_parent->_M_base))
  _M_inner = ranges::end(__detail::__as_lvalue(*--_M_outer));
-   while (*_M_inner == ranges::begin(__detail::__as_lvalue(*_M_outer)))
- *_M_inner = ranges::end(__detail::__as_lvalue(*--_M_outer));
-   --*_M_inner;
+   while (_M_get_inner() == 
ranges::begin(__detail::__as_lvalue(*_M_outer)))
+ _M_get_inner() = ranges::end(__detail::__as_lvalue(*--_M_outer));
+   --_M_get_inner();
return *this;
  }
 
@@ -3126,14 +3151,14 @@ namespace views::__adaptor
 
  friend constexpr decltype(auto)
  iter_move(const _Iterator& __i)
- noexcept(noexcept(ranges::iter_move(*__i._M_inner)))
- { return ranges::iter_move(*__i._M_inner); }
+ noexcept(noexcept(ranges::iter_move(__i._M_get_inner(
+ { return ranges::iter_move(__i._M_get_inne

[PATCH] x86: Convert MMX integer loads from constant vector pool

2025-07-14 Thread H.J. Lu

For MMX 16-bit, 32-bit and 64-bit constant vector loads from constant
vector pool:

(insn 6 2 7 2 (set (reg:V1SI 5 di)
(mem/u/c:V1SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S4 A32])) "pr121062-2.c":10:3 2036 {*movv1si_internal}
 (expr_list:REG_EQUAL (const_vector:V1SI [
(const_int -1 [0xffffffffffffffff])
])
(nil)))

we can convert it to

(insn 12 2 7 2 (set (reg:SI 5 di)
(const_int -1 [0xffffffffffffffff])) "pr121062-2.c":10:3 100 {*movsi_internal}
 (nil))
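
The conversion rests on the fact that a small constant vector is just a
fixed bit pattern that fits in a scalar integer.  A sketch of that packing,
assuming little-endian element order (element 0 in the lowest bits) and a
total width of at most 64 bits; pack_const_vector is a hypothetical helper
and not the code of ix86_convert_const_vector_to_integer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack the elements of a constant vector into one integer.
// Precondition (not checked): elts.size () * elt_bits <= 64.
std::uint64_t
pack_const_vector (const std::vector<std::int64_t> &elts, unsigned elt_bits)
{
  const std::uint64_t mask
    = elt_bits == 64 ? ~std::uint64_t (0)
                     : (std::uint64_t (1) << elt_bits) - 1;
  std::uint64_t packed = 0;
  for (std::size_t i = 0; i < elts.size (); ++i)
    packed |= (static_cast<std::uint64_t> (elts[i]) & mask)
              << (i * elt_bits);
  return packed;
}
```

For the V1SI example above, a single 32-bit element of -1 packs to
0xffffffff, which is the value a plain SImode move of -1 places in the
register.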

Co-Developed-by: H.J. Lu 

gcc/

PR target/121062
* config/i386/i386.cc (ix86_convert_const_vector_to_integer):
Handle E_V1SImode and E_V1DImode.
* config/i386/mmx.md (V_16_32_64): Add V1SI, V2BF and V1DI.
(mmxinsnmode): Add V1DI and V1SI.
Add V_16_32_64 splitter for constant vector loads from constant
vector pool.
(V_16_32_64:*mov_imm): Replace lowpart_subreg with
adjust_address.

gcc/testsuite/

PR target/121062
* gcc.target/i386/pr121062-1.c: New test.
* gcc.target/i386/pr121062-2.c: Likewise.
* gcc.target/i386/pr121062-3a.c: Likewise.
* gcc.target/i386/pr121062-3b.c: Likewise.
* gcc.target/i386/pr121062-3c.c: Likewise.
* gcc.target/i386/pr121062-4.c: Likewise.
* gcc.target/i386/pr121062-5.c: Likewise.
* gcc.target/i386/pr121062-6.c: Likewise.
* gcc.target/i386/pr121062-7.c: Likewise.

OK for master?

Thanks.


-- 
H.J.
From 835ba39742c73863d4f4b8de85d153fc32fae736 Mon Sep 17 00:00:00 2001
From: Uros Bizjak 
Date: Tue, 15 Jul 2025 05:05:10 +0800
Subject: [PATCH] x86: Convert MMX integer loads from constant vector pool

For MMX 16-bit, 32-bit and 64-bit constant vector loads from constant
vector pool:

(insn 6 2 7 2 (set (reg:V1SI 5 di)
(mem/u/c:V1SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S4 A32])) "pr121062-2.c":10:3 2036 {*movv1si_internal}
 (expr_list:REG_EQUAL (const_vector:V1SI [
(const_int -1 [0xffffffffffffffff])
])
(nil)))

we can convert it to

(insn 12 2 7 2 (set (reg:SI 5 di)
(const_int -1 [0x])) "pr121062-2.c":10:3 100 {*movsi_internal}
 (nil))

Co-Developed-by: H.J. Lu 

gcc/

	PR target/121062
	* config/i386/i386.cc (ix86_convert_const_vector_to_integer):
	Handle E_V1SImode and E_V1DImode.
	* config/i386/mmx.md (V_16_32_64): Add V1SI, V2BF and V1DI.
	(mmxinsnmode): Add V1DI and V1SI.
	Add V_16_32_64 splitter for constant vector loads from constant
	vector pool.
	(V_16_32_64:*mov_imm): Replace lowpart_subreg with
	adjust_address.

gcc/testsuite/

	PR target/121062
	* gcc.target/i386/pr121062-1.c: New test.
	* gcc.target/i386/pr121062-2.c: Likewise.
	* gcc.target/i386/pr121062-3a.c: Likewise.
	* gcc.target/i386/pr121062-3b.c: Likewise.
	* gcc.target/i386/pr121062-3c.c: Likewise.
	* gcc.target/i386/pr121062-4.c: Likewise.
	* gcc.target/i386/pr121062-5.c: Likewise.
	* gcc.target/i386/pr121062-6.c: Likewise.
	* gcc.target/i386/pr121062-7.c: Likewise.
---
 gcc/config/i386/i386.cc |  4 +++
 gcc/config/i386/mmx.md  | 34 +
 gcc/testsuite/gcc.target/i386/pr121062-1.c  | 34 +
 gcc/testsuite/gcc.target/i386/pr121062-2.c  | 14 +
 gcc/testsuite/gcc.target/i386/pr121062-3a.c | 23 ++
 gcc/testsuite/gcc.target/i386/pr121062-3b.c |  6 
 gcc/testsuite/gcc.target/i386/pr121062-3c.c |  6 
 gcc/testsuite/gcc.target/i386/pr121062-4.c  | 14 +
 gcc/testsuite/gcc.target/i386/pr121062-5.c  | 13 
 gcc/testsuite/gcc.target/i386/pr121062-6.c  | 13 
 gcc/testsuite/gcc.target/i386/pr121062-7.c  | 13 
 11 files changed, 168 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-3c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr121062-7.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 313522b88e3..37db8a1d118 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -16704,6 +16704,10 @@ ix86_convert_const_vector_to_integer (rtx op, machine_mode mode)
 	  val = wi::insert (val, wv, innermode_bits * i, innermode_bits);
 	}
   break;
+case E_V1SImode:
+case E_V1DImode:
+  op = CONST_VECTOR_ELT (op, 0);
+  return INTVAL (op);
 case E_V2HFmode:
 case E_V2BFmode:
 case E_V4HFmode:
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 29a8cb599a7..22d420ddf69 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -81,12 +81,13 @@ (define_mode_iterator VI_16_32 [V4QI V2QI V2HI])
 ;; 4-byte and 2-byte QImode vector modes
 (define_

[PATCH] builtins.cc (fold_builtin_bit_query): Don't consider MAX_FIXED_MODE_SIZE, [PR120935]

2025-07-14 Thread Hans-Peter Nilsson

Tested to fix build for MMIX (and to fix a few type-generic-builtin
test-cases; c-c++-common/pr111309-1.c, gcc.dg/torture/pr116480-1.c).
Also regtested cris-elf and native x86_64-pc-linux-gnu.

Ok for master and gcc-15?

-- >8 --
MMIX uses the default definition of MAX_FIXED_MODE_SIZE, which is
GET_MODE_SIZE (DImode) (arguably a suboptimal default, but that's for
another patch).

That macro doesn't reflect the size of the largest mode *valid* on the
target, but rather the size of the largest mode that *should* be
generated when synthesizing operations doing optimizations.  The
keyword "generated" appears to be missing from the documentation for
MAX_FIXED_MODE_SIZE or is meant to be implied by "use": (perhaps,
s/should actually be used/should actually be generated/, and note,
still "should" not "must").  For example, when putting
larger-than-register objects on the stack it could be the threshold
when to use BLKmode instead of an integer mode (e.g. TImode).  But,
this should affect only optimization, not validness of code and
certainly not cause an ICE, as in PR 120935.

The function fold_builtin_bit_query is responsible for transforming
type-generic calls into type-specific calls.  It currently makes use
of MAX_FIXED_MODE_SIZE.  The effect is here, when transforming
__builtin_clzg(__int128 x), instead of an expression with calls to
__builtin_clzl, it's wrongly changed into a call to an internal
function .CLZ (x).  (N.B. this would validly happen when there's a
matching instruction.)  When later on, another part of gcc sees that
there is no such internal function; no such instruction, boom: ICE.

Here, what's intended to be generated, is *half* the size of the
existing incoming type.  Actually, MAX_FIXED_MODE_SIZE seems to be
inconsistently used instead of the size of unsigned long long.  That
macro just doesn't have to be consulted in this code.
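
For illustration only (not part of the patch): the expansion being described
amounts to splitting the double-width value into two unsigned long long
halves and querying each half.  The function name clz128 is made up, the
sketch assumes a 64-bit unsigned long long and a target with __int128, and
returning 128 for zero is an assumption of this example rather than what the
builtin guarantees.

```cpp
#include <cstdint>

// Hypothetical sketch: count leading zeros of a 128-bit value using only
// the 64-bit builtin, i.e. working on halves of the incoming type.
inline int
clz128 (unsigned __int128 x)
{
  const std::uint64_t hi = static_cast<std::uint64_t> (x >> 64);
  const std::uint64_t lo = static_cast<std::uint64_t> (x);
  if (hi != 0)
    return __builtin_clzll (hi);
  if (lo != 0)
    return 64 + __builtin_clzll (lo);
  return 128;  // Assumed fallback for x == 0.
}
```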

PR middle-end/120935
* builtins.cc (fold_builtin_bit_query): Do not consider
MAX_FIXED_MODE_SIZE.
---
 gcc/builtins.cc | 36 +---
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 7f580a3145ff..728ae5e6c2c0 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -10283,8 +10283,8 @@ fold_builtin_bit_query (location_t loc, enum built_in_function fcode,
   gcc_unreachable ();
 }
 
-  if (TYPE_PRECISION (arg0_type)
-  <= TYPE_PRECISION (long_long_unsigned_type_node))
+  unsigned int llu_size = TYPE_PRECISION (long_long_unsigned_type_node);
+  if (TYPE_PRECISION (arg0_type) <= llu_size)
 {
   if (TYPE_PRECISION (arg0_type) <= TYPE_PRECISION (unsigned_type_node))
 
@@ -10305,13 +10305,12 @@ fold_builtin_bit_query (location_t loc, enum built_in_function fcode,
  fcodei = fcodell;
}
 }
-  else if (TYPE_PRECISION (arg0_type) <= MAX_FIXED_MODE_SIZE)
+  else if (TYPE_PRECISION (arg0_type) <= 2 * llu_size)
 {
   cast_type
-   = build_nonstandard_integer_type (MAX_FIXED_MODE_SIZE,
+   = build_nonstandard_integer_type (2 * llu_size,
  TYPE_UNSIGNED (arg0_type));
-  gcc_assert (TYPE_PRECISION (cast_type)
- == 2 * TYPE_PRECISION (long_long_unsigned_type_node));
+  gcc_assert (TYPE_PRECISION (cast_type) == 2 * llu_size);
   fcodei = END_BUILTINS;
 }
   else
@@ -10342,9 +10341,7 @@ fold_builtin_bit_query (location_t loc, enum built_in_function fcode,
   arg2 = NULL_TREE;
 }
   tree call = NULL_TREE, tem;
-  if (TYPE_PRECISION (arg0_type) == MAX_FIXED_MODE_SIZE
-  && (TYPE_PRECISION (arg0_type)
- == 2 * TYPE_PRECISION (long_long_unsigned_type_node))
+  if ((TYPE_PRECISION (arg0_type) == 2 * llu_size)
   /* If the target supports the optab, then don't do the expansion. */
   && !direct_internal_fn_supported_p (ifn, arg0_type, OPTIMIZE_FOR_BOTH))
 {
@@ -10354,8 +10351,7 @@ fold_builtin_bit_query (location_t loc, enum built_in_function fcode,
   ? long_long_unsigned_type_node
   : long_long_integer_type_node);
   tree hi = fold_build2 (RSHIFT_EXPR, arg0_type, arg0,
-build_int_cst (integer_type_node,
-   MAX_FIXED_MODE_SIZE / 2));
+build_int_cst (integer_type_node, llu_size));
   hi = fold_convert (type, hi);
   tree lo = fold_convert (type, arg0);
   switch (fcode)
@@ -10363,8 +10359,7 @@ fold_builtin_bit_query (location_t loc, enum built_in_function fcode,
case BUILT_IN_CLZG:
  call = fold_builtin_bit_query (loc, fcode, lo, NULL_TREE);
  call = fold_build2 (PLUS_EXPR, integer_type_node, call,
- build_int_cst (integer_type_node,
-MAX_FIXED_MODE_SIZE / 2));
+ build_int_cst (integer_type_node, llu_size));
  if (arg2)
call = fold_build3

RE: [PATCH v1] RISC-V: Support RVVDImode for avg3_floor auto vect

2025-07-14 Thread Li, Pan2

It seems the test is not quite correct for DImode; I will send a v2 for this change.

/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i128.c

Pan

-Original Message-
From: Li, Pan2  
Sent: Tuesday, July 15, 2025 10:45 AM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Chen, Ken ; Liu, Hongtao 
; Li, Pan2 
Subject: [PATCH v1] RISC-V: Support RVVDImode for avg3_floor auto vect

From: Pan Li 

The avg3_floor pattern leverages the add and shift RTL
with the DOUBLE_TRUNC mode iterator.  That is, the RVVDImode
iterator generates avg3rvvsimode_floor, so only the
element sizes QI, HI and SI are allowed.

Thus, this patch supports DImode via the standard name,
with the iterator V_VLSI_D.
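
For reference, a scalar sketch of what a floor average computes for 64-bit
elements (illustrative only; the name avg_floor is made up here, and the
sketch relies on GCC's __int128 and arithmetic right shift of negative
values): the sum is formed in a wider type so it cannot overflow, then
shifted right by one.

```cpp
#include <cstdint>

// Hypothetical scalar model of avg3_floor for DImode elements:
// floor ((a + b) / 2) computed without overflow via a 128-bit sum.
inline std::int64_t
avg_floor (std::int64_t a, std::int64_t b)
{
  const __int128 sum = static_cast<__int128> (a) + b;
  return static_cast<std::int64_t> (sum >> 1);
}
```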

The following test suites passed for this patch series.
* The rv64gcv full regression test.

gcc/ChangeLog:

* config/riscv/autovec.md (avg3_floor): Add new
pattern of avg3_floor for rvv DImode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/avg.h: Add int128 type when
xlen == 64.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i32.c:
Suppress __int128 warning for run test.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i32-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i8-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i16-from-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i16-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i32-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i64-from-i128.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i128.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md  | 13 +
 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h |  5 +
 .../rvv/autovec/avg_ceil-run-1-i16-from-i32.c|  2 +-
 .../rvv/autovec/avg_ceil-run-1-i16-from-i64.c|  2 +-
 .../rvv/autovec/avg_ceil-run-1-i32-from-i64.c|  2 +-
 .../rvv/autovec/avg_ceil-run-1-i8-from-i16.c |  2 +-
 .../rvv/autovec/avg_ceil-run-1-i8-from-i32.c |  2 +-
 .../rvv/autovec/avg_ceil-run-1-i8-from-i64.c |  2 +-
 .../rvv/autovec/avg_floor-1-i64-from-i128.c  | 12 
 .../rvv/autovec/avg_floor-run-1-i16-from-i32.c   |  2 +-
 .../rvv/autovec/avg_floor-run-1-i16-from-i64.c   |  2 +-
 .../rvv/autovec/avg_floor-run-1-i32-from-i64.c   |  2 +-
 .../rvv/autovec/avg_floor-run-1-i8-from-i128.c   | 16 
 .../rvv/autovec/avg_floor-run-1-i8-from-i16.c|  2 +-
 .../rvv/autovec/avg_floor-run-1-i8-from-i32.c|  2 +-
 .../rvv/autovec/avg_floor-run-1-i8-from-i64.c|  2 +-
 16 files changed, 58 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-1-i64-from-i128.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i128.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 94a61bdc5cf..2e86826f286 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2499,6 +2499,19 @@ (define_expand "avg3_floor"
   }
 )
 
+(define_expand "avg3_floor"
+ [(match_operand:V_VLSI_D 0 "register_operand")
+  (match_operand:V_VLSI_D 1 "register_operand")
+  (match_operand:V_VLSI_D 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+insn_code icode = code_for_pred (UNSPEC_VAADD, mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP_VXRM_RDN,
+  operands);
+DONE;
+  }
+)
+
 (define_expand "avg3_ceil"
  [(set (match_operand: 0 "register_operand")
(truncate:
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h
index 4aeb637bba7..2de7d7c49df 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg.h
@@ -3,6 +3,11 @@
 
 #include 
 
+#if __riscv_xlen == 64
+typedef unsigned __int128 uint128_t;
+typedef signed __int128 int128_t;
+#endif
+
 #define DEF_AVG_0(NT, WT, NAME) \
 __attribute__((noinline))   \
 void\
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i16-from-i32.c
index 1fa080b3933..3d872a8a4b

Re: [PATCH] x86: Convert MMX integer loads from constant vector pool

2025-07-14 Thread Uros Bizjak

On Tue, Jul 15, 2025 at 3:43 AM H.J. Lu  wrote:
>
> For MMX 16-bit, 32-bit and 64-bit constant vector loads from constant
> vector pool:
>
> (insn 6 2 7 2 (set (reg:V1SI 5 di)
> (mem/u/c:V1SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S4 A32])) "pr121062-2.c":10:3 2036 {*movv1si_internal}
>  (expr_list:REG_EQUAL (const_vector:V1SI [
> (const_int -1 [0x])
> ])
> (nil)))
>
> we can convert it to
>
> (insn 12 2 7 2 (set (reg:SI 5 di)
> (const_int -1 [0x])) "pr121062-2.c":10:3 100 {*movsi_internal}
>  (nil))
>
> Co-Developed-by: H.J. Lu 
>
> gcc/
>
> PR target/121062
> * config/i386/i386.cc (ix86_convert_const_vector_to_integer):
> Handle E_V1SImode and E_V1DImode.
> * config/i386/mmx.md (V_16_32_64): Add V1SI, V2BF and V1DI.
> (mmxinsnmode): Add V1DI and V1SI.
> Add V_16_32_64 splitter for constant vector loads from constant
> vector pool.
> (V_16_32_64:*mov_imm): Replace lowpart_subreg with
> adjust_address.
>
> gcc/testsuite/
>
> PR target/121062
> * gcc.target/i386/pr121062-1.c: New test.
> * gcc.target/i386/pr121062-2.c: Likewise.
> * gcc.target/i386/pr121062-3a.c: Likewise.
> * gcc.target/i386/pr121062-3b.c: Likewise.
> * gcc.target/i386/pr121062-3c.c: Likewise.
> * gcc.target/i386/pr121062-4.c: Likewise.
> * gcc.target/i386/pr121062-5.c: Likewise.
> * gcc.target/i386/pr121062-6.c: Likewise.
> * gcc.target/i386/pr121062-7.c: Likewise.
>
> OK for master?

OK, with some code movements, as mentioned below.

Thanks,
Uros.

+(define_split
+  [(set (match_operand:V_16_32_64 0 "general_reg_operand")
+(match_operand:V_16_32_64 1 "memory_operand"))]
+  "reload_completed
+   && SYMBOL_REF_P (XEXP (operands[1], 0))
+   && CONSTANT_POOL_ADDRESS_P (XEXP (operands[1], 0))"
+  [(set (match_dup 0) (match_dup 1))]
...

Please put this new pattern after *movv2qi_internal as it also applies
to V2QImode and ...

@@ -417,10 +438,11 @@ (define_insn_and_split "*mov_imm"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 1))]

... put *mov_imm just after the new splitter, to prevent
shadowing of *movv2qi_internal.

+  operands[0] = adjust_address (operands[0], mode, 0);
   operands[1] = GEN_INT (val);
-  operands[0] = lowpart_subreg (mode, operands[0], mode);

FYI, subregs of memory operands should be avoided; we have plenty of
helpers to change address mode or adjust address in other ways.

Re: [PATCH v2] gcc-16/changes.html: Add --enable-x86-64-mfentry

2025-07-14 Thread Uros Bizjak

On Mon, Jul 14, 2025 at 9:39 PM H.J. Lu  wrote:

> > > OK to install?
> >
> > This should at least say that the new option is enabled by default
> > with glibc targets.
> >
> > Uros.
>
> Like this?

LGTM for content, but let's ask Gerald to proofread the entry.

Thanks,
Uros.

Re: [PATCH] gcc-16/changes.html: Add --enable-x86-64-mfentry

2025-07-14 Thread Uros Bizjak

On Mon, Jul 14, 2025 at 2:34 PM H.J. Lu  wrote:
>
> OK to install?

This should at least say that the new option is enabled by default
with glibc targets.

Uros.

[PATCH] RISC-V: Fix vsetvl merge rule.

2025-07-14 Thread Robin Dapp


Hi,

In PR120297 we fuse
 vsetvl e8,mf2,...
 vsetvl e64,m1,...
into
 vsetvl e64,m4,...

Individually, that's ok but we also change the new vsetvl's demand to
"SEW only" even though the first original one demanded SEW >= 8 and
ratio = 16.

As we forget the ratio after the merge we find that the vsetvl following
the merged one has ratio = 64 demand and we fuse into
 vsetvl e64,m1,..
which obviously doesn't have ratio = 16 any more.
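
To make the numbers concrete, here is a small standalone sketch (not code
from riscv-vsetvl; the type and function names are made up) of the SEW/LMUL
ratio for the three configurations mentioned above.  LMUL is written as a
fraction so that mf2 becomes 1/2.

```cpp
// Hypothetical model of a vtype setting: SEW in bits, LMUL as num/den.
struct vtype_cfg
{
  unsigned sew;
  unsigned lmul_num;
  unsigned lmul_den;
};

// ratio = SEW / LMUL = SEW * den / num.
constexpr unsigned
sew_lmul_ratio (vtype_cfg v)
{
  return v.sew * v.lmul_den / v.lmul_num;
}

static_assert (sew_lmul_ratio ({8, 1, 2}) == 16, "e8,mf2 has ratio 16");
static_assert (sew_lmul_ratio ({64, 4, 1}) == 16, "e64,m4 has ratio 16");
static_assert (sew_lmul_ratio ({64, 1, 1}) == 64, "e64,m1 has ratio 64");
```

So e64,m4 still satisfies the first vsetvl's ratio demand, while e64,m1
does not.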

Regtested on rv64gcv_zvl512b.

Regards
Robin

PR target/120297

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.def: Do not forget ratio demand of
previous vsetvl.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/pr120297.c: New test.
---
gcc/config/riscv/riscv-vsetvl.def |  6 +--
gcc/testsuite/gcc.target/riscv/rvv/pr120297.c | 50 +++
2 files changed, 53 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/pr120297.c

diff --git a/gcc/config/riscv/riscv-vsetvl.def b/gcc/config/riscv/riscv-vsetvl.def
index d7a5ada772d..0f999d2276d 100644
--- a/gcc/config/riscv/riscv-vsetvl.def
+++ b/gcc/config/riscv/riscv-vsetvl.def
@@ -79,7 +79,7 @@ DEF_SEW_LMUL_RULE (sew_only, sew_only, sew_only, sew_eq_p, 
sew_eq_p, nop)
DEF_SEW_LMUL_RULE (sew_only, ge_sew, sew_only,
   sew_ge_and_prev_sew_le_next_max_sew_p, sew_ge_p, nop)
DEF_SEW_LMUL_RULE (
-  sew_only, ratio_and_ge_sew, sew_lmul,
+  sew_only, ratio_and_ge_sew, ratio_and_ge_sew,
  sew_ge_and_prev_sew_le_next_max_sew_and_next_ratio_valid_for_prev_sew_p,
  always_false, modify_lmul_with_next_ratio)

@@ -104,9 +104,9 @@ DEF_SEW_LMUL_RULE (ratio_and_ge_sew, sew_lmul, sew_lmul,
DEF_SEW_LMUL_RULE (ratio_and_ge_sew, ratio_only, ratio_and_ge_sew, ratio_eq_p,
   ratio_eq_p, use_max_sew_and_lmul_with_prev_ratio)
DEF_SEW_LMUL_RULE (
-  ratio_and_ge_sew, sew_only, sew_only,
+  ratio_and_ge_sew, sew_only, ratio_and_ge_sew,
  sew_le_and_next_sew_le_prev_max_sew_and_prev_ratio_valid_for_next_sew_p,
-  always_false, use_next_sew_with_prev_ratio)
+  sew_eq_p, use_next_sew_with_prev_ratio)
DEF_SEW_LMUL_RULE (ratio_and_ge_sew, ge_sew, ratio_and_ge_sew,
   max_sew_overlap_and_prev_ratio_valid_for_next_sew_p,
   sew_ge_p, use_max_sew_and_lmul_with_prev_ratio)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/pr120297.c b/gcc/testsuite/gcc.target/riscv/rvv/pr120297.c
new file mode 100644
index 000..3d1845d0fe6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/pr120297.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target riscv_v_ok } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -fwhole-program" } */
+
+unsigned a;
+short c;
+char d;
+unsigned long e;
+_Bool f[10][10];
+unsigned g[10];
+long long ak;
+char i = 7;
+long long t[10];
+short x[10][10][10][10];
+short y[10][10][10][10];
+
+void
+h (char i, long long t[], short x[][10][10][10], short y[][10][10][10],
+   _Bool aa)
+{
+  for (int j = 2; j < 8; j += 2)
+{
+  for (short k = 0; k < 10; k++)
+   {
+ for (int l = 3; l < 8; l += 2)
+   a = x[1][j][k][l];
+ c = x[c][1][1][c];
+   }
+  for (int k = 0; k < 10; k++)
+   {
+ f[2][k] |= (_Bool) t[c];
+ g[c] = t[c + 1];
+ d += y[j][1][k][k];
+ e = e > i ? e : i;
+   }
+}
+}
+
+int
+main ()
+{
+  t[c] = 1;
+  h (i, t, x, y, a);
+  for (int j = 0; j < 10; ++j)
+for (int k = 0; k < 10; ++k)
+  ak ^= f[j][k] + 238516665 + (ak >> 2);
+  ak ^= g[c] + 238516665 + (ak >> 2);
+  if (ak != 234635118ull)
+__builtin_abort ();
+}
--
2.50.0

Re: [PATCH] tree-optimization/121059 - record loop mask when required

2025-07-14 Thread Richard Biener

On Mon, 14 Jul 2025, Richard Sandiford wrote:

> Richard Biener  writes:
> > For loop masking we need to mask a mask AND operation with the loop
> > mask.  The following makes sure we have a corresponding mask
> > available.  There's no good way to distinguish loop masking from
> > len masking here, so assume we have recorded a mask for the operands
> > mask producers.
> >
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> > PR tree-optimization/121059
> > * tree-vect-stmts.cc (vectorizable_operation): Record a
> > loop mask for mask AND operations.
> >
> > * gcc.dg/vect/pr121059.c: New testcase.
> 
> Please could you revert this?  It's not the right fix.  The point of the
> code is to opportunistically reuse loop masks that are needed by other
> operations.  It isn't supposed to record new loop masks itself.

Reverted.  I'll try to find some cycles in the next day to test the
alternative.

Richard.

> Like I say, replacing:
> 
> if (loop_vinfo->scalar_cond_masked_set.contains ({ op0, 1 }))
> 
> with:
> 
> if (loop_vinfo->scalar_cond_masked_set.contains ({ op0, vec_num }))
> 
> fixed the bug for me.
> 
> Thanks,
> Richard
> 
> > ---
> >   gcc/testsuite/gcc.dg/vect/pr121059.c | 24 
> >   gcc/tree-vect-stmts.cc   | 10 ++
> >   2 files changed, 34 insertions(+)
> >   create mode 100644 gcc/testsuite/gcc.dg/vect/pr121059.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr121059.c 
> > b/gcc/testsuite/gcc.dg/vect/pr121059.c
> > new file mode 100644
> > index 000..d7f69b4f1f5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/pr121059.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-O3 --param vect-partial-vector-usage=1" } */
> > +/* { dg-additional-options "-march=x86-64-v4" { target avx512f } } */
> > +
> > +typedef struct {
> > +  long left, right, top, bottom;
> > +} MngBox;
> > +typedef struct {
> > +  MngBox object_clip[6];
> > +  char exists[256], frozen[];
> > +} MngReadInfo;
> > +MngReadInfo mng_info;
> > +
> > +long ReadMNGImage_i;
> > +
> > +void ReadMNGImage(int ReadMNGImage_i)
> > +{
> > +  for (; ReadMNGImage_i < 256; ReadMNGImage_i++)
> > +if (mng_info.exists[ReadMNGImage_i] && mng_info.frozen[ReadMNGImage_i])
> > +  mng_info.object_clip[ReadMNGImage_i].left =
> > +  mng_info.object_clip[ReadMNGImage_i].right =
> > +  mng_info.object_clip[ReadMNGImage_i].top =
> > +  mng_info.object_clip[ReadMNGImage_i].bottom = 0;
> > +}
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 4aa69da2218..f0dc4843ca7 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -6978,6 +6978,16 @@ vectorizable_operation (vec_info *vinfo,
> >   LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> > }
> > }
> > +  else if (loop_vinfo
> > +  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > +  && code == BIT_AND_EXPR
> > +  && VECTOR_BOOLEAN_TYPE_P (vectype)
> > +  /* We cannot always record a mask since that will disable
> > + len-based partial vectors, but there should be already
> > + one mask producer stmt which should require loop
> > + masking.  */
> > +  && !masks->is_empty ())
> > +   vect_record_loop_mask (loop_vinfo, masks, vec_num, vectype, NULL);
> >
> > /* Put types on constant and invariant SLP children.  */
> > if (!vect_maybe_update_slp_op_vectype (slp_op0, vectype)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Re: [PATCH v3 2/9] opts: use uint64_t for sanitizer flags

2025-07-14 Thread Andrew Pinski

On Mon, Jul 14, 2025 at 2:57 AM Claudiu Zissulescu-Ianculescu
 wrote:
>
> > I see it now from Richard B.. Also I noticed you missed Richard S.'s
> > suggestion of using a typedef which will definitely help in the future
> > where we could even replace this with an enum class and overload the
> > bitwise operators to do the right thing.
> >
>
> Indeed, I've missed that message. Do you think hwint.h is a good place
> to add this type, and what name shall I use for this new type?

flag-types.h, where sanitize_code is defined.  Name it sanitize_code_type.
Later on we can just convert sanitize_code to be that.
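
Roughly, the progression could look like the sketch below (illustrative
only, not the actual flag-types.h contents; the enum name sanitize_flag and
its enumerators are made up).  The typedef gives the flags a 64-bit home
now, and an enum class with overloaded bitwise operators could replace it
later.

```cpp
#include <cstdint>

// Step 1: a named 64-bit type for sanitizer flag words.
using sanitize_code_type = std::uint64_t;

// Step 2 (possible later): a scoped enum on top of that type.
// Enumerator names here are hypothetical.
enum class sanitize_flag : sanitize_code_type
{
  none = 0,
  low_bit = sanitize_code_type{1} << 0,
  high_bit = sanitize_code_type{1} << 40  // Would not fit in 32 bits.
};

constexpr sanitize_flag
operator| (sanitize_flag a, sanitize_flag b)
{
  return static_cast<sanitize_flag> (static_cast<sanitize_code_type> (a)
				     | static_cast<sanitize_code_type> (b));
}

constexpr sanitize_flag
operator& (sanitize_flag a, sanitize_flag b)
{
  return static_cast<sanitize_flag> (static_cast<sanitize_code_type> (a)
				     & static_cast<sanitize_code_type> (b));
}

// True if any flag bit is set.
constexpr bool
any (sanitize_flag f)
{
  return static_cast<sanitize_code_type> (f) != 0;
}
```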

Thanks,
Andrew

>
> Thank you,
> Claudiu

Re: [PATCH] expand: ICE if asked to expand RDIV with non-float type.

2025-07-14 Thread Andrew Pinski

On Mon, Jul 14, 2025 at 2:10 AM Andrew Pinski  wrote:
>
>
>
> On Mon, Jul 14, 2025, 1:28 AM Robin Dapp  wrote:
>>
>> For the record, the Linaro CI notified me that this caused regressions:
>>
>> Produces 2 regressions:
>>   |
>>   | regressions.sum:
>>   | Running gcc:gcc.dg/dg.exp ...
>>   | FAIL: gcc.dg/pr103248.c (internal compiler error: in optab_for_tree_code, at optabs-tree.cc:85)
>>   | FAIL: gcc.dg/pr103248.c (test for excess errors)
>>
>> I'll have a look once I get to it.
>
>
> This is what Richard B. mentioned: fixed point also uses RDIV_EXPR.

Note PR 121065 is now filed to keep track of this issue (not by me).

Thanks,
Andrew


>
>
> Thanks,
> Andrew
>
>
>>
>> --
>> Regards
>>  Robin
>>

Re: [PATCH] libstdc++: Implement std::chrono::current_zone() for Windows

2025-07-14 Thread Jonathan Wakely

On Mon, 14 Jul 2025 at 16:52, Tomasz Kaminski  wrote:
>
>
>
> On Mon, Jul 14, 2025 at 1:47 PM Jonathan Wakely  wrote:
>>
>> On Mon, 14 Jul 2025 at 11:10, Jonathan Wakely  wrote:
>> >
>> > On Mon, 14 Jul 2025 at 11:08, Björn Schäpers  wrote:
>> > >
>> > > Am 14.07.2025 um 10:20 schrieb Tomasz Kaminski:
>> > > >
>> > > >
>> > > > On Tue, Jul 8, 2025 at 10:48 PM Björn Schäpers wrote:
>> > > > + const auto raw_index = information.Bias / 60;
>> > > > +
>> > > > + // The bias added to the local time equals UTC. And GMT+X corresponds
>> > > > + // to UTC-X, the sign is negated. Thus we can use the hourly bias as
>> > > > + // an index into an array.
>> > > > + if (raw_index < 0 && raw_index >= -14)
>> > > > +   {
>> > > > + static array table{
>> > > > +   "Etc/GMT-1",  "Etc/GMT-2",  "Etc/GMT-3",  "Etc/GMT-4",
>> > > > +   "Etc/GMT-5",  "Etc/GMT-6",  "Etc/GMT-7",  "Etc/GMT-8",
>> > > > +   "Etc/GMT-9",  "Etc/GMT-10", "Etc/GMT-11", "Etc/GMT-12",
>> > > > +   "Etc/GMT-13", "Etc/GMT-14"
>> > > > + };
>> > > > + return table[-raw_index - 1];
>> > > > +   }
>> > > > + else if (raw_index > 0 && raw_index <= 12)
>> > > > +   {
>> > > > + static array table{
>> > > >
>> > > > This table has size 14, but only 12 entries. I do not think there are
>> > > > zones past +12, but I believe size and entries should match.
>> > >
>> > > That is totally correct and this is a classic copy and paste error.
>> > > @Jonathan: Should I correct that (and the other things you mentioned),
>> > > or are you doing that?
>> >
>> > I can do it before I push it (probably later today).
>>
>> Hmm, we could reduce the number of guard variables for static
>> constructors by using a single array here, and indexing into it with
>> raw_index + 14.
>
> Or just make them constexpr.

Yes, then they'll be initialized before we even need them, and
there's no cost to check the static init guard variable.
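
Combining both suggestions, something along these lines might work (a
sketch only, not the code that was pushed; the names etc_gmt_names and
etc_gmt_name are made up, and the empty entry for a zero bias is an
assumption that matches the "return {}" fallthrough in the quoted code):

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// One constexpr table covering raw_index -14 .. +12, indexed by
// raw_index + 14.  Being constexpr, it needs no static-init guard.
constexpr std::array<std::string_view, 27> etc_gmt_names{
  "Etc/GMT-14", "Etc/GMT-13", "Etc/GMT-12", "Etc/GMT-11", "Etc/GMT-10",
  "Etc/GMT-9",  "Etc/GMT-8",  "Etc/GMT-7",  "Etc/GMT-6",  "Etc/GMT-5",
  "Etc/GMT-4",  "Etc/GMT-3",  "Etc/GMT-2",  "Etc/GMT-1",
  "",  // raw_index == 0: assumed to be handled elsewhere.
  "Etc/GMT+1",  "Etc/GMT+2",  "Etc/GMT+3",  "Etc/GMT+4",  "Etc/GMT+5",
  "Etc/GMT+6",  "Etc/GMT+7",  "Etc/GMT+8",  "Etc/GMT+9",  "Etc/GMT+10",
  "Etc/GMT+11", "Etc/GMT+12"
};

// Hypothetical lookup: empty result when the hourly bias is out of range.
constexpr std::string_view
etc_gmt_name (long raw_index)
{
  if (raw_index < -14 || raw_index > 12)
    return {};
  return etc_gmt_names[static_cast<std::size_t> (raw_index + 14)];
}
```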


>>
>> This code is only for Windows, so we're not talking constrained
>> microcontrollers where we need to save resources. That can be in a
>> follow-up commit, if we decide it's worth doing.
>>
>> > >
>> > > >
>> > > > +   "Etc/GMT+1", "Etc/GMT+2",  "Etc/GMT+3",  "Etc/GMT+4",
>> > > > +   "Etc/GMT+5", "Etc/GMT+6",  "Etc/GMT+7",  "Etc/GMT+8",
>> > > > +   "Etc/GMT+9", "Etc/GMT+10", "Etc/GMT+11", "Etc/GMT+12"
>> > > > + return table[raw_index - 1];
>> > > > +   }
>> > > > + return {};
>> > > > +   }
>>

[PATCH v2] gcc-16/changes.html: Add --enable-x86-64-mfentry

2025-07-14 Thread H.J. Lu

On Mon, Jul 14, 2025 at 9:44 PM Uros Bizjak  wrote:
>
> On Mon, Jul 14, 2025 at 2:34 PM H.J. Lu  wrote:
> >
> > OK to install?
>
> This should at least say that the new option is enabled by default
> with glibc targets.
>
> Uros.

Like this?

-- 
H.J.
From c3f9d5d1040b08751e949d3fdbd2b5f7909157cb Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 14 Jul 2025 20:32:11 +0800
Subject: [PATCH v2] gcc-16/changes.html: Add --enable-x86-64-mfentry

Signed-off-by: H.J. Lu 
---
 htdocs/gcc-16/changes.html | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/htdocs/gcc-16/changes.html b/htdocs/gcc-16/changes.html
index cc6fe204..fd542df4 100644
--- a/htdocs/gcc-16/changes.html
+++ b/htdocs/gcc-16/changes.html
@@ -118,6 +118,12 @@ for general information.
 
 
 
+  The --enable-x86-64-mfentry configure option is
+  added to enable -mfentry for x86-64 by default to use
+  __fentry__, instead of mcount for
+  profiling.  This option is enabled by default for glibc targets.
+  
+
 AMD GPU (GCN)
 
 
-- 
2.50.1

Re: [PATCH] expand: ICE if asked to expand RDIV with non-float type.

2025-07-14 Thread Andrew Pinski

On Mon, Jul 14, 2025, 1:28 AM Robin Dapp  wrote:

> For the record, the Linaro CI notified me that this caused regressions:
>
> Produces 2 regressions:
>   |
>   | regressions.sum:
>   | Running gcc:gcc.dg/dg.exp ...
>   | FAIL: gcc.dg/pr103248.c (internal compiler error: in optab_for_tree_code, at optabs-tree.cc:85)
>   | FAIL: gcc.dg/pr103248.c (test for excess errors)
>
> I'll have a look once I get to it.
>

This is what Richard B. mentioned: fixed point also uses RDIV_EXPR.


Thanks,
Andrew



> --
> Regards
>  Robin
>
>

Re: [PATCH v2] libstdc++: Fix constexpr exceptions for -fno-exceptions

2025-07-14 Thread Jakub Jelinek

On Fri, Jul 11, 2025 at 03:12:44PM +0100, Jonathan Wakely wrote:
> The if-consteval branches in std::make_exception_ptr and
> std::exception_ptr_cast use a try-catch block, which gives an error for
> -fno-exceptions. Just make them return a null pointer at compile-time
> when -fno-exceptions is used, because there's no way to get an active
> exception with -fno-exceptions.
> 
> For both functions we have a runtime-only branch that depends on RTTI,
> and a fallback using try-catch which works for runtime and consteval.
> Rearrange both functions to express this logic more clearly.
> 
> Also adjust some formatting and whitespace elsewhere in the file.
> 
> libstdc++-v3/ChangeLog:
> 
>   * libsupc++/exception_ptr.h (make_exception_ptr): Return null
>   for consteval when -fno-exceptions is used.
>   (exception_ptr_cast): Likewise. Allow consteval path to work
>   with -fno-rtti.

LGTM.

Jakub

[PATCH v5] x86: Check all 0s/1s vectors with standard_sse_constant_

2025-07-14 Thread H.J. Lu

On Mon, Jul 14, 2025 at 4:06 PM Uros Bizjak  wrote:
>
> On Mon, Jul 14, 2025 at 9:37 AM H.J. Lu  wrote:
> >
> > On Mon, Jul 14, 2025 at 3:11 PM Uros Bizjak  wrote:
> > >
> > > On Mon, Jul 14, 2025 at 5:32 AM Uros Bizjak  wrote:
> > > >
> > > > On Mon, Jul 14, 2025 at 2:14 AM H.J. Lu  wrote:
> > > > >
> > > > > On Sat, Jul 12, 2025 at 7:51 PM Uros Bizjak  wrote:
> > > > > >
> > > > > > On Sat, Jul 12, 2025 at 1:41 PM H.J. Lu  wrote:
> > > > > > >
> > > > > > > > On Sat, Jul 12, 2025 at 5:58 PM Uros Bizjak  wrote:
> > > > > > > > >
> > > > > > > > > On Sat, Jul 12, 2025 at 11:52 AM H.J. Lu  wrote:
> > > > > > > > > >
> > > > > > > > > > On Sat, Jul 12, 2025 at 5:03 PM Uros Bizjak  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu  wrote:
> > > > > > > > > > >
> > > > > > > > > > > commit 77473a27bae04da99d6979d43e7bd0a8106f4557
> > > > > > > > > > > Author: H.J. Lu 
> > > > > > > > > > > Date:   Thu Jun 26 06:08:51 2025 +0800
> > > > > > > > > > >
> > > > > > > > > > > x86: Also handle all 1s float vector constant
> > > > > > > > > > >
> > > > > > > > > > > replaces
> > > > > > > > > > >
> > > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > > > (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 
> > > > > > > > > > > 0x2]) [0  S8 A64])) 2031
> > > > > > > > > > >  {*movv2sf_internal}
> > > > > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated 
> > > > > > > > > > > x2
> > > > > > > > > > > ])
> > > > > > > > > > > (nil)))
> > > > > > > > > > >
> > > > > > > > > > > with
> > > > > > > > > > >
> > > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > > > (const_vector:V8QI [
> > > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > > repeated x8
> > > > > > > > > > > ])) -1
> > > > > > > > > > >  (nil))
> > > > > > > > > > > ...
> > > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > > > (subreg:V2SF (reg:V8QI 112) 0)) 2031 
> > > > > > > > > > > {*movv2sf_internal}
> > > > > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated 
> > > > > > > > > > > x2
> > > > > > > > > > > ])
> > > > > > > > > > > (nil)))
> > > > > > > > > > >
> > > > > > > > > > > which leads to
> > > > > > > > > > >
> > > > > > > > > > > pr121015.c: In function ‘render_result_from_bake_h’:
> > > > > > > > > > > pr121015.c:34:1: error: unrecognizable insn:
> > > > > > > > > > >34 | }
> > > > > > > > > > >   | ^
> > > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > > > (const_vector:V8QI [
> > > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > > repeated x8
> > > > > > > > > > > ])) -1
> > > > > > > > > > >  (expr_list:REG_EQUIV (const_vector:V8QI [
> > > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > > repeated x8
> > > > > > > > > > > ])
> > > > > > > > > > > (nil)))
> > > > > > > > > > > during RTL pass: ira
> > > > > > > > > > >
> > > > > > > > > > > 1. Add vector_const0_or_m1_operand for vector 0 or 
> > > > > > > > > > > integer vector -1.
> > > > > > > > > > > 2. Add nonimm_or_vector_const0_or_m1_operand for 
> > > > > > > > > > > nonimmediate, vector 0
> > > > > > > > > > > or integer vector -1 operand.
> > > > > > > > > > > 3. Add BX constraint for MMX vector constant all 0s/1s 
> > > > > > > > > > > operand.
> > > > > > > > > > > 4. Update MMXMODE:*mov_internal to support integer 
> > > > > > > > > > > all 1s vectors.
> > > > > > > > > > > Replace  with  to generate
> > > > > > > > > > >
> > > > > > > > > > > pcmpeqd %xmm0, %xmm0
> > > > > > > > > > >
> > > > > > > > > > > for
> > > > > > > > > > >
> > > > > > > > > > > (set (reg/i:V8QI 20 xmm0)
> > > > > > > > > > >  (const_vector:V8QI [(const_int -1 
> > > > > > > > > > > [0x]) repeated x8]))
> > > > > > > > > > >
> > > > > > > > > > > NB: The upper 64 bits in XMM0 are all 1s, instead of all 
> > > > > > > > > > > 0s.
> > > > > > > > > >
> > > > > > > > > > Actually, we don't want this, we should keep the top 64 
> > > > > > > > > > bits zero,
> > > > > > > > > > especially for floating point, where the pattern represents 
> > > > > > > > > > NaN.
> > > > > > > > > >
> > > > > > > > > > So, I think the correct way is to avoid the transformation 
> > > > > > > > > > for
> > > > > > > > > > narrower modes in the first place.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > How does your latest patch handle this?
> > > > > > > > >
> > > > > > > > > typedef char __v8qi __attribute__ ((__vector_size__ (8)));
> > > > > > > > >
> > > > > > > > > __v8qi
> > > > >

Re: [PATCH v5] x86: Check all 0s/1s vectors with standard_sse_constant_

2025-07-14 Thread Uros Bizjak

On Mon, Jul 14, 2025 at 11:25 AM H.J. Lu  wrote:
>
> On Mon, Jul 14, 2025 at 4:06 PM Uros Bizjak  wrote:
> >
> > On Mon, Jul 14, 2025 at 9:37 AM H.J. Lu  wrote:
> > >
> > > On Mon, Jul 14, 2025 at 3:11 PM Uros Bizjak  wrote:
> > > >
> > > > On Mon, Jul 14, 2025 at 5:32 AM Uros Bizjak  wrote:
> > > > >
> > > > > On Mon, Jul 14, 2025 at 2:14 AM H.J. Lu  wrote:
> > > > > >
> > > > > > On Sat, Jul 12, 2025 at 7:51 PM Uros Bizjak  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Sat, Jul 12, 2025 at 1:41 PM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Sat, Jul 12, 2025 at 5:58 PM Uros Bizjak  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Sat, Jul 12, 2025 at 11:52 AM H.J. Lu 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Sat, Jul 12, 2025 at 5:03 PM Uros Bizjak 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > commit 77473a27bae04da99d6979d43e7bd0a8106f4557
> > > > > > > > > > > > Author: H.J. Lu 
> > > > > > > > > > > > Date:   Thu Jun 26 06:08:51 2025 +0800
> > > > > > > > > > > >
> > > > > > > > > > > > x86: Also handle all 1s float vector constant
> > > > > > > > > > > >
> > > > > > > > > > > > replaces
> > > > > > > > > > > >
> > > > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > > > > (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 
> > > > > > > > > > > > 0x2]) [0  S8 A64])) 2031
> > > > > > > > > > > >  {*movv2sf_internal}
> > > > > > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > > > > (const_double:SF -QNaN [-QNaN]) 
> > > > > > > > > > > > repeated x2
> > > > > > > > > > > > ])
> > > > > > > > > > > > (nil)))
> > > > > > > > > > > >
> > > > > > > > > > > > with
> > > > > > > > > > > >
> > > > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > > > > (const_vector:V8QI [
> > > > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > > > repeated x8
> > > > > > > > > > > > ])) -1
> > > > > > > > > > > >  (nil))
> > > > > > > > > > > > ...
> > > > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > > > > (subreg:V2SF (reg:V8QI 112) 0)) 2031 
> > > > > > > > > > > > {*movv2sf_internal}
> > > > > > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > > > > (const_double:SF -QNaN [-QNaN]) 
> > > > > > > > > > > > repeated x2
> > > > > > > > > > > > ])
> > > > > > > > > > > > (nil)))
> > > > > > > > > > > >
> > > > > > > > > > > > which leads to
> > > > > > > > > > > >
> > > > > > > > > > > > pr121015.c: In function ‘render_result_from_bake_h’:
> > > > > > > > > > > > pr121015.c:34:1: error: unrecognizable insn:
> > > > > > > > > > > >34 | }
> > > > > > > > > > > >   | ^
> > > > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > > > > (const_vector:V8QI [
> > > > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > > > repeated x8
> > > > > > > > > > > > ])) -1
> > > > > > > > > > > >  (expr_list:REG_EQUIV (const_vector:V8QI [
> > > > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > > > repeated x8
> > > > > > > > > > > > ])
> > > > > > > > > > > > (nil)))
> > > > > > > > > > > > during RTL pass: ira
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Add vector_const0_or_m1_operand for vector 0 or 
> > > > > > > > > > > > integer vector -1.
> > > > > > > > > > > > 2. Add nonimm_or_vector_const0_or_m1_operand for 
> > > > > > > > > > > > nonimmediate, vector 0
> > > > > > > > > > > > or integer vector -1 operand.
> > > > > > > > > > > > 3. Add BX constraint for MMX vector constant all 0s/1s 
> > > > > > > > > > > > operand.
> > > > > > > > > > > > 4. Update MMXMODE:*mov_internal to support 
> > > > > > > > > > > > integer all 1s vectors.
> > > > > > > > > > > > Replace  with  to generate
> > > > > > > > > > > >
> > > > > > > > > > > > pcmpeqd %xmm0, %xmm0
> > > > > > > > > > > >
> > > > > > > > > > > > for
> > > > > > > > > > > >
> > > > > > > > > > > > (set (reg/i:V8QI 20 xmm0)
> > > > > > > > > > > >  (const_vector:V8QI [(const_int -1 
> > > > > > > > > > > > [0x]) repeated x8]))
> > > > > > > > > > > >
> > > > > > > > > > > > NB: The upper 64 bits in XMM0 are all 1s, instead of 
> > > > > > > > > > > > all 0s.
> > > > > > > > > > >
> > > > > > > > > > > Actually, we don't want this, we should keep the top 64 
> > > > > > > > > > > bits zero,
> > > > > > > > > > > especially for floating point, where the pattern 
> > > > > > > > > > > represents NaN.
> > > > > > > > > > >
> > > > > > > > > > > So, I think the correct way is to avoid the 
> > > > > > > > > > > transformation for
>

Re: [PATCH] aarch64: Implement sme2+faminmax extension.

2025-07-14 Thread Spencer Abson

On Mon, Jul 07, 2025 at 08:46:15AM +, Alfie Richards wrote:
> Hello all,
> 
> This patch implements the couple of amin/amax instructions that are part of
> SME2 + faminmax.
> 
> Regression tested and bootstrapped for AArch64.
> 
> Thanks,
> Alfie
> 
> -- >8 --
> 
> Implements the sme2+faminmax svamin and svamax intrinsics.
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-sme.md (@aarch64_sme_):
>   New patterns.
>   * config/aarch64/aarch64-sve-builtins-sme.def (svamin): New intrinsics.
>   (svamax): New intrinsics.
>   * config/aarch64/aarch64-sve-builtins-sve2.cc (class faminmaximpl): New
>   class.
>   (svamin): New function.
>   (svamax): New function.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/sme2/acle-asm/amax_f16_x2.c: New test.
>   * gcc.target/aarch64/sme2/acle-asm/amax_f16_x4.c: New test.
>   * gcc.target/aarch64/sme2/acle-asm/amax_f32_x2.c: New test.
>   * gcc.target/aarch64/sme2/acle-asm/amax_f32_x4.c: New test.
>   * gcc.target/aarch64/sme2/acle-asm/amax_f64_x2.c: New test.
>   * gcc.target/aarch64/sme2/acle-asm/amax_f64_x4.c: New test.
>   * gcc.target/aarch64/sme2/acle-asm/amin_f16_x2.c: New test.
>   * gcc.target/aarch64/sme2/acle-asm/amin_f16_x4.c: New test.
>   * gcc.target/aarch64/sme2/acle-asm/amin_f32_x2.c: New test.
>   * gcc.target/aarch64/sme2/acle-asm/amin_f32_x4.c: New test.
>   * gcc.target/aarch64/sme2/acle-asm/amin_f64_x2.c: New test.
>   * gcc.target/aarch64/sme2/acle-asm/amin_f64_x4.c: New test.
> ---
>  gcc/config/aarch64/aarch64-sme.md |  18 +++
>  .../aarch64/aarch64-sve-builtins-sme.def  |   5 +
>  .../aarch64/aarch64-sve-builtins-sve2.cc  |  44 +-
>  .../aarch64/sme2/acle-asm/amax_f16_x2.c   |  97 +
>  .../aarch64/sme2/acle-asm/amax_f16_x4.c   | 128 +
>  .../aarch64/sme2/acle-asm/amax_f32_x2.c   |  96 +
>  .../aarch64/sme2/acle-asm/amax_f32_x4.c   | 129 ++
>  .../aarch64/sme2/acle-asm/amax_f64_x2.c   |  96 +
>  .../aarch64/sme2/acle-asm/amax_f64_x4.c   | 128 +
>  .../aarch64/sme2/acle-asm/amin_f16_x2.c   |  96 +
>  .../aarch64/sme2/acle-asm/amin_f16_x4.c   | 128 +
>  .../aarch64/sme2/acle-asm/amin_f32_x2.c   |  96 +
>  .../aarch64/sme2/acle-asm/amin_f32_x4.c   | 128 +
>  .../aarch64/sme2/acle-asm/amin_f64_x2.c   |  96 +
>  .../aarch64/sme2/acle-asm/amin_f64_x4.c   | 128 +
>  15 files changed, 1409 insertions(+), 4 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f16_x2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f16_x4.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f32_x2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f32_x4.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f64_x2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f64_x4.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f16_x2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f16_x4.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f32_x2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f32_x4.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f64_x2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f64_x4.c
> 
> diff --git a/gcc/config/aarch64/aarch64-sme.md 
> b/gcc/config/aarch64/aarch64-sme.md
> index b8bb4cc14b6..bfe368e80b5 100644
> --- a/gcc/config/aarch64/aarch64-sme.md
> +++ b/gcc/config/aarch64/aarch64-sme.md
> @@ -38,6 +38,7 @@
>  ;;  Binary arithmetic on ZA tile
>  ;;  Binary arithmetic on ZA slice
>  ;;  Binary arithmetic, writing to ZA slice
> +;;  Absolute minimum/maximum
>  ;;
>  ;; == Ternary arithmetic
>  ;;  [INT] Dot product
> @@ -1264,6 +1265,23 @@ (define_insn "*aarch64_sme_single__plus"
>"\tza.[%w0, %1, vgx], %2, %3."
>  )
>  

Sorry for picking up on this after it was committed, but:

> +;; -
> +;;  Absolute minimum/maximum
> +;; -
> +;; Includes:
> +;; - svamin (SME2+faminmax)
> +;; - svamin (SME2+faminmax)

Even though these are currently exclusively used by the ACLE, I think we should
be listing the names of the instructions here.  It looks like the convention
would be to capitalise "+faminmax" too.

> +;; -
> +
> +(define_insn "@aarch64_sme_"
> +  [(set (match_operand:SVE_Fx24 0 "register_operand" "=Uw")
> + (unspec:SVE_Fx24 [(match_

Re: [PATCH] libstdc++: Implement std::chrono::current_zone() for Windows

2025-07-14 Thread Björn Schäpers


On 14.07.2025 at 10:20, Tomasz Kaminski wrote:



On Tue, Jul 8, 2025 at 10:48 PM Björn Schäpers wrote:


From: Björn Schäpers <bjo...@hazardy.de>

I have based this on my previous (not yet landed) patch, but it only
reuses the #ifdef to include <array>. Since std::array isn't used
anywhere else I thought that was the right place to put it.

I hope the formatting is okay.

I've used wide strings for the Windows zone name and territory, since
the Windows API returns wide strings and thus they can be compared
directly. For the territory there exists a narrow string API, but
internally it calls the wide string version and narrows it down. If
desired I can switch to narrow strings, the conversion can be done by
static_cast per character since only ASCII chars are used.
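
The per-character narrowing mentioned above could look roughly like this. It is a minimal sketch, assuming the input really contains only ASCII characters; narrow_ascii is a hypothetical name and does not appear in the patch.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not part of the patch): narrow a wide string that
// is known to hold only ASCII characters, one static_cast per character.
std::string narrow_ascii(std::wstring_view wide)
{
  std::string narrow;
  narrow.reserve(wide.size());
  for (wchar_t wc : wide)
    narrow.push_back(static_cast<char>(wc));  // lossless only for ASCII
  return narrow;
}
```

For example, narrow_ascii(L"BH") yields "BH". Non-ASCII input would be truncated silently, so this only fits data such as territory codes.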

-- >8 --
On Windows there is no API to get the current time zone as IANA name,
instead Windows has its own zones. But there exists a mapping provided
by the Unicode Consortium. This patch adds a script to convert the XML
file with the mapping to a lookup table and adds a Windows code path to
use that mapping.
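
As a rough illustration of how such a lookup table can be queried (a sketch only, not the code in tzdb.cc): the entry layout mirrors windows_zone_map_entry from the patch and the four rows are copied from the generated table, but find_entry, lookup_iana and the fallback to territory "001" when no territory-specific row exists are assumptions made for this example.

```cpp
#include <algorithm>
#include <array>
#include <string_view>
#include <tuple>
#include <utility>

// Same shape as windows_zone_map_entry in the patch.
struct zone_entry
{
  std::wstring_view windows_name;
  std::wstring_view territory;
  std::string_view iana_name;
};

// A few rows from the generated table, sorted by (windows_name, territory).
constexpr std::array<zone_entry, 4> sample_map{{
  {L"Arab Standard Time", L"001", "Asia/Riyadh"},
  {L"Arab Standard Time", L"BH", "Asia/Bahrain"},
  {L"Arabian Standard Time", L"001", "Asia/Dubai"},
  {L"Arabian Standard Time", L"OM", "Asia/Muscat"},
}};

// Hypothetical helper: binary search for an exact (name, territory) match.
const zone_entry* find_entry(std::wstring_view name, std::wstring_view territory)
{
  using key_type = std::pair<std::wstring_view, std::wstring_view>;
  auto less = [](const zone_entry& e, const key_type& key) {
    return std::tie(e.windows_name, e.territory)
           < std::tie(key.first, key.second);
  };
  auto it = std::lower_bound(sample_map.begin(), sample_map.end(),
                             key_type{name, territory}, less);
  if (it != sample_map.end() && it->windows_name == name
      && it->territory == territory)
    return &*it;
  return nullptr;
}

// Hypothetical helper: try the exact territory first, then the "001" row
// (assumed here to be the default mapping). Returns an empty view if the
// Windows zone name is unknown.
std::string_view lookup_iana(std::wstring_view name, std::wstring_view territory)
{
  if (const zone_entry* e = find_entry(name, territory))
    return e->iana_name;
  if (const zone_entry* e = find_entry(name, L"001"))
    return e->iana_name;
  return {};
}
```

Keeping the keys as wide strings means the values returned by the Windows API can be compared against the table without any conversion.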

libstdc++-v3/Changelog:

         Implement std::chrono::current_zone() for Windows

         * include/bits/windows_zones-map.h: New file, contains the look
         up table.
         * scripts/gen_windows_zones_map.py: New file, generates
         windows_zones-map.h.
         * src/c++20/tzdb.cc (tzdb::current_zone): Add Windows code path.

Signed-off-by: Björn Schäpers <bjo...@hazardy.de>
---
  libstdc++-v3/include/bits/windows_zones-map.h | 407 ++
  libstdc++-v3/scripts/gen_windows_zones_map.py | 128 ++
  libstdc++-v3/src/c++20/tzdb.cc                | 102 -
  3 files changed, 635 insertions(+), 2 deletions(-)
  create mode 100644 libstdc++-v3/include/bits/windows_zones-map.h
  create mode 100644 libstdc++-v3/scripts/gen_windows_zones_map.py

diff --git a/libstdc++-v3/include/bits/windows_zones-map.h b/libstdc++-v3/
include/bits/windows_zones-map.h
new file mode 100644
index 000..7be736b063d
--- /dev/null
+++ b/libstdc++-v3/include/bits/windows_zones-map.h
@@ -0,0 +1,407 @@
+// Generated by scripts/gen_windows_zones_map.py, do not edit.
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// >.
+
+/** @file bits/windows_zones-map.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{chrono}
+ */
+
+#ifndef _GLIBCXX_GET_WINDOWS_ZONES_MAP
+# error "This is not a public header, do not include it directly"
+#endif
+
+struct windows_zone_map_entry
+{
+  wstring_view windows_name;
+  wstring_view territory;
+  string_view iana_name;
+};
+
+static constexpr array windows_zone_map{
+  {
+    {L"AUS Central Standard Time", L"001", "Australia/Darwin"},
+    {L"AUS Eastern Standard Time", L"001", "Australia/Sydney"},
+    {L"Afghanistan Standard Time", L"001", "Asia/Kabul"},
+    {L"Alaskan Standard Time", L"001", "America/Anchorage"},
+    {L"Aleutian Standard Time", L"001", "America/Adak"},
+    {L"Altai Standard Time", L"001", "Asia/Barnaul"},
+    {L"Arab Standard Time", L"001", "Asia/Riyadh"},
+    {L"Arab Standard Time", L"BH", "Asia/Bahrain"},
+    {L"Arab Standard Time", L"KW", "Asia/Kuwait"},
+    {L"Arab Standard Time", L"QA", "Asia/Qatar"},
+    {L"Arab Standard Time", L"YE", "Asia/Aden"},
+    {L"Arabian Standard Time", L"001", "Asia/Dubai"},
+    {L"Arabian Standard Time", L"OM", "Asia/Muscat"},
+    {L"Arabian Standard Time", L"ZZ", "Etc/GMT-4"},
+    {L"Arabic Standard Time", L"001", "Asia/Baghdad"},
+    {L"Argent

Re: [PATCH v3 2/9] opts: use uint64_t for sanitizer flags

2025-07-14 Thread Claudiu Zissulescu-Ianculescu

> I see it now from Richard B.. Also I noticed you missed Richard S.'s
> suggestion of using a typedef which will definitely help in the future
> where we could even replace this with an enum class and overload the
> bitwise operators to do the right thing.
> 

Indeed, I missed that message. Do you think hwint.h is a good place to
add this type, and what name should I use for it?

Thank you,
Claudiu
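
For readers following the typedef suggestion, here is a minimal sketch of the two stages discussed: a typedef over uint64_t first, and a scoped enum with overloaded bitwise operators as a possible later step. All names (sanitize_code_type, sanitize_flag, its enumerators and any_set) are hypothetical and were not proposed in the thread.

```cpp
#include <cstdint>

// Stage 1: a typedef so that the underlying width is spelled in one place.
// The name is hypothetical.
typedef std::uint64_t sanitize_code_type;

// Stage 2 (possible future step): a scoped enum of the same width.
// Enumerator names and bit positions are illustrative only.
enum class sanitize_flag : sanitize_code_type
{
  none     = 0,
  address  = sanitize_code_type{1} << 0,
  thread   = sanitize_code_type{1} << 1,
  high_bit = sanitize_code_type{1} << 40  // would not fit in 32 bits
};

constexpr sanitize_flag operator| (sanitize_flag a, sanitize_flag b)
{
  return static_cast<sanitize_flag> (static_cast<sanitize_code_type> (a)
                                     | static_cast<sanitize_code_type> (b));
}

constexpr sanitize_flag operator& (sanitize_flag a, sanitize_flag b)
{
  return static_cast<sanitize_flag> (static_cast<sanitize_code_type> (a)
                                     & static_cast<sanitize_code_type> (b));
}

constexpr sanitize_flag operator~ (sanitize_flag a)
{
  return static_cast<sanitize_flag> (~static_cast<sanitize_code_type> (a));
}

// True if any bit is set.
constexpr bool any_set (sanitize_flag f)
{
  return static_cast<sanitize_code_type> (f) != 0;
}
```

With the operators in place, expressions such as any_set ((address | high_bit) & high_bit) keep working, while accidental mixing with plain integers no longer compiles.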

Re: [PATCH] aarch64: Implement sme2+faminmax extension.

2025-07-14 Thread Alfie Richards


On 14/07/2025 10:35, Spencer Abson wrote:

On Mon, Jul 07, 2025 at 08:46:15AM +, Alfie Richards wrote:

Hello all,

This patch implements the couple of amin/amax instructions that are part of
SME2 + faminmax.

Regression tested and bootstrapped for AArch64.

Thanks,
Alfie

-- >8 --

Implements the sme2+faminmax svamin and svamax intrinsics.

gcc/ChangeLog:

* config/aarch64/aarch64-sme.md (@aarch64_sme_):
New patterns.
* config/aarch64/aarch64-sve-builtins-sme.def (svamin): New intrinsics.
(svamax): New intrinsics.
* config/aarch64/aarch64-sve-builtins-sve2.cc (class faminmaximpl): New
class.
(svamin): New function.
(svamax): New function.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sme2/acle-asm/amax_f16_x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amax_f16_x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amax_f32_x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amax_f32_x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amax_f64_x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amax_f64_x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amin_f16_x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amin_f16_x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amin_f32_x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amin_f32_x4.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amin_f64_x2.c: New test.
* gcc.target/aarch64/sme2/acle-asm/amin_f64_x4.c: New test.
---
  gcc/config/aarch64/aarch64-sme.md |  18 +++
  .../aarch64/aarch64-sve-builtins-sme.def  |   5 +
  .../aarch64/aarch64-sve-builtins-sve2.cc  |  44 +-
  .../aarch64/sme2/acle-asm/amax_f16_x2.c   |  97 +
  .../aarch64/sme2/acle-asm/amax_f16_x4.c   | 128 +
  .../aarch64/sme2/acle-asm/amax_f32_x2.c   |  96 +
  .../aarch64/sme2/acle-asm/amax_f32_x4.c   | 129 ++
  .../aarch64/sme2/acle-asm/amax_f64_x2.c   |  96 +
  .../aarch64/sme2/acle-asm/amax_f64_x4.c   | 128 +
  .../aarch64/sme2/acle-asm/amin_f16_x2.c   |  96 +
  .../aarch64/sme2/acle-asm/amin_f16_x4.c   | 128 +
  .../aarch64/sme2/acle-asm/amin_f32_x2.c   |  96 +
  .../aarch64/sme2/acle-asm/amin_f32_x4.c   | 128 +
  .../aarch64/sme2/acle-asm/amin_f64_x2.c   |  96 +
  .../aarch64/sme2/acle-asm/amin_f64_x4.c   | 128 +
  15 files changed, 1409 insertions(+), 4 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f16_x2.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f16_x4.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f32_x2.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f32_x4.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f64_x2.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amax_f64_x4.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f16_x2.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f16_x4.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f32_x2.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f32_x4.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f64_x2.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/amin_f64_x4.c

diff --git a/gcc/config/aarch64/aarch64-sme.md 
b/gcc/config/aarch64/aarch64-sme.md
index b8bb4cc14b6..bfe368e80b5 100644
--- a/gcc/config/aarch64/aarch64-sme.md
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -38,6 +38,7 @@
  ;;  Binary arithmetic on ZA tile
  ;;  Binary arithmetic on ZA slice
  ;;  Binary arithmetic, writing to ZA slice
+;;  Absolute minimum/maximum
  ;;
  ;; == Ternary arithmetic
  ;;  [INT] Dot product
@@ -1264,6 +1265,23 @@ (define_insn "*aarch64_sme_single__plus"
"\tza.[%w0, %1, vgx], %2, %3."
  )
  


Sorry for picking up on this after it was committed, but:


+;; -
+;;  Absolute minimum/maximum
+;; -
+;; Includes:
+;; - svamin (SME2+faminmax)
+;; - svamin (SME2+faminmax)


Even though these are currently exclusively used by the ACLE, I think we should
be listing the names of the instructions here.  It looks like the convention
would be to capitalise "+faminmax" too.

Sure, that makes sense.



+;; -
+
+(define_insn "@aarch64_sme_"
+  [(set (match_operand:SVE_Fx24 0 "register_operand" "=Uw")
+   (unspec:SVE_Fx24 [(match_operand:SVE_Fx24 1 "register_operand" "%0")
+

Re: [PATCH] libstdc++: Implement std::chrono::current_zone() for Windows

2025-07-14 Thread Jonathan Wakely

On Mon, 14 Jul 2025 at 11:08, Björn Schäpers  wrote:
>
> On 14.07.2025 at 10:20, Tomasz Kaminski wrote:
> >
> >
> > On Tue, Jul 8, 2025 at 10:48 PM Björn Schäpers wrote:
> >
> > From: Björn Schäpers <bjo...@hazardy.de>
> >
> > I have based this on my previous (not yet landed) patch, but it only
> > reuses the #ifdef to include <array>. Since std::array isn't used
> > anywhere else I thought that was the right place to put it.
> >
> > I hope the formatting is okay.
> >
> > I've used wide strings for the Windows zone name and territory, since
> > the Windows API returns wide strings and thus they can be compared
> > directly. For the territory there exists a narrow string API, but
> > internally it calls the wide string version and narrows it down. If
> > desired I can switch to narrow strings, the conversion can be done by
> > static_cast per character since only ASCII chars are used.
> >
> > -- >8 --
> > On Windows there is no API to get the current time zone as IANA name,
> > instead Windows has its own zones. But there exists a mapping provided
> > by the Unicode Consortium. This patch adds a script to convert the XML
> > file with the mapping to a lookup table and adds a Windows code path to
> > use that mapping.
> >
> > libstdc++-v3/Changelog:
> >
> >  Implement std::chrono::current_zone() for Windows
> >
> >  * include/bits/windows_zones-map.h: New file, contains the look
> >  up table.
> >  * scripts/gen_windows_zones_map.py: New file, generates
> >  windows_zones-map.h.
> >  * src/c++20/tzdb.cc (tzdb::current_zone): Add Windows code 
> > path.
> >
> > Signed-off-by: Björn Schäpers
> > ---
> >   libstdc++-v3/include/bits/windows_zones-map.h | 407 ++
> >   libstdc++-v3/scripts/gen_windows_zones_map.py | 128 ++
> >   libstdc++-v3/src/c++20/tzdb.cc| 102 -
> >   3 files changed, 635 insertions(+), 2 deletions(-)
> >   create mode 100644 libstdc++-v3/include/bits/windows_zones-map.h
> >   create mode 100644 libstdc++-v3/scripts/gen_windows_zones_map.py
> >
> > diff --git a/libstdc++-v3/include/bits/windows_zones-map.h 
> > b/libstdc++-v3/
> > include/bits/windows_zones-map.h
> > new file mode 100644
> > index 000..7be736b063d
> > --- /dev/null
> > +++ b/libstdc++-v3/include/bits/windows_zones-map.h
> > @@ -0,0 +1,407 @@
> > +// Generated by scripts/gen_windows_zones_map.py, do not edit.
> > +
> > +// Copyright The GNU Toolchain Authors.
> > +//
> > +// This file is part of the GNU ISO C++ Library.  This library is free
> > +// software; you can redistribute it and/or modify it under the
> > +// terms of the GNU General Public License as published by the
> > +// Free Software Foundation; either version 3, or (at your option)
> > +// any later version.
> > +
> > +// This library is distributed in the hope that it will be useful,
> > +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +// GNU General Public License for more details.
> > +
> > +// Under Section 7 of GPL version 3, you are granted additional
> > +// permissions described in the GCC Runtime Library Exception, version
> > +// 3.1, as published by the Free Software Foundation.
> > +
> > +// You should have received a copy of the GNU General Public License 
> > and
> > +// a copy of the GCC Runtime Library Exception along with this program;
> > +// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, 
> > see
> > +// >.
> > +
> > +/** @file bits/windows_zones-map.h
> > + *  This is an internal header file, included by other library headers.
> > + *  Do not attempt to use it directly. @headername{chrono}
> > + */
> > +
> > +#ifndef _GLIBCXX_GET_WINDOWS_ZONES_MAP
> > +# error "This is not a public header, do not include it directly"
> > +#endif
> > +
> > +struct windows_zone_map_entry
> > +{
> > +  wstring_view windows_name;
> > +  wstring_view territory;
> > +  string_view iana_name;
> > +};
> > +
> > +static constexpr array windows_zone_map{
> > +  {
> > +{L"AUS Central Standard Time", L"001", "Australia/Darwin"},
> > +{L"AUS Eastern Standard Time", L"001", "Australia/Sydney"},
> > +{L"Afghanistan Standard Time", L"001", "Asia/Kabul"},
> > +{L"Alaskan Standard Time", L"001", "America/Anchorage"},
> > +{L"Aleutian Standard Time", L"001", "America/Adak"},
> > +{L"Altai Standard Time", L"001", "Asia/Barnaul"},
> > +{L"Arab Standard Time", L"001", "Asia/Riyadh"},
>

Re: [PATCH v3] x86: Update MMX moves to support all 1s vectors

2025-07-14 Thread H.J. Lu

On Mon, Jul 14, 2025 at 3:40 PM H.J. Lu  wrote:
>
> On Mon, Jul 14, 2025 at 3:34 PM Uros Bizjak  wrote:
> >
> > On Mon, Jul 14, 2025 at 9:11 AM Uros Bizjak  wrote:
> > >
> > > On Mon, Jul 14, 2025 at 5:32 AM Uros Bizjak  wrote:
> > > >
> > > > On Mon, Jul 14, 2025 at 2:14 AM H.J. Lu  wrote:
> > > > >
> > > > > On Sat, Jul 12, 2025 at 7:51 PM Uros Bizjak  wrote:
> > > > > >
> > > > > > On Sat, Jul 12, 2025 at 1:41 PM H.J. Lu  wrote:
> > > > > > >
> > > > > > > On Sat, Jul 12, 2025 at 5:58 PM Uros Bizjak  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Sat, Jul 12, 2025 at 11:52 AM H.J. Lu  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Sat, Jul 12, 2025 at 5:03 PM Uros Bizjak 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > commit 77473a27bae04da99d6979d43e7bd0a8106f4557
> > > > > > > > > > > Author: H.J. Lu 
> > > > > > > > > > > Date:   Thu Jun 26 06:08:51 2025 +0800
> > > > > > > > > > >
> > > > > > > > > > > x86: Also handle all 1s float vector constant
> > > > > > > > > > >
> > > > > > > > > > > replaces
> > > > > > > > > > >
> > > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > > > (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 
> > > > > > > > > > > 0x2]) [0  S8 A64])) 2031
> > > > > > > > > > >  {*movv2sf_internal}
> > > > > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated 
> > > > > > > > > > > x2
> > > > > > > > > > > ])
> > > > > > > > > > > (nil)))
> > > > > > > > > > >
> > > > > > > > > > > with
> > > > > > > > > > >
> > > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > > > (const_vector:V8QI [
> > > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > > repeated x8
> > > > > > > > > > > ])) -1
> > > > > > > > > > >  (nil))
> > > > > > > > > > > ...
> > > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > > > (subreg:V2SF (reg:V8QI 112) 0)) 2031 
> > > > > > > > > > > {*movv2sf_internal}
> > > > > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated 
> > > > > > > > > > > x2
> > > > > > > > > > > ])
> > > > > > > > > > > (nil)))
> > > > > > > > > > >
> > > > > > > > > > > which leads to
> > > > > > > > > > >
> > > > > > > > > > > pr121015.c: In function ‘render_result_from_bake_h’:
> > > > > > > > > > > pr121015.c:34:1: error: unrecognizable insn:
> > > > > > > > > > >34 | }
> > > > > > > > > > >   | ^
> > > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > > > (const_vector:V8QI [
> > > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > > repeated x8
> > > > > > > > > > > ])) -1
> > > > > > > > > > >  (expr_list:REG_EQUIV (const_vector:V8QI [
> > > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > > repeated x8
> > > > > > > > > > > ])
> > > > > > > > > > > (nil)))
> > > > > > > > > > > during RTL pass: ira
> > > > > > > > > > >
> > > > > > > > > > > 1. Add vector_const0_or_m1_operand for vector 0 or 
> > > > > > > > > > > integer vector -1.
> > > > > > > > > > > 2. Add nonimm_or_vector_const0_or_m1_operand for 
> > > > > > > > > > > nonimmediate, vector 0
> > > > > > > > > > > or integer vector -1 operand.
> > > > > > > > > > > 3. Add BX constraint for MMX vector constant all 0s/1s 
> > > > > > > > > > > operand.
> > > > > > > > > > > 4. Update MMXMODE:*mov_internal to support integer 
> > > > > > > > > > > all 1s vectors.
> > > > > > > > > > > Replace  with  to generate
> > > > > > > > > > >
> > > > > > > > > > > pcmpeqd %xmm0, %xmm0
> > > > > > > > > > >
> > > > > > > > > > > for
> > > > > > > > > > >
> > > > > > > > > > > (set (reg/i:V8QI 20 xmm0)
> > > > > > > > > > >  (const_vector:V8QI [(const_int -1 
> > > > > > > > > > > [0x]) repeated x8]))
> > > > > > > > > > >
> > > > > > > > > > > NB: The upper 64 bits in XMM0 are all 1s, instead of all 
> > > > > > > > > > > 0s.
> > > > > > > > > >
> > > > > > > > > > Actually, we don't want this, we should keep the top 64 
> > > > > > > > > > bits zero,
> > > > > > > > > > especially for floating point, where the pattern represents 
> > > > > > > > > > NaN.
> > > > > > > > > >
> > > > > > > > > > So, I think the correct way is to avoid the transformation 
> > > > > > > > > > for
> > > > > > > > > > narrower modes in the first place.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > How does your latest patch handle this?
> > > > > > > > >
> > > > > > > > > typedef char __v8qi __attribute__ ((__vector_size__ (8)));
> > > > > > > > >
> > > > > > > > > __v8qi
> > > > >

Re: [PATCH v3] x86: Update MMX moves to support all 1s vectors

2025-07-14 Thread Uros Bizjak

On Mon, Jul 14, 2025 at 9:37 AM H.J. Lu  wrote:
>
> On Mon, Jul 14, 2025 at 3:11 PM Uros Bizjak  wrote:
> >
> > On Mon, Jul 14, 2025 at 5:32 AM Uros Bizjak  wrote:
> > >
> > > On Mon, Jul 14, 2025 at 2:14 AM H.J. Lu  wrote:
> > > >
> > > > On Sat, Jul 12, 2025 at 7:51 PM Uros Bizjak  wrote:
> > > > >
> > > > > On Sat, Jul 12, 2025 at 1:41 PM H.J. Lu  wrote:
> > > > > >
> > > > > > On Sat, Jul 12, 2025 at 5:58 PM Uros Bizjak  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Sat, Jul 12, 2025 at 11:52 AM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Sat, Jul 12, 2025 at 5:03 PM Uros Bizjak  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Jul 11, 2025 at 6:05 AM H.J. Lu  
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > commit 77473a27bae04da99d6979d43e7bd0a8106f4557
> > > > > > > > > > Author: H.J. Lu 
> > > > > > > > > > Date:   Thu Jun 26 06:08:51 2025 +0800
> > > > > > > > > >
> > > > > > > > > > x86: Also handle all 1s float vector constant
> > > > > > > > > >
> > > > > > > > > > replaces
> > > > > > > > > >
> > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > > (mem/u/c:V2SF (symbol_ref/u:DI ("*.LC0") [flags 
> > > > > > > > > > 0x2]) [0  S8 A64])) 2031
> > > > > > > > > >  {*movv2sf_internal}
> > > > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated x2
> > > > > > > > > > ])
> > > > > > > > > > (nil)))
> > > > > > > > > >
> > > > > > > > > > with
> > > > > > > > > >
> > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > > (const_vector:V8QI [
> > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > repeated x8
> > > > > > > > > > ])) -1
> > > > > > > > > >  (nil))
> > > > > > > > > > ...
> > > > > > > > > > (insn 29 28 30 5 (set (reg:V2SF 107)
> > > > > > > > > > (subreg:V2SF (reg:V8QI 112) 0)) 2031 
> > > > > > > > > > {*movv2sf_internal}
> > > > > > > > > >  (expr_list:REG_EQUAL (const_vector:V2SF [
> > > > > > > > > > (const_double:SF -QNaN [-QNaN]) repeated x2
> > > > > > > > > > ])
> > > > > > > > > > (nil)))
> > > > > > > > > >
> > > > > > > > > > which leads to
> > > > > > > > > >
> > > > > > > > > > pr121015.c: In function ‘render_result_from_bake_h’:
> > > > > > > > > > pr121015.c:34:1: error: unrecognizable insn:
> > > > > > > > > >34 | }
> > > > > > > > > >   | ^
> > > > > > > > > > (insn 98 13 14 3 (set (reg:V8QI 112)
> > > > > > > > > > (const_vector:V8QI [
> > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > repeated x8
> > > > > > > > > > ])) -1
> > > > > > > > > >  (expr_list:REG_EQUIV (const_vector:V8QI [
> > > > > > > > > > (const_int -1 [0x]) 
> > > > > > > > > > repeated x8
> > > > > > > > > > ])
> > > > > > > > > > (nil)))
> > > > > > > > > > during RTL pass: ira
> > > > > > > > > >
> > > > > > > > > > 1. Add vector_const0_or_m1_operand for vector 0 or integer 
> > > > > > > > > > vector -1.
> > > > > > > > > > 2. Add nonimm_or_vector_const0_or_m1_operand for 
> > > > > > > > > > nonimmediate, vector 0
> > > > > > > > > > or integer vector -1 operand.
> > > > > > > > > > 3. Add BX constraint for MMX vector constant all 0s/1s 
> > > > > > > > > > operand.
> > > > > > > > > > 4. Update MMXMODE:*mov_internal to support integer 
> > > > > > > > > > all 1s vectors.
> > > > > > > > > > Replace  with  to generate
> > > > > > > > > >
> > > > > > > > > > pcmpeqd %xmm0, %xmm0
> > > > > > > > > >
> > > > > > > > > > for
> > > > > > > > > >
> > > > > > > > > > (set (reg/i:V8QI 20 xmm0)
> > > > > > > > > >  (const_vector:V8QI [(const_int -1 
> > > > > > > > > > [0x]) repeated x8]))
> > > > > > > > > >
> > > > > > > > > > NB: The upper 64 bits in XMM0 are all 1s, instead of all 0s.
> > > > > > > > >
> > > > > > > > > Actually, we don't want this, we should keep the top 64 bits 
> > > > > > > > > zero,
> > > > > > > > > especially for floating point, where the pattern represents 
> > > > > > > > > NaN.
> > > > > > > > >
> > > > > > > > > So, I think the correct way is to avoid the transformation for
> > > > > > > > > narrower modes in the first place.
> > > > > > > > >
> > > > > > > >
> > > > > > > > How does your latest patch handle this?
> > > > > > > >
> > > > > > > > typedef char __v8qi __attribute__ ((__vector_size__ (8)));
> > > > > > > >
> > > > > > > > __v8qi
> > > > > > > > m1 (void)
> > > > > > > > {
> > > > > > > >   return __extension__(__v8qi){-1, -1, -1, -1, -1, -1, -1, -1};
> > > > > > > > }
> > > > > > >
> > > > > > > No, my patch is also not appropriate, because it also introduces
> > > > > > > "pcmpeq %xmm, %xmm". We should not generate 8-byte all-ones load 
> > > > > > > using
> > > > > > > pcmpeq, because upper 64 b

Re: [PATCH] expand: ICE if asked to expand RDIV with non-float type.

2025-07-14 Thread Robin Dapp


For the record, the Linaro CI notified me that this caused regressions:

Produces 2 regressions:
 |
 | regressions.sum:
 | Running gcc:gcc.dg/dg.exp ...
 | FAIL: gcc.dg/pr103248.c (internal compiler error: in optab_for_tree_code, at optabs-tree.cc:85)
 | FAIL: gcc.dg/pr103248.c (test for excess errors)

I'll have a look once I get to it.

--
Regards
Robin

Re: [PATCH] libstdc++: Add more template keywords to <mdspan> for Clang

2025-07-14 Thread Tomasz Kaminski

I was going to ask why you haven't committed it directly, but it looks like
it is already on master.
LGTM.

On Tue, Jul 8, 2025 at 11:34 PM Jonathan Wakely  wrote:

> This fixes:
>
> include/c++/16.0.0/mdspan:1182:33: error: use 'template' keyword to treat
> 'mapping' as a dependent template name
>  1182 |   const typename _OLayout::mapping<_OExtents>&>
>   |^
> include/c++/16.0.0/mdspan:1185:31: error: use 'template' keyword to treat
> 'mapping' as a dependent template name
>  1185 | const typename _OLayout::mapping<_OExtents>&,
> mapping_type>
>   |  ^
>
> libstdc++-v3/ChangeLog:
>
> * include/std/mdspan (mdspan): Add template keyword for
> dependent name.
> ---
>
> Tested x86_64-linux.
>
>  libstdc++-v3/include/std/mdspan | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index 5d16de5d9072..b34116a85e67 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -1179,10 +1179,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>template _OLayout,
>typename _OAccessor>
> requires is_constructible_v
> - const typename _OLayout::mapping<_OExtents>&>
> + const typename _OLayout::template mapping<_OExtents>&>
>   && is_constructible_v
> constexpr explicit(!is_convertible_v<
> -   const typename _OLayout::mapping<_OExtents>&, mapping_type>
> +   const typename _OLayout::template mapping<_OExtents>&,
> mapping_type>
>   || !is_convertible_v)
> mdspan(const mdspan<_OElementType, _OExtents, _OLayout,
> _OAccessor>&
>  __other)
> --
> 2.50.0
>
>
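The rule behind the diagnostic can be shown with a minimal sketch. The names `layout_sketch`, `extents_sketch`, `mapping_of` and `mapping_is_default_constructible` are hypothetical and are not part of libstdc++; they only mirror the shape of `_OLayout::mapping<_OExtents>` from the patch. Because `Layout` is a template parameter, `mapping` is a dependent name, and the `template` keyword tells the parser that the following `<` opens a template argument list.

```cpp
#include <type_traits>

// Hypothetical layout policy with a nested member template, loosely
// mirroring the shape of _OLayout::mapping<_OExtents> in the patch.
struct layout_sketch
{
  template<typename Extents>
  struct mapping { using extents_type = Extents; };
};

// Hypothetical extents type, used only as a template argument.
struct extents_sketch { };

// Layout is a dependent type here, so the nested template is spelled
// `typename Layout::template mapping<Extents>`.
template<typename Layout, typename Extents>
using mapping_of = typename Layout::template mapping<Extents>;

// The same spelling inside a trait, as in the requires-clause of the patch.
template<typename Layout, typename Extents>
constexpr bool mapping_is_default_constructible
  = std::is_default_constructible_v<
      typename Layout::template mapping<Extents>>;
```

Writing `typename Layout::mapping<Extents>` instead is what triggers the error quoted in the message above.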

Re: [PATCH] libstdc++: Implement std::chrono::current_zone() for Windows

2025-07-14 Thread Tomasz Kaminski

On Tue, Jul 8, 2025 at 10:48 PM Björn Schäpers  wrote:

> From: Björn Schäpers 
>
> I have based this on my previous (not yet landed) patch, but it only
> reuses the #ifdef to include <array>. Since std::array isn't used
> anywhere else I thought that was the right place to put it.
>
> I hope the formatting is okay.
>
> I've used wide strings for the Windows zone name and territory, since
> the Windows API returns wide strings and thus they can be compared
> directly. For the territory there exists a narrow string API, but
> internally it calls the wide string version and narrows it down. If
> desired I can switch to narrow strings, the conversion can be done by
> static_cast per character since only ASCII chars are used.
>
> -- >8 --
> On Windows there is no API to get the current time zone as IANA name,
> instead Windows has its own zones. But there exists a mapping provided
> by the Unicode Consortium. This patch adds a script to convert the XML
> file with the mapping to a lookup table and adds a Windows code path to
> use that mapping.
>
> libstdc++-v3/Changelog:
>
> Implement std::chrono::current_zone() for Windows
>
> * include/bits/windows_zones-map.h: New file, contains the look
> up table.
> * scripts/gen_windows_zones_map.py: New file, generates
> windows_zones-map.h.
> * src/c++20/tzdb.cc (tzdb::current_zone): Add Windows code path.
>
> Signed-off-by: Björn Schäpers 
> ---
>  libstdc++-v3/include/bits/windows_zones-map.h | 407 ++
>  libstdc++-v3/scripts/gen_windows_zones_map.py | 128 ++
>  libstdc++-v3/src/c++20/tzdb.cc| 102 -
>  3 files changed, 635 insertions(+), 2 deletions(-)
>  create mode 100644 libstdc++-v3/include/bits/windows_zones-map.h
>  create mode 100644 libstdc++-v3/scripts/gen_windows_zones_map.py
>
> diff --git a/libstdc++-v3/include/bits/windows_zones-map.h
> b/libstdc++-v3/include/bits/windows_zones-map.h
> new file mode 100644
> index 000..7be736b063d
> --- /dev/null
> +++ b/libstdc++-v3/include/bits/windows_zones-map.h
> @@ -0,0 +1,407 @@
> +// Generated by scripts/gen_windows_zones_map.py, do not edit.
> +
> +// Copyright The GNU Toolchain Authors.
> +//
> +// This file is part of the GNU ISO C++ Library.  This library is free
> +// software; you can redistribute it and/or modify it under the
> +// terms of the GNU General Public License as published by the
> +// Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +
> +// This library is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +// GNU General Public License for more details.
> +
> +// Under Section 7 of GPL version 3, you are granted additional
> +// permissions described in the GCC Runtime Library Exception, version
> +// 3.1, as published by the Free Software Foundation.
> +
> +// You should have received a copy of the GNU General Public License and
> +// a copy of the GCC Runtime Library Exception along with this program;
> +// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +// .
> +
> +/** @file bits/windows_zones-map.h
> + *  This is an internal header file, included by other library headers.
> + *  Do not attempt to use it directly. @headername{chrono}
> + */
> +
> +#ifndef _GLIBCXX_GET_WINDOWS_ZONES_MAP
> +# error "This is not a public header, do not include it directly"
> +#endif
> +
> +struct windows_zone_map_entry
> +{
> +  wstring_view windows_name;
> +  wstring_view territory;
> +  string_view iana_name;
> +};
> +
> +static constexpr array windows_zone_map{
> +  {
> +{L"AUS Central Standard Time", L"001", "Australia/Darwin"},
> +{L"AUS Eastern Standard Time", L"001", "Australia/Sydney"},
> +{L"Afghanistan Standard Time", L"001", "Asia/Kabul"},
> +{L"Alaskan Standard Time", L"001", "America/Anchorage"},
> +{L"Aleutian Standard Time", L"001", "America/Adak"},
> +{L"Altai Standard Time", L"001", "Asia/Barnaul"},
> +{L"Arab Standard Time", L"001", "Asia/Riyadh"},
> +{L"Arab Standard Time", L"BH", "Asia/Bahrain"},
> +{L"Arab Standard Time", L"KW", "Asia/Kuwait"},
> +{L"Arab Standard Time", L"QA", "Asia/Qatar"},
> +{L"Arab Standard Time", L"YE", "Asia/Aden"},
> +{L"Arabian Standard Time", L"001", "Asia/Dubai"},
> +{L"Arabian Standard Time", L"OM", "Asia/Muscat"},
> +{L"Arabian Standard Time", L"ZZ", "Etc/GMT-4"},
> +{L"Arabic Standard Time", L"001", "Asia/Baghdad"},
> +{L"Argentina Standard Time", L"001", "America/Buenos_Aires"},
> +{L"Astrakhan Standard Time", L"001", "Europe/Astrakhan"},
> +{L"Atlantic Standard Time", L"001", "America/Halifax"},
> +{L"Atlantic Standard Time", L"BM", "Atlantic/Bermuda"},
> +{L"Atlantic Standard Time", L"GL", "America/Thule"},
> +{L"Aus Central W. Standard Time", L"001", "Austr

Re: [PATCH v2] RISC-V: Vector-scalar widening multiply-(subtract-)accumulate [PR119100]

2025-07-14 Thread Robin Dapp


This pattern enables the combine pass (or late-combine, depending on the case)
to merge a float_extend'ed vec_duplicate into a plus-mult or minus-mult RTL
instruction.

Before this patch, we have three instructions, e.g.:
  fcvt.s.h   fa5,fa5
  vfmv.v.f   v24,fa5
  vfmadd.vv  v8,v24,v16


Even though for some reason the CI didn't pick it up again, Jeff relayed to me 
that it passed in his own "CI".  Therefore I'd say this is OK now.


--
Regards
Robin

[Ada] Fix PR ada/121056

2025-07-14 Thread Eric Botcazou

This adds a missing guard before accessing the Underlying_Record_View field.

Tested on x86-64/Linux, applied on the mainline.


2025-07-14  Eric Botcazou  

PR ada/121056
* sem_ch4.adb (Try_Object_Operation.Try_Primitive_Operation): Add
test on Is_Record_Type before accessing Underlying_Record_View.


2025-07-14  Eric Botcazou  

* gnat.dg/deref4.adb: New test.
* gnat.dg/deref4_pkg.ads: New helper.

-- 
Eric Botcazou

diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb
index dc814676675..56dc7c6355c 100644
--- a/gcc/ada/sem_ch4.adb
+++ b/gcc/ada/sem_ch4.adb
@@ -10692,6 +10692,7 @@ package body Sem_Ch4 is
 
   or else
 (Has_Unknown_Discriminants (Typ)
+  and then Is_Record_Type (Base_Type (Obj_Type))
   and then Typ = Underlying_Record_View (Base_Type (Obj_Type)))
 
--  Prefix can be dereferenced
-- { dg-do compile }
-- { dg-options "-gnatX" }

with Deref4_Pkg; use Deref4_Pkg;

procedure Deref4 is
begin
  Obj.Proc (null);
end;
package Deref4_Pkg is

  type A is tagged null record;
  type A_Ptr is access A;
  procedure Proc (This : in out A'Class; Some_Parameter : A_Ptr) is null;
  Obj : A_Ptr;

end Deref4_Pkg;

[PATCH] Darwin: account for macOS 26

2025-07-14 Thread FX Coudert

Hello,

darwin25 will be named macOS 26 (codename Tahoe). This is a change from
darwin24, which was macOS 15. We need to adapt the driver to this new
numbering scheme.

Tested by me on aarch64-darwin25 and aarch64-darwin24, and Rainer on 
x86_64-darwin25.

OK to push?
FX



0001-Darwin-account-for-macOS-26.patch
Description: Binary data
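The numbering change described in this message can be sketched as a small mapping function. This is not the code from the attached patch, which is not shown here; `macos_major_from_darwin` is a hypothetical helper. The darwin20 to darwin24 range (macOS 11 to 15) is existing numbering, darwin25 to macOS 26 is taken from the message, and treating later Darwin versions as continuing with the same offset is an assumption.

```cpp
// Hypothetical helper, not taken from the patch: map a Darwin kernel
// major version to the macOS major version it corresponds to.
constexpr int macos_major_from_darwin(int darwin_major)
{
  // darwin20 .. darwin24 correspond to macOS 11 .. 15.
  if (darwin_major >= 20 && darwin_major <= 24)
    return darwin_major - 9;
  // darwin25 is macOS 26; assuming later versions keep this offset.
  if (darwin_major >= 25)
    return darwin_major + 1;
  // Older releases used 10.x naming, which this sketch does not cover.
  return -1;
}
```

The point of the sketch is the discontinuity between darwin24 and darwin25: a single fixed offset no longer works.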

Re: [PATCH] libstdc++: library side of C++26 P2786R13 - Trivial Relocatability [PR119064]

2025-07-14 Thread Tomasz Kaminski

On Tue, Jun 17, 2025 at 1:15 PM Jakub Jelinek  wrote:

> Hi!
>
> Here is a new version of the library side of the C++26 P2786R13 paper.
> For if constexpr the patch uses __builtin_constant_p trick to figure
> out if __result is non-equality comparable with __first, it adds recursion
> for the is_array_v cases, adds qualification on several calls and rewrites
> the testcase, such that it is hopefully valid and also tests the constant
> evaluation.
>
> 2025-06-17  Jakub Jelinek  
>
> PR c++/119064
> * include/bits/version.def (trivially_relocatable): New.
> * include/bits/version.h: Regenerate.
> * include/std/type_traits (std::is_trivially_relocatable,
> std::is_nothrow_relocatable, std::is_replaceable): New traits.
> (std::is_trivially_relocatable_v, std::is_nothrow_relocatable_v,
> std::is_replaceable_v): New trait variable templates.
> * include/std/memory (__glibcxx_want_trivially_relocatable): Define
> before including bits/version.h.
> (std::trivially_relocate): New template function.
> (std::relocate): Likewise.
> * testsuite/std/memory/relocate/relocate.cc: New test.
>
> --- libstdc++-v3/include/bits/version.def.jj2025-06-12
> 20:19:52.367395730 +0200
> +++ libstdc++-v3/include/bits/version.def   2025-06-16
> 22:10:09.415721974 +0200
> @@ -2012,6 +2012,15 @@ ftms = {
>};
>  };
>
> +ftms = {
> +  name = trivially_relocatable;
> +  values = {
> +v = 202502;
> +cxxmin = 26;
> +extra_cond = "__cpp_trivial_relocatability >= 202502L";
> +  };
> +};
> +
>  // Standard test specifications.
>  stds[97] = ">= 199711L";
>  stds[03] = ">= 199711L";
> --- libstdc++-v3/include/bits/version.h.jj  2025-06-12
> 20:19:52.367395730 +0200
> +++ libstdc++-v3/include/bits/version.h 2025-06-16 22:10:09.416721960 +0200
> @@ -2253,4 +2253,14 @@
>  #endif /* !defined(__cpp_lib_sstream_from_string_view) &&
> defined(__glibcxx_want_sstream_from_string_view) */
>  #undef __glibcxx_want_sstream_from_string_view
>
> +#if !defined(__cpp_lib_trivially_relocatable)
> +# if (__cplusplus >  202302L) && (__cpp_trivial_relocatability >= 202502L)
> +#  define __glibcxx_trivially_relocatable 202502L
> +#  if defined(__glibcxx_want_all) ||
> defined(__glibcxx_want_trivially_relocatable)
> +#   define __cpp_lib_trivially_relocatable 202502L
> +#  endif
> +# endif
> +#endif /* !defined(__cpp_lib_trivially_relocatable) &&
> defined(__glibcxx_want_trivially_relocatable) */
> +#undef __glibcxx_want_trivially_relocatable
> +
>  #undef __glibcxx_want_all
> --- libstdc++-v3/include/std/type_traits.jj 2025-06-12
> 00:20:03.898666479 +0200
> +++ libstdc++-v3/include/std/type_traits2025-06-16
> 22:10:09.416721960 +0200
> @@ -4245,6 +4245,60 @@ template
>  #endif // C++2a
>
> +#if __glibcxx_trivially_relocatable >= 202502L // C++ >= 26 &&
> __cpp_trivial_relocatability >= 202502
> +  /// True if the type is a trivially relocatable type.
> +  /// @since C++26
> +
> +  template
> +struct is_trivially_relocatable
> +# if __has_builtin(__builtin_is_trivially_relocatable)
> +: bool_constant<__builtin_is_trivially_relocatable(_Tp)>
> +# else
> +: bool_constant<__builtin_is_cpp_trivially_relocatable(_Tp)>
> +# endif
> +{ };
> +
> +  template
> +struct is_nothrow_relocatable
> +# if _GLIBCXX_USE_BUILTIN_TRAIT(__builtin_is_nothrow_relocatable)
> +: bool_constant<__builtin_is_nothrow_relocatable(_Tp)>
> +# else
> +: public __or_,
> +
> __and_>,
> +
>  is_nothrow_destructible>>>::type
> +# endif
> +{ };
> +
> +  template
> +struct is_replaceable
> +: bool_constant<__builtin_is_replaceable(_Tp)>
> +{ };
> +
> +  /// @ingroup variable_templates
> +  /// @since C++26
> +  template
> +inline constexpr bool is_trivially_relocatable_v
> +# if __has_builtin(__builtin_is_trivially_relocatable)
> +  = __builtin_is_trivially_relocatable(_Tp);
> +# else
> +  = __builtin_is_cpp_trivially_relocatable(_Tp);
> +# endif
> +
> +  template
> +inline constexpr bool is_nothrow_relocatable_v
> +# if _GLIBCXX_USE_BUILTIN_TRAIT(__builtin_is_nothrow_relocatable)
> +  = __builtin_is_nothrow_relocatable(_Tp);
> +# else
> +  = (is_trivially_relocatable_v<_Tp>
> +|| (is_nothrow_move_constructible_v>
> +&& is_nothrow_destructible_v>);
> +# endif
> +
> +  template
> +inline constexpr bool is_replaceable_v
> +  = __builtin_is_replaceable(_Tp);
> +#endif
> +
>/// @} group metaprogramming
>
>  _GLIBCXX_END_NAMESPACE_VERSION
> --- libstdc++-v3/include/std/memory.jj  2025-06-02 21:58:22.148775743 +0200
> +++ libstdc++-v3/include/std/memory 2025-06-17 12:03:01.249069663 +0200
> @@ -121,6 +121,7 @@
>  #define __glibcxx_want_smart_ptr_for_overwrite
>  #define __glibcxx_want_to_address
>  #define __glibcxx_want_transparent_operators
> +#define __glibcxx_want_trivially_relocatable
>  #include 
>
>  #if __cplusplus >= 201103L && __cplusplus <= 202002L && _GLIBCXX_HOST

Re: [PATCH] libstdc++: library side of C++26 P2786R13 - Trivial Relocatability [PR119064]

2025-07-14 Thread Jakub Jelinek

On Mon, Jul 14, 2025 at 12:11:18PM +0200, Tomasz Kaminski wrote:
> > +  if (__builtin_expect(__fwd, true))
> >
> We have a preference to use [[likely]] attribute when possible.

Ok, changed to
  if (__fwd) [[likely]]
in my copy.

> 
> > +   {
> > + for (; __first != __last; ++__first, ++__result)
> > +   {
> > + if constexpr (is_array_v<_Tp>)
> > +   std::relocate(std::begin(*__first), std::end(*__first),
> > + &(*__result)[0]);
> >
> We should use std::addressof or __builtin_addressof  here to avoid using
> operator& found by ADL.

Ok.  Which one though?  I see all of std::addressof, std::__addressof and
__builtin_addressof used heavily.

> The standard uses start_lifetime_as here (
> https://eel.is/c++draft/memory#obj.lifetime-18.3.1),

I know, but P2590R2 is not implemented yet and as written in
https://gcc.gnu.org/PR106658 I have actually no idea what needs to be done
if anything on the compiler side.  Because at least the GIMPLE model
basically allows placement new anywhere without anything in the IL marking
up that the dynamic type has changed.  Though perhaps for constant
expression evaluation we want something...

As for test, I'm certainly open to suggestions.

Jakub
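The concern about `operator&` found by ADL can be illustrated with a small sketch. The namespace `lib_sketch` and the type `tracked` are hypothetical and exist only for this example; the sketch assumes C++17, where `std::addressof` is usable in constant expressions.

```cpp
#include <memory>

namespace lib_sketch
{
  // Hypothetical user type.
  struct tracked { int value = 0; };

  // Non-member overload of unary &, found through ADL whenever `&obj`
  // is written for a `tracked` object.
  constexpr tracked* operator&(tracked&) noexcept { return nullptr; }
}

// Returns true when plain `&` is hijacked by the overload while
// std::addressof still yields the real address of the object.
constexpr bool addressof_ignores_overload()
{
  lib_sketch::tracked t;
  return &t == nullptr && std::addressof(t) != nullptr;
}
```

Generic library code such as the loop under review cannot rule out element types like this one, which is why `&(*__result)[0]` was questioned.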

Re: [PATCH] libstdc++: library side of C++26 P2786R13 - Trivial Relocatability [PR119064]

2025-07-14 Thread Tomasz Kaminski

On Mon, Jul 14, 2025 at 12:23 PM Jakub Jelinek  wrote:

> On Mon, Jul 14, 2025 at 12:11:18PM +0200, Tomasz Kaminski wrote:
> > > +  if (__builtin_expect(__fwd, true))
> > >
> > We have a preference to use [[likely]] attribute when possible.
>
> Ok, changed to
>   if (__fwd) [[likely]]
> in my copy.
>
> >
> > > +   {
> > > + for (; __first != __last; ++__first, ++__result)
> > > +   {
> > > + if constexpr (is_array_v<_Tp>)
> > > +   std::relocate(std::begin(*__first), std::end(*__first),
> > > + &(*__result)[0]);
> > >
> > We should use std::addressof or __builtin_addressof  here to avoid using
> > operator& found by ADL.
>
> Ok.  Which one though?  I see all of std::addressof, std::__addressof and
> __builtin_addressof used heavily.
>
I would go for std::addressof if you already have it available, otherwise
__builtin_addressof.

>
> > The standard uses start_lifetime_as here (
> > https://eel.is/c++draft/memory#obj.lifetime-18.3.1),
>
> I know, but P2590R2 is not implemented yet and as written in
> https://gcc.gnu.org/PR106658 I have actually no idea what needs to be done
> if anything on the compiler side.  Because at least the GIMPLE model
> basically allows placement new anywhere without anything in the IL marking
> up that the dynamic type has changed.  Though perhaps for constant
> expression evaluation we want something...
>
The observable effect of that call is that calling relocate on an array is
not supported at compile time, because start_lifetime_as is not constexpr.

>
> As for test, I'm certainly open to suggestions.
>
> Jakub
>
>

Re: [PATCH v2] x86-64: Add --enable-x86-64-mfentry

2025-07-14 Thread Uros Bizjak

On Sun, Jul 13, 2025 at 12:16 AM Sam James  wrote:
>
> "H.J. Lu"  writes:
>
> > On Sat, Jul 12, 2025 at 6:58 AM Siddhesh Poyarekar  
> > wrote:
> >>
> >> On 2025-07-11 15:28, Uros Bizjak wrote:
> >> >> Why not just switch over unconditionally?  __fentry__ seems like a
> >> >> better alternative to mcount overall and it has been around long enough
> >> >> that even older deployments should be relatively unaffected.
> >> >
> >> > Actually, it is switched on by default for i?86-*-linux* |
> >> > x86_64-*-linux*. The default for --enable-x86-64-mfentry is "auto",
> >> > which triggers the mentioned condition. One still has a chance to use
> >> > "yes" or "no" in addition to "auto" when configuring with
> >> > --{enable|disable}-x86-64-mfentry.
> >>
> >> Oh that's good then.
> >>
> >> Thanks,
> >> Sid
> >
> > Here is the v2 patch.   The differences are
> >
> > 1.  Enable -mfentry by default for i?86-*-*gnu* | x86_64-*-*gnu*,
> > not i?86-*-*linux* | x86_64-*-*linux*.
> > 2.  Adjust some testcases.
> >
> > OK for master?
>
> I'll test it tonight but it looks good to me now. Thanks.

Assuming that tests passed, OK for mainline.

Thanks,
Uros.

Re: [PATCH v3] RISC-V: Mips P8700 Conditional Move Support.

2025-07-14 Thread Umesh Kalappa

Hi Jeff and Marco,

Please share your comments on the changes below.

Thank you
~U

On Sun, Jul 6, 2025 at 12:56 PM Umesh Kalappa 
wrote:

> Hi @Jeff Law and @ma...@orcam.me.uk,
>
> Please have a look at the updated patch for conditional move support and
> let us know if you have any comments or suggestions.
>
> Thank you
> ~U
>
> On Wed, Jul 2, 2025 at 12:46 PM Umesh Kalappa 
> wrote:
>
>> Indentation has been updated accordingly and no regressions were found.
>>
>> gcc/ChangeLog:
>>
>> *config/riscv/riscv-cores.def(RISCV_CORE): Updated the supported
>> march.
>> *config/riscv/riscv-ext-mips.def(DEFINE_RISCV_EXT):
>> New file added for mips conditional mov extension.
>> *config/riscv/riscv-ext.def: Likewise.
>> *config/riscv/t-riscv: Generates riscv-ext.opt
>> *config/riscv/riscv-ext.opt: Generated file.
>> *config/riscv/riscv.cc(riscv_expand_conditional_move): Updated
>> for mips cmov
>> and outlined some code that handle arch cond move.
>> *config/riscv/riscv.md(movcc): updated expand for MIPS
>> CCMOV.
>> *config/riscv/mips-insn.md: New file for mips-p8700 ccmov insn.
>> *gcc/doc/riscv-ext.texi: Updated for mips cmov.
>>
>> gcc/testsuite/ChangeLog:
>>
>> *testsuite/gcc.target/riscv/mipscondmov.c: Test file for
>> mips.ccmov insn.
>> ---
>>  gcc/config/riscv/mips-insn.md|  36 +++
>>  gcc/config/riscv/riscv-cores.def |   3 +-
>>  gcc/config/riscv/riscv-ext-mips.def  |  35 ++
>>  gcc/config/riscv/riscv-ext.def   |   1 +
>>  gcc/config/riscv/riscv-ext.opt   |   4 +
>>  gcc/config/riscv/riscv.cc| 107 +--
>>  gcc/config/riscv/riscv.md|   3 +-
>>  gcc/config/riscv/t-riscv |   3 +-
>>  gcc/doc/riscv-ext.texi   |   4 +
>>  gcc/testsuite/gcc.target/riscv/mipscondmov.c |  30 ++
>>  10 files changed, 189 insertions(+), 37 deletions(-)
>>  create mode 100644 gcc/config/riscv/mips-insn.md
>>  create mode 100644 gcc/config/riscv/riscv-ext-mips.def
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/mipscondmov.c
>>
>> diff --git a/gcc/config/riscv/mips-insn.md b/gcc/config/riscv/mips-insn.md
>> new file mode 100644
>> index 000..de53638d587
>> --- /dev/null
>> +++ b/gcc/config/riscv/mips-insn.md
>> @@ -0,0 +1,36 @@
>> +;; Machine description for MIPS custom instructions.
>> +;; Copyright (C) 2025 Free Software Foundation, Inc.
>> +
>> +;; This file is part of GCC.
>> +
>> +;; GCC is free software; you can redistribute it and/or modify
>> +;; it under the terms of the GNU General Public License as published by
>> +;; the Free Software Foundation; either version 3, or (at your option)
>> +;; any later version.
>> +
>> +;; GCC is distributed in the hope that it will be useful,
>> +;; but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +;; GNU General Public License for more details.
>> +
>> +;; You should have received a copy of the GNU General Public License
>> +;; along with GCC; see the file COPYING3.  If not see
>> +;; .
>> +
>> +(define_insn "*movcc_bitmanip"
>> +  [(set (match_operand:GPR 0 "register_operand" "=r")
>> +   (if_then_else:GPR
>> + (any_eq:X (match_operand:X 1 "register_operand" "r")
>> +(match_operand:X 2 "const_0_operand" "J"))
>> +(match_operand:GPR 3 "reg_or_0_operand" "rJ")
>> +(match_operand:GPR 4 "reg_or_0_operand" "rJ")))]
>> +  "TARGET_XMIPSCMOV"
>> +{
>> +  enum rtx_code code = ;
>> +  if (code == NE)
>> +return "mips.ccmov\t%0,%1,%z3,%z4";
>> +  else
>> +return "mips.ccmov\t%0,%1,%z4,%z3";
>> +}
>> +[(set_attr "type" "condmove")
>> + (set_attr "mode" "")])
>> diff --git a/gcc/config/riscv/riscv-cores.def
>> b/gcc/config/riscv/riscv-cores.def
>> index 2096c0095d4..98f347034fb 100644
>> --- a/gcc/config/riscv/riscv-cores.def
>> +++ b/gcc/config/riscv/riscv-cores.def
>> @@ -169,7 +169,6 @@ RISCV_CORE("xiangshan-kunminghu",
>>  "rv64imafdcbvh_sdtrig_sha_shcounterenw_"
>>   "zvfhmin_zvkt_zvl128b_zvl32b_zvl64b",
>>   "xiangshan-kunminghu")
>>
>> -RISCV_CORE("mips-p8700",   "rv64imafd_zicsr_zmmul_"
>> - "zaamo_zalrsc_zba_zbb",
>> +RISCV_CORE("mips-p8700",  "rv64imfd_zicsr_zifencei_zalrsc_zba_zbb",
>>   "mips-p8700")
>>  #undef RISCV_CORE
>> diff --git a/gcc/config/riscv/riscv-ext-mips.def
>> b/gcc/config/riscv/riscv-ext-mips.def
>> new file mode 100644
>> index 000..f24507139f6
>> --- /dev/null
>> +++ b/gcc/config/riscv/riscv-ext-mips.def
>> @@ -0,0 +1,35 @@
>> +/* MIPS extension definition file for RISC-V.
>> +   Copyright (C) 2025 Free Software Foundation, Inc.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redist

[PATCH] libstdc++: Make all experimental::observer_ptr functions constexpr

2025-07-14 Thread Jonathan Wakely

I've just created LWG 4295 proposing this change, and am implementing it
via this patch.
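As a rough illustration of what constexpr free functions allow, here is a minimal sketch using a stand-in type. `observer_sketch` and `make_observer_sketch` are hypothetical and are not the libstdc++ implementation; they only imitate the interface touched by the patch so that the same check can be evaluated both at run time and in a `static_assert`.

```cpp
// Hypothetical stand-in for observer_ptr, reduced to what the sketch needs.
template<typename T>
class observer_sketch
{
  T* ptr_ = nullptr;

public:
  constexpr observer_sketch() noexcept = default;
  constexpr explicit observer_sketch(T* p) noexcept : ptr_(p) { }

  constexpr T* get() const noexcept { return ptr_; }

  constexpr void swap(observer_sketch& other) noexcept
  {
    T* tmp = ptr_;
    ptr_ = other.ptr_;
    other.ptr_ = tmp;
  }
};

// Free functions marked constexpr, as the patch does for the real ones.
template<typename T>
constexpr void swap(observer_sketch<T>& a, observer_sketch<T>& b) noexcept
{ a.swap(b); }

template<typename T>
constexpr observer_sketch<T> make_observer_sketch(T* p) noexcept
{ return observer_sketch<T>(p); }

template<typename T, typename U>
constexpr bool operator==(observer_sketch<T> a, observer_sketch<U> b)
{ return a.get() == b.get(); }

// One test body, usable at run time and during constant evaluation.
constexpr bool test_swap()
{
  int x = 1, y = 2;
  auto a = make_observer_sketch(&x);
  auto b = make_observer_sketch(&y);
  swap(a, b);
  return a == make_observer_sketch(&y) && b.get() == &x && *a.get() == 2;
}
```

Calling `test_swap()` from `main` and also placing it in a `static_assert` follows the same pattern as the updated tests in the diff below.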

libstdc++-v3/ChangeLog:

* include/experimental/memory (swap, make_observer_ptr): Add
constexpr.
(operator==, operator!=, operator<, operator>, operator<=)
(operator>=): Likewise.
* testsuite/experimental/memory/observer_ptr/make_observer.cc:
Checks for constant evaluation.
* testsuite/experimental/memory/observer_ptr/relops/relops.cc:
Likewise.
* testsuite/experimental/memory/observer_ptr/swap/swap.cc:
Likewise.
---

Tested x86_64-linux.

 libstdc++-v3/include/experimental/memory  | 22 +++
 .../memory/observer_ptr/make_observer.cc  | 11 ++--
 .../memory/observer_ptr/relops/relops.cc  | 28 +++
 .../memory/observer_ptr/swap/swap.cc  | 20 -
 4 files changed, 50 insertions(+), 31 deletions(-)

diff --git a/libstdc++-v3/include/experimental/memory 
b/libstdc++-v3/include/experimental/memory
index 131e5acc03d9..1b01462e1b26 100644
--- a/libstdc++-v3/include/experimental/memory
+++ b/libstdc++-v3/include/experimental/memory
@@ -148,42 +148,42 @@ inline namespace fundamentals_v2
 }; // observer_ptr<>
 
   template
-void
+constexpr void
 swap(observer_ptr<_Tp>& __p1, observer_ptr<_Tp>& __p2) noexcept
 {
   __p1.swap(__p2);
 }
 
   template
-observer_ptr<_Tp>
+constexpr observer_ptr<_Tp>
 make_observer(_Tp* __p) noexcept
 {
   return observer_ptr<_Tp>(__p);
 }
 
   template
-bool
+constexpr bool
 operator==(observer_ptr<_Tp> __p1, observer_ptr<_Up> __p2)
 {
   return __p1.get() == __p2.get();
 }
 
   template
-bool
+constexpr bool
 operator!=(observer_ptr<_Tp> __p1, observer_ptr<_Up> __p2)
 {
 return !(__p1 == __p2);
 }
 
   template
-bool
+constexpr bool
 operator==(observer_ptr<_Tp> __p, nullptr_t) noexcept
 {
   return !__p;
 }
 
   template
-bool
+constexpr bool
 operator==(nullptr_t, observer_ptr<_Tp> __p) noexcept
 {
   return !__p;
@@ -197,14 +197,14 @@ inline namespace fundamentals_v2
 }
 
   template
-bool
+constexpr bool
 operator!=(nullptr_t, observer_ptr<_Tp> __p) noexcept
 {
   return bool(__p);
 }
 
   template
-bool
+constexpr bool
 operator<(observer_ptr<_Tp> __p1, observer_ptr<_Up> __p2)
 {
   return std::less::type,
@@ -214,21 +214,21 @@ inline namespace fundamentals_v2
 }
 
   template
-bool
+constexpr bool
 operator>(observer_ptr<_Tp> __p1, observer_ptr<_Up> __p2)
 {
   return __p2 < __p1;
 }
 
   template
-bool
+constexpr bool
 operator<=(observer_ptr<_Tp> __p1, observer_ptr<_Up> __p2)
 {
   return !(__p2 < __p1);
 }
 
   template
-bool
+constexpr bool
 operator>=(observer_ptr<_Tp> __p1, observer_ptr<_Up> __p2)
 {
   return !(__p1 < __p2);
diff --git 
a/libstdc++-v3/testsuite/experimental/memory/observer_ptr/make_observer.cc 
b/libstdc++-v3/testsuite/experimental/memory/observer_ptr/make_observer.cc
index 048735ff63ae..1de9cf095092 100644
--- a/libstdc++-v3/testsuite/experimental/memory/observer_ptr/make_observer.cc
+++ b/libstdc++-v3/testsuite/experimental/memory/observer_ptr/make_observer.cc
@@ -20,12 +20,19 @@
 #include 
 #include 
 
-int main()
+constexpr bool test()
 {
   const int i = 42;
   auto o = std::experimental::make_observer(&i);
   static_assert( std::is_same>(), "" );
+  std::experimental::observer_ptr>(), "" );
   VERIFY( o && *o == 42 );
   VERIFY( o.get() == &i );
+  return true;
+}
+
+int main()
+{
+  test();
+  static_assert( test(), "LWG 4295 - make_observer should be constexpr" );
 }
diff --git 
a/libstdc++-v3/testsuite/experimental/memory/observer_ptr/relops/relops.cc 
b/libstdc++-v3/testsuite/experimental/memory/observer_ptr/relops/relops.cc
index 3e23e0b452be..d03dd5dcbd80 100644
--- a/libstdc++-v3/testsuite/experimental/memory/observer_ptr/relops/relops.cc
+++ b/libstdc++-v3/testsuite/experimental/memory/observer_ptr/relops/relops.cc
@@ -22,13 +22,13 @@
 
 using std::experimental::observer_ptr;
 
-void test01()
+constexpr void test01()
 {
   observer_ptr a, b;
   VERIFY(a == b);
 }
 
-void test02()
+constexpr void test02()
 {
   int x[2]{};
   observer_ptr a{&x[0]};
@@ -40,7 +40,7 @@ void test02()
   VERIFY(b > a);
 }
 
-void test03()
+constexpr void test03()
 {
   int x{};
   observer_ptr a{&x};
@@ -48,9 +48,10 @@ void test03()
   VERIFY(a == b);
 }
 
-void test04()
+int x[2]{};
+
+constexpr void test04()
 {
-  static constexpr int x[2]{};
   constexpr observer_ptr a{&x[0]};
   constexpr observer_ptr b{&x[1]};
   VERIFY(a != b);
@@ -60,20 +61,25 @@ void test04()
   VERIFY(b > a);
 }
 
-void test05()
+constexpr void test05()
 {
-  static constexpr int x{};
-  constexpr observer_ptr a{&x};
-  constexpr observer_ptr b{&x};
+  constexpr observer_ptr a{&x[0]};
+  cons

Re: [PATCH] Darwin: account for macOS 26

2025-07-14 Thread Iain Sandoe

Hi FX,

sorry for the delay ...

> On 14 Jul 2025, at 11:17, FX Coudert  wrote:
> 
> Hello,
> 
> darwin25 will be named macOS 26 (codename Tahoe). This is a change from
> darwin24, which was macOS 15. We need to adapt the driver to this new
> numbering scheme.
> 
> Tested by me on aarch64-darwin25 and aarch64-darwin24, and Rainer on 
> x86_64-darwin25.
> 
> OK to push?

OK, thanks
Iain

[PATCH] arm: avoid gcc_s dependency

2025-07-14 Thread Pierre Ossman


Suggested fix for this issue:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60428

Did not get any response there, so seeing if this is a better forum for 
suggested changes.


We've been using this patch for years without any known issues.

Regards,
--
Pierre Ossman   Software Development
Cendio AB   https://cendio.com
Teknikringen 8  https://twitter.com/ThinLinc
583 30 Linköping https://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
diff -up gcc-5.1.0/libgcc/config/arm/t-bpabi.arm-c-unwind gcc-5.1.0/libgcc/config/arm/t-bpabi
--- gcc-5.1.0/libgcc/config/arm/t-bpabi.arm-c-unwind	2012-08-17 17:06:06.0 +0200
+++ gcc-5.1.0/libgcc/config/arm/t-bpabi	2015-05-20 12:56:19.751653982 +0200
@@ -3,7 +3,8 @@ LIB1ASMFUNCS += _aeabi_lcmp _aeabi_ulcmp
 
 # Add the BPABI C functions.
 LIB2ADD += $(srcdir)/config/arm/bpabi.c \
-	   $(srcdir)/config/arm/unaligned-funcs.c
+	   $(srcdir)/config/arm/unaligned-funcs.c \
+	   $(srcdir)/config/arm/unwind-arm-dummy.c
 
 LIB2ADD_ST += $(srcdir)/config/arm/fp16.c
 
diff -up gcc-5.1.0/libgcc/config/arm/unwind-arm.c.arm-c-unwind gcc-5.1.0/libgcc/config/arm/unwind-arm.c
--- gcc-5.1.0/libgcc/config/arm/unwind-arm.c.arm-c-unwind	2015-01-05 13:33:28.0 +0100
+++ gcc-5.1.0/libgcc/config/arm/unwind-arm.c	2015-05-20 12:54:39.617322792 +0200
@@ -144,11 +144,11 @@ restore_non_core_regs (phase1_vrs * vrs)
 
 /* ABI defined personality routines.  */
 extern _Unwind_Reason_Code __aeabi_unwind_cpp_pr0 (_Unwind_State,
-_Unwind_Control_Block *, _Unwind_Context *);// __attribute__((weak));
+_Unwind_Control_Block *, _Unwind_Context *);
 extern _Unwind_Reason_Code __aeabi_unwind_cpp_pr1 (_Unwind_State,
-_Unwind_Control_Block *, _Unwind_Context *) __attribute__((weak));
+_Unwind_Control_Block *, _Unwind_Context *);
 extern _Unwind_Reason_Code __aeabi_unwind_cpp_pr2 (_Unwind_State,
-_Unwind_Control_Block *, _Unwind_Context *) __attribute__((weak));
+_Unwind_Control_Block *, _Unwind_Context *);
 
 /* ABI defined routine to store a virtual register to memory.  */
 
diff -up gcc-5.1.0/libgcc/config/arm/unwind-arm-dummy.c.arm-c-unwind gcc-5.1.0/libgcc/config/arm/unwind-arm-dummy.c
--- gcc-5.1.0/libgcc/config/arm/unwind-arm-dummy.c.arm-c-unwind	2015-05-20 12:54:39.617322792 +0200
+++ gcc-5.1.0/libgcc/config/arm/unwind-arm-dummy.c	2015-05-20 12:54:39.617322792 +0200
@@ -0,0 +1,54 @@
+/* ARM EABI dummy unwinding routines.
+   Copyright 2014 Pierre Ossman for Cendio AB
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#include "unwind.h"
+
+extern _Unwind_Reason_Code __aeabi_unwind_cpp_pr0 (_Unwind_State,
+_Unwind_Control_Block *, _Unwind_Context *) __attribute__((weak));
+extern _Unwind_Reason_Code __aeabi_unwind_cpp_pr1 (_Unwind_State,
+_Unwind_Control_Block *, _Unwind_Context *) __attribute__((weak));
+extern _Unwind_Reason_Code __aeabi_unwind_cpp_pr2 (_Unwind_State,
+_Unwind_Control_Block *, _Unwind_Context *) __attribute__((weak));
+
+_Unwind_Reason_Code
+__aeabi_unwind_cpp_pr0 (_Unwind_State state,
+			_Unwind_Control_Block *ucbp,
+			_Unwind_Context *context)
+{
+  return _URC_FAILURE;
+}
+
+_Unwind_Reason_Code
+__aeabi_unwind_cpp_pr1 (_Unwind_State state,
+			_Unwind_Control_Block *ucbp,
+			_Unwind_Context *context)
+{
+  return _URC_FAILURE;
+}
+
+_Unwind_Reason_Code
+__aeabi_unwind_cpp_pr2 (_Unwind_State state,
+			_Unwind_Control_Block *ucbp,
+			_Unwind_Context *context)
+{
+  return _URC_FAILURE;
+}

Re: [PATCH v3] RISC-V: Mips P8700 Conditional Move Support.

2025-07-14 Thread Jeff Law





On 7/14/25 5:58 AM, Umesh Kalappa wrote:

Hi Jeff and Marco,

Please pass your comments on the below changes and do needful.

The changes fail pre-commit testing:


https://patchwork.sourceware.org/project/gcc/patch/20250702071624.753431-1-ukalappa.m...@gmail.com/

Re: [PATCH v2] RISC-V: Vector-scalar widening multiply-(subtract-)accumulate [PR119100]

2025-07-14 Thread Jeff Law





On 7/14/25 2:52 AM, Robin Dapp wrote:
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a float_extend'ed vec_duplicate into a plus-mult or minus-mult RTL
instruction.

Before this patch, we have three instructions, e.g.:
  fcvt.s.h   fa5,fa5
  vfmv.v.f   v24,fa5
  vfmadd.vv  v8,v24,v16


Even though for some reason the CI didn't pick it up again, Jeff relayed 
to me that it passed in his own "CI".  Therefore I'd say this is OK now.
Yea, not sure what's going on with the pre-commit system.  It seems to 
just randomly not run some patches.  It's happened to some of my patches 
as well.


I'll go ahead and push the change momentarily.

jeff

Re: [PATCH v3] RISC-V: Mips P8700 Conditional Move Support.

2025-07-14 Thread Umesh Kalappa

Thank you, Jeff. Let us verify on our end and get back to you.
~U

On Mon, Jul 14, 2025 at 5:36 PM Jeff Law  wrote:

>
>
> On 7/14/25 5:58 AM, Umesh Kalappa wrote:
> > Hi Jeff and Marco,
> >
> > Please pass your comments on the below changes and do needful.
> The changes fail pre-commit testing:
>
> >
> https://patchwork.sourceware.org/project/gcc/patch/20250702071624.753431-1-ukalappa.m...@gmail.com/
>
>
>

Re: [PATCH v2] RISC-V: Vector-scalar widening multiply-(subtract-)accumulate [PR119100]

2025-07-14 Thread Jeff Law





On 7/14/25 2:52 AM, Robin Dapp wrote:
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a float_extend'ed vec_duplicate into a plus-mult or minus-mult RTL
instruction.

Before this patch, we have three instructions, e.g.:
  fcvt.s.h   fa5,fa5
  vfmv.v.f   v24,fa5
  vfmadd.vv  v8,v24,v16


Even though for some reason the CI didn't pick it up again, Jeff relayed 
to me that it passed in his own "CI".  Therefore I'd say this is OK now.
Paul-Antoine's patches don't have the leading "a" and "b" components
typically seen in a patch from git diff.  I wonder if that's why
pre-commit testing isn't picking them up properly.


It's something I noticed when adding them to my system.

Jeff

Re: [PATCH v2 2/2] RISC-V: Add testcase for rv32 SAT_MUL from uint64

2025-07-14 Thread Jeff Law





On 7/12/25 2:58 AM, pan2...@intel.com wrote:

From: Pan Li 

Add the run and asm test cases for rv32 SAT_MUL, where the multiply for
uint8_t, uint16_t and uint32_t is widened to uint64_t.
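
As a rough illustration of what these tests exercise, a saturating unsigned multiply can be written by widening to uint64_t and clamping. This is a sketch only: the function name is hypothetical and the exact source form used in the test files is not shown in this mail, so treat the shape below as an assumption.

```cpp
#include <cstdint>

// Sketch: saturating 32-bit unsigned multiply computed through a 64-bit
// product.  Hypothetical helper, not copied from the testsuite.
constexpr std::uint32_t
sat_u_mul_u32_from_u64 (std::uint32_t a, std::uint32_t b)
{
  const std::uint64_t wide = static_cast<std::uint64_t> (a) * b;  // cannot overflow
  const std::uint64_t max = UINT32_MAX;
  return wide > max ? UINT32_MAX : static_cast<std::uint32_t> (wide);
}
```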

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat/sat_u_mul-1-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-1-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-1-u8-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u16-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u32-from-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-1-u8-from-u64.c: New test.

And these are OK as well.

Jeff

[PATCH] tree-optimization/121059 - record loop mask when required

2025-07-14 Thread Richard Biener


For loop masking we need to mask a mask AND operation with the loop
mask.  The following makes sure we have a corresponding mask
available.  There's no good way to distinguish loop masking from
len masking here, so assume we have recorded a mask for the operands'
mask producers.
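
To make the first sentence concrete, here is a scalar model of the idea, with an 8-lane vector mask modelled as the low bits of a byte. This is only a sketch of the semantics, not vectorizer code. The names and the fixed lane count are assumptions.

```cpp
#include <cstdint>

// Scalar model of an 8-lane mask: bit i set means lane i is active.
constexpr unsigned lanes = 8;

// Loop mask for a partial vector: only the first `remaining` lanes are live.
constexpr std::uint8_t
loop_mask (unsigned remaining)
{
  const unsigned active = remaining < lanes ? remaining : lanes;
  return static_cast<std::uint8_t> ((1u << active) - 1u);
}

// A mask AND under loop masking: lanes outside the loop mask must be false,
// whatever the two operand masks contain there.
constexpr std::uint8_t
masked_and (std::uint8_t a, std::uint8_t b, std::uint8_t lm)
{
  return static_cast<std::uint8_t> (a & b & lm);
}
```

With three iterations left, masked_and (0xFF, 0xF5, loop_mask (3)) keeps only lanes 0 and 2.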

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/121059
* tree-vect-stmts.cc (vectorizable_operation): Record a
loop mask for mask AND operations.

* gcc.dg/vect/pr121059.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr121059.c | 24 
 gcc/tree-vect-stmts.cc   | 10 ++
 2 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr121059.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr121059.c 
b/gcc/testsuite/gcc.dg/vect/pr121059.c
new file mode 100644
index 000..d7f69b4f1f5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr121059.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 --param vect-partial-vector-usage=1" } */
+/* { dg-additional-options "-march=x86-64-v4" { target avx512f } } */
+
+typedef struct {
+  long left, right, top, bottom;
+} MngBox;
+typedef struct {
+  MngBox object_clip[6];
+  char exists[256], frozen[];
+} MngReadInfo;
+MngReadInfo mng_info;
+
+long ReadMNGImage_i;
+
+void ReadMNGImage(int ReadMNGImage_i)
+{
+  for (; ReadMNGImage_i < 256; ReadMNGImage_i++)
+if (mng_info.exists[ReadMNGImage_i] && mng_info.frozen[ReadMNGImage_i])
+  mng_info.object_clip[ReadMNGImage_i].left =
+  mng_info.object_clip[ReadMNGImage_i].right =
+  mng_info.object_clip[ReadMNGImage_i].top =
+  mng_info.object_clip[ReadMNGImage_i].bottom = 0;
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 4aa69da2218..f0dc4843ca7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6978,6 +6978,16 @@ vectorizable_operation (vec_info *vinfo,
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
}
+  else if (loop_vinfo
+  && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+  && code == BIT_AND_EXPR
+  && VECTOR_BOOLEAN_TYPE_P (vectype)
+  /* We cannot always record a mask since that will disable
+ len-based partial vectors, but there should be already
+ one mask producer stmt which should require loop
+ masking.  */
+  && !masks->is_empty ())
+   vect_record_loop_mask (loop_vinfo, masks, vec_num, vectype, NULL);

   /* Put types on constant and invariant SLP children.  */
   if (!vect_maybe_update_slp_op_vectype (slp_op0, vectype)
--
2.43.0

Re: [PATCH v2] x86-64: Add --enable-x86-64-mfentry

2025-07-14 Thread Uros Bizjak

On Mon, Jul 14, 2025 at 1:16 PM Uros Bizjak  wrote:
>
> On Sun, Jul 13, 2025 at 12:16 AM Sam James  wrote:
> >
> > "H.J. Lu"  writes:
> >
> > > On Sat, Jul 12, 2025 at 6:58 AM Siddhesh Poyarekar  
> > > wrote:
> > >>
> > >> On 2025-07-11 15:28, Uros Bizjak wrote:
> > >> >> Why not just switch over unconditionally?  __fentry__ seems like a
> > >> >> better alternative to mcount overall and it has been around long 
> > >> >> enough
> > >> >> that even older deployments should be relatively unaffected.
> > >> >
> > >> > Actually, it is switched on by default for i?86-*-linux* |
> > >> > x86_64-*-linux*. The default for --enable-x86-64-mfentry is "auto",
> > >> > which triggers the mentioned condition. One still has a chance to use
> > >> > "yes" or "no" in addition to "auto" when configuring with
> > >> > --{enable|disable}-x86-64-mfentry.
> > >>
> > >> Oh that's good then.
> > >>
> > >> Thanks,
> > >> Sid
> > >
> > > Here is the v2 patch.   The differences are
> > >
> > > 1.  Enable -mfentry by default for i?86-*-*gnu* | x86_64-*-*gnu*,
> > > not i?86-*-*linux* | x86_64-*-*linux*.
> > > 2.  Adjust some testcases.
> > >
> > > OK for master?
> >
> > I'll test it tonight but it looks good to me now. Thanks.
>
> Assuming that tests passed, OK for mainline.

HJ, please also mention this change in gcc-16 release notes.

Thanks,
Uros.

[PATCH] aarch64: fixup: Implement sme2+faminmax extension.

2025-07-14 Thread Alfie Richards

Hi all,

This is a minor fixup to the previous patch I committed, addressing Spencer's
comments.

Bootstrapped and regression tested on AArch64.

Thanks,
Alfie

-- >8 --

Fixup to the SME2+FAMINMAX intrinsics commit.

gcc/ChangeLog:

* config/aarch64/aarch64-sme.md (@aarch64_sme_):
Change gating and comment.
---
 gcc/config/aarch64/aarch64-sme.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sme.md 
b/gcc/config/aarch64/aarch64-sme.md
index bfe368e80b5..6b3f4390943 100644
--- a/gcc/config/aarch64/aarch64-sme.md
+++ b/gcc/config/aarch64/aarch64-sme.md
@@ -1269,8 +1269,8 @@ (define_insn "*aarch64_sme_single__plus"
 ;;  Absolute minimum/maximum
 ;; -
 ;; Includes:
-;; - svamin (SME2+faminmax)
-;; - svamin (SME2+faminmax)
+;; - FAMIN (SME2+FAMINMAX)
+;; - FAMAX (SME2+FAMINMAX)
 ;; -
 
 (define_insn "@aarch64_sme_"
@@ -1278,7 +1278,7 @@ (define_insn "@aarch64_sme_"
(unspec:SVE_Fx24 [(match_operand:SVE_Fx24 1 "register_operand" "%0")
  (match_operand:SVE_Fx24 2 "register_operand" 
"Uw")]
 FAMINMAX_UNS))]
-  "TARGET_SME2 && TARGET_FAMINMAX"
+  "TARGET_STREAMING_SME2 && TARGET_FAMINMAX"
   "\t%0, %1, %2"
 )
 
-- 
2.34.1

Re: [PATCH] aarch64: Support unpacked SVE integer division

2025-07-14 Thread Spencer Abson

On Fri, Jul 11, 2025 at 02:40:46PM +, Remi Machet wrote:
> 
> On 7/11/25 08:21, Spencer Abson wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> This patch extends the existing patterns for SVE_INT_BINARY_SD to
> support partial SVE integer modes, including those that implement the
> conditional form.
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-sve.md (3): Extend
> to SVE_SDI_SIMD.
> (@aarch64_pred_): Likewise.
> (@cond_): Extend to SVE_SDI.
> (*cond__2): Likewise.
> (*cond__3): Likewise.
> (*cond__any): Likewise.
> * config/aarch64/iterators.md (SVE_SDI): New iterator for
> all SVE vector modes with 32-bit or 64-bit elements.
> (SVE_SDI_SIMD): New iterator.  As above, but including
> V4SI and V2DI.
> 
> gcc/testsuite/ChangeLog:
> 
> * g++.target/aarch64/sve/cond_arith_1.C: Rename TEST_SHIFT
> to TEST_OP, add tests for SDIV and UDIV.
> * g++.target/aarch64/sve/cond_arith_2.C: Likewise.
> * g++.target/aarch64/sve/cond_arith_3.C: Likewise.
> * g++.target/aarch64/sve/cond_arith_4.C: Likewise.
> * gcc.target/aarch64/sve/div_2.c: New test.
> 
> ---
> 
> Bootstrapped & regtested on aarch64-linux-gnu.  OK for master?
> 
> Thanks,
> Spencer
> 
> ---
>  gcc/config/aarch64/aarch64-sve.md | 64 +--
>  gcc/config/aarch64/iterators.md   |  7 ++
>  .../g++.target/aarch64/sve/cond_arith_1.C | 25 +---
>  .../g++.target/aarch64/sve/cond_arith_2.C | 25 +---
>  .../g++.target/aarch64/sve/cond_arith_3.C | 27 +---
>  .../g++.target/aarch64/sve/cond_arith_4.C | 27 +---
>  gcc/testsuite/gcc.target/aarch64/sve/div_2.c  | 22 +++
>  7 files changed, 127 insertions(+), 70 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_2.c
> 
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 6b5113eb70f..871b31623bb 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -4712,12 +4712,12 @@
>  ;; We can use it with Advanced SIMD modes to expose the V2DI and V4SI
>  ;; optabs to the midend.
>  (define_expand "3"
> -  [(set (match_operand:SVE_FULL_SDI_SIMD 0 "register_operand")
> -   (unspec:SVE_FULL_SDI_SIMD
> +  [(set (match_operand:SVE_SDI_SIMD 0 "register_operand")
> +   (unspec:SVE_SDI_SIMD
>   [(match_dup 3)
> -  (SVE_INT_BINARY_SD:SVE_FULL_SDI_SIMD
> -(match_operand:SVE_FULL_SDI_SIMD 1 "register_operand")
> -(match_operand:SVE_FULL_SDI_SIMD 2 "register_operand"))]
> +  (SVE_INT_BINARY_SD:SVE_SDI_SIMD
> +(match_operand:SVE_SDI_SIMD 1 "register_operand")
> +(match_operand:SVE_SDI_SIMD 2 "register_operand"))]
>   UNSPEC_PRED_X))]
>"TARGET_SVE"
>{
> @@ -4727,12 +4727,12 @@
> 
>  ;; Integer division predicated with a PTRUE.
>  (define_insn "@aarch64_pred_"
> -  [(set (match_operand:SVE_FULL_SDI_SIMD 0 "register_operand")
> -   (unspec:SVE_FULL_SDI_SIMD
> +  [(set (match_operand:SVE_SDI_SIMD 0 "register_operand")
> +   (unspec:SVE_SDI_SIMD
>   [(match_operand: 1 "register_operand")
> -  (SVE_INT_BINARY_SD:SVE_FULL_SDI_SIMD
> -(match_operand:SVE_FULL_SDI_SIMD 2 "register_operand")
> -(match_operand:SVE_FULL_SDI_SIMD 3 "register_operand"))]
> +  (SVE_INT_BINARY_SD:SVE_SDI_SIMD
> +(match_operand:SVE_SDI_SIMD 2 "register_operand")
> +(match_operand:SVE_SDI_SIMD 3 "register_operand"))]
>   UNSPEC_PRED_X))]
>"TARGET_SVE"
>{@ [ cons: =0 , 1   , 2 , 3 ; attrs: movprfx ]
> @@ -4744,25 +4744,25 @@
> 
>  ;; Predicated integer division with merging.
>  (define_expand "@cond_"
> -  [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
> -   (unspec:SVE_FULL_SDI
> +  [(set (match_operand:SVE_SDI 0 "register_operand")
> +   (unspec:SVE_SDI
>   [(match_operand: 1 "register_operand")
> -  (SVE_INT_BINARY_SD:SVE_FULL_SDI
> -(match_operand:SVE_FULL_SDI 2 "register_operand")
> -(match_operand:SVE_FULL_SDI 3 "register_operand"))
> -  (match_operand:SVE_FULL_SDI 4 "aarch64_simd_reg_or_zero")]
> +  (SVE_INT_BINARY_SD:SVE_SDI
> +(match_operand:SVE_SDI 2 "register_operand")
> +(match_operand:SVE_SDI 3 "register_operand"))
> +  (match_operand:SVE_SDI 4 "aarch64_simd_reg_or_zero")]
>   UNSPEC_SEL))]
>"TARGET_SVE"
>  )
> 
>  ;; Predicated integer division, merging with the first input.
>  (define_insn "*cond__2"
> -  [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
> -   (unspec:SVE_FULL_SDI
> +  [(set (match_operand:SVE_SDI 0 "register_operand")
> +   (unspec:SVE_SDI
>   [(match_operand: 1 "register_operand")
> -  (SVE_INT_BINARY_SD:SVE_FULL_SDI
> -(match_operand:SVE_FULL_SDI 2 "register_

Re: [PATCH] libstdc++: Implement std::chrono::current_zone() for Windows

2025-07-14 Thread Jonathan Wakely

On Mon, 14 Jul 2025 at 11:10, Jonathan Wakely  wrote:
>
> On Mon, 14 Jul 2025 at 11:08, Björn Schäpers  wrote:
> >
> > Am 14.07.2025 um 10:20 schrieb Tomasz Kaminski:
> > >
> > >
> > > On Tue, Jul 8, 2025 at 10:48 PM Björn Schäpers wrote:
> > > + const auto raw_index = information.Bias / 60;
> > > +
> > > > + // The bias added to the local time equals UTC. And GMT+X corresponds
> > > > + // to UTC-X, the sign is negated. Thus we can use the hourly bias as
> > > > + // an index into an array.
> > > + if (raw_index < 0 && raw_index >= -14)
> > > +   {
> > > + static array table{
> > > +   "Etc/GMT-1",  "Etc/GMT-2",  "Etc/GMT-3",  "Etc/GMT-4",
> > > +   "Etc/GMT-5",  "Etc/GMT-6",  "Etc/GMT-7",  "Etc/GMT-8",
> > > > +   "Etc/GMT-9",  "Etc/GMT-10", "Etc/GMT-11", "Etc/GMT-12",
> > > +   "Etc/GMT-13", "Etc/GMT-14"
> > > + };
> > > + return table[-raw_index - 1];
> > > +   }
> > > + else if (raw_index > 0 && raw_index <= 12)
> > > +   {
> > > + static array table{
> > >
> > > > This table has size 14, but only 12 entries. I do not think there are
> > > > zones past +12, but I believe size and entries should match.
> >
> > > That is totally correct and this is a classic copy and paste error.
> > > @Jonathan: Should I correct that (and the other things you mentioned), or
> > > are you doing that?
>
> I can do it before I push it (probably later today).

Hmm, we could reduce the number of guard variables for static
constructors by using a single array here, and indexing into it with
raw_index + 14.

This code is only for Windows, so we're not talking constrained
microcontrollers where we need to save resources. That can be in a
follow-up commit, if we decide it's worth doing.
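
A minimal sketch of the single-table idea, for illustration only. The helper name, the string_view return type and the empty entry for a zero bias are assumptions. It also uses a constexpr local table rather than a function-local static, which is a different choice from the quoted code.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical helper: one table for raw_index in [-14, 12], stored at
// position raw_index + 14.
constexpr std::string_view
etc_gmt_name (long bias_minutes)
{
  constexpr std::array<std::string_view, 27> table{
    "Etc/GMT-14", "Etc/GMT-13", "Etc/GMT-12", "Etc/GMT-11", "Etc/GMT-10",
    "Etc/GMT-9",  "Etc/GMT-8",  "Etc/GMT-7",  "Etc/GMT-6",  "Etc/GMT-5",
    "Etc/GMT-4",  "Etc/GMT-3",  "Etc/GMT-2",  "Etc/GMT-1",
    "",  // raw_index == 0: the quoted code returns {} for this case too
    "Etc/GMT+1",  "Etc/GMT+2",  "Etc/GMT+3",  "Etc/GMT+4",  "Etc/GMT+5",
    "Etc/GMT+6",  "Etc/GMT+7",  "Etc/GMT+8",  "Etc/GMT+9",  "Etc/GMT+10",
    "Etc/GMT+11", "Etc/GMT+12"
  };
  const long raw_index = bias_minutes / 60;
  if (raw_index < -14 || raw_index > 12)
    return {};
  return table[static_cast<std::size_t> (raw_index + 14)];
}
```

One bounds check replaces the two range tests, and the size mismatch Tomasz spotted cannot happen because there is only one array extent to get right.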

> >
> > >
> > > +   "Etc/GMT+1", "Etc/GMT+2",  "Etc/GMT+3",  "Etc/GMT+4",
> > > +   "Etc/GMT+5", "Etc/GMT+6",  "Etc/GMT+7",  "Etc/GMT+8",
> > > +   "Etc/GMT+9", "Etc/GMT+10", "Etc/GMT+11", "Etc/GMT+12"
> > > + };
> > > + return table[raw_index - 1];
> > > +   }
> > > + return {};
> > > +   }

[committed] libstdc++: Correct value of __cpp_lib_constexpr_exceptions [PR117785]

2025-07-14 Thread Jonathan Wakely

Only P3068R6 (Allowing exception throwing in constant-evaluation) is
implemented in the library so far, so the value of the
constexpr_exceptions feature test macro should be 202411L. Once we
support the library changes in P3378R2 (constexpr exception types) then
we can set the value to 202502L again.
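
User code that cares about the difference could test the macro value along these lines. This is a sketch: the helper name and the 0/1/2 encoding are invented for illustration.

```cpp
#include <exception>  // with this patch, <exception> requests the macro

// Hypothetical helper: 0 = macro absent, 1 = P3068R6 only (202411L),
// 2 = P3378R2 library changes as well (202502L).
constexpr int
constexpr_exceptions_level ()
{
#if defined(__cpp_lib_constexpr_exceptions) \
    && __cpp_lib_constexpr_exceptions >= 202502L
  return 2;
#elif defined(__cpp_lib_constexpr_exceptions) \
    && __cpp_lib_constexpr_exceptions >= 202411L
  return 1;
#else
  return 0;
#endif
}
```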

libstdc++-v3/ChangeLog:

PR libstdc++/117785
* include/bits/version.def (constexpr_exceptions): Define
correct value.
* include/bits/version.h: Regenerate.
* libsupc++/exception: Check correct value.
* testsuite/18_support/exception/version.cc: New test.
---

Tested x86_64-linux. Pushed to trunk.

 libstdc++-v3/include/bits/version.def  |  2 +-
 libstdc++-v3/include/bits/version.h|  4 ++--
 libstdc++-v3/libsupc++/exception   |  1 +
 libstdc++-v3/testsuite/18_support/exception/version.cc | 10 ++
 4 files changed, 14 insertions(+), 3 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/18_support/exception/version.cc

diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index df58e7018d81..cf0672b48224 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -2053,7 +2053,7 @@ ftms = {
 ftms = {
   name = constexpr_exceptions;
   values = {
-v = 202502;
+v = 202411;
 cxxmin = 26;
 extra_cond = "__cpp_constexpr_exceptions >= 202411L";
   };
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index 1414dd78ddab..c01ddf14dd57 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -2301,9 +2301,9 @@
 
 #if !defined(__cpp_lib_constexpr_exceptions)
 # if (__cplusplus >  202302L) && (__cpp_constexpr_exceptions >= 202411L)
-#  define __glibcxx_constexpr_exceptions 202502L
+#  define __glibcxx_constexpr_exceptions 202411L
 #  if defined(__glibcxx_want_all) || 
defined(__glibcxx_want_constexpr_exceptions)
-#   define __cpp_lib_constexpr_exceptions 202502L
+#   define __cpp_lib_constexpr_exceptions 202411L
 #  endif
 # endif
 #endif /* !defined(__cpp_lib_constexpr_exceptions) && 
defined(__glibcxx_want_constexpr_exceptions) */
diff --git a/libstdc++-v3/libsupc++/exception b/libstdc++-v3/libsupc++/exception
index 25ce8d97e315..fc6f8d927711 100644
--- a/libstdc++-v3/libsupc++/exception
+++ b/libstdc++-v3/libsupc++/exception
@@ -38,6 +38,7 @@
 #include 
 
 #define __glibcxx_want_uncaught_exceptions
+#define __glibcxx_want_constexpr_exceptions
 #define __glibcxx_want_exception_ptr_cast
 #include 
 
diff --git a/libstdc++-v3/testsuite/18_support/exception/version.cc 
b/libstdc++-v3/testsuite/18_support/exception/version.cc
new file mode 100644
index ..09a2d102b720
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/exception/version.cc
@@ -0,0 +1,10 @@
+// { dg-do preprocess { target c++26 } }
+// { dg-add-options no_pch }
+
+#include 
+
+#ifndef __cpp_lib_constexpr_exceptions
+# error "Feature test macro for constexpr_exceptions is missing in "
+#elif __cpp_lib_constexpr_exceptions < 202411L
+# error "Feature test macro for constexpr_exceptions has wrong value in 
"
+#endif
-- 
2.50.1

[committed] libstdc++: Protect PSTL headers against overloaded commas

2025-07-14 Thread Jonathan Wakely

Reported upstream: https://github.com/uxlfoundation/oneDPL/issues/2342

libstdc++-v3/ChangeLog:

* include/pstl/algorithm_impl.h (__for_each_n_it_serial):
Protect against overloaded comma operator.
(__brick_walk2): Likewise.
(__brick_walk2_n): Likewise.
(__brick_walk3): Likewise.
(__brick_move_destroy::operator()): Likewise.
(__brick_calc_mask_1): Likewise.
(__brick_copy_by_mask): Likewise.
(__brick_partition_by_mask): Likewise.
(__brick_calc_mask_2): Likewise.
(__brick_reverse): Likewise.
(__pattern_partial_sort_copy): Likewise.
* include/pstl/memory_impl.h (__brick_uninitialized_move):
Likewise.
(__brick_uninitialized_copy): Likewise.
* include/pstl/numeric_impl.h (__brick_transform_scan):
Likewise.
---

Tested x86_64-linux. Pushed to trunk.
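
For anyone wondering why the casts matter, here is a small self-contained sketch. The Iter type and its comma operator are made up for the demonstration and are not from PSTL. It shows that a user-provided operator, gets called by "++a, ++b" but cannot be selected once the right operand is cast to void.

```cpp
// Toy iterator with a hit counter bumped by the overloaded comma below.
struct Iter
{
  int* p;
  int comma_hits = 0;
  constexpr Iter& operator++ () { ++p; return *this; }
  constexpr int& operator* () const { return *p; }
  friend constexpr bool
  operator!= (const Iter& a, const Iter& b) { return a.p != b.p; }
};

// Pathological but legal: a user-provided comma operator.
template<typename T>
constexpr void operator, (Iter& lhs, const T&) { ++lhs.comma_hits; }

constexpr int
hits_unprotected ()
{
  int src[3] = {1, 2, 3};
  int dst[3] = {};
  Iter first{src}, last{src + 3}, out{dst};
  for (; first != last; ++first, ++out)  // calls the overloaded comma
    *out = *first;
  return first.comma_hits;
}

constexpr int
hits_protected ()
{
  int src[3] = {1, 2, 3};
  int dst[3] = {};
  Iter first{src}, last{src + 3}, out{dst};
  for (; first != last; ++first, (void) ++out)  // built-in comma only
    *out = *first;
  return first.comma_hits;
}
```

A void operand can never match a function parameter, so the cast forces the built-in comma.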

 libstdc++-v3/include/pstl/algorithm_impl.h | 24 +++---
 libstdc++-v3/include/pstl/memory_impl.h|  4 ++--
 libstdc++-v3/include/pstl/numeric_impl.h   |  4 ++--
 3 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/libstdc++-v3/include/pstl/algorithm_impl.h 
b/libstdc++-v3/include/pstl/algorithm_impl.h
index 5b1cd2010944..2080e82f8b49 100644
--- a/libstdc++-v3/include/pstl/algorithm_impl.h
+++ b/libstdc++-v3/include/pstl/algorithm_impl.h
@@ -79,7 +79,7 @@ template 
 _ForwardIterator
 __for_each_n_it_serial(_ForwardIterator __first, _Size __n, _Function __f)
 {
-for (; __n > 0; ++__first, --__n)
+for (; __n > 0; ++__first, (void) --__n)
 __f(__first);
 return __first;
 }
@@ -221,7 +221,7 @@ _ForwardIterator2
 __brick_walk2(_ForwardIterator1 __first1, _ForwardIterator1 __last1, 
_ForwardIterator2 __first2, _Function __f,
   /*vector=*/std::false_type) noexcept
 {
-for (; __first1 != __last1; ++__first1, ++__first2)
+for (; __first1 != __last1; ++__first1, (void) ++__first2)
 __f(*__first1, *__first2);
 return __first2;
 }
@@ -240,7 +240,7 @@ _ForwardIterator2
 __brick_walk2_n(_ForwardIterator1 __first1, _Size __n, _ForwardIterator2 
__first2, _Function __f,
 /*vector=*/std::false_type) noexcept
 {
-for (; __n > 0; --__n, ++__first1, ++__first2)
+for (; __n > 0; --__n, (void) ++__first1, ++__first2)
 __f(*__first1, *__first2);
 return __first2;
 }
@@ -364,7 +364,7 @@ _ForwardIterator3
 __brick_walk3(_ForwardIterator1 __first1, _ForwardIterator1 __last1, 
_ForwardIterator2 __first2,
   _ForwardIterator3 __first3, _Function __f, 
/*vector=*/std::false_type) noexcept
 {
-for (; __first1 != __last1; ++__first1, ++__first2, ++__first3)
+for (; __first1 != __last1; ++__first1, (void) ++__first2, ++__first3)
 __f(*__first1, *__first2, *__first3);
 return __first3;
 }
@@ -961,7 +961,7 @@ struct __brick_move_destroy
 {
 using _IteratorValueType = typename 
std::iterator_traits<_RandomAccessIterator1>::value_type;
 
-for (; __first != __last; ++__first, ++__result)
+for (; __first != __last; ++__first, (void) ++__result)
 {
 *__result = std::move(*__first);
 (*__first).~_IteratorValueType();
@@ -1027,7 +1027,7 @@ __brick_calc_mask_1(_ForwardIterator __first, 
_ForwardIterator __last, bool* __r
 static_assert(__are_random_access_iterators<_ForwardIterator>::value,
   "Pattern-brick error. Should be a random access iterator.");
 
-for (; __first != __last; ++__first, ++__mask)
+for (; __first != __last; ++__first, (void) ++__mask)
 {
 *__mask = __pred(*__first);
 if (*__mask)
@@ -1052,7 +1052,7 @@ void
 __brick_copy_by_mask(_ForwardIterator __first, _ForwardIterator __last, 
_OutputIterator __result, bool* __mask,
  _Assigner __assigner, /*vector=*/std::false_type) noexcept
 {
-for (; __first != __last; ++__first, ++__mask)
+for (; __first != __last; ++__first, (void) ++__mask)
 {
 if (*__mask)
 {
@@ -1079,7 +1079,7 @@ void
 __brick_partition_by_mask(_ForwardIterator __first, _ForwardIterator __last, 
_OutputIterator1 __out_true,
   _OutputIterator2 __out_false, bool* __mask, 
/*vector=*/std::false_type) noexcept
 {
-for (; __first != __last; ++__first, ++__mask)
+for (; __first != __last; ++__first, (void) ++__mask)
 {
 if (*__mask)
 {
@@ -1383,7 +1383,7 @@ __brick_calc_mask_2(_RandomAccessIterator __first, 
_RandomAccessIterator __last,
 _BinaryPredicate __pred, /*vector=*/std::false_type) 
noexcept
 {
 _DifferenceType __count = 0;
-for (; __first != __last; ++__first, ++__mask)
+for (; __first != __last; ++__first, (void) ++__mask)
 {
 *__mask = !__pred(*__first, *(__first - 1));
 __count += *__mask;
@@ -1483,7 +1483,7 @@ void
 __brick_reverse(_BidirectionalIterator __first, _BidirectionalIterator __last, 
_BidirectionalIterator __d_last,

[PATCH] [aarch64] Stop using sys/ifunc.h header in libatomic and libgcc

2025-07-14 Thread Yury Khrustalev

This optional header is used to bring in the definition of the
struct __ifunc_arg_t type. Since it was added to glibc only
recently, the previous implementation had to check whether this
header is present and, if not, provide its own definition.

This creates dead code because, in any given build, one of these two
parts goes untested. The ABI specification for ifunc resolvers allows
creating our own ABI-compatible definition for this type, which is the
right way of doing it.

In addition to improving consistency, the new approach also makes it
easier to add new fields to the struct __ifunc_arg_t type, without
the need to work around situations where the definition imported
from the header lacks these new fields.

The ABI allows defining as many hwcap fields in this struct as needed,
provided that at runtime we only access the fields that are permitted
by the _size value.
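
A sketch of what such a guarded access can look like. The struct below is a local stand-in with un-reserved names, and both helpers are hypothetical. The actual check in cpuinfo.c is not visible in this excerpt and may differ.

```cpp
#include <cstddef>

// Local stand-in for the ABI struct; fields mirror the patch without the
// reserved leading underscores.
struct ifunc_arg
{
  unsigned long size;  // size in bytes of the buffer the caller provided
  unsigned long hwcap;
  unsigned long hwcap2;
  unsigned long hwcap3;
  unsigned long hwcap4;
};

// Hypothetical helper: does the runtime buffer cover the field at `offset`?
constexpr bool
has_field (const ifunc_arg& arg, std::size_t offset)
{
  return arg.size >= offset + sizeof (unsigned long);
}

// Read hwcap3 only when the size value says it is present, else report 0.
constexpr unsigned long
hwcap3_or_zero (const ifunc_arg& arg)
{
  return has_field (arg, offsetof (ifunc_arg, hwcap3)) ? arg.hwcap3 : 0;
}
```

An older runtime that only passes size, hwcap and hwcap2 then reads as "no hwcap3 bits set" instead of touching memory past the buffer.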

gcc/
* config/aarch64/aarch64.cc (build_ifunc_arg_type):
Add new fields _hwcap3 and _hwcap4.

libatomic/
* config/linux/aarch64/host-config.h (__ifunc_arg_t):
Remove sys/ifunc.h and add new fields _hwcap3 and _hwcap4.

libgcc/
* config/aarch64/cpuinfo.c (__ifunc_arg_t): Likewise.
(__init_cpu_features): Obtain and assign values for the
fields _hwcap3 and _hwcap4.
(__init_cpu_features_constructor): Check _size in the
arg argument.

---

Regression tested on AArch64 with no regressions found.
OK for trunk?

base commit: 3a1067c8b8c

---
 gcc/config/aarch64/aarch64.cc| 12 +
 libatomic/config/linux/aarch64/host-config.h | 12 +++--
 libgcc/config/aarch64/cpuinfo.c  | 46 
 3 files changed, 57 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 6e16763f957..26c0f70f952 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -20466,6 +20466,8 @@ aarch64_compare_version_priority (tree decl1, tree 
decl2)
  unsigned long _size; // Size of the struct, so it can grow.
  unsigned long _hwcap;
  unsigned long _hwcap2;
+ unsigned long _hwcap3;
+ unsigned long _hwcap4;
}
  */
 
@@ -20482,14 +20484,24 @@ build_ifunc_arg_type ()
   tree field3 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
get_identifier ("_hwcap2"),
long_unsigned_type_node);
+  tree field4 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+   get_identifier ("_hwcap3"),
+   long_unsigned_type_node);
+  tree field5 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+   get_identifier ("_hwcap4"),
+   long_unsigned_type_node);
 
   DECL_FIELD_CONTEXT (field1) = ifunc_arg_type;
   DECL_FIELD_CONTEXT (field2) = ifunc_arg_type;
   DECL_FIELD_CONTEXT (field3) = ifunc_arg_type;
+  DECL_FIELD_CONTEXT (field4) = ifunc_arg_type;
+  DECL_FIELD_CONTEXT (field5) = ifunc_arg_type;
 
   TYPE_FIELDS (ifunc_arg_type) = field1;
   DECL_CHAIN (field1) = field2;
   DECL_CHAIN (field2) = field3;
+  DECL_CHAIN (field3) = field4;
+  DECL_CHAIN (field4) = field5;
 
   layout_type (ifunc_arg_type);
 
diff --git a/libatomic/config/linux/aarch64/host-config.h 
b/libatomic/config/linux/aarch64/host-config.h
index d0d44bf18ea..c6f8d693f2c 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -40,16 +40,20 @@
 # define HWCAP2_LSE128 (1UL << 47)
 #endif
 
-#if __has_include()
-# include 
-#else
+/* The following struct is ABI-correct description of the 2nd argument for an
+   ifunc resolver as per SYSVABI spec (see link below).  It is safe to extend
+   it with new fields.  The ifunc resolver implementations must always check
+   the runtime size of the buffer using the value in the _size field.
+   https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst.  
*/
 typedef struct __ifunc_arg_t {
   unsigned long _size;
   unsigned long _hwcap;
   unsigned long _hwcap2;
+  unsigned long _hwcap3;
+  unsigned long _hwcap4;
 } __ifunc_arg_t;
+
 # define _IFUNC_ARG_HWCAP (1ULL << 62)
-#endif
 
 /* From the file which imported `host-config.h' we can ascertain which
architectural extension provides relevant atomic support.  From this,
diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
index dda9dc69689..50f54cac26b 100644
--- a/libgcc/config/aarch64/cpuinfo.c
+++ b/libgcc/config/aarch64/cpuinfo.c
@@ -27,15 +27,18 @@
 #if __has_include()
 #include 
 
-#if __has_include()
-#include 
-#else
+/* The following struct is ABI-correct description of the 2nd argument for an
+   ifunc resolver as per SYSVABI spec (see link below).  It is safe to extend
+   it with new fields.  The ifunc resolver implementations must always check
+   the runtime size of the buffer using the value in the _size field.
+   https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvab

[PATCH] gcc-16/changes.html: Add --enable-x86-64-mfentry

2025-07-14 Thread H.J. Lu

OK to install?

-- 
H.J.
From ba8825b15df0172f1c95fd46526fb734ec7a6646 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 14 Jul 2025 20:32:11 +0800
Subject: [PATCH] gcc-16/changes.html: Add --enable-x86-64-mfentry

Signed-off-by: H.J. Lu 
---
 htdocs/gcc-16/changes.html | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/htdocs/gcc-16/changes.html b/htdocs/gcc-16/changes.html
index cc6fe204..9149fa28 100644
--- a/htdocs/gcc-16/changes.html
+++ b/htdocs/gcc-16/changes.html
@@ -118,6 +118,12 @@ for general information.
 
 
 
+  The --enable-x86-64-mfentry configure option is
+  added to enable -mfentry by default to use
+  __fentry__, instead of mcount for
+  profiling.
+  
+
 AMD GPU (GCN)
 
 
-- 
2.50.1

Re: [PATCH v1] RISC-V: Refine the scalar SAT_* test cases

2025-07-14 Thread Jeff Law





On 7/12/25 8:26 AM, pan2...@intel.com wrote:

From: Pan Li 

Per the previous discussion with Jeff, we don't do complicated
asm checks for cases like scalar saturation alu.  They are
not easy to maintain, as well as fragile.  Thus, we
remove these function-body checks, and introduce the
jmp label asm check instead.  The code-gen of SAT_*
will never have a jmp, and the other run tests will
make sure of the correctness of the SAT_* code-gen.

The below test suites are passed for this patch series.
* The rv64gcv fully regression test.

The below failed test cases are resolved:
FAIL: gcc.target/riscv/sat/sat_s_add_imm-2-i8.c -Oz
   check-function-bodies sat_s_add_imm_int8_t_fmt_2_1
FAIL: gcc.target/riscv/sat/sat_s_add_imm-2-i8.c -Os
   check-function-bodies sat_s_add_imm_int8_t_fmt_2_1
FAIL: gcc.target/riscv/sat/sat_s_add_imm-2-i8.c -O3
   check-function-bodies sat_s_add_imm_int8_t_fmt_2_1
FAIL: gcc.target/riscv/sat/sat_s_add_imm-2-i8.c -Ofast
   check-function-bodies sat_s_add_imm_int8_t_fmt_2_1
FAIL: gcc.target/riscv/sat/sat_s_add_imm-2-i8.c -O2
   check-function-bodies sat_s_add_imm_int8_t_fmt_2_1

Just to make sure I understand.  We're switching to a check that we've 
got a branchless sequence, right?  That seems like a good idea, 
particularly when mated with the pre-existing test that we've got the 
appropriate SAT IFN in the .optimized dump.
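
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of source that is expected to compile to a branchless saturating sequence.  It is an unsigned 8-bit add, not the signed-immediate form from the failing tests above, and the function name is hypothetical.  Whether the compiler actually emits jump-free code depends on the target and options, which is what the jmp label check is meant to verify.

```c
#include <stdint.h>

/* Hypothetical example: unsigned 8-bit saturating add written without
   any branch in the source.  */
uint8_t
sat_u_add_u8 (uint8_t x, uint8_t y)
{
  /* Wrapping sum, truncated back to 8 bits.  */
  uint8_t sum = (uint8_t) (x + y);
  /* (sum < x) is 1 exactly when the addition wrapped around, so the
     negation yields an all-ones mask on overflow and zero otherwise.  */
  uint8_t mask = (uint8_t) -(uint8_t) (sum < x);
  /* OR-ing with the mask clamps the result to UINT8_MAX on overflow.  */
  return (uint8_t) (sum | mask);
}
```

A run test would then only need to compare the result against the expected clamped value, leaving the shape of the generated code to the label check.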


Thanks for taking care of this.  While the changes aren't terribly 
complex, there's a *lot* of them.


Jeff

[PATCH] ifconv: simple factor out operators while doing ifcvt [PR119920]

2025-07-14 Thread Andrew Pinski

For possible reductions, ifconv currently handles the case where the
addition is on only one side of the if. But in the case of PR 119920,
the reduction addition is on both sides of the if.
E.g.
```
  if (_27 == 0)
goto ; [50.00%]
  else
goto ; [50.00%]

  
  a_29 = b_14(D) + a_17;
  goto ; [100.00%]

  
  a_28 = c_12(D) + a_17;

  
  # a_30 = PHI 
```

Which ifcvt converts into:
```
  _34 = _32 + _33;
  a_15 = (int) _34;
  _23 = _4 == 0;
  _37 = _33 + _35;
  a_13 = (int) _37;
  a_5 = _23 ? a_15 : a_13;
```

But the vectorizer does not recognize this as a reduction.
To fix this, we should factor out the addition from the `if`.
This allows us to get:
```
  iftmp.0_7 = _22 ? b_13(D) : c_12(D);
  a_14 = iftmp.0_7 + a_18;
```

Which then the vectorizer recognizes as a reduction.
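
At the source level, the effect of the factoring can be sketched roughly as follows.  This is only an illustration of the equivalence, not what the pass does internally (it works on GIMPLE PHIs); the function names are hypothetical and the loop is modelled on the new vect-reduc-cond-1.c test.

```c
#include <stddef.h>

/* Branchy form: the reduction addition appears on both arms of the if,
   as in the PR 119920 test case.  */
unsigned int
sum_branchy (const unsigned int *ub, size_t n, unsigned int b,
             unsigned int c)
{
  unsigned int usum = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (ub[i] < n / 2)
        usum += b;
      else
        usum += c;
    }
  return usum;
}

/* Factored form: select the addend first, then perform one addition
   that is common to both arms.  */
unsigned int
sum_factored (const unsigned int *ub, size_t n, unsigned int b,
              unsigned int c)
{
  unsigned int usum = 0;
  for (size_t i = 0; i < n; i++)
    {
      unsigned int addend = ub[i] < n / 2 ? b : c;
      usum += addend;
    }
  return usum;
}
```

Both functions compute the same sum; the second has the single `usum += addend` shape that corresponds to the factored GIMPLE above.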

In the case of PR 112324 and PR 110015, it is similar but with a MAX_EXPR
reduction instead of an addition.

Note that while this should be done in phiopt, there are regressions
due to other passes not being able to handle the factored out cases
(see linked bug to PR 64700). I have not yet had time to fix all of the
passes that could handle the addition being in the if/then/else rather
than outside.
So I thought it would be useful to have a localized version in ifconv,
which is then only used for the vectorizer.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/119920
PR tree-optimization/112324
PR tree-optimization/110015

gcc/ChangeLog:

* tree-if-conv.cc (find_different_opnum): New function.
(factor_out_operators): New function.
(predicate_scalar_phi): Call factor_out_operators when
there are only 2 elements of a phi.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-reduc-cond-1.c: New test.
* gcc.dg/vect/vect-reduc-cond-2.c: New test.
* gcc.dg/vect/vect-reduc-cond-3.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/vect/vect-reduc-cond-1.c |  59 ++
 gcc/testsuite/gcc.dg/vect/vect-reduc-cond-2.c |  61 ++
 gcc/testsuite/gcc.dg/vect/vect-reduc-cond-3.c |  56 ++
 gcc/tree-if-conv.cc   | 186 ++
 4 files changed, 362 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-cond-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-cond-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-cond-3.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-cond-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-cond-1.c
new file mode 100644
index 000..d8356b4685c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-cond-1.c
@@ -0,0 +1,59 @@
+/* { dg-require-effective-target vect_int } */
+
+#include 
+#include "tree-vect.h"
+
+/* PR tree-optimization/119920 */
+
+#define N 32
+
+unsigned int ub[N];
+
+/* Test vectorization of reduction of unsigned-int.  */
+
+__attribute__ ((noinline, noipa))
+void init(void)
+{
+  #pragma GCC novector
+  for(int i = 0;i < N; i++)
+ub[i] = i;
+}
+
+
+__attribute__ ((noinline, noipa))
+void main1 (unsigned int b, unsigned int c)
+{
+  int i;
+  unsigned int usum = 0;
+
+  init();
+
+  /* Summation.  */
+  for (i = 0; i < N; i++) {
+if ( ub[i] < N/2 )
+{
+  usum += b;
+}
+else
+{
+  usum += c;
+}
+  }
+
+  /* check results:  */
+ /* __builtin_printf("%d : %d\n", usum, (N/2*b + N/2*c)); */
+  if (usum != N/2*b + N/2*c)
+abort ();
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  main1 (0, 0);
+  main1 (1, 1);
+  main1 (10, 1);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_int_add } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-cond-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-cond-2.c
new file mode 100644
index 000..80c1dba9fc1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-cond-2.c
@@ -0,0 +1,61 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fdump-tree-ifcvt-details" } */
+
+#include 
+#include "tree-vect.h"
+
+/* PR tree-optimization/119920 */
+
+#define N 32
+
+unsigned int ub[N];
+unsigned int ua[N];
+
+/* Test vectorization of reduction of unsigned-int.  */
+
+__attribute__ ((noinline, noipa))
+void init(void)
+{
+  #pragma GCC novector
+  for(int i = 0;i < N; i++) {
+ub[i] = i;
+ua[i] = 1;
+  }
+}
+
+
+__attribute__ ((noinline, noipa))
+void main1 (unsigned int b, unsigned int c)
+{
+  int i;
+  unsigned int usum = 0;
+
+  init();
+
+  /* Summation.  */
+  for (i = 0; i < N; i++) {
+unsigned t = ua[i];
+if ( ub[i] < N/2 )
+  usum += b * t;
+else
+  usum += c * t;
+  }
+
+  /* check results:  */
+  /* __builtin_printf("%d : %d\n", usum, (N/2*b*1 + N/2*c*1)); */
+  if (usum != N/2*b + N/2*c)
+abort ();
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  main1 (0, 0);
+  main1 (1, 1);
+  main1 (10, 1);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_int_add } } } } */
+/

Re: [PATCH v3 1/1] aarch64: Fold builtins with highpart args to highpart equivalent [PR117850]

2025-07-14 Thread Richard Sandiford

Spencer Abson  writes:
> [...]
> +/* If REF describes the high half of a 128-bit vector, return this
> +   vector.  Otherwise, return NULL_TREE.  */
> +static tree
> +aarch64_v128_highpart_ref (const_tree ref)
> +{
> +  if (TREE_CODE (ref) != SSA_NAME)
> +return NULL_TREE;
> +
> +  gassign *stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (ref));
> +  if (!stmt || gimple_assign_rhs_code (stmt) != BIT_FIELD_REF)
> +return NULL_TREE;
> +
> +  /* Look for a BIT_FIELD_REF that denotes the most significant 64
> + bits of a 128-bit vector.  */
> +  tree bf_ref = gimple_assign_rhs1 (stmt);
> +  unsigned int offset = BYTES_BIG_ENDIAN ? 0 : 64;
> +
> +  if (bit_field_size (bf_ref).to_constant () != 64
> +  || bit_field_offset (bf_ref).to_constant () != offset)
> +return NULL_TREE;

There should be a comment justifying the to_constants, but...

> +
> +  tree obj = TREE_OPERAND (bf_ref, 0);
> +  tree type = TREE_TYPE (obj);
> +
> +  if (VECTOR_TYPE_P (type) && tree_fits_uhwi_p (TYPE_SIZE (type))
> +  && tree_to_uhwi (TYPE_SIZE (type)) == 128)
> +return obj;

...I think the fact that we only test this later suggests that the
to_constants might not be safe, or at least not future-proof.  I think
we should instead use:

  if (maybe_ne (bit_field_size (bf_ref), 64)
  || maybe_ne (bit_field_offset (bf_ref), offset))
return NULL_TREE;
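
As a rough illustration of the difference (a toy model only, not GCC's poly_int; the struct and function names below are hypothetical): if a size is modelled as c0 + c1 * x for a runtime x >= 0, then a "maybe not equal" test against a constant stays well-defined even when c1 is nonzero, whereas extracting a constant is only meaningful when c1 is zero.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a two-coefficient polynomial size: the
   value is c0 + c1 * x for some runtime x >= 0 that is not known at
   compile time.  */
struct toy_poly
{
  long c0;
  long c1;
};

/* True if A might differ from the constant B for some x >= 0.  With a
   nonzero c1 the value changes with x, so it cannot equal B for all x.  */
bool
toy_maybe_ne (struct toy_poly a, long b)
{
  return a.c1 != 0 || a.c0 != b;
}

/* True if A equals the constant B for every x >= 0.  */
bool
toy_known_eq (struct toy_poly a, long b)
{
  return !toy_maybe_ne (a, b);
}
```

In this model a fixed 64-bit field gives a false "maybe not equal" result against 64 and the check falls through, while any variable-length size makes the function return early rather than relying on a constant being available.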

> +
> +  return NULL_TREE;
> +}
> +
> +/* Build and return a new VECTOR_CST of type OUT_TY using the
> +   elements of VEC_IN.  */

Might be worth saying "using repeated copies of the elements of VEC_IN".

> +static tree
> +aarch64_build_vector_cst (const_tree vec_in, tree out_ty)
> +{
> +  gcc_assert (TREE_CODE (vec_in) == VECTOR_CST
> +   && VECTOR_TYPE_P (out_ty));
> +  unsigned HOST_WIDE_INT nelts
> += VECTOR_CST_NELTS (vec_in).to_constant ();
> +
> +  tree_vector_builder vec_out (out_ty, nelts, 1);
> +  for (unsigned i = 0; i < nelts; i++)
> +vec_out.quick_push (VECTOR_CST_ELT (vec_in, i));
> +
> +  return vec_out.build ();
> +}
> +
> +/* Try to fold STMT, a call to a lowpart-operating builtin, to
> +   its highpart-operating equivalent if doing so would save
> +   unnecessary data movement instructions.
> +
> +   Return the new call if so, otherwise nullptr.  */
> +static gcall *
> +aarch64_fold_lo_call_to_hi (unsigned int fcode, gcall *stmt,
> + gimple_stmt_iterator *gsi)
> +{
> +  /* Punt until as late as possible:
> +1) By folding away BIT_FIELD_REFs we remove information about the
> +operands that may be useful to other optimizers.
> +
> +2) For simplicity, we'd like the expression
> +
> + x = BIT_FIELD_REF
> +
> +to imply that A is not a VECTOR_CST.  This assumption is unlikely
> +to hold before constant prop/folding.  */
> +  if (!(cfun->curr_properties & PROP_last_full_fold))
> +return nullptr;
> +
> +  tree builtin_hi = aarch64_get_highpart_builtin (fcode);
> +  gcc_assert (builtin_hi != NULL_TREE);
> +
> +  /* Prefer to use the highpart builtin when at least one vector
> + argument is a reference to the high half of a 128b vector, and
> + all others are VECTOR_CSTs that we can extend to 128b.  */
> +  auto_vec vec_constants;
> +  auto_vec vec_highparts;
> +  /* The arguments and signature of the new call.  */
> +  auto_vec call_args;
> +  auto_vec call_types;
> +
> +  /* The interesting args are those that differ between the lo/hi
> + builtins.  Walk the function signatures to find these.  */
> +  tree types_hi = TYPE_ARG_TYPES (TREE_TYPE (builtin_hi));
> +  tree types_lo = TYPE_ARG_TYPES (gimple_call_fntype (stmt));
> +  unsigned int argno = 0;
> +  while (types_lo != void_list_node && types_hi != void_list_node)
> +{
> +  tree type_lo = TREE_VALUE (types_lo);
> +  tree type_hi = TREE_VALUE (types_hi);
> +  tree arg = gimple_call_arg (stmt, argno);
> +  if (!types_compatible_p (type_lo, type_hi))
> + {
> +   /* Check our assumptions about this pair.  */
> +   gcc_assert (wi::to_widest (TYPE_SIZE (type_lo)) == 64
> +   && wi::to_widest (TYPE_SIZE (type_hi)) == 128);
> +
> +   tree vq = aarch64_v128_highpart_ref (arg);
> +   if (vq && is_gimple_reg (vq))
> + {
> +   vec_highparts.safe_push (argno);
> +   arg = vq;
> + }
> +   else if (TREE_CODE (arg) == VECTOR_CST)
> + vec_constants.safe_push (argno);
> +   else
> + return nullptr;
> + }
> +  call_args.safe_push (arg);
> +  call_types.safe_push (type_hi);
> +
> +  argno++;
> +  types_hi = TREE_CHAIN (types_hi);
> +  types_lo = TREE_CHAIN (types_lo);
> +}
> +  gcc_assert (types_lo == void_list_node
> +   && types_hi == void_list_node);
> +
> +  if (vec_highparts.is_empty ())
> +return nullptr;
> +
> +  /* Build and return a new call to BUILTIN_HI.  */
> +  for (auto i : vec_constants)
> +call_args[i] = aarch64_build_vector_cst (call_args[i],
> +

Re: [PATCH] tree-optimization/121059 - record loop mask when required

2025-07-14 Thread Richard Sandiford

Richard Biener  writes:
> On Mon, 14 Jul 2025, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > For loop masking we need to mask a mask AND operation with the loop
>> > mask.  The following makes sure we have a corresponding mask
>> > available.  There's no good way to distinguish loop masking from
>> > len masking here, so assume we have recorded a mask for the operands
>> > mask producers.
>> >
>> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>> >
>> >PR tree-optimization/121059
>> >* tree-vect-stmts.cc (vectorizable_operation): Record a
>> >loop mask for mask AND operations.
>> >
>> >* gcc.dg/vect/pr121059.c: New testcase.
>> 
>> Please could you revert this?  It's not the right fix.  The point of the
>> code is to opportunistically reuse loop masks that are needed by other
>> operations.  It isn't supposed to record new loop masks itself.
>
> Reverted.  I'll try to find some cycles in the next day to test the
> alternative.

Thanks!  And sorry that the intent isn't described well.

Richard

[PATCH v5 3/6] ctf: translate annotation DIEs to internal ctf

2025-07-14 Thread David Faust

Translate DW_TAG_GNU_annotation DIEs created for C attributes
btf_decl_tag and btf_type_tag into an in-memory representation in the
CTF/BTF container.  They will be output in BTF as BTF_KIND_DECL_TAG and
BTF_KIND_TYPE_TAG records.

The new CTF kinds used to represent these annotations, CTF_K_DECL_TAG
and CTF_K_TYPE_TAG, are expected to be formalized in the next version of
the CTF specification.  For now they only exist in memory as a
translation step to BTF, and are not emitted when generating CTF
information.

gcc/
* ctfc.cc (ctf_dtu_d_union_selector): Handle CTF_K_DECL_TAG and
CTF_K_TYPE_TAG.
(ctf_add_type_tag, ctf_add_decl_tag): New.
(ctf_add_variable): Return the new ctf_dvdef_ref rather than zero.
(new_ctf_container): Initialize new members.
(ctfc_delete_container): Deallocate new members.
* ctfc.h (ctf_dvdef, ctf_dvdef_t, ctf_dvdef_ref): Move forward
declarations earlier in file.
(ctf_decl_tag_t): New typedef.
(ctf_dtdef): Add ctf_decl_tag_t member to dtd_u union.
(ctf_dtu_d_union_enum): Add new CTF_DTU_D_TAG enumerator.
(ctf_container): Add ctfc_tags vector and ctfc_type_tags_map hash_map
members.
(ctf_add_type_tag, ctf_add_decl_tag): New function protos.
(ctf_add_variable): Change prototype return type to ctf_dvdef_ref.
* dwarf2ctf.cc (gen_ctf_type_tags, gen_ctf_decl_tags)
(gen_ctf_decl_tags_for_var): New static functions.
(gen_ctf_pointer_type): Handle type tags.
(gen_ctf_sou_type): Handle decl tags.
(gen_ctf_function_type): Likewise.
(gen_ctf_variable): Likewise.
(gen_ctf_function): Likewise.
(gen_ctf_type): Handle TAG_GNU_annotation DIEs.

gcc/testsuite
* gcc.dg/debug/ctf/ctf-decl-tag-1.c: New test.
* gcc.dg/debug/ctf/ctf-type-tag-1.c: New test.

include/
* ctf.h (CTF_K_DECL_TAG, CTF_K_TYPE_TAG): New defines.
---
 gcc/ctfc.cc   |  80 ++-
 gcc/ctfc.h|  43 +-
 gcc/dwarf2ctf.cc  | 135 +-
 .../gcc.dg/debug/ctf/ctf-decl-tag-1.c |  31 
 .../gcc.dg/debug/ctf/ctf-type-tag-1.c |  19 +++
 include/ctf.h |   4 +
 6 files changed, 299 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-decl-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/ctf/ctf-type-tag-1.c

diff --git a/gcc/ctfc.cc b/gcc/ctfc.cc
index 51511d69baa..49251489ae1 100644
--- a/gcc/ctfc.cc
+++ b/gcc/ctfc.cc
@@ -107,6 +107,9 @@ ctf_dtu_d_union_selector (ctf_dtdef_ref ctftype)
   return CTF_DTU_D_ARGUMENTS;
 case CTF_K_SLICE:
   return CTF_DTU_D_SLICE;
+case CTF_K_DECL_TAG:
+case CTF_K_TYPE_TAG:
+  return CTF_DTU_D_TAG;
 default:
   /* The largest member as default.  */
   return CTF_DTU_D_ARRAY;
@@ -445,6 +448,68 @@ ctf_add_reftype (ctf_container_ref ctfc, uint32_t flag, 
ctf_dtdef_ref ref,
   return dtd;
 }
 
+ctf_dtdef_ref
+ctf_add_type_tag (ctf_container_ref ctfc, uint32_t flag, const char *value,
+ ctf_dtdef_ref ref_dtd)
+{
+  ctf_dtdef_ref dtd;
+   /* Create a DTD for the tag, but do not place it in the regular types list;
+  CTF format does not (yet) encode tags.  */
+  dtd = ggc_cleared_alloc ();
+
+  dtd->dtd_name = ctf_add_string (ctfc, value, &(dtd->dtd_data.ctti_name),
+ CTF_AUX_STRTAB);
+  /* A single DW_TAG_GNU_annotation DIE may be referenced by multiple DIEs,
+ e.g. when multiple distinct types specify the same type tag.  We will
+ synthesize multiple CTF DTD records in that case, so we cannot tie them
+ all to the same key (the DW_TAG_GNU_annotation DIE) in ctfc_types.  */
+  dtd->dtd_key = NULL;
+  dtd->ref_type = ref_dtd;
+  dtd->dtd_data.ctti_info = CTF_TYPE_INFO (CTF_K_TYPE_TAG, flag, 0);
+  dtd->dtd_u.dtu_tag.ref_var = NULL; /* Not used for type tags.  */
+  dtd->dtd_u.dtu_tag.component_idx = 0; /* Not used for type tags.  */
+
+  /* Insert tag directly into the tag list.  Type ID will be assigned later.  
*/
+  vec_safe_push (ctfc->ctfc_tags, dtd);
+
+  /* Keep ctfc_aux_strlen updated.  */
+  if ((value != NULL) && strcmp (value, ""))
+ctfc->ctfc_aux_strlen += strlen (value) + 1;
+
+  return dtd;
+}
+
+ctf_dtdef_ref
+ctf_add_decl_tag (ctf_container_ref ctfc, uint32_t flag, const char *value,
+ ctf_dtdef_ref ref_dtd, uint32_t comp_idx)
+{
+   ctf_dtdef_ref dtd;
+   /* Create a DTD for the tag, but do not place it in the regular types list;
+  ctf format does not (yet) encode tags.  */
+  dtd = ggc_cleared_alloc ();
+
+  dtd->dtd_name = ctf_add_string (ctfc, value, &(dtd->dtd_data.ctti_name),
+ CTF_AUX_STRTAB);
+  /* A single DW_TAG_GNU_annotation DIE may be referenced by multiple DIEs,
+ e.g. when multiple distinct declarations specify the same decl ta

[PATCH v5 6/6] bpf: add tests for CO-RE and BTF tag interaction

2025-07-14 Thread David Faust

Add a couple of tests to ensure that BTF type/decl tags do not interfere
with generation of BPF CO-RE relocations.

gcc/testsuite/
* gcc.target/bpf/core-btf-tag-1.c: New test.
* gcc.target/bpf/core-btf-tag-2.c: New test.
---
 gcc/testsuite/gcc.target/bpf/core-btf-tag-1.c | 23 +++
 gcc/testsuite/gcc.target/bpf/core-btf-tag-2.c | 23 +++
 2 files changed, 46 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-btf-tag-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-btf-tag-2.c

diff --git a/gcc/testsuite/gcc.target/bpf/core-btf-tag-1.c 
b/gcc/testsuite/gcc.target/bpf/core-btf-tag-1.c
new file mode 100644
index 000..bd0fb3e40be
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/core-btf-tag-1.c
@@ -0,0 +1,23 @@
+/* Test that BTF type tags do not interfere with CO-RE relocations.  */
+
+/* { dg-do compile } */
+/* { dg-options "-gbtf -dA -mco-re" } */
+
+struct bpf_cpumask {
+  int i;
+  char c;
+} __attribute__((preserve_access_index));
+
+struct kptr_nested {
+   struct bpf_cpumask * __attribute__((btf_type_tag("kptr"))) mask;
+} __attribute__((preserve_access_index));
+
+void foo (struct kptr_nested *nested)
+{
+  if (nested && nested->mask)
+nested->mask->i = 5;
+}
+
+/* { dg-final { scan-assembler-times "bpfcr_insn" 3 } } */
+/* { dg-final { scan-assembler-times "bpfcr_type \\(struct" 3 } } */
+/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"0:0\"\\)" 3 } } */
diff --git a/gcc/testsuite/gcc.target/bpf/core-btf-tag-2.c 
b/gcc/testsuite/gcc.target/bpf/core-btf-tag-2.c
new file mode 100644
index 000..6654ffe3ae0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/core-btf-tag-2.c
@@ -0,0 +1,23 @@
+/* Test that BTF decl tags do not interfere with CO-RE relocations.  */
+
+/* { dg-do compile } */
+/* { dg-options "-gbtf -dA -mco-re" } */
+
+struct bpf_cpumask {
+  int i;
+  char c;
+} __attribute__((preserve_access_index));
+
+struct kptr_nested {
+   struct bpf_cpumask * mask __attribute__((btf_decl_tag ("decltag")));
+} __attribute__((preserve_access_index));
+
+void foo (struct kptr_nested *nested __attribute__((btf_decl_tag ("foo"))))
+{
+  if (nested && nested->mask)
+nested->mask->i = 5;
+}
+
+/* { dg-final { scan-assembler-times "bpfcr_insn" 3 } } */
+/* { dg-final { scan-assembler-times "bpfcr_type \\(struct" 3 } } */
+/* { dg-final { scan-assembler-times "bpfcr_astr_off \\(\"0:0\"\\)" 3 } } */
-- 
2.47.2

[PATCH v5 0/6] c, dwarf, btf: Add btf_decl_tag and btf_type_tag C attributes

2025-07-14 Thread David Faust

[v4: https://gcc.gnu.org/pipermail/gcc-patches/2025-June/686373.html
 Changes from v4:
 - (Changes only in patches 1 and 2, based on Richard's reviews and
following discussion.)
 - Update the attribute handler for btf_type_tag, to ensure a variant type
   is always created, if not ATTR_FLAG_TYPE_IN_PLACE.  Very similar to aligned
   attribute.
 - Rework the handling for btf_type_tag in modified_type_die.  The type_tag
   attribute is now passed as a separate argument and dealt with similarly
   to cv-quals.  This makes it possible to delete the remove_attribute calls
   present in prior versions of the patch, and avoids tampering with the tree
   at all while generating DWARF for btf_type_tags.
 - Do not try to reverse engineer whether a btf_type_tag attribute appeared
   on one side or the other of a typedef in the source, when generating DWARF
   for the use of a typedef'd type with additional btf_type_tags.  This was
   introduced in v4 but after discussion we decided that it is not good.
   First, because we still cannot always exactly determine where the attribute
   was in the input source.  Second, it is better to reflect in DWARF the tree
   node we do have, even though it is imperfect in this case.
   The proper fix for this will likely require saving some additional
   information in the tree, and will be addressed in future patch(es).
   See the discussion on v4 patch 2.
 - Update the added test dwarf-btf-type-tag-7 according to above bullet, and
   include a note about this situation.  ]

This patch series adds support for the btf_decl_tag and btf_type_tag attributes
to GCC. This entails:

- Two new C-family attributes that make it possible to associate (to "tag")
  particular declarations and types with arbitrary strings. As explained below,
  this is intended to be used to, for example, characterize certain pointer
  types.  A single declaration or type may have multiple occurrences of these
  attributes.

- The conveyance of that information in the DWARF output in the form of a new
  DIE: DW_TAG_GNU_annotation, and a new attribute: DW_AT_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new
  kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. These BTF
  kinds are already supported by LLVM and other tools in the BPF ecosystem.

Both of these attributes are already supported by clang, and are already being
used in various ways by BPF support inside the Linux kernel.

It is worth noting that while the Linux kernel and BPF/BTF is the motivating use
case of this feature, the format of the new DWARF extension is generic.  This
work could be easily adapted to provide a general way for program authors to
annotate types and declarations with arbitrary information for any
post-compilation analysis needs, not just the Linux kernel BPF verifier.  For
example, these annotations could be used to aid in ABI analysis.
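
As a minimal sketch of how the two attributes might appear in user code: the macro names TYPE_TAG and DECL_TAG, the tag strings, and the function below are all hypothetical, and the __has_attribute guards are only there so that the snippet also builds with compilers that lack the attributes.

```c
#include <stddef.h>

/* Treat every attribute as unsupported if the compiler has no
   __has_attribute.  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Hypothetical wrapper macros; they expand to nothing when the
   attribute is not available.  */
#if __has_attribute (btf_type_tag)
# define TYPE_TAG(s) __attribute__ ((btf_type_tag (s)))
#else
# define TYPE_TAG(s)
#endif

#if __has_attribute (btf_decl_tag)
# define DECL_TAG(s) __attribute__ ((btf_decl_tag (s)))
#else
# define DECL_TAG(s)
#endif

/* A declaration tag attached to a variable.  */
int counter DECL_TAG ("example_decl") = 0;

/* A type tag attached to the inner pointer type of the parameter.  The
   tag does not change code generation; the function simply counts the
   entries of a NULL-terminated vector.  */
size_t
count_args (char * TYPE_TAG ("user") *argv)
{
  size_t n = 0;
  while (argv[n] != NULL)
    n++;
  return n;
}
```

With a compiler that supports the attributes, the tag strings would be expected to show up only in the debug info (DWARF and/or BTF), not in the generated code.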

Purpose
===

1)  Addition of C-family language constructs (attributes) to specify free-text
tags on certain language elements, such as struct fields.

The purpose of these annotations is to provide additional information about
types, variables, and function parameters of interest to the kernel. A
driving use case is to tag pointer types within the Linux kernel and BPF
programs with additional semantic information, such as '__user' or '__rcu'.

For example, consider the Linux kernel function do_execve with the
following declaration:

  static int do_execve(struct filename *filename,
 const char __user *const __user *__argv,
 const char __user *const __user *__envp);

Here, __user could be defined with these annotations to record semantic
information about the pointer parameters (e.g., they are user-provided) in
DWARF and BTF information. Other kernel facilities such as the BPF verifier
can read the tags and make use of the information.

2)  Conveying the tags in the generated DWARF debug info.

The main motivation for emitting the tags in DWARF is that the Linux kernel
generates its BTF information via pahole, using DWARF as a source:

    +--------+  BTF                  BTF   +----------+
    | pahole |-------> vmlinux.btf ------->| verifier |
    +--------+                             +----------+
        ^                                        ^
        |                                        |
  DWARF |                                    BTF |
        |                                        |
     vmlinux                              +-------------+
     module1.ko                           | BPF program |
     module2.ko                           +-------------+
       ...

This is because:

a)  Unlike GCC, LLVM will only generate BTF for BPF programs.

b)  GCC can generate BTF for whatever target with -gbtf, but there is no
support for linking/deduplicating BTF in the linker.

c)  pahole injects additional BTF inform

[PATCH v5 1/6] c-family: add btf_type_tag and btf_decl_tag attributes

2025-07-14 Thread David Faust

Add two new c-family attributes, "btf_type_tag" and "btf_decl_tag"
along with attribute handlers for them.

gcc/c-family/
* c-attribs.cc (c_common_attribute_table): Add btf_decl_tag and
btf_type_tag attributes.
(handle_btf_decl_tag_attribute): New handler for btf_decl_tag.
(handle_btf_type_tag_attribute): New handler for btf_type_tag.
---
 gcc/c-family/c-attribs.cc | 80 ++-
 1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 5a0e3d328ba..7183ba4b279 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -189,6 +189,9 @@ static tree handle_fd_arg_attribute (tree *, tree, tree, 
int, bool *);
 static tree handle_flag_enum_attribute (tree *, tree, tree, int, bool *);
 static tree handle_null_terminated_string_arg_attribute (tree *, tree, tree, 
int, bool *);
 
+static tree handle_btf_decl_tag_attribute (tree *, tree, tree, int, bool *);
+static tree handle_btf_type_tag_attribute (tree *, tree, tree, int, bool *);
+
 /* Helper to define attribute exclusions.  */
 #define ATTR_EXCL(name, function, type, variable)  \
   { name, function, type, variable }
@@ -640,7 +643,11 @@ const struct attribute_spec c_common_gnu_attributes[] =
   { "flag_enum", 0, 0, false, true, false, false,
  handle_flag_enum_attribute, NULL },
   { "null_terminated_string_arg", 1, 1, false, true, true, false,
- handle_null_terminated_string_arg_attribute, NULL}
+ handle_null_terminated_string_arg_attribute, 
NULL},
+  { "btf_type_tag",  1, 1, false, true, false, false,
+ handle_btf_type_tag_attribute, NULL},
+  { "btf_decl_tag",  1, 1, true, false, false, false,
+ handle_btf_decl_tag_attribute, NULL}
 };
 
 const struct scoped_attribute_specs c_common_gnu_attribute_table =
@@ -5101,6 +5108,77 @@ handle_null_terminated_string_arg_attribute (tree *node, 
tree name, tree args,
   return NULL_TREE;
 }
 
+/* Handle the "btf_decl_tag" attribute.  */
+
+static tree
+handle_btf_decl_tag_attribute (tree * ARG_UNUSED (node), tree name, tree args,
+  int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  if (!args)
+*no_add_attrs = true;
+  else if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
+{
+  error ("%qE attribute requires a string", name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
+/* Handle the "btf_type_tag" attribute.  */
+
+static tree
+handle_btf_type_tag_attribute (tree *node, tree name, tree args,
+  int flags, bool *no_add_attrs)
+{
+  if (!args)
+*no_add_attrs = true;
+  else if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
+{
+  error ("%qE attribute requires a string", name);
+  *no_add_attrs = true;
+}
+
+  /* Ensure a variant type is always created to hold the type_tag,
+ unless ATTR_FLAG_TYPE_IN_PLACE is set.  Same logic as in
+ common_handle_aligned_attribute.  */
+  tree decl = NULL_TREE;
+  tree *type = NULL;
+  bool is_type = false;
+
+  if (DECL_P (*node))
+{
+  decl = *node;
+  type = &TREE_TYPE (decl);
+  is_type = TREE_CODE (*node) == TYPE_DECL;
+}
+  else if (TYPE_P (*node))
+type = node, is_type = true;
+
+  if (is_type)
+{
+  if ((flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
+   /* OK, modify the type in place.  */;
+
+  /* If we have a TYPE_DECL, then copy the type, so that we
+don't accidentally modify a builtin type.  See pushdecl.  */
+  else if (decl && TREE_TYPE (decl) != error_mark_node
+  && DECL_ORIGINAL_TYPE (decl) == NULL_TREE)
+   {
+ tree tt = TREE_TYPE (decl);
+ *type = build_variant_type_copy (*type);
+ DECL_ORIGINAL_TYPE (decl) = tt;
+ TYPE_NAME (*type) = decl;
+ TREE_USED (*type) = TREE_USED (decl);
+ TREE_TYPE (decl) = *type;
+   }
+  else
+   *type = build_variant_type_copy (*type);
+}
+
+  return NULL_TREE;
+}
+
 /* Handle the "nonstring" variable attribute.  */
 
 static tree
-- 
2.47.2

[PATCH v5 5/6] doc: document btf_type_tag and btf_decl_tag attributes

2025-07-14 Thread David Faust

gcc/
* doc/extend.texi (Common Function Attributes)
(Common Variable Attributes): Document btf_decl_tag attribute.
(Common Type Attributes): Document btf_type_tag attribute.
---
 gcc/doc/extend.texi | 79 +
 1 file changed, 79 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e4f3cc2ad09..f3d5ac050db 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1971,6 +1971,13 @@ declares that @code{my_alloc1} returns 16-byte aligned 
pointers and
 that @code{my_alloc2} returns a pointer whose value modulo 32 is equal
 to 8.
 
+@cindex @code{btf_decl_tag} function attribute
+@item btf_decl_tag
+The @code{btf_decl_tag} attribute may be used to associate function
+declarations with arbitrary strings by recording those strings in DWARF
+and/or BTF information in the same way that it is used for variables.
+See @ref{Common Variable Attributes}.
+
 @cindex @code{cold} function attribute
 @item cold
 The @code{cold} attribute on functions is used to inform the compiler that
@@ -7102,6 +7109,41 @@ align them on any target.
 The @code{aligned} attribute can also be used for functions
 (@pxref{Common Function Attributes}.)
 
+@cindex @code{btf_decl_tag} variable attribute
+@item btf_decl_tag (@var{argument})
+The @code{btf_decl_tag} attribute may be used to associate variable
+declarations, struct or union member declarations, function
+declarations, and function parameter declarations with arbitrary strings.
+These strings are not interpreted by the compiler in any way, and have
+no effect on code generation.  Instead, these user-provided strings
+are recorded in DWARF (via @code{DW_AT_GNU_annotation} and
+@code{DW_TAG_GNU_annotation} extensions) and BTF information (via
+@code{BTF_KIND_DECL_TAG} records), and associated to the attributed
+declaration.  If neither DWARF nor BTF information is generated, the
+attribute has no effect.
+
+The argument is treated as an ordinary string in the source language
+with no additional special rules.
+
+The attribute may be supplied multiple times for a single declaration,
+in which case each distinct argument string will be recorded in a
+separate DIE or BTF record, each associated to the declaration.  For
+a single declaration with multiple @code{btf_decl_tag} attributes,
+the order of the @code{DW_TAG_GNU_annotation} DIEs produced is not
+guaranteed to maintain the order of attributes in the source code.
+
+For example:
+
+@smallexample
+int *foo __attribute__ ((btf_decl_tag ("__percpu")));
+@end smallexample
+
+@noindent
+when compiled with @code{-gbtf} results in an additional
+@code{BTF_KIND_DECL_TAG} BTF record to be emitted in the BTF info,
+associating the string ``__percpu'' with the @code{BTF_KIND_VAR}
+record for the variable ``foo''.
+
 @cindex @code{counted_by} variable attribute
 @item counted_by (@var{count})
 The @code{counted_by} attribute may be attached to the C99 flexible array
@@ -8291,6 +8333,43 @@ is given by the product of arguments 1 and 2, and that
 @code{malloc_type}, like the standard C function @code{malloc},
 returns an object whose size is given by argument 1 to the function.
 
+@cindex @code{btf_type_tag} type attribute
+@item btf_type_tag (@var{argument})
+The @code{btf_type_tag} attribute may be used to associate (to ``tag'')
+particular types with arbitrary string annotations.  These annotations
+are recorded in debugging info by supported debug formats, currently
+DWARF (via @code{DW_AT_GNU_annotation} and @code{DW_TAG_GNU_annotation}
+extensions) and BTF (via @code{BTF_KIND_TYPE_TAG} records).  These
+annotation strings are not interpreted by the compiler in any way, and
+have no effect on code generation.  If neither DWARF nor BTF
+information is generated, the attribute has no effect.
+
+The argument is treated as an ordinary string in the source language
+with no additional special rules.
+
+The attribute may be supplied multiple times for a single type, in
+which case each distinct argument string will be recorded in a
+separate DIE or BTF record, each associated to the type.  For a single
+type with multiple @code{btf_type_tag} attributes, the order of the
+@code{DW_TAG_GNU_annotation} DIEs produced is not guaranteed to
+maintain the order of attributes in the source code.
+
+For example, the following code:
+
+@smallexample
+int * __attribute__ ((btf_type_tag ("__user"))) foo;
+@end smallexample
+
+@noindent
+when compiled with @code{-gbtf}, results in an additional
+@code{BTF_KIND_TYPE_TAG} BTF record being emitted in the BTF info,
+associating the string ``__user'' with the normal @code{BTF_KIND_PTR}
+record for the pointer-to-integer type used in the declaration.
+
+Note that the BTF format currently only has a representation for type
+tags associated with pointer types.  Type tags on non-pointer types
+may be silently skipped when generating BTF.
+
 @cindex @code{copy} type attribute
 @item copy
 @itemx copy (@var{expression})
-- 
2.47.2

[PATCH v5 4/6] btf: generate and output DECL_TAG and TYPE_TAG records

2025-07-14 Thread David Faust

Support the btf_decl_tag and btf_type_tag attributes in BTF by creating
and emitting BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG records,
respectively, for them.

Some care is required when -gprune-btf is in effect to avoid emitting
decl or type tags for declarations or types which have been pruned and
will not be emitted in BTF.

gcc/
* btfout.cc (get_btf_kind): Handle DECL_TAG and TYPE_TAG kinds.
(btf_calc_num_vbytes): Likewise.
(btf_asm_type): Likewise.
(output_asm_btf_vlen_bytes): Likewise.
(output_btf_tags): New.
(btf_output): Call it here.
(btf_add_used_type): Replace with simple wrapper around...
(btf_add_used_type_1): ...the implementation.  Handle
BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
(btf_add_vars): Update btf_add_used_type call.
(btf_assign_tag_ids): New.
(btf_mark_type_used): Update btf_add_used_type call.
(btf_collect_pruned_types): Likewise.  Handle type and decl tags.
(btf_finish): Call btf_assign_tag_ids.

gcc/testsuite/
* gcc.dg/debug/btf/btf-decl-tag-1.c: New test.
* gcc.dg/debug/btf/btf-decl-tag-2.c: New test.
* gcc.dg/debug/btf/btf-decl-tag-3.c: New test.
* gcc.dg/debug/btf/btf-decl-tag-4.c: New test.
* gcc.dg/debug/btf/btf-type-tag-1.c: New test.
* gcc.dg/debug/btf/btf-type-tag-2.c: New test.
* gcc.dg/debug/btf/btf-type-tag-3.c: New test.
* gcc.dg/debug/btf/btf-type-tag-4.c: New test.
* gcc.dg/debug/btf/btf-type-tag-c2x-1.c: New test.

include/
* btf.h (BTF_KIND_DECL_TAG, BTF_KIND_TYPE_TAG): New defines.
(struct btf_decl_tag): New.
---
 gcc/btfout.cc | 171 +++---
 .../gcc.dg/debug/btf/btf-decl-tag-1.c |  14 ++
 .../gcc.dg/debug/btf/btf-decl-tag-2.c |  22 +++
 .../gcc.dg/debug/btf/btf-decl-tag-3.c |  22 +++
 .../gcc.dg/debug/btf/btf-decl-tag-4.c |  34 
 .../gcc.dg/debug/btf/btf-type-tag-1.c |  26 +++
 .../gcc.dg/debug/btf/btf-type-tag-2.c |  13 ++
 .../gcc.dg/debug/btf/btf-type-tag-3.c |  28 +++
 .../gcc.dg/debug/btf/btf-type-tag-4.c |  24 +++
 .../gcc.dg/debug/btf/btf-type-tag-c2x-1.c |  22 +++
 include/btf.h |  14 ++
 11 files changed, 366 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decl-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-type-tag-c2x-1.c

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index ff7ea42a961..c00e0c98015 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -141,6 +141,8 @@ get_btf_kind (uint32_t ctf_kind)
 case CTF_K_VOLATILE: return BTF_KIND_VOLATILE;
 case CTF_K_CONST:return BTF_KIND_CONST;
 case CTF_K_RESTRICT: return BTF_KIND_RESTRICT;
+case CTF_K_DECL_TAG: return BTF_KIND_DECL_TAG;
+case CTF_K_TYPE_TAG: return BTF_KIND_TYPE_TAG;
 default:;
 }
   return BTF_KIND_UNKN;
@@ -217,6 +219,7 @@ btf_calc_num_vbytes (ctf_dtdef_ref dtd)
 case BTF_KIND_CONST:
 case BTF_KIND_RESTRICT:
 case BTF_KIND_FUNC:
+case BTF_KIND_TYPE_TAG:
 /* These kinds have no vlen data.  */
   break;
 
@@ -256,6 +259,10 @@ btf_calc_num_vbytes (ctf_dtdef_ref dtd)
   vlen_bytes += vlen * sizeof (struct btf_var_secinfo);
   break;
 
+case BTF_KIND_DECL_TAG:
+  vlen_bytes += sizeof (struct btf_decl_tag);
+  break;
+
 default:
   break;
 }
@@ -452,6 +459,20 @@ btf_asm_type (ctf_dtdef_ref dtd)
 and should write 0.  */
   dw2_asm_output_data (4, 0, "(unused)");
   return;
+case BTF_KIND_DECL_TAG:
+  {
+   if (dtd->ref_type)
+ break;
+   else if (dtd->dtd_u.dtu_tag.ref_var)
+ {
+   /* ref_type is NULL for decl tag attached to a variable.  */
+   ctf_dvdef_ref dvd = dtd->dtd_u.dtu_tag.ref_var;
+   dw2_asm_output_data (4, dvd->dvd_id,
+"btt_type: (BTF_KIND_VAR '%s')",
+dvd->dvd_name);
+   return;
+ }
+  }
 default:
   break;
 }
@@ -801,6 +822,12 @@ output_asm_btf_vlen_bytes (ctf_container_ref ctfc, ctf_dtdef_ref dtd)
 at this point.  */
   gcc_unreachable ();
 
+case BTF_KIND_DECL_TAG:
+  dw2_asm_output_data (4, dtd->dtd_u.dtu_tag.component_idx,
+  "component_idx=%d",
+  dtd->dtd_u.dtu_tag.component_idx);
+  break;
+

[PATCH v5 2/6] dwarf: create annotation DIEs for btf tags

2025-07-14 Thread David Faust

The btf_decl_tag and btf_type_tag attributes provide a means to annotate
declarations and types respectively with arbitrary user provided
strings.  These strings are recorded in debug information for
post-compilation uses, and despite the name they are meant to be
recorded in DWARF as well as BTF.  New DWARF extensions
DW_TAG_GNU_annotation and DW_AT_GNU_annotation are used to represent
these user annotations in DWARF.

This patch introduces the new DWARF extension DIE and attribute, and
generates them as necessary to represent user annotations from
btf_decl_tag and btf_type_tag.

The format of the new DIE is as follows:

DW_TAG_GNU_annotation
DW_AT_name: "btf_decl_tag" or "btf_type_tag"
DW_AT_const_value: 
DW_AT_GNU_annotation: 

DW_AT_GNU_annotation is a new attribute extension used to refer to these
new annotation DIEs.  If non-null in any given declaration or type DIE,
it is a reference to a DW_TAG_GNU_annotation DIE holding an annotation
for that declaration or type.  In addition, the DW_TAG_GNU_annotation
DIEs may also have a non-null DW_AT_GNU_annotation, referring to another
annotation DIE.  This allows chains of annotation DIEs to be formed,
such as in the case where a single declaration has multiple instances of
btf_decl_tag with different string annotations.
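
As a rough illustration of the multiple-annotation case, the following C sketch puts two btf_decl_tag attributes on a single declaration.  The DECL_TAG macro, the tag strings and the variable names are invented for this example; the macro expands to nothing on compilers that do not recognize the attribute.  With this patch one would expect each string to get its own DW_TAG_GNU_annotation DIE, chained through DW_AT_GNU_annotation, in no guaranteed order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: only use btf_decl_tag where the compiler knows it.  */
#if defined(__has_attribute)
# if __has_attribute(btf_decl_tag)
#  define DECL_TAG(str) __attribute__ ((btf_decl_tag (str)))
# endif
#endif
#ifndef DECL_TAG
# define DECL_TAG(str)
#endif

static int backing = 7;

/* One declaration carrying two annotations (tag strings are made up).  */
int *foo DECL_TAG ("tag_one") DECL_TAG ("tag_two") = &backing;

/* The tags are debug-info only, so ordinary accesses are unaffected.  */
int
read_foo (void)
{
  assert (foo != NULL);
  return *foo;
}
```

Compiling this with -g (or -gbtf) and inspecting the debug info is one way to see the annotation records; the generated code itself should be the same with or without the tags.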

gcc/
* dwarf2out.cc (struct annotation_node, struct annotation_node_hasher)
(btf_tag_htab): New ancillary structures and hash table.
(annotation_node_hasher::hash, annotation_node_hasher::equal): New.
(hash_btf_tag, gen_btf_tag_dies, gen_btf_type_tag_dies)
(maybe_gen_btf_type_tag_dies, maybe_gen_btf_decl_tag_dies): New functions.
(modified_type_die): Add new argument to pass btf_type_tag attribute.
Handle btf_type_tag, and update recursive calls.
(base_type_for_mode): Add new arg for modified_type_die call.
(add_type_attribute): Likewise.
(gen_array_type_die): Call maybe_gen_btf_type_tag_dies for the type.
(gen_formal_parameter_die): Call maybe_gen_btf_decl_tag_dies for the
parameter.
(override_type_for_decl_p): Add new arg for modified_type_die call.
(force_type_die): Likewise.
(gen_tagged_type_die): Call maybe_gen_btf_type_tag_dies for the type.
(gen_decl_die): Call maybe_gen_btf_decl_tag_dies for the decl.
(dwarf2out_finish): Empty btf_tag_htab.
(dwarf2out_cc_finalize): Delete btf_tag_htab hash table.

include/
* dwarf2.def (DW_TAG_GNU_annotation): New DWARF extension.
(DW_AT_GNU_annotation): Likewise.

gcc/testsuite/
* gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-1.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-2.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-3.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-1.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-2.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-3.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-4.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-5.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-6.c: New test.
* gcc.dg/debug/dwarf2/dwarf-btf-type-tag-7.c: New test.
---
 gcc/dwarf2out.cc  | 337 --
 .../debug/dwarf2/dwarf-btf-decl-tag-1.c   |  11 +
 .../debug/dwarf2/dwarf-btf-decl-tag-2.c   |  25 ++
 .../debug/dwarf2/dwarf-btf-decl-tag-3.c   |  21 ++
 .../debug/dwarf2/dwarf-btf-type-tag-1.c   |  10 +
 .../debug/dwarf2/dwarf-btf-type-tag-2.c   |  31 ++
 .../debug/dwarf2/dwarf-btf-type-tag-3.c   |  15 +
 .../debug/dwarf2/dwarf-btf-type-tag-4.c   |  34 ++
 .../debug/dwarf2/dwarf-btf-type-tag-5.c   |  10 +
 .../debug/dwarf2/dwarf-btf-type-tag-6.c   |  27 ++
 .../debug/dwarf2/dwarf-btf-type-tag-7.c   |  25 ++
 include/dwarf2.def|   4 +
 12 files changed, 527 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-decl-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-5.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-6.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/dwarf-btf-type-tag-7.c

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index d1a55dbcbcb..20007f439a2 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -3696,6 +3696,33 @@ static bool frame_pointer_fb_offset_valid;
 
 static vec base_types;
 
+

[PATCH v4] RISC-V: Mips P8700 Conditional Move Support.

2025-07-14 Thread Umesh Kalappa

Jeff, thank you for the info about the issue with the testcase.

Updated the testcase (long long is not supported in C90) and fixed lint warnings.

gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_CORE): Updated the supported march.
* config/riscv/riscv-ext-mips.def (DEFINE_RISCV_EXT):
New file added for mips conditional mov extension.
* config/riscv/riscv-ext.def: Likewise.
* config/riscv/t-riscv: Generates riscv-ext.opt.
* config/riscv/riscv-ext.opt: Generated file.
* config/riscv/riscv.cc (riscv_expand_conditional_move): Updated for mips
cmov and outlined some code that handles arch cond move.
* config/riscv/riscv.md (movcc): Updated expand for MIPS CCMOV.
* config/riscv/mips-insn.md: New file for mips-p8700 ccmov insn.
* gcc/doc/riscv-ext.texi: Updated for mips cmov.
---
 gcc/config/riscv/mips-insn.md|  36 +++
 gcc/config/riscv/riscv-cores.def |   3 +-
 gcc/config/riscv/riscv-ext-mips.def  |  35 ++
 gcc/config/riscv/riscv-ext.def   |   1 +
 gcc/config/riscv/riscv-ext.opt   |   4 +
 gcc/config/riscv/riscv.cc| 107 +--
 gcc/config/riscv/riscv.md|   3 +-
 gcc/config/riscv/t-riscv |   3 +-
 gcc/doc/riscv-ext.texi   |   4 +
 gcc/testsuite/gcc.target/riscv/mipscondmov.c |  28 +
 10 files changed, 187 insertions(+), 37 deletions(-)
 create mode 100644 gcc/config/riscv/mips-insn.md
 create mode 100644 gcc/config/riscv/riscv-ext-mips.def
 create mode 100644 gcc/testsuite/gcc.target/riscv/mipscondmov.c

diff --git a/gcc/config/riscv/mips-insn.md b/gcc/config/riscv/mips-insn.md
new file mode 100644
index 000..de53638d587
--- /dev/null
+++ b/gcc/config/riscv/mips-insn.md
@@ -0,0 +1,36 @@
+;; Machine description for MIPS custom instructions.
+;; Copyright (C) 2025 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_insn "*movcc_bitmanip"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (if_then_else:GPR
+ (any_eq:X (match_operand:X 1 "register_operand" "r")
+(match_operand:X 2 "const_0_operand" "J"))
+(match_operand:GPR 3 "reg_or_0_operand" "rJ")
+(match_operand:GPR 4 "reg_or_0_operand" "rJ")))]
+  "TARGET_XMIPSCMOV"
+{
+  enum rtx_code code = ;
+  if (code == NE)
+return "mips.ccmov\t%0,%1,%z3,%z4";
+  else
+return "mips.ccmov\t%0,%1,%z4,%z3";
+}
+[(set_attr "type" "condmove")
+ (set_attr "mode" "")])
diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 2096c0095d4..98f347034fb 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -169,7 +169,6 @@ RISCV_CORE("xiangshan-kunminghu",   "rv64imafdcbvh_sdtrig_sha_shcounterenw_"
  "zvfhmin_zvkt_zvl128b_zvl32b_zvl64b",
  "xiangshan-kunminghu")
 
-RISCV_CORE("mips-p8700",   "rv64imafd_zicsr_zmmul_"
- "zaamo_zalrsc_zba_zbb",
+RISCV_CORE("mips-p8700",  "rv64imfd_zicsr_zifencei_zalrsc_zba_zbb",
  "mips-p8700")
 #undef RISCV_CORE
diff --git a/gcc/config/riscv/riscv-ext-mips.def b/gcc/config/riscv/riscv-ext-mips.def
new file mode 100644
index 000..5d7836d5999
--- /dev/null
+++ b/gcc/config/riscv/riscv-ext-mips.def
@@ -0,0 +1,35 @@
+/* MIPS extension definition file for RISC-V.
+   Copyright (C) 2025 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.
+
+Please run `make riscv-regen` in build folder to make sure updated anything.
+
+Format of DEFINE_RISCV_EXT, please refer to riscv-ext.def.  */
+
+DEFINE_RISCV_EXT (
+  /* NAME.  */ xmipscmov,

[PATCH v4] RISC-V: Mips P8700 Conditional Move Support.

2025-07-14 Thread Umesh Kalappa

Please ignore the previous changes.

Jeff, thank you for the info about the issue with the testcase.

Updated the testcase (long long is not supported in C90) and fixed lint warnings.

gcc/ChangeLog:

* config/riscv/riscv-cores.def (RISCV_CORE): Updated the supported march.
* config/riscv/riscv-ext-mips.def (DEFINE_RISCV_EXT):
New file added for mips conditional mov extension.
* config/riscv/riscv-ext.def: Likewise.
* config/riscv/t-riscv: Generates riscv-ext.opt.
* config/riscv/riscv-ext.opt: Generated file.
* config/riscv/riscv.cc (riscv_expand_conditional_move): Updated for mips
cmov and outlined some code that handles arch cond move.
* config/riscv/riscv.md (movcc): Updated expand for MIPS CCMOV.
* config/riscv/mips-insn.md: New file for mips-p8700 ccmov insn.
* gcc/doc/riscv-ext.texi: Updated for mips cmov.
---
 gcc/config/riscv/mips-insn.md|  36 +++
 gcc/config/riscv/riscv-cores.def |   3 +-
 gcc/config/riscv/riscv-ext-mips.def  |  35 ++
 gcc/config/riscv/riscv-ext.def   |   1 +
 gcc/config/riscv/riscv-ext.opt   |   4 +
 gcc/config/riscv/riscv.cc| 107 +--
 gcc/config/riscv/riscv.md|   3 +-
 gcc/config/riscv/t-riscv |   3 +-
 gcc/doc/riscv-ext.texi   |   4 +
 gcc/testsuite/gcc.target/riscv/mipscondmov.c |  28 +
 10 files changed, 187 insertions(+), 37 deletions(-)
 create mode 100644 gcc/config/riscv/mips-insn.md
 create mode 100644 gcc/config/riscv/riscv-ext-mips.def
 create mode 100644 gcc/testsuite/gcc.target/riscv/mipscondmov.c

diff --git a/gcc/config/riscv/mips-insn.md b/gcc/config/riscv/mips-insn.md
new file mode 100644
index 000..de53638d587
--- /dev/null
+++ b/gcc/config/riscv/mips-insn.md
@@ -0,0 +1,36 @@
+;; Machine description for MIPS custom instructions.
+;; Copyright (C) 2025 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_insn "*movcc_bitmanip"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (if_then_else:GPR
+ (any_eq:X (match_operand:X 1 "register_operand" "r")
+(match_operand:X 2 "const_0_operand" "J"))
+(match_operand:GPR 3 "reg_or_0_operand" "rJ")
+(match_operand:GPR 4 "reg_or_0_operand" "rJ")))]
+  "TARGET_XMIPSCMOV"
+{
+  enum rtx_code code = ;
+  if (code == NE)
+return "mips.ccmov\t%0,%1,%z3,%z4";
+  else
+return "mips.ccmov\t%0,%1,%z4,%z3";
+}
+[(set_attr "type" "condmove")
+ (set_attr "mode" "")])
diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 2096c0095d4..98f347034fb 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -169,7 +169,6 @@ RISCV_CORE("xiangshan-kunminghu",   "rv64imafdcbvh_sdtrig_sha_shcounterenw_"
  "zvfhmin_zvkt_zvl128b_zvl32b_zvl64b",
  "xiangshan-kunminghu")
 
-RISCV_CORE("mips-p8700",   "rv64imafd_zicsr_zmmul_"
- "zaamo_zalrsc_zba_zbb",
+RISCV_CORE("mips-p8700",  "rv64imfd_zicsr_zifencei_zalrsc_zba_zbb",
  "mips-p8700")
 #undef RISCV_CORE
diff --git a/gcc/config/riscv/riscv-ext-mips.def b/gcc/config/riscv/riscv-ext-mips.def
new file mode 100644
index 000..5d7836d5999
--- /dev/null
+++ b/gcc/config/riscv/riscv-ext-mips.def
@@ -0,0 +1,35 @@
+/* MIPS extension definition file for RISC-V.
+   Copyright (C) 2025 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.
+
+Please run `make riscv-regen` in build folder to make sure updated anything.
+
+Format of DEFINE_RISCV_EXT, please refer to riscv-ext.def.  */
+
+DEFIN

[committed] libstdc++: Add comments to deleted std::swap overloads for LWG 2766

2025-07-14 Thread Jonathan Wakely

We pre-emptively implemented part of LWG 2766, which still hasn't been
approved. Add comments to the deleted swap overloads saying why they're
there, because the standard doesn't require them.

libstdc++-v3/ChangeLog:

* include/bits/stl_pair.h (swap): Add comment to deleted
overload.
* include/bits/unique_ptr.h (swap): Likewise.
* include/std/array (swap): Likewise.
* include/std/optional (swap): Likewise.
* include/std/tuple (swap): Likewise.
* include/std/variant (swap): Likewise.
* testsuite/23_containers/array/tuple_interface/get_neg.cc:
Adjust dg-error line numbers.
---

Tested powerpc64le-linux, pushed to trunk.

 libstdc++-v3/include/bits/stl_pair.h| 2 ++
 libstdc++-v3/include/bits/unique_ptr.h  | 2 ++
 libstdc++-v3/include/std/array  | 2 ++
 libstdc++-v3/include/std/optional   | 2 ++
 libstdc++-v3/include/std/tuple  | 4 +++-
 libstdc++-v3/include/std/variant| 2 ++
 .../23_containers/array/tuple_interface/get_neg.cc  | 6 +++---
 7 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_pair.h b/libstdc++-v3/include/bits/stl_pair.h
index 8c57712b4617..393f6a016196 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -1132,6 +1132,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif // C++23
 
 #if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2766. Swapping non-swappable types
   template
 typename enable_if,
   __is_swappable<_T2>>::value>::type
diff --git a/libstdc++-v3/include/bits/unique_ptr.h b/libstdc++-v3/include/bits/unique_ptr.h
index 6ae46a93800c..d76ad63ba7bf 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -832,6 +832,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { __x.swap(__y); }
 
 #if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2766. Swapping non-swappable types
   template
 typename enable_if::value>::type
 swap(unique_ptr<_Tp, _Dp>&,
diff --git a/libstdc++-v3/include/std/array b/libstdc++-v3/include/std/array
index fdcf0b073762..12f010921db1 100644
--- a/libstdc++-v3/include/std/array
+++ b/libstdc++-v3/include/std/array
@@ -381,6 +381,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { __one.swap(__two); }
 
 #if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2766. Swapping non-swappable types
   template
 __enable_if_t::_Is_swappable::value>
 swap(array<_Tp, _Nm>&, array<_Tp, _Nm>&) = delete;
diff --git a/libstdc++-v3/include/std/optional b/libstdc++-v3/include/std/optional
index cc7af5bbd7d2..e5051d72c828 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -1740,6 +1740,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 noexcept(noexcept(__lhs.swap(__rhs)))
 { __lhs.swap(__rhs); }
 
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2766. Swapping non-swappable types
   template
 enable_if_t && is_swappable_v<_Tp>)>
 swap(optional<_Tp>&, optional<_Tp>&) = delete;
diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index b39ce710984c..2e6499eab22d 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -2835,6 +2835,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { __x.swap(__y); }
 
 #if __cpp_lib_ranges_zip // >= C++23
+  /// Exchange the values of two const tuples (if const elements can be swapped)
   template
 requires (is_swappable_v && ...)
 constexpr void
@@ -2844,7 +2845,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif // C++23
 
 #if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11
-  /// Exchange the values of two const tuples (if const elements can be swapped)
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2766. Swapping non-swappable types
   template
 _GLIBCXX20_CONSTEXPR
 typename enable_if...>::value>::type
diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index ec46ff1dabb5..2f44f9700283 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -1387,6 +1387,8 @@ namespace __detail::__variant
 noexcept(noexcept(__lhs.swap(__rhs)))
 { __lhs.swap(__rhs); }
 
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2766. Swapping non-swappable types
   template
 enable_if_t && ...)
   && (is_swappable_v<_Types> && ...))>
diff --git a/libstdc++-v3/testsuite/23_containers/array/tuple_interface/get_neg.cc b/libstdc++-v3/testsuite/23_containers/array/tuple_interface/get_neg.cc
index 25511e79941d..e1e9ce9bdac1 100644
--- a/libstdc++-v3/testsuite/23_containers/array/tuple_interface/get_neg.cc
+++ b/libstdc+

[PATCH 2/2] ifconv: Small improvement to fold_build_cond_expr; lhs and rhs being the same.

2025-07-14 Thread Andrew Pinski

This is a small compile-time optimization.  Match and simplify would generate
the same result, but when rhs and lhs are the same we can return early instead
of having to go through match and simplify.  This might not show up much
at this point, but it can/will show up after my patch for PR 119920, where we
factor out common code between the 2 sides of the if statement while in if-conv.
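
The early return can be pictured with a small standalone C model.  This is only a sketch under stated assumptions: toy_expr, toy_operand_equal and toy_build_cond are hypothetical stand-ins for GCC's trees, operand_equal_p and fold_build_cond_expr, and the counter merely marks how often the more expensive path would be taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy expression node; not GCC's tree.  */
struct toy_expr { int id; };

/* Counts how often the "expensive" path runs in this model.  */
int slow_path_calls;

/* Placeholder for the node a real conditional expression would produce.  */
struct toy_expr toy_cond_result = { -1 };

/* Stand-in for operand_equal_p: two operands match if their ids match.  */
bool
toy_operand_equal (const struct toy_expr *a, const struct toy_expr *b)
{
  return a->id == b->id;
}

/* Stand-in for fold_build_cond_expr with the early return added.  */
const struct toy_expr *
toy_build_cond (const struct toy_expr *cond, const struct toy_expr *rhs,
		const struct toy_expr *lhs)
{
  assert (cond != NULL && rhs != NULL && lhs != NULL);

  /* Both arms are the same, so the condition does not matter.  */
  if (toy_operand_equal (rhs, lhs))
    return rhs;

  /* Otherwise fall through to the (modelled) simplification machinery.  */
  slow_path_calls++;
  return &toy_cond_result;
}
```

Calling toy_build_cond with the same node for both arms returns that node without touching slow_path_calls; distinct arms take the slow path once per call.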

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-if-conv.cc (fold_build_cond_expr): Return early if lhs and rhs
are the same.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-if-conv.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index d2b9f9fe080..366e959fd77 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -494,6 +494,10 @@ fold_or_predicates (location_t loc, tree c1, tree c2)
 static tree
 fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
 {
+  /* Short cut the case where both rhs and lhs are the same. */
+  if (operand_equal_p (rhs, lhs))
+return rhs;
+
   /* If COND is comparison r != 0 and r has boolean type, convert COND
  to SSA_NAME to accept by vect bool pattern.  */
   if (TREE_CODE (cond) == NE_EXPR)
-- 
2.43.0

ACCESS_WITH_SIZE for pointers Re: [PATCH] tree-optimization/120929: Limit MEM_REF handling to .ACCESS_WITH_SIZE

2025-07-14 Thread Qing Zhao


> On Jul 7, 2025, at 13:07, Qing Zhao  wrote:
> 
> As I mentioned in the latest email I replied to the thread, the original 
> implementation of the counted_by for pointer was implemented without the 
> additional indirection. 
> But that implementation has a fundamental bug during testing.  then I changed 
> the implementation like the current. 
> 
> I need spending a little more time to find the details of that fundamental 
> bug with the original implementation. 
> 
> If the current bug is urgent to be fixed. and you are not comfortable with 
> the simple Patch Sid provided, then I am okay to back it out now and then 
> push it back with the fix to this current bug at a later time after everyone 
> is comfortable with the current implementation. 
> 
> Thanks a lot!
> 
> Qing


Hi, this is an update on the fundamental issue I mentioned above.
(I finally located the issue and am recording it here.)

1. Based on the previous discussion on how to resolve PR120929, we agreed on
the following solution:

struct S {
  int n;
  int *p __attribute__((counted_by(n)));
} *f;

When generating a call to .ACCESS_WITH_SIZE for f->p, instead of generating
 *.ACCESS_WITH_SIZE (&f->p, &f->n,...)

we should generate
 .ACCESS_WITH_SIZE (f->p, &f->n,...)

i.e., the return type and the type of the first argument of the call are the
original pointer type in this version, instead of a pointer to the original
pointer type as in the 7th version.

2. I implemented this new .ACCESS_WITH_SIZE generation for pointers in my local
workspace.  It looked fine at first.  However, during testing, I finally
located the _fundamental issue_ with this design.

This issue can be shown clearly with the following simple test case.
(Note: the numbers on the left in the following test case are line numbers.)

$ cat t1.c
  1 struct annotated {
  2   int b;
  3   int *c __attribute__ ((counted_by (b)));
  4 } *p_array_annotated;
  5
  6 void __attribute__((__noinline__)) setup (int annotated_count)
  7 {
  8   p_array_annotated
  9 = (struct annotated *) __builtin_malloc (sizeof (struct annotated));
 10   p_array_annotated->c = (int *) __builtin_malloc (annotated_count * sizeof (int));
 11   p_array_annotated->c[2] = 10;
 12   p_array_annotated->b = annotated_count;
 13   return;
 14 }
 15   
 16 int main(int argc, char *argv[])
 17 {
 18   setup (10);
 19   return 0;
 20 }

$ my-gcc t1.c -O0 -g -o ./t1.exe -fdump-tree-gimple
$ ./t1.exe
Segmentation fault (core dumped)

3. Debugging shows that the segmentation fault happens at line 11:
p_array_annotated->c[2] = 10;
because the value of the pointer "p_array_annotated->c" is 0x0.

4. Consider the gimple dump t1.c.007t.gimple, shown below:

  1 __attribute__((noinline))
  2 void setup (int annotated_count)
  3 {
  4   int * D.2969;
  5
  6   _1 = __builtin_malloc (16);
  7   p_array_annotated = _1;
  8   _2 = (long unsigned int) annotated_count;
  9   _3 = _2 * 4;
 10   p_array_annotated.0_4 = p_array_annotated;
 11   _5 = p_array_annotated.0_4->c;
 12   p_array_annotated.1_6 = p_array_annotated;
 13   _7 = &p_array_annotated.1_6->b;
 14   D.2969 = .ACCESS_WITH_SIZE (_5, _7, 0B, 4);
 15   _8 = __builtin_malloc (_3);
 16   D.2969 = _8;
 17   p_array_annotated.2_10 = p_array_annotated;
 18   _11 = p_array_annotated.2_10->c;
 19   p_array_annotated.3_12 = p_array_annotated;
 20   _13 = &p_array_annotated.3_12->b;
 21   _9 = .ACCESS_WITH_SIZE (_11, _13, 0B, 4);
 22   _14 = _9 + 8;
 23   *_14 = 10;
 24   p_array_annotated.4_15 = p_array_annotated;
 25   p_array_annotated.4_15->b = annotated_count;
 26   return;
 27 }

We can see that the root cause of this problem is that we passed the _value_ of
"p_array_annotated->c" instead of the _address_ of "p_array_annotated->c" to
.ACCESS_WITH_SIZE:

At line 11, the value of "p_array_annotated.0_4->c" is 0x0 when it is assigned
to "_5";
At line 14, the value of "_5", which is also 0x0, is passed to the call to
.ACCESS_WITH_SIZE.

Later, when we expand .ACCESS_WITH_SIZE (_5, _7, 0B, 4), we replace it with
its first argument, "_5".  As a result, the IL after the expansion looks like
the following:

11   _5 = p_array_annotated.0_4->c;
14  D.2969 = _5;
15  _8 = __builtin_malloc (_3);
16  D.2969 = _8;

We can clearly see that the above IL is wrong: p_array_annotated->c is
initialized as 0x0, and this value is passed to the pointer D.2969, which is
0x0 too.  The result of malloc is then stored only into the temporary D.2969
at line 16, so p_array_annotated->c itself is never updated and is still 0x0
when it is read again at line 18.

5. This is exactly the fundamental issue I ran into at the very beginning of my
implementation of counted_by for pointers.  I should have recorded this issue
and sent email to the alias at that time.

6. Because of this issue, I changed the .ACCESS_WITH_SIZE for pointers to

 *.ACCESS_WITH_SIZE (&f->p, &f->n,…)

at that time, and this resolved the fundamental issue.
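
The difference between the two forms can be modelled in plain C.  This is only a sketch: access_by_value and access_by_address are hypothetical stand-ins for what .ACCESS_WITH_SIZE reduces to after expansion (each just returns its first argument), and the two store functions mimic the shape of the gimple for an assignment to the annotated pointer field.

```c
#include <assert.h>
#include <stddef.h>

struct annotated
{
  int b;
  int *c;	/* Would carry counted_by (b) in the real test case.  */
};

/* Hypothetical model of the value form: returns the pointer passed in.  */
int *
access_by_value (int *ptr, int *count)
{
  (void) count;
  return ptr;
}

/* Hypothetical model of the address form: returns the field's address.  */
int **
access_by_address (int **ptr_addr, int *count)
{
  (void) count;
  return ptr_addr;
}

/* Value form on the store side: the call result lands in a temporary,
   so assigning to it never updates the field.  Returns the field.  */
int *
store_via_value (struct annotated *s, int *fresh)
{
  assert (s != NULL);
  int *tmp = access_by_value (s->c, &s->b);	/* tmp = old s->c */
  tmp = fresh;					/* only tmp changes */
  (void) tmp;
  return s->c;
}

/* Address form: the dereferenced result is the field itself, so the
   store reaches it.  Returns the field.  */
int *
store_via_address (struct annotated *s, int *fresh)
{
  assert (s != NULL);
  *access_by_address (&s->c, &s->b) = fresh;
  return s->c;
}
```

With a field that starts out as a null pointer, store_via_value leaves it null while store_via_address updates it, which matches the behaviour described for the two designs.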

In summary:

1. We still need to keep the design of .ACCESS_WITH_SIZE for pointers as in
patch version #7, i.e.:

*.ACCESS_WITH_SIZE (&f->p, &f->n,…)

2. As a side effect of th

[PATCH 1/2] ifconv: Remove unused array predicated

2025-07-14 Thread Andrew Pinski

While starting to improve if-conv, I noticed that predicated
was only being set once inside a loop and accessed right below.
It became this way due to r14-8869-g8636c538b68068.  Before that, the array
was needed since it was accessed from 2 loops, but now it is only set
and then accessed in the next statement, so there is no reason for it to be
there.  So let's remove it and use the result of is_predicated directly.
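
The shape of the cleanup can be sketched in standalone C.  Everything here is a hypothetical model: toy_is_predicated stands in for is_predicated, and counting the predicated blocks stands in for the per-block work done in combine_blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for is_predicated: odd blocks are predicated.  */
bool
toy_is_predicated (int bb_index)
{
  return bb_index % 2 == 1;
}

/* Old shape: cache each result in a heap array, then read it straight
   back in the next statement.  Returns -1 if allocation fails.  */
int
count_predicated_with_array (int num_nodes)
{
  assert (num_nodes > 0);
  bool *predicated = malloc ((size_t) num_nodes * sizeof (bool));
  if (predicated == NULL)
    return -1;

  int count = 0;
  for (int i = 0; i < num_nodes; i++)
    {
      predicated[i] = toy_is_predicated (i);
      if (predicated[i])
	count++;
    }
  free (predicated);
  return count;
}

/* New shape: test the result directly; no allocation, same answer.  */
int
count_predicated_direct (int num_nodes)
{
  assert (num_nodes > 0);
  int count = 0;
  for (int i = 0; i < num_nodes; i++)
    if (toy_is_predicated (i))
      count++;
  return count;
}
```

Both functions give the same count for any positive block count; the direct version simply drops the allocation and the matching free.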

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-if-conv.cc (combine_blocks): Remove predicated
dynamic array.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-if-conv.cc | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 636361e7c36..d2b9f9fe080 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -3004,12 +3004,10 @@ combine_blocks (class loop *loop, bool loop_versioned)
 
   /* Reset flow-sensitive info before predicating stmts or PHIs we
  might fold.  */
-  bool *predicated = XNEWVEC (bool, orig_loop_num_nodes);
   for (i = 0; i < orig_loop_num_nodes; i++)
 {
   bb = ifc_bbs[i];
-  predicated[i] = is_predicated (bb);
-  if (predicated[i])
+  if (is_predicated (bb))
{
  for (auto gsi = gsi_start_phis (bb);
   !gsi_end_p (gsi); gsi_next (&gsi))
@@ -3211,7 +3209,6 @@ combine_blocks (class loop *loop, bool loop_versioned)
 
   free (ifc_bbs);
   ifc_bbs = NULL;
-  free (predicated);
 }
 
 /* Version LOOP before if-converting it; the original loop
-- 
2.43.0

[committed] cobol: Eliminate cppcheck warnings in gcc/cobol .cc files.

2025-07-14 Thread Robert Dubner

Subject: [PATCH] cobol: Eliminate cppcheck warnings in gcc/cobol .cc
files.

These changes eliminate various cppcheck warnings, mostly involving
C-style casting and applying "const" to various variables and formal
parameters.  Some tab characters were eliminated, and some lines were
trimmed to seventy-nine characters.
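
As a rough illustration of the two kinds of change described above, the
following sketch shows a C-style cast replaced by a named cast and a pointer
parameter made const.  The function names are hypothetical and are not taken
from the gcc/cobol sources.

```cpp
// Before: C-style cast, and a parameter that is never written through.
static unsigned first_byte_before (char *buf)
{
  unsigned char *p = (unsigned char *) buf;
  return p[0];
}

// After: pointer-to-const parameter and a named C++ cast.
static unsigned first_byte_after (const char *buf)
{
  const unsigned char *p = reinterpret_cast<const unsigned char *> (buf);
  return p[0];
}
```

The behaviour is unchanged; only the intent (read-only access, explicit kind
of conversion) becomes visible to the reader and to static analysis.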

gcc/cobol/ChangeLog:

* cobol1.cc (cobol_langhook_handle_option): Eliminate cppcheck
warnings.
* dts.h: Likewise.
* except.cc (cbl_enabled_exceptions_t::dump): Likewise.
* gcobolspec.cc (lang_specific_driver): Likewise.
* genapi.cc (parser_file_merge): Likewise.
* gengen.cc (gg_unique_in_function): Likewise.
(gg_declare_variable): Likewise.
(gg_peek_fn_decl): Likewise.
(gg_define_function): Likewise.
* genmath.cc (set_up_on_exception_label): Likewise.
(set_up_compute_error_label): Likewise.
(arithmetic_operation): Likewise.
(fast_divide): Likewise.
* genutil.cc (get_and_check_refstart_and_reflen): Likewise.
(get_depending_on_value_from_odo): Likewise.
(get_data_offset): Likewise.
(get_binary_value): Likewise.
(process_this_exception): Likewise.
(copy_little_endian_into_place): Likewise.
(refer_is_clean): Likewise.
(refer_fill_depends): Likewise.
* genutil.h (process_this_exception): Likewise.
(copy_little_endian_into_place): Likewise.
(refer_is_clean): Likewise.
* lexio.cc (check_push_pop_directive): Likewise.
(check_source_format_directive): Likewise.
(location_in): Likewise.
(lexer_input): Likewise.
(cdftext::lex_open): Likewise.
(lexio_dialect_mf): Likewise.
(valid_sequence_area): Likewise.
(cdftext::free_form_reference_format): Likewise.
(cdftext::segment_line): Likewise.
* lexio.h (struct span_t): Likewise.
* scan_ante.h (trim_location): Likewise.
* symbols.cc (symbol_elem_cmp): Likewise.
(symbol_alphabet): Likewise.
(end_of_group): Likewise.
(cbl_field_t::attr_str): Likewise.
(symbols_update): Likewise.
(symbol_typedef_add): Likewise.
(symbol_field_add): Likewise.
(new_temporary_impl): Likewise.
(symbol_label_section_exists): Likewise.
(symbol_program_callables): Likewise.
(file_status_status_of): Likewise.
* symfind.cc (is_data_field): Likewise.
(finalize_symbol_map2): Likewise.
(class in_scope): Likewise.
(symbol_match2): Likewise.
* util.cc (get_current_dir_name): Likewise.
(gb4): Likewise.
(class cdf_directives_t): Likewise.
(cbl_field_t::report_invalid_initial_value): Likewise.
(literal_subscript_oob): Likewise.
(cbl_refer_t::str): Likewise.
(date_time_fmt): Likewise.
(class unique_stack): Likewise.
(cobol_set_pp_option): Likewise.
(cobol_filename): Likewise.
(cobol_filename_restore): Likewise.
(gcc_location_set_impl): Likewise.
(ydferror): Likewise.
(error_msg_direct): Likewise.
(yyerror): Likewise.
(cbl_unimplemented_at): Likewise.
---
 gcc/cobol/cobol1.cc |  8 +++--
 gcc/cobol/dts.h |  2 +-
 gcc/cobol/except.cc |  2 +-
 gcc/cobol/gcobolspec.cc |  5 +--
 gcc/cobol/genapi.cc |  3 +-
 gcc/cobol/gengen.cc | 15 
 gcc/cobol/genmath.cc| 25 +++--
 gcc/cobol/genutil.cc| 80 +++--
 gcc/cobol/genutil.h |  6 ++--
 gcc/cobol/lexio.cc  | 31 
 gcc/cobol/lexio.h   |  4 +--
 gcc/cobol/scan_ante.h   |  3 +-
 gcc/cobol/symbols.cc| 50 +++---
 gcc/cobol/symfind.cc| 14 
 gcc/cobol/util.cc   | 79 +---
 15 files changed, 177 insertions(+), 150 deletions(-)

diff --git a/gcc/cobol/cobol1.cc b/gcc/cobol/cobol1.cc
index 4bd79f1f605..3146da57899 100644
--- a/gcc/cobol/cobol1.cc
+++ b/gcc/cobol/cobol1.cc
@@ -357,7 +357,7 @@ cobol_langhook_handle_option (size_t scode,
 return true;
 
 case OPT_M:
-   cobol_set_pp_option('M');
+cobol_set_pp_option('M');
 return true;
 
 case OPT_fstatic_call:
@@ -368,16 +368,18 @@ cobol_langhook_handle_option (size_t scode,
 wsclear(cobol_default_byte);
 return true;
 
-case OPT_fflex_debug:
+case OPT_fflex_debug: // cppcheck-suppress syntaxError // The need for this is a mystery
 yy_flex_debug = 1;
 cobol_set_debugging( true, yy_debug == 1, cobol_trace_debug == 1 );
 return true;
+
 case OPT_fyacc_debug:
 yy_debug = 1;
 cobol_set_debugging(yy_flex_debug == 1,
 true,
 cobol_trace_debug == 1 );
 return true;
+
 case OPT_ftrace_d

Re: [PATCH] arm: avoid gcc_s dependency

2025-07-14 Thread Sam James

Kyrylo Tkachov  writes:

> + arm maintainers.
>
> Hi Pierre,
>
>> On 14 Jul 2025, at 14:07, Pierre Ossman  wrote:
>> 
>> Suggested fix for this issue:
>> 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60428
>> 
>> Did not get any response there, so seeing if this is a better forum for 
>> suggested changes.
>> 
>> We've been using this patch for years without any known issues.
>> 
>> Regards,
>> -- 
>> Pierre Ossman   Software Development
>> Cendio AB   https://cendio.com
>> Teknikringen 8  https://twitter.com/ThinLinc
>> 583 30 Linköping   https://facebook.com/ThinLinc
>> Phone: +46-13-214600
>> 
>> A: Because it messes up the order in which people normally read text.
>> Q: Why is top-posting such a bad thing?
>
> diff -up gcc-5.1.0/libgcc/config/arm/unwind-arm-dummy.c.arm-c-unwind 
> gcc-5.1.0/libgcc/config/arm/unwind-arm-dummy.c

A patch rebased against trunk would also be appreciated. See
https://gcc.gnu.org/contribute.html for the needed format.

> [...]

sam

[PATCH 1/2] libstdc++: Ensure std::hash<__int128> is defined [PR96710]

2025-07-14 Thread Jonathan Wakely

This is a follow-up to r16-2190-g4faa42ac0dee2c which ensures that
std::hash is always enabled for signed and unsigned __int128. The
standard requires std::hash to be enabled for all arithmetic types.

libstdc++-v3/ChangeLog:

PR libstdc++/96710
* include/bits/functional_hash.h (hash<__int128>): Define for
strict modes.
(hash<unsigned __int128>): Likewise.
* testsuite/20_util/hash/int128.cc: New test.
---

Tested x86_64-linux.

Truncating the result to size_t is unfortunate, but maybe too late to
change. I've opened PR 121071 for that, as it also affects long long on
32-bit targets.
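
To illustrate the truncation concern, here is a minimal sketch.  The function
trivial_hash is a hypothetical stand-in that simply casts to std::size_t; it
is an assumption made for illustration, not the libstdc++ implementation.

```cpp
#include <cstddef>

#ifdef __SIZEOF_INT128__
// Hypothetical "trivial" hash: converts the value to std::size_t.
constexpr std::size_t trivial_hash (unsigned __int128 v)
{
  return static_cast<std::size_t> (v);
}

// Two distinct values that differ only above bit 63.
constexpr unsigned __int128 low = 0x123456789;
constexpr unsigned __int128 high = low + ((unsigned __int128) 1 << 64);

static_assert (low != high, "the inputs are distinct");
// When std::size_t is 64 bits wide the upper half is discarded, so
// both inputs produce the same hash value.
static_assert (sizeof (std::size_t) != 8
               || trivial_hash (low) == trivial_hash (high),
               "values differing only in the upper bits collide");
#endif
```

Any two values that agree in their low size_t-sized bits collide under such a
scheme, which is the same effect described for long long on 32-bit targets.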

 libstdc++-v3/include/bits/functional_hash.h   |  9 +
 libstdc++-v3/testsuite/20_util/hash/int128.cc | 20 +++
 2 files changed, 29 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/20_util/hash/int128.cc

diff --git a/libstdc++-v3/include/bits/functional_hash.h 
b/libstdc++-v3/include/bits/functional_hash.h
index e84c9ee04be2..8456089f768d 100644
--- a/libstdc++-v3/include/bits/functional_hash.h
+++ b/libstdc++-v3/include/bits/functional_hash.h
@@ -199,6 +199,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_3 unsigned)
 #endif
 
+#if defined __STRICT_ANSI__ && defined __SIZEOF_INT128__
+  // In strict modes __GLIBCXX_TYPE_INT_N_0 is not defined for __int128,
+  // but we want to always treat signed/unsigned __int128 as integral types.
+  __extension__
+  _Cxx_hashtable_define_trivial_hash(__int128)
+  __extension__
+  _Cxx_hashtable_define_trivial_hash(__int128 unsigned)
+#endif
+
 #undef _Cxx_hashtable_define_trivial_hash
 
   struct _Hash_impl
diff --git a/libstdc++-v3/testsuite/20_util/hash/int128.cc 
b/libstdc++-v3/testsuite/20_util/hash/int128.cc
new file mode 100644
index ..7c3a1baa0ec6
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/hash/int128.cc
@@ -0,0 +1,20 @@
+// { dg-do run { target c++11 } }
+// { dg-add-options strict_std }
+
+#include <functional>
+#include <testsuite_hooks.h>
+
+int main()
+{
+#ifdef __SIZEOF_INT128__
+  std::hash<__int128> h;
+  __int128 i = (__int128)0x123456789;
+  VERIFY( h(i) == i );
+  VERIFY( h(-i) == (std::size_t)-i );
+  VERIFY( h(~i) == (std::size_t)~i );
+  std::hash<unsigned __int128> hu;
+  unsigned __int128 u = i;
+  VERIFY( hu(u) == u );
+  VERIFY( hu(~u) == (std::size_t)~u );
+#endif
+}
-- 
2.50.1

[PATCH 2/2] libstdc++: Ensure std::make_unsigned works for 128-bit enum

2025-07-14 Thread Jonathan Wakely

libstdc++-v3/ChangeLog:

* include/std/type_traits (__make_unsigned_selector): Add
unsigned __int128 to type list.
* testsuite/20_util/make_unsigned/int128.cc: New test.
---

Tested x86_64-linux.
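
For readers unfamiliar with the approach, selecting an unsigned type by size
from a type list can be sketched as below.  The names type_list,
select_by_size and unsigned_types are hypothetical; the real
__make_unsigned_selector in <type_traits> is not reproduced here, and this is
only a simplified model of the idea.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical type list and selector, for illustration only.
template<typename... Ts> struct type_list { };

template<std::size_t Size, typename List> struct select_by_size;

// Base case: no candidate in the list is wide enough.
template<std::size_t Size>
struct select_by_size<Size, type_list<>>
{ using type = void; };

// Take the head if it is wide enough, otherwise keep searching.
template<std::size_t Size, typename Head, typename... Tail>
struct select_by_size<Size, type_list<Head, Tail...>>
{
  using type = typename std::conditional<
    (Size <= sizeof (Head)), Head,
    typename select_by_size<Size, type_list<Tail...>>::type>::type;
};

// Without the 128-bit entry, a 16-byte enum finds no match.
using unsigned_types
  = type_list<unsigned char, unsigned short, unsigned int,
              unsigned long, unsigned long long
#ifdef __SIZEOF_INT128__
              , unsigned __int128
#endif
              >;

enum small_enum : short { };
static_assert (std::is_same<
                 select_by_size<sizeof (small_enum), unsigned_types>::type,
                 unsigned short>::value,
               "a short-sized enum maps to unsigned short");
```

In this model, an enum whose underlying type is __int128 only finds a match
when the list ends with a 128-bit entry, which is what the patch adds.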

 libstdc++-v3/include/std/type_traits   |  7 ++-
 .../testsuite/20_util/make_unsigned/int128.cc  | 14 ++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/make_unsigned/int128.cc

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 78a5ee8c0eb4..ff23544fbf03 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1992,8 +1992,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : __make_unsigned_selector_base
 {
   // With -fshort-enums, an enum may be as small as a char.
+  __extension__
   using _UInts = _List<unsigned char, unsigned short, unsigned int,
-  unsigned long, unsigned long long>;
+  unsigned long, unsigned long long
+#ifdef __SIZEOF_INT128__
+  , unsigned __int128
+#endif
+ >;
 
   using __unsigned_type = typename __select<sizeof(_Tp), _UInts>::__type;
 
diff --git a/libstdc++-v3/testsuite/20_util/make_unsigned/int128.cc 
b/libstdc++-v3/testsuite/20_util/make_unsigned/int128.cc
new file mode 100644
index ..46c07b7669e5
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/make_unsigned/int128.cc
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++11 } }
+// { dg-add-options strict_std }
+
+#include <type_traits>
+
+#ifdef __SIZEOF_INT128__
+enum E : __int128 { };
+using U = std::make_unsigned<E>::type;
+static_assert( std::is_integral<U>::value, "type is an integer" );
+static_assert( sizeof(U) == sizeof(E), "width of type is 128 bits" );
+using I = std::make_signed<E>::type;
+static_assert( std::is_integral<I>::value, "type is an integer" );
+static_assert( sizeof(I) == sizeof(E), "width of type is 128 bits" );
+#endif
-- 
2.50.1

Re: [PATCH] aarch64: Enable selective LDAPUR generation for cores with RCPC2

2025-07-14 Thread Soumya AR



> On 10 Jul 2025, at 5:20 PM, Richard Sandiford  
> wrote:
> 
> Soumya AR  writes:
>>> On 10 Jul 2025, at 3:15 PM, Richard Sandiford  
>>> wrote:
>>> 
>>> Soumya AR  writes:
> On 1 Jul 2025, at 9:22 PM, Kyrylo Tkachov  wrote:
> 
> 
> 
>> On 1 Jul 2025, at 17:36, Richard Sandiford  
>> wrote:
>> 
>> Soumya AR  writes:
>>> From 2a2c3e3683aaf3041524df166fc6f8cf20895a0b Mon Sep 17 00:00:00 2001
>>> From: Soumya AR 
>>> Date: Mon, 30 Jun 2025 12:17:30 -0700
>>> Subject: [PATCH] aarch64: Enable selective LDAPUR generation for cores 
>>> with
>>> RCPC2
>>> 
>>> This patch adds the ability to fold the address computation into the 
>>> addressing
>>> mode for LDAPR instructions using LDAPUR when RCPC2 is available.
>>> 
>>> LDAPUR emission is controlled by the tune flag enable_ldapur, to enable 
>>> it on a
>>> per-core basis. Earlier, the following code:
>>> 
>>> uint64_t
>>> foo (std::atomic<uint64_t> *x)
>>> {
>>> return x[1].load(std::memory_order_acquire);
>>> }
>>> 
>>> would generate:
>>> 
>>> foo(std::atomic*):
>>> add x0, x0, 8
>>> ldapr   x0, [x0]
>>> ret
>>> 
>>> but now generates:
>>> 
>>> foo(std::atomic*):
>>> ldapur  x0, [x0, 8]
>>> ret
>>> 
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>> regression.
>>> OK for mainline?
>>> 
>>> Signed-off-by: Soumya AR 
>>> 
>>> gcc/ChangeLog:
>>> 
>>> * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
>>> Add the enable_ldapur flag to control LDAPUR emission.
>>> * config/aarch64/aarch64.h (TARGET_ENABLE_LDAPUR): Use new flag.
>>> * config/aarch64/aarch64.md (any): Add ldapur_enable attribute.
>>> * config/aarch64/atomics.md: (aarch64_atomic_load_rcpc): Modify
>>> to emit LDAPUR for cores with RCPC2 when enable_ldapur is set.
>>> (*aarch64_atomic_load_rcpc_zext): Likewise.
>>> (*aarch64_atomic_load_rcpc_sext): Modified to emit LDAPURS
>>> for addressing with offsets.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>> * gcc.target/aarch64/ldapur.c: New test.
>> 
>> Thanks for doing this.  It generally looks good, but a couple of comments
>> below:
>> 
>>> ---
>>> gcc/config/aarch64/aarch64-tuning-flags.def |  2 +
>>> gcc/config/aarch64/aarch64.h|  5 ++
>>> gcc/config/aarch64/aarch64.md   | 11 +++-
>>> gcc/config/aarch64/atomics.md   | 22 +---
>>> gcc/testsuite/gcc.target/aarch64/ldapur.c   | 61 +
>>> 5 files changed, 92 insertions(+), 9 deletions(-)
>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/ldapur.c
>>> 
>>> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
>>> b/gcc/config/aarch64/aarch64-tuning-flags.def
>>> index f2c916e9d77..5bf54165306 100644
>>> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
>>> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
>>> @@ -44,6 +44,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", 
>>> AVOID_CROSS_LOOP_FMA)
>>> 
>>> AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
>>> 
>>> +AARCH64_EXTRA_TUNING_OPTION ("enable_ldapur", ENABLE_LDAPUR)
>>> +
>> 
>> Let's see what others say, but personally, I think this would be better
>> as an opt-out, such as avoid_ldapur.  The natural default seems to be to 
>> use
>> the extra addressing capacity when it's available and have CPUs 
>> explicitly
>> flag when they don't want that.
>> 
>> A good, conservatively correct, default would probably be to add 
>> avoid_ldapur
>> to every *current* CPU that includes rcpc2 and then separately remove it
>> from those that are known not to need it.  In that sense, it's more work
>> for current CPUs than the current patch, but it should ease the impact
>> on future CPUs.
> 
> LLVM used to do this folding by default everywhere until it was 
> discovered that it hurts various CPUs.
> So they’ve taken the approach you describe, and disable the folding 
> explicitly for:
> neoverse-v2 neoverse-v3 cortex-x3 cortex-x4 cortex-x925
> I don’t know for sure if those are the only CPUs where this applies.
> They also disable the folding for generic tuning when -march is between 
> armv8.4 - armv8.7/armv9.2.
> I guess we can do the same in GCC.
 
 Thanks for your suggestions, Richard and Kyrill.
 
 I've updated the patch to use avoid_ldapur.
 
 There's now an explicit override in aarch64_override_options_internal to 
 use avoid_ldapur for armv8.4 through armv8.7.
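
 The opt-out idea can be modelled roughly as follows.  Every name in this
 sketch (TUNE_AVOID_LDAPUR, tune_params, override_for_arch, use_ldapur) is
 hypothetical and chosen for illustration; none of it is GCC's internal
 code, and the real override may depend on further conditions.

```cpp
#include <cstdint>

// Hypothetical tuning bit: set when a core does not want LDAPUR folding.
enum : std::uint32_t { TUNE_AVOID_LDAPUR = 1u << 0 };

struct tune_params
{
  std::uint32_t extra_flags;
};

// The architecture level is modelled as the minor version of Armv8.x.
static void override_for_arch (tune_params &tune, int armv8_minor)
{
  // Opt out of LDAPUR folding for armv8.4 through armv8.7.
  if (armv8_minor >= 4 && armv8_minor <= 7)
    tune.extra_flags |= TUNE_AVOID_LDAPUR;
}

// Folding is the default whenever RCPC2 is present and not opted out.
static bool use_ldapur (const tune_params &tune, bool have_rcpc2)
{
  return have_rcpc2 && !(tune.extra_flags & TUNE_AVOID_LDAPUR);
}
```

 With an opt-out flag, a future core gets the folding by default and only
 needs an entry if it is known to be hurt by it.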
 
 I added it here because aarch64_adjust_gen
