Re: C/C++ frontend patches ping

2024-03-14 Thread Andi Kleen
Andrew Pinski  writes:

> On Thu, Mar 14, 2024 at 9:36 PM Andi Kleen  wrote:
>>
>>
>> musttail support for C/C++
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643867.html
>>
>>
>> Support constexpr for asm statements in C++
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643933.html
>
>
> Both of these were posted long after the start of stage 3 and close
> to the beginning of stage 4, and since they are both new features I
> really doubt they will be reviewed until stage 1 opens up, which will
> be in about a month or so.

I don't buy it.

This mailing list and the git logs are full of approved feature patches
that are clearly not bug fixes.  If there is really such a rule it is
extremely selectively and unfairly enforced.

-Andi


Re: C/C++ frontend patches ping

2024-03-14 Thread Andrew Pinski
On Thu, Mar 14, 2024 at 9:36 PM Andi Kleen  wrote:
>
>
> musttail support for C/C++
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643867.html
>
>
> Support constexpr for asm statements in C++
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643933.html


Both of these were posted long after the start of stage 3 and close
to the beginning of stage 4, and since they are both new features I
really doubt they will be reviewed until stage 1 opens up, which will
be in about a month or so.

Thanks,
Andrew Pinski


C/C++ frontend patches ping

2024-03-14 Thread Andi Kleen


musttail support for C/C++

https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643867.html


Support constexpr for asm statements in C++

https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643933.html


Re: [PATCH] vect: Use xor to invert oversized vector masks

2024-03-14 Thread Hongtao Liu
On Thu, Mar 14, 2024 at 11:42 PM Andrew Stubbs  wrote:
>
> Don't enable excess lanes when inverting vector bit-masks smaller than the
> integer mode.  This is yet another case of wrong-code due to mishandling
> of oversized bitmasks.
>
> This issue shows up in vect/tsvc/vect-tsvc-s278.c and
> vect/tsvc/vect-tsvc-s279.c if I set the preferred vector size to V32
> (down from V64) on amdgcn.
>
> OK for mainline?
>
> Andrew
>
> gcc/ChangeLog:
>
> * expr.cc (expand_expr_real_2): Use xor to invert vector masks.
> ---
>  gcc/expr.cc | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 403eeaa108e4..3540327d879e 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -10497,6 +10497,17 @@ expand_expr_real_2 (sepops ops, rtx target, 
> machine_mode tmode,
>immed_wide_int_const (mask, int_mode),
>target, 1, OPTAB_LIB_WIDEN);
> }
> +  /* If it's a vector mask don't enable excess bits.  */
> +  else if (VECTOR_BOOLEAN_TYPE_P (type)
> +  && SCALAR_INT_MODE_P (mode)
> +  && maybe_ne (GET_MODE_PRECISION (mode),
> +   TYPE_VECTOR_SUBPARTS (type).to_constant ()))
> +   {
> + auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> + temp = expand_binop (mode, xor_optab, op0,
> +  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
> +  target, true, OPTAB_WIDEN);
> +   }
Not a review, just curious: should this issue be fixed by the commit in PR113576?
I also wonder whether, besides cbranch, excess lane bits matter elsewhere too.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113576#c35
>else
> temp = expand_unop (mode, one_cmpl_optab, op0, target, 1);
>gcc_assert (temp);
> --
> 2.41.0
>


-- 
BR,
Hongtao
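
The effect of the quoted hunk can be illustrated outside of GCC. The following is a minimal sketch in plain C++, not the expander code itself; invert_all_bits, invert_lanes and check_invert are hypothetical names, and the mask is assumed to live in a 64-bit integer with fewer than 64 lanes:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: plain one's complement, which also sets the
// excess bits above the lane count.
constexpr std::uint64_t invert_all_bits (std::uint64_t mask)
{
  return ~mask;
}

// Hypothetical helper: invert only the low NUNITS lane bits by xor-ing
// with a constant that has exactly NUNITS low bits set.
// Assumes 0 < nunits < 64 so that the shift is well defined.
constexpr std::uint64_t invert_lanes (std::uint64_t mask, unsigned nunits)
{
  return mask ^ ((std::uint64_t{1} << nunits) - 1);
}

// Small self-check of the two helpers on a 4-lane mask.
inline void check_invert ()
{
  assert (invert_lanes (0x5, 4) == 0xA);
  assert (invert_all_bits (0x5) != 0xA);
}
```

For a 4-lane mask 0x5 the xor form yields 0xA, while the plain complement additionally sets the 60 bits above the lanes.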


[r14-9478 Regression] FAIL: g++.dg/torture/pr104601.C -Os (test for excess errors) on Linux/x86_64

2024-03-14 Thread haochen.jiang
On Linux/x86_64,

df483ebd24689a3bebfae2089637a00eca0e5a12 is the first bad commit
commit df483ebd24689a3bebfae2089637a00eca0e5a12
Author: Jonathan Wakely 
Date:   Mon Feb 26 13:17:13 2024 +

libstdc++: Add nodiscard in 

caused

FAIL: g++.dg/torture/pr104601.C   -O0  (test for excess errors)
FAIL: g++.dg/torture/pr104601.C   -O1  (test for excess errors)
FAIL: g++.dg/torture/pr104601.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: g++.dg/torture/pr104601.C   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)
FAIL: g++.dg/torture/pr104601.C   -O2  (test for excess errors)
FAIL: g++.dg/torture/pr104601.C   -O3 -g  (test for excess errors)
FAIL: g++.dg/torture/pr104601.C   -Os  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-9478/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg-torture.exp=g++.dg/torture/pr104601.C --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg-torture.exp=g++.dg/torture/pr104601.C --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg-torture.exp=g++.dg/torture/pr104601.C --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg-torture.exp=g++.dg/torture/pr104601.C --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH v1] libstdc++: Optimize removal from unique assoc containers [PR112934]

2024-03-14 Thread Barnabás Pőcze
Hi


On Wednesday, March 13, 2024 at 12:43, Jonathan Wakely wrote:

> On Mon, 11 Mar 2024 at 23:36, Barnabás Pőcze  wrote:
> >
> > Previously, calling erase(key) on both std::map and std::set
> > would execute the same code that std::multi{map,set} would.
> > However, doing that is unnecessary because std::{map,set}
> > guarantee that all elements are unique.
> >
> > It is reasonable to expect that erase(key) is equivalent to
> > or better than:
> >
> >   auto it = m.find(key);
> >   if (it != m.end())
> >     m.erase(it);
> >
> > However, this was not the case. Fix that by adding a new
> > function _Rb_tree<>::_M_erase_unique() that is essentially
> > equivalent to the above snippet, and use this from both
> > std::map and std::set.
> 
> Hi, this change looks reasonable, thanks for the patch. Please note
> that GCC is currently in "stage 3" of its dev process so this change
> would have to wait until after GCC 14 branches from trunk, due in a
> few weeks.

OK; I didn't know that, thanks for telling me.


> 
> I assume you ran the testsuite with no regressions. [...]


I hope so. I ran `make check-target-libstdc++-v3`, and it did not note any
unexpected failures as far as I can see:

  Native configuration is x86_64-pc-linux-gnu

=== libstdc++ tests ===

  Schedule of variations:
  unix

  Running target unix
  Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
  Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
  Using /gcc/libstdc++-v3/testsuite/config/default.exp as tool-and-target-specific interface file.
  Running /gcc/libstdc++-v3/testsuite/libstdc++-abi/abi.exp ...
  Running /gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp ...
  Running /gcc/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp ...
  Running /gcc/libstdc++-v3/testsuite/libstdc++-xmethods/xmethods.exp ...

=== libstdc++ Summary ===

  # of expected passes            18646
  # of expected failures          126
  # of unsupported tests          672

> [...] Do you have benchmarks to show this making a difference?


As for benchmarks, I do not have any. But even if the performance does not
improve appreciably, the size of the generated code will definitely be smaller.
In the end, the excessive code was the reason I opened the mentioned issue[0]
in the first place, and this change should hopefully eliminate it.


> [...]


Regards,
Barnabás Pőcze


[0]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112934
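
The find-then-erase pattern quoted above can be sketched as a standalone helper. This is only an illustration, not the _Rb_tree<>::_M_erase_unique() implementation from the patch; erase_unique and check_erase_unique are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

// Hypothetical helper: erase KEY from a container with unique keys.
// At most one element can match, so a single find() is enough and no
// equal_range-style scan over duplicates is needed.
template <typename Container, typename Key>
std::size_t erase_unique (Container& c, const Key& key)
{
  auto it = c.find (key);
  if (it == c.end ())
    return 0;
  c.erase (it);
  return 1;
}

// Small self-check: the helper agrees with what erase(key) returns.
inline void check_erase_unique ()
{
  std::map<int, int> m{{1, 10}, {2, 20}};
  assert (erase_unique (m, 1) == 1);  // key present
  assert (erase_unique (m, 1) == 0);  // key already gone
  assert (m.erase (2) == 1);          // built-in erase(key) for comparison

  std::set<int> s{1, 2, 3};
  assert (erase_unique (s, 5) == 0);
  assert (s.size () == 3);
}
```

The return value mirrors erase(key), which for unique containers is always 0 or 1.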


Re: [PATCH v14 23/26] c++: Implement __is_invocable built-in trait

2024-03-14 Thread Ken Matsui
On Fri, Mar 8, 2024 at 9:17 AM Patrick Palka  wrote:
>
> On Wed, 28 Feb 2024, Ken Matsui wrote:
>
> > This patch implements built-in trait for std::is_invocable.
> >
> > gcc/cp/ChangeLog:
> >
> >   * cp-trait.def: Define __is_invocable.
> >   * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_INVOCABLE.
> >   * semantics.cc (trait_expr_value): Likewise.
> >   (finish_trait_expr): Likewise.
> >   * cp-tree.h (build_invoke): New function.
> >   * method.cc (build_invoke): New function.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/ext/has-builtin-1.C: Test existence of __is_invocable.
> >   * g++.dg/ext/is_invocable1.C: New test.
> >   * g++.dg/ext/is_invocable2.C: New test.
> >   * g++.dg/ext/is_invocable3.C: New test.
> >   * g++.dg/ext/is_invocable4.C: New test.
>
> Thanks, this looks great!  This generic build_invoke function could be
> used for invoke_result etc as well, and it could also cache the built-up
> call across __is_invocable and __is_nothrow_invocable checks on the same
> arguments (which is a common pattern in the standard library).  LGTM
>
> >
> > Signed-off-by: Ken Matsui 
> > ---
> >  gcc/cp/constraint.cc |   6 +
> >  gcc/cp/cp-trait.def  |   1 +
> >  gcc/cp/cp-tree.h |   2 +
> >  gcc/cp/method.cc | 132 +
> >  gcc/cp/semantics.cc  |   4 +
> >  gcc/testsuite/g++.dg/ext/has-builtin-1.C |   3 +
> >  gcc/testsuite/g++.dg/ext/is_invocable1.C | 349 +++
> >  gcc/testsuite/g++.dg/ext/is_invocable2.C | 139 +
> >  gcc/testsuite/g++.dg/ext/is_invocable3.C |  51 
> >  gcc/testsuite/g++.dg/ext/is_invocable4.C |  33 +++
> >  10 files changed, 720 insertions(+)
> >  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable1.C
> >  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable2.C
> >  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable3.C
> >  create mode 100644 gcc/testsuite/g++.dg/ext/is_invocable4.C
> >
> > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > index 23ea66d9c12..c87b126fdb1 100644
> > --- a/gcc/cp/constraint.cc
> > +++ b/gcc/cp/constraint.cc
> > @@ -3791,6 +3791,12 @@ diagnose_trait_expr (tree expr, tree args)
> >  case CPTK_IS_FUNCTION:
> >inform (loc, "  %qT is not a function", t1);
> >break;
> > +case CPTK_IS_INVOCABLE:
> > +  if (!t2)
> > +inform (loc, "  %qT is not invocable", t1);
> > +  else
> > +inform (loc, "  %qT is not invocable by %qE", t1, t2);
> > +  break;
> >  case CPTK_IS_LAYOUT_COMPATIBLE:
> >inform (loc, "  %qT is not layout compatible with %qT", t1, t2);
> >break;
> > diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
> > index 85056c8140b..6cb2b55f4ea 100644
> > --- a/gcc/cp/cp-trait.def
> > +++ b/gcc/cp/cp-trait.def
> > @@ -75,6 +75,7 @@ DEFTRAIT_EXPR (IS_EMPTY, "__is_empty", 1)
> >  DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
> >  DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
> >  DEFTRAIT_EXPR (IS_FUNCTION, "__is_function", 1)
> > +DEFTRAIT_EXPR (IS_INVOCABLE, "__is_invocable", -1)
> >  DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
> >  DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
> > >  DEFTRAIT_EXPR (IS_MEMBER_FUNCTION_POINTER, "__is_member_function_pointer", 1)
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index 334c11396c2..261d3a71faa 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -7334,6 +7334,8 @@ extern tree get_copy_assign 
> > (tree);
> >  extern tree get_default_ctor (tree);
> >  extern tree get_dtor (tree, tsubst_flags_t);
> >  extern tree build_stub_object(tree);
> > +extern tree build_invoke (tree, const_tree,
> > +  tsubst_flags_t);
> >  extern tree strip_inheriting_ctors   (tree);
> >  extern tree inherited_ctor_binfo (tree);
> >  extern bool base_ctor_omit_inherited_parms   (tree);
> > diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
> > index 98c10e6a8b5..953f1bed6fc 100644
> > --- a/gcc/cp/method.cc
> > +++ b/gcc/cp/method.cc
> > @@ -1928,6 +1928,138 @@ build_trait_object (tree type)
> >return build_stub_object (type);
> >  }
> >
> > > +/* [func.require] Build an expression of INVOKE(FN_TYPE, ARG_TYPES...).  If the
> > > +   given is not invocable, returns error_mark_node.  */
> > +
> > +tree
> > +build_invoke (tree fn_type, const_tree arg_types, tsubst_flags_t complain)
> > +{
> > +  if (fn_type == error_mark_node || arg_types == error_mark_node)
> > +return error_mark_node;
> > +
> > +  gcc_assert (TYPE_P (fn_type));
> > +  gcc_assert (TREE_CODE (arg_types) == TREE_VEC);
> > +
> > +  /* Access check is required to determine if the given is invocable.  */
> > +  deferring_access_check_sentinel acs (dk_no_deferred);
> > +
> > 

RE: [PATCH v3] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits for RVV

2024-03-14 Thread Li, Pan2
> Shouldn't a major user-facing change like this be discussed in a PR against
> https://github.com/riscv-non-isa/riscv-c-api-doc/ or
> https://github.com/riscv-non-isa/rvv-intrinsic-doc before or concurrent with
> compiler implementation?

I think Kito is working on the spec doc already.

Hi Kito
Could you please help correct my understanding of the behavior of the
riscv_rvv_vector_bits attribute?
There are quite a few details, and I suspect something is missing or behaves
differently compared with the clang side.

Pan

-Original Message-
From: Stefan O'Rear  
Sent: Tuesday, March 12, 2024 9:25 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; Kito Cheng ; Wang, Yanzhang 
; rdapp@gmail.com; Vineet Gupta 
; Palmer Dabbelt 
Subject: Re: [PATCH v3] RISC-V: Introduce gcc attribute riscv_rvv_vector_bits 
for RVV

On Tue, Mar 12, 2024, at 2:15 AM, pan2...@intel.com wrote:
> From: Pan Li 
>
> Update in v3:
> * Add pre-defined __riscv_v_fixed_vlen when zvl.
>
> Update in v2:
> * Cleanup some unused code.
> * Fix some typo of commit log.
>
> Original log:
>
> This patch introduces a new GCC attribute for RVV.
> The attribute is used to define fixed-length variants of the
> existing sizeless RVV types.
>
> The attribute is valid if and only if -mrvv-vector-bits=zvl is given.  Its
> only argument must be an integer constant whose value is determined
> by the LMUL and the vector register bits in zvl*b.  For example:
>
> typedef vint32m2_t fixed_vint32m2_t __attribute__((riscv_rvv_vector_bits(128)));
>
> The above typedef is valid with -march=rv64gc_zve64d_zvl64b
> (i.e. 2 (m2) * 64 = 128 for vint32m2_t), and reports an error with
> -march=rv64gcv_zvl128b similar to the one below.
>
> "error: invalid RVV vector size '128', expected size is '256' based on
> LMUL of type and '-mrvv-vector-bits=zvl'"
>
> Meanwhile, a pre-defined macro __riscv_v_fixed_vlen is introduced to
> represent the fixed vlen in an RVV vector register.
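
The size rule in the commit log above is simple arithmetic. A minimal sketch follows, assuming an integer LMUL (fractional LMUL is not covered); expected_vector_bits and check_vector_bits are hypothetical names and not part of GCC:

```cpp
#include <cassert>

// Hypothetical helper: the riscv_rvv_vector_bits argument the commit log
// expects for a type with integer LMUL on a target whose zvl*b extension
// gives ZVL_BITS bits per vector register.
constexpr unsigned expected_vector_bits (unsigned lmul, unsigned zvl_bits)
{
  return lmul * zvl_bits;
}

// vint32m2_t (LMUL = 2) with zvl64b: 128 is accepted.
static_assert (expected_vector_bits (2, 64) == 128, "m2 with zvl64b");
// vint32m2_t (LMUL = 2) with zvl128b: 256 is expected, so 128 is an error.
static_assert (expected_vector_bits (2, 128) == 256, "m2 with zvl128b");

// Runtime version of the same two checks.
inline void check_vector_bits ()
{
  assert (expected_vector_bits (2, 64) == 128);
  assert (expected_vector_bits (2, 128) == 256);
}
```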

Shouldn't a major user-facing change like this be discussed in a PR against
https://github.com/riscv-non-isa/riscv-c-api-doc/ or
https://github.com/riscv-non-isa/rvv-intrinsic-doc before or concurrent with
compiler implementation?

-s

> For the vint*m*_t types the operations below are allowed.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
> * CMP: >, <, ==, !=, <=, >=
> * ALU: +, -, *, /, %, &, |, ^, >>, <<, ~, -
>
> For the vfloat*m*_t types the operations below are allowed.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
> * CMP: >, <, ==, !=, <=, >=
> * ALU: +, -, *, /, -
>
> For the vbool*_t types only the operations below are allowed; CMP
> and ALU are excluded because the CMP and ALU operations on vbool*_t
> are not well defined currently.
> * The sizeof.
> * The global variable(s).
> * The element of union and struct.
> * The cast to other equalities.
>
> The vint*x*m*_t tuple types are not supported in this patch,
> which is compatible with clang.
>
> This patch passed the testsuites below.
> * The full riscv regression tests.
>
> gcc/ChangeLog:
>
>   * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Add pre-define
>   macro __riscv_v_fixed_vlen when zvl.
>   * config/riscv/riscv.cc (riscv_handle_rvv_vector_bits_attribute):
>   New static func to take care of the RVV types decorated by
>   the attributes.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-1.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-10.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-11.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-12.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-13.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-14.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-15.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-16.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-17.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-2.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-3.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-4.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-5.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-6.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-7.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-8.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits-9.c: New test.
>   * gcc.target/riscv/rvv/base/riscv_rvv_vector_bits.h: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv-c.cc   |   3 +
>  gcc/config/riscv/riscv.cc |  87 +-
>  .../riscv/rvv/base/riscv_rvv_vector_bits-1.c  |   6 +
>  

[PATCH v2 0/3] LoongArch: Cleanup unused/redundant codes.

2024-03-14 Thread Chenghui Pan
Changes from v1: Some correction about ChangeLog format.

There are some unused/redundant definitions in the LoongArch target support
code; these patches clean them up. Regression tests passed.

Chenghui Pan (3):
  LoongArch: Remove unused/useless definitions.
  LoongArch: Change loongarch_expand_vec_cmp()'s return type from bool
to void.
  LoongArch: Combine UNITS_PER_FP_REG and UNITS_PER_FPREG macros.

 gcc/config/loongarch/lasx.md|  6 ++--
 gcc/config/loongarch/loongarch-protos.h |  7 +
 gcc/config/loongarch/loongarch.cc   | 39 -
 gcc/config/loongarch/loongarch.h|  7 ++---
 gcc/config/loongarch/lsx.md |  6 ++--
 5 files changed, 13 insertions(+), 52 deletions(-)

-- 
2.39.3



[PATCH v2 3/3] LoongArch: Combine UNITS_PER_FP_REG and UNITS_PER_FPREG macros.

2024-03-14 Thread Chenghui Pan
These macros have identical definitions, so we can keep the former
(UNITS_PER_FP_REG) and eliminate the latter (UNITS_PER_FPREG).

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_hard_regno_mode_ok_uncached): Combine UNITS_PER_FP_REG and
UNITS_PER_FPREG macros.
(loongarch_hard_regno_nregs): Ditto.
(loongarch_class_max_nregs): Ditto.
(loongarch_get_separate_components): Ditto.
(loongarch_process_components): Ditto.
* config/loongarch/loongarch.h (UNITS_PER_FPREG): Ditto.
(UNITS_PER_HWFPVALUE): Ditto.
(UNITS_PER_FPVALUE): Ditto.
---
 gcc/config/loongarch/loongarch.cc | 10 +-
 gcc/config/loongarch/loongarch.h  |  7 ++-
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 7ef04329668..8f657ee1f9c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6770,7 +6770,7 @@ loongarch_hard_regno_mode_ok_uncached (unsigned int 
regno, machine_mode mode)
 and TRUNC.  There's no point allowing sizes smaller than a word,
 because the FPU has no appropriate load/store instructions.  */
   if (mclass == MODE_INT)
-   return size >= MIN_UNITS_PER_WORD && size <= UNITS_PER_FPREG;
+   return size >= MIN_UNITS_PER_WORD && size <= UNITS_PER_FP_REG;
 }
 
   return false;
@@ -6813,7 +6813,7 @@ loongarch_hard_regno_nregs (unsigned int regno, 
machine_mode mode)
   if (LASX_SUPPORTED_MODE_P (mode))
return 1;
 
-  return (GET_MODE_SIZE (mode) + UNITS_PER_FPREG - 1) / UNITS_PER_FPREG;
+  return (GET_MODE_SIZE (mode) + UNITS_PER_FP_REG - 1) / UNITS_PER_FP_REG;
 }
 
   /* All other registers are word-sized.  */
@@ -6848,7 +6848,7 @@ loongarch_class_max_nregs (enum reg_class rclass, 
machine_mode mode)
  else if (LSX_SUPPORTED_MODE_P (mode))
size = MIN (size, UNITS_PER_LSX_REG);
  else
-   size = MIN (size, UNITS_PER_FPREG);
+   size = MIN (size, UNITS_PER_FP_REG);
}
   left &= ~reg_class_contents[FP_REGS];
 }
@@ -8222,7 +8222,7 @@ loongarch_get_separate_components (void)
if (IMM12_OPERAND (offset))
  bitmap_set_bit (components, regno);
 
-   offset -= UNITS_PER_FPREG;
+   offset -= UNITS_PER_FP_REG;
   }
 
   /* Don't mess with the hard frame pointer.  */
@@ -8301,7 +8301,7 @@ loongarch_process_components (sbitmap components, 
loongarch_save_restore_fn fn)
if (bitmap_bit_p (components, regno))
  loongarch_save_restore_reg (mode, regno, offset, fn);
 
-   offset -= UNITS_PER_FPREG;
+   offset -= UNITS_PER_FP_REG;
   }
 }
 
diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index bf2351f0968..888a633961d 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -138,19 +138,16 @@ along with GCC; see the file COPYING3.  If not see
 /* Width of a LASX vector register in bits.  */
 #define BITS_PER_LASX_REG (UNITS_PER_LASX_REG * BITS_PER_UNIT)
 
-/* For LARCH, width of a floating point register.  */
-#define UNITS_PER_FPREG (TARGET_DOUBLE_FLOAT ? 8 : 4)
-
 /* The largest size of value that can be held in floating-point
registers and moved with a single instruction.  */
 #define UNITS_PER_HWFPVALUE \
-  (TARGET_SOFT_FLOAT ? 0 : UNITS_PER_FPREG)
+  (TARGET_SOFT_FLOAT ? 0 : UNITS_PER_FP_REG)
 
 /* The largest size of value that can be held in floating-point
registers.  */
 #define UNITS_PER_FPVALUE \
   (TARGET_SOFT_FLOAT ? 0 \
-   : TARGET_SINGLE_FLOAT ? UNITS_PER_FPREG \
+   : TARGET_SINGLE_FLOAT ? UNITS_PER_FP_REG \
 : LONG_DOUBLE_TYPE_SIZE / BITS_PER_UNIT)
 
 /* The number of bytes in a double.  */
-- 
2.39.3
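
The register-count expression updated in loongarch_hard_regno_nregs above is a ceiling division. A minimal standalone sketch follows; units_per_fp_reg, fp_regs_needed and check_fp_regs are hypothetical names, and the 8-byte register width is an assumption matching the TARGET_DOUBLE_FLOAT case of the removed macro:

```cpp
#include <cassert>

// Hypothetical stand-in for UNITS_PER_FP_REG, assuming double-float
// support (8 bytes per floating-point register).
constexpr unsigned units_per_fp_reg = 8;

// Number of registers needed to hold MODE_SIZE bytes, rounding up.
constexpr unsigned fp_regs_needed (unsigned mode_size)
{
  return (mode_size + units_per_fp_reg - 1) / units_per_fp_reg;
}

// Small self-check for sizes below, equal to and above one register.
inline void check_fp_regs ()
{
  assert (fp_regs_needed (4) == 1);
  assert (fp_regs_needed (8) == 1);
  assert (fp_regs_needed (16) == 2);
}
```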



[PATCH v2 2/3] LoongArch: Change loongarch_expand_vec_cmp()'s return type from bool to void.

2024-03-14 Thread Chenghui Pan
This function always returns true at the end of its implementation,
so the return value is useless.

gcc/ChangeLog:

* config/loongarch/lasx.md (vec_cmp): Remove checking
of loongarch_expand_vec_cmp()'s return value.
(vec_cmpu): Ditto.
* config/loongarch/lsx.md (vec_cmp): Ditto.
(vec_cmpu): Ditto.
* config/loongarch/loongarch-protos.h
(loongarch_expand_vec_cmp): Change loongarch_expand_vec_cmp()'s return
type from bool to void.
* config/loongarch/loongarch.cc (loongarch_expand_vec_cmp): Ditto.
---
 gcc/config/loongarch/lasx.md| 6 ++
 gcc/config/loongarch/loongarch-protos.h | 2 +-
 gcc/config/loongarch/loongarch.cc   | 3 +--
 gcc/config/loongarch/lsx.md | 6 ++
 4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index ac84db7f0ce..8d4c6b4ec35 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -1383,8 +1383,7 @@ (define_expand "vec_cmp"
   (match_operand:LASX 3 "register_operand")]))]
   "ISA_HAS_LASX"
 {
-  bool ok = loongarch_expand_vec_cmp (operands);
-  gcc_assert (ok);
+  loongarch_expand_vec_cmp (operands);
   DONE;
 })
 
@@ -1395,8 +1394,7 @@ (define_expand "vec_cmpu"
   (match_operand:ILASX 3 "register_operand")]))]
   "ISA_HAS_LASX"
 {
-  bool ok = loongarch_expand_vec_cmp (operands);
-  gcc_assert (ok);
+  loongarch_expand_vec_cmp (operands);
   DONE;
 })
 
diff --git a/gcc/config/loongarch/loongarch-protos.h 
b/gcc/config/loongarch/loongarch-protos.h
index 871544f760c..e3ed2b912a5 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -95,7 +95,7 @@ extern void loongarch_split_lsx_fill_d (rtx, rtx);
 extern const char *loongarch_output_move (rtx, rtx);
 #ifdef RTX_CODE
 extern void loongarch_expand_scc (rtx *);
-extern bool loongarch_expand_vec_cmp (rtx *);
+extern void loongarch_expand_vec_cmp (rtx *);
 extern void loongarch_expand_conditional_branch (rtx *);
 extern void loongarch_expand_conditional_move (rtx *);
 extern void loongarch_expand_conditional_trap (rtx);
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index b25624c9406..7ef04329668 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -10801,13 +10801,12 @@ loongarch_expand_vec_cond_mask_expr (machine_mode 
mode, machine_mode vimode,
 }
 
 /* Expand integer vector comparison */
-bool
+void
 loongarch_expand_vec_cmp (rtx operands[])
 {
 
   rtx_code code = GET_CODE (operands[1]);
   loongarch_expand_lsx_cmp (operands[0], code, operands[2], operands[3]);
-  return true;
 }
 
 /* Implement TARGET_PROMOTE_FUNCTION_MODE.  */
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index b9b94b9079c..87d3e7c5d9f 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -518,8 +518,7 @@ (define_expand "vec_cmp"
   (match_operand:LSX 3 "register_operand")]))]
   "ISA_HAS_LSX"
 {
-  bool ok = loongarch_expand_vec_cmp (operands);
-  gcc_assert (ok);
+  loongarch_expand_vec_cmp (operands);
   DONE;
 })
 
@@ -530,8 +529,7 @@ (define_expand "vec_cmpu"
   (match_operand:ILSX 3 "register_operand")]))]
   "ISA_HAS_LSX"
 {
-  bool ok = loongarch_expand_vec_cmp (operands);
-  gcc_assert (ok);
+  loongarch_expand_vec_cmp (operands);
   DONE;
 })
 
-- 
2.39.3



[PATCH v2 1/3] LoongArch: Remove unused/useless definitions.

2024-03-14 Thread Chenghui Pan
This patch removes some unnecessary definitions of target hook functions
according to the documentation of GCC.

gcc/ChangeLog:

* config/loongarch/loongarch-protos.h
(loongarch_cfun_has_cprestore_slot_p): Delete.
(loongarch_adjust_insn_length): Delete.
(current_section_name): Delete.
(loongarch_split_symbol_type): Delete.
* config/loongarch/loongarch.cc
(loongarch_case_values_threshold): Delete.
(loongarch_spill_class): Delete.
(TARGET_OPTAB_SUPPORTED_P): Delete.
(TARGET_CASE_VALUES_THRESHOLD): Delete.
(TARGET_SPILL_CLASS): Delete.
---
 gcc/config/loongarch/loongarch-protos.h |  5 -
 gcc/config/loongarch/loongarch.cc   | 26 -
 2 files changed, 31 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-protos.h 
b/gcc/config/loongarch/loongarch-protos.h
index 1fdfda9af01..871544f760c 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -93,7 +93,6 @@ extern void loongarch_split_lsx_copy_d (rtx, rtx, rtx, rtx 
(*)(rtx, rtx, rtx));
 extern void loongarch_split_lsx_insert_d (rtx, rtx, rtx, rtx);
 extern void loongarch_split_lsx_fill_d (rtx, rtx);
 extern const char *loongarch_output_move (rtx, rtx);
-extern bool loongarch_cfun_has_cprestore_slot_p (void);
 #ifdef RTX_CODE
 extern void loongarch_expand_scc (rtx *);
 extern bool loongarch_expand_vec_cmp (rtx *);
@@ -135,7 +134,6 @@ extern int loongarch_class_max_nregs (enum reg_class, 
machine_mode);
 extern machine_mode loongarch_hard_regno_caller_save_mode (unsigned int,
   unsigned int,
   machine_mode);
-extern int loongarch_adjust_insn_length (rtx_insn *, int);
 extern const char *loongarch_output_conditional_branch (rtx_insn *, rtx *,
const char *,
const char *);
@@ -157,7 +155,6 @@ extern bool loongarch_global_symbol_noweak_p (const_rtx);
 extern bool loongarch_weak_symbol_p (const_rtx);
 extern bool loongarch_symbol_binds_local_p (const_rtx);
 
-extern const char *current_section_name (void);
 extern unsigned int current_section_flags (void);
 extern bool loongarch_use_ins_ext_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
 extern bool loongarch_check_zero_div_p (void);
@@ -198,8 +195,6 @@ extern bool loongarch_epilogue_uses (unsigned int);
 extern bool loongarch_load_store_bonding_p (rtx *, machine_mode, bool);
 extern bool loongarch_split_symbol_type (enum loongarch_symbol_type);
 
-typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx);
-
 extern void loongarch_register_frame_header_opt (void);
 extern void loongarch_expand_vec_cond_expr (machine_mode, machine_mode, rtx *);
 extern void loongarch_expand_vec_cond_mask_expr (machine_mode, machine_mode,
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 70e31bb831c..b25624c9406 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -10810,23 +10810,6 @@ loongarch_expand_vec_cmp (rtx operands[])
   return true;
 }
 
-/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
-
-unsigned int
-loongarch_case_values_threshold (void)
-{
-  return default_case_values_threshold ();
-}
-
-/* Implement TARGET_SPILL_CLASS.  */
-
-static reg_class_t
-loongarch_spill_class (reg_class_t rclass ATTRIBUTE_UNUSED,
-  machine_mode mode ATTRIBUTE_UNUSED)
-{
-  return NO_REGS;
-}
-
 /* Implement TARGET_PROMOTE_FUNCTION_MODE.  */
 
 /* This function is equivalent to default_promote_function_mode_always_promote
@@ -11281,9 +11264,6 @@ loongarch_asm_code_end (void)
 #undef TARGET_FUNCTION_ARG_BOUNDARY
 #define TARGET_FUNCTION_ARG_BOUNDARY loongarch_function_arg_boundary
 
-#undef TARGET_OPTAB_SUPPORTED_P
-#define TARGET_OPTAB_SUPPORTED_P loongarch_optab_supported_p
-
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P loongarch_vector_mode_supported_p
 
@@ -11353,18 +11333,12 @@ loongarch_asm_code_end (void)
 #undef TARGET_SCHED_REASSOCIATION_WIDTH
 #define TARGET_SCHED_REASSOCIATION_WIDTH loongarch_sched_reassociation_width
 
-#undef TARGET_CASE_VALUES_THRESHOLD
-#define TARGET_CASE_VALUES_THRESHOLD loongarch_case_values_threshold
-
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV loongarch_atomic_assign_expand_fenv
 
 #undef TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS
 #define TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS true
 
-#undef TARGET_SPILL_CLASS
-#define TARGET_SPILL_CLASS loongarch_spill_class
-
 #undef TARGET_HARD_REGNO_NREGS
 #define TARGET_HARD_REGNO_NREGS loongarch_hard_regno_nregs
 #undef TARGET_HARD_REGNO_MODE_OK
-- 
2.39.3



Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Hongtao Liu
On Thu, Mar 14, 2024 at 10:46 PM Uros Bizjak  wrote:
>
> On Thu, Mar 14, 2024 at 8:42 AM Uros Bizjak  wrote:
> >
> > On Thu, Mar 14, 2024 at 8:32 AM Hongtao Liu  wrote:
> > >
> > > On Thu, Mar 14, 2024 at 3:22 PM Uros Bizjak  wrote:
> > > >
> > > > On Thu, Mar 14, 2024 at 2:33 AM liuhongt  wrote:
> > > > >
> > > > > When we split
> > > > > (insn 37 36 38 10 (set (reg:DI 104 [ _18 ])
> > > > > (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 
> > > > > MEM[(struct SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 
> > > > > A32])) "test.C":22:42 84 {*movdi_internal}
> > > > >  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
> > > > >
> > > > > into
> > > > >
> > > > > (insn 104 36 37 10 (set (subreg:V2DI (reg:DI 124) 0)
> > > > > (vec_concat:V2DI (mem:DI (reg/f:SI 98 [ 
> > > > > CallNative_nclosure.0_1 ]) [6 MEM[(struct SQRefCounted 
> > > > > *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])
> > > > > (const_int 0 [0]))) "test.C":22:42 -1
> > > > > (nil)))
> > > > > (insn 37 104 105 10 (set (subreg:V2DI (reg:DI 104 [ _18 ]) 0)
> > > > > (subreg:V2DI (reg:DI 124) 0)) "test.C":22:42 2024 
> > > > > {movv2di_internal}
> > > > >  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
> > > > > (nil)))
> > > > >
> > > > > we must copy the REG_EH_REGION note to the first insn and split the 
> > > > > block
> > > > > after the newly added insn.  The REG_EH_REGION on the second insn 
> > > > > will be
> > > > > removed later since it no longer traps.
> > > > >
> > > > > > Currently we only handle memory_operand; are there any other insns
> > > > > > that need to be handled?
> > > >
> > > > I think memory access is the only thing that can trap.
> > > >
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} for trunk 
> > > > > and gcc-13/gcc-12 release branch.
> > > > > Ok for trunk and backport?
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > * config/i386/i386-features.cc
> > > > > (general_scalar_chain::convert_op): Handle REG_EH_REGION note.
> > > > > (convert_scalars_to_vector): Ditto.
> > > > > * config/i386/i386-features.h (class scalar_chain): New
> > > > > > member control_flow_insns.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * g++.target/i386/pr111822.C: New test.
> > > > > ---
> > > > >  gcc/config/i386/i386-features.cc | 48 
> > > > > ++--
> > > > >  gcc/config/i386/i386-features.h  |  1 +
> > > > >  gcc/testsuite/g++.target/i386/pr111822.C | 45 ++
> > > > >  3 files changed, 90 insertions(+), 4 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/g++.target/i386/pr111822.C
> > > > >
> > > > > diff --git a/gcc/config/i386/i386-features.cc 
> > > > > b/gcc/config/i386/i386-features.cc
> > > > > index 1de2a07ed75..2ed27a9ebdd 100644
> > > > > --- a/gcc/config/i386/i386-features.cc
> > > > > +++ b/gcc/config/i386/i386-features.cc
> > > > > @@ -998,20 +998,36 @@ general_scalar_chain::convert_op (rtx *op, 
> > > > > rtx_insn *insn)
> > > > >  }
> > > > >else if (MEM_P (*op))
> > > > >  {
> > > > > +  rtx_insn* eh_insn, *movabs = NULL;
> > > > >rtx tmp = gen_reg_rtx (GET_MODE (*op));
> > > > >
> > > > >/* Handle movabs.  */
> > > > >if (!memory_operand (*op, GET_MODE (*op)))
> > > > > {
> > > > >   rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
> > > > > + movabs = emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
> > > > >
> > > > > - emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
> > > > >   *op = tmp2;
> > > > > }
> > > >
> > > > I may be missing something, but isn't the above dead code? We have
> > > > if (MEM_P (*op)) and then if (!memory_operand (*op, ...)).
> > > It's PR91814 #c1, memory_operand will also check invalid memory addresses.
> >
> > Oh, it is even my comment ;)
> >
> > Perhaps the comment should be improved to something like:
> >
> > "Emit MOVABS to load from a 64-bit absolute address to a GPR."
> >
> > LGTM then.
>
> BTW: Do we need to also fix timode_scalar_chain::convert_op ? There we
> also preload operand, so a similar fix should be applied there.
Yes, I'll make another patch. Didn't realize there are 2 of them.
>
> Uros.



-- 
BR,
Hongtao


Re:[pushed] [PATCH v2] LoongArch: Remove masking process for operand 3 of xvpermi.q.

2024-03-14 Thread chenglulu

Pushed to r14-9486.

On 2024/3/14 at 9:26 AM, Chenghui Pan wrote:

The behavior of non-zero unused bits in the third operand of the
xvpermi.q instruction is undefined on LoongArch.  Following our
discussion (https://github.com/llvm/llvm-project/pull/83540),
we think that keeping the original insn operand unmodified
is the better solution.

This patch partially reverts 7b158e036a95b1ab40793dd53bed7dbd770ffdaf.
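
For reference, the immediates changed in the test are simply the old values with the removed 0x33 mask applied by hand. A minimal sketch of that arithmetic follows; the helper name mask_xvpermi_q_imm is hypothetical and is not a GCC function.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the removed md code: keep only the
// bits selected by the 0x33 mask that the old pattern applied.
constexpr std::uint8_t mask_xvpermi_q_imm (std::uint8_t imm)
{
  return static_cast<std::uint8_t> (imm & 0x33);
}

// The three immediates updated in lasx-xvpermi_q.c.
static_assert (mask_xvpermi_q_imm (0x2a) == 0x22, "0x2a masks to 0x22");
static_assert (mask_xvpermi_q_imm (0xb9) == 0x31, "0xb9 masks to 0x31");
static_assert (mask_xvpermi_q_imm (0xca) == 0x02, "0xca masks to 0x02");
```

Since the compiler no longer masks the operand, the test now passes the already-masked values directly.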

gcc/ChangeLog:

* config/loongarch/lasx.md (lasx_xvpermi_q_):
Remove masking of operand 3.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c:
Reposition operand 3's value into instruction's defined accept range.
---
  gcc/config/loongarch/lasx.md| 5 -
  .../gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c   | 6 +++---
  2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index ac84db7f0ce..3f25c0c1756 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -640,8 +640,6 @@ (define_insn "lasx_xvpermi_d__1"
 (set_attr "mode" "")])
  
  ;; xvpermi.q

-;; Unused bits in operands[3] need be set to 0 to avoid
-;; causing undefined behavior on LA464.
  (define_insn "lasx_xvpermi_q_"
[(set (match_operand:LASX 0 "register_operand" "=f")
(unspec:LASX
@@ -651,9 +649,6 @@ (define_insn "lasx_xvpermi_q_"
  UNSPEC_LASX_XVPERMI_Q))]
"ISA_HAS_LASX"
  {
-  int mask = 0x33;
-  mask &= INTVAL (operands[3]);
-  operands[3] = GEN_INT (mask);
return "xvpermi.q\t%u0,%u2,%3";
  }
[(set_attr "type" "simd_splat")
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c 
b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c
index dbc29d2fb22..f89dfc31120 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c
@@ -27,7 +27,7 @@ main ()
*((unsigned long*)& __m256i_result[2]) = 0x7fff7fff7fff;
*((unsigned long*)& __m256i_result[1]) = 0x7fe37fe3001d001d;
*((unsigned long*)& __m256i_result[0]) = 0x7fff7fff7fff;
-  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0x2a);
+  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0x22);
ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
  
*((unsigned long*)& __m256i_op0[3]) = 0x;

@@ -42,7 +42,7 @@ main ()
*((unsigned long*)& __m256i_result[2]) = 0x0019001c;
*((unsigned long*)& __m256i_result[1]) = 0x;
*((unsigned long*)& __m256i_result[0]) = 0x01fe;
-  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0xb9);
+  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0x31);
ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
  
*((unsigned long*)& __m256i_op0[3]) = 0x00ff00ff00ff00ff;

@@ -57,7 +57,7 @@ main ()
*((unsigned long*)& __m256i_result[2]) = 0x;
*((unsigned long*)& __m256i_result[1]) = 0x00ff00ff00ff00ff;
*((unsigned long*)& __m256i_result[0]) = 0x00ff00ff00ff00ff;
-  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0xca);
+  __m256i_out = __lasx_xvpermi_q (__m256i_op0, __m256i_op1, 0x02);
ASSERTEQ_64 (__LINE__, __m256i_result, __m256i_out);
  
return 0;




Re: _LIBCXX_DEBUG value initialized singular iterators assert failures in std algorithms [PR104316]

2024-03-14 Thread François Dumont

Hi

This is what I started to do.

For now I haven't touched the __cpp_lib_null_iterators definition, as
_Safe_local_iterator still needs some work.


libstdc++: Implement N3644 on _Safe_iterator<> [PR114316]

Consider range of value-initialized iterators as valid and empty.

libstdc++-v3/ChangeLog:

    PR libstdc++/114316
    * include/debug/safe_iterator.tcc 
(_Safe_iterator<>::_M_valid_range):
    First check if both iterators are value-initialized before checking if
    singular.
    * testsuite/23_containers/set/debug/114316.cc: New test case.
    * testsuite/23_containers/vector/debug/114316.cc: New test case.

Tested under Linux x86_64, ok to commit ?

François


On 12/03/2024 10:52, Jonathan Wakely wrote:

On Tue, 12 Mar 2024 at 01:03, Jonathan Wakely  wrote:

On Tue, 12 Mar 2024 at 00:55, Maciej Miera  wrote:



Message written by Jonathan Wakely on 11.03.2024, at 21:40:

On Mon, 11 Mar 2024 at 20:07, Maciej Miera  wrote:


Hello,

I have tried to introduce an extra level of safety to my codebase and utilize 
_GLIBCXX_DEBUG in my test builds in order to catch faulty iterators.
However, I have encountered the following problem: I would like to utilize singular, 
value-initialized iterators as an arbitrary "null range".
However, this leads to failed assertions in std:: algorithms taking such a range.

Consider the following code sample with find_if:

#include <algorithm>
#include <iterator>
#include <map>

#ifndef __cpp_lib_null_iterators
#warning "Not standard compliant"
#endif

int main()
{
    // Key and value types are assumed to be (int, int).
    std::multimap<int, int>::iterator it1{};
    std::multimap<int, int>::iterator it2{};

    (void) (it1==it2); // OK
    (void) std::find_if(
        it1, it2, [](const auto& el) { return el.second == 8;});
}

Compiled with -std=c++20 and -D_GLIBCXX_DEBUG it produces the warning "Not standard 
compliant"
and the execution results in the following assert failure:

/opt/compiler-explorer/gcc-12.2.0/include/c++/12.2.0/bits/stl_algo.h:3875:
In function:
constexpr _IIter std::find_if(_IIter, _IIter, _Predicate) [with _IIter =
gnu_debug::_Safe_iterator<_Rb_tree_iterator >,
debug::multimap, bidirectional_iterator_tag>; _Predicate =
main()::]

The question is though: is it by design, or is it just a mere oversight? The 
warning actually suggests the first option.
If it is an intentional design choice, could you provide some rationale behind 
it, please?


The macro was not defined because the C++14 rule wasn't implemented
for debug mode, but that should have been fixed for GCC 11, according
to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98466 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70303
So we should be able to define the macro now, except maybe it wasn't fixed
for the RB tree containers.



Just to make sure there are no misunderstandings: comparison via == works fine. The 
feature check macro without _GLIBCXX_DEBUG and with  included is also 
fine. Maybe the need to include a header is the issue, but that’s not the core of the 
problem anyway.

No, it has nothing to do with the headers included. The feature test
macro is defined like so:

# if (__cplusplus >= 201402L) && (!defined(_GLIBCXX_DEBUG))
#  define __glibcxx_null_iterators 201304L

It's a very deliberate choice to not define it when _GLIBCXX_DEBUG is
defined. But as I said, I think we should have changed that.


The actual question is though, whether passing singular iterators to std 
algorithms (like find_if) should make the asserts at the beginning of the algo 
function fail when compiled with _GLIBCXX_DEBUG. IMHO, intuitively it should 
not, as comparing iterators equal would just ensure early exit and return of 
the same singular iterator.
This seems not to be the case though. The actual message is this:
Error: the function requires a valid iterator range [first, last).
What bothers me is whether the empty virtual range limited by two same singular 
iterators is actually valid or not.

Yes, it's valid. So the bug is in the __glibcxx_requires_valid_range macro.
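
To make the expected behaviour concrete, here is a minimal sketch of the "null range" case, assuming a build without _GLIBCXX_DEBUG (with debug mode and no fix, the range check is what aborts). The Map alias and the function name null_range_is_empty are hypothetical and chosen only for this illustration.

```cpp
#include <algorithm>
#include <map>

// Assumed element types; any key/value types would do.
using Map = std::multimap<int, int>;

// Returns true if two value-initialized iterators compare equal and
// an algorithm over the range they form does no work and returns
// the end of that (empty) range.
inline bool null_range_is_empty ()
{
  Map::iterator first{};
  Map::iterator last{};
  auto found = std::find_if (first, last,
                             [] (const Map::value_type &el)
                             { return el.second == 8; });
  return first == last && found == last;
}
```

The loop inside find_if never dereferences anything because first == last holds on entry, which is why the range can be treated as valid and empty.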

Thanks for the bugzilla report:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114316
We'll get it fixed!



[PATCH v3] c++: ICE with temporary of class type in array DMI [PR109966]

2024-03-14 Thread Marek Polacek
On Tue, Mar 12, 2024 at 06:26:14PM -0400, Jason Merrill wrote:
> On 3/12/24 11:56, Marek Polacek wrote:
> > On Tue, Mar 12, 2024 at 09:57:14AM -0400, Jason Merrill wrote:
> > > On 3/11/24 19:27, Marek Polacek wrote:
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/13?
> > > > 
> > > > -- >8 --
> > > > This ICE started with the fairly complicated r13-765.  We crash in
> > > > gimplify_var_or_parm_decl because a stray VAR_DECL leaked there.
> > > > The problem is ultimately that potential_prvalue_result_of wasn't
> > > > correctly handling arrays and replace_placeholders_for_class_temp_r
> > > > replaced a PLACEHOLDER_EXPR in a TARGET_EXPR which is used in the
> > > > context of copy elision.  If I have
> > > > 
> > > > M m[2] = { M{""}, M{""} };
> > > > 
> > > > then we don't invoke the M(const M&) copy-ctor.  I think the fix is
> > > > to detect such a case in potential_prvalue_result_of.
> > > > 
> > > > PR c++/109966
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * typeck2.cc (potential_prvalue_result_of): Add walk_subtrees
> > > > parameter.  Handle initializing an array from a
> > > > brace-enclosed-initializer.
> > > > (replace_placeholders_for_class_temp_r): Pass walk_subtrees 
> > > > down to
> > > > potential_prvalue_result_of.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/cpp1y/nsdmi-aggr20.C: New test.
> > > > * g++.dg/cpp1y/nsdmi-aggr21.C: New test.
> > > > ---
> > > >gcc/cp/typeck2.cc | 27 ---
> > > >gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr20.C | 17 +++
> > > >gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr21.C | 59 
> > > > +++
> > > >3 files changed, 96 insertions(+), 7 deletions(-)
> > > >create mode 100644 gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr20.C
> > > >create mode 100644 gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr21.C
> > > > 
> > > > diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
> > > > index 31198b2f9f5..8b99ce78e9a 100644
> > > > --- a/gcc/cp/typeck2.cc
> > > > +++ b/gcc/cp/typeck2.cc
> > > > @@ -1406,46 +1406,59 @@ digest_init_flags (tree type, tree init, int 
> > > > flags, tsubst_flags_t complain)
> > > > A a = (A{});  // initializer
> > > > A a = (1, A{});   // initializer
> > > > A a = true ? A{} : A{};  // initializer
> > > > + A arr[1] = { A{} };  // initializer
> > > > auto x = A{}.x;   // temporary materialization
> > > > auto x = foo(A{});// temporary materialization
> > > >   FULL_EXPR is the whole expression, SUBOB is its TARGET_EXPR 
> > > > subobject.  */
> > > >static bool
> > > > -potential_prvalue_result_of (tree subob, tree full_expr)
> > > > +potential_prvalue_result_of (tree subob, tree full_expr, int 
> > > > *walk_subtrees)
> > > >{
> > > > +#define RECUR(t) potential_prvalue_result_of (subob, t, walk_subtrees)
> > > >  if (subob == full_expr)
> > > >return true;
> > > >  else if (TREE_CODE (full_expr) == TARGET_EXPR)
> > > >{
> > > >  tree init = TARGET_EXPR_INITIAL (full_expr);
> > > >  if (TREE_CODE (init) == COND_EXPR)
> > > > -   return (potential_prvalue_result_of (subob, TREE_OPERAND (init, 
> > > > 1))
> > > > -   || potential_prvalue_result_of (subob, TREE_OPERAND 
> > > > (init, 2)));
> > > > +   return (RECUR (TREE_OPERAND (init, 1))
> > > > +   || RECUR (TREE_OPERAND (init, 2)));
> > > >  else if (TREE_CODE (init) == COMPOUND_EXPR)
> > > > -   return potential_prvalue_result_of (subob, TREE_OPERAND (init, 
> > > > 1));
> > > > +   return RECUR (TREE_OPERAND (init, 1));
> > > >  /* ??? I don't know if this can be hit.  */
> > > >  else if (TREE_CODE (init) == PAREN_EXPR)
> > > > {
> > > >   gcc_checking_assert (false);
> > > > - return potential_prvalue_result_of (subob, TREE_OPERAND 
> > > > (init, 0));
> > > > + return RECUR (TREE_OPERAND (init, 0));
> > > > }
> > > >}
> > > > +  /* The array case listed above.  */
> > > > +  else if (TREE_CODE (full_expr) == CONSTRUCTOR
> > > > +  && TREE_CODE (TREE_TYPE (full_expr)) == ARRAY_TYPE)
> > > > > +for (constructor_elt &e : CONSTRUCTOR_ELTS (full_expr))
> > > > +  if (e.value == subob)
> > > > +   {
> > > > + *walk_subtrees = 0;
> > > 
> > > Why clear walk_subtrees?  Won't that mean we fail to replace any
> > > placeholders nested within an array element initializer?
> > 
> > Right.  I couldn't find a testcase where that would cause a problem
> > but I think I just wasn't inventive enough.
> > 
> > Originally, I was checking same_type_ignoring_top_level_qualifiers_p
> > but that's not going to work for code like
> > 
> >struct N { N(M); };
> >N arr[2] = { M{""}, M{""} };
> > 
> > or with operator M().  But I suppose I could just use can_convert
> > like below.  

Re: [PATCH RFA] tree-core: clarify clobber comments

2024-03-14 Thread Jakub Jelinek
On Thu, Mar 14, 2024 at 04:27:22PM -0400, Jason Merrill wrote:
> OK for trunk?
> 
> -- 8< --
> 
> It came up on the mailing list that OBJECT_BEGIN/END are described as
> marking object lifetime, but mark the beginning of the constructor and end
> of the destructor, whereas the C++ notion of lifetime is between the end of
> the constructor and beginning of the destructor.  So let's fix the comments.
> 
> gcc/ChangeLog:
> 
>   * tree-core.h (enum clobber_kind): Clarify CLOBBER_OBJECT_*
>   comments.

LGTM.


> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index 8a89462bd7e..654d182b1c3 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -993,9 +993,11 @@ enum clobber_kind {
>CLOBBER_UNDEF,
>/* Beginning of storage duration, e.g. malloc.  */
>CLOBBER_STORAGE_BEGIN,
> -  /* Beginning of object lifetime, e.g. C++ constructor.  */
> +  /* Beginning of object data, e.g. start of C++ constructor.  This differs
> + from C++ 'lifetime', which starts when initialization is complete; a
> + clobber there would discard the initialization.  */
>CLOBBER_OBJECT_BEGIN,
> -  /* End of object lifetime, e.g. C++ destructor.  */
> +  /* End of object data, e.g. end of C++ destructor.  */
>CLOBBER_OBJECT_END,
>/* End of storage duration, e.g. free.  */
>CLOBBER_STORAGE_END,
> 
> base-commit: 5c01ede02a1f9ba1a58ab8d96a73e46e0484d820
> -- 
> 2.43.2

Jakub



[PATCH RFA] tree-core: clarify clobber comments

2024-03-14 Thread Jason Merrill
OK for trunk?

-- 8< --

It came up on the mailing list that OBJECT_BEGIN/END are described as
marking object lifetime, but mark the beginning of the constructor and end
of the destructor, whereas the C++ notion of lifetime is between the end of
the constructor and beginning of the destructor.  So let's fix the comments.
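
As an illustration of the distinction (a sketch only, using a hypothetical Tracked type and event log rather than anything in GCC), the four points in question occur in this order for an automatic object:

```cpp
#include <string>
#include <vector>

// Hypothetical event log used only for this illustration.
inline std::vector<std::string> &events ()
{
  static std::vector<std::string> log;
  return log;
}

struct Tracked
{
  int value;

  Tracked ()
  {
    // Start of the constructor: the point the OBJECT_BEGIN clobber marks.
    events ().push_back ("ctor begin");
    value = 42;
    // When the constructor completes, the C++ lifetime begins.
    events ().push_back ("ctor end");
  }

  ~Tracked ()
  {
    // The C++ lifetime ends when the destructor starts...
    events ().push_back ("dtor begin");
    // ...while the OBJECT_END clobber marks the end of the destructor.
    events ().push_back ("dtor end");
  }
};

inline int use_object ()
{
  Tracked t;
  events ().push_back ("in lifetime");
  return t.value;
}
```

A clobber placed at "ctor end" instead of "ctor begin" would discard the value just stored, which is the reason the new comment gives for marking the start of the constructor.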

gcc/ChangeLog:

* tree-core.h (enum clobber_kind): Clarify CLOBBER_OBJECT_*
comments.
---
 gcc/tree-core.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 8a89462bd7e..654d182b1c3 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -993,9 +993,11 @@ enum clobber_kind {
   CLOBBER_UNDEF,
   /* Beginning of storage duration, e.g. malloc.  */
   CLOBBER_STORAGE_BEGIN,
-  /* Beginning of object lifetime, e.g. C++ constructor.  */
+  /* Beginning of object data, e.g. start of C++ constructor.  This differs
+ from C++ 'lifetime', which starts when initialization is complete; a
+ clobber there would discard the initialization.  */
   CLOBBER_OBJECT_BEGIN,
-  /* End of object lifetime, e.g. C++ destructor.  */
+  /* End of object data, e.g. end of C++ destructor.  */
   CLOBBER_OBJECT_END,
   /* End of storage duration, e.g. free.  */
   CLOBBER_STORAGE_END,

base-commit: 5c01ede02a1f9ba1a58ab8d96a73e46e0484d820
-- 
2.43.2



Re: [PATCH] c++: explicit inst of template method not generated [PR110323]

2024-03-14 Thread Jason Merrill

On 3/8/24 12:02, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Consider

   constexpr int VAL = 1;
   struct foo {
   template <int B>
   void bar(typename std::conditional<B == VAL, int, float>::type arg) { }
   };
   template void foo::bar<1>(int arg);

where we since r11-291 fail to emit the code for the explicit
instantiation.  That's because cp_walk_subtrees/TYPENAME_TYPE now
walks TYPE_CONTEXT ('conditional' here) as well, and in a template
finds the B==VAL template argument.  VAL is constexpr, which implies const,
which in the global scope implies static.  constrain_visibility_for_template
then makes "struct conditional<(B == VAL), int, float>" non-TREE_PUBLIC.
Then symtab_node::needed_p checks TREE_PUBLIC, sees it's 0, and we don't
emit any code.

I thought the fix would be some ODR-esque check to not consider
constexpr variables/fns that are used just for their value.  But
it turned out to be tricky.  For instance, we can't skip
determine_visibility in a template; we can't even skip it for value-dep
expressions.  For example, no-linkage-expr1.C has

   using P = struct {}*;
   template <int N>
   void f(int(*)[((P)0, N)]) {}

where ((P)0, N) is value-dep, but N is not relevant here: we have to
ferret out the anonymous type.  When instantiating, it's already gone.


Hmm, how is that different from the B == VAL case?  In both cases we're 
naming an internal entity that gets folded away.


I guess the difference is that B == VAL falls under the special 
allowance in https://eel.is/c++draft/basic.def.odr#14.5.1 because it's a 
constant used as a prvalue, and therefore is not odr-used under 
https://eel.is/c++draft/basic.def.odr#5.2


So I would limit this change to decl_constant_var_p.  Really we should 
also be checking that the lvalue-rvalue conversion is applied, but 
that's more complicated.


Jason



[committed] hppa: Fix REG+D address support before reload

2024-03-14 Thread John David Anglin
Tested on hppa-unknown-linux-gnu.  Committed to trunk.

Dave
---

hppa: Fix REG+D address support before reload

When generating PA 1.x code or code for GNU ld, floating-point
accesses only support 5-bit displacements but integer accesses
support 14-bit displacements.  I mistakenly assumed reload
could fix an invalid 14-bit displacement in a floating-point
access but this is not the case.
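
A rough model of the two displacement ranges, for illustration only: the helper names below are hypothetical, both fields are assumed to be signed two's-complement, and the sketch ignores the INT14_OK_STRICT, reload_completed, QImode and HImode cases handled by the real function.

```cpp
#include <cstdint>

// Hypothetical helper: does VALUE fit in a signed field of BITS bits?
constexpr bool fits_signed (std::int64_t value, unsigned bits)
{
  return value >= -(std::int64_t{1} << (bits - 1))
         && value < (std::int64_t{1} << (bits - 1));
}

// Hypothetical model of the rule before reload: if the mode may end
// up using a floating-point load or store, only accept displacements
// that the 5-bit form can encode; otherwise the 14-bit form is fine.
constexpr bool displacement_ok (std::int64_t disp, bool may_use_fp_access)
{
  return may_use_fp_access ? fits_signed (disp, 5)
                           : fits_signed (disp, 14);
}

static_assert (displacement_ok (15, true), "fits in 5 bits");
static_assert (!displacement_ok (16, true), "too large for 5 bits");
static_assert (displacement_ok (8191, false), "fits in 14 bits");
```

Every value that passes the 5-bit test also passes the 14-bit one, so restricting to the smaller range is safe for both kinds of access.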

2024-03-14  John David Anglin  

gcc/ChangeLog:

PR target/114288
* config/pa/pa.cc (pa_legitimate_address_p): Don't allow
14-bit displacements before reload for modes that may use
a floating-point load or store.

diff --git a/gcc/config/pa/pa.cc b/gcc/config/pa/pa.cc
index 694123e37c9..129289f8e62 100644
--- a/gcc/config/pa/pa.cc
+++ b/gcc/config/pa/pa.cc
@@ -10968,20 +10968,15 @@ pa_legitimate_address_p (machine_mode mode, rtx x, 
bool strict, code_helper)
 
  /* Long 14-bit displacements always okay for these cases.  */
  if (INT14_OK_STRICT
+ || reload_completed
  || mode == QImode
  || mode == HImode)
return true;
 
- /* A secondary reload may be needed to adjust the displacement
-of floating-point accesses when STRICT is nonzero.  */
- if (strict)
-   return false;
-
- /* We get significantly better code if we allow long displacements
-before reload for all accesses.  Instructions must satisfy their
-constraints after reload, so we must have an integer access.
-Return true for both cases.  */
- return true;
+ /* We have to limit displacements to those supported by
+both floating-point and integer accesses as reload can't
+fix invalid displacements.  See PR114288.  */
+ return false;
}
 
   if (!TARGET_DISABLE_INDEXING




Re: [PATCH] bpf: define INT8_TYPE as signed char

2024-03-14 Thread David Faust



On 3/14/24 10:16, Jose E. Marchesi wrote:
> 
> Hi David.
> 
>> Change the BPF backend to define INT8_TYPE with an explicit sign, rather
>> than a plain char.  This is in line with other targets and removes the
>> risk of int8_t being affected by the signedness of the plain char type
>> of the host system.
> 
> OK.
> 
> I would add to the commit message that the motivation for this change is
> that even if `char' is defined to be signed in BPF targets, some BPF
> programs use the (mal)practice of including internal libc headers
> indirectly via kernel headers and that may trigger compilation errors
> regarding redefinitions of types.

Thanks, added this to the commit message and pushed.

> 
> Thanks for the patch!
> 
>>
>> Tested on x86_64-linux-gnu host for bpf-unknown-none.
>> Sanity checked compiling Linux kernel BPF selftests.
>>
>> gcc/
>>
>>  * config/bpf/bpf.h (INT8_TYPE): Change to signed char.
>> ---
>>  gcc/config/bpf/bpf.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
>> index f107a5a4c34..3cc5daa1b58 100644
>> --- a/gcc/config/bpf/bpf.h
>> +++ b/gcc/config/bpf/bpf.h
>> @@ -99,7 +99,7 @@
>>  
>>  #define SIG_ATOMIC_TYPE "char"
>>  
>> -#define INT8_TYPE "char"
>> +#define INT8_TYPE "signed char"
>>  #define INT16_TYPE "short int"
>>  #define INT32_TYPE "int"
>>  #define INT64_TYPE "long int"


[PATCH] gcc: xtensa: reorder movsi_internal patterns for better code generation during LRA

2024-03-14 Thread Max Filippov
After switching to LRA xtensa backend generates the following code for
saving/loading registers:

movi a9, 0x190
add  a9, a9, sp
s32i.n   a3, a9, 0

instead of the shorter and more efficient

s32i a3, a9, 0x190

E.g. the following code can be used to reproduce it:

int f1(int a, int b, int c, int d, int e, int f, int *p);
int f2(int a, int b, int c, int d, int e, int f, int *p);
int f3(int a, int b, int c, int d, int e, int f, int *p);

int foo(int a, int b, int c, int d, int e, int f)
{
int g[100];
return
f1(a, b, c, d, e, f, g) +
f2(a, b, c, d, e, f, g) +
f3(a, b, c, d, e, f, g);
}

This happens in the LRA pass because s32i.n and l32i.n are listed before
the s32i and l32i in the movsi_internal pattern and alternative
consideration loop stops early.

gcc/

* config/xtensa/xtensa.md (movsi_internal): Move l32i and s32i
patterns ahead of the l32i.n and s32i.n.
---
 gcc/config/xtensa/xtensa.md | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 1a2249b059a0..5cdf4dffe700 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -1270,13 +1270,15 @@
 })
 
 (define_insn "movsi_internal"
-  [(set (match_operand:SI 0 "nonimmed_operand" 
"=D,D,D,D,R,R,a,q,a,a,W,a,a,U,*a,*A")
-   (match_operand:SI 1 "move_operand" 
"M,D,d,R,D,d,r,r,I,Y,i,T,U,r,*A,*r"))]
+  [(set (match_operand:SI 0 "nonimmed_operand" 
"=D,D,D,a,U,D,R,R,a,q,a,a,W,a,*a,*A")
+   (match_operand:SI 1 "move_operand" 
"M,D,d,U,r,R,D,d,r,r,I,Y,i,T,*A,*r"))]
   "xtensa_valid_move (SImode, operands)"
   "@
movi.n\t%0, %x1
mov.n\t%0, %1
mov.n\t%0, %1
+   %v1l32i\t%0, %1
+   %v0s32i\t%1, %0
%v1l32i.n\t%0, %1
%v0s32i.n\t%1, %0
%v0s32i.n\t%1, %0
@@ -1286,13 +1288,11 @@
movi\t%0, %1
const16\t%0, %t1\;const16\t%0, %b1
%v1l32r\t%0, %1
-   %v1l32i\t%0, %1
-   %v0s32i\t%1, %0
rsr\t%0, ACCLO
wsr\t%1, ACCLO"
-  [(set_attr "type"
"move,move,move,load,store,store,move,move,move,move,move,load,load,store,rsr,wsr")
+  [(set_attr "type"
"move,move,move,load,store,load,store,store,move,move,move,move,move,load,rsr,wsr")
(set_attr "mode""SI")
-   (set_attr "length"  "2,2,2,2,2,2,3,3,3,3,6,3,3,3,3,3")])
+   (set_attr "length"  "2,2,2,3,3,2,2,2,3,3,3,3,6,3,3,3")])
 
 (define_split
   [(set (match_operand:SHI 0 "register_operand")
-- 
2.39.2



Re: [PATCH] bpf: define INT8_TYPE as signed char

2024-03-14 Thread Jose E. Marchesi


Hi David.

> Change the BPF backend to define INT8_TYPE with an explicit sign, rather
> than a plain char.  This is in line with other targets and removes the
> risk of int8_t being affected by the signedness of the plain char type
> of the host system.

OK.

I would add to the commit message that the motivation for this change is
that even if `char' is defined to be signed in BPF targets, some BPF
programs use the (mal)practice of including internal libc headers
indirectly via kernel headers and that may trigger compilation errors
regarding redefinitions of types.
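
A small sketch of why the signedness of plain char does not help here: char and signed char are distinct types regardless, so two typedefs of the same name built from them conflict. The typedef names below are hypothetical stand-ins, and the assumption that the libc side uses signed char is mine.

```cpp
#include <type_traits>

// char, signed char and unsigned char are three distinct types,
// even on targets where plain char is signed.
static_assert (!std::is_same<char, signed char>::value,
               "char and signed char differ");
static_assert (!std::is_same<char, unsigned char>::value,
               "char and unsigned char differ");

// Hypothetical stand-ins for an 8-bit typedef coming from two places.
typedef signed char libc_style_int8;     // assumed libc-style definition
typedef char        old_bpf_style_int8;  // what INT8_TYPE "char" yields

// If both were spelled with the same typedef name, the second
// declaration would be a redefinition with a different type.
constexpr bool typedefs_agree
  = std::is_same<libc_style_int8, old_bpf_style_int8>::value;
static_assert (!typedefs_agree, "the two definitions do not match");
```

With INT8_TYPE defined as "signed char" the two definitions name the same type, and a repeated typedef becomes harmless.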

Thanks for the patch!

>
> Tested on x86_64-linux-gnu host for bpf-unknown-none.
> Sanity checked compiling Linux kernel BPF selftests.
>
> gcc/
>
>   * config/bpf/bpf.h (INT8_TYPE): Change to signed char.
> ---
>  gcc/config/bpf/bpf.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
> index f107a5a4c34..3cc5daa1b58 100644
> --- a/gcc/config/bpf/bpf.h
> +++ b/gcc/config/bpf/bpf.h
> @@ -99,7 +99,7 @@
>  
>  #define SIG_ATOMIC_TYPE "char"
>  
> -#define INT8_TYPE "char"
> +#define INT8_TYPE "signed char"
>  #define INT16_TYPE "short int"
>  #define INT32_TYPE "int"
>  #define INT64_TYPE "long int"


[committed] libstdc++: Fix std::format("{}", negative_integer) [PR114325]

2024-03-14 Thread Jonathan Wakely
Tested aarch64-linux. Pushed to trunk.

-- >8 --

The fast path for "{}" format strings has a bug for negative integers
where the length passed to std::to_chars is too long.
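
The idea of the fix, sketched with the public std::to_chars rather than the internal __to_chars_10_impl: count the digits of the magnitude only, reserve digits plus one for the sign, and hand the digit writer exactly the digit count. The names digits10 and format_int are hypothetical and not part of libstdc++.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: number of decimal digits in VAL.
inline unsigned digits10 (unsigned long long val)
{
  unsigned n = 1;
  while (val >= 10)
    {
      val /= 10;
      ++n;
    }
  return n;
}

// Hypothetical stand-in for the fast path: format ARG in base 10.
inline std::string format_int (long long arg)
{
  const bool neg = arg < 0;
  // Compute the magnitude in unsigned arithmetic so the most
  // negative value is handled without overflow.
  const unsigned long long uval
    = neg ? ~static_cast<unsigned long long> (arg) + 1u
          : static_cast<unsigned long long> (arg);
  const unsigned n = digits10 (uval);   // digits only, no sign
  std::string out (n + neg, '-');       // reserve digits plus sign
  // Write exactly n digits after the optional leading '-'.
  auto res = std::to_chars (out.data () + neg,
                            out.data () + neg + n, uval);
  assert (res.ec == std::errc{});
  return out;
}
```

The bug fixed here corresponds to passing n + neg as the digit count in the to_chars step while the buffer start had already been advanced past the sign.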

libstdc++-v3/ChangeLog:

PR libstdc++/114325
* include/std/format (_Scanner::_M_scan): Pass correct length to
__to_chars_10_impl.
* testsuite/std/format/functions/format.cc: Check negative
integers with empty format-spec.
---
 libstdc++-v3/include/std/format   | 7 ---
 libstdc++-v3/testsuite/std/format/functions/format.cc | 5 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 1e839e88db4..613016d1a10 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -4091,6 +4091,7 @@ namespace __format
__sink_out = __sink.out();
 
   if constexpr (is_same_v<_CharT, char>)
+   // Fast path for "{}" format strings and simple format arg types.
if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
  {
bool __done = false;
@@ -4124,14 +4125,14 @@ namespace __format
__uval = make_unsigned_t<_Tp>(~__arg) + 1u;
  else
__uval = __arg;
- const auto __n = __detail::__to_chars_len(__uval) + __neg;
- if (auto __res = __sink_out._M_reserve(__n))
+ const auto __n = __detail::__to_chars_len(__uval);
+ if (auto __res = __sink_out._M_reserve(__n + __neg))
{
  auto __ptr = __res.get();
  *__ptr = '-';
  __detail::__to_chars_10_impl(__ptr + (int)__neg, __n,
   __uval);
- __res._M_bump(__n);
+ __res._M_bump(__n + __neg);
  __done = true;
}
}
diff --git a/libstdc++-v3/testsuite/std/format/functions/format.cc 
b/libstdc++-v3/testsuite/std/format/functions/format.cc
index a27fbe74631..4499397aaf9 100644
--- a/libstdc++-v3/testsuite/std/format/functions/format.cc
+++ b/libstdc++-v3/testsuite/std/format/functions/format.cc
@@ -157,6 +157,11 @@ test_std_examples()
 
 // Restore
 std::locale::global(std::locale::classic());
+
+string s5 = format("{}", -100); // PR libstdc++/114325
+VERIFY(s5 == "-100");
+string s6 = format("{:d} {:d}", -123, 999);
+VERIFY(s6 == "-123 999");
   }
 }
 
-- 
2.44.0



[committed] libstdc++: Add nodiscard in <algorithm>

2024-03-14 Thread Jonathan Wakely
Tested aarch64-linux and x86_64-linux. Pushed to trunk.

I forgot to update the commit log to remove the speculation, because
Stephan Lavavej confirmed that MSVC doesn't mark those functions
nodiscard because it would result in too many false positives. Although
it might find some real bugs, it would also warn about a lot of
perfectly correct code.

-- >8 --

Add the [[nodiscard]] attribute to several functions in <algorithm>.
These all have no side effects and are only called for their return
value (e.g. std::count) or produce a result that must not be discarded
for correctness (e.g. std::remove).

I was intending to add the attribute to a number of other functions like
std::copy_if, std::unique_copy, std::set_union, and std::set_difference.
I stopped when I noticed that MSVC doesn't use it on those functions,
which I suspect is because they're often used with an insert iterator
(e.g. std::back_insert_iterator). In that case it doesn't matter if
you discard the result, because you have the container to tell you how
many elements were copied to the output range.
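
A short sketch of the two situations, using hypothetical helper names: with std::remove the returned iterator is the only way to know where the kept elements end, whereas with std::copy_if and a back inserter the container already records how much was written.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// std::remove only shifts the kept elements forward; discarding its
// result would leave the stale tail in place, so the return value
// must be used (here, passed to erase).
[[nodiscard]] inline std::vector<int> drop_zeros (std::vector<int> v)
{
  v.erase (std::remove (v.begin (), v.end (), 0), v.end ());
  return v;
}

// With an insert iterator the result of std::copy_if can be ignored:
// out.size () already tells us how many elements were copied.
[[nodiscard]] inline std::vector<int> copy_even (const std::vector<int> &in)
{
  std::vector<int> out;
  std::copy_if (in.begin (), in.end (), std::back_inserter (out),
                [] (int x) { return x % 2 == 0; });
  return out;
}
```

Marking copy_if nodiscard would warn on the second function even though it is correct, which is the kind of false positive described above.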

libstdc++-v3/ChangeLog:

* include/bits/stl_algo.h (find_end, all_of, none_of, any_of)
(find_if_not, is_partitioned, partition_point, remove)
(remove_if, unique, lower_bound, upper_bound, equal_range)
(binary_search, includes, is_sorted, is_sorted_until, minmax)
(minmax_element, is_permutation, clamp, find_if, find_first_of)
(adjacent_find, count, count_if, search, search_n, min_element)
(max_element): Add nodiscard attribute.
* include/bits/stl_algobase.h (min, max, lower_bound, equal)
(lexicographical_compare, lexicographical_compare_three_way)
(mismatch): Likewise.
* include/bits/stl_heap.h (is_heap, is_heap_until): Likewise.
* testsuite/25_algorithms/equal/debug/1_neg.cc: Add dg-warning.
* testsuite/25_algorithms/equal/debug/2_neg.cc: Likewise.
* testsuite/25_algorithms/equal/debug/3_neg.cc: Likewise.
* testsuite/25_algorithms/find_first_of/concept_check_1.cc:
Likewise.
* testsuite/25_algorithms/is_permutation/2.cc: Likewise.
* testsuite/25_algorithms/lexicographical_compare/71545.cc:
Likewise.
* testsuite/25_algorithms/lower_bound/33613.cc: Likewise.
* testsuite/25_algorithms/lower_bound/debug/irreflexive.cc:
Likewise.
* testsuite/25_algorithms/lower_bound/debug/partitioned_neg.cc:
Likewise.
* testsuite/25_algorithms/lower_bound/debug/partitioned_pred_neg.cc:
Likewise.
* testsuite/25_algorithms/minmax/3.cc: Likewise.
* testsuite/25_algorithms/search/78346.cc: Likewise.
* testsuite/25_algorithms/search_n/58358.cc: Likewise.
* testsuite/25_algorithms/unique/1.cc: Likewise.
* testsuite/25_algorithms/unique/11480.cc: Likewise.
* testsuite/25_algorithms/upper_bound/33613.cc: Likewise.
* testsuite/25_algorithms/upper_bound/debug/partitioned_neg.cc:
Likewise.
* testsuite/25_algorithms/upper_bound/debug/partitioned_pred_neg.cc:
Likewise.
* testsuite/ext/concept_checks.cc: Likewise.
* testsuite/ext/is_heap/47709.cc: Likewise.
* testsuite/ext/is_sorted/cxx0x.cc: Likewise.
---
 libstdc++-v3/include/bits/stl_algo.h  | 102 +-
 libstdc++-v3/include/bits/stl_algobase.h  |  32 +++---
 libstdc++-v3/include/bits/stl_heap.h  |   8 +-
 .../25_algorithms/equal/debug/1_neg.cc|   1 +
 .../25_algorithms/equal/debug/2_neg.cc|   1 +
 .../25_algorithms/equal/debug/3_neg.cc|   1 +
 .../find_first_of/concept_check_1.cc  |   1 +
 .../25_algorithms/is_permutation/2.cc |   1 +
 .../lexicographical_compare/71545.cc  |   1 +
 .../25_algorithms/lower_bound/33613.cc|   1 +
 .../lower_bound/debug/irreflexive.cc  |   1 +
 .../lower_bound/debug/partitioned_neg.cc  |   1 +
 .../lower_bound/debug/partitioned_pred_neg.cc |   1 +
 .../testsuite/25_algorithms/minmax/3.cc   |   1 +
 .../testsuite/25_algorithms/search/78346.cc   |   1 +
 .../testsuite/25_algorithms/search_n/58358.cc |   1 +
 .../testsuite/25_algorithms/unique/1.cc   |   1 +
 .../testsuite/25_algorithms/unique/11480.cc   |   2 +-
 .../25_algorithms/upper_bound/33613.cc|   1 +
 .../upper_bound/debug/partitioned_neg.cc  |   1 +
 .../upper_bound/debug/partitioned_pred_neg.cc |   1 +
 libstdc++-v3/testsuite/ext/concept_checks.cc  |   4 +
 libstdc++-v3/testsuite/ext/is_heap/47709.cc   |   1 +
 libstdc++-v3/testsuite/ext/is_sorted/cxx0x.cc |   1 +
 24 files changed, 95 insertions(+), 72 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index 7a0cf6b6737..1a996aa61da 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -320,7 +320,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  [__first1,__last1-(__last2-__first2))
   

Re: OpenACC 2.7: front-end support for readonly modifier: Add basic OpenACC 'declare' testing

2024-03-14 Thread Tobias Burnus

Hi all, hi Thomas & Chung-Lin,

Thomas Schwinge wrote:

But I realized another thing: don't we have to handle the 'readonly'
modifier also in Fortran module files, that is, next to the OpenACC
'declare' 'copyin' handling in 'gcc/fortran/module.cc':
'AB_OACC_DECLARE_COPYIN' etc.?


I bet so; it is not as bad as with the others as it is "only" an 
optimization hint, but it makes sense to make it available.


Note that when you place the 'module' in the same file as the module 
users ('use'), the compiler might know things because they are in the 
same translation unit / file not because it is in the module ...



  Chung-Lin, please check, via test cases.
'gfortran.dg/goacc/routine-module*', for example, should provide some
guidance of how to achieve actual module file use, and then do the same
'scan-tree-dump' as in the current 'readonly' modifier test cases.

...

By means of only emitting a tag
in the module file if the 'readonly' modifier is specified, we should
maintain compatibility with the current 'MOD_VERSION'.


That was the idea: If only new information gets added (if used), older 
compilers still work. This has huge limitations and does not work as 
well as imagined but here it should work: Older .mod will work with new 
compilers, even though the reverse might not be true.


Tobias


[committed] gcn: Fix a comment typo

2024-03-14 Thread Jakub Jelinek
Hi!

I've noticed a typo in the comment above ABI_VERSION_SPEC.

Fixed thusly, committed to trunk as obvious.

2024-03-14  Jakub Jelinek  

* config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): Fix comment typo.

--- gcc/config/gcn/gcn-hsa.h.jj 2024-01-27 13:03:55.589073484 +0100
+++ gcc/config/gcn/gcn-hsa.h2024-03-14 17:03:48.593594055 +0100
@@ -80,7 +80,7 @@ extern unsigned int gcn_local_sym_hash (
writes a new AMD GPU object file and the ABI version needs to be the
same. - LLVM <= 17 defaults to 4 while LLVM >= 18 defaults to 5.
GCC supports LLVM >= 13.0.1 and only LLVM >= 14 supports version 5.
-   Note that Fiji is only suppored with LLVM <= 17 as version 3 i no longer
+   Note that Fiji is only suppored with LLVM <= 17 as version 3 is no longer
supported in LLVM >= 18.  */
 #define ABI_VERSION_SPEC "march=fiji:--amdhsa-code-object-version=3;" \
 "!march=*|march=*:--amdhsa-code-object-version=4"

Jakub



Re: [PATCH] Fix PR ipa/113996

2024-03-14 Thread Jan Hubicka
> > Patch is still OK, but ipa-ICF will only identify the functions if
> > static chain is unused. Perhaps just picking the winning candidate to be
> > version without static chain and making ipa-inline to not ICE when calls
> > with static chain lands to function with no static chain would help us
> > to optimize better.
> 
> I see, thanks for the explanation.  The attached patch appears to work.
> 
> 
>   PR ipa/113996
>   * ipa-icf.h (sem_function): Add static_chain_p member.
>   * ipa-icf.cc (sem_function::init): Initialize it.
>   (sem_item_optimizer::merge_classes): If the class is made of
>   functions, pick one without static chain as the target.
> 
> -- 
> Eric Botcazou


> @@ -3399,11 +3401,22 @@ sem_item_optimizer::merge_classes (unsigned int 
> prev_class_count,
>  
>   sem_item *source = c->members[0];
>  
> - if (DECL_NAME (source->decl)
> - && MAIN_NAME_P (DECL_NAME (source->decl)))
> -   /* If merge via wrappers, picking main as the target can be
> -  problematic.  */
> -   source = c->members[1];
> + if (source->type == FUNC)
> +   {
> + /* Pick a member without static chain, if any.  */
> + for (unsigned int j = 0; j < c->members.length (); j++)
> +   if (!static_cast<sem_function *> (c->members[j])->static_chain_p)
> + {
> +   source = c->members[j];
> +   break;
> + }
> +
> + /* If merge via wrappers, picking main as the target can be
> +problematic.  */
> + if (DECL_NAME (source->decl)
> + && MAIN_NAME_P (DECL_NAME (source->decl)))
> +   source = c->members[source == c->members[0] ? 1 : 0];

Thanks for looking into this.
I wonder whether it can happen that ICF merges a function with a static
chain and main?
We can fix this unlikely case by simply considering functions with a
static chain not equivalent to a MAIN_NAME_P function.

On x86_64 the static chain goes to R10, but I wonder if we support targets
where the ABI of a function taking a static chain and not using it differs
from the ABI of a function with no static chain?

Honza


Re: Patch ping Re: [PATCH] icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]

2024-03-14 Thread Jan Hubicka
> > Otherwise
> > I will add your testcase for this patch and commit this one.
> > Statistically we almost never merge functions with different value
> > ranges (three in testsuite, 0 during bootstrap, 1 during LTO bootstrap
> > and probably few in LLVM build - there are 15 cases reported, but some
> > are false positives caused by hash function conflicts).
> 
> It is mostly the fnsplit functions, sure, because we don't really drop
> the __builtin_unreachable checks before IPA and so the if (cond)
> __builtin_unreachable (); style assertions or [[assume (cond)]]; still
> result in ICF failures.

Actually on llvm they are not split functions, but functions where value
ranges were determined by early VRP based on code optimized out by the time
we reach ipa-icf (the equal thingy from libstdc++ I wrote about in the
previous email).

Honza


Re: Patch ping Re: [PATCH] icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]

2024-03-14 Thread Jakub Jelinek
On Thu, Mar 14, 2024 at 05:16:59PM +0100, Jan Hubicka wrote:
> Sorry, this was a bit of a misunderstanding: I thought you still considered
> the original patch to be a full fix, while I thought I should look into it
> more and dig out more issues.  This is a bit of a can of worms.  Overall I
> think the plan is:
> 
> This stage4
> 1) fix VR divergences by either comparing or dropping them
> 2) fix jump function differences by comparing them
>(I just constructed a testcase showing that jump functions can
> diverge for other reasons than just VR, so this is deeper problem,
> too)
> 3) try to construct additional wrong code testcases (finite_p
>divergences, trapping)
> Next stage1
> 4) implement VR and PTR info streaming
> 5) make ICF to compare VRs and punt otherwise
> 6) implement optimize_size feature to ICF that will not punt on
>diverging TBAA or value ranges and do merging instead.
>We need some extra infrastructure for that, being able to keep the
>maps between basic blocks and SSA names from comparison stage to
>merging stage
> 7) measure how effective ICF becomes in optimize_size mode and implement
>heuristics on how much metadata merging we want to do with -O2/-O3 as
>well.
>Based on older data Martin collected for his thesis, ICF used to be
>much more effective on libreoffice than it is today, so hopefully we
>can recover 10-15% binary size improvements here.
> 8) General ICF TLC.  There are many half finished things for a while in
>this pass (such as merging with different BB or stmt orders).

Agreed.

> I am attaching the compare patch which also fixes the original wrong
> code. If you prefer your version, just go ahead and commit it.

Seems both patches are the same size (at least when looking at number of
added lines).  I think I prefer my patch because it will make the LTO and
non-LTO cases more similar which IMHO helps maintenance of the release
branches.  So, e.g. for all the other wrong-code issues
like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907#c54
or https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907#c58 one can actually
trigger them with both non-LTO and LTO, rather than only with LTO and for
non-LTO not trigger because there are SSA_NAME_RANGE_INFO differences that
prevent ICF.
If we start streaming SSA_NAME_RANGE_INFO in GCC 15, that changes things
and then we can decide what to do for differences, 5) or 6) or combinations
thereof (e.g. consider not just if the ranges are different, but how much
too or if one is a subset or superset of the other to decide between punting
and unioning for non-Os).

> Otherwise
> I will add your testcase for this patch and commit this one.
> Statistically we almost never merge functions with different value
> ranges (three in testsuite, 0 during bootstrap, 1 during LTO bootstrap
> and probably few in LLVM build - there are 15 cases reported, but some
> are false positives caused by hash function conflicts).

It is mostly the fnsplit functions, sure, because we don't really drop
the __builtin_unreachable checks before IPA and so the if (cond)
__builtin_unreachable (); style assertions or [[assume (cond)]]; still
result in ICF failures.

Jakub



[PATCH] bpf: define INT8_TYPE as signed char

2024-03-14 Thread David Faust
Change the BPF backend to define INT8_TYPE with an explicit sign, rather
than a plain char.  This is in line with other targets and removes the
risk of int8_t being affected by the signedness of the plain char type
of the host system.

Tested on x86_64-linux-gnu host for bpf-unknown-none.
Sanity checked compiling Linux kernel BPF selftests.

gcc/

* config/bpf/bpf.h (INT8_TYPE): Change to signed char.
---
 gcc/config/bpf/bpf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
index f107a5a4c34..3cc5daa1b58 100644
--- a/gcc/config/bpf/bpf.h
+++ b/gcc/config/bpf/bpf.h
@@ -99,7 +99,7 @@
 
 #define SIG_ATOMIC_TYPE "char"
 
-#define INT8_TYPE "char"
+#define INT8_TYPE "signed char"
 #define INT16_TYPE "short int"
 #define INT32_TYPE "int"
 #define INT64_TYPE "long int"
-- 
2.43.0
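
To illustrate the motivation (this sketch is not part of the patch, and the
helper names are hypothetical): whether plain char is signed is
implementation-defined, so widening a byte through it can give different
results on different targets, while signed char does not have that
ambiguity.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helpers, for illustration only; not GCC or BPF code.  */

/* Nonzero if plain char is a signed type on this implementation.  */
int plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}

/* Widen a byte through plain char.  For bytes above 0x7f the result
   depends on the signedness of plain char.  */
int widen_plain (unsigned char byte)
{
  char c = (char) byte;
  return c;
}

/* Widen a byte through signed char.  The out-of-range conversion is
   implementation-defined in C, but GCC documents it as wrapping modulo
   2^8, so 0xff becomes -1 regardless of how plain char behaves.  */
int widen_signed (unsigned char byte)
{
  signed char c = (signed char) byte;
  assert (c >= SCHAR_MIN && c <= SCHAR_MAX);
  return c;
}
```

With an int8_t defined as plain char, a value such as 0xff could read back
as either -1 or 255 depending on the char signedness in effect; defining
it as signed char removes that dependency.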



Re: Patch ping Re: [PATCH] icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]

2024-03-14 Thread Jan Hubicka
> > We have wrong code with LTO, too.
> 
> I know.
> 
> > The problem is that IPA passes (and
> > not only that, loop analysis too) does analysis at compile time (with
> > value numbers in) and streams the info separately.
> 
> And that is desirable, because otherwise it simply couldn't derive any
> ranges.
> 
> >  Removal of value ranges
> > (either by LTO or by your patch) happens between computing these
> > summaries and using them, so this can be used to trigger wrong code,
> > sadly.
> 
> Yes.  But with LTO, I don't see how the IPA ICF comparison whether
> two functions are the same or not could be done with the
> SSA_NAME_{RANGE,PTR}_INFO in, otherwise it could only ICF merge functions
> from the same TUs.  So the comparison IMHO (and the assert checks in my
> patch prove that) is done when the SSA_NAME_{RANGE,PTR}_INFO aren't in
> anymore.  So, one just needs to compare and punt or union whatever
> is or could be influenced in the IPA streamed data from the ranges etc.
> And because one has to do it for LTO, doing it for non-LTO should be
> sufficient too.

Sorry, this was a bit of a misunderstanding: I thought you still considered
the original patch to be a full fix, while I thought I should look into it
more and dig out more issues.  This is a bit of a can of worms.  Overall I
think the plan is:

This stage4
1) fix VR divergences by either comparing or dropping them
2) fix jump function differences by comparing them
   (I just constructed a testcase showing that jump functions can
diverge for other reasons than just VR, so this is deeper problem,
too)
3) try to construct additional wrong code testcases (finite_p
   divergences, trapping)
Next stage1
4) implement VR and PTR info streaming
5) make ICF to compare VRs and punt otherwise
6) implement optimize_size feature to ICF that will not punt on
   diverging TBAA or value ranges and do merging instead.
   We need some extra infrastructure for that, being able to keep the
   maps between basic blocks and SSA names from comparison stage to
   merging stage
7) measure how effective ICF becomes in optimize_size mode and implement
   heuristics on how much metadata merging we want to do with -O2/-O3 as
   well.
   Based on older data Martin collected for his thesis, ICF used to be
   much more effective on libreoffice than it is today, so hopefully we
   can recover 10-15% binary size improvements here.
8) General ICF TLC.  There are many half finished things for a while in
   this pass (such as merging with different BB or stmt orders).

I am attaching the compare patch which also fixes the original wrong
code. If you prefer your version, just go ahead and commit it. Otherwise
I will add your testcase for this patch and commit this one.
Statistically we almost never merge functions with different value
ranges (three in testsuite, 0 during bootstrap, 1 during LTO bootstrap
and probably few in LLVM build - there are 15 cases reported, but some
are false positives caused by hash function conflicts).

Honza

gcc/ChangeLog:

* ipa-icf-gimple.cc (func_checker::compare_ssa_name): Compare value 
ranges.

diff --git a/gcc/ipa-icf-gimple.cc b/gcc/ipa-icf-gimple.cc
index 8c2df7a354e..37c416d777d 100644
--- a/gcc/ipa-icf-gimple.cc
+++ b/gcc/ipa-icf-gimple.cc
@@ -39,9 +39,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "attribs.h"
 #include "gimple-walk.h"
+#include "value-query.h"
+#include "value-range-storage.h"
 
 #include "tree-ssa-alias-compare.h"
 #include "ipa-icf-gimple.h"
 
 namespace ipa_icf_gimple {
 
@@ -109,6 +116,35 @@ func_checker::compare_ssa_name (const_tree t1, const_tree 
t2)
   else if (m_target_ssa_names[i2] != (int) i1)
 return false;
 
+  if (POINTER_TYPE_P (TREE_TYPE (t1)))
+{
+  if (SSA_NAME_PTR_INFO (t1))
+   {
+ if (!SSA_NAME_PTR_INFO (t2)
+ || SSA_NAME_PTR_INFO (t1)->align != SSA_NAME_PTR_INFO (t2)->align
+ || SSA_NAME_PTR_INFO (t1)->misalign != SSA_NAME_PTR_INFO 
(t2)->misalign)
+   return false;
+   }
+  else if (SSA_NAME_PTR_INFO (t2))
+   return false;
+}
+  else
+{
+  if (SSA_NAME_RANGE_INFO (t1))
+   {
+ if (!SSA_NAME_RANGE_INFO (t2))
+   return false;
+ Value_Range r1 (TREE_TYPE (t1));
+ Value_Range r2 (TREE_TYPE (t2));
+ SSA_NAME_RANGE_INFO (t1)->get_vrange (r1, TREE_TYPE (t1));
+ SSA_NAME_RANGE_INFO (t2)->get_vrange (r2, TREE_TYPE (t2));
+ if (r1 != r2)
+   return false;
+   }
+  else if (SSA_NAME_RANGE_INFO (t2))
+   return false;
+}
+
   if (SSA_NAME_IS_DEFAULT_DEF (t1))
 {
   tree b1 = SSA_NAME_VAR (t1);
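
The comparison rule in the hunk above can be sketched outside of GCC as
follows.  This is only an illustration under simplifying assumptions: the
struct and function names are hypothetical, and a range is reduced to a
closed integer interval rather than a real Value_Range.  Two names match
only if both lack range info, or both have it and it is equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for recorded range info; NULL means "none".  */
struct range
{
  long lo;
  long hi;
};

/* Return true if the optional ranges R1 and R2 are compatible in the
   sense sketched above: both absent, or both present and identical.  */
bool ranges_match (const struct range *r1, const struct range *r2)
{
  if (r1 == NULL || r2 == NULL)
    return r1 == r2;
  assert (r1->lo <= r1->hi && r2->lo <= r2->hi);
  return r1->lo == r2->lo && r1->hi == r2->hi;
}
```

Punting on any difference is the conservative choice; unioning the two
ranges instead would be the alternative mentioned in the plan above.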


[PATCH] vect: Use xor to invert oversized vector masks

2024-03-14 Thread Andrew Stubbs
Don't enable excess lanes when inverting vector bit-masks smaller than the
integer mode.  This is yet another case of wrong-code due to mishandling
of oversized bitmasks.

This issue shows up in vect/tsvc/vect-tsvc-s278.c and
vect/tsvc/vect-tsvc-s279.c if I set the preferred vector size to V32
(down from V64) on amdgcn.

OK for mainline?

Andrew

gcc/ChangeLog:

* expr.cc (expand_expr_real_2): Use xor to invert vector masks.
---
 gcc/expr.cc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 403eeaa108e4..3540327d879e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -10497,6 +10497,17 @@ expand_expr_real_2 (sepops ops, rtx target, 
machine_mode tmode,
   immed_wide_int_const (mask, int_mode),
   target, 1, OPTAB_LIB_WIDEN);
}
+  /* If it's a vector mask don't enable excess bits.  */
+  else if (VECTOR_BOOLEAN_TYPE_P (type)
+  && SCALAR_INT_MODE_P (mode)
+  && maybe_ne (GET_MODE_PRECISION (mode),
+   TYPE_VECTOR_SUBPARTS (type).to_constant ()))
+   {
+ auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+ temp = expand_binop (mode, xor_optab, op0,
+  GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
+  target, true, OPTAB_WIDEN);
+   }
   else
temp = expand_unop (mode, one_cmpl_optab, op0, target, 1);
   gcc_assert (temp);
-- 
2.41.0
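
For readers not familiar with the problem, here is a standalone sketch of
the idea (hypothetical helper names, not the expander code): a mask with
nunits lanes lives in the low bits of a wider integer, so a plain bitwise
NOT also sets the unused high bits, while XOR with an nunits-wide all-ones
constant flips only the real lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only.  The mask occupies the
   low NUNITS bits of a 64-bit integer.  */

/* Invert with one's complement: every bit above the lanes becomes 1.  */
uint64_t invert_with_not (uint64_t mask)
{
  return ~mask;
}

/* Invert with XOR against (1 << NUNITS) - 1: bits above the lanes are
   left untouched.  NUNITS must be in 1..63 so the shift is defined.  */
uint64_t invert_with_xor (uint64_t mask, unsigned nunits)
{
  assert (nunits >= 1 && nunits <= 63);
  uint64_t lanes = (UINT64_C (1) << nunits) - 1;
  return mask ^ lanes;
}
```

For a 4-lane mask 0b0101, invert_with_xor gives 0b1010 with all higher
bits clear, whereas invert_with_not sets bits 4..63 as well.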



[PATCH] libstdc++: Suppress deprecation messages from [PR101228]

2024-03-14 Thread Jonathan Wakely
Should we do this to silence the deprecation messages from the TBB
headers we include?

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/101228
* include/pstl/parallel_backend_tbb.h 
(TBB_SUPPRESS_DEPRECATED_MESSAGES):
Define before including  then undef afterwards.
---
 libstdc++-v3/include/pstl/parallel_backend_tbb.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/libstdc++-v3/include/pstl/parallel_backend_tbb.h 
b/libstdc++-v3/include/pstl/parallel_backend_tbb.h
index 3ff55237bff..96e4b709fbe 100644
--- a/libstdc++-v3/include/pstl/parallel_backend_tbb.h
+++ b/libstdc++-v3/include/pstl/parallel_backend_tbb.h
@@ -15,6 +15,11 @@
 
 #include "parallel_backend_utils.h"
 
+#ifndef TBB_SUPPRESS_DEPRECATED_MESSAGES
+# define TBB_SUPPRESS_DEPRECATED_MESSAGES 1
+# define _GLIBCXX_UNDEF_SUPPRESS
+#endif
+
 // Bring in minimal required subset of Intel TBB
 #include 
 #include 
@@ -25,6 +30,11 @@
 #include 
 #include 
 
+#ifdef _GLIBCXX_UNDEF_SUPPRESS
+# undef TBB_SUPPRESS_DEPRECATED_MESSAGES
+# undef _GLIBCXX_UNDEF_SUPPRESS
+#endif
+
 #if TBB_INTERFACE_VERSION < 1
 #error Intel(R) Threading Building Blocks 2018 is required; older versions 
are not supported.
 #endif
-- 
2.44.0
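
The guard pattern used in the patch can be shown in isolation.  In this
sketch the macro names are hypothetical stand-ins and no real header is
included; the point is only that the wrapper defines the macro when the
user has not, and removes it again only in that case.

```c
/* Hypothetical macro names; only the pattern mirrors the patch.  */
#ifndef LIB_SUPPRESS_WARNINGS
# define LIB_SUPPRESS_WARNINGS 1
# define WRAPPER_UNDEF_SUPPRESS
#endif

/* A third-party header would be included here.  Record whether the
   macro is visible at this point.  */
#ifdef LIB_SUPPRESS_WARNINGS
static const int visible_during_include = 1;
#else
static const int visible_during_include = 0;
#endif

/* Undefine the macro only if this wrapper was the one defining it.  */
#ifdef WRAPPER_UNDEF_SUPPRESS
# undef LIB_SUPPRESS_WARNINGS
# undef WRAPPER_UNDEF_SUPPRESS
#endif

/* Record whether the macro leaked past the wrapper.  */
#ifdef LIB_SUPPRESS_WARNINGS
static const int leaked_after_include = 1;
#else
static const int leaked_after_include = 0;
#endif

int macro_was_visible (void)
{
  return visible_during_include;
}

int macro_leaked (void)
{
  return leaked_after_include;
}
```

If the user defines the macro before including the wrapper, the helper
macro is never defined and the user's definition is left in place.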



Re: [PATCH] libstdc++: atomic: Add missing clear_padding in __atomic_float constructor

2024-03-14 Thread Jonathan Wakely
On Fri, 16 Feb 2024 at 15:15, Jonathan Wakely wrote:
>
> On Fri, 16 Feb 2024 at 14:10, Jakub Jelinek wrote:
> >
> > On Fri, Feb 16, 2024 at 01:51:54PM +, Jonathan Wakely wrote:
> > > Ah, although __atomic_compare_exchange only takes pointers, the
> > > compiler replaces that with a call to __atomic_compare_exchange_n
> > > which takes the newval by value, which presumably uses an 80-bit FP
> > > register and so the padding bits become indeterminate again.
> >
> > __atomic_compare_exchange_n only works with integers, so I guess
> > it is doing VIEW_CONVERT_EXPR (aka union-style type punning) on the
> > argument.
> >
> > Do you have preprocessed source for the testcase?
>
> Sent offlist.

Jakub fixed the compiler, so I've pushed the attached patch now.

Tested x86_64-linux.
commit 0adc8c5f146b108f99c4df09e43276e3a2419262
Author: xndcn 
Date:   Fri Feb 16 11:00:13 2024

libstdc++: Add missing clear_padding in __atomic_float constructor

For 80-bit long double we need to clear the padding bits on
construction.

libstdc++-v3/ChangeLog:

* include/bits/atomic_base.h (__atomic_float::__atomic_float(Fp)):
Clear padding.
* testsuite/29_atomics/atomic_float/compare_exchange_padding.cc:
New test.

Signed-off-by: xndcn 

Reviewed-by: Jonathan Wakely 

diff --git a/libstdc++-v3/include/bits/atomic_base.h 
b/libstdc++-v3/include/bits/atomic_base.h
index b857b441169..dd360302f80 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -1283,7 +1283,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   constexpr
   __atomic_float(_Fp __t) : _M_fp(__t)
-  { }
+  { __atomic_impl::__clear_padding(_M_fp); }
 
   __atomic_float(const __atomic_float&) = delete;
   __atomic_float& operator=(const __atomic_float&) = delete;
diff --git 
a/libstdc++-v3/testsuite/29_atomics/atomic_float/compare_exchange_padding.cc 
b/libstdc++-v3/testsuite/29_atomics/atomic_float/compare_exchange_padding.cc
new file mode 100644
index 000..49626ac6651
--- /dev/null
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_float/compare_exchange_padding.cc
@@ -0,0 +1,53 @@
+// { dg-do run { target c++20 } }
+// { dg-options "-O0" }
+// { dg-additional-options "[atomic_link_flags [get_multilibs]] -latomic" }
+
+#include <atomic>
+#include <cstring>
+#include <limits>
+#include <testsuite_hooks.h>
+
+template<typename T>
+void __attribute__((noinline,noipa))
+fill_padding(T& f)
+{
+  T mask;
+  std::memset(&mask, 0xff, sizeof(T));
+  __builtin_clear_padding(&mask);
+  unsigned char* ptr_f = (unsigned char*)&f;
+  unsigned char* ptr_mask = (unsigned char*)&mask;
+  for (unsigned i = 0; i < sizeof(T); i++)
+  {
+if (ptr_mask[i] == 0x00)
+{
+  ptr_f[i] = 0xff;
+}
+  }
+}
+
+void
+test01()
+{
+  // test for long double with padding (float80)
+  if constexpr (std::numeric_limits<long double>::digits == 64)
+  {
+long double f = 0.5f; // long double has padding bits on x86
+fill_padding(f);
+std::atomic<long double> as{ f }; // padding cleared on constructor
+long double t = 1.5;
+
+as.fetch_add(t);
+long double s = f + t;
+t = as.load();
+VERIFY(s == t); // padding ignored on comparison
+fill_padding(s);
+VERIFY(as.compare_exchange_weak(s, f)); // padding cleared on cmpexchg
+fill_padding(f);
+VERIFY(as.compare_exchange_strong(f, t)); // padding cleared on cmpexchg
+  }
+}
+
+int main()
+{
+  test01();
+}


OpenACC 2.7: front-end support for readonly modifier: Add basic OpenACC 'declare' testing (was: [PATCH, OpenACC 2.7, v2] readonly modifier support in front-ends)

2024-03-14 Thread Thomas Schwinge
Hi!

On 2024-03-13T10:12:17+0100, I wrote:
> On 2024-03-07T17:02:02+0900, Chung-Lin Tang  
> wrote:
>> Also added simple 'declare' tests, but there is not anything to scan in the 
>> 'tree-original' dump though.
>
> Yeah, the current OpenACC 'declare' implementation is "special".

Actually -- commit 38958ac987dc3e6162e2ddaba3c7e7f41381e079
"OpenACC 2.7: front-end support for readonly modifier: Add basic OpenACC 
'declare' testing",
see attached.


But I realized another thing: don't we have to handle the 'readonly'
modifier also in Fortran module files, that is, next to the OpenACC
'declare' 'copyin' handling in 'gcc/fortran/module.cc':
'AB_OACC_DECLARE_COPYIN' etc.?  Chung-Lin, please check, via test cases.
'gfortran.dg/goacc/routine-module*', for example, should provide some
guidance of how to achieve actual module file use, and then do the same
'scan-tree-dump' as in the current 'readonly' modifier test cases.
I suppose the code changes would look similar to
commit a61f6afbee370785cf091fe46e2e022748528307
"OpenACC 'nohost' clause", for example.  By means of only emitting a tag
in the module file if the 'readonly' modifier is specified, we should
maintain compatibility with the current 'MOD_VERSION'.


Regards
 Thomas


>> diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
>> index 7b154eb3ca7..db84b06289b 100644
>> --- a/gcc/fortran/dump-parse-tree.cc
>> +++ b/gcc/fortran/dump-parse-tree.cc
>> @@ -1400,6 +1400,9 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
>>  fputs (") ALLOCATE(", dumpfile);
>>continue;
>>  }
>> +  if ((list_type == OMP_LIST_MAP || list_type == OMP_LIST_CACHE)
>> +  && n->u.map.readonly)
>> +fputs ("readonly,", dumpfile);
>>if (list_type == OMP_LIST_REDUCTION)
>>  switch (n->u.reduction_op)
>>{
>> @@ -1467,7 +1470,7 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
>>default: break;
>>}
>>else if (list_type == OMP_LIST_MAP)
>> -switch (n->u.map_op)
>> +switch (n->u.map.op)
>>{
>>case OMP_MAP_ALLOC: fputs ("alloc:", dumpfile); break;
>>case OMP_MAP_TO: fputs ("to:", dumpfile); break;
>> diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
>> index ebba2336e12..32b792f85fb 100644
>> --- a/gcc/fortran/gfortran.h
>> +++ b/gcc/fortran/gfortran.h
>> @@ -1363,7 +1363,11 @@ typedef struct gfc_omp_namelist
>>  {
>>gfc_omp_reduction_op reduction_op;
>>gfc_omp_depend_doacross_op depend_doacross_op;
>> -  gfc_omp_map_op map_op;
>> +  struct
>> +{
>> +  ENUM_BITFIELD (gfc_omp_map_op) op:8;
>> +  bool readonly;
>> +} map;
>>gfc_expr *align;
>>struct
>>  {
>> diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
>> index 38de60238c0..5c44e666eb9 100644
>> --- a/gcc/fortran/openmp.cc
>> +++ b/gcc/fortran/openmp.cc
>> @@ -1210,7 +1210,7 @@ gfc_match_omp_map_clause (gfc_omp_namelist **list, 
>> gfc_omp_map_op map_op,
>>  {
>>gfc_omp_namelist *n;
>>for (n = *head; n; n = n->next)
>> -n->u.map_op = map_op;
>> +n->u.map.op = map_op;
>>return true;
>>  }
>>  
>> @@ -1524,7 +1524,7 @@ gfc_match_omp_clause_reduction (char pc, 
>> gfc_omp_clauses *c, bool openacc,
>>  gfc_omp_namelist *p = gfc_get_omp_namelist (), **tl;
>>  p->sym = n->sym;
>>  p->where = p->where;
>> -p->u.map_op = OMP_MAP_ALWAYS_TOFROM;
>> +p->u.map.op = OMP_MAP_ALWAYS_TOFROM;
>>  
>>  tl = &c->lists[OMP_LIST_MAP];
>>  while (*tl)
>> @@ -2181,11 +2181,25 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const 
>> omp_mask mask,
>>  {
>>if (openacc)
>>  {
>> -  if (gfc_match ("copyin ( ") == MATCH_YES
>> -  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
>> -   OMP_MAP_TO, true,
>> -   allow_derived))
>> -continue;
>> +  if (gfc_match ("copyin ( ") == MATCH_YES)
>> +{
>> +  bool readonly = gfc_match ("readonly : ") == MATCH_YES;
>> +  head = NULL;
>> +  if (gfc_match_omp_variable_list ("",
>> +   &c->lists[OMP_LIST_MAP],
>> +   true, NULL, &head, true,
>> +   allow_derived)
>> +  == MATCH_YES)
>> +{
>> +  gfc_omp_namelist *n;
>> +  for (n = *head; n; n = n->next)
>> +{
>> +  n->u.map.op = OMP_MAP_TO;
>> +  n->u.map.readonly = readonly;
>> +}
>> +  continue;
>> +}
>> +}
>>  }
>>else if 

Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Uros Bizjak
On Thu, Mar 14, 2024 at 8:42 AM Uros Bizjak  wrote:
>
> On Thu, Mar 14, 2024 at 8:32 AM Hongtao Liu  wrote:
> >
> > On Thu, Mar 14, 2024 at 3:22 PM Uros Bizjak  wrote:
> > >
> > > On Thu, Mar 14, 2024 at 2:33 AM liuhongt  wrote:
> > > >
> > > > When we split
> > > > (insn 37 36 38 10 (set (reg:DI 104 [ _18 ])
> > > > (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 
> > > > MEM[(struct SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])) 
> > > > "test.C":22:42 84 {*movdi_internal}
> > > >  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
> > > >
> > > > into
> > > >
> > > > (insn 104 36 37 10 (set (subreg:V2DI (reg:DI 124) 0)
> > > > (vec_concat:V2DI (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 
> > > > ]) [6 MEM[(struct SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 
> > > > A32])
> > > > (const_int 0 [0]))) "test.C":22:42 -1
> > > > (nil)))
> > > > (insn 37 104 105 10 (set (subreg:V2DI (reg:DI 104 [ _18 ]) 0)
> > > > (subreg:V2DI (reg:DI 124) 0)) "test.C":22:42 2024 
> > > > {movv2di_internal}
> > > >  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
> > > > (nil)))
> > > >
> > > > we must copy the REG_EH_REGION note to the first insn and split the 
> > > > block
> > > > after the newly added insn.  The REG_EH_REGION on the second insn will 
> > > > be
> > > > removed later since it no longer traps.
> > > >
> > > > Currently we only handle memory_operand, are there any other insns
> > > > need to be handled???
> > >
> > > I think memory access is the only thing that can trap.
> > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} for trunk and 
> > > > gcc-13/gcc-12 release branch.
> > > > Ok for trunk and backport?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * config/i386/i386-features.cc
> > > > (general_scalar_chain::convert_op): Handle REG_EH_REGION note.
> > > > (convert_scalars_to_vector): Ditto.
> > > > * config/i386/i386-features.h (class scalar_chain): New
> > > > member control_flow_insns.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * g++.target/i386/pr111822.C: New test.
> > > > ---
> > > >  gcc/config/i386/i386-features.cc | 48 ++--
> > > >  gcc/config/i386/i386-features.h  |  1 +
> > > >  gcc/testsuite/g++.target/i386/pr111822.C | 45 ++
> > > >  3 files changed, 90 insertions(+), 4 deletions(-)
> > > >  create mode 100644 gcc/testsuite/g++.target/i386/pr111822.C
> > > >
> > > > diff --git a/gcc/config/i386/i386-features.cc 
> > > > b/gcc/config/i386/i386-features.cc
> > > > index 1de2a07ed75..2ed27a9ebdd 100644
> > > > --- a/gcc/config/i386/i386-features.cc
> > > > +++ b/gcc/config/i386/i386-features.cc
> > > > @@ -998,20 +998,36 @@ general_scalar_chain::convert_op (rtx *op, 
> > > > rtx_insn *insn)
> > > >  }
> > > >else if (MEM_P (*op))
> > > >  {
> > > > +  rtx_insn* eh_insn, *movabs = NULL;
> > > >rtx tmp = gen_reg_rtx (GET_MODE (*op));
> > > >
> > > >/* Handle movabs.  */
> > > >if (!memory_operand (*op, GET_MODE (*op)))
> > > > {
> > > >   rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
> > > > + movabs = emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
> > > >
> > > > - emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
> > > >   *op = tmp2;
> > > > }
> > >
> > > I may be missing something, but isn't the above a dead code? We have
> > > if (MEM_p(*op)) and then if (!memory_operand (*op, ...)).
> > It's PR91814 #c1, memory_operand will also check invalid memory addresses.
>
> Oh, it is even my comment ;)
>
> Perhaps the comment should be improved to something like:
>
> "Emit MOVABS to load from a 64-bit absolute address to a GPR."
>
> LGTM then.

BTW: Do we need to also fix timode_scalar_chain::convert_op ? There we
also preload operand, so a similar fix should be applied there.

Uros.


Re: Patch ping Re: [PATCH] icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]

2024-03-14 Thread Jan Hubicka
> > int test (int a)
> > {
> > return a>0 ? CST1:  CST2;
> > }
> > 
> > gets same hash value no matter what CST1/CST2 is.  I added hasher and I
> > am re-running stats.
> 
> The hash should be commutative here at least.

It needs to match what comparator is doing later, and sadly it does not
try to find the right order of basic blocks.  I added TODO and will fix
it next stage1.
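
As a side note on what "commutative" means for such a hash, here is a
minimal sketch.  It is not ipa-icf code: the function names are
hypothetical and the mixing step merely follows the style of the
splitmix64 finalizer.  The two operand hashes are combined so that
swapping them does not change the result, while changing either one does.

```c
#include <assert.h>
#include <stdint.h>

/* Mix a single value.  Each step is invertible, so distinct inputs
   give distinct outputs.  */
static uint64_t mix (uint64_t x)
{
  x ^= x >> 30;
  x *= UINT64_C (0xbf58476d1ce4e5b9);
  x ^= x >> 27;
  x *= UINT64_C (0x94d049bb133111eb);
  x ^= x >> 31;
  return x;
}

/* Hypothetical order-insensitive combination: addition is commutative,
   so (a, b) and (b, a) hash to the same value.  */
uint64_t hash_unordered_pair (uint64_t a, uint64_t b)
{
  uint64_t h = mix (a) + mix (b);
  assert (h == mix (b) + mix (a));
  return h;
}
```

Such a hash is only useful if the later comparison also accepts both
orders, which is the limitation described above.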

With the attached patch we have no comparison with mismatching VRP
ranges with non-LTO bootstrap and there are 4 instances of mismatched
jump functions with LTO bootstrap.

I checked them and these are functions that are still different and 
would not be unified anyway.  In testsuite we have
 gcc.c-torture/execute/builtin-prefetch-4.c
   this matches
   [irange] long unsigned int [0, 2147483647][18446744071562067968, +INF]
   [irange] long unsigned int [0, 2147483646][18446744071562067968, +INF]
   in postdec_glob_idx/22new vs predec_glob_idx/20

   The functions are different, but have precisely same statements, just
   differ by SSA names that we do not hash.
 c-c++-common/builtins.c
   [irange] long unsigned int [2, 2147483647] MASK 0x7ffe VALUE 0x0
   [irange] long unsigned int [8, 2147483647] MASK 0x7ff8 VALUE 0x0
   in bar.part.0/10new  foo.part.0/9

   This is a real mismatch
 23_containers/vector/cons/2.cc
   [irange] long unsigned int [1, 9223372036854775807] MASK 0x7fff 
VALUE 0x0
   [irange] long unsigned int [1, 9223372036854775807] MASK 0xfff8 
VALUE 0x0
   static std::vector<_Tp, _Alloc>::pointer std::vector<_Tp, 
_Alloc>::_S_do_relocate(pointer, pointer, pointer, _Tp_alloc_type&, 
std::true_type) [with _Tp = A; _Alloc = std::allocator >]/3482
   static std::vector<_Tp, _Alloc>::pointer std::vector<_Tp, 
_Alloc>::_S_do_relocate(pointer, pointer, pointer, _Tp_alloc_type&, 
std::true_type) [with _Tp = double; _Alloc = std::allocator]/3372

   Here the functions are different initially but are optimized to the same
   body, while we keep info about ranges from optimized-out code.

   I tried to make easy testcase, but

   int a;
   a = (char)a;

   is not optimized away, I wonder why, when for all valid values this
   is noop?

 25_algorithms/lexicographical_compare/uchar.cc
   [irange] long unsigned int [8, +INF] MASK 0xfff8 VALUE 0x0
   [irange] long unsigned int [4, +INF] MASK 0xfffc VALUE 0x0
   static constexpr bool std::__equal::equal(const _Tp*, const _Tp*, 
const _Tp*) [with _Tp = long int]/13669
   static constexpr bool std::__equal::equal(const _Tp*, const _Tp*, 
const _Tp*) [with _Tp = int]/13663
 std/ranges/conv/1.cc
   [irange] long unsigned int [8, +INF] MASK 0xfff8 VALUE 0x0
   [irange] long unsigned int [4, +INF] MASK 0xfffc VALUE 0x0
   static constexpr bool std::__equal::equal(const _Tp*, const _Tp*, 
const _Tp*) [with _Tp = long int]/13757
   static constexpr bool std::__equal::equal(const _Tp*, const _Tp*, 
const _Tp*) [with _Tp = int]/13751

   Those two are also happening in llvm.  We try to merge two functions
   which take pointer parameters of different type.

boolD.2783 std::__equal::equalD.426577 (const intD.9 * 
__first1D.433380, const intD.9 * __last1D.433381, const intD.9 * 
__first2D.433382)
{ 
  const intD.9 * __first1_4(D) = __first1D.433380;
  const intD.9 * __last1_3(D) = __last1D.433381;
  const intD.9 * __first2_6(D) = __first2D.433382;
  long intD.12 _1;
  boolD.2783 _2;
  long unsigned intD.16 _7;
  boolD.2783 _8;
  intD.9 _9;

  _1 = __last1_3(D) - __first1_4(D);
  if (_1 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

  # RANGE [irange] long unsigned int [4, +INF] MASK 0xfffc 
VALUE 0x0
  _7 = (long unsigned intD.16) _1;

  Compared to

boolD.2783 std::__equal::equalD.426685 (const long 
intD.12 * __first1D.433472, const long intD.12 * __last1D.433473, const long 
intD.12 * __first2D.433474)
{
  const long intD.12 * __first1_4(D) = __first1D.433472;
  const long intD.12 * __last1_3(D) = __last1D.433473;
  const long intD.12 * __first2_6(D) = __first2D.433474;
  long intD.12 _1;
  boolD.2783 _2;
  long unsigned intD.16 _7;
  boolD.2783 _8;
  intD.9 _9;

  _1 = __last1_3(D) - __first1_4(D);
  if (_1 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

  # RANGE [irange] long unsigned int [8, +INF] MASK 0xfff8 
VALUE 0x0
  _7 = (long unsigned intD.16) _1;

   This looks like a potentially dangerous situation, since if we can
   derive a range from the pointed-to type, then ICF needs to match them too.

   However with
   #include 
   size_t diff (long *a, long *b)
   {
 long int d = (b - a) * sizeof (long);
 if (!d)
   abort ();
 return d;
   }
   size_t diff2 (int *a, int *b)
   {
 long int d = (b - a) * 

Re: [PATCH 1/3] bpf: Fix CO-RE field expression builtins

2024-03-14 Thread Jose E. Marchesi


>> Jose E. Marchesi writes:
>>
 This patch corrects bugs within the CO-RE builtin field expression
 related builtins.
 The following bugs were identified and corrected based on the expected
 results of bpf-next selftests testsuite.
 It addresses the following problems:
  - Expressions with pointer dereferencing now point to the BTF structure
type, instead of the structure pointer type.
  - Pointer addition to structure root is now identified and constructed
in CO-RE relocations as if it is an array access. For example,
   "&(s+2)->b" generates "2:1" as an access string where "2" is
   referring to the access for "s+2".

 gcc/ChangeLog:
* config/bpf/core-builtins.cc (core_field_info): Add
support for POINTER_PLUS_EXPR in the root of the field expression.
(bpf_core_get_index): Likewise.
(pack_field_expr): Make the BTF type to point to the structure
related node, instead of its pointer type.
(make_core_safe_access_index): Correct to new code.

 gcc/testsuite/ChangeLog:
* gcc.target/bpf/core-attr-5.c: Correct.
* gcc.target/bpf/core-attr-6.c: Likewise.
* gcc.target/bpf/core-attr-struct-as-array.c: Add test case for
pointer arithmetics as array access use case.
 ---
  gcc/config/bpf/core-builtins.cc   | 54 +++
  gcc/testsuite/gcc.target/bpf/core-attr-5.c|  4 +-
  gcc/testsuite/gcc.target/bpf/core-attr-6.c|  4 +-
  .../bpf/core-attr-struct-as-array.c   | 35 
  4 files changed, 82 insertions(+), 15 deletions(-)
  create mode 100644 
 gcc/testsuite/gcc.target/bpf/core-attr-struct-as-array.c

 diff --git a/gcc/config/bpf/core-builtins.cc 
 b/gcc/config/bpf/core-builtins.cc
 index 8d8c54c1fb3d..4256fea15e49 100644
 --- a/gcc/config/bpf/core-builtins.cc
 +++ b/gcc/config/bpf/core-builtins.cc
 @@ -388,8 +388,8 @@ core_field_info (tree src, enum btf_core_reloc_kind 
 kind)

src = root_for_core_field_info (src);

 -  get_inner_reference (src, , , _off, , 
 ,
 - , );
 +  tree root = get_inner_reference (src, , , _off, 
 ,
 + , , );

/* Note: Use DECL_BIT_FIELD_TYPE rather than DECL_BIT_FIELD here, 
 because it
   remembers whether the field in question was originally declared as a
 @@ -414,6 +414,23 @@ core_field_info (tree src, enum btf_core_reloc_kind 
 kind)
  {
  case BPF_RELO_FIELD_BYTE_OFFSET:
{
 +  result = 0;
 +  if (var_off == NULL_TREE
 +  && TREE_CODE (root) == INDIRECT_REF
 +  && TREE_CODE (TREE_OPERAND (root, 0)) == POINTER_PLUS_EXPR)
 +{
 +  tree node = TREE_OPERAND (root, 0);
 +  tree offset = TREE_OPERAND (node, 1);
 +  tree type = TREE_TYPE (TREE_OPERAND (node, 0));
 +  type = TREE_TYPE (type);
 +
 +  gcc_assert (TREE_CODE (offset) == INTEGER_CST && tree_fits_shwi_p 
 (offset)
 +  && COMPLETE_TYPE_P (type) && tree_fits_shwi_p (TYPE_SIZE 
 (type)));
>>>
>>> What if an expression with a non-constant offset (something like s+foo)
>>> is passed to the builtin?  Wouldn't it be better to error there instead
>>> of ICEing?
>>>
>> In that case, var_off == NULL_TREE, and it did not reach the assert.
>> In any case, please notice that this code was copied from some different
>> code in the same file which in that case would actually produce the
>> error earlier.  The assert is there as a safe guard just in case the
>> other function stops detecting this case.
>>
>> In core-builtins.cc:572
>>
>> else if (code == POINTER_PLUS_EXPR)
>>   {
>> tree offset = TREE_OPERAND (node, 1);
>> tree type = TREE_TYPE (TREE_OPERAND (node, 0));
>> type = TREE_TYPE (type);
>>
>> if (TREE_CODE (offset) == INTEGER_CST && tree_fits_shwi_p (offset)
>> && COMPLETE_TYPE_P (type) && tree_fits_shwi_p (TYPE_SIZE (type)))
>>   {
>> HOST_WIDE_INT offset_i = tree_to_shwi (offset);
>> HOST_WIDE_INT type_size_i = tree_to_shwi (TYPE_SIZE_UNIT (type));
>> if ((offset_i % type_size_i) == 0)
>>   return offset_i / type_size_i;
>>   }
>>   }
>>
>> if (valid != NULL)
>>   *valid = false;
>> return -1;
>>   }
>>
>> Because the code, although similar, is actually having different
>> purposes, I decided not to abstract this in an independent function. My
>> perception was that it would be more confusing.
>>
>> Without wanting to paste too much code, please notice that the function
>> with the assert is only called if the above function does not return
>> with an error (i.e. valid != false).
>
> Ok understood.
> Please submit upstream.
> Thanks!

Heh this is already upstream, sorry.
The patch is OK.
Thanks!

>
>>
>>>
 +
 

Re: [PATCH 1/3] bpf: Fix CO-RE field expression builtins

2024-03-14 Thread Jose E. Marchesi


> Jose E. Marchesi writes:
>
>>> This patch corrects bugs within the CO-RE builtin field expression
>>> related builtins.
>>> The following bugs were identified and corrected based on the expected
>>> results of bpf-next selftests testsuite.
>>> It addresses the following problems:
>>>  - Expressions with pointer dereferencing now point to the BTF structure
>>>type, instead of the structure pointer type.
>>>  - Pointer addition to structure root is now identified and constructed
>>>in CO-RE relocations as if it is an array access. For example,
>>>   "&(s+2)->b" generates "2:1" as an access string where "2" is
>>>   referring to the access for "s+2".
>>>
>>> gcc/ChangeLog:
>>> * config/bpf/core-builtins.cc (core_field_info): Add
>>> support for POINTER_PLUS_EXPR in the root of the field expression.
>>> (bpf_core_get_index): Likewise.
>>> (pack_field_expr): Make the BTF type to point to the structure
>>> related node, instead of its pointer type.
>>> (make_core_safe_access_index): Correct to new code.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.target/bpf/core-attr-5.c: Correct.
>>> * gcc.target/bpf/core-attr-6.c: Likewise.
>>> * gcc.target/bpf/core-attr-struct-as-array.c: Add test case for
>>> pointer arithmetics as array access use case.
>>> ---
>>>  gcc/config/bpf/core-builtins.cc   | 54 +++
>>>  gcc/testsuite/gcc.target/bpf/core-attr-5.c|  4 +-
>>>  gcc/testsuite/gcc.target/bpf/core-attr-6.c|  4 +-
>>>  .../bpf/core-attr-struct-as-array.c   | 35 
>>>  4 files changed, 82 insertions(+), 15 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/bpf/core-attr-struct-as-array.c
>>>
>>> diff --git a/gcc/config/bpf/core-builtins.cc 
>>> b/gcc/config/bpf/core-builtins.cc
>>> index 8d8c54c1fb3d..4256fea15e49 100644
>>> --- a/gcc/config/bpf/core-builtins.cc
>>> +++ b/gcc/config/bpf/core-builtins.cc
>>> @@ -388,8 +388,8 @@ core_field_info (tree src, enum btf_core_reloc_kind 
>>> kind)
>>>
>>>src = root_for_core_field_info (src);
>>>
>>> -  get_inner_reference (src, , , _off, , ,
>>> -  , );
>>> +  tree root = get_inner_reference (src, , , _off, ,
>>> +  , , );
>>>
>>>/* Note: Use DECL_BIT_FIELD_TYPE rather than DECL_BIT_FIELD here, 
>>> because it
>>>   remembers whether the field in question was originally declared as a
>>> @@ -414,6 +414,23 @@ core_field_info (tree src, enum btf_core_reloc_kind 
>>> kind)
>>>  {
>>>  case BPF_RELO_FIELD_BYTE_OFFSET:
>>>{
>>> +   result = 0;
>>> +   if (var_off == NULL_TREE
>>> +   && TREE_CODE (root) == INDIRECT_REF
>>> +   && TREE_CODE (TREE_OPERAND (root, 0)) == POINTER_PLUS_EXPR)
>>> + {
>>> +   tree node = TREE_OPERAND (root, 0);
>>> +   tree offset = TREE_OPERAND (node, 1);
>>> +   tree type = TREE_TYPE (TREE_OPERAND (node, 0));
>>> +   type = TREE_TYPE (type);
>>> +
>>> +   gcc_assert (TREE_CODE (offset) == INTEGER_CST && tree_fits_shwi_p 
>>> (offset)
>>> +   && COMPLETE_TYPE_P (type) && tree_fits_shwi_p (TYPE_SIZE 
>>> (type)));
>>
>> What if an expression with a non-constant offset (something like s+foo)
>> is passed to the builtin?  Wouldn't it be better to error there instead
>> of ICEing?
>>
> In that case, var_off == NULL_TREE, and it did not reach the assert.
> In any case, please notice that this code was copied from some different
> code in the same file which in that case would actually produce the
> error earlier.  The assert is there as a safe guard just in case the
> other function stops detecting this case.
>
> In core-builtins.cc:572
>
> else if (code == POINTER_PLUS_EXPR)
>   {
> tree offset = TREE_OPERAND (node, 1);
> tree type = TREE_TYPE (TREE_OPERAND (node, 0));
> type = TREE_TYPE (type);
>
> if (TREE_CODE (offset) == INTEGER_CST && tree_fits_shwi_p (offset)
> && COMPLETE_TYPE_P (type) && tree_fits_shwi_p (TYPE_SIZE (type)))
>   {
> HOST_WIDE_INT offset_i = tree_to_shwi (offset);
> HOST_WIDE_INT type_size_i = tree_to_shwi (TYPE_SIZE_UNIT (type));
> if ((offset_i % type_size_i) == 0)
>   return offset_i / type_size_i;
>   }
>   }
>
> if (valid != NULL)
>   *valid = false;
> return -1;
>   }
>
> Because the code, although similar, is actually having different
> purposes, I decided not to abstract this in an independent function. My
> perception was that it would be more confusing.
>
> Without wanting to paste too much code, please notice that the function
> with the assert is only called if the above function does not return
> with an error (i.e. valid != false).

Ok understood.
Please submit upstream.
Thanks!
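
For readers following along, the offset-to-index step quoted above can be
sketched in isolation.  This is only an illustration under assumptions:
offset_to_index and struct s are hypothetical names, and plain long values
stand in for the tree nodes and HOST_WIDE_INT used in core-builtins.cc.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical structure used only for this example.  */
struct s
{
  int a;
  int b;
};

/* Hypothetical helper, not the GCC function: turn the constant byte
   offset of a pointer addition into an array-style index.  The offset
   must be a whole multiple of the pointed-to type's size; otherwise
   *VALID is cleared and -1 is returned.  */
static long
offset_to_index (long offset_bytes, long elem_size, bool *valid)
{
  if (elem_size > 0 && offset_bytes % elem_size == 0)
    return offset_bytes / elem_size;

  if (valid != NULL)
    *valid = false;
  return -1;
}
```

With this, a byte offset of 2 * sizeof (struct s) maps to index 2, which
is the leading "2" in the "2:1" access string from the commit message for
"&(s+2)->b".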

>
>>
>>> +
>>> +   HOST_WIDE_INT offset_i = tree_to_shwi (offset);
>>> +   result += offset_i;
>>> + }
>>> +
>>> type = unsigned_type_node;
>>> if (var_off != NULL_TREE)
>>>   


Re: [PATCH] aarch64: Fix TImode __sync_*_compare_and_exchange expansion with LSE [PR114310]

2024-03-14 Thread Jakub Jelinek
On Thu, Mar 14, 2024 at 11:49:14AM +, Richard Earnshaw (lists) wrote:
> On 14/03/2024 08:37, Jakub Jelinek wrote:
> > Hi!
> > 
> > The following testcase ICEs with LSE atomics.
> > The problem is that the @atomic_compare_and_swap expander uses
> > aarch64_reg_or_zero predicate for the desired operand, which is fine,
> > given that for most of the modes and even for TImode in some cases
> > it can handle zero immediate just fine, but the TImode
> > @aarch64_compare_and_swap_lse just uses register_operand for
> > that operand instead, again intentionally so, because the casp,
> > caspa, caspl and caspal instructions need to use a pair of consecutive
> > registers for the operand and xzr is just one register and we can't
> > just store zero into the link register to emulate pair of zeros.
> > 
> > So, the following patch fixes that by forcing the newval operand into
> > a register for the TImode LSE case.
> > 
> > Bootstrapped/regtested on aarch64-linux, ok for trunk?
> 
> An alternative fix would be to use a mode_attr to pick a different predicate
> for TImode.  But that's probably just a matter of taste; I'm not sure that
> one would be better than the other in reality.

The reason I didn't do something like that is that for the non-LSE case,
@aarch64_compare_and_swap pattern actually handles const0_rtx desired
operand and so does the TARGET_OUTLINE_ATOMICS case, so if it was done through
a predicate, it would need to be a predicate that would be specific to this
case and would yield register_operand for TImode && TARGET_LSE and
otherwise aarch64_reg_or_zero.  The patch seems to be simpler to me,
especially when aarch64_expand_compare_and_swap already does the forcing
stuff into registers at various other spots.

Jakub



[committed] libstdc++: Correct notes about std::call_once in manual [PR66146]

2024-03-14 Thread Jonathan Wakely
Pushed to trunk. I should backport this too.

-- >8 --

The bug with exceptions thrown during a std::call_once call affects all
targets, so fix the docs that say it only affects non-Linux targets.

libstdc++-v3/ChangeLog:

PR libstdc++/66146
* doc/xml/manual/status_cxx2011.xml: Remove mention of Linux in
note about std::call_once.
* doc/xml/manual/status_cxx2014.xml: Likewise.
* doc/xml/manual/status_cxx2017.xml: Likewise.
* doc/html/manual/status.html: Regenerate.
---
 libstdc++-v3/doc/html/manual/status.html   | 6 +++---
 libstdc++-v3/doc/xml/manual/status_cxx2011.xml | 2 +-
 libstdc++-v3/doc/xml/manual/status_cxx2014.xml | 2 +-
 libstdc++-v3/doc/xml/manual/status_cxx2017.xml | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml 
b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
index 1eeb2d1ccd7..7f589ad7f7a 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2011.xml
@@ -2404,7 +2404,7 @@ particular release.
   30.4.4.2
   Function call_once
   Y
-  Exception support is broken on non-Linux targets.
+  Exception support is broken.
See http://www.w3.org/1999/xlink;
xlink:href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66146;>PR
66146.
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml 
b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
index 807cea57d12..518a8973f72 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2014.xml
@@ -1390,7 +1390,7 @@ not in any particular release.
   30.4.4.2
   Function call_once
   Broken
-  Exception support is broken on non-Linux targets.
+  Exception support is broken.
See http://www.w3.org/1999/xlink;
xlink:href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66146;>PR
66146.
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml 
b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index bea6db929c6..144b9909fac 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -2505,7 +2505,7 @@ since C++14 and the implementation is complete.
   33.4.6.2
   Function call_once
   Y
-  Exception support is broken on non-Linux targets.
+  Exception support is broken.
See http://www.w3.org/1999/xlink;
xlink:href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66146;>PR
66146.
-- 
2.44.0



[committed] libstdc++: Update C++23 status in the manual

2024-03-14 Thread Jonathan Wakely
I think we have all C++23 changes in this table now. At some point soon
we should replace the C++20 table of papers with the C++20 table of
contents, as we do for the other standards.

Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* doc/xml/manual/status_cxx2023.xml: Update C++23 status table.
* doc/html/manual/status.html: Regenerate.
* include/bits/version.def: Fix typo in comment.
---
 libstdc++-v3/doc/html/manual/status.html  | 146 +++--
 .../doc/xml/manual/status_cxx2023.xml | 295 +++---
 libstdc++-v3/include/bits/version.def |   2 +-
 3 files changed, 376 insertions(+), 67 deletions(-)

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2023.xml 
b/libstdc++-v3/doc/xml/manual/status_cxx2023.xml
index 4bf22f00bce..9b870d1dbdf 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2023.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2023.xml
@@ -312,15 +312,32 @@ or any notes about the implementation.
 
 
 
-  
+  
ranges::to 
   
 http://www.w3.org/1999/xlink; 
xlink:href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1206r7.pdf;>
 P1206R7
 
   
+   14.1 (ranges::to function) 
+  
+   __cpp_lib_containers_ranges = 202202L,
+   __cpp_lib_ranges_to_container = 202202L
+  
+
+
+
+  
+   Ranges iterators as inputs to non-Ranges algorithms 
+  
+http://www.w3.org/1999/xlink; 
xlink:href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2408r5.html;>
+P2408R5
+
+  

-   __cpp_lib_ranges_to_container = 202202L 
+  
+   __cpp_lib_algorithm_iterator_requirements = 202207L
+  
 
 
 
@@ -377,6 +394,18 @@ or any notes about the implementation.
__cpp_lib_ranges_contains = 202207L 
 
 
+
+  
+   Making multi-param constructors of views explicit 
+  
+http://www.w3.org/1999/xlink; 
xlink:href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2711r1.html;>
+P2711R1
+
+  
+   
+  
+
+
 
ranges::fold 
   
@@ -388,6 +417,18 @@ or any notes about the implementation.
__cpp_lib_ranges_fold = 202207L 
 
 
+
+  
+   Relaxing Ranges Just A Smidge
+  
+http://www.w3.org/1999/xlink; 
xlink:href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2609r3.html;>
+P2609R3
+
+  
+   
+   __cpp_lib_ranges = 202302L 
+
+
 
   
 Compile-time programming
@@ -484,65 +525,73 @@ or any notes about the implementation.
 
 
 
-  
A type trait to detect reference binding to temporary 
   
 http://www.w3.org/1999/xlink; 
xlink:href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2255r2.html;>
 P2255R2
 
   
-   13.1 (missing changes to std::tuple 

+  
+   
+ 13.1 (missing changes to std::tuple) 

+ 14.1 (complete) 
+
+  
__cpp_lib_reference_from_temporary = 202202L 

 
 
 
-  
-Strings and text
+  
+  
+Move-only types for equality_comparable_with, totally_ordered_with,
+   and three_way_comparable_with
   
+  
+http://www.w3.org/1999/xlink; 
xlink:href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2404r3.pdf;>
+P2404R3
+
+  
+   
+   __cpp_lib_concepts = 202207L 
 
 
 
-   string contains function 
+  
+   A trait for implicit lifetime types 
   
-http://www.w3.org/1999/xlink; 
xlink:href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1679r3.html;>
-P1679R3
+http://www.w3.org/1999/xlink; 
xlink:href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2674r1.pdf;>
+P2674R1
 
   
-   11.1 
-   __cpp_lib_string_contains = 202011L 
+   
+   __cpp_lib_is_implicit_lifetime = 202302L 

 
 
 
-   Prohibit std::basic_string and std::basic_string_view 
construction from nullptr 
+  
   
-http://www.w3.org/1999/xlink; 
xlink:href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2166r1.html;>
-P2166R1
+   common_reference_t of reference_wrapper
+   Should Be a Reference Type
+  
+  
+http://www.w3.org/1999/xlink; 
xlink:href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2655r3.html;>
+P2655R3
 
   
-   12.1 
-  
+   
+   __cpp_lib_common_reference = 202302L 
 
 
 
-   basic_string::resize_and_overwrite 
+  
+   Deprecate numeric_limits::has_denorm 
   
-http://www.w3.org/1999/xlink; 
xlink:href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1072r10.html;>
-P1072R10
+http://www.w3.org/1999/xlink; 

Re: [PATCH] bitint, v2: Fix up adjustment of large/huge _BitInt arguments of returns_twice calls [PR113466]

2024-03-14 Thread Richard Biener
On Thu, 14 Mar 2024, Jakub Jelinek wrote:

> On Thu, Mar 14, 2024 at 09:59:12AM +0100, Richard Biener wrote:
> > On Thu, 14 Mar 2024, Jakub Jelinek wrote:
> > 
> > > On Thu, Mar 14, 2024 at 09:48:45AM +0100, Richard Biener wrote:
> > > > Ugh.  OK, but I wonder whether we might want to simply delay
> > > > fixing the CFG for inserts before returns-twice?  Would that make
> > > > things less ugly?
> > > 
> > > You mean in lower_call just remember if we added anything before
> > > ECF_RETURNS_TWICE call (or say add such stmt into a vector) and
> > > then fix it up before the end of the pass?
> > 
> > Yeah, or just walk BBs with abnormal preds, whatever tracking is
> > easier.
> 
> Walking all the bbs with abnormal preds would mean I'd have to walk their
> whole bodies, because the ECF_RETURNS_TWICE call is no longer at the start.
> 
> The following patch seems to work, ok for trunk if it passes full
> bootstrap/regtest?

OK.  I'll let you decide which variant is better maintainable
(I think this one).

Thanks,
Richard.

> 2024-03-14  Jakub Jelinek  
> 
>   PR tree-optimization/113466
>   * gimple-lower-bitint.cc (bitint_large_huge): Add m_returns_twice_calls
>   member.
>   (bitint_large_huge::bitint_large_huge): Initialize it.
>   (bitint_large_huge::~bitint_large_huge): Release it.
>   (bitint_large_huge::lower_call): Remember ECF_RETURNS_TWICE call stmts
>   before which at least one statement has been inserted.
>   (gimple_lower_bitint): Move argument loads before ECF_RETURNS_TWICE
>   calls to a different block and add corresponding PHIs.
> 
>   * gcc.dg/bitint-100.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-03-13 15:34:29.969725763 +0100
> +++ gcc/gimple-lower-bitint.cc2024-03-14 11:25:07.763169074 +0100
> @@ -419,7 +419,8 @@ struct bitint_large_huge
>bitint_large_huge ()
>  : m_names (NULL), m_loads (NULL), m_preserved (NULL),
>m_single_use_names (NULL), m_map (NULL), m_vars (NULL),
> -  m_limb_type (NULL_TREE), m_data (vNULL) {}
> +  m_limb_type (NULL_TREE), m_data (vNULL),
> +  m_returns_twice_calls (vNULL) {}
>  
>~bitint_large_huge ();
>  
> @@ -553,6 +554,7 @@ struct bitint_large_huge
>unsigned m_bitfld_load;
>vec m_data;
>unsigned int m_data_cnt;
> +  vec m_returns_twice_calls;
>  };
>  
>  bitint_large_huge::~bitint_large_huge ()
> @@ -565,6 +567,7 @@ bitint_large_huge::~bitint_large_huge ()
>  delete_var_map (m_map);
>XDELETEVEC (m_vars);
>m_data.release ();
> +  m_returns_twice_calls.release ();
>  }
>  
>  /* Insert gimple statement G before current location
> @@ -5248,6 +5251,7 @@ bitint_large_huge::lower_call (tree obj,
>default:
>   break;
>}
> +  bool returns_twice = (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0;
>for (unsigned int i = 0; i < nargs; ++i)
>  {
>tree arg = gimple_call_arg (stmt, i);
> @@ -5271,6 +5275,11 @@ bitint_large_huge::lower_call (tree obj,
> arg = make_ssa_name (TREE_TYPE (arg));
> gimple *g = gimple_build_assign (arg, v);
> gsi_insert_before (, g, GSI_SAME_STMT);
> +   if (returns_twice)
> + {
> +   m_returns_twice_calls.safe_push (stmt);
> +   returns_twice = false;
> + }
>   }
>gimple_call_set_arg (stmt, i, arg);
>if (m_preserved == NULL)
> @@ -7091,6 +7100,66 @@ gimple_lower_bitint (void)
>if (edge_insertions)
>  gsi_commit_edge_inserts ();
>  
> +  /* Fix up arguments of ECF_RETURNS_TWICE calls.  Those were temporarily
> + inserted before the call, but that is invalid IL, so move them to the
> + right place and add corresponding PHIs.  */
> +  if (!large_huge.m_returns_twice_calls.is_empty ())
> +{
> +  auto_vec arg_stmts;
> +  while (!large_huge.m_returns_twice_calls.is_empty ())
> + {
> +   gimple *stmt = large_huge.m_returns_twice_calls.pop ();
> +   gimple_stmt_iterator gsi = gsi_after_labels (gimple_bb (stmt));
> +   while (gsi_stmt (gsi) != stmt)
> + {
> +   arg_stmts.safe_push (gsi_stmt (gsi));
> +   gsi_remove (, false);
> + }
> +   gimple *g;
> +   basic_block bb = NULL;
> +   edge e = NULL, ead = NULL;
> +   FOR_EACH_VEC_ELT (arg_stmts, i, g)
> + {
> +   gsi_safe_insert_before (, g);
> +   if (i == 0)
> + {
> +   bb = gimple_bb (stmt);
> +   gcc_checking_assert (EDGE_COUNT (bb->preds) == 2);
> +   e = EDGE_PRED (bb, 0);
> +   ead = EDGE_PRED (bb, 1);
> +   if ((ead->flags & EDGE_ABNORMAL) == 0)
> + std::swap (e, ead);
> +   gcc_checking_assert ((e->flags & EDGE_ABNORMAL) == 0
> +&& (ead->flags & EDGE_ABNORMAL));
> + }
> +   tree lhs = gimple_assign_lhs (g);
> +   tree arg = lhs;
> +   gphi *phi = create_phi_node (copy_ssa_name 

Re: [PATCH] aarch64: Fix TImode __sync_*_compare_and_exchange expansion with LSE [PR114310]

2024-03-14 Thread Richard Earnshaw (lists)




On 14/03/2024 08:37, Jakub Jelinek wrote:

Hi!

The following testcase ICEs with LSE atomics.
The problem is that the @atomic_compare_and_swap expander uses
aarch64_reg_or_zero predicate for the desired operand, which is fine,
given that for most of the modes and even for TImode in some cases
it can handle zero immediate just fine, but the TImode
@aarch64_compare_and_swap_lse just uses register_operand for
that operand instead, again intentionally so, because the casp,
caspa, caspl and caspal instructions need to use a pair of consecutive
registers for the operand and xzr is just one register and we can't
just store zero into the link register to emulate pair of zeros.

So, the following patch fixes that by forcing the newval operand into
a register for the TImode LSE case.

Bootstrapped/regtested on aarch64-linux, ok for trunk?


An alternative fix would be to use a mode_attr to pick a different 
predicate for TImode.  But that's probably just a matter of taste; I'm 
not sure that one would be better than the other in reality.


OK (or with my suggestion if you prefer).

R.



2024-03-14  Jakub Jelinek  

PR target/114310
* config/aarch64/aarch64.cc (aarch64_expand_compare_and_swap): For
TImode force newval into a register.

* gcc.dg/pr114310.c: New test.

--- gcc/config/aarch64/aarch64.cc.jj2024-03-12 10:16:12.024101665 +0100
+++ gcc/config/aarch64/aarch64.cc   2024-03-13 18:55:39.147986554 +0100
@@ -24693,6 +24693,8 @@ aarch64_expand_compare_and_swap (rtx ope
  rval = copy_to_mode_reg (r_mode, oldval);
else
emit_move_insn (rval, gen_lowpart (r_mode, oldval));
+  if (mode == TImode)
+   newval = force_reg (mode, newval);
  
emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,

   newval, mod_s));
--- gcc/testsuite/gcc.dg/pr114310.c.jj  2024-03-13 19:09:25.322597418 +0100
+++ gcc/testsuite/gcc.dg/pr114310.c 2024-03-13 19:08:50.802073314 +0100
@@ -0,0 +1,20 @@
+/* PR target/114310 */
+/* { dg-do run { target int128 } } */
+
+volatile __attribute__((aligned (sizeof (__int128_t)))) __int128_t v = 10;
+
+int
+main ()
+{
+#if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 10, (__int128_t) 0) != 10)
+    __builtin_abort ();
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 10, (__int128_t) 15) != 0)
+    __builtin_abort ();
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 0, (__int128_t) 42) != 0)
+    __builtin_abort ();
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 31, (__int128_t) 35) != 42)
+    __builtin_abort ();
+#endif
+  return 0;
+}

Jakub



Re: [PATCH v2] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls

2024-03-14 Thread Tobias Burnus

Hi Kwok,

On January 22, 2024, Kwok Cheung Yeung wrote:
There was a bug in the declare-target-indirect-2.c libgomp testcase 
(testing indirect calls in offloaded target regions, spread over 
multiple teams/threads) that due to an errant fallthrough in a switch 
statement resulted in only one indirect function ever getting called:


(When applying, also the 'dg-xfail-run-if' needs to be removed from
libgomp.fortran/declare-target-indirect-2.f90) ...

However, when the missing break statements are added, the testcase 
fails with an invalid memory access. Upon investigation, this is due 
to the use of a splay-tree as the lookup structure for indirect 
addresses, as the splay-tree moves frequently accessed elements closer 
to the root node and so needs locking when used from multiple threads. 
However, this would end up partially serialising all the threads and 
kill performance. I have switched the lookup structure from a splay 
tree to a hashtab instead to avoid locking during lookup.


I have also tidied up the initialisation of the lookup table by 
calling it only from the first thread of the first team, instead of 
redundantly calling it from every thread and only having the first one 
reached do the initialisation. This removes the need for locking 
during initialisation.


LGTM - except for the following, which we need to solve
(as suggested or differently (locking, or ...) or
by declaring it a nonissue (e.g. because of a thinko of mine)).

Thoughts about the following?

* * *

Namely, I wonder whether there will be an issue for

#pragma target nowait
   ...
#pragma target
   ...

Once the kernel is started, the gcn_expand_prologue creates some setup code
and then a call to gomp_gcn_enter_kernel. Likewise for
gcc/config/nvptx/nvptx.cc, where nvptx_declare_function_name adds via
write_omp_entry a call to gomp_nvptx_main. And one of the first tasks there
is 'build_indirect_map'.

Assume a very simple kernel for the second item (i.e. it is quickly started)
and a very large number of reverse kernels.

Now, I wonder whether it is possible to have a race between the two kernels;
it seems as if that might happen but is extremely unlikely accounting for all
the overhead of launching and the rather small list of reverse offload items.

As it is unlikely, I wonder whether the following lock-free, opportunistic
approach would be the best solution: assume that no other kernel updates
the hash, but if that happens by chance, use the one that was created first.
(If we are lucky, the atomic overhead is fully cancelled by using a local
variable in the function, but neither should matter much.)

if (!indirect_htab) // or: __atomic_load_n (&indirect_htab, __ATOMIC_RELAXED) ?
{
  htab_t local_indirect_htab = htab_create (num_ind_funcs);
  ...
  htab_t expected = NULL;
  __atomic_compare_exchange_n (&indirect_htab, &expected,
                               local_indirect_htab, false, ...);
  if (expected) // Other kernel was faster, drop our version
    htab_free (local_indirect_htab);
}

On January 29, 2024, Kwok Cheung Yeung wrote:
Can you please also update the comments to talk about hashtab instead 
of splay?
This version has the comments updated and removes a stray 'volatile' 
in the #ifdefed out code.

Thanks,

Tobias



Re: [PATCH 1/3] bpf: Fix CO-RE field expression builtins

2024-03-14 Thread Cupertino Miranda


Jose E. Marchesi writes:

>> This patch corrects bugs within the CO-RE builtin field expression
>> related builtins.
>> The following bugs were identified and corrected based on the expected
>> results of bpf-next selftests testsuite.
>> It addresses the following problems:
>>  - Expressions with pointer dereferencing now point to the BTF structure
>>type, instead of the structure pointer type.
>>  - Pointer addition to structure root is now identified and constructed
>>in CO-RE relocations as if it is an array access. For example,
>>   "&(s+2)->b" generates "2:1" as an access string where "2" is
>>   refering to the access for "s+2".
>>
>> gcc/ChangeLog:
>>  * config/bpf/core-builtins.cc (core_field_info): Add
>>  support for POINTER_PLUS_EXPR in the root of the field expression.
>>  (bpf_core_get_index): Likewise.
>>  (pack_field_expr): Make the BTF type to point to the structure
>>  related node, instead of its pointer type.
>>  (make_core_safe_access_index): Correct to new code.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/bpf/core-attr-5.c: Correct.
>>  * gcc.target/bpf/core-attr-6.c: Likewise.
>>  * gcc.target/bpf/core-attr-struct-as-array.c: Add test case for
>>  pointer arithmetics as array access use case.
>> ---
>>  gcc/config/bpf/core-builtins.cc   | 54 +++
>>  gcc/testsuite/gcc.target/bpf/core-attr-5.c|  4 +-
>>  gcc/testsuite/gcc.target/bpf/core-attr-6.c|  4 +-
>>  .../bpf/core-attr-struct-as-array.c   | 35 
>>  4 files changed, 82 insertions(+), 15 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/core-attr-struct-as-array.c
>>
>> diff --git a/gcc/config/bpf/core-builtins.cc 
>> b/gcc/config/bpf/core-builtins.cc
>> index 8d8c54c1fb3d..4256fea15e49 100644
>> --- a/gcc/config/bpf/core-builtins.cc
>> +++ b/gcc/config/bpf/core-builtins.cc
>> @@ -388,8 +388,8 @@ core_field_info (tree src, enum btf_core_reloc_kind kind)
>>
>>src = root_for_core_field_info (src);
>>
>> -  get_inner_reference (src, , , _off, , ,
>> -   , );
>> +  tree root = get_inner_reference (src, , , _off, ,
>> +   , , );
>>
>>/* Note: Use DECL_BIT_FIELD_TYPE rather than DECL_BIT_FIELD here, because 
>> it
>>   remembers whether the field in question was originally declared as a
>> @@ -414,6 +414,23 @@ core_field_info (tree src, enum btf_core_reloc_kind 
>> kind)
>>  {
>>  case BPF_RELO_FIELD_BYTE_OFFSET:
>>{
>> +result = 0;
>> +if (var_off == NULL_TREE
>> +&& TREE_CODE (root) == INDIRECT_REF
>> +&& TREE_CODE (TREE_OPERAND (root, 0)) == POINTER_PLUS_EXPR)
>> +  {
>> +tree node = TREE_OPERAND (root, 0);
>> +tree offset = TREE_OPERAND (node, 1);
>> +tree type = TREE_TYPE (TREE_OPERAND (node, 0));
>> +type = TREE_TYPE (type);
>> +
>> +gcc_assert (TREE_CODE (offset) == INTEGER_CST && tree_fits_shwi_p 
>> (offset)
>> +&& COMPLETE_TYPE_P (type) && tree_fits_shwi_p (TYPE_SIZE 
>> (type)));
>
> What if an expression with a non-constant offset (something like s+foo)
> is passed to the builtin?  Wouldn't it be better to error there instead
> of ICEing?
>
In that case, var_off == NULL_TREE, and it did not reach the assert.
In any case, please notice that this code was copied from other code in
the same file, which in that case would actually produce the error
earlier.  The assert is there as a safeguard, just in case the other
function stops detecting this case.

In core-builtins.cc:572

else if (code == POINTER_PLUS_EXPR)
  {
tree offset = TREE_OPERAND (node, 1);
tree type = TREE_TYPE (TREE_OPERAND (node, 0));
type = TREE_TYPE (type);

if (TREE_CODE (offset) == INTEGER_CST && tree_fits_shwi_p (offset)
&& COMPLETE_TYPE_P (type) && tree_fits_shwi_p (TYPE_SIZE (type)))
  {
HOST_WIDE_INT offset_i = tree_to_shwi (offset);
HOST_WIDE_INT type_size_i = tree_to_shwi (TYPE_SIZE_UNIT (type));
if ((offset_i % type_size_i) == 0)
  return offset_i / type_size_i;
  }
  }

if (valid != NULL)
  *valid = false;
return -1;
  }

Because the code, although similar, serves a different purpose, I decided
not to abstract it into an independent function.  My perception was that
it would be more confusing.

Without wanting to paste too much code, please notice that the function
with the assert is only called if the above function does not return
with an error (i.e. valid != false).
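
As a rough standalone sketch of the index computation quoted above (not the GCC code itself): a constant byte offset added to a structure pointer is turned into an array index only when it is a whole multiple of the element size. `struct s` and `offset_to_index` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical example type; field "b" is member index 1.  */
struct s
{
  int a;
  int b;
};

/* Convert a constant byte offset added to a pointer into an array index.
   Returns -1 and clears *VALID when the offset is not a whole number of
   elements, mirroring the error path of the quoted function.  */
static long
offset_to_index (long byte_offset, long elem_size, bool *valid)
{
  assert (elem_size > 0);
  if (byte_offset % elem_size == 0)
    return byte_offset / elem_size;
  if (valid != NULL)
    *valid = false;
  return -1;
}
```

For "&(s+2)->b" the pointer addition contributes a byte offset of 2 * sizeof (struct s), so the sketch yields index 2, which matches the leading "2" in the "2:1" access string from the patch description.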

>
>> +
>> +HOST_WIDE_INT offset_i = tree_to_shwi (offset);
>> +result += offset_i;
>> +  }
>> +
>>  type = unsigned_type_node;
>>  if (var_off != NULL_TREE)
>>{
>> @@ -422,9 +439,9 @@ core_field_info (tree src, enum btf_core_reloc_kind kind)
>>}
>>
>>  if (bitfieldp)
>> -  result = start_bitpos / 8;
>> +  result += 

Re: Re: [PATCH] RISC-V: Introduce option -mrvv-autovec-max-lmul for RVV autovec

2024-03-14 Thread juzhe.zh...@rivai.ai
>> This PR is not really resolved or affected by the patch if I'm not
>>mistaken.  We still have code paths that will generate a larger LMUL
>>(also in vsetvl last I checked, but that was a while ago).
OK.  We should only mention PR target/112651, which is enough.

>>Should it really be called autovec-max-lmul?  We also use TARGET_MAX_LMUL
>>for builtins etc.  Or are we just following LLVM's naming here?
>>Isn't -mrvv-max-lmul sufficient?

The original option name is Kito's recommendation. Both -mrvv-max-lmul and
-mrvv-autovec-max-lmul are OK for me.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-03-14 18:59
To: 钟居哲; demin.han; gcc-patches
CC: rdapp.gcc; kito.cheng; Li, Pan2; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Introduce option -mrvv-autovec-max-lmul for RVV 
autovec
Should it really be called autovec-max-lmul?  We also use TARGET_MAX_LMUL
for builtins etc.  Or are we just following LLVM's naming here?
Isn't -mrvv-max-lmul sufficient?
 
> PR target/112648 
 
This PR is not really resolved or affected by the patch if I'm not
mistaken.  We still have code paths that will generate a larger LMUL
(also in vsetvl last I checked, but that was a while ago).
 
Regards
Robin
 


Re: [PATCH] RISC-V: Introduce option -mrvv-autovec-max-lmul for RVV autovec

2024-03-14 Thread Robin Dapp
Should it really be called autovec-max-lmul?  We also use TARGET_MAX_LMUL
for builtins etc.  Or are we just following LLVM's naming here?
Isn't -mrvv-max-lmul sufficient?

> PR target/112648 

This PR is not really resolved or affected by the patch if I'm not
mistaken.  We still have code paths that will generate a larger LMUL
(also in vsetvl last I checked, but that was a while ago).

Regards
 Robin


[PATCH] bitint, v2: Fix up adjustment of large/huge _BitInt arguments of returns_twice calls [PR113466]

2024-03-14 Thread Jakub Jelinek
On Thu, Mar 14, 2024 at 09:59:12AM +0100, Richard Biener wrote:
> On Thu, 14 Mar 2024, Jakub Jelinek wrote:
> 
> > On Thu, Mar 14, 2024 at 09:48:45AM +0100, Richard Biener wrote:
> > > Ugh.  OK, but I wonder whether we might want to simply delay
> > > fixing the CFG for inserts before returns-twice?  Would that make
> > > things less ugly?
> > 
> > You mean in lower_call just remember if we added anything before
> > ECF_RETURNS_TWICE call (or say add such stmt into a vector) and
> > then fix it up before the end of the pass?
> 
> Yeah, or just walk BBs with abnormal preds, whatever tracking is
> easier.

Walking all the bbs with abnormal preds would mean I'd have to walk their
whole bodies, because the ECF_RETURNS_TWICE call is no longer at the start.

The following patch seems to work, ok for trunk if it passes full
bootstrap/regtest?

2024-03-14  Jakub Jelinek  

PR tree-optimization/113466
* gimple-lower-bitint.cc (bitint_large_huge): Add m_returns_twice_calls
member.
(bitint_large_huge::bitint_large_huge): Initialize it.
(bitint_large_huge::~bitint_large_huge): Release it.
(bitint_large_huge::lower_call): Remember ECF_RETURNS_TWICE call stmts
before which at least one statement has been inserted.
(gimple_lower_bitint): Move argument loads before ECF_RETURNS_TWICE
calls to a different block and add corresponding PHIs.

* gcc.dg/bitint-100.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-03-13 15:34:29.969725763 +0100
+++ gcc/gimple-lower-bitint.cc  2024-03-14 11:25:07.763169074 +0100
@@ -419,7 +419,8 @@ struct bitint_large_huge
   bitint_large_huge ()
 : m_names (NULL), m_loads (NULL), m_preserved (NULL),
   m_single_use_names (NULL), m_map (NULL), m_vars (NULL),
-  m_limb_type (NULL_TREE), m_data (vNULL) {}
+  m_limb_type (NULL_TREE), m_data (vNULL),
+  m_returns_twice_calls (vNULL) {}
 
   ~bitint_large_huge ();
 
@@ -553,6 +554,7 @@ struct bitint_large_huge
   unsigned m_bitfld_load;
   vec m_data;
   unsigned int m_data_cnt;
+  vec m_returns_twice_calls;
 };
 
 bitint_large_huge::~bitint_large_huge ()
@@ -565,6 +567,7 @@ bitint_large_huge::~bitint_large_huge ()
 delete_var_map (m_map);
   XDELETEVEC (m_vars);
   m_data.release ();
+  m_returns_twice_calls.release ();
 }
 
 /* Insert gimple statement G before current location
@@ -5248,6 +5251,7 @@ bitint_large_huge::lower_call (tree obj,
   default:
break;
   }
+  bool returns_twice = (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0;
   for (unsigned int i = 0; i < nargs; ++i)
 {
   tree arg = gimple_call_arg (stmt, i);
@@ -5271,6 +5275,11 @@ bitint_large_huge::lower_call (tree obj,
  arg = make_ssa_name (TREE_TYPE (arg));
  gimple *g = gimple_build_assign (arg, v);
  gsi_insert_before (, g, GSI_SAME_STMT);
+ if (returns_twice)
+   {
+ m_returns_twice_calls.safe_push (stmt);
+ returns_twice = false;
+   }
}
   gimple_call_set_arg (stmt, i, arg);
   if (m_preserved == NULL)
@@ -7091,6 +7100,66 @@ gimple_lower_bitint (void)
   if (edge_insertions)
 gsi_commit_edge_inserts ();
 
+  /* Fix up arguments of ECF_RETURNS_TWICE calls.  Those were temporarily
+ inserted before the call, but that is invalid IL, so move them to the
+ right place and add corresponding PHIs.  */
+  if (!large_huge.m_returns_twice_calls.is_empty ())
+{
+  auto_vec arg_stmts;
+  while (!large_huge.m_returns_twice_calls.is_empty ())
+   {
+ gimple *stmt = large_huge.m_returns_twice_calls.pop ();
+ gimple_stmt_iterator gsi = gsi_after_labels (gimple_bb (stmt));
+ while (gsi_stmt (gsi) != stmt)
+   {
+ arg_stmts.safe_push (gsi_stmt (gsi));
+ gsi_remove (, false);
+   }
+ gimple *g;
+ basic_block bb = NULL;
+ edge e = NULL, ead = NULL;
+ FOR_EACH_VEC_ELT (arg_stmts, i, g)
+   {
+ gsi_safe_insert_before (, g);
+ if (i == 0)
+   {
+ bb = gimple_bb (stmt);
+ gcc_checking_assert (EDGE_COUNT (bb->preds) == 2);
+ e = EDGE_PRED (bb, 0);
+ ead = EDGE_PRED (bb, 1);
+ if ((ead->flags & EDGE_ABNORMAL) == 0)
+   std::swap (e, ead);
+ gcc_checking_assert ((e->flags & EDGE_ABNORMAL) == 0
+  && (ead->flags & EDGE_ABNORMAL));
+   }
+ tree lhs = gimple_assign_lhs (g);
+ tree arg = lhs;
+ gphi *phi = create_phi_node (copy_ssa_name (arg), bb);
+ add_phi_arg (phi, arg, e, UNKNOWN_LOCATION);
+ tree var = create_tmp_reg (TREE_TYPE (arg));
+ suppress_warning (var, OPT_Wuninitialized);
+ arg = get_or_create_ssa_default_def (cfun, var);
+ SSA_NAME_OCCURS_IN_ABNORMAL_PHI 

Re: [PATCH] vect: Call vect_convert_output with the right vecitype [PR114108]

2024-03-14 Thread Richard Biener
On Thu, Mar 14, 2024 at 11:14 AM Tejas Belagod  wrote:
>
>
> Ping.
>
> Thanks,
> Tejas.
>
> On 3/13/24 6:07 PM, Tejas Belagod wrote:
> > Ping!
> >
> > On 3/7/24 4:14 PM, Tejas Belagod wrote:
> >> This patch fixes a bug where vect_recog_abd_pattern called
> >> vect_convert_output
> >> with the incorrect vecitype for the corresponding pattern_stmt.
> >> vect_convert_output expects vecitype to be the vector form of the
> >> scalar type
> >> of the LHS of pattern_stmt, but we were passing in the vector form of
> >> the LHS
> >> of the new impending conversion statement.  This caused a skew in ABD's
> >> pattern_stmt having the vectype of the following gimple pattern_stmt.

OK

> >> 2024-03-06  Tejas Belagod  
> >>
> >> gcc/ChangeLog:
> >>
> >> PR middle-end/114108
> >> * tree-vect-patterns.cc (vect_recog_abd_pattern): Call
> >> vect_convert_output with the correct vecitype.
> >>
> >> gcc/testsuite/ChangeLog:
> >> * gcc.dg/vect/pr114108.c: New test.
> >> ---
> >>   gcc/testsuite/gcc.dg/vect/pr114108.c | 19 +++
> >>   gcc/tree-vect-patterns.cc|  5 ++---
> >>   2 files changed, 21 insertions(+), 3 deletions(-)
> >>   create mode 100644 gcc/testsuite/gcc.dg/vect/pr114108.c
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/vect/pr114108.c
> >> b/gcc/testsuite/gcc.dg/vect/pr114108.c
> >> new file mode 100644
> >> index 000..b3075d41398
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/vect/pr114108.c
> >> @@ -0,0 +1,19 @@
> >> +/* { dg-do compile } */
> >> +
> >> +#include "tree-vect.h"
> >> +
> >> +typedef signed char schar;
> >> +
> >> +__attribute__((noipa, noinline, optimize("O3")))
> >> +void foo(const schar *a, const schar *b, schar *c, int n)
> >> +{
> >> +  for (int i = 0; i < n; i++)
> >> +{
> >> +  unsigned u = __builtin_abs (a[i] - b[i]);
> >> +  c[i] = u <= 7U ? u : 7U;
> >> +}
> >> +}
> >> +
> >> +
> >> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target
> >> aarch64*-*-* } } } */
> >> +/* { dg-final { scan-tree-dump "vect_recog_abd_pattern: detected"
> >> "vect" { target aarch64*-*-* } } } */
> >> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> >> index d562f57920f..4f491c6b833 100644
> >> --- a/gcc/tree-vect-patterns.cc
> >> +++ b/gcc/tree-vect-patterns.cc
> >> @@ -1576,9 +1576,8 @@ vect_recog_abd_pattern (vec_info *vinfo,
> >> && !TYPE_UNSIGNED (abd_out_type))
> >>   {
> >> tree unsign = unsigned_type_for (abd_out_type);
> >> -  tree unsign_vectype = get_vectype_for_scalar_type (vinfo, unsign);
> >> -  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt,
> >> -  unsign_vectype);
> >> +  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt,
> >> vectype_out);
> >> +  vectype_out = get_vectype_for_scalar_type (vinfo, unsign);
> >>   }
> >> return vect_convert_output (vinfo, stmt_vinfo, out_type, stmt,
> >> vectype_out);
> >
>


Re: [PATCH] vect: Call vect_convert_output with the right vecitype [PR114108]

2024-03-14 Thread Tejas Belagod



Ping.

Thanks,
Tejas.

On 3/13/24 6:07 PM, Tejas Belagod wrote:

Ping!

On 3/7/24 4:14 PM, Tejas Belagod wrote:
This patch fixes a bug where vect_recog_abd_pattern called 
vect_convert_output

with the incorrect vecitype for the corresponding pattern_stmt.
vect_convert_output expects vecitype to be the vector form of the 
scalar type
of the LHS of pattern_stmt, but we were passing in the vector form of 
the LHS

of the new impending conversion statement.  This caused a skew in ABD's
pattern_stmt having the vectype of the following gimple pattern_stmt.

2024-03-06  Tejas Belagod  

gcc/ChangeLog:

PR middle-end/114108
* tree-vect-patterns.cc (vect_recog_abd_pattern): Call
vect_convert_output with the correct vecitype.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr114108.c: New test.
---
  gcc/testsuite/gcc.dg/vect/pr114108.c | 19 +++
  gcc/tree-vect-patterns.cc    |  5 ++---
  2 files changed, 21 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/vect/pr114108.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr114108.c 
b/gcc/testsuite/gcc.dg/vect/pr114108.c

new file mode 100644
index 000..b3075d41398
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr114108.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+#include "tree-vect.h"
+
+typedef signed char schar;
+
+__attribute__((noipa, noinline, optimize("O3")))
+void foo(const schar *a, const schar *b, schar *c, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+  unsigned u = __builtin_abs (a[i] - b[i]);
+  c[i] = u <= 7U ? u : 7U;
+    }
+}
+
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { target 
aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump "vect_recog_abd_pattern: detected" 
"vect" { target aarch64*-*-* } } } */

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index d562f57920f..4f491c6b833 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1576,9 +1576,8 @@ vect_recog_abd_pattern (vec_info *vinfo,
    && !TYPE_UNSIGNED (abd_out_type))
  {
    tree unsign = unsigned_type_for (abd_out_type);
-  tree unsign_vectype = get_vectype_for_scalar_type (vinfo, unsign);
-  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt,
-  unsign_vectype);
+  stmt = vect_convert_output (vinfo, stmt_vinfo, unsign, stmt, 
vectype_out);

+  vectype_out = get_vectype_for_scalar_type (vinfo, unsign);
  }
    return vect_convert_output (vinfo, stmt_vinfo, out_type, stmt, 
vectype_out);






[Committed] IBM Z: Fix -munaligned-symbols

2024-03-14 Thread Andreas Krebbel
With this fix we make sure that only symbols with a natural alignment
smaller than 2 are considered misaligned with -munaligned-symbols.
The background is that -munaligned-symbols is only supposed to affect
symbols whose natural alignment wouldn't be enough to fulfill our ABI
requirement of having all symbols at even addresses, because only
these are the cases where we differ from other architectures.
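
The decision described above can be sketched as a small predicate. This is a simplified model and not the s390 backend code: `struct sym` and `may_be_odd_address` are hypothetical, and all sizes are in bits, as in the check from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a symbol; all sizes are in bits.  */
struct sym
{
  bool user_align;      /* An explicit aligned attribute is present.  */
  unsigned align_bits;  /* Alignment of the declaration.  */
  unsigned size_bits;   /* Size of the type, 0 if unknown.  */
  bool binds_locally;   /* Bound to the definition in this unit.  */
};

/* Return true if the symbol must be treated as possibly residing at an
   odd address, i.e. it cannot be assumed to be 2-byte aligned.  */
static bool
may_be_odd_address (const struct sym *s, bool unaligned_symbols)
{
  assert (s != NULL);
  /* An explicit alignment below 2 bytes always counts.  */
  if (s->user_align && s->align_bits % 16 != 0)
    return true;
  /* Otherwise only the option, applied to non-local symbols, matters.  */
  if (!unaligned_symbols || s->binds_locally)
    return false;
  /* Only types smaller than 2 bytes have a natural alignment of 1.  */
  return s->user_align ? s->align_bits % 16 != 0 : s->size_bits < 16;
}
```

Under this model an external char is affected by the option, while an external int or an external char with an explicit aligned(2) is not.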

This fixes the unaligned-1 testcase, no regressions. Committed to mainline.

gcc/ChangeLog:

* config/s390/s390.cc (s390_encode_section_info): Adjust the check
for misaligned symbols.
* config/s390/s390.opt: Improve documentation.

gcc/testsuite/ChangeLog:

* gcc.target/s390/aligned-1.c: Add weak and void variables
incorporating the cases from unaligned-2.c.
* gcc.target/s390/unaligned-1.c: Likewise.
* gcc.target/s390/unaligned-2.c: Removed.
---
 gcc/config/s390/s390.cc |  15 ++-
 gcc/config/s390/s390.opt|   7 +-
 gcc/testsuite/gcc.target/s390/aligned-1.c   | 101 +--
 gcc/testsuite/gcc.target/s390/unaligned-1.c | 103 ++--
 gcc/testsuite/gcc.target/s390/unaligned-2.c |  16 ---
 5 files changed, 201 insertions(+), 41 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/s390/unaligned-2.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index e63965578f1..372a2324403 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -13802,10 +13802,19 @@ s390_encode_section_info (tree decl, rtx rtl, int 
first)
 that can go wrong (i.e. no FUNC_DECLs).
 All symbols without an explicit alignment are assumed to be 2
 byte aligned as mandated by our ABI.  This behavior can be
-overridden for external symbols with the -munaligned-symbols
-switch.  */
+overridden for external and weak symbols with the
+-munaligned-symbols switch.
+For all external symbols without explicit alignment
+DECL_ALIGN is already trimmed down to 8, however for weak
+symbols this does not happen.  These cases are catched by the
+type size check.  */
+  const_tree size = TYPE_SIZE (TREE_TYPE (decl));
+  unsigned HOST_WIDE_INT size_num = (tree_fits_uhwi_p (size)
+? tree_to_uhwi (size) : 0);
   if ((DECL_USER_ALIGN (decl) && DECL_ALIGN (decl) % 16)
- || (s390_unaligned_symbols_p && !decl_binds_to_current_def_p (decl)))
+ || (s390_unaligned_symbols_p
+ && !decl_binds_to_current_def_p (decl)
+ && (DECL_USER_ALIGN (decl) ? DECL_ALIGN (decl) % 16 : size_num < 
16)))
SYMBOL_FLAG_SET_NOTALIGN2 (XEXP (rtl, 0));
   else if (DECL_ALIGN (decl) % 32)
SYMBOL_FLAG_SET_NOTALIGN4 (XEXP (rtl, 0));
diff --git a/gcc/config/s390/s390.opt b/gcc/config/s390/s390.opt
index 901ae4beb01..a5b5aa95a12 100644
--- a/gcc/config/s390/s390.opt
+++ b/gcc/config/s390/s390.opt
@@ -332,7 +332,8 @@ Store all argument registers on the stack.
 
 munaligned-symbols
 Target Var(s390_unaligned_symbols_p) Init(0)
-Assume external symbols to be potentially unaligned.  By default all
-symbols without explicit alignment are assumed to reside on a 2 byte
-boundary as mandated by the IBM Z ABI.
+Assume external symbols, whose natural alignment would be 1, to be
+potentially unaligned.  By default all symbols without explicit
+alignment are assumed to reside on a 2 byte boundary as mandated by
+the IBM Z ABI.
 
diff --git a/gcc/testsuite/gcc.target/s390/aligned-1.c 
b/gcc/testsuite/gcc.target/s390/aligned-1.c
index 2dc99cf66bd..3f5a2611ef1 100644
--- a/gcc/testsuite/gcc.target/s390/aligned-1.c
+++ b/gcc/testsuite/gcc.target/s390/aligned-1.c
@@ -1,20 +1,103 @@
-/* Even symbols without explicite alignment are assumed to reside on a
+/* Even symbols without explicit alignment are assumed to reside on a
2 byte boundary, as mandated by the IBM Z ELF ABI, and therefore
can be accessed using the larl instruction.  */
 
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=z900 -fno-section-anchors" } */
 
-extern unsigned char extern_implicitly_aligned;
-extern unsigned char extern_explicitly_aligned __attribute__((aligned(2)));
-unsigned char aligned;
+extern unsigned char extern_char;
+extern unsigned char extern_explicitly_aligned_char 
__attribute__((aligned(2)));
+extern unsigned char extern_explicitly_unaligned_char 
__attribute__((aligned(1)));
+extern unsigned char __attribute__((weak)) extern_weak_char;
+extern unsigned char extern_explicitly_aligned_weak_char 
__attribute__((weak,aligned(2)));
+extern unsigned char extern_explicitly_unaligned_weak_char 
__attribute__((weak,aligned(1)));
 
-unsigned char
+unsigned char normal_char;
+unsigned char explicitly_unaligned_char __attribute__((aligned(1)));
+unsigned char __attribute__((weak)) weak_char = 0;
+unsigned char explicitly_aligned_weak_char __attribute__((weak,aligned(2)));
+unsigned 

[PATCH v2] match.pd: Only merge truncation with conversion for -fno-signed-zeros

2024-03-14 Thread Joe Ramsay
This optimisation does not honour signed zeros, so should not be
enabled except with -fno-signed-zeros.
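
To make the problem concrete, here is a small sketch, assuming IEEE 754 floats and default compiler flags: a truncation keeps the sign of its argument, so truncating -0.25 gives -0.0, whereas a round trip through int gives +0.0. The helper name `round_trip_loses_negative_zero` is made up for this example.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Return true if converting X to int and back yields +0.0 although X is
   negative, i.e. although a sign-preserving truncation would give -0.0.  */
static bool
round_trip_loses_negative_zero (float x)
{
  /* Keep the float-to-int conversion within the range of int.  */
  assert (x > -2147483648.0f && x < 2147483648.0f);
  float r = (float) (int) x;
  return r == 0.0f && signbit (x) && !signbit (r);
}
```

Any input in the open interval (-1, 0), as well as -0.0 itself, triggers the difference; inputs such as 0.25 or -3.5 do not.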

OK for master? I do not have commit rights for GCC, so if the patch
is fine would someone be able to commit for me? The bug is present
in all GCC versions from 12.1.0 onwards - is it possible to backport
this?

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Thanks,
Joe

gcc/ChangeLog:

* match.pd: Fix truncation pattern for -fno-signed-zeros.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/no_merge_trunc_signed_zero.c: New test.
---
Changes from v1, whitespace change only.
 gcc/match.pd  |  1 +
 .../aarch64/no_merge_trunc_signed_zero.c  | 24 +++
 2 files changed, 25 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/no_merge_trunc_signed_zero.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 9ce313323a3..15a1e7350d4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4858,6 +4858,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
(float (fix_trunc @0))
(if (!flag_trapping_math
+   && !HONOR_SIGNED_ZEROS (type)
&& types_match (type, TREE_TYPE (@0))
&& direct_internal_fn_supported_p (IFN_TRUNC, type,
  OPTIMIZE_FOR_BOTH))
diff --git a/gcc/testsuite/gcc.target/aarch64/no_merge_trunc_signed_zero.c 
b/gcc/testsuite/gcc.target/aarch64/no_merge_trunc_signed_zero.c
new file mode 100644
index 000..b2c93e55567
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/no_merge_trunc_signed_zero.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-trapping-math -fsigned-zeros" } */
+
+#include <math.h>
+
+float
+f1 (float x)
+{
+  return (int) rintf(x);
+}
+
+double
+f2 (double x)
+{
+  return (long) rint(x);
+}
+
+/* { dg-final { scan-assembler "frintx\\ts\[0-9\]+, s\[0-9\]+" } } */
+/* { dg-final { scan-assembler "cvtzs\\ts\[0-9\]+, s\[0-9\]+" } } */
+/* { dg-final { scan-assembler "scvtf\\ts\[0-9\]+, s\[0-9\]+" } } */
+/* { dg-final { scan-assembler "frintx\\td\[0-9\]+, d\[0-9\]+" } } */
+/* { dg-final { scan-assembler "cvtzs\\td\[0-9\]+, d\[0-9\]+" } } */
+/* { dg-final { scan-assembler "scvtf\\td\[0-9\]+, d\[0-9\]+" } } */
+
-- 
2.27.0



Re: [PATCH] bitint: Fix up adjustment of large/huge _BitInt arguments of returns_twice calls [PR113466]

2024-03-14 Thread Richard Biener
On Thu, 14 Mar 2024, Jakub Jelinek wrote:

> On Thu, Mar 14, 2024 at 09:48:45AM +0100, Richard Biener wrote:
> > Ugh.  OK, but I wonder whether we might want to simply delay
> > fixing the CFG for inserts before returns-twice?  Would that make
> > things less ugly?
> 
> You mean in lower_call just remember if we added anything before
> ECF_RETURNS_TWICE call (or say add such stmt into a vector) and
> then fix it up before the end of the pass?

Yeah, or just walk BBs with abnormal preds, whatever tracking is
easier.

> I can certainly try that and see what is shorter/more maintainable.
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 1/3] bpf: Fix CO-RE field expression builtins

2024-03-14 Thread Jose E. Marchesi


> This patch corrects bugs within the CO-RE builtin field expression
> related builtins.
> The following bugs were identified and corrected based on the expected
> results of bpf-next selftests testsuite.
> It addresses the following problems:
>  - Expressions with pointer dereferencing now point to the BTF structure
>type, instead of the structure pointer type.
>  - Pointer addition to structure root is now identified and constructed
>in CO-RE relocations as if it is an array access. For example,
>   "&(s+2)->b" generates "2:1" as an access string where "2" is
>   refering to the access for "s+2".
>
> gcc/ChangeLog:
>   * config/bpf/core-builtins.cc (core_field_info): Add
>   support for POINTER_PLUS_EXPR in the root of the field expression.
>   (bpf_core_get_index): Likewise.
>   (pack_field_expr): Make the BTF type to point to the structure
>   related node, instead of its pointer type.
>   (make_core_safe_access_index): Correct to new code.
>
> gcc/testsuite/ChangeLog:
>   * gcc.target/bpf/core-attr-5.c: Correct.
>   * gcc.target/bpf/core-attr-6.c: Likewise.
>   * gcc.target/bpf/core-attr-struct-as-array.c: Add test case for
>   pointer arithmetics as array access use case.
> ---
>  gcc/config/bpf/core-builtins.cc   | 54 +++
>  gcc/testsuite/gcc.target/bpf/core-attr-5.c|  4 +-
>  gcc/testsuite/gcc.target/bpf/core-attr-6.c|  4 +-
>  .../bpf/core-attr-struct-as-array.c   | 35 
>  4 files changed, 82 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/core-attr-struct-as-array.c
>
> diff --git a/gcc/config/bpf/core-builtins.cc b/gcc/config/bpf/core-builtins.cc
> index 8d8c54c1fb3d..4256fea15e49 100644
> --- a/gcc/config/bpf/core-builtins.cc
> +++ b/gcc/config/bpf/core-builtins.cc
> @@ -388,8 +388,8 @@ core_field_info (tree src, enum btf_core_reloc_kind kind)
>  
>src = root_for_core_field_info (src);
>  
> -  get_inner_reference (src, , , _off, , ,
> -, );
> +  tree root = get_inner_reference (src, , , _off, ,
> +, , );
>  
>/* Note: Use DECL_BIT_FIELD_TYPE rather than DECL_BIT_FIELD here, because 
> it
>   remembers whether the field in question was originally declared as a
> @@ -414,6 +414,23 @@ core_field_info (tree src, enum btf_core_reloc_kind kind)
>  {
>  case BPF_RELO_FIELD_BYTE_OFFSET:
>{
> + result = 0;
> + if (var_off == NULL_TREE
> + && TREE_CODE (root) == INDIRECT_REF
> + && TREE_CODE (TREE_OPERAND (root, 0)) == POINTER_PLUS_EXPR)
> +   {
> + tree node = TREE_OPERAND (root, 0);
> + tree offset = TREE_OPERAND (node, 1);
> + tree type = TREE_TYPE (TREE_OPERAND (node, 0));
> + type = TREE_TYPE (type);
> +
> + gcc_assert (TREE_CODE (offset) == INTEGER_CST && tree_fits_shwi_p 
> (offset)
> + && COMPLETE_TYPE_P (type) && tree_fits_shwi_p (TYPE_SIZE 
> (type)));

What if an expression with a non-constant offset (something like s+foo)
is passed to the builtin?  Wouldn't it be better to error there instead
of ICEing?

> +
> + HOST_WIDE_INT offset_i = tree_to_shwi (offset);
> + result += offset_i;
> +   }
> +
>   type = unsigned_type_node;
>   if (var_off != NULL_TREE)
> {
> @@ -422,9 +439,9 @@ core_field_info (tree src, enum btf_core_reloc_kind kind)
> }
>  
>   if (bitfieldp)
> -   result = start_bitpos / 8;
> +   result += start_bitpos / 8;
>   else
> -   result = bitpos / 8;
> +   result += bitpos / 8;
>}
>break;
>  
> @@ -552,6 +569,7 @@ bpf_core_get_index (const tree node, bool *valid)
>  {
>tree offset = TREE_OPERAND (node, 1);
>tree type = TREE_TYPE (TREE_OPERAND (node, 0));
> +  type = TREE_TYPE (type);
>  
>if (TREE_CODE (offset) == INTEGER_CST && tree_fits_shwi_p (offset)
> && COMPLETE_TYPE_P (type) && tree_fits_shwi_p (TYPE_SIZE (type)))
> @@ -627,14 +645,18 @@ compute_field_expr (tree node, unsigned int *accessors,
>  
>switch (TREE_CODE (node))
>  {
> -case ADDR_EXPR:
> -  return 0;
>  case INDIRECT_REF:
> -  accessors[0] = 0;
> -  return 1;
> -case POINTER_PLUS_EXPR:
> -  accessors[0] = bpf_core_get_index (node, valid);
> -  return 1;
> +  if (TREE_CODE (node = TREE_OPERAND (node, 0)) == POINTER_PLUS_EXPR)
> + {
> +   accessors[0] = bpf_core_get_index (node, valid);
> +   *access_node = TREE_OPERAND (node, 0);
> +   return 1;
> + }
> +  else
> + {
> +   accessors[0] = 0;
> +   return 1;
> + }
>  case COMPONENT_REF:
>n = compute_field_expr (TREE_OPERAND (node, 0), accessors,
> valid,
> @@ -660,6 +682,7 @@ compute_field_expr (tree node, unsigned int *accessors,
> access_node, false);
>return n;
>  
> +case 

Re: [PATCH] bitint: Fix up adjustment of large/huge _BitInt arguments of returns_twice calls [PR113466]

2024-03-14 Thread Jakub Jelinek
On Thu, Mar 14, 2024 at 09:48:45AM +0100, Richard Biener wrote:
> Ugh.  OK, but I wonder whether we might want to simply delay
> fixing the CFG for inserts before returns-twice?  Would that make
> things less ugly?

You mean in lower_call just remember if we added anything before
ECF_RETURNS_TWICE call (or say add such stmt into a vector) and
then fix it up before the end of the pass?
I can certainly try that and see what is shorter/more maintainable.

Jakub



Re: [PATCH] bitint: Fix up adjustment of large/huge _BitInt arguments of returns_twice calls [PR113466]

2024-03-14 Thread Richard Biener
On Thu, 14 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> This patch (on top of the just posted gsi_safe_insert* fixes patch)
> fixes the instrumentation of large/huge _BitInt SSA_NAME arguments of
> returns_twice calls.
> 
> In this case it isn't just a matter of using gsi_safe_insert_before instead
> of gsi_insert_before, we need to do more.
> 
> One thing is that unlike the asan/ubsan instrumentation which does just some
> checking, here we want the statement before the call to load into a SSA_NAME
> which is passed to the call.  With another edge we need to add a PHI,
> with one PHI argument the loaded SSA_NAME, another argument an uninitialized
> warning free SSA_NAME and a result and arrange for all 3 SSA_NAMEs to be
> preserved (i.e. stay as is, be no longer lowered afterwards).
> 
> Another problem is because edge_before_returns_twice_call may use
> copy_ssa_name, we can end up with large/huge _BitInt SSA_NAMEs we don't
> really track in the pass.  We need an underlying variable for those, but
> because of the way they are constructed we can find that easily, we can
> use the same underlying variable for the PHI arg from non-EDGE_ABNORMAL
> edge as we use for the corresponding PHI result.  The ugliest part of
> this is growing the partition if it needs to be growed (otherwise it is
> just a partition_union).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ugh.  OK, but I wonder whether we might want to simply delay
fixing the CFG for inserts before returns-twice?  Would that make
things less ugly?

Thanks,
Richard.

> 2024-03-14  Jakub Jelinek  
> 
>   PR tree-optimization/113466
>   * gimple-lower-bitint.cc (bitint_large_huge::lower_call): Handle
>   ECF_RETURNS_TWICE call arguments correctly.
>   (gimple_lower_bitint): Ignore PHIs where the PHI result is
>   in m_preserved bitmap.
> 
>   * gcc.dg/bitint-100.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-03-13 13:03:08.120117081 +0100
> +++ gcc/gimple-lower-bitint.cc2024-03-13 15:05:54.830303524 +0100
> @@ -5248,6 +5248,7 @@ bitint_large_huge::lower_call (tree obj,
>default:
>   break;
>}
> +  int returns_twice = (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0;
>for (unsigned int i = 0; i < nargs; ++i)
>  {
>tree arg = gimple_call_arg (stmt, i);
> @@ -5255,6 +5256,8 @@ bitint_large_huge::lower_call (tree obj,
> || TREE_CODE (TREE_TYPE (arg)) != BITINT_TYPE
> || bitint_precision_kind (TREE_TYPE (arg)) <= bitint_prec_middle)
>   continue;
> +  if (m_preserved == NULL)
> + m_preserved = BITMAP_ALLOC (NULL);
>if (SSA_NAME_IS_DEFAULT_DEF (arg)
> && (!SSA_NAME_VAR (arg) || VAR_P (SSA_NAME_VAR (arg))))
>   {
> @@ -5270,11 +5273,93 @@ bitint_large_huge::lower_call (tree obj,
>   v = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (arg), v);
> arg = make_ssa_name (TREE_TYPE (arg));
> gimple *g = gimple_build_assign (arg, v);
> -   gsi_insert_before (&gsi, g, GSI_SAME_STMT);
> +   gsi_safe_insert_before (&gsi, g);
> +   if (returns_twice)
> + {
> +   basic_block bb = gimple_bb (stmt);
> +   gcc_checking_assert (EDGE_COUNT (bb->preds) == 2);
> +   edge e = EDGE_PRED (bb, 0), ead = EDGE_PRED (bb, 1);
> +   if ((ead->flags & EDGE_ABNORMAL) == 0)
> + std::swap (e, ead);
> +   gcc_checking_assert ((e->flags & EDGE_ABNORMAL) == 0
> +&& (ead->flags & EDGE_ABNORMAL));
> +   if (returns_twice == 1)
> + {
> +   /* edge_before_returns_twice_call can use copy_ssa_name
> +  for some PHIs, but in that case we need to put it
> +  into the same partition as the copied SSA_NAME.   */
> +   unsigned max_ver = 0;
> +   for (gphi_iterator gsi = gsi_start_phis (bb);
> +!gsi_end_p (gsi); gsi_next (&gsi))
> + {
> +   gphi *phi = gsi.phi ();
> +   tree lhs = gimple_phi_result (phi);
> +   tree arg = gimple_phi_arg_def_from_edge (phi, e);
> +   if (m_names
> +   && TREE_CODE (arg) == SSA_NAME
> +   && TREE_CODE (TREE_TYPE (lhs)) == BITINT_TYPE
> +   && (bitint_precision_kind (TREE_TYPE (lhs))
> +   > bitint_prec_middle)
> +   && bitmap_bit_p (m_names, SSA_NAME_VERSION (lhs))
> +   && !bitmap_bit_p (m_names, SSA_NAME_VERSION (arg)))
> + max_ver = MAX (max_ver, SSA_NAME_VERSION (arg));
> + }
> +   if (max_ver != 0)
> + {
> +   if ((unsigned) m_map->var_partition->num_elements
> +   <= max_ver)
> + {
> +   partition p = partition_new (max_ver + 1);
> +   partition 

Re: [PATCH] gimple-iterator: Some gsi_safe_insert_*before fixes

2024-03-14 Thread Richard Biener
On Thu, 14 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> When trying to use the gsi_safe_insert*before APIs in bitint lowering,
> I've discovered 3 issues and the following patch addresses those:
> 
> 1) both split_block and split_edge update CDI_DOMINATORS if they are
>available, but because edge_before_returns_twice_call first splits
>and then adds an extra EDGE_ABNORMAL edge and then removes another
>one, the immediate dominators of both the new bb and the bb with
>returns_twice call need to change
> 2) the new EDGE_ABNORMAL edge had uninitialized probability; this patch
>copies the probability from the edge that is going to be removed
>and similarly copies other flags (EDGE_EXECUTABLE, EDGE_DFS_BACK,
>EDGE_IRREDUCIBLE_LOOP etc.)
> 3) if edge_before_returns_twice_call splits a block, then the bb with
>returns_twice call changes, so the gimple_stmt_iterator for it is
>no longer accurate, it points to the right statement, but gsi_bb
>and gsi_seq are no longer correct; the patch updates it
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK

> 2024-03-14  Jakub Jelinek  
> 
>   * gimple-iterator.cc (edge_before_returns_twice_call): Copy all
>   flags and probability from ad_edge to e edge.  If CDI_DOMINATORS
>   are computed, recompute immediate dominator of other_edge->src
>   and other_edge->dest.
>   (gsi_safe_insert_before, gsi_safe_insert_seq_before): Update *iter
>   for the returns_twice call case to the gsi_for_stmt (stmt) to deal
>   with update it for bb splitting.
> 
> --- gcc/gimple-iterator.cc.jj 2024-03-13 13:03:08.073117732 +0100
> +++ gcc/gimple-iterator.cc2024-03-13 15:16:45.294317700 +0100
> @@ -997,7 +997,18 @@ edge_before_returns_twice_call (basic_bl
> add_phi_arg (new_phi, gimple_phi_arg_def_from_edge (phi, ad_edge),
>  e, gimple_phi_arg_location_from_edge (phi, ad_edge));
>   }
> +  e->flags = ad_edge->flags;
> +  e->probability = ad_edge->probability;
>remove_edge (ad_edge);
> +  if (dom_info_available_p (CDI_DOMINATORS))
> + {
> +   set_immediate_dominator (CDI_DOMINATORS, other_edge->src,
> +recompute_dominator (CDI_DOMINATORS,
> + other_edge->src));
> +   set_immediate_dominator (CDI_DOMINATORS, other_edge->dest,
> +recompute_dominator (CDI_DOMINATORS,
> + other_edge->dest));
> + }
>  }
>return other_edge;
>  }
> @@ -1045,6 +1056,7 @@ gsi_safe_insert_before (gimple_stmt_iter
>if (new_bb)
>   e = single_succ_edge (new_bb);
>adjust_before_returns_twice_call (e, g);
> +  *iter = gsi_for_stmt (stmt);
>  }
>else
>  gsi_insert_before (iter, g, GSI_SAME_STMT);
> @@ -1075,6 +1087,7 @@ gsi_safe_insert_seq_before (gimple_stmt_
> if (g == l)
>   break;
>   }
> +  *iter = gsi_for_stmt (stmt);
>  }
>else
>  gsi_insert_seq_before (iter, seq, GSI_SAME_STMT);
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] aarch64: Fix TImode __sync_*_compare_and_exchange expansion with LSE [PR114310]

2024-03-14 Thread Jakub Jelinek
Hi!

The following testcase ICEs with LSE atomics.
The problem is that the @atomic_compare_and_swap expander uses the
aarch64_reg_or_zero predicate for the desired operand.  That is fine,
given that for most of the modes, and even for TImode in some cases, it
can handle a zero immediate.  But the TImode @aarch64_compare_and_swap_lse
pattern uses register_operand for that operand instead, again
intentionally so: the casp, caspa, caspl and caspal instructions need a
pair of consecutive registers for the operand, xzr is just one register,
and we can't just store zero into the link register to emulate a pair of
zeros.

So, the following patch fixes that by forcing the newval operand into
a register for the TImode LSE case.
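
To make the run-time expectations of the new test (gcc.dg/pr114310.c, in
the patch below) easier to follow, here is a small model in Python.  It is
only a sketch of the documented behaviour of __sync_val_compare_and_swap
(return the old value, store the new one only when the old value equals
the expected one).  The class name Cell and its method are made up for
this illustration; nothing here is atomic or taken from GCC.

```python
class Cell:
    """Hypothetical stand-in for the 128-bit variable v in the test."""

    def __init__(self, value):
        self.value = value

    def val_compare_and_swap(self, expected, desired):
        # Model of __sync_val_compare_and_swap: always return the old
        # contents, and store `desired` only when they equal `expected`.
        old = self.value
        if old == expected:
            self.value = desired
        return old


v = Cell(10)
results = [
    v.val_compare_and_swap(10, 0),   # match: v becomes 0 (zero as newval)
    v.val_compare_and_swap(10, 15),  # no match: v stays 0
    v.val_compare_and_swap(0, 42),   # match: v becomes 42
    v.val_compare_and_swap(31, 35),  # no match: v stays 42
]
print(results)
```

The first call is the interesting one for the ICE: its new value is the
constant zero, which is the operand the patch forces into a register.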

Bootstrapped/regtested on aarch64-linux, ok for trunk?

2024-03-14  Jakub Jelinek  

PR target/114310
* config/aarch64/aarch64.cc (aarch64_expand_compare_and_swap): For
TImode force newval into a register.

* gcc.dg/pr114310.c: New test.

--- gcc/config/aarch64/aarch64.cc.jj2024-03-12 10:16:12.024101665 +0100
+++ gcc/config/aarch64/aarch64.cc   2024-03-13 18:55:39.147986554 +0100
@@ -24693,6 +24693,8 @@ aarch64_expand_compare_and_swap (rtx ope
 rval = copy_to_mode_reg (r_mode, oldval);
   else
emit_move_insn (rval, gen_lowpart (r_mode, oldval));
+  if (mode == TImode)
+   newval = force_reg (mode, newval);
 
   emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
   newval, mod_s));
--- gcc/testsuite/gcc.dg/pr114310.c.jj  2024-03-13 19:09:25.322597418 +0100
+++ gcc/testsuite/gcc.dg/pr114310.c 2024-03-13 19:08:50.802073314 +0100
@@ -0,0 +1,20 @@
+/* PR target/114310 */
+/* { dg-do run { target int128 } } */
+
+volatile __attribute__((aligned (sizeof (__int128_t)))) __int128_t v = 10;
+
+int
+main ()
+{
+#if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 10, (__int128_t) 0) != 10)
+    __builtin_abort ();
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 10, (__int128_t) 15) != 0)
+    __builtin_abort ();
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 0, (__int128_t) 42) != 0)
+    __builtin_abort ();
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 31, (__int128_t) 35) != 42)
+__builtin_abort ();
+#endif
+  return 0;
+}

Jakub



[PATCH] bitint: Fix up adjustment of large/huge _BitInt arguments of returns_twice calls [PR113466]

2024-03-14 Thread Jakub Jelinek
Hi!

This patch (on top of the just posted gsi_safe_insert* fixes patch)
fixes the instrumentation of large/huge _BitInt SSA_NAME arguments of
returns_twice calls.

In this case it isn't just a matter of using gsi_safe_insert_before instead
of gsi_insert_before, we need to do more.

One thing is that, unlike the asan/ubsan instrumentation, which only does
some checking, here we want the statement before the call to load into an
SSA_NAME which is passed to the call.  With another edge we need to add a
PHI whose arguments are the loaded SSA_NAME and an uninitialized,
warning-free SSA_NAME, plus a PHI result, and arrange for all 3 SSA_NAMEs
to be preserved (i.e. stay as is and no longer be lowered afterwards).

Another problem is that, because edge_before_returns_twice_call may use
copy_ssa_name, we can end up with large/huge _BitInt SSA_NAMEs we don't
really track in the pass.  We need an underlying variable for those, but
because of the way they are constructed we can find it easily: we can
use the same underlying variable for the PHI arg from the
non-EDGE_ABNORMAL edge as we use for the corresponding PHI result.  The
ugliest part of this is growing the partition if it needs to be grown
(otherwise it is just a partition_union).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-14  Jakub Jelinek  

PR tree-optimization/113466
* gimple-lower-bitint.cc (bitint_large_huge::lower_call): Handle
ECF_RETURNS_TWICE call arguments correctly.
(gimple_lower_bitint): Ignore PHIs where the PHI result is
in m_preserved bitmap.

* gcc.dg/bitint-100.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-03-13 13:03:08.120117081 +0100
+++ gcc/gimple-lower-bitint.cc  2024-03-13 15:05:54.830303524 +0100
@@ -5248,6 +5248,7 @@ bitint_large_huge::lower_call (tree obj,
   default:
break;
   }
+  int returns_twice = (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0;
   for (unsigned int i = 0; i < nargs; ++i)
 {
   tree arg = gimple_call_arg (stmt, i);
@@ -5255,6 +5256,8 @@ bitint_large_huge::lower_call (tree obj,
  || TREE_CODE (TREE_TYPE (arg)) != BITINT_TYPE
  || bitint_precision_kind (TREE_TYPE (arg)) <= bitint_prec_middle)
continue;
+  if (m_preserved == NULL)
+   m_preserved = BITMAP_ALLOC (NULL);
   if (SSA_NAME_IS_DEFAULT_DEF (arg)
  && (!SSA_NAME_VAR (arg) || VAR_P (SSA_NAME_VAR (arg))))
{
@@ -5270,11 +5273,93 @@ bitint_large_huge::lower_call (tree obj,
v = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (arg), v);
  arg = make_ssa_name (TREE_TYPE (arg));
  gimple *g = gimple_build_assign (arg, v);
- gsi_insert_before (&gsi, g, GSI_SAME_STMT);
+ gsi_safe_insert_before (&gsi, g);
+ if (returns_twice)
+   {
+ basic_block bb = gimple_bb (stmt);
+ gcc_checking_assert (EDGE_COUNT (bb->preds) == 2);
+ edge e = EDGE_PRED (bb, 0), ead = EDGE_PRED (bb, 1);
+ if ((ead->flags & EDGE_ABNORMAL) == 0)
+   std::swap (e, ead);
+ gcc_checking_assert ((e->flags & EDGE_ABNORMAL) == 0
+  && (ead->flags & EDGE_ABNORMAL));
+ if (returns_twice == 1)
+   {
+ /* edge_before_returns_twice_call can use copy_ssa_name
+for some PHIs, but in that case we need to put it
+into the same partition as the copied SSA_NAME.   */
+ unsigned max_ver = 0;
+ for (gphi_iterator gsi = gsi_start_phis (bb);
+  !gsi_end_p (gsi); gsi_next (&gsi))
+   {
+ gphi *phi = gsi.phi ();
+ tree lhs = gimple_phi_result (phi);
+ tree arg = gimple_phi_arg_def_from_edge (phi, e);
+ if (m_names
+ && TREE_CODE (arg) == SSA_NAME
+ && TREE_CODE (TREE_TYPE (lhs)) == BITINT_TYPE
+ && (bitint_precision_kind (TREE_TYPE (lhs))
+ > bitint_prec_middle)
+ && bitmap_bit_p (m_names, SSA_NAME_VERSION (lhs))
+ && !bitmap_bit_p (m_names, SSA_NAME_VERSION (arg)))
+   max_ver = MAX (max_ver, SSA_NAME_VERSION (arg));
+   }
+ if (max_ver != 0)
+   {
+ if ((unsigned) m_map->var_partition->num_elements
+ <= max_ver)
+   {
+ partition p = partition_new (max_ver + 1);
+ partition o = m_map->var_partition;
+ for (int e = 0; e < o->num_elements; ++e)
+   {
+ p->elements[e].class_element
+   = o->elements[e].class_element;
+ 

[PATCH] gimple-iterator: Some gsi_safe_insert_*before fixes

2024-03-14 Thread Jakub Jelinek
Hi!

When trying to use the gsi_safe_insert*before APIs in bitint lowering,
I've discovered 3 issues and the following patch addresses those:

1) both split_block and split_edge update CDI_DOMINATORS if they are
   available, but because edge_before_returns_twice_call first splits
   and then adds an extra EDGE_ABNORMAL edge and then removes another
   one, the immediate dominators of both the new bb and the bb with
   returns_twice call need to change
2) the new EDGE_ABNORMAL edge had uninitialized probability; this patch
   copies the probability from the edge that is going to be removed
   and similarly copies other flags (EDGE_EXECUTABLE, EDGE_DFS_BACK,
   EDGE_IRREDUCIBLE_LOOP etc.)
3) if edge_before_returns_twice_call splits a block, then the bb with
   returns_twice call changes, so the gimple_stmt_iterator for it is
   no longer accurate, it points to the right statement, but gsi_bb
   and gsi_seq are no longer correct; the patch updates it

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-14  Jakub Jelinek  

* gimple-iterator.cc (edge_before_returns_twice_call): Copy all
flags and probability from ad_edge to e edge.  If CDI_DOMINATORS
are computed, recompute immediate dominator of other_edge->src
and other_edge->dest.
(gsi_safe_insert_before, gsi_safe_insert_seq_before): Update *iter
for the returns_twice call case to the gsi_for_stmt (stmt) to deal
with update it for bb splitting.

--- gcc/gimple-iterator.cc.jj   2024-03-13 13:03:08.073117732 +0100
+++ gcc/gimple-iterator.cc  2024-03-13 15:16:45.294317700 +0100
@@ -997,7 +997,18 @@ edge_before_returns_twice_call (basic_bl
  add_phi_arg (new_phi, gimple_phi_arg_def_from_edge (phi, ad_edge),
   e, gimple_phi_arg_location_from_edge (phi, ad_edge));
}
+  e->flags = ad_edge->flags;
+  e->probability = ad_edge->probability;
   remove_edge (ad_edge);
+  if (dom_info_available_p (CDI_DOMINATORS))
+   {
+ set_immediate_dominator (CDI_DOMINATORS, other_edge->src,
+  recompute_dominator (CDI_DOMINATORS,
+   other_edge->src));
+ set_immediate_dominator (CDI_DOMINATORS, other_edge->dest,
+  recompute_dominator (CDI_DOMINATORS,
+   other_edge->dest));
+   }
 }
   return other_edge;
 }
@@ -1045,6 +1056,7 @@ gsi_safe_insert_before (gimple_stmt_iter
   if (new_bb)
e = single_succ_edge (new_bb);
   adjust_before_returns_twice_call (e, g);
+  *iter = gsi_for_stmt (stmt);
 }
   else
 gsi_insert_before (iter, g, GSI_SAME_STMT);
@@ -1075,6 +1087,7 @@ gsi_safe_insert_seq_before (gimple_stmt_
  if (g == l)
break;
}
+  *iter = gsi_for_stmt (stmt);
 }
   else
 gsi_insert_seq_before (iter, seq, GSI_SAME_STMT);

Jakub



Re: [PATCH v2] gcc, libcpp: Add warning switch for "#pragma once in main file" [PR89808]

2024-03-14 Thread Ken Matsui
On Sat, Mar 2, 2024 at 5:04 AM Ken Matsui  wrote:
>
> This patch adds a warning switch for "#pragma once in main file".  The
> warning option name is Wpragma-once-outside-header, which is the same
> as Clang.

Ping.
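
For anyone skimming the quoted patch below: the c-opts.cc hunk turns the
option into a three-state setting.  A minimal Python model of that one
`case` is sketched here.  The function name and the DK_* placeholders are
made up for the illustration; reading 0/1/2 as "off / warn / error" is my
interpretation of the hunk, not something the patch states outright.

```python
# Placeholder diagnostic kinds; only the names echo the patch, the values
# are arbitrary strings chosen for this sketch.
DK_WARNING = "warning"
DK_ERROR = "error"


def pragma_once_outside_header_level(value, kind):
    """Model of the OPT_Wpragma_once_outside_header case in the patch."""
    if value == 0:
        return 0  # option switched off
    if kind == DK_ERROR:
        return 2  # option requested with error kind
    return 1      # option on with any other kind


print(pragma_once_outside_header_level(1, DK_WARNING))
```

Note that the `value == 0` test comes first, so switching the option off
wins regardless of the diagnostic kind.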

>
> PR preprocessor/89808
>
> gcc/c-family/ChangeLog:
>
> * c-opts.cc (c_common_handle_option): Handle
> OPT_Wpragma_once_outside_header.
> * c.opt (Wpragma_once_outside_header): Define new option.
>
> gcc/ChangeLog:
>
> * doc/invoke.texi (Warning Options): Document
> -Wno-pragma-once-outside-header.
>
> libcpp/ChangeLog:
>
> * include/cpplib.h (struct cpp_options): Define
> cpp_warn_pragma_once_outside_header.
> * directives.cc (do_pragma_once): Use
> cpp_warn_pragma_once_outside_header.
> * init.cc (cpp_create_reader): Handle
> cpp_warn_pragma_once_outside_header.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/Wpragma-once-outside-header.C: New test.
> * g++.dg/warn/Wno-pragma-once-outside-header.C: New test.
> * g++.dg/warn/Wpragma-once-outside-header.C: New test.
>
> Signed-off-by: Ken Matsui 
> ---
>  gcc/c-family/c-opts.cc |  9 +
>  gcc/c-family/c.opt |  4 
>  gcc/doc/invoke.texi| 10 --
>  gcc/testsuite/g++.dg/Wpragma-once-outside-header.C |  5 +
>  .../g++.dg/warn/Wno-pragma-once-outside-header.C   |  5 +
>  .../g++.dg/warn/Wpragma-once-outside-header.C  |  5 +
>  libcpp/directives.cc   |  8 ++--
>  libcpp/include/cpplib.h|  4 
>  libcpp/init.cc |  1 +
>  9 files changed, 47 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/Wpragma-once-outside-header.C
>  create mode 100644 gcc/testsuite/g++.dg/warn/Wno-pragma-once-outside-header.C
>  create mode 100644 gcc/testsuite/g++.dg/warn/Wpragma-once-outside-header.C
>
> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> index be3058dca63..4edd8c6c515 100644
> --- a/gcc/c-family/c-opts.cc
> +++ b/gcc/c-family/c-opts.cc
> @@ -430,6 +430,15 @@ c_common_handle_option (size_t scode, const char *arg, 
> HOST_WIDE_INT value,
>cpp_opts->warn_num_sign_change = value;
>break;
>
> +case OPT_Wpragma_once_outside_header:
> +  if (value == 0)
> +   cpp_opts->cpp_warn_pragma_once_outside_header = 0;
> +  else if (kind == DK_ERROR)
> +   cpp_opts->cpp_warn_pragma_once_outside_header = 2;
> +  else
> +   cpp_opts->cpp_warn_pragma_once_outside_header = 1;
> +  break;
> +
>  case OPT_Wunknown_pragmas:
>/* Set to greater than 1, so that even unknown pragmas in
>  system headers will be warned about.  */
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index b7a4a1a68e3..6841a5a5e81 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -1180,6 +1180,10 @@ Wpragmas
>  C ObjC C++ ObjC++ Var(warn_pragmas) Init(1) Warning
>  Warn about misuses of pragmas.
>
> +Wpragma-once-outside-header
> +C ObjC C++ ObjC++ Var(warn_pragma_once_outside_header) Init(1) Warning
> +Warn about #pragma once outside of a header.
> +
>  Wprio-ctor-dtor
>  C ObjC C++ ObjC++ Var(warn_prio_ctor_dtor) Init(1) Warning
>  Warn if constructor or destructors with priorities from 0 to 100 are used.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index bdf05be387d..eeb8954bcdf 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -391,8 +391,8 @@ Objective-C and Objective-C++ Dialects}.
>  -Wpacked  -Wno-packed-bitfield-compat  -Wpacked-not-aligned  -Wpadded
>  -Wparentheses  -Wno-pedantic-ms-format
>  -Wpointer-arith  -Wno-pointer-compare  -Wno-pointer-to-int-cast
> --Wno-pragmas  -Wno-prio-ctor-dtor  -Wredundant-decls
> --Wrestrict  -Wno-return-local-addr  -Wreturn-type
> +-Wno-pragmas  -Wno-pragma-once-outside-header  -Wno-prio-ctor-dtor
> +-Wredundant-decls  -Wrestrict  -Wno-return-local-addr  -Wreturn-type
>  -Wno-scalar-storage-order  -Wsequence-point
>  -Wshadow  -Wshadow=global  -Wshadow=local  -Wshadow=compatible-local
>  -Wno-shadow-ivar
> @@ -7955,6 +7955,12 @@ Do not warn about misuses of pragmas, such as 
> incorrect parameters,
>  invalid syntax, or conflicts between pragmas.  See also
>  @option{-Wunknown-pragmas}.
>
> +@opindex Wno-pragma-once-outside-header
> +@opindex Wpragma-once-outside-header
> +@item -Wno-pragma-once-outside-header
> +Do not warn when @code{#pragma once} is used in a file that is not a header
> +file, such as a main file.
> +
>  @opindex Wno-prio-ctor-dtor
>  @opindex Wprio-ctor-dtor
>  @item -Wno-prio-ctor-dtor
> diff --git a/gcc/testsuite/g++.dg/Wpragma-once-outside-header.C 
> b/gcc/testsuite/g++.dg/Wpragma-once-outside-header.C
> new file mode 100644
> index 000..678bd4e7626
> --- /dev/null
> +++ 

Re: [PATCH] LoongArch: Remove unused and incorrect "sge_" define_insn

2024-03-14 Thread chenglulu



On 2024/3/13 at 9:03 PM, Xi Ruoyao wrote:

If this insn is really used, we'll have something like

 slti $r4,$r0,$r5

in the code.  The assembler will reject it because slti wants 2
register operands and 1 immediate operand.  But we haven't received any
bug report for this, which indicates that this define_insn is not used
at all.

Note that do_store_flag (in expr.cc) is already converting x >= 1 to
x > 0 unconditionally, so this define_insn is indeed unused and we can
just remove it.
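
The rewrite of x >= 1 into x > 0 is only valid because the operands are
integers.  As a quick sanity check (a Python sketch, not GCC code; the
ranges are arbitrary and the names are made up), the two comparisons can
be compared over a signed-looking and an unsigned-looking range:

```python
def ge_one(x):
    return x >= 1


def gt_zero(x):
    return x > 0


signed_values = range(-1000, 1001)   # includes negatives, 0 and 1
unsigned_values = range(0, 2001)     # non-negative only, as for geu/gtu

# Collect every value on which the two forms disagree.
mismatches = [x for x in list(signed_values) + list(unsigned_values)
              if ge_one(x) != gt_zero(x)]
print(mismatches)
```

Since no integer lies strictly between 0 and 1, the list of mismatches
stays empty for both ranges.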

gcc/ChangeLog:

* config/loongarch/loongarch.md (any_ge): Remove.
(sge_): Remove.
---

Not fully tested but should be obvious.  Ok for trunk?


LGTM!

Thanks!



  gcc/config/loongarch/loongarch.md | 10 --
  1 file changed, 10 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 525e1e82183..18fd9c1e7d5 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -517,7 +517,6 @@ (define_code_iterator equality_op [eq ne])
  ;; These code iterators allow the signed and unsigned scc operations to use
  ;; the same template.
  (define_code_iterator any_gt [gt gtu])
-(define_code_iterator any_ge [ge geu])
  (define_code_iterator any_lt [lt ltu])
  (define_code_iterator any_le [le leu])
  
@@ -3355,15 +3354,6 @@ (define_insn "*sgt_"

[(set_attr "type" "slt")
 (set_attr "mode" "")])
  
-(define_insn "*sge_"

-  [(set (match_operand:GPR 0 "register_operand" "=r")
-   (any_ge:GPR (match_operand:X 1 "register_operand" "r")
-(const_int 1)))]
-  ""
-  "slti\t%0,%.,%1"
-  [(set_attr "type" "slt")
-   (set_attr "mode" "")])
-
  (define_insn "*slt_"
[(set (match_operand:GPR 0 "register_operand" "=r")
(any_lt:GPR (match_operand:X 1 "register_operand" "r")




Re: [PATCH] contrib/gcc-changelog/git_check_commit.py: Implement --num-commits

2024-03-14 Thread Ken Matsui
On Fri, Mar 8, 2024 at 8:42 AM Patrick Palka  wrote:
>
> On Wed, 28 Feb 2024, Ken Matsui wrote:
>
> > This patch implements a --num-commits (-n) flag as shorthand for
> > the hash~N..hash range of commits.

Ping.

> >
> > contrib/ChangeLog:
> >
> >   * gcc-changelog/git_check_commit.py: Implement --num-commits.
>
> LGTM
>
> >
> > Signed-off-by: Ken Matsui 
> > ---
> >  contrib/gcc-changelog/git_check_commit.py | 15 +++
> >  1 file changed, 15 insertions(+)
> >
> > diff --git a/contrib/gcc-changelog/git_check_commit.py 
> > b/contrib/gcc-changelog/git_check_commit.py
> > index 8cca9f439a5..22e032e8b38 100755
> > --- a/contrib/gcc-changelog/git_check_commit.py
> > +++ b/contrib/gcc-changelog/git_check_commit.py
> > @@ -22,6 +22,12 @@ import argparse
> >
> >  from git_repository import parse_git_revisions
> >
> > +def nonzero_uint(value):
> > +ivalue = int(value)
> > +if ivalue <= 0:
> > +raise argparse.ArgumentTypeError('%s is not a non-zero positive 
> > integer' % value)
> > +return ivalue
> > +
> >  parser = argparse.ArgumentParser(description='Check git ChangeLog format '
> >   'of a commit')
> >  parser.add_argument('revisions', default='HEAD', nargs='?',
> > @@ -33,8 +39,17 @@ parser.add_argument('-p', '--print-changelog', 
> > action='store_true',
> >  help='Print final changelog entires')
> >  parser.add_argument('-v', '--verbose', action='store_true',
> >  help='Print verbose information')
> > +parser.add_argument('-n', '--num-commits', type=nonzero_uint, default=1,
> > +help='Number of commits to check (i.e. shorthand for '
> > +'hash~N..hash)')
> >  args = parser.parse_args()
> >
> > +if args.num_commits > 1:
> > +if '..' in args.revisions:
> > +print('ERR: --num-commits and range of revisions are mutually 
> > exclusive')
> > +exit(1)
> > +args.revisions = '{0}~{1}..{0}'.format(args.revisions, 
> > args.num_commits)
> > +
> >  retval = 0
> >  for git_commit in parse_git_revisions(args.git_path, args.revisions):
> >  res = 'OK' if git_commit.success else 'FAILED'
> > --
> > 2.44.0
> >
> >
>
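
For reference, the behaviour of the quoted patch can be tried outside the
script with a small standalone sketch.  nonzero_uint is copied from the
patch; expand_revisions is a hypothetical helper that wraps the patch's
top-level logic in a function and raises an exception where the script
prints an error and exits.

```python
import argparse


def nonzero_uint(value):
    # Same check as in the patch: accept only integers greater than zero.
    ivalue = int(value)
    if ivalue <= 0:
        raise argparse.ArgumentTypeError(
            '%s is not a non-zero positive integer' % value)
    return ivalue


def expand_revisions(revisions, num_commits):
    """Hypothetical helper: return the revision spec that gets checked."""
    if num_commits > 1:
        if '..' in revisions:
            # The script itself prints an error and exits with status 1.
            raise ValueError('--num-commits and range of revisions are '
                             'mutually exclusive')
        return '{0}~{1}..{0}'.format(revisions, num_commits)
    return revisions


print(expand_revisions('HEAD', nonzero_uint('3')))
```

With the default of 1 the revision argument is passed through unchanged,
so existing invocations keep their meaning.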


Re: [PATCH V12]: Improve code sinking pass

2024-03-14 Thread Richard Biener
On Wed, Mar 13, 2024 at 9:27 PM Jeff Law  wrote:
>
>
>
> On 3/13/24 4:22 AM, Richard Biener wrote:
>
> >
> > ... this hunk is OK (please test and split it out separately).  This is
> > in the spirit of moving the stmt the least amount (in this case not
> > scheduling it within the basic-block).  In the same spirit one would
> > choose an earlier basic-block, but only if the old chosen one
> > post-dominates it; dominance isn't a good criterion since you'd move it
> > where the computation might not be needed.  A practical testcase would be
> >
> >tem = a + b;
> >if (foo)
> >  bar ();
> >tem2 = tem + d;
> >
> > where we at the moment would sink 'tem = a + b' to the block containing
> > 'tem2 = tem + d', not reducing the number of evaluations (of course bar()
> > might not return, but that's a minor detail).  Code motion like that should
> > be subject to register-pressure considerations, which we do not estimate
> > here at all.  So it could be argued we shouldn't perform any sinking here.
> Agreed.  This looks more like a scheduling and register-pressure issue
> rather than a classic sinking issue.
>
> Sinking is supposed to be moving code to less frequently executed points.
> In the case above, the only way sinking into the tem2 = block would help
> is if bar() doesn't return.  It just doesn't make sense to me from a
> sinking standpoint.
>
> The block execution data generally prevents this kind of gratuitous
> movement.
>
> I actually evaluated our sinking code several years ago against an
> implementation of Click's algorithm.  In general they were quite
> comparable in terms of selecting an "optimal" block from an execution
> standpoint.  There were a couple of fixes that were added to our
> implementation at that time, but again, generally we were picking
> sensible blocks.
>
>
> >
> > A good first-order heuristic would be to avoid the scheduling
> > when the number of non-virtual SSA uses on the stmt to be moved is bigger
> > than one.  For zero we reduce the lifetime of the def.  For one we're not
> > making things worse.  For more uses it depends on whether we're moving
> > within the lifetime of the uses and it becomes a global problem (we're
> > greedily moving dependent statements, so we even get "local global" wrong
> > then).
> >
> > That said, changing this will cause regressions; given that both the
> > before and after behavior are somewhat ad-hoc, it's hard to argue one is
> > more correct than the other.
> >
> > IMO scheduling should be left to a stmt scheduler on GIMPLE
> > (which we don't have).
> Click's work can function as a statement scheduler, though I'm not
> convinced it's actually a good one.  Essentially most statements are
> conceptually disassociated from their blocks, then re-scheduled by
> visiting defining statements of "pinned" instructions.  That model is
> mostly for driving redundancy elimination.  Scheduling is just a side
> effect.
>
>
>
> Bernd had a statement scheduler for gimple years ago, but it was
> somewhat controversial at the time and never moved forward enough to get
> integrated.  IIRC it ran just before or just after TER and its primary
> objective was to avoid some of the pathological cases that ultimately
> result in significant spilling after we're done with the bulk of the RTL
> pipeline.

Yeah, the most difficult thing with scheduling on GIMPLE is the interaction
with TER.  TER does have some "scheduling boundaries" it respects
(I'd have to look them up), so GIMPLE scheduling that only looks at
scheduling across such boundaries might be useful.
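
To put a number on the quoted 'tem = a + b' testcase, here is a toy
Python model (not GIMPLE, and assuming bar() returns) that counts how
often a + b is evaluated with and without the sinking.  The function name
and the concrete values are made up for the illustration.

```python
def run(foo, sink):
    """Execute the quoted testcase once.

    sink=False keeps 'tem = a + b' in front of the 'if'; sink=True places
    it in the block containing 'tem2 = tem + d'.  Returns the number of
    times a + b was evaluated and the resulting tem2.
    """
    a, b, d = 1, 2, 3
    evaluations = 0
    if not sink:
        tem = a + b
        evaluations += 1
    if foo:
        pass  # stands in for bar (), assumed to return
    if sink:
        tem = a + b
        evaluations += 1
    tem2 = tem + d
    return evaluations, tem2


counts = {(foo, sink): run(foo, sink)[0]
          for foo in (False, True) for sink in (False, True)}
print(counts)
```

Every path evaluates a + b exactly once in both placements, so in this
model the move only changes where the value is live, not how often it is
computed.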

Richard.


Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Uros Bizjak
On Thu, Mar 14, 2024 at 8:32 AM Hongtao Liu  wrote:
>
> On Thu, Mar 14, 2024 at 3:22 PM Uros Bizjak  wrote:
> >
> > On Thu, Mar 14, 2024 at 2:33 AM liuhongt  wrote:
> > >
> > > When we split
> > > (insn 37 36 38 10 (set (reg:DI 104 [ _18 ])
> > > (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 MEM[(struct 
> > > SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])) "test.C":22:42 
> > > 84 {*movdi_internal}
> > >  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
> > >
> > > into
> > >
> > > (insn 104 36 37 10 (set (subreg:V2DI (reg:DI 124) 0)
> > > (vec_concat:V2DI (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 
> > > ]) [6 MEM[(struct SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 
> > > A32])
> > > (const_int 0 [0]))) "test.C":22:42 -1
> > > (nil)))
> > > (insn 37 104 105 10 (set (subreg:V2DI (reg:DI 104 [ _18 ]) 0)
> > > (subreg:V2DI (reg:DI 124) 0)) "test.C":22:42 2024 
> > > {movv2di_internal}
> > >  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
> > > (nil)))
> > >
> > > we must copy the REG_EH_REGION note to the first insn and split the block
> > > after the newly added insn.  The REG_EH_REGION on the second insn will be
> > > removed later since it no longer traps.
> > >
> > > Currently we only handle memory_operand; are there any other insns
> > > that need to be handled?
> >
> > I think memory access is the only thing that can trap.
> >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} for trunk and 
> > > gcc-13/gcc-12 release branch.
> > > Ok for trunk and backport?
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386-features.cc
> > > (general_scalar_chain::convert_op): Handle REG_EH_REGION note.
> > > (convert_scalars_to_vector): Ditto.
> > > * config/i386/i386-features.h (class scalar_chain): New
> > > member control_flow_insns.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * g++.target/i386/pr111822.C: New test.
> > > ---
> > >  gcc/config/i386/i386-features.cc | 48 ++--
> > >  gcc/config/i386/i386-features.h  |  1 +
> > >  gcc/testsuite/g++.target/i386/pr111822.C | 45 ++
> > >  3 files changed, 90 insertions(+), 4 deletions(-)
> > >  create mode 100644 gcc/testsuite/g++.target/i386/pr111822.C
> > >
> > > diff --git a/gcc/config/i386/i386-features.cc 
> > > b/gcc/config/i386/i386-features.cc
> > > index 1de2a07ed75..2ed27a9ebdd 100644
> > > --- a/gcc/config/i386/i386-features.cc
> > > +++ b/gcc/config/i386/i386-features.cc
> > > @@ -998,20 +998,36 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn 
> > > *insn)
> > >  }
> > >else if (MEM_P (*op))
> > >  {
> > > +  rtx_insn* eh_insn, *movabs = NULL;
> > >rtx tmp = gen_reg_rtx (GET_MODE (*op));
> > >
> > >/* Handle movabs.  */
> > >if (!memory_operand (*op, GET_MODE (*op)))
> > > {
> > >   rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
> > > + movabs = emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
> > >
> > > - emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
> > >   *op = tmp2;
> > > }
> >
> > I may be missing something, but isn't the above dead code? We have
> > if (MEM_P (*op)) and then if (!memory_operand (*op, ...)).
> It's PR91814 #c1, memory_operand will also check invalid memory addresses.

Oh, it is even my comment ;)

Perhaps the comment should be improved to something like:

"Emit MOVABS to load from a 64-bit absolute address to a GPR."

LGTM then.

Thanks,
Uros.

> >
> > Uros.
> >
> > >
> > > -  emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
> > > -gen_gpr_to_xmm_move_src (vmode, 
> > > *op)),
> > > -   insn);
> > > +  eh_insn
> > > +   = emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
> > > +gen_gpr_to_xmm_move_src (vmode, 
> > > *op)),
> > > +   insn);
> > > +
> > > +  if (cfun->can_throw_non_call_exceptions)
> > > +   {
> > > + /* Handle REG_EH_REGION note.  */
> > > + rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
> > > + if (note)
> > > +   {
> > > + if (movabs)
> > > +   eh_insn = movabs;
> > > + control_flow_insns.safe_push (eh_insn);
> > > + add_reg_note (eh_insn, REG_EH_REGION, XEXP (note, 0));
> > > +   }
> > > +   }
> > > +
> > >*op = gen_rtx_SUBREG (vmode, tmp, 0);
> > >
> > >if (dump_file)
> > > @@ -2494,6 +2510,7 @@ convert_scalars_to_vector (bool timode_p)
> > >  {
> > >basic_block bb;
> > >int converted_insns = 0;
> > > +  auto_vec control_flow_insns;
> > >
> > >bitmap_obstack_initialize (NULL);
> > >const machine_mode cand_mode[3] = { SImode, DImode, TImode };
> > > @@ -2575,6 +2592,11 @@ 

Re: [PATCH v2 01/12] extend.texi: Arrange pre-existing built-in traits alphabetically

2024-03-14 Thread Ken Matsui
On Thu, Mar 14, 2024 at 12:28 AM Ken Matsui  wrote:
>
> Ok for trunk?

It appears that my update notes have somehow vanished.  I have
rectified several inconsistencies and introduced sub-sections to
enhance readability.  Although I contemplated merging certain patches
for a more streamlined presentation, I ultimately opted to maintain
the existing structure of commits, believing it to better suit our
commit granularity.

>
> -- >8 --
>
> This patch arranges pre-existing built-in traits alphabetically for
> better codebase consistency and easier future integration of changes.
>
> gcc/ChangeLog:
>
> * doc/extend.texi (Type Traits): Arrange pre-existing built-in
> traits alphabetically.
>
> Signed-off-by: Ken Matsui 
> ---
>  gcc/doc/extend.texi | 62 ++---
>  1 file changed, 31 insertions(+), 31 deletions(-)
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index f679c81acf2..b13f9d6f934 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -29499,15 +29499,6 @@ Requires: @var{type} shall be a complete type, 
> (possibly cv-qualified)
>  @code{void}, or an array of unknown bound.
>  @enddefbuiltin
>
> -@defbuiltin{bool __has_nothrow_copy (@var{type})}
> -If @code{__has_trivial_copy (type)} is @code{true} then the trait is
> -@code{true}, else if @var{type} is a cv-qualified class or union type
> -with copy constructors that are known not to throw an exception then
> -the trait is @code{true}, else it is @code{false}.
> -Requires: @var{type} shall be a complete type, (possibly cv-qualified)
> -@code{void}, or an array of unknown bound.
> -@enddefbuiltin
> -
>  @defbuiltin{bool __has_nothrow_constructor (@var{type})}
>  If @code{__has_trivial_constructor (type)} is @code{true} then the trait
>  is @code{true}, else if @var{type} is a cv class or union type (or array
> @@ -29517,6 +29508,15 @@ Requires: @var{type} shall be a complete type, 
> (possibly cv-qualified)
>  @code{void}, or an array of unknown bound.
>  @enddefbuiltin
>
> +@defbuiltin{bool __has_nothrow_copy (@var{type})}
> +If @code{__has_trivial_copy (type)} is @code{true} then the trait is
> +@code{true}, else if @var{type} is a cv-qualified class or union type
> +with copy constructors that are known not to throw an exception then
> +the trait is @code{true}, else it is @code{false}.
> +Requires: @var{type} shall be a complete type, (possibly cv-qualified)
> +@code{void}, or an array of unknown bound.
> +@enddefbuiltin
> +
>  @defbuiltin{bool __has_trivial_assign (@var{type})}
>  If @var{type} is @code{const}- qualified or is a reference type then
>  the trait is @code{false}.  Otherwise if @code{__is_trivial (type)} is
> @@ -29527,15 +29527,6 @@ Requires: @var{type} shall be a complete type, 
> (possibly cv-qualified)
>  @code{void}, or an array of unknown bound.
>  @enddefbuiltin
>
> -@defbuiltin{bool __has_trivial_copy (@var{type})}
> -If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference
> -type then the trait is @code{true}, else if @var{type} is a cv class
> -or union type with a trivial copy constructor ([class.copy]) then the trait
> -is @code{true}, else it is @code{false}.  Requires: @var{type} shall be
> -a complete type, (possibly cv-qualified) @code{void}, or an array of unknown
> -bound.
> -@enddefbuiltin
> -
>  @defbuiltin{bool __has_trivial_constructor (@var{type})}
>  If @code{__is_trivial (type)} is @code{true} then the trait is @code{true},
>  else if @var{type} is a cv-qualified class or union type (or array thereof)
> @@ -29545,6 +29536,15 @@ Requires: @var{type} shall be a complete type, 
> (possibly cv-qualified)
>  @code{void}, or an array of unknown bound.
>  @enddefbuiltin
>
> +@defbuiltin{bool __has_trivial_copy (@var{type})}
> +If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference
> +type then the trait is @code{true}, else if @var{type} is a cv class
> +or union type with a trivial copy constructor ([class.copy]) then the trait
> +is @code{true}, else it is @code{false}.  Requires: @var{type} shall be
> +a complete type, (possibly cv-qualified) @code{void}, or an array of unknown
> +bound.
> +@enddefbuiltin
> +
>  @defbuiltin{bool __has_trivial_destructor (@var{type})}
>  If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference 
> type
>  then the trait is @code{true}, else if @var{type} is a cv class or union
> @@ -29560,6 +29560,13 @@ If @var{type} is a class type with a virtual 
> destructor
>  Requires: If @var{type} is a non-union class type, it shall be a complete 
> type.
>  @enddefbuiltin
>
> +@defbuiltin{bool __integer_pack (@var{length})}
> +When used as the pattern of a pack expansion within a template
> +definition, expands to a template argument pack containing integers
> +from @code{0} to @code{@var{length}-1}.  This is provided for
> +efficient implementation of @code{std::make_integer_sequence}.
> +@enddefbuiltin
> +
>  @defbuiltin{bool __is_abstract (@var{type})}
>  

Re: [PATCH] match.pd: Only merge truncation with conversion for -fno-signed-zeros

2024-03-14 Thread Richard Biener
On Wed, Mar 13, 2024 at 3:39 PM Joe Ramsay  wrote:
>
> This optimisation does not honour signed zeros, so should not be
> enabled except with -fno-signed-zeros.
>
> OK for master? I do not have commit rights for GCC, so if the patch
> is fine would someone be able to commit for me? The bug is present
> in all GCC versions from 12.1.0 onwards - is it possible to backport
> this?
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Thanks,
> Joe
>
> gcc/ChangeLog:
>
> * match.pd: Fix truncation pattern for -fno-signed-zeros.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/no_merge_trunc_signed_zero.c: New test.
> ---
>  gcc/match.pd  |  2 +-
>  .../aarch64/no_merge_trunc_signed_zero.c  | 24 +++
>  2 files changed, 25 insertions(+), 1 deletion(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/no_merge_trunc_signed_zero.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 9ce313323a3..45c34c810cf 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4857,7 +4857,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  #if GIMPLE
>  (simplify
> (float (fix_trunc @0))
> -   (if (!flag_trapping_math
> +   (if (!flag_trapping_math && !HONOR_SIGNED_ZEROS(type)

Put the && on the next line and add a space before (type).

OK with that change.

Richard.

> && types_match (type, TREE_TYPE (@0))
> && direct_internal_fn_supported_p (IFN_TRUNC, type,
>   OPTIMIZE_FOR_BOTH))
> diff --git a/gcc/testsuite/gcc.target/aarch64/no_merge_trunc_signed_zero.c 
> b/gcc/testsuite/gcc.target/aarch64/no_merge_trunc_signed_zero.c
> new file mode 100644
> index 000..b2c93e55567
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/no_merge_trunc_signed_zero.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-trapping-math -fsigned-zeros" } */
> +
> +#include <math.h>
> +
> +float
> +f1 (float x)
> +{
> +  return (int) rintf(x);
> +}
> +
> +double
> +f2 (double x)
> +{
> +  return (long) rint(x);
> +}
> +
> +/* { dg-final { scan-assembler "frintx\\ts\[0-9\]+, s\[0-9\]+" } } */
> +/* { dg-final { scan-assembler "cvtzs\\ts\[0-9\]+, s\[0-9\]+" } } */
> +/* { dg-final { scan-assembler "scvtf\\ts\[0-9\]+, s\[0-9\]+" } } */
> +/* { dg-final { scan-assembler "frintx\\td\[0-9\]+, d\[0-9\]+" } } */
> +/* { dg-final { scan-assembler "cvtzs\\td\[0-9\]+, d\[0-9\]+" } } */
> +/* { dg-final { scan-assembler "scvtf\\td\[0-9\]+, d\[0-9\]+" } } */
> +
> --
> 2.27.0
>
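
To illustrate why the pattern needs the signed-zeros guard, here is a
minimal sketch in plain C++.  It is not taken from the patch; the helper
names via_int, via_trunc and sign_differs are made up for this example.
It compares the int round trip that (float (fix_trunc @0)) stands for
with a direct truncation, which is what the simplification produces.

```cpp
#include <cmath>

// Round trip through int, i.e. (float) (int) x.  Integer 0 carries no sign,
// so any x that truncates to zero comes back as +0.0f.
inline float via_int(float x) {
  return static_cast<float>(static_cast<int>(x));
}

// Direct truncation keeps the sign of x, so -0.25f becomes -0.0f.
inline float via_trunc(float x) {
  return std::trunc(x);
}

// True when the two forms disagree on the sign bit of the result.
inline bool sign_differs(float x) {
  return std::signbit(via_int(x)) != std::signbit(via_trunc(x));
}
```

For x = -0.25f the two results compare equal with ==, yet only the
truncated one has its sign bit set, so the rewrite is observable whenever
signed zeros are honoured.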


[PATCH v2 02/12] extend.texi: Add documentation for __is_array

2024-03-14 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_array): New documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b13f9d6f934..5aeb9bdd47a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29579,6 +29579,11 @@ If @var{type} is an aggregate type ([dcl.init.aggr]) 
the trait is
 Requires: If @var{type} is a class type, it shall be a complete type.
 @enddefbuiltin
 
+@defbuiltin{bool __is_array (@var{type})}
+If @var{type} is an array type ([dcl.array]) the trait is @code{true},
+else it is @code{false}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_base_of (@var{base_type}, @var{derived_type})}
 If @var{base_type} is a base class of @var{derived_type}
 ([class.derived]) then the trait is @code{true}, otherwise it is @code{false}.
-- 
2.44.0
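
A small sketch of what the new entry describes, for readers without a
compiler that provides the built-in.  It uses std::is_array from
<type_traits> and assumes __is_array gives the same answers; that
equivalence is my assumption, not something the patch states.

```cpp
#include <type_traits>

// Arrays of known and of unknown bound are both array types.
static_assert(std::is_array<int[3]>::value, "int[3] is an array type");
static_assert(std::is_array<int[]>::value, "int[] is an array type");

// Pointers and references to arrays are not array types themselves.
static_assert(!std::is_array<int*>::value, "int* is not an array type");
static_assert(!std::is_array<int (&)[3]>::value,
              "a reference to an array is not an array type");
```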



Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Hongtao Liu
On Thu, Mar 14, 2024 at 3:22 PM Uros Bizjak  wrote:
>
> On Thu, Mar 14, 2024 at 2:33 AM liuhongt  wrote:
> >
> > When we split
> > (insn 37 36 38 10 (set (reg:DI 104 [ _18 ])
> > (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 MEM[(struct 
> > SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])) "test.C":22:42 
> > 84 {*movdi_internal}
> >  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
> >
> > into
> >
> > (insn 104 36 37 10 (set (subreg:V2DI (reg:DI 124) 0)
> > (vec_concat:V2DI (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) 
> > [6 MEM[(struct SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])
> > (const_int 0 [0]))) "test.C":22:42 -1
> > (nil)))
> > (insn 37 104 105 10 (set (subreg:V2DI (reg:DI 104 [ _18 ]) 0)
> > (subreg:V2DI (reg:DI 124) 0)) "test.C":22:42 2024 {movv2di_internal}
> >  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
> > (nil)))
> >
> > we must copy the REG_EH_REGION note to the first insn and split the block
> > after the newly added insn.  The REG_EH_REGION on the second insn will be
> > removed later since it no longer traps.
> >
> > > Currently we only handle memory_operand; are there any other insns
> > > that need to be handled?
>
> I think memory access is the only thing that can trap.
>
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} for trunk and 
> > gcc-13/gcc-12 release branch.
> > Ok for trunk and backport?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386-features.cc
> > (general_scalar_chain::convert_op): Handle REG_EH_REGION note.
> > (convert_scalars_to_vector): Ditto.
> > * config/i386/i386-features.h (class scalar_chain): New
> > > member control_flow_insns.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.target/i386/pr111822.C: New test.
> > ---
> >  gcc/config/i386/i386-features.cc | 48 ++--
> >  gcc/config/i386/i386-features.h  |  1 +
> >  gcc/testsuite/g++.target/i386/pr111822.C | 45 ++
> >  3 files changed, 90 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.target/i386/pr111822.C
> >
> > diff --git a/gcc/config/i386/i386-features.cc 
> > b/gcc/config/i386/i386-features.cc
> > index 1de2a07ed75..2ed27a9ebdd 100644
> > --- a/gcc/config/i386/i386-features.cc
> > +++ b/gcc/config/i386/i386-features.cc
> > @@ -998,20 +998,36 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn 
> > *insn)
> >  }
> >else if (MEM_P (*op))
> >  {
> > +  rtx_insn* eh_insn, *movabs = NULL;
> >rtx tmp = gen_reg_rtx (GET_MODE (*op));
> >
> >/* Handle movabs.  */
> >if (!memory_operand (*op, GET_MODE (*op)))
> > {
> >   rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
> > + movabs = emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
> >
> > - emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
> >   *op = tmp2;
> > }
>
> I may be missing something, but isn't the above dead code? We have
> if (MEM_P (*op)) and then if (!memory_operand (*op, ...)).
See PR91814 #c1: memory_operand also checks whether the memory address
is valid, so a MEM can still fail it.
>
> Uros.
>
> >
> > -  emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
> > -gen_gpr_to_xmm_move_src (vmode, *op)),
> > -   insn);
> > +  eh_insn
> > +   = emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
> > +gen_gpr_to_xmm_move_src (vmode, 
> > *op)),
> > +   insn);
> > +
> > +  if (cfun->can_throw_non_call_exceptions)
> > +   {
> > + /* Handle REG_EH_REGION note.  */
> > + rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
> > + if (note)
> > +   {
> > + if (movabs)
> > +   eh_insn = movabs;
> > + control_flow_insns.safe_push (eh_insn);
> > + add_reg_note (eh_insn, REG_EH_REGION, XEXP (note, 0));
> > +   }
> > +   }
> > +
> >*op = gen_rtx_SUBREG (vmode, tmp, 0);
> >
> >if (dump_file)
> > @@ -2494,6 +2510,7 @@ convert_scalars_to_vector (bool timode_p)
> >  {
> >basic_block bb;
> >int converted_insns = 0;
> > > +  auto_vec<rtx_insn *> control_flow_insns;
> >
> >bitmap_obstack_initialize (NULL);
> >const machine_mode cand_mode[3] = { SImode, DImode, TImode };
> > @@ -2575,6 +2592,11 @@ convert_scalars_to_vector (bool timode_p)
> >  chain->chain_id);
> > }
> >
> > + rtx_insn* iter_insn;
> > + unsigned int ii;
> > + FOR_EACH_VEC_ELT (chain->control_flow_insns, ii, iter_insn)
> > +   control_flow_insns.safe_push (iter_insn);
> > +
> >   delete chain;
> > }
> >  }
> > @@ -2643,6 +2665,24 @@ convert_scalars_to_vector (bool timode_p)
> >   DECL_INCOMING_RTL (parm) = gen_rtx_SUBREG 

[PATCH v2 04/12] extend.texi: Add documentation for __is_function

2024-03-14 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_function): New documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ae06e4da022..a94ad7955fe 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29632,6 +29632,11 @@ is @code{true}, else it is @code{false}.
 Requires: If @var{type} is a class type, it shall be a complete type.
 @enddefbuiltin
 
+@defbuiltin{bool __is_function (@var{type})}
+If @var{type} is a function type ([dcl.fct]) the trait is @code{true},
+else it is @code{false}.
+@enddefbuiltin
+
 @c FIXME Commented out for GCC 13, discuss user interface for GCC 14.
 @c @defbuiltin{bool __is_deducible (@var{template}, @var{type})}
 @c If template arguments for @code{template} can be deduced from
-- 
2.44.0
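
As a quick illustration of the documented behaviour, the sketch below
uses std::is_function from <type_traits>, on the assumption (mine, not
the patch's) that __is_function agrees with it.  The Functor type is
hypothetical and only there to show that a callable object is not a
function type.

```cpp
#include <type_traits>

// Hypothetical callable class, used only as a counter-example.
struct Functor {
  void operator()() const {}
};

// A plain function type satisfies the trait.
static_assert(std::is_function<int(double)>::value,
              "int(double) is a function type");

// Pointers and references to functions, and callable objects, do not.
static_assert(!std::is_function<int (*)(double)>::value,
              "a function pointer is not a function type");
static_assert(!std::is_function<int (&)(double)>::value,
              "a function reference is not a function type");
static_assert(!std::is_function<Functor>::value,
              "a class with operator() is not a function type");
```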



[PATCH v2 11/12] extend.texi: Add documentation for __remove_pointer

2024-03-14 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__remove_pointer): New documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e4a4060be2b..10ddf50182d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29716,6 +29716,11 @@ If @var{type} is a cv union type ([basic.compound]) 
the trait is
 @code{true}, else it is @code{false}.
 @enddefbuiltin
 
+@defbuiltin{@var{type} __remove_pointer (@var{ptr_type})}
+If @var{ptr_type} is a pointer type ([dcl.ptr]) then the trait is the
+@var{type} pointed to by @var{ptr_type}, else it is @var{ptr_type}.
+@enddefbuiltin
+
 @defbuiltin{bool __underlying_type (@var{type})}
 The underlying type of @var{type}.
 Requires: @var{type} shall be an enumeration type ([dcl.enum]).
-- 
2.44.0
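
The entry above describes a trait that yields a type, not a value.  A
minimal sketch of that behaviour follows, using std::remove_pointer and
assuming __remove_pointer produces the same types (my assumption; the
patch does not say so).

```cpp
#include <type_traits>

// One level of pointer is stripped; the pointee keeps its qualifiers.
static_assert(std::is_same<std::remove_pointer<int*>::type, int>::value,
              "int* yields int");
static_assert(
    std::is_same<std::remove_pointer<const int*>::type, const int>::value,
    "const int* yields const int");
static_assert(std::is_same<std::remove_pointer<int**>::type, int*>::value,
              "int** yields int*");

// A non-pointer type is returned unchanged.
static_assert(std::is_same<std::remove_pointer<int>::type, int>::value,
              "int yields int");
```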



[PATCH v2 12/12] extend.texi: Add subsections for type- and expression-yielding traits

2024-03-14 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (Expression-yielding Type Traits): New
subsection.
(Type-yielding Type Traits): Likewise.
(__remove_pointer): Move under the Type-yielding Type Traits
subsection.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 10ddf50182d..5d0afbe9611 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29488,6 +29488,11 @@ compile-time determination of
 various characteristics of a type (or of a
 pair of types).
 
+@subsection Expression-yielding Type Traits
+
+These built-in traits yield an expression of type @code{bool}
+or @code{size_t}.
+
 @defbuiltin{bool __has_nothrow_assign (@var{type})}
 If @var{type} is @code{const}-qualified or is a reference type then
 the trait is @code{false}.  Otherwise if @code{__has_trivial_assign (type)}
@@ -29716,16 +29721,19 @@ If @var{type} is a cv union type ([basic.compound]) 
the trait is
 @code{true}, else it is @code{false}.
 @enddefbuiltin
 
-@defbuiltin{@var{type} __remove_pointer (@var{ptr_type})}
-If @var{ptr_type} is a pointer type ([dcl.ptr]) then the trait is the
-@var{type} pointed to by @var{ptr_type}, else it is @var{ptr_type}.
-@enddefbuiltin
-
 @defbuiltin{bool __underlying_type (@var{type})}
 The underlying type of @var{type}.
 Requires: @var{type} shall be an enumeration type ([dcl.enum]).
 @enddefbuiltin
 
+@subsection Type-yielding Type Traits
+
+These built-in traits yield a type.
+
+@defbuiltin{@var{type} __remove_pointer (@var{ptr_type})}
+If @var{ptr_type} is a pointer type ([dcl.ptr]) then the trait is the
+@var{type} pointed to by @var{ptr_type}, else it is @var{ptr_type}.
+@enddefbuiltin
 
 @node C++ Concepts
 @section C++ Concepts
-- 
2.44.0
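
To make the distinction between the two subsections concrete, here is a
small sketch using the standard library counterparts std::is_enum and
std::underlying_type.  The Color enum and the ColorRep alias are made up
for the example, and the mapping to the built-ins is my assumption.

```cpp
#include <type_traits>

// Hypothetical enum with an explicit underlying type.
enum class Color : unsigned char { Red, Green };

// An expression-yielding trait is used where a value is expected.
constexpr bool color_is_enum = std::is_enum<Color>::value;
static_assert(color_is_enum, "Color is an enumeration type");

// A type-yielding trait is used where a type is expected.
using ColorRep = std::underlying_type<Color>::type;
static_assert(std::is_same<ColorRep, unsigned char>::value,
              "the underlying type of Color is unsigned char");
```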



[PATCH v2 07/12] extend.texi: Add documentation for __is_member_pointer

2024-03-14 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_member_pointer): New documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7644ea5b80b..86ac0c7ed07 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29660,6 +29660,11 @@ If @var{type} is a pointer to member object type 
([dcl.mptr]) the trait is
 @code{true}, else it is @code{false}.
 @enddefbuiltin
 
+@defbuiltin{bool __is_member_pointer (@var{type})}
+If @var{type} is a pointer to member type ([dcl.mptr]) the trait is
+@code{true}, else it is @code{false}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_pod (@var{type})}
 If @var{type} is a cv POD type ([basic.types]) then the trait is @code{true},
 else it is @code{false}.
-- 
2.44.0
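
A pointer to member type is either a pointer to member object or a
pointer to member function.  The sketch below spells that out with the
standard library traits, assuming the built-ins behave the same way (my
assumption).  Widget and is_member_pointer_sketch are hypothetical names
used only here.

```cpp
#include <type_traits>

// Hypothetical class with one data member and one member function.
struct Widget {
  int size;
  void resize(int);
};

// Hand-written equivalent: member object pointer or member function pointer.
template <typename T>
struct is_member_pointer_sketch
    : std::integral_constant<bool,
                             std::is_member_object_pointer<T>::value ||
                                 std::is_member_function_pointer<T>::value> {};

static_assert(is_member_pointer_sketch<int Widget::*>::value,
              "pointer to data member");
static_assert(is_member_pointer_sketch<void (Widget::*)(int)>::value,
              "pointer to member function");
static_assert(!is_member_pointer_sketch<int*>::value,
              "an ordinary pointer is not a pointer to member");
static_assert(std::is_member_pointer<int Widget::*>::value ==
                  is_member_pointer_sketch<int Widget::*>::value,
              "the sketch agrees with std::is_member_pointer");
```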



[PATCH v2 09/12] extend.texi: Add documentation for __is_reference

2024-03-14 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_reference): New documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 293eb5706f9..6af5294c7b0 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29683,6 +29683,11 @@ is @code{true}, else it is @code{false}.
 Requires: If @var{type} is a non-union class type, it shall be a complete type.
 @enddefbuiltin
 
+@defbuiltin{bool __is_reference (@var{type})}
+If @var{type} is a reference type ([dcl.ref]) the trait is @code{true},
+else it is @code{false}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_standard_layout (@var{type})}
 If @var{type} is a standard-layout type ([basic.types]) the trait is
 @code{true}, else it is @code{false}.
-- 
2.44.0
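
For illustration, the same classification expressed with
std::is_reference, assuming __is_reference matches it (my assumption,
not stated in the patch).  Both lvalue and rvalue references count.

```cpp
#include <type_traits>

// Both kinds of reference are reference types.
static_assert(std::is_reference<int&>::value, "lvalue reference");
static_assert(std::is_reference<int&&>::value, "rvalue reference");

// Object and pointer types are not.
static_assert(!std::is_reference<int>::value, "int is not a reference");
static_assert(!std::is_reference<int*>::value, "int* is not a reference");
```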



[PATCH v2 08/12] extend.texi: Add documentation for __is_object

2024-03-14 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_object): New documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 86ac0c7ed07..293eb5706f9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29665,6 +29665,11 @@ If @var{type} is a pointer to member type ([dcl.mptr]) 
the trait is
 @code{true}, else it is @code{false}.
 @enddefbuiltin
 
+@defbuiltin{bool __is_object (@var{type})}
+If @var{type} is an object type ([basic.types]) the trait is
+@code{true}, else it is @code{false}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_pod (@var{type})}
 If @var{type} is a cv POD type ([basic.types]) then the trait is @code{true},
 else it is @code{false}.
-- 
2.44.0
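
A short sketch of which types count as object types, written with
std::is_object and assuming __is_object agrees with it (my assumption).
Object types are everything except function types, reference types and
void.

```cpp
#include <type_traits>

// Scalars, arrays and pointers are object types.
static_assert(std::is_object<int>::value, "int is an object type");
static_assert(std::is_object<int[3]>::value, "int[3] is an object type");
static_assert(std::is_object<int*>::value, "int* is an object type");

// References, void and function types are not.
static_assert(!std::is_object<int&>::value, "int& is not an object type");
static_assert(!std::is_object<void>::value, "void is not an object type");
static_assert(!std::is_object<int(int)>::value,
              "a function type is not an object type");
```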



[PATCH v2 01/12] extend.texi: Arrange pre-existing built-in traits alphabetically

2024-03-14 Thread Ken Matsui
Ok for trunk?

-- >8 --

This patch arranges pre-existing built-in traits alphabetically for
better codebase consistency and easier future integration of changes.

gcc/ChangeLog:

* doc/extend.texi (Type Traits): Arrange pre-existing built-in
traits alphabetically.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 62 ++---
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index f679c81acf2..b13f9d6f934 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29499,15 +29499,6 @@ Requires: @var{type} shall be a complete type, 
(possibly cv-qualified)
 @code{void}, or an array of unknown bound.
 @enddefbuiltin
 
-@defbuiltin{bool __has_nothrow_copy (@var{type})}
-If @code{__has_trivial_copy (type)} is @code{true} then the trait is
-@code{true}, else if @var{type} is a cv-qualified class or union type
-with copy constructors that are known not to throw an exception then
-the trait is @code{true}, else it is @code{false}.
-Requires: @var{type} shall be a complete type, (possibly cv-qualified)
-@code{void}, or an array of unknown bound.
-@enddefbuiltin
-
 @defbuiltin{bool __has_nothrow_constructor (@var{type})}
 If @code{__has_trivial_constructor (type)} is @code{true} then the trait
 is @code{true}, else if @var{type} is a cv class or union type (or array
@@ -29517,6 +29508,15 @@ Requires: @var{type} shall be a complete type, 
(possibly cv-qualified)
 @code{void}, or an array of unknown bound.
 @enddefbuiltin
 
+@defbuiltin{bool __has_nothrow_copy (@var{type})}
+If @code{__has_trivial_copy (type)} is @code{true} then the trait is
+@code{true}, else if @var{type} is a cv-qualified class or union type
+with copy constructors that are known not to throw an exception then
+the trait is @code{true}, else it is @code{false}.
+Requires: @var{type} shall be a complete type, (possibly cv-qualified)
+@code{void}, or an array of unknown bound.
+@enddefbuiltin
+
 @defbuiltin{bool __has_trivial_assign (@var{type})}
 If @var{type} is @code{const}- qualified or is a reference type then
 the trait is @code{false}.  Otherwise if @code{__is_trivial (type)} is
@@ -29527,15 +29527,6 @@ Requires: @var{type} shall be a complete type, 
(possibly cv-qualified)
 @code{void}, or an array of unknown bound.
 @enddefbuiltin
 
-@defbuiltin{bool __has_trivial_copy (@var{type})}
-If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference
-type then the trait is @code{true}, else if @var{type} is a cv class
-or union type with a trivial copy constructor ([class.copy]) then the trait
-is @code{true}, else it is @code{false}.  Requires: @var{type} shall be
-a complete type, (possibly cv-qualified) @code{void}, or an array of unknown
-bound.
-@enddefbuiltin
-
 @defbuiltin{bool __has_trivial_constructor (@var{type})}
 If @code{__is_trivial (type)} is @code{true} then the trait is @code{true},
 else if @var{type} is a cv-qualified class or union type (or array thereof)
@@ -29545,6 +29536,15 @@ Requires: @var{type} shall be a complete type, 
(possibly cv-qualified)
 @code{void}, or an array of unknown bound.
 @enddefbuiltin
 
+@defbuiltin{bool __has_trivial_copy (@var{type})}
+If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference
+type then the trait is @code{true}, else if @var{type} is a cv class
+or union type with a trivial copy constructor ([class.copy]) then the trait
+is @code{true}, else it is @code{false}.  Requires: @var{type} shall be
+a complete type, (possibly cv-qualified) @code{void}, or an array of unknown
+bound.
+@enddefbuiltin
+
 @defbuiltin{bool __has_trivial_destructor (@var{type})}
 If @code{__is_trivial (type)} is @code{true} or @var{type} is a reference type
 then the trait is @code{true}, else if @var{type} is a cv class or union
@@ -29560,6 +29560,13 @@ If @var{type} is a class type with a virtual destructor
 Requires: If @var{type} is a non-union class type, it shall be a complete type.
 @enddefbuiltin
 
+@defbuiltin{bool __integer_pack (@var{length})}
+When used as the pattern of a pack expansion within a template
+definition, expands to a template argument pack containing integers
+from @code{0} to @code{@var{length}-1}.  This is provided for
+efficient implementation of @code{std::make_integer_sequence}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_abstract (@var{type})}
 If @var{type} is an abstract class ([class.abstract]) then the trait
 is @code{true}, else it is @code{false}.
@@ -29589,12 +29596,6 @@ If @var{type} is a cv-qualified class type, and not a 
union type
 ([basic.compound]) the trait is @code{true}, else it is @code{false}.
 @enddefbuiltin
 
-@c FIXME Commented out for GCC 13, discuss user interface for GCC 14.
-@c @defbuiltin{bool __is_deducible (@var{template}, @var{type})}
-@c If template arguments for @code{template} can be deduced from
-@c @code{type} or obtained from default template arguments.
-@c @enddefbuiltin
-
 @defbuiltin{bool __is_empty 

[PATCH v2 06/12] extend.texi: Add documentation for __is_member_object_pointer

2024-03-14 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_member_object_pointer): New
documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index f08574318d0..7644ea5b80b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29655,6 +29655,11 @@ If @var{type} is a pointer to member function type 
([dcl.mptr]) the trait is
 @code{true}, else it is @code{false}.
 @enddefbuiltin
 
+@defbuiltin{bool __is_member_object_pointer (@var{type})}
+If @var{type} is a pointer to member object type ([dcl.mptr]) the trait is
+@code{true}, else it is @code{false}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_pod (@var{type})}
 If @var{type} is a cv POD type ([basic.types]) then the trait is @code{true},
 else it is @code{false}.
-- 
2.44.0
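
As a usage sketch, the snippet below shows a pointer to member object in
action and checks it with std::is_member_object_pointer, assuming the
built-in gives the same result (my assumption).  Point and read_member
are hypothetical names for this example only.

```cpp
#include <type_traits>

// Hypothetical aggregate with two data members.
struct Point {
  int x;
  int y;
};

// Hypothetical helper: read whichever int member `field` designates.
inline int read_member(const Point& pt, int Point::*field) {
  return pt.*field;
}

static_assert(std::is_member_object_pointer<int Point::*>::value,
              "int Point::* points to a data member");
static_assert(std::is_member_object_pointer<decltype(&Point::x)>::value,
              "&Point::x has pointer to member object type");
static_assert(!std::is_member_object_pointer<int*>::value,
              "an ordinary pointer does not qualify");
```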



[PATCH v2 03/12] extend.texi: Add documentation for __is_bounded_array

2024-03-14 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_bounded_array): New documentation.
(__is_array): Add @anchor to get linked from
__is_bounded_array.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 5aeb9bdd47a..ae06e4da022 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29579,9 +29579,11 @@ If @var{type} is an aggregate type ([dcl.init.aggr]) 
the trait is
 Requires: If @var{type} is a class type, it shall be a complete type.
 @enddefbuiltin
 
+@anchor{__is_array}
 @defbuiltin{bool __is_array (@var{type})}
 If @var{type} is an array type ([dcl.array]) the trait is @code{true},
 else it is @code{false}.
+See also: @ref{__is_bounded_array}.
 @enddefbuiltin
 
 @defbuiltin{bool __is_base_of (@var{base_type}, @var{derived_type})}
@@ -29596,6 +29598,13 @@ type (disregarding cv-qualifiers), @var{derived_type} 
shall be a complete
 type.  A diagnostic is produced if this requirement is not met.
 @enddefbuiltin
 
+@anchor{__is_bounded_array}
+@defbuiltin{bool __is_bounded_array (@var{type})}
+If @var{type} is an array type of known bound ([dcl.array])
+the trait is @code{true}, else it is @code{false}.
+See also: @ref{__is_array}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_class (@var{type})}
 If @var{type} is a cv-qualified class type, and not a union type
 ([basic.compound]) the trait is @code{true}, else it is @code{false}.
-- 
2.44.0
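
To show how the two cross-referenced traits differ, here is a
hand-written sketch of the "known bound" check.  The name
is_bounded_array_sketch is hypothetical, and I am assuming that
__is_bounded_array gives the same answers as this partial
specialization; the patch itself does not show an implementation.

```cpp
#include <cstddef>
#include <type_traits>

// Primary template: not an array of known bound.
template <typename T>
struct is_bounded_array_sketch : std::false_type {};

// Only T[N] with a concrete N matches this specialization.
template <typename T, std::size_t N>
struct is_bounded_array_sketch<T[N]> : std::true_type {};

static_assert(is_bounded_array_sketch<int[3]>::value, "int[3] has a bound");
static_assert(!is_bounded_array_sketch<int[]>::value, "int[] has no bound");
static_assert(!is_bounded_array_sketch<int>::value, "int is not an array");

// Both array forms are still arrays in the broader sense.
static_assert(std::is_array<int[3]>::value && std::is_array<int[]>::value,
              "is_array accepts known and unknown bounds");
```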



[PATCH v2 05/12] extend.texi: Add documentation for __is_member_function_pointer

2024-03-14 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_member_function_pointer): New
documentation.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index a94ad7955fe..f08574318d0 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29650,6 +29650,11 @@ Requires: @var{type} shall be a complete type, 
(possibly cv-qualified)
 @code{void}, or an array of unknown bound.
 @enddefbuiltin
 
+@defbuiltin{bool __is_member_function_pointer (@var{type})}
+If @var{type} is a pointer to member function type ([dcl.mptr]) the trait is
+@code{true}, else it is @code{false}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_pod (@var{type})}
 If @var{type} is a cv POD type ([basic.types]) then the trait is @code{true},
 else it is @code{false}.
-- 
2.44.0
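
A usage sketch for pointers to member functions, checked with
std::is_member_function_pointer on the assumption that the built-in
agrees with it (my assumption).  Counter and call_getter are
hypothetical names for this example only.

```cpp
#include <type_traits>

// Hypothetical class with a const getter.
struct Counter {
  int value;
  int get() const { return value; }
};

// Hypothetical helper: invoke a const member function through a pointer.
inline int call_getter(const Counter& c, int (Counter::*fn)() const) {
  return (c.*fn)();
}

static_assert(
    std::is_member_function_pointer<int (Counter::*)() const>::value,
    "pointer to const member function");
static_assert(std::is_member_function_pointer<decltype(&Counter::get)>::value,
              "&Counter::get has pointer to member function type");
static_assert(!std::is_member_function_pointer<int Counter::*>::value,
              "a pointer to data member does not qualify");
static_assert(!std::is_member_function_pointer<int (*)()>::value,
              "a pointer to a free function does not qualify");
```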



[PATCH v2 10/12] extend.texi: Add documentation for __is_scoped_enum

2024-03-14 Thread Ken Matsui
gcc/ChangeLog:

* doc/extend.texi (__is_scoped_enum): New documentation.
(__is_enum): Add @anchor to get linked from __is_scoped_enum.

Signed-off-by: Ken Matsui 
---
 gcc/doc/extend.texi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 6af5294c7b0..e4a4060be2b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -29621,9 +29621,11 @@ has no base classes @var{base_type} for which
 Requires: If @var{type} is a non-union class type, it shall be a complete type.
 @enddefbuiltin
 
+@anchor{__is_enum}
 @defbuiltin{bool __is_enum (@var{type})}
 If @var{type} is a cv enumeration type ([basic.compound]) the trait is
 @code{true}, else it is @code{false}.
+See also: @ref{__is_scoped_enum}.
 @enddefbuiltin
 
 @defbuiltin{bool __is_final (@var{type})}
@@ -29688,6 +29690,13 @@ If @var{type} is a reference type ([dcl.ref]) the 
trait is @code{true},
 else it is @code{false}.
 @enddefbuiltin
 
+@anchor{__is_scoped_enum}
+@defbuiltin{bool __is_scoped_enum (@var{type})}
+If @var{type} is a scoped enumeration type ([dcl.enum]) the trait is
+@code{true}, else it is @code{false}.
+See also: @ref{__is_enum}.
+@enddefbuiltin
+
 @defbuiltin{bool __is_standard_layout (@var{type})}
 If @var{type} is a standard-layout type ([basic.types]) the trait is
 @code{true}, else it is @code{false}.
-- 
2.44.0
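
To illustrate the difference between the two cross-referenced traits,
here is a hand-written sketch of a scoped-enum check.  It relies on the
fact that an unscoped enumeration converts implicitly to its underlying
type while a scoped one does not.  The names is_scoped_enum_sketch, Plain
and Scoped are hypothetical, and I am assuming that __is_scoped_enum
classifies types the same way; the patch does not show an implementation.

```cpp
#include <type_traits>

// Primary template: non-enumeration types are never scoped enums.
template <typename T, bool = std::is_enum<T>::value>
struct is_scoped_enum_sketch : std::false_type {};

// For enumerations: scoped if there is no implicit conversion to the
// underlying integer type.
template <typename T>
struct is_scoped_enum_sketch<T, true>
    : std::integral_constant<
          bool, !std::is_convertible<
                    T, typename std::underlying_type<T>::type>::value> {};

enum Plain { PlainValue };          // unscoped enumeration
enum class Scoped { ScopedValue };  // scoped enumeration

static_assert(std::is_enum<Plain>::value && std::is_enum<Scoped>::value,
              "both are enumeration types");
static_assert(!is_scoped_enum_sketch<Plain>::value, "Plain is unscoped");
static_assert(is_scoped_enum_sketch<Scoped>::value, "Scoped is scoped");
static_assert(!is_scoped_enum_sketch<int>::value, "int is not an enum");
```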



Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Uros Bizjak
On Thu, Mar 14, 2024 at 2:33 AM liuhongt  wrote:
>
> When we split
> (insn 37 36 38 10 (set (reg:DI 104 [ _18 ])
> (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 MEM[(struct 
> SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])) "test.C":22:42 84 
> {*movdi_internal}
>  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
>
> into
>
> (insn 104 36 37 10 (set (subreg:V2DI (reg:DI 124) 0)
> (vec_concat:V2DI (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 
> MEM[(struct SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])
> (const_int 0 [0]))) "test.C":22:42 -1
> (nil)))
> (insn 37 104 105 10 (set (subreg:V2DI (reg:DI 104 [ _18 ]) 0)
> (subreg:V2DI (reg:DI 124) 0)) "test.C":22:42 2024 {movv2di_internal}
>  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
> (nil)))
>
> we must copy the REG_EH_REGION note to the first insn and split the block
> after the newly added insn.  The REG_EH_REGION on the second insn will be
> removed later since it no longer traps.
>
> Currently we only handle memory_operand; are there any other insns
> that need to be handled?

I think memory access is the only thing that can trap.

> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} for trunk and 
> gcc-13/gcc-12 release branch.
> Ok for trunk and backport?
>
> gcc/ChangeLog:
>
> * config/i386/i386-features.cc
> (general_scalar_chain::convert_op): Handle REG_EH_REGION note.
> (convert_scalars_to_vector): Ditto.
> * config/i386/i386-features.h (class scalar_chain): New
> member control_flow_insns.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/pr111822.C: New test.
> ---
>  gcc/config/i386/i386-features.cc | 48 ++--
>  gcc/config/i386/i386-features.h  |  1 +
>  gcc/testsuite/g++.target/i386/pr111822.C | 45 ++
>  3 files changed, 90 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr111822.C
>
> diff --git a/gcc/config/i386/i386-features.cc 
> b/gcc/config/i386/i386-features.cc
> index 1de2a07ed75..2ed27a9ebdd 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -998,20 +998,36 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn 
> *insn)
>  }
>else if (MEM_P (*op))
>  {
> +  rtx_insn* eh_insn, *movabs = NULL;
>rtx tmp = gen_reg_rtx (GET_MODE (*op));
>
>/* Handle movabs.  */
>if (!memory_operand (*op, GET_MODE (*op)))
> {
>   rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
> + movabs = emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
>
> - emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
>   *op = tmp2;
> }

I may be missing something, but isn't the above dead code? We have
if (MEM_P (*op)) and then if (!memory_operand (*op, ...)).

Uros.

>
> -  emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
> -gen_gpr_to_xmm_move_src (vmode, *op)),
> -   insn);
> +  eh_insn
> +   = emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
> +gen_gpr_to_xmm_move_src (vmode, 
> *op)),
> +   insn);
> +
> +  if (cfun->can_throw_non_call_exceptions)
> +   {
> + /* Handle REG_EH_REGION note.  */
> + rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
> + if (note)
> +   {
> + if (movabs)
> +   eh_insn = movabs;
> + control_flow_insns.safe_push (eh_insn);
> + add_reg_note (eh_insn, REG_EH_REGION, XEXP (note, 0));
> +   }
> +   }
> +
>*op = gen_rtx_SUBREG (vmode, tmp, 0);
>
>if (dump_file)
> @@ -2494,6 +2510,7 @@ convert_scalars_to_vector (bool timode_p)
>  {
>basic_block bb;
>int converted_insns = 0;
> +  auto_vec<rtx_insn *> control_flow_insns;
>
>bitmap_obstack_initialize (NULL);
>const machine_mode cand_mode[3] = { SImode, DImode, TImode };
> @@ -2575,6 +2592,11 @@ convert_scalars_to_vector (bool timode_p)
>  chain->chain_id);
> }
>
> + rtx_insn* iter_insn;
> + unsigned int ii;
> + FOR_EACH_VEC_ELT (chain->control_flow_insns, ii, iter_insn)
> +   control_flow_insns.safe_push (iter_insn);
> +
>   delete chain;
> }
>  }
> @@ -2643,6 +2665,24 @@ convert_scalars_to_vector (bool timode_p)
>   DECL_INCOMING_RTL (parm) = gen_rtx_SUBREG (TImode, r, 0);
>   }
>   }
> +
> +  if (!control_flow_insns.is_empty ())
> +   {
> + free_dominance_info (CDI_DOMINATORS);
> +
> + unsigned int i;
> + rtx_insn* insn;
> + FOR_EACH_VEC_ELT (control_flow_insns, i, insn)
> +   if (control_flow_insn_p (insn))
> + {
> +   /* Split the block