[PATCH] MATCH: Support `(a != (CST+1)) & (a > CST)` optimizations

2023-09-13 Thread Andrew Pinski via Gcc-patches
Even though this is done via reassociation, match can support
these with a simple change to detect that the difference is just
one.  This allows optimizing these earlier, and even during
phiopt, for example.

This patch adds the following cases:
(a != (CST+1)) & (a > CST) -> a > (CST+1)
(a != (CST-1)) & (a < CST) -> a < (CST-1)
(a == (CST-1)) | (a >= CST) -> a >= (CST-1)
(a == (CST+1)) | (a <= CST) -> a <= (CST+1)
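
For instance, a minimal sketch of the first case (with made-up
constants, not taken from the new testcases):

  int f (int a)
  {
    /* With this patch, (a != 43) & (a > 42) folds to a > 43,
       and this can now happen in phiopt too, not just reassoc.  */
    return (a != 43) & (a > 42);
  }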

Canonicalization of comparisons causes these cases to show up more often.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/106164

gcc/ChangeLog:

* match.pd (`(X CMP1 CST1) AND/IOR (X CMP2 CST2)`):
Expand to support constants that are off by one.

gcc/testsuite/ChangeLog:

* gcc.dg/pr21643.c: Update test now that match does
the combining of the comparisons.
* gcc.dg/tree-ssa/cmpbit-5.c: New test.
* gcc.dg/tree-ssa/phi-opt-35.c: New test.
---
 gcc/match.pd   | 44 ++-
 gcc/testsuite/gcc.dg/pr21643.c |  6 ++-
 gcc/testsuite/gcc.dg/tree-ssa/cmpbit-5.c   | 51 ++
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-35.c | 13 ++
 4 files changed, 111 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cmpbit-5.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-35.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 7ecf5568599..07ffd831132 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2970,10 +2970,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& operand_equal_p (@1, @2)))
 (with
  {
+  bool one_before = false;
+  bool one_after = false;
   int cmp = 0;
   if (TREE_CODE (@1) == INTEGER_CST
  && TREE_CODE (@2) == INTEGER_CST)
-   cmp = tree_int_cst_compare (@1, @2);
+   {
+ cmp = tree_int_cst_compare (@1, @2);
+ if (cmp < 0
+ && wi::to_wide (@1) == wi::to_wide (@2) - 1)
+   one_before = true;
+ if (cmp > 0
+ && wi::to_wide (@1) == wi::to_wide (@2) + 1)
+   one_after = true;
+   }
   bool val;
   switch (code2)
 {
@@ -2998,6 +3008,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& code2 == LE_EXPR
   && cmp == 0)
(lt @0 @1))
+  /* (a != (b+1)) & (a > b) -> a > (b+1) */
+  (if (code1 == NE_EXPR
+   && code2 == GT_EXPR
+  && one_after)
+   (gt @0 @1))
+  /* (a != (b-1)) & (a < b) -> a < (b-1) */
+  (if (code1 == NE_EXPR
+   && code2 == LT_EXPR
+  && one_before)
+   (lt @0 @1))
  )
 )
)
@@ -3069,10 +3089,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& operand_equal_p (@1, @2)))
 (with
  {
+  bool one_before = false;
+  bool one_after = false;
   int cmp = 0;
   if (TREE_CODE (@1) == INTEGER_CST
  && TREE_CODE (@2) == INTEGER_CST)
-   cmp = tree_int_cst_compare (@1, @2);
+   {
+ cmp = tree_int_cst_compare (@1, @2);
+ if (cmp < 0
+ && wi::to_wide (@1) == wi::to_wide (@2) - 1)
+   one_before = true;
+ if (cmp > 0
+ && wi::to_wide (@1) == wi::to_wide (@2) + 1)
+   one_after = true;
+   }
   bool val;
   switch (code2)
{
@@ -3097,6 +3127,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& code2 == LT_EXPR
   && cmp == 0)
(le @0 @1))
+  /* (a == (b-1)) | (a >= b) -> a >= (b-1) */
+  (if (code1 == EQ_EXPR
+   && code2 == GE_EXPR
+  && one_before)
+   (ge @0 @1))
+  /* (a == (b+1)) | (a <= b) -> a <= (b+1) */
+  (if (code1 == EQ_EXPR
+   && code2 == LE_EXPR
+  && one_after)
+   (le @0 @1))
  )
 )
)
diff --git a/gcc/testsuite/gcc.dg/pr21643.c b/gcc/testsuite/gcc.dg/pr21643.c
index 4e7f93d351a..42517b5af1e 100644
--- a/gcc/testsuite/gcc.dg/pr21643.c
+++ b/gcc/testsuite/gcc.dg/pr21643.c
@@ -86,4 +86,8 @@ f9 (unsigned char c)
   return 1;
 }
 
-/* { dg-final { scan-tree-dump-times "Optimizing range tests c_\[0-9\]*.D. -.0, 31. and -.32, 32.\[\n\r\]* into" 6 "reassoc1" } }  */
+/* Note with match being able to simplify this, optimizing range tests is no longer needed here. */
+/* Equivalence: _7 | _2 -> c_5(D) <= 32 */
+/* old test: dg-final  scan-tree-dump-times "Optimizing range tests c_\[0-9\]*.D. -.0, 31. and -.32, 32.\[\n\r\]* into" 6 "reassoc1"   */
+/* { dg-final { scan-tree-dump-times "Equivalence: _\[0-9\]+ \\\| _\[0-9\]+ -> c_\[0-9\]+.D. <= 32" 5 "reassoc1" } }  */
+/* { dg-final { scan-tree-dump-times "Equivalence: _\[0-9\]+ \& _\[0-9\]+ -> c_\[0-9\]+.D. > 32" 1 "reassoc1" } }  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-5.c b/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-5.c
new file mode 100644
index 000..d81a129825b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cmpbit-5.c
@@ -0,0 +1,51 @@
+/* PR tree-optimization/106164 */
+/* { dg-do compile } */
+/* { 

Re: [PATCH] [11/12/13/14 Regression] ABI break in _Hash_node_value_base since GCC 11 [PR 111050]

2023-09-13 Thread François Dumont via Gcc-patches

Author: TC 
Date:   Wed Sep 6 19:31:55 2023 +0200

    libstdc++: Force _Hash_node_value_base methods inline to fix abi (PR111050)


https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1b6f0476837205932613ddb2b3429a55c26c409d
    changed _Hash_node_value_base to no longer derive from _Hash_node_base,
    which means that its member functions expect _M_storage to be at a
    different offset.  So explosions result if an out-of-line definition is
    emitted for any of the member functions (say, in a non-optimized build)
    and the resulting object file is then linked with code built using an
    older version of GCC/libstdc++.

    libstdc++-v3/ChangeLog:

    PR libstdc++/111050
    * include/bits/hashtable_policy.h
    (_Hash_node_value_base<>::_M_valptr(), 
_Hash_node_value_base<>::_M_v())

    Add [[__gnu__::__always_inline__]].

Ok to commit ?

On 12/09/2023 18:09, Jonathan Wakely wrote:

On Mon, 11 Sept 2023 at 18:19, François Dumont  wrote:


On 11/09/2023 13:51, Jonathan Wakely wrote:

On Sun, 10 Sept 2023 at 14:57, François Dumont via Libstdc++ wrote:

Following confirmation of the fix by TC here is the patch where I'm
simply adding a 'constexpr' on _M_next().

Please let me know if this ChangeLog entry is correct. I would prefer this
patch to be assigned to 'TC' with me as co-author, but I don't know how
to do such a thing. Unless I need to change my user git identity to do so?

Sam already explained that, but please check with Tim how he wants to
be credited, if at all. He doesn't have a copyright assignment, and
hasn't added a DCO sign-off to the patch, but it's small enough to not
need it as this is the first contribution credited to him.



   libstdc++: Add constexpr qualification to _Hash_node::_M_next()

What has this constexpr addition got to do with the ABI change and the
always_inline attributes?

It certainly doesn't seem like it should be the summary line of the
git commit message.

Oops, sorry, that's what I had started to do before Tim submitted anything.

Here is latest version:

No patch attached, and the ChangeLog below still mentions the constexpr.

I've pinged Tim via another channel to ask him about the author attribution.



Author: TC 
Date:   Wed Sep 6 19:31:55 2023 +0200

  libstdc++: Force inline on _Hash_node_value_base methods to fix abi (PR111050)

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1b6f0476837205932613ddb2b3429a55c26c409d
  changed _Hash_node_value_base to no longer derive from _Hash_node_base,
  which means that its member functions expect _M_storage to be at a
  different offset.  So explosions result if an out-of-line definition is
  emitted for any of the member functions (say, in a non-optimized build)
  and the resulting object file is then linked with code built using an
  older version of GCC/libstdc++.

  libstdc++-v3/ChangeLog:

  PR libstdc++/111050
  * include/bits/hashtable_policy.h
  (_Hash_node_value_base<>::_M_valptr(),
  _Hash_node_value_base<>::_M_v()): Add [[__gnu__::__always_inline__]].
  (_Hash_node<>::_M_next()): Add constexpr.

  Co-authored-by: François Dumont 

Ok for you TC (Tim ?) ?

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 347d468ea86..86b32fb15f2 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -327,18 +327,22 @@ namespace __detail
 
   __gnu_cxx::__aligned_buffer<_Value> _M_storage;
 
+  [[__gnu__::__always_inline__]]
   _Value*
   _M_valptr() noexcept
   { return _M_storage._M_ptr(); }
 
+  [[__gnu__::__always_inline__]]
   const _Value*
   _M_valptr() const noexcept
   { return _M_storage._M_ptr(); }
 
+  [[__gnu__::__always_inline__]]
   _Value&
   _M_v() noexcept
   { return *_M_valptr(); }
 
+  [[__gnu__::__always_inline__]]
   const _Value&
   _M_v() const noexcept
   { return *_M_valptr(); }
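
For reference, a minimal sketch of the hazard being fixed, using
hypothetical types rather than the real libstdc++ ones:

// Old layout: the value storage sat after a base-class subobject.
struct OldNode { void *_M_nxt; int _M_storage; };  // storage at offset 8
// New layout: no base class, storage at offset 0.
struct NewNode { int _M_storage; };
// If an accessor like `int *valptr()` is emitted out of line in a TU
// built against one layout and the linker keeps that copy for code
// built against the other, it reads the wrong offset.  Forcing the
// accessors always_inline means each TU uses its own correct offset.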


[committed] Limit header synopsis test to normal namespace

2023-09-13 Thread François Dumont via Gcc-patches

Committed as trivial.

    libstdc++: Limit <stacktrace> synopsis test to normal namespace

    libstdc++-v3/ChangeLog:

    * testsuite/19_diagnostics/stacktrace/synopsis.cc: Add
    { dg-require-normal-namespace "" }.

François
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/synopsis.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/synopsis.cc
index ece5d526fb9..21c94f34a13 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/synopsis.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/synopsis.cc
@@ -1,6 +1,7 @@
 // { dg-options "-std=gnu++23" }
 // { dg-do compile { target c++23 } }
 // { dg-require-effective-target stacktrace }
+// { dg-require-normal-namespace "" }
 
 #include <stacktrace>
 


[PATCH] RISC-V: Fix ICE in get_avl_or_vl_reg[PR111395]

2023-09-13 Thread Juzhe-Zhong
This patch fixes the ICE reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111395.

PR target/111395

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (avl_info::operator==): Fix bug.
(vector_insn_info::global_merge): Ditto.
(vector_insn_info::get_avl_or_vl_reg): Ditto.
(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/fortran/pr111395.f90: New test.
* gcc.target/riscv/rvv/rvv-fortran.exp: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 31 --
 .../gcc.target/riscv/rvv/fortran/pr111395.f90 | 41 +++
 .../gcc.target/riscv/rvv/rvv-fortran.exp  | 35 
 3 files changed, 95 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/fortran/pr111395.f90
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/rvv-fortran.exp

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index f81361c4ccd..dc02246756d 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1652,6 +1652,8 @@ avl_info::operator== (const avl_info ) const
   /* Handle VLMAX AVL.  */
   if (vlmax_avl_p (m_value))
 return vlmax_avl_p (other.get_value ());
+  if (vlmax_avl_p (other.get_value ()))
+return false;
 
   /* If any source is undef value, we think they are not equal.  */
   if (!m_source || !other.get_source ())
@@ -2258,6 +2260,18 @@ vector_insn_info::global_merge (const vector_insn_info &merge_info,
new_info.set_avl_source (first_set);
 }
 
+  /* Make sure VLMAX AVL always has a set_info to get the VL.  */
+  if (vlmax_avl_p (new_info.get_avl ()))
+{
+  if (this->get_avl_source ())
+   new_info.set_avl_source (this->get_avl_source ());
+  else
+   {
+ gcc_assert (merge_info.get_avl_source ());
+ new_info.set_avl_source (merge_info.get_avl_source ());
+   }
+}
+
   new_info.fuse_sew_lmul (*this, merge_info);
   new_info.fuse_tail_policy (*this, merge_info);
   new_info.fuse_mask_policy (*this, merge_info);
@@ -2274,9 +2288,6 @@ vector_insn_info::get_avl_or_vl_reg (void) const
   if (!vlmax_avl_p (get_avl ()))
 return get_avl ();
 
-  if (get_avl_source ())
-return get_avl_reg_rtx ();
-
   rtx_insn *rinsn = get_insn ()->rtl ();
   if (has_vl_op (rinsn) || vsetvl_insn_p (rinsn))
 {
@@ -2288,14 +2299,9 @@ vector_insn_info::get_avl_or_vl_reg (void) const
return vl;
 }
 
-  /* A DIRTY (polluted EMPTY) block if:
-   - get_insn is scalar move (no AVL or VL operand).
-   - get_avl_source is null (no def in the current DIRTY block).
- Then we trace the previous insn which must be the insn
- already inserted in Phase 2 to get the VL operand for VLMAX.  */
-  rtx_insn *prev_rinsn = PREV_INSN (rinsn);
-  gcc_assert (prev_rinsn && vsetvl_insn_p (prev_rinsn));
-  return ::get_vl (prev_rinsn);
+  /* We always have an avl_source if it is VLMAX AVL.  */
+  gcc_assert (get_avl_source ());
+  return get_avl_reg_rtx ();
 }
 
 bool
@@ -4054,7 +4060,8 @@ pass_vsetvl::global_eliminate_vsetvl_insn (const bb_info *bb) const
 }
 
   /* Step1: Reshape the VL/VTYPE status to make sure everything compatible.  */
-  auto_vec<basic_block> pred_cfg_bbs = get_dominated_by (CDI_POST_DOMINATORS, cfg_bb);
+  auto_vec<basic_block> pred_cfg_bbs
+    = get_dominated_by (CDI_POST_DOMINATORS, cfg_bb);
   FOR_EACH_EDGE (e, ei, cfg_bb->preds)
 {
   sbitmap avout = m_vector_manager->vector_avout[e->src->index];
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/fortran/pr111395.f90 b/gcc/testsuite/gcc.target/riscv/rvv/fortran/pr111395.f90
new file mode 100644
index 000..71253fe6bc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/fortran/pr111395.f90
@@ -0,0 +1,41 @@
+! { dg-do compile }
+! { dg-options "-march=rv64gcv -mabi=lp64d -Ofast -std=legacy" }
+
+MODULE a
+  REAL b
+CONTAINS
+  SUBROUTINE c(d,KTE)
+REAL,DIMENSION(KTE) :: d,e,f,g
+REAL,DIMENSION(KTE) :: h
+i : DO j=1,b
+   z=k
+   DO l=m,n
+  IF(o>=p)THEN
+ IF(laf)THEN
+   DO l=m,n
+  d=h
+   ENDDO
+ENDIF
+  END SUBROUTINE c
+END MODULE a
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv-fortran.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv-fortran.exp
new file mode 100644
index 000..1e970a32a70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv-fortran.exp
@@ -0,0 +1,35 @@
+#   Copyright (C) 2023-2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for 

Re: [PATCH v1] rs6000: unnecessary clear after vctzlsbb in vec_first_match_or_eos_index

2023-09-13 Thread Kewen.Lin via Gcc-patches
Hi,

on 2023/9/13 00:39, Ajit Agarwal wrote:
> This patch removes zero extension from vctzlsbb as it already zero extends.
> Bootstrapped and regtested on powerpc64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> rs6000: unnecessary clear after vctzlsbb in vec_first_match_or_eos_index
> 
> For the rs6000 target we don't need a zero_extend after vctzlsbb as
> vctzlsbb already zero extends.
> 
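> A minimal illustration (hypothetical assembly; the new testcase below
> checks exactly this via its scan-assembler-not rldicl directive):
>
>   vctzlsbb 3,2      ; result in r3 is already zero-extended
>   rldicl 3,3,0,32   ; redundant clear that this patch removes
>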
> 2023-09-12  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
> 	* config/rs6000/vsx.md (vctzlsbb_zext_<mode>): New define_insn.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.target/powerpc/altivec-19.C: New testcase.
> ---
>  gcc/config/rs6000/vsx.md  | 17 ++---
>  gcc/testsuite/g++.target/powerpc/altivec-19.C | 10 ++
>  2 files changed, 24 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/altivec-19.C
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 19abfeb565a..42379409e5f 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5846,11 +5846,22 @@
>[(set_attr "type" "vecsimple")])
>  
>  ;; Vector Count Trailing Zero Least-Significant Bits Byte
> -(define_insn "vctzlsbb_<mode>"
> -  [(set (match_operand:SI 0 "register_operand" "=r")
> +(define_insn "vctzlsbb_zext_<mode>"

Sorry, I didn't note this in the previous review as well: this
define_insn name can be "*vctzlsbb_zext_<mode>" as we don't need
any gen_vctzlsbb_zext_* for uses.

btw, once this is changed, the name in the changelog should be updated
accordingly.

> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (zero_extend:DI
>   (unspec:SI
>[(match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")]
> -  UNSPEC_VCTZLSBB))]
> +  UNSPEC_VCTZLSBB)))]
> +  "TARGET_P9_VECTOR"
> +  "vctzlsbb %0,%1"
> +  [(set_attr "type" "vecsimple")])
> +
> +;; Vector Count Trailing Zero Least-Significant Bits Byte
> +(define_insn "vctzlsbb_<mode>"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(unspec:SI
> + [(match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")]
> + UNSPEC_VCTZLSBB))]
>"TARGET_P9_VECTOR"
>"vctzlsbb %0,%1"
>[(set_attr "type" "vecsimple")])
> diff --git a/gcc/testsuite/g++.target/powerpc/altivec-19.C b/gcc/testsuite/g++.target/powerpc/altivec-19.C
> new file mode 100644
> index 000..e49e5076af8
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/altivec-19.C
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */

As with the previous review comment, this line can be:

/* { dg-do compile } */

Okay for trunk with these two above nits fixed.  Thanks.

BR,
Kewen

> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power9 -O2 " } */ 
> +
> +#include <altivec.h>
> +
> +unsigned int foo (vector unsigned char a, vector unsigned char b) {
> +  return vec_first_match_or_eos_index (a, b);
> +}
> +/* { dg-final { scan-assembler-not {\mrldicl\M} } } */


[PATCH 10/10] vect: Consider vec_perm costing for VMAT_CONTIGUOUS_REVERSE

2023-09-13 Thread Kewen Lin via Gcc-patches
For VMAT_CONTIGUOUS_REVERSE, the transform code in function
vectorizable_store generates a VEC_PERM_EXPR stmt before
storing, but it's never considered in costing.

This patch makes costing consider that vec_perm; it also
adjusts the order of the transform code a bit to make it easy
to return early for costing_p.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Consider generated
VEC_PERM_EXPR stmt for VMAT_CONTIGUOUS_REVERSE in costing as
vec_perm.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c: New test.
---
 .../costmodel/ppc/costmodel-vect-store-2.c| 29 +
 gcc/tree-vect-stmts.cc| 63 +++
 2 files changed, 65 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c
new file mode 100644
index 000..72b67cf9040
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-additional-options "-mvsx" } */
+
+/* Verify we do cost the required vec_perm.  */
+
+int
+foo (int *a, int *b, int len)
+{
+  int i;
+  int *a1 = a;
+  int *a0 = a1 - 4;
+  for (i = 0; i < len; i++)
+{
+  *b = *a0 + *a1;
+  b--;
+  a0++;
+  a1++;
+}
+  return 0;
+}
+
+/* The reason why it doesn't check the exact count is that
+   we can get more than 1 vec_perm when it's compiled with
+   partial vector capability like Power10 (retrying for
+   the epilogue) or it's compiled without unaligned vector
+   memory access support (realign).  */
+/* { dg-final { scan-tree-dump {\mvec_perm\M} "vect" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 3d451c80bca..ce925cc1d53 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9279,6 +9279,40 @@ vectorizable_store (vec_info *vinfo,
   stmt_vec_info next_stmt_info = first_stmt_info;
   for (i = 0; i < vec_num; i++)
{
+ if (!costing_p)
+   {
+ if (slp)
+   vec_oprnd = vec_oprnds[i];
+ else if (grouped_store)
+   /* For grouped stores vectorized defs are interleaved in
+  vect_permute_store_chain().  */
+   vec_oprnd = result_chain[i];
+   }
+
+ if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
+   {
+ if (costing_p)
+   inside_cost += record_stmt_cost (cost_vec, 1, vec_perm,
+stmt_info, 0, vect_body);
+ else
+   {
+ tree perm_mask = perm_mask_for_reverse (vectype);
+ tree perm_dest = vect_create_destination_var (
+   vect_get_store_rhs (stmt_info), vectype);
+ tree new_temp = make_ssa_name (perm_dest);
+
+ /* Generate the permute statement.  */
+ gimple *perm_stmt
+   = gimple_build_assign (new_temp, VEC_PERM_EXPR, vec_oprnd,
+  vec_oprnd, perm_mask);
+ vect_finish_stmt_generation (vinfo, stmt_info, perm_stmt,
+  gsi);
+
+ perm_stmt = SSA_NAME_DEF_STMT (new_temp);
+ vec_oprnd = new_temp;
+   }
+   }
+
  if (costing_p)
{
  vect_get_store_cost (vinfo, stmt_info, 1,
@@ -9294,8 +9328,6 @@ vectorizable_store (vec_info *vinfo,
 
  continue;
}
- unsigned misalign;
- unsigned HOST_WIDE_INT align;
 
  tree final_mask = NULL_TREE;
  tree final_len = NULL_TREE;
@@ -9315,13 +9347,8 @@ vectorizable_store (vec_info *vinfo,
dataref_ptr = bump_vector_ptr (vinfo, dataref_ptr, ptr_incr, gsi,
   stmt_info, bump);
 
- if (slp)
-   vec_oprnd = vec_oprnds[i];
- else if (grouped_store)
-   /* For grouped stores vectorized defs are interleaved in
-  vect_permute_store_chain().  */
-   vec_oprnd = result_chain[i];
-
+ unsigned misalign;
+ unsigned HOST_WIDE_INT align;
  align = known_alignment (DR_TARGET_ALIGNMENT (first_dr_info));
  if (alignment_support_scheme == dr_aligned)
misalign = 0;
@@ -9338,24 +9365,6 @@ vectorizable_store (vec_info *vinfo,
misalign);
  align = least_bit_hwi (misalign | align);
 
- if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
-   {
- tree perm_mask = perm_mask_for_reverse (vectype);
- tree perm_dest
-   = 

[PATCH 09/10] vect: Get rid of vect_model_store_cost

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch eventually gets rid of vect_model_store_cost;
it adjusts the costing for the remaining memory access types
VMAT_CONTIGUOUS{, _DOWN, _REVERSE} by moving the costing close
to the transform code.  Note that in vect_model_store_cost
there is one special handling for vectorizing a store into
the function result; since it's an extra penalty and the
transform part doesn't have it, this patch keeps it as is.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_store_cost): Remove.
(vectorizable_store): Adjust the costing for the remaining memory
access types VMAT_CONTIGUOUS{, _DOWN, _REVERSE}.
---
 gcc/tree-vect-stmts.cc | 137 +
 1 file changed, 44 insertions(+), 93 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e3ba8077091..3d451c80bca 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -951,81 +951,6 @@ cfun_returns (tree decl)
   return false;
 }
 
-/* Function vect_model_store_cost
-
-   Models cost for stores.  In the case of grouped accesses, one access
-   has the overhead of the grouped access attributed to it.  */
-
-static void
-vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
-  vect_memory_access_type memory_access_type,
-  dr_alignment_support alignment_support_scheme,
-  int misalignment,
-  vec_load_store_type vls_type, slp_tree slp_node,
-  stmt_vector_for_cost *cost_vec)
-{
-  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
- && memory_access_type != VMAT_ELEMENTWISE
- && memory_access_type != VMAT_STRIDED_SLP
- && memory_access_type != VMAT_LOAD_STORE_LANES
- && memory_access_type != VMAT_CONTIGUOUS_PERMUTE);
-
-  unsigned int inside_cost = 0, prologue_cost = 0;
-
-  /* ???  Somehow we need to fix this at the callers.  */
-  if (slp_node)
-ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-
-  if (vls_type == VLS_STORE_INVARIANT)
-{
-  if (!slp_node)
-   prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
-  stmt_info, 0, vect_prologue);
-}
-
-
-  /* Costs of the stores.  */
-  vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
-  misalignment, &inside_cost, cost_vec);
-
-  /* When vectorizing a store into the function result assign
- a penalty if the function returns in a multi-register location.
- In this case we assume we'll end up with having to spill the
- vector result and do piecewise loads as a conservative estimate.  */
-  tree base = get_base_address (STMT_VINFO_DATA_REF (stmt_info)->ref);
-  if (base
-  && (TREE_CODE (base) == RESULT_DECL
- || (DECL_P (base) && cfun_returns (base)))
-  && !aggregate_value_p (base, cfun->decl))
-{
-  rtx reg = hard_function_value (TREE_TYPE (base), cfun->decl, 0, 1);
-  /* ???  Handle PARALLEL in some way.  */
-  if (REG_P (reg))
-   {
- int nregs = hard_regno_nregs (REGNO (reg), GET_MODE (reg));
- /* Assume that a single reg-reg move is possible and cheap,
-do not account for vector to gp register move cost.  */
- if (nregs > 1)
-   {
- /* Spill.  */
- prologue_cost += record_stmt_cost (cost_vec, ncopies,
-vector_store,
-stmt_info, 0, vect_epilogue);
- /* Loads.  */
- prologue_cost += record_stmt_cost (cost_vec, ncopies * nregs,
-scalar_load,
-stmt_info, 0, vect_epilogue);
-   }
-   }
-}
-
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
- "vect_model_store_cost: inside_cost = %d, "
- "prologue_cost = %d .\n", inside_cost, prologue_cost);
-}
-
-
 /* Calculate cost of DR's memory access.  */
 void
 vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, int ncopies,
@@ -9223,6 +9148,11 @@ vectorizable_store (vec_info *vinfo,
   return true;
 }
 
+  gcc_assert (memory_access_type == VMAT_CONTIGUOUS
+ || memory_access_type == VMAT_CONTIGUOUS_DOWN
+ || memory_access_type == VMAT_CONTIGUOUS_PERMUTE
+ || memory_access_type == VMAT_CONTIGUOUS_REVERSE);
+
   unsigned inside_cost = 0, prologue_cost = 0;
   auto_vec<tree> result_chain (group_size);
   auto_vec<tree> vec_oprnds;
@@ -9257,10 +9187,9 @@ vectorizable_store (vec_info *vinfo,
 that there is no interleaving, DR_GROUP_SIZE is 1,
 and only one iteration of the loop will be executed.  */
  op = vect_get_store_rhs (next_stmt_info);
- if (costing_p
- && memory_access_type == 

[PATCH 03/10] vect: Adjust vectorizable_store costing on VMAT_GATHER_SCATTER

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on VMAT_GATHER_SCATTER
in function vectorizable_store (all three cases), so that we
won't depend on vect_model_store_cost for its costing any
more.  This patch shouldn't have any functional changes.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get
VMAT_GATHER_SCATTER any more, remove VMAT_GATHER_SCATTER related
handlings and the related parameter gs_info.
(vect_build_scatter_store_calls): Add the handlings on costing with
one more argument cost_vec.
(vectorizable_store): Adjust the cost handling on VMAT_GATHER_SCATTER
without calling vect_model_store_cost any more.
---
 gcc/tree-vect-stmts.cc | 188 ++---
 1 file changed, 118 insertions(+), 70 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 36f7c5b9f4b..3f908242fee 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -959,12 +959,12 @@ cfun_returns (tree decl)
 static void
 vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
   vect_memory_access_type memory_access_type,
-  gather_scatter_info *gs_info,
   dr_alignment_support alignment_support_scheme,
   int misalignment,
   vec_load_store_type vls_type, slp_tree slp_node,
   stmt_vector_for_cost *cost_vec)
 {
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER);
   unsigned int inside_cost = 0, prologue_cost = 0;
   stmt_vec_info first_stmt_info = stmt_info;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -1012,18 +1012,9 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
 
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   /* Costs of the stores.  */
-  if (memory_access_type == VMAT_ELEMENTWISE
-  || memory_access_type == VMAT_GATHER_SCATTER)
+  if (memory_access_type == VMAT_ELEMENTWISE)
 {
   unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-  if (memory_access_type == VMAT_GATHER_SCATTER
- && gs_info->ifn == IFN_LAST && !gs_info->decl)
-   /* For emulated scatter N offset vector element extracts
-  (we assume the scalar scaling and ptr + offset add is consumed by
-  the load).  */
-   inside_cost += record_stmt_cost (cost_vec, ncopies * assumed_nunits,
-vec_to_scalar, stmt_info, 0,
-vect_body);
   /* N scalar stores plus extracting the elements.  */
   inside_cost += record_stmt_cost (cost_vec,
   ncopies * assumed_nunits,
@@ -1034,9 +1025,7 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
 misalignment, &inside_cost, cost_vec);
 
   if (memory_access_type == VMAT_ELEMENTWISE
-  || memory_access_type == VMAT_STRIDED_SLP
-  || (memory_access_type == VMAT_GATHER_SCATTER
- && gs_info->ifn == IFN_LAST && !gs_info->decl))
+  || memory_access_type == VMAT_STRIDED_SLP)
 {
   /* N scalar stores plus extracting the elements.  */
   unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
@@ -2999,7 +2988,8 @@ vect_build_gather_load_calls (vec_info *vinfo, stmt_vec_info stmt_info,
 static void
 vect_build_scatter_store_calls (vec_info *vinfo, stmt_vec_info stmt_info,
gimple_stmt_iterator *gsi, gimple **vec_stmt,
-   gather_scatter_info *gs_info, tree mask)
+   gather_scatter_info *gs_info, tree mask,
+   stmt_vector_for_cost *cost_vec)
 {
   loop_vec_info loop_vinfo = dyn_cast (vinfo);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
@@ -3009,6 +2999,30 @@ vect_build_scatter_store_calls (vec_info *vinfo, stmt_vec_info stmt_info,
   poly_uint64 scatter_off_nunits
 = TYPE_VECTOR_SUBPARTS (gs_info->offset_vectype);
 
+  /* FIXME: Keep the previous costing way in vect_model_store_cost by
+ costing N scalar stores, but it should be tweaked to use target
+ specific costs on related scatter store calls.  */
+  if (cost_vec)
+{
+  tree op = vect_get_store_rhs (stmt_info);
+  enum vect_def_type dt;
+  gcc_assert (vect_is_simple_use (op, vinfo, &dt));
+  unsigned int inside_cost, prologue_cost = 0;
+  if (dt == vect_constant_def || dt == vect_external_def)
+   prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
+  stmt_info, 0, vect_prologue);
+  unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
+  inside_cost = record_stmt_cost (cost_vec, ncopies * assumed_nunits,
+ scalar_store, stmt_info, 0, vect_body);
+
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+  

[PATCH 07/10] vect: Adjust vectorizable_store costing on VMAT_CONTIGUOUS_PERMUTE

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on VMAT_CONTIGUOUS_PERMUTE
in function vectorizable_store.  We don't call function
vect_model_store_cost for it any more.  It's the case of
interleaving stores, so it skips all stmts except
first_stmt_info and considers the whole group when costing
first_stmt_info.  This patch shouldn't have any functional
changes.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_store_cost): Assert it will never
get VMAT_CONTIGUOUS_PERMUTE and remove VMAT_CONTIGUOUS_PERMUTE related
handlings.
(vectorizable_store): Adjust the cost handling on
VMAT_CONTIGUOUS_PERMUTE without calling vect_model_store_cost.
---
 gcc/tree-vect-stmts.cc | 128 -
 1 file changed, 74 insertions(+), 54 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index fbd16b8a487..e3ba8077091 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -967,10 +967,10 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
   gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
  && memory_access_type != VMAT_ELEMENTWISE
  && memory_access_type != VMAT_STRIDED_SLP
- && memory_access_type != VMAT_LOAD_STORE_LANES);
+ && memory_access_type != VMAT_LOAD_STORE_LANES
+ && memory_access_type != VMAT_CONTIGUOUS_PERMUTE);
+
   unsigned int inside_cost = 0, prologue_cost = 0;
-  stmt_vec_info first_stmt_info = stmt_info;
-  bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
 
   /* ???  Somehow we need to fix this at the callers.  */
   if (slp_node)
@@ -983,35 +983,6 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
   stmt_info, 0, vect_prologue);
 }
 
-  /* Grouped stores update all elements in the group at once,
- so we want the DR for the first statement.  */
-  if (!slp_node && grouped_access_p)
-first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
-
-  /* True if we should include any once-per-group costs as well as
- the cost of the statement itself.  For SLP we only get called
- once per group anyhow.  */
-  bool first_stmt_p = (first_stmt_info == stmt_info);
-
-  /* We assume that the cost of a single store-lanes instruction is
- equivalent to the cost of DR_GROUP_SIZE separate stores.  If a grouped
- access is instead being provided by a permute-and-store operation,
- include the cost of the permutes.  */
-  if (first_stmt_p
-  && memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
-{
-  /* Uses a high and low interleave or shuffle operations for each
-needed permute.  */
-  int group_size = DR_GROUP_SIZE (first_stmt_info);
-  int nstmts = ncopies * ceil_log2 (group_size) * group_size;
-  inside_cost = record_stmt_cost (cost_vec, nstmts, vec_perm,
- stmt_info, 0, vect_body);
-
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
- "vect_model_store_cost: strided group_size = %d .\n",
- group_size);
-}
 
   /* Costs of the stores.  */
   vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
@@ -8408,9 +8379,7 @@ vectorizable_store (vec_info *vinfo,
 costing, use the first one instead.  */
   if (grouped_store
  && !slp
- && first_stmt_info != stmt_info
- && (memory_access_type == VMAT_ELEMENTWISE
- || memory_access_type == VMAT_LOAD_STORE_LANES))
+ && first_stmt_info != stmt_info)
return true;
 }
   gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info));
@@ -9254,14 +9223,15 @@ vectorizable_store (vec_info *vinfo,
   return true;
 }
 
+  unsigned inside_cost = 0, prologue_cost = 0;
   auto_vec<tree> result_chain (group_size);
   auto_vec<tree> vec_oprnds;
   for (j = 0; j < ncopies; j++)
 {
   gimple *new_stmt;
-  if (j == 0 && !costing_p)
+  if (j == 0)
{
- if (slp)
+ if (slp && !costing_p)
{
  /* Get vectorized arguments for SLP_NODE.  */
  vect_get_vec_defs (vinfo, stmt_info, slp_node, 1, op,
@@ -9287,13 +9257,20 @@ vectorizable_store (vec_info *vinfo,
 that there is no interleaving, DR_GROUP_SIZE is 1,
 and only one iteration of the loop will be executed.  */
  op = vect_get_store_rhs (next_stmt_info);
- vect_get_vec_defs_for_operand (vinfo, next_stmt_info, ncopies,
-op, gvec_oprnds[i]);
- vec_oprnd = (*gvec_oprnds[i])[0];
- dr_chain.quick_push (vec_oprnd);
+ if (costing_p
+ && memory_access_type == VMAT_CONTIGUOUS_PERMUTE)
+   update_prologue_cost (&prologue_cost, op);
+ else 

[PATCH/RFC 08/10] aarch64: Don't use CEIL for vector_store in aarch64_stp_sequence_cost

2023-09-13 Thread Kewen Lin via Gcc-patches
This costing adjustment patch series exposes one issue in
aarch64 specific costing adjustment for STP sequence.  It
causes the below test cases to fail:

  - gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c
  - gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
  - gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c
  - gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c

Take the below function extracted from ldp_stp_15.c as
example:

void
dup_8_int32_t (int32_t *x, int32_t val)
{
for (int i = 0; i < 8; ++i)
  x[i] = val;
}

Without my patch series, during slp1 it gets:

  val_8(D) 2 times unaligned_store (misalign -1) costs 2 in body
  node 0x10008c85e38 1 times scalar_to_vec costs 1 in prologue

then the final vector cost is 3.

With my patch series, during slp1 it gets:

  val_8(D) 1 times unaligned_store (misalign -1) costs 1 in body
  val_8(D) 1 times unaligned_store (misalign -1) costs 1 in body
  node 0x10004cc5d88 1 times scalar_to_vec costs 1 in prologue

but the final vector cost is 17.  The unaligned_store count is
actually unchanged, but the final vector costs become different;
it's because the below aarch64-specific handling produces the
different costs:

  /* Apply the heuristic described above m_stp_sequence_cost.  */
  if (m_stp_sequence_cost != ~0U)
{
  uint64_t cost = aarch64_stp_sequence_cost (count, kind,
 stmt_info, vectype);
  m_stp_sequence_cost = MIN (m_stp_sequence_cost + cost, ~0U);
}

For the former, since the count is 2, function
aarch64_stp_sequence_cost returns 2 as "CEIL (count, 2) * 2".
While for the latter, it's separated into two calls with
count 1; aarch64_stp_sequence_cost returns 2 each time,
so it returns 4 in total.

For this case, the stmt with scalar_to_vec also contributes
4 to m_stp_sequence_cost, then the final m_stp_sequence_cost
are 6 (2+4) vs. 8 (4+4).
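
As a sanity check of the arithmetic (a sketch assuming GCC's usual
CEIL (x, y) = ((x) + (y) - 1) / (y) definition from system.h):

  one call with count 2:   CEIL (2, 2) * 2 = 2
  two calls with count 1:  CEIL (1, 2) * 2 + CEIL (1, 2) * 2 = 4

which, together with the 4 from the scalar_to_vec stmt, gives the
6 (2+4) vs. 8 (4+4) totals above.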

Considering scalar_costs->m_stp_sequence_cost is 8, and the below
checking and re-assigning:

  else if (m_stp_sequence_cost >= scalar_costs->m_stp_sequence_cost)
m_costs[vect_body] = 2 * scalar_costs->total_cost ();

For the former, the vector body cost isn't changed; but
for the latter, the vector body cost is double the scalar
cost, which is 8 for this case, so it becomes 16, which is
bigger than what we expect.

I'm not sure why it adopts CEIL for the return value for
case unaligned_store in function aarch64_stp_sequence_cost,
but I tried to modify it with "return count;" (as it can
get back to the previous cost), and there were no failures
exposed in regression testing.  I expected that if the
previous unaligned_store count is even, this adjustment
doesn't change anything, and if it's odd, the adjustment
may reduce it by one, but I'd guess such cases would be
few.  Besides, as the comments for m_stp_sequence_cost
indicate, the current handling seems temporary, so maybe
a tweak like this can be accepted.  I posted this RFC/PATCH
to request comments on whether this one-line change can be
considered.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_stp_sequence_cost): Return
count directly instead of the adjusted value computed with CEIL.
---
 gcc/config/aarch64/aarch64.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 37d414021ca..9fb4fbd883d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -17051,7 +17051,7 @@ aarch64_stp_sequence_cost (unsigned int count, 
vect_cost_for_stmt kind,
  if (!aarch64_aligned_constant_offset_p (stmt_info, size))
return count * 2;
}
-  return CEIL (count, 2) * 2;
+  return count;
 
 case scalar_store:
   if (stmt_info && STMT_VINFO_DATA_REF (stmt_info))
-- 
2.31.1



[PATCH 01/10] vect: Ensure vect store is supported for some VMAT_ELEMENTWISE case

2023-09-13 Thread Kewen Lin via Gcc-patches
When making/testing patches to move costing next to the
transform code for vectorizable_store, some ICEs got
exposed when I further refined the costing handlings on
VMAT_ELEMENTWISE.  The apparent cause is triggering the
assertion in rs6000 specific function for costing
rs6000_builtin_vectorization_cost:

  if (TARGET_ALTIVEC)
 /* Misaligned stores are not supported.  */
 gcc_unreachable ();

I used vect_get_store_cost instead of the original way of
costing by record_stmt_cost with scalar_store, that is, to
use one unaligned_store instead; it matches what we use in
the transform, which is a vector store as below:

  else if (group_size >= const_nunits
   && group_size % const_nunits == 0)
{
   nstores = 1;
   lnel = const_nunits;
   ltype = vectype;
   lvectype = vectype;
}

So IMHO it's more consistent to cost it as a vector store
instead of a scalar store.  With the given compilation option
-mno-allow-movmisalign, the misaligned vector store is not
expected to be used in the vectorizer, but why is it still
adopted?  In the current implementation of function
get_group_load_store_type, we always set the alignment support
scheme to dr_unaligned_supported for VMAT_ELEMENTWISE; that
is true if we always adopt scalar stores, but as the above
code shows, we could use vector stores for some cases, so
we should use the correct alignment support scheme for them.

This patch ensures the vector store is supported by
further checking with vect_supportable_dr_alignment.  The
ICEs got exposed with patches moving costing next to the
transform, but those haven't landed yet; the test coverage
will be there once they land.  The affected test
cases are:
  - gcc.dg/vect/slp-45.c
  - gcc.dg/vect/vect-alias-check-{10,11,12}.c

btw, I tried to make some correctness test case, but I
realized that -mno-allow-movmisalign mainly gates the
movmisalign optab and doesn't guard the actual hw
vector memory access insns, so I failed to make one unless
I also altered some conditions for those insns as well.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Ensure the generated
vector store for some case of VMAT_ELEMENTWISE is supported.
---
 gcc/tree-vect-stmts.cc | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cd7c1090d88..a5caaf0bca2 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8558,10 +8558,18 @@ vectorizable_store (vec_info *vinfo,
  else if (group_size >= const_nunits
   && group_size % const_nunits == 0)
{
- nstores = 1;
- lnel = const_nunits;
- ltype = vectype;
- lvectype = vectype;
+ int mis_align = dr_misalignment (first_dr_info, vectype);
+ dr_alignment_support dr_align
+   = vect_supportable_dr_alignment (vinfo, dr_info, vectype,
+mis_align);
+ if (dr_align == dr_aligned
+ || dr_align == dr_unaligned_supported)
+   {
+ nstores = 1;
+ lnel = const_nunits;
+ ltype = vectype;
+ lvectype = vectype;
+   }
}
  ltype = build_aligned_type (ltype, TYPE_ALIGN (elem_type));
  ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-- 
2.31.1



[PATCH 05/10] vect: Adjust vectorizable_store costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on VMAT_ELEMENTWISE
and VMAT_STRIDED_SLP in function vectorizable_store.  We
don't call function vect_model_store_cost for them any more.

Like what we improved for PR82255 on the load side, this change
helps us to get rid of unnecessary vec_to_scalar costing
for some cases with VMAT_STRIDED_SLP.  One typical test case,
gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c, has been
added.  And it helps some cases with inconsistent
costing too.

Besides, this also special-cases the interleaving stores
for these two affected memory access types: since for
interleaving stores the whole chain is vectorized when the
last store in the chain is reached, the other stores in the
group are skipped.  To keep consistent with this and to
follow the transform handlings (like iterating the whole
group), it only costs the first store in the group.
Ideally we could cost only the last one, but that's not
trivial, and using the first one is actually equivalent.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_store_cost): Assert it won't get
VMAT_ELEMENTWISE and VMAT_STRIDED_SLP any more, and remove their
related handlings.
(vectorizable_store): Adjust the cost handling on VMAT_ELEMENTWISE
and VMAT_STRIDED_SLP without calling vect_model_store_cost.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c: New test.
---
 .../costmodel/ppc/costmodel-vect-store-1.c|  23 +++
 gcc/tree-vect-stmts.cc| 160 +++---
 2 files changed, 120 insertions(+), 63 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c
new file mode 100644
index 000..ab5f3301492
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+/* This test case is partially extracted from case
+   gcc.dg/vect/vect-avg-16.c, it's to verify we don't
+   cost a store with vec_to_scalar when we shouldn't.  */
+
+void
+test (signed char *restrict a, signed char *restrict b, signed char *restrict c,
+  int n)
+{
+  for (int j = 0; j < n; ++j)
+{
+  for (int i = 0; i < 16; ++i)
+   a[i] = (b[i] + c[i]) >> 1;
+  a += 20;
+  b += 20;
+  c += 20;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "vec_to_scalar" 0 "vect" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 048c14d291c..3d01168080a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -964,7 +964,9 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
   vec_load_store_type vls_type, slp_tree slp_node,
   stmt_vector_for_cost *cost_vec)
 {
-  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER);
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
+ && memory_access_type != VMAT_ELEMENTWISE
+ && memory_access_type != VMAT_STRIDED_SLP);
   unsigned int inside_cost = 0, prologue_cost = 0;
   stmt_vec_info first_stmt_info = stmt_info;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -1010,29 +1012,9 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
  group_size);
 }
 
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   /* Costs of the stores.  */
-  if (memory_access_type == VMAT_ELEMENTWISE)
-{
-  unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-  /* N scalar stores plus extracting the elements.  */
-  inside_cost += record_stmt_cost (cost_vec,
-  ncopies * assumed_nunits,
-  scalar_store, stmt_info, 0, vect_body);
-}
-  else
-vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
-misalignment, &inside_cost, cost_vec);
-
-  if (memory_access_type == VMAT_ELEMENTWISE
-  || memory_access_type == VMAT_STRIDED_SLP)
-{
-  /* N scalar stores plus extracting the elements.  */
-  unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
-  inside_cost += record_stmt_cost (cost_vec,
-  ncopies * assumed_nunits,
-  vec_to_scalar, stmt_info, 0, vect_body);
-}
+  vect_get_store_cost (vinfo, stmt_info, ncopies, alignment_support_scheme,
+  misalignment, &inside_cost, cost_vec);
 
   /* When vectorizing a store into the function result assign
  a penalty if the function returns in a multi-register location.
@@ -8416,6 +8398,18 @@ vectorizable_store (vec_info *vinfo,
 "Vectorizing an unaligned 

[PATCH 06/10] vect: Adjust vectorizable_store costing on VMAT_LOAD_STORE_LANES

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch adjusts the cost handling on VMAT_LOAD_STORE_LANES
in function vectorizable_store.  We don't call function
vect_model_store_cost for it any more.  It's the case of
interleaving stores, so it skips all stmts except
first_stmt_info and considers the whole group when costing
first_stmt_info.  This patch shouldn't have any functional
changes.

gcc/ChangeLog:

* tree-vect-stmts.cc (vect_model_store_cost): Assert it will never
get VMAT_LOAD_STORE_LANES.
(vectorizable_store): Adjust the cost handling on VMAT_LOAD_STORE_LANES
without calling vect_model_store_cost.  Factor out new lambda function
update_prologue_cost.
---
 gcc/tree-vect-stmts.cc | 110 -
 1 file changed, 75 insertions(+), 35 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 3d01168080a..fbd16b8a487 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -966,7 +966,8 @@ vect_model_store_cost (vec_info *vinfo, stmt_vec_info stmt_info, int ncopies,
 {
   gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
  && memory_access_type != VMAT_ELEMENTWISE
- && memory_access_type != VMAT_STRIDED_SLP);
+ && memory_access_type != VMAT_STRIDED_SLP
+ && memory_access_type != VMAT_LOAD_STORE_LANES);
   unsigned int inside_cost = 0, prologue_cost = 0;
   stmt_vec_info first_stmt_info = stmt_info;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -8408,7 +8409,8 @@ vectorizable_store (vec_info *vinfo,
   if (grouped_store
  && !slp
  && first_stmt_info != stmt_info
- && memory_access_type == VMAT_ELEMENTWISE)
+ && (memory_access_type == VMAT_ELEMENTWISE
+ || memory_access_type == VMAT_LOAD_STORE_LANES))
return true;
 }
   gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info));
@@ -8479,6 +8481,31 @@ vectorizable_store (vec_info *vinfo,
 dump_printf_loc (MSG_NOTE, vect_location, "transform store. ncopies = %d\n",
 ncopies);
 
+  /* Check if we need to update prologue cost for invariant,
+ and update it accordingly if so.  If it's not for
+ interleaving store, we can just check vls_type; but if
+ it's for interleaving store, need to check the def_type
+ of the stored value since the current vls_type is just
+ for first_stmt_info.  */
+  auto update_prologue_cost = [&](unsigned *prologue_cost, tree store_rhs)
+  {
+gcc_assert (costing_p);
+if (slp)
+  return;
+if (grouped_store)
+  {
+   gcc_assert (store_rhs);
+   enum vect_def_type cdt;
+   gcc_assert (vect_is_simple_use (store_rhs, vinfo, &cdt));
+   if (cdt != vect_constant_def && cdt != vect_external_def)
+ return;
+  }
+else if (vls_type != VLS_STORE_INVARIANT)
+  return;
+*prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, stmt_info,
+   0, vect_prologue);
+  };
+
   if (memory_access_type == VMAT_ELEMENTWISE
   || memory_access_type == VMAT_STRIDED_SLP)
 {
@@ -8646,14 +8673,8 @@ vectorizable_store (vec_info *vinfo,
  if (!costing_p)
vect_get_vec_defs (vinfo, next_stmt_info, slp_node, ncopies, op,
   _oprnds);
- else if (!slp)
-   {
- enum vect_def_type cdt;
- gcc_assert (vect_is_simple_use (op, vinfo, &cdt));
- if (cdt == vect_constant_def || cdt == vect_external_def)
-   prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
-  stmt_info, 0, vect_prologue);
-   }
+ else
+   update_prologue_cost (&prologue_cost, op);
  unsigned int group_el = 0;
  unsigned HOST_WIDE_INT
elsz = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
@@ -8857,13 +8878,7 @@ vectorizable_store (vec_info *vinfo,
   if (memory_access_type == VMAT_LOAD_STORE_LANES)
 {
   gcc_assert (!slp && grouped_store);
-  if (costing_p)
-   {
- vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
-alignment_support_scheme, misalignment,
-vls_type, slp_node, cost_vec);
- return true;
-   }
+  unsigned inside_cost = 0, prologue_cost = 0;
   for (j = 0; j < ncopies; j++)
{
  gimple *new_stmt;
@@ -8879,29 +8894,39 @@ vectorizable_store (vec_info *vinfo,
 DR_GROUP_SIZE is the exact number of stmts in the
 chain. Therefore, NEXT_STMT_INFO can't be NULL_TREE.  */
  op = vect_get_store_rhs (next_stmt_info);
- vect_get_vec_defs_for_operand (vinfo, next_stmt_info, ncopies,
-op, gvec_oprnds[i]);
- vec_oprnd = (*gvec_oprnds[i])[0];
- 

[PATCH 04/10] vect: Simplify costing on vectorizable_scan_store

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch simplifies the costing for the case of
vectorizable_scan_store without calling function
vect_model_store_cost any more.

I considered whether moving the costing into function
vectorizable_scan_store is a good idea; to do
that, we would have to pass several variables down which
are only used for costing, and for now we just
want to keep the costing as before, and haven't
tried to make this costing consistent with what the
transform does, so I think we can leave it for now.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Adjust costing on
vectorizable_scan_store without calling vect_model_store_cost
any more.
---
 gcc/tree-vect-stmts.cc | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 3f908242fee..048c14d291c 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8432,11 +8432,23 @@ vectorizable_store (vec_info *vinfo,
   else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) >= 3)
 {
   gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
+  gcc_assert (!slp);
   if (costing_p)
{
- vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
-alignment_support_scheme, misalignment,
-vls_type, slp_node, cost_vec);
+ unsigned int inside_cost = 0, prologue_cost = 0;
+ if (vls_type == VLS_STORE_INVARIANT)
+   prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
+  stmt_info, 0, vect_prologue);
+ vect_get_store_cost (vinfo, stmt_info, ncopies,
+  alignment_support_scheme, misalignment,
+  &inside_cost, cost_vec);
+
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"vect_model_store_cost: inside_cost = %d, "
+"prologue_cost = %d .\n",
+inside_cost, prologue_cost);
+
  return true;
}
    return vectorizable_scan_store (vinfo, stmt_info, gsi, vec_stmt, ncopies);
-- 
2.31.1



[PATCH 00/10] vect: Move costing next to the transform for vect store

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch series is a follow up for the previous patch
series for vector load [1].  Some of the associated test cases
show the benefits of this kind of structuring.  Like the
one on vect load, this patch series makes costing no longer
call function vect_model_store_cost, doing it next to the
transform instead.

Most of them are organized according to the memory access
types of vector store, in the hope that it makes review and
bisection easy.  The changes just follow the handlings in
the function vect_model_store_cost first, then refine a bit
by referring to the transform code; I also checked them
with some typical test cases to verify.

The whole series can be bootstrapped and regtested
incrementally on:
  - x86_64-redhat-linux
  - aarch64-linux-gnu
  - powerpc64-linux-gnu P7, P8 and P9
  - powerpc64le-linux-gnu P8, P9 and P10

Considering that the current vector test buckets are mainly
tested without the cost model, I also verified the whole patch
series was neutral for SPEC2017 int/fp on Power9 at O2,
O3 and Ofast separately.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621460.html

Kewen Lin (10):
  vect: Ensure vect store is supported for some VMAT_ELEMENTWISE case
  vect: Move vect_model_store_cost next to the transform in vectorizable_store
  vect: Adjust vectorizable_store costing on VMAT_GATHER_SCATTER
  vect: Simplify costing on vectorizable_scan_store
  vect: Adjust vectorizable_store costing on VMAT_ELEMENTWISE and 
VMAT_STRIDED_SLP
  vect: Adjust vectorizable_store costing on VMAT_LOAD_STORE_LANES
  vect: Adjust vectorizable_store costing on VMAT_CONTIGUOUS_PERMUTE
  aarch64: Don't use CEIL for vector_store in aarch64_stp_sequence_cost
  vect: Get rid of vect_model_store_cost
  vect: Consider vec_perm costing for VMAT_CONTIGUOUS_REVERSE

 gcc/config/aarch64/aarch64.cc |   2 +-
 .../costmodel/ppc/costmodel-vect-store-1.c|  23 +
 .../costmodel/ppc/costmodel-vect-store-2.c|  29 +
 gcc/tree-vect-stmts.cc| 717 +++---
 4 files changed, 493 insertions(+), 278 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-store-2.c

-- 
2.31.1



[PATCH 02/10] vect: Move vect_model_store_cost next to the transform in vectorizable_store

2023-09-13 Thread Kewen Lin via Gcc-patches
This patch is an initial patch to move costing next to the
transform; it still adopts vect_model_store_cost for costing
but moves and duplicates it down according to the handlings
of different vect_memory_access_types or some special
handling needs, in the hope that it makes the subsequent
patches easy to review.  This patch should not have any
functional changes.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Move and duplicate the call
to vect_model_store_cost down to some different transform paths
according to the handlings of different vect_memory_access_types
or some special handling need.
---
 gcc/tree-vect-stmts.cc | 79 --
 1 file changed, 60 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a5caaf0bca2..36f7c5b9f4b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8372,7 +8372,8 @@ vectorizable_store (vec_info *vinfo,
return false;
 }
 
-  if (!vec_stmt) /* transformation not required.  */
+  bool costing_p = !vec_stmt;
+  if (costing_p) /* transformation not required.  */
 {
   STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) = memory_access_type;
 
@@ -8401,11 +8402,6 @@ vectorizable_store (vec_info *vinfo,
 "Vectorizing an unaligned access.\n");
 
   STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
-  vect_model_store_cost (vinfo, stmt_info, ncopies,
-memory_access_type, &gs_info,
-alignment_support_scheme,
-misalignment, vls_type, slp_node, cost_vec);
-  return true;
 }
   gcc_assert (memory_access_type == STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info));
 
@@ -8415,12 +8411,27 @@ vectorizable_store (vec_info *vinfo,
 
   if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl)
 {
-  vect_build_scatter_store_calls (vinfo, stmt_info, gsi, vec_stmt,
- &gs_info, mask);
+  if (costing_p)
+   vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
+  &gs_info, alignment_support_scheme, misalignment,
+  vls_type, slp_node, cost_vec);
+  else
+   vect_build_scatter_store_calls (vinfo, stmt_info, gsi, vec_stmt,
+   &gs_info, mask);
   return true;
 }
   else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) >= 3)
-return vectorizable_scan_store (vinfo, stmt_info, gsi, vec_stmt, ncopies);
+{
+  gcc_assert (memory_access_type == VMAT_CONTIGUOUS);
+  if (costing_p)
+   {
+ vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
+&gs_info, alignment_support_scheme,
+misalignment, vls_type, slp_node, cost_vec);
+ return true;
+   }
+  return vectorizable_scan_store (vinfo, stmt_info, gsi, vec_stmt, 
ncopies);
+}
 
   if (grouped_store)
 {
@@ -8449,13 +8460,21 @@ vectorizable_store (vec_info *vinfo,
   else
 ref_type = reference_alias_ptr_type (DR_REF (first_dr_info->dr));
 
-  if (dump_enabled_p ())
-dump_printf_loc (MSG_NOTE, vect_location,
- "transform store. ncopies = %d\n", ncopies);
+  if (!costing_p && dump_enabled_p ())
+dump_printf_loc (MSG_NOTE, vect_location, "transform store. ncopies = 
%d\n",
+ncopies);
 
   if (memory_access_type == VMAT_ELEMENTWISE
   || memory_access_type == VMAT_STRIDED_SLP)
 {
+  if (costing_p)
+   {
+ vect_model_store_cost (vinfo, stmt_info, ncopies, memory_access_type,
+&gs_info, alignment_support_scheme,
+misalignment, vls_type, slp_node, cost_vec);
+ return true;
+   }
+
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
   gimple *incr;
@@ -8718,8 +8737,9 @@ vectorizable_store (vec_info *vinfo,
   else if (memory_access_type == VMAT_GATHER_SCATTER)
 {
   aggr_type = elem_type;
-  vect_get_strided_load_store_ops (stmt_info, loop_vinfo, gsi, &gs_info,
-  &bump, &vec_offset, loop_lens);
+  if (!costing_p)
+   vect_get_strided_load_store_ops (stmt_info, loop_vinfo, gsi, &gs_info,
+&bump, &vec_offset, loop_lens);
 }
   else
 {
@@ -8731,7 +8751,7 @@ vectorizable_store (vec_info *vinfo,
  memory_access_type, loop_lens);
 }
 
-  if (mask)
+  if (mask && !costing_p)
 LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true;
 
   /* In case the vectorization factor (VF) is bigger than the number
@@ -8782,6 +8802,13 @@ vectorizable_store (vec_info *vinfo,
   if (memory_access_type == VMAT_LOAD_STORE_LANES)
 {
   gcc_assert (!slp && grouped_store);
+  if (costing_p)
+   {
+ vect_model_store_cost (vinfo, stmt_info, ncopies, 

Re: [PATCH] xtensa: Optimize several boolean evaluations of EQ/NE against constant zero

2023-09-13 Thread Max Filippov via Gcc-patches
On Fri, Sep 8, 2023 at 1:49 AM Takayuki 'January June' Suwa wrote:
>
> An idiomatic implementation of boolean evaluation of whether a register is
> zero or not in Xtensa is to assign 0 and 1 to the temporary and destination,
> and then issue the MOV[EQ/NE]Z machine instruction
> (See 8.3.2 Instruction Idioms, Xtensa ISA refman., p.599):
>
> ;; A2 = (A3 != 0) ? 1 : 0;
> movi.n  a9, 1
> movi.n  a2, 0
> movnez  a2, a9, a3  ;; if (A3 != 0) A2 = A9;
>
> As you can see in the above idiom, if the source and destination are the
> same register, a move instruction from the source to another temporary
> register must be prepended:
>
> ;; A2 = (A2 == 0) ? 1 : 0;
> mov.n   a10, a2
> movi.n  a9, 1
> movi.n  a2, 0
> moveqz  a2, a9, a10  ;; if (A10 == 0) A2 = A9;
>
> Fortunately, we can reduce the number of instructions and temporary
> registers with a few tweaks:
>
> ;; A2 = (A3 != 0) ? 1 : 0;
> movi.n  a2, 1
> moveqz  a2, a3, a3  ;; if (A3 == 0) A2 = A3;
>
> ;; A2 = (A2 != 0) ? 1 : 0;
> movi.n  a9, 1
> movnez  a2, a9, a2  ;; if (A2 != 0) A2 = A9;
>
> ;; A2 = (A3 == 0) ? 1 : 0;
> movi.n  a2, -1
> moveqz  a2, a3, a3  ;; if (A3 == 0) A2 = A3;
> addi.n  a2, a2, 1
>
> ;; A2 = (A2 == 0) ? 1 : 0;
> movi.n  a9, -1
> movnez  a2, a9, a2  ;; if (A2 != 0) A2 = A9;
> addi.n  a2, a2, 1
>
> Additionally, if TARGET_NSA is configured, the fact that the NSAU machine
> instruction returns 32 iff its source is 0 (and less than 32 otherwise) can
> be used in boolean evaluation of an EQ comparison.
>
> ;; A2 = (A3 == 0) ? 1 : 0;
> nsau    a2, a3  ;; Source and destination can be the same register
> srli    a2, a2, 5
>
> Furthermore, this patch also saves one instruction when determining whether
> the ANDing with mask values in which 1s are lined up from the upper or lower
> bit end (for example, 0xFFE0 or 0x003F) is 0 or not.
>
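For illustration, C-compatible sources of the following shape (hypothetical
examples, not taken from the patch) show the kind of boolean evaluations
discussed above:

/* Each of these now compiles to one of the shorter Xtensa sequences.  */
int nonzero_p (int a)   { return a != 0; }
int zero_p (int a)      { return a == 0; }  /* can use NSAU when TARGET_NSA */
int mask_zero_p (int a) { return (a & 0xFFFFFFE0) == 0; }
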
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.cc (xtensa_expand_scc):
> Revert the changes from the last patch, as the work in the RTL
> expansion pass is too far to determine the physical registers.
> * config/xtensa/xtensa.md (*eqne_INT_MIN): Ditto.
> (eq_zero_NSA, eqne_zero, *eqne_zero_masked_bits): New patterns.
> ---
>  gcc/config/xtensa/xtensa.cc |  35 +--
>  gcc/config/xtensa/xtensa.md | 112 
>  2 files changed, 113 insertions(+), 34 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PING][PATCH v2] Add clang's invalid-noreturn warning flag

2023-09-13 Thread Julian Waters via Gcc-patches
Pinging again, this is needed for the Windows Java VM to compile under gcc

On Wed, Sep 13, 2023 at 11:09 AM Julian Waters wrote:

> Second desperate ping for patch
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627913.html
>


Re: [pushed][PATCH v2] LoongArch: Fix bug of 'di3_fake'.

2023-09-13 Thread chenglulu

Pushed to r14-3974.

On 2023/9/13 at 8:54 AM, Lulu Cheng wrote:

PR 111334

gcc/ChangeLog:

* config/loongarch/loongarch.md: Fix bug of 'di3_fake'.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr111334.c: New test.
---
v1 -> v2:

Modify the template "*<optab><mode>3"; the SI-mode division operation
is not supported on the LA64 architecture.
---
  gcc/config/loongarch/loongarch.md | 20 ++
  gcc/testsuite/gcc.target/loongarch/pr111334.c | 39 +++
  2 files changed, 52 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/pr111334.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 1dc6b524416..4fcb6d781d5 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -72,6 +72,9 @@ (define_c_enum "unspec" [
UNSPEC_LUI_H_HI12
UNSPEC_TLS_LOW
  
+  ;; Fake div.w[u] mod.w[u]
+  UNSPEC_FAKE_ANY_DIV
+
UNSPEC_SIBCALL_VALUE_MULTIPLE_INTERNAL_1
UNSPEC_CALL_VALUE_MULTIPLE_INTERNAL_1
  ])
@@ -900,7 +903,7 @@ (define_expand "<optab><mode>3"
 (match_operand:GPR 2 "register_operand")))]
""
  {
- if (GET_MODE (operands[0]) == SImode)
+ if (GET_MODE (operands[0]) == SImode && TARGET_64BIT)
{
  rtx reg1 = gen_reg_rtx (DImode);
  rtx reg2 = gen_reg_rtx (DImode);
@@ -920,9 +923,9 @@ (define_expand "<optab><mode>3"
  })
  
  (define_insn "*<optab><mode>3"
-  [(set (match_operand:GPR 0 "register_operand" "=r,&r,&r")
-   (any_div:GPR (match_operand:GPR 1 "register_operand" "r,r,0")
-(match_operand:GPR 2 "register_operand" "r,r,r")))]
+  [(set (match_operand:X 0 "register_operand" "=r,&r,&r")
+   (any_div:X (match_operand:X 1 "register_operand" "r,r,0")
+  (match_operand:X 2 "register_operand" "r,r,r")))]
""
  {
return loongarch_output_division ("<insn>.<d>\t%0,%1,%2", operands);
@@ -938,9 +941,12 @@ (define_insn "*<optab><mode>3"
  (define_insn "<optab>di3_fake"
[(set (match_operand:DI 0 "register_operand" "=r,&r,&r")
(sign_extend:DI
- (any_div:SI (match_operand:DI 1 "register_operand" "r,r,0")
- (match_operand:DI 2 "register_operand" "r,r,r"))))]
-  ""
+ (unspec:SI
+  [(subreg:SI
+(any_div:DI (match_operand:DI 1 "register_operand" "r,r,0")
+(match_operand:DI 2 "register_operand" "r,r,r")) 0)]
+ UNSPEC_FAKE_ANY_DIV)))]
+  "TARGET_64BIT"
  {
return loongarch_output_division ("<insn>.w\t%0,%1,%2", operands);
  }
diff --git a/gcc/testsuite/gcc.target/loongarch/pr111334.c 
b/gcc/testsuite/gcc.target/loongarch/pr111334.c
new file mode 100644
index 000..47366afcb74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/pr111334.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned
+util_next_power_of_two (unsigned x)
+{
+  return (1 << __builtin_clz (x - 1));
+}
+
+extern int create_vec_from_array (void);
+
+struct ac_shader_args {
+struct {
+   unsigned char offset;
+   unsigned char size;
+} args[384];
+};
+
+struct isel_context {
+const struct ac_shader_args* args;
+int arg_temps[384];
+};
+
+
+void
+add_startpgm (struct isel_context* ctx, unsigned short arg_count)
+{
+
+  for (unsigned i = 0, arg = 0; i < arg_count; i++)
+{
+  unsigned size = ctx->args->args[i].size;
+  unsigned reg = ctx->args->args[i].offset;
+
+  if (reg % ( 4 < util_next_power_of_two (size)
+? 4 : util_next_power_of_two (size)))
+ ctx->arg_temps[i] = create_vec_from_array ();
+}
+}
+




Re:[pushed] [PATCH v4 00/22] Added support for ASX vector instructions.

2023-09-13 Thread chenglulu

Pushed to r14-3951.

On 2023/9/13 at 11:38 AM, Xiaolong Chen wrote:

   To better test the functionality of the vector instructions, the 256-bit
test cases are further split according to the function of each instruction.


Xiaolong Chen (22):
   LoongArch: Add tests for ASX vector xvadd/xvadda/xvaddi/xvaddwev/
 xvaddwod/xvsadd instructions.
   LoongArch: Add tests for ASX vector xvhadd/xvhaddw/xvmaddwev/xvmaddwod
 instructions.
   LoongArch: Add tests for ASX vector subtraction instructions.
   LoongArch: Add tests for ASX vector xvmul/xvmod/xvdiv instructions.
   LoongArch: Add tests for ASX vector xvmax/xvmaxi/xvmin/xvmini
 instructions.
   LoongArch: Add tests for ASX vector
 xvldi/xvmskgez/xvmskltz/xvmsknz/xvmuh /xvsigncov instructions.
   LoongArch: Add tests for ASX vector xvand/xvandi/xvandn/xvor/xvori/
 xvnor/xvnori/xvxor/xvxori instructions.
   LoongArch: Add tests for ASX vector xvsll/xvsrl instructions.
   LoongArch: Add tests for ASX vector xvextl/xvsra/xvsran/xvsrarn
 instructions.
   LoongArch: Add tests for ASX vector
 xvssran/xvssrani/xvssrarn/xvssrarni/xvssrln/
 xvssrlni/xvssrlrn/xvssrlrni instructions.
   LoongArch: Add tests for ASX vector
 xvbitclr/xvbitclri/xvbitrev/xvbitrevi/
 xvbitsel/xvbitseli/xvbitset/xvbitseti/xvclo/xvclz/xvpcnt
 instructions.
   LoongArch: Add tests for ASX builtin functions.
   LoongArch: Add tests for ASX xvldrepl/xvstelm instruction generation.
   LoongArch: Add tests for ASX vector floating-point operation
 instruction.
   LoongArch: Add tests for ASX vector floating-point conversion
 instruction.
   LoongArch: Add tests for ASX vector comparison and selection
 instruction.
   LoongArch: Add tests for ASX vector xvfnmadd/xvfrstp/xvfstpi/xvhsubw/
 xvmsub/xvrotr/xvrotri/xvld/xvst instructions.
   LoongArch: Add tests for ASX vector
 xvabsd/xvavg/xvavgr/xvbsll/xvbsrl/xvneg/ xvsat instructions.
   LoongArch: Add tests for ASX vector
 xvfcmp{caf/ceq/cle/clt/cne/cor/cun} instructions.
   LoongArch: Add tests for ASX vector
 xvfcmp{saf/seq/sle/slt/sne/sor/sun} instructions.
   LoongArch: Add tests for ASX vector
 xvext2xv/xvexth/xvextins/xvilvh/xvilvl/xvinsgr2vr/
 xvinsve0/xvprem/xvpremi instructions.
   LoongArch: Add tests for ASX vector
 xvpackev/xvpackod/xvpickev/xvpickod/
 xvpickve2gr/xvreplgr2vr/xvreplve/xvreplve0/xvreplvei/xvshuf4i/xvshuf
 instructions.

  .../loongarch/vector/lasx/lasx-builtin.c  | 1509 
  .../loongarch/vector/lasx/lasx-xvabsd-1.c |  485 +
  .../loongarch/vector/lasx/lasx-xvabsd-2.c |  650 +++
  .../loongarch/vector/lasx/lasx-xvadd.c|  725 
  .../loongarch/vector/lasx/lasx-xvadda.c   |  785 
  .../loongarch/vector/lasx/lasx-xvaddi.c   |  427 +
  .../loongarch/vector/lasx/lasx-xvaddwev-1.c   |  740 
  .../loongarch/vector/lasx/lasx-xvaddwev-2.c   |  485 +
  .../loongarch/vector/lasx/lasx-xvaddwev-3.c   |  515 ++
  .../loongarch/vector/lasx/lasx-xvaddwod-1.c   |  530 ++
  .../loongarch/vector/lasx/lasx-xvaddwod-2.c   |  560 ++
  .../loongarch/vector/lasx/lasx-xvaddwod-3.c   |  485 +
  .../loongarch/vector/lasx/lasx-xvand.c|  155 ++
  .../loongarch/vector/lasx/lasx-xvandi.c   |  196 ++
  .../loongarch/vector/lasx/lasx-xvandn.c   |  125 ++
  .../loongarch/vector/lasx/lasx-xvavg-1.c  |  680 +++
  .../loongarch/vector/lasx/lasx-xvavg-2.c  |  560 ++
  .../loongarch/vector/lasx/lasx-xvavgr-1.c |  770 
  .../loongarch/vector/lasx/lasx-xvavgr-2.c |  650 +++
  .../loongarch/vector/lasx/lasx-xvbitclr.c |  635 +++
  .../loongarch/vector/lasx/lasx-xvbitclri.c|  515 ++
  .../loongarch/vector/lasx/lasx-xvbitrev.c |  650 +++
  .../loongarch/vector/lasx/lasx-xvbitrevi.c|  317 
  .../loongarch/vector/lasx/lasx-xvbitsel.c |  134 ++
  .../loongarch/vector/lasx/lasx-xvbitseli.c|  185 ++
  .../loongarch/vector/lasx/lasx-xvbitset.c |  620 +++
  .../loongarch/vector/lasx/lasx-xvbitseti.c|  405 +
  .../loongarch/vector/lasx/lasx-xvbsll_v.c |  130 ++
  .../loongarch/vector/lasx/lasx-xvbsrl_v.c |   64 +
  .../loongarch/vector/lasx/lasx-xvclo.c|  449 +
  .../loongarch/vector/lasx/lasx-xvclz.c|  504 ++
  .../loongarch/vector/lasx/lasx-xvdiv-1.c  |  485 +
  .../loongarch/vector/lasx/lasx-xvdiv-2.c  |  500 ++
  .../loongarch/vector/lasx/lasx-xvext2xv-1.c   |  515 ++
  .../loongarch/vector/lasx/lasx-xvext2xv-2.c   |  669 +++
  .../loongarch/vector/lasx/lasx-xvexth-1.c |  350 
  .../loongarch/vector/lasx/lasx-xvexth-2.c |  592 ++
  .../loongarch/vector/lasx/lasx-xvextl-1.c |   86 +
  .../loongarch/vector/lasx/lasx-xvextl-2.c |  163 ++
  .../loongarch/vector/lasx/lasx-xvextrins.c|  515 ++
  .../loongarch/vector/lasx/lasx-xvfadd_d.c |  545 ++
  .../loongarch/vector/lasx/lasx-xvfadd_s.c |  

Re:[pushed] [PATCH v4 00/23] Add tests for SX vector instructions.

2023-09-13 Thread chenglulu

Pushed to r14-3928.

On 2023/9/13 at 11:31 AM, Xiaolong Chen wrote:

v3 -> v4:
   Modify the name of the patch file.

   To better test the functionality of the vector instructions, the 128-bit
test cases are further split according to the function of each instruction.


Xiaolong Chen (23):
   LoongArch: Add tests of -mstrict-align option.
   LoongArch: Add testsuite framework for Loongson SX/ASX.
   LoongArch: Add tests for Loongson SX builtin functions.
   LoongArch: Add tests for SX vector floating-point instructions.
   LoongArch: Add tests for SX vector addition instructions.
   LoongArch: Add tests for SX vector subtraction instructions.
   LoongArch: Add tests for SX vector addition vsadd instructions.
   LoongArch: Add tests for the SX vector multiplication instruction.
   LoongArch: Add tests for SX vector vavg/vavgr instructions.
   LoongArch: Add tests for SX vector vmax/vmaxi/vmin/vmini instructions.
   LoongArch: Add tests for SX vector vexth/vextl/vldi/vneg/vsat
 instructions.
   LoongArch: Add tests for SX vector
 vabsd/vmskgez/vmskltz/vmsknz/vsigncov instructions.
   LoongArch: Add tests for SX vector vdiv/vmod instructions.
   LoongArch: Add tests for SX vector
 vsll/vslli/vsrl/vsrli/vsrln/vsrlni/vsrlr /vsrlri/vslrlrn/vsrlrni
 instructions.
   LoongArch: Add tests for SX vector
 vrotr/vrotri/vsra/vsrai/vsran/vsrani /vsrarn/vsrarni instructions.
   LoongArch: Add tests for SX vector
 vssran/vssrani/vssrarn/vssrarni/vssrln /vssrlni/vssrlrn/vssrlrni
 instructions.
   LoongArch: Add tests for SX vector vbitclr/vbitclri/vbitrev/vbitrevi/
 vbitsel/vbitseli/vbitset/vbitseti/vclo/vclz/vpcnt instructions.
   LoongArch: Add tests for SX vector floating point arithmetic
 instructions.
   LoongArch: Add tests for SX vector vfrstp/vfrstpi/vseq/vseqi/vsle
 /vslei/vslt/vslti instructions.
   LoongArch: Add tests for SX vector vfcmp instructions.
   LoongArch: Add tests for SX vector handling and shuffle instructions.
   LoongArch: Add tests for SX vector vand/vandi/vandn/vor/vori/vnor/
 vnori/vxor/vxori instructions.
   LoongArch: Add tests for SX vector vfmadd/vfnmadd/vld/vst
 instructions.

  .../gcc.target/loongarch/strict-align.c   |   12 +
  .../loongarch/vector/loongarch-vector.exp |   42 +
  .../loongarch/vector/lsx/lsx-builtin.c| 1461 +
  .../loongarch/vector/lsx/lsx-vabsd-1.c|  272 +++
  .../loongarch/vector/lsx/lsx-vabsd-2.c|  398 +
  .../loongarch/vector/lsx/lsx-vadd.c   |  416 +
  .../loongarch/vector/lsx/lsx-vadda.c  |  344 
  .../loongarch/vector/lsx/lsx-vaddi.c  |  251 +++
  .../loongarch/vector/lsx/lsx-vaddwev-1.c  |  335 
  .../loongarch/vector/lsx/lsx-vaddwev-2.c  |  344 
  .../loongarch/vector/lsx/lsx-vaddwev-3.c  |  425 +
  .../loongarch/vector/lsx/lsx-vaddwod-1.c  |  408 +
  .../loongarch/vector/lsx/lsx-vaddwod-2.c  |  344 
  .../loongarch/vector/lsx/lsx-vaddwod-3.c  |  237 +++
  .../loongarch/vector/lsx/lsx-vand.c   |  159 ++
  .../loongarch/vector/lsx/lsx-vandi.c  |   67 +
  .../loongarch/vector/lsx/lsx-vandn.c  |  129 ++
  .../loongarch/vector/lsx/lsx-vavg-1.c |  398 +
  .../loongarch/vector/lsx/lsx-vavg-2.c |  308 
  .../loongarch/vector/lsx/lsx-vavgr-1.c|  299 
  .../loongarch/vector/lsx/lsx-vavgr-2.c|  317 
  .../loongarch/vector/lsx/lsx-vbitclr.c|  461 ++
  .../loongarch/vector/lsx/lsx-vbitclri.c   |  279 
  .../loongarch/vector/lsx/lsx-vbitrev.c|  407 +
  .../loongarch/vector/lsx/lsx-vbitrevi.c   |  336 
  .../loongarch/vector/lsx/lsx-vbitsel.c|  109 ++
  .../loongarch/vector/lsx/lsx-vbitseli.c   |   84 +
  .../loongarch/vector/lsx/lsx-vbitset.c|  371 +
  .../loongarch/vector/lsx/lsx-vbitseti.c   |  279 
  .../loongarch/vector/lsx/lsx-vbsll.c  |   83 +
  .../loongarch/vector/lsx/lsx-vbsrl.c  |   55 +
  .../loongarch/vector/lsx/lsx-vclo.c   |  266 +++
  .../loongarch/vector/lsx/lsx-vclz.c   |  265 +++
  .../loongarch/vector/lsx/lsx-vdiv-1.c |  299 
  .../loongarch/vector/lsx/lsx-vdiv-2.c |  254 +++
  .../loongarch/vector/lsx/lsx-vexth-1.c|  342 
  .../loongarch/vector/lsx/lsx-vexth-2.c|  182 ++
  .../loongarch/vector/lsx/lsx-vextl-1.c|   83 +
  .../loongarch/vector/lsx/lsx-vextl-2.c|   83 +
  .../loongarch/vector/lsx/lsx-vextrins.c   |  479 ++
  .../loongarch/vector/lsx/lsx-vfadd_d.c|  407 +
  .../loongarch/vector/lsx/lsx-vfadd_s.c|  470 ++
  .../loongarch/vector/lsx/lsx-vfclass_d.c  |   83 +
  .../loongarch/vector/lsx/lsx-vfclass_s.c  |   74 +
  .../loongarch/vector/lsx/lsx-vfcmp_caf.c  |  244 +++
  .../loongarch/vector/lsx/lsx-vfcmp_ceq.c  |  516 ++
  .../loongarch/vector/lsx/lsx-vfcmp_cle.c  |  530 ++
  

[PATCH v5] c++: Move consteval folding to cp_fold_r

2023-09-13 Thread Marek Polacek via Gcc-patches
On Wed, Sep 13, 2023 at 05:57:47PM -0400, Jason Merrill wrote:
> On 9/13/23 16:56, Marek Polacek wrote:
> > On Tue, Sep 12, 2023 at 05:26:25PM -0400, Jason Merrill wrote:
> > > On 9/8/23 14:24, Marek Polacek wrote:
> > > > +  switch (TREE_CODE (stmt))
> > > > +{
> > > > +/* Unfortunately we must handle code like
> > > > +false ? bar () : 42
> > > > +   where we have to check bar too.  */
> > > > +case COND_EXPR:
> > > > +  if (cp_fold_immediate_r (_OPERAND (stmt, 1), walk_subtrees, 
> > > > data))
> > > > +   return error_mark_node;
> > > > +  if (TREE_OPERAND (stmt, 2)
> > > > + && cp_fold_immediate_r (_OPERAND (stmt, 2), 
> > > > walk_subtrees, data))
> > > > +   return error_mark_node;
> > > 
> > > Is this necessary?  Doesn't walk_tree already walk into the arms of
> > > COND_EXPR?
> > 
> > Unfortunately yes.  The cp_fold call in cp_fold_r could fold the ?: into
> > a constant before we see it here.  I've added a comment saying just that.
> 
> Ah.  But in that case I guess we need to walk into the arms, not just check
> the top-level expression in them.
 
Arg, of course.  I was fooled into thinking that it would recurse, but
you're right.  Fixed by using cp_walk_tree as I intended.  Tested in
consteval34.C.

> But maybe cp_fold_r should do that before the cp_fold, instead of this
> function?

I...am not sure how that would be better than what I did.

> > > > +  break;
> > > > +
> > > >case PTRMEM_CST:
> > > >  if (TREE_CODE (PTRMEM_CST_MEMBER (stmt)) == FUNCTION_DECL
> > > >   && DECL_IMMEDIATE_FUNCTION_P (PTRMEM_CST_MEMBER (stmt)))
> > > > {
> > > > - if (!data->pset.add (stmt))
> > > > + if (!data->pset.add (stmt) && (complain & tf_error))
> > > > error_at (PTRMEM_CST_LOCATION (stmt),
> > > >   "taking address of an immediate function %qD",
> > > >   PTRMEM_CST_MEMBER (stmt));
> > > >   stmt = *stmt_p = build_zero_cst (TREE_TYPE (stmt));
> > > 
> > > It looks like this will overwrite *stmt_p even if we didn't give an error.
> > 
> > I suppose that could result in missing errors, adjusted.  And there's no
> > point in setting stmt.
> > > > - break;
> > > > + return error_mark_node;
> > > > }
> > > >  break;
> > > > +/* Expand immediate invocations.  */
> > > > +case CALL_EXPR:
> > > > +case AGGR_INIT_EXPR:
> > > > +  if (tree fn = cp_get_callee (stmt))
> > > > +   if (TREE_CODE (fn) != ADDR_EXPR || ADDR_EXPR_DENOTES_CALL_P 
> > > > (fn))
> > > > + if (tree fndecl = cp_get_fndecl_from_callee (fn, 
> > > > /*fold*/false))
> > > > +   if (DECL_IMMEDIATE_FUNCTION_P (fndecl))
> > > > + {
> > > > +   *stmt_p = stmt = cxx_constant_value (stmt, complain);
> > > 
> > > Likewise.
> > 
> > I think we have to keep setting *stmt_p to actually evaluate consteval
> > functions.
> 
> But only when it succeeds; we don't want to set it to error_mark_node if we
> aren't complaining.

Hmm, probably not.  Fixed, thanks.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In the review of P2564:

it turned out that in order to correctly handle an example in the paper,
we should stop doing immediate evaluation in build_over_call and
bot_replace, and instead do it in cp_fold_r.  This patch does that.

Another benefit is that this is a pretty significant simplification, at
least in my opinion.  Also, this fixes the c++/110997 ICE (but the test
doesn't compile yet).

The main drawback seems to be that cp_fold_r doesn't process
uninstantiated templates.  We still have to handle things like
"false ? foo () : 1".  To that end, I've added cp_fold_immediate, called
on dead branches in cxx_eval_conditional_expression.  Since in cxx_*
I can't rely on current_function_decl being available, I've added
another walk: a new overload for in_immediate_context that looks into
constexpr_ctx.

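A hypothetical sketch of the dead-branch case discussed above (illustrative
only, not a test from the patch):

consteval int bar () { return 42; }

// Even in the never-taken arm, bar () must be recognized as an immediate
// invocation and constant-folded.
constexpr int x = false ? bar () : 7;
int f () { return false ? bar () : 7; }
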
You'll see that I've reintroduced ADDR_EXPR_DENOTES_CALL_P here.  This
is to detect

  (*(&f)) ()
  (s.*&S::foo) ()

which were deemed ill-formed.

gcc/cp/ChangeLog:

* call.cc (in_immediate_context): No longer static.
(build_over_call): Set ADDR_EXPR_DENOTES_CALL_P.  Don't handle
immediate_invocation_p here.
* constexpr.cc (in_immediate_context): New overload.
(cxx_eval_call_expression): Use mce_true for DECL_IMMEDIATE_FUNCTION_P.
(cxx_eval_conditional_expression): Call cp_fold_immediate.
* cp-gimplify.cc (maybe_replace_decl): Make static.
(cp_fold_r): Expand immediate invocations.
(cp_fold_immediate_r): New.
(cp_fold_immediate): New.
* cp-tree.h (ADDR_EXPR_DENOTES_CALL_P): Define.
(cp_fold_immediate): Declare.
* tree.cc (bot_replace): Don't handle immediate invocations here.

libstdc++-v3/ChangeLog:

* 

[PATCH] libstdc++: Reduce integer std::to/from_chars symbol sizes

2023-09-13 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

For std::to_chars:

The constrained alias __integer_to_chars_result_type seems unnecessary
ever since r10-3080-g28f0075742ed58 got rid of the only public overload
which used it.  Now only non-public overloads are constrained by it
(through their return type) and these non-public overloads aren't used
in a SFINAE context, so the constraints have no observable effect.  So
this patch gets rid of this alias, which greatly reduces the symbol
sizes of the affected functions (since the expanded alias is quite
large).

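A hypothetical illustration of the symbol-size point (not the libstdc++
code): a constraint spelled in the return type of a function template is
encoded into its mangled name, so expanding the alias to the plain return
type shrinks the symbols.

#include <type_traits>

template<typename T>
std::enable_if_t<std::is_integral_v<T>, int>
constrained_return (T) { return 0; }   // long mangled name

template<typename T>
int
plain_return (T) { return 0; }         // much shorter mangled name
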
For std::from_chars:

We can't get rid of the corresponding alias because it constrains
the public integer std::from_chars overload.  But we can avoid having
the constraint bloat the mangled name by instead encoding it as a
defaulted template parameter.  We use the non-type parameter form

  enable_if_t<..., int> = 0

instead of the type parameter form

  typename = enable_if_t<...>

because the type form can be circumvented by providing an explicit template
argument for the type parameter, e.g. 'std::from_chars(...)', so the
non-type form seems like the more robust choice.

In passing, use __is_standard_integer in the constraint.

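For illustration (hypothetical functions, not the libstdc++ code), the two
forms behave differently under explicit template arguments:

#include <type_traits>

template<typename T, typename = std::enable_if_t<std::is_integral_v<T>>>
void type_form (T) { }

template<typename T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
void nontype_form (T) { }

int main ()
{
  type_form<double, void> (3.14);     // compiles: explicit arg bypasses the check
  // nontype_form<double, 0> (3.14);  // error: enable_if_t<false, int> is ill-formed
}
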
libstdc++-v3/ChangeLog:

* include/std/charconv (__detail::__integer_to_chars_result_type):
Remove.
(__detail::__to_chars_16): Use to_chars_result as return type.
(__detail::__to_chars_10): Likewise.
(__detail::__to_chars_8): Likewise.
(__detail::__to_chars_2): Likewise.
(__detail::__to_chars_i): Likewise.
(__detail::__integer_from_chars_result_type): Pull out enable_if_t
condition into and replace with ...
(__detail::__integer_from_chars_enabled): ... this.  Use
__is_standard_integer instead of __is_signed_integer and
__is_unsigned_integer.
(from_chars): Encode constraint as a defaulted non-type template
parameter instead of within the return type.
---
 libstdc++-v3/include/std/charconv | 33 ++-
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/libstdc++-v3/include/std/charconv 
b/libstdc++-v3/include/std/charconv
index 01711d38576..ec25ae139ba 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -79,17 +79,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 namespace __detail
 {
-  template
-using __integer_to_chars_result_type
-  = enable_if_t<__or_<__is_signed_integer<_Tp>,
- __is_unsigned_integer<_Tp>,
-#if defined __SIZEOF_INT128__ && defined __STRICT_ANSI__
- is_same<_Tp, signed __int128>,
- is_same<_Tp, unsigned __int128>,
-#endif
- is_same>>::value,
-   to_chars_result>;
-
   // Pick an unsigned type of suitable size. This is used to reduce the
   // number of specializations of __to_chars_len, __to_chars etc. that
   // get instantiated. For example, to_chars and to_chars
@@ -162,7 +151,7 @@ namespace __detail
 }
 
   template
-constexpr __integer_to_chars_result_type<_Tp>
+constexpr to_chars_result
 __to_chars_16(char* __first, char* __last, _Tp __val) noexcept
 {
   static_assert(__integer_to_chars_is_unsigned<_Tp>, "implementation bug");
@@ -208,7 +197,7 @@ namespace __detail
 }
 
   template
-constexpr __integer_to_chars_result_type<_Tp>
+constexpr to_chars_result
 __to_chars_10(char* __first, char* __last, _Tp __val) noexcept
 {
   static_assert(__integer_to_chars_is_unsigned<_Tp>, "implementation bug");
@@ -231,7 +220,7 @@ namespace __detail
 }
 
   template
-constexpr __integer_to_chars_result_type<_Tp>
+constexpr to_chars_result
 __to_chars_8(char* __first, char* __last, _Tp __val) noexcept
 {
   static_assert(__integer_to_chars_is_unsigned<_Tp>, "implementation bug");
@@ -284,7 +273,7 @@ namespace __detail
 }
 
   template
-constexpr __integer_to_chars_result_type<_Tp>
+constexpr to_chars_result
 __to_chars_2(char* __first, char* __last, _Tp __val) noexcept
 {
   static_assert(__integer_to_chars_is_unsigned<_Tp>, "implementation bug");
@@ -320,7 +309,7 @@ namespace __detail
 } // namespace __detail
 
   template
-constexpr __detail::__integer_to_chars_result_type<_Tp>
+constexpr to_chars_result
 __to_chars_i(char* __first, char* __last, _Tp __value, int __base = 10)
 {
   __glibcxx_assert(2 <= __base && __base <= 36);
@@ -548,17 +537,15 @@ namespace __detail
 }
 
   template
-using __integer_from_chars_result_type
-  = enable_if_t<__or_<__is_signed_integer<_Tp>,
- __is_unsigned_integer<_Tp>,
- is_same>>::value,
-   from_chars_result>;
+constexpr bool __integer_from_chars_enabled
+  = __or_<__is_standard_integer<_Tp>, is_same>>::value;
 
 } // namespace __detail
 
   /// std::from_chars for 

[PATCH] Improve error message for if with an else part while in switch

2023-09-13 Thread Andrew Pinski via Gcc-patches
While writing some match.pd code, I was trying to figure
out why I was getting an `expected ), got (` error message
while writing an if statement with an else clause.  Inside a
switch, an if statement cannot have an else clause, so it is
better to have a decent error message saying that explicitly.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* genmatch.cc (parser::parse_result): For an else clause
of an if statement inside a switch, error out explicitly.
---
 gcc/genmatch.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index a1925a747a7..03d325efdf6 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -4891,6 +4891,8 @@ parser::parse_result (operand *result, predicate_id 
*matcher)
ife->trueexpr = parse_result (result, matcher);
  else
ife->trueexpr = parse_op ();
+ if (peek ()->type == CPP_OPEN_PAREN)
+   fatal_at (peek(), "if inside switch cannot have an else");
  eat_token (CPP_CLOSE_PAREN);
}
  else
-- 
2.31.1



Re: [WIP RFC] analyzer: Move gcc.dg/analyzer tests to c-c++-common (3) [PR96395]

2023-09-13 Thread David Malcolm via Gcc-patches
On Mon, 2023-09-11 at 19:44 +0200, priour...@gmail.com wrote:
> From: benjamin priour 
> 
> Hi,
> 
> Patch below is mostly done, just have to check the formatting.
> Althought, I'd like your feedback on how to manage named_constants
> from enum in C++.
> 
> I've checked and the analyzer works as expected with them.
> However, C++ FE makes it so that given
> 
> enum
> {
>   NAMED = 0x1
> };
> 
> then in analyzer-language.cc:maybe_stash_named_constant
> 
>     tree t = tu.lookup_constant_by_id (id);
>     ...
>     logger->log ("%qs: %qE", name, t);
> 
> t is printed as 1 in C, but NAMED in C++.
> Should it be left as a "FE specifity",
> or should we aim for 1 in C++ as well ?

Thanks for the patch.

It seems that the patch consists of three parts:
(a) adding kf_bzero
(b) refactoring/moving c_translation_unit so it can be used by g++
(c) a whole bunch of tests being moved, some of which may depend on (a)
and (b); are there some that don't?

Given how big the (c) changes look like in a "diff", I'd prefer the (a)
and (b) changes to be split out as preliminaries, for readability.  

Presumably this change could be made part of (a):
* gcc.dg/analyzer/bzero-1.c: Moved to...
* c-c++-common/analyzer/bzero-1.c: ...here.

Does anything in the patch actually use (b)?  IIRC it's used by the
file-descriptor tests, so fd-*.c, pipe-*.c, etc.

As for your question, lookup_constant_by_id should return an
INTEGER_CST (or NULL_TREE), so presumably we want t to be printed as
'1' with both frontends.

Dave



Re: [PATCH] testsuite work-around compound-assignment-1.c C++ failures on various targets [PR111377]

2023-09-13 Thread David Malcolm via Gcc-patches
On Tue, 2023-09-12 at 09:02 +0200, Jakub Jelinek wrote:
> On Mon, Sep 11, 2023 at 11:11:30PM +0200, Jakub Jelinek via Gcc-
> patches wrote:
> > On Mon, Sep 11, 2023 at 07:27:57PM +0200, Benjamin Priour via Gcc-
> > patches wrote:
> > > Thanks for the report,
> > > 
> > > After investigation it seems the location of the new dejagnu
> > > directive for
> > > C++ differs depending on the configuration.
> > > The expected warning is still emitted, but its location differ
> > > slightly.
> > > I expect it to be not an issue per se of the analyzer, but a
> > > divergence in
> > > the FE between the two configurations.
> > 
> > I think the divergence is whether called_by_test_5b returns the
> > struct
> > in registers or in memory.  If in memory (like in the x86_64 -m32
> > case), we have
> >   [compound-assignment-1.c:71:21] D.3191 = called_by_test_5b ();
> > [return slot optimization]
> >   [compound-assignment-1.c:71:21 discrim 1] D.3191 ={v}
> > {CLOBBER(eol)};
> >   [compound-assignment-1.c:72:1] return;
> > in the IL, while if in registers (like x86_64 -m64 case), just
> >   [compound-assignment-1.c:71:21] D.3591 = called_by_test_5b ();
> >   [compound-assignment-1.c:72:1] return;
> > 
> > If you just want to avoid the differences, putting } on the same
> > line as the
> > call might be a usable workaround for that.
> 
> Here is the workaround in patch form.  Tested on x86_64-linux -m32/-
> m64, ok
> for trunk?

Yes, thanks!

Dave

> 
> 2023-09-12  Jakub Jelinek  
> 
> PR testsuite/111377
> * c-c++-common/analyzer/compound-assignment-1.c (test_5b):
> Move
> closing } to the same line as the call to work-around
> differences in
> diagnostics line.
> 
> --- gcc/testsuite/c-c++-common/analyzer/compound-assignment-
> 1.c.jj  2023-09-11 11:05:47.523727789 +0200
> +++ gcc/testsuite/c-c++-common/analyzer/compound-assignment-1.c 2023-
> 09-12 08:58:52.854231161 +0200
> @@ -68,5 +68,8 @@ called_by_test_5b (void)
>  
>  void test_5b (void)
>  {
> -  called_by_test_5b ();
> -} /* { dg-warning "leak of '.ptr_wrapper::ptr'" "" {
> target c++ } } */
> +  called_by_test_5b (); }
> +/* { dg-warning "leak of '.ptr_wrapper::ptr'" "" { target
> c++ } .-1 } */
> +/* The closing } above is intentionally on the same line as the
> call, because
> +   otherwise the exact line of the diagnostics depends on whether
> the
> +   called_by_test_5b () call satisfies aggregate_value_p or not.  */
> 
> 
> Jakub
> 



Re: [PATCH v4] c++: Move consteval folding to cp_fold_r

2023-09-13 Thread Jason Merrill via Gcc-patches

On 9/13/23 16:56, Marek Polacek wrote:

On Tue, Sep 12, 2023 at 05:26:25PM -0400, Jason Merrill wrote:

On 9/8/23 14:24, Marek Polacek wrote:
  

+  switch (TREE_CODE (stmt))
+{
+/* Unfortunately we must handle code like
+false ? bar () : 42
+   where we have to check bar too.  */
+case COND_EXPR:
+  if (cp_fold_immediate_r (&TREE_OPERAND (stmt, 1), walk_subtrees, data))
+   return error_mark_node;
+  if (TREE_OPERAND (stmt, 2)
+ && cp_fold_immediate_r (&TREE_OPERAND (stmt, 2), walk_subtrees, data))
+   return error_mark_node;


Is this necessary?  Doesn't walk_tree already walk into the arms of
COND_EXPR?


Unfortunately yes.  The cp_fold call in cp_fold_r could fold the ?: into
a constant before we see it here.  I've added a comment saying just that.


Ah.  But in that case I guess we need to walk into the arms, not just 
check the top-level expression in them.


But maybe cp_fold_r should do that before the cp_fold, instead of this 
function?



+  break;
+
   case PTRMEM_CST:
 if (TREE_CODE (PTRMEM_CST_MEMBER (stmt)) == FUNCTION_DECL
  && DECL_IMMEDIATE_FUNCTION_P (PTRMEM_CST_MEMBER (stmt)))
{
- if (!data->pset.add (stmt))
+ if (!data->pset.add (stmt) && (complain & tf_error))
error_at (PTRMEM_CST_LOCATION (stmt),
  "taking address of an immediate function %qD",
  PTRMEM_CST_MEMBER (stmt));
  stmt = *stmt_p = build_zero_cst (TREE_TYPE (stmt));


It looks like this will overwrite *stmt_p even if we didn't give an error.


I suppose that could result in missing errors, adjusted.  And there's no
point in setting stmt.
  

- break;
+ return error_mark_node;
}
 break;
+/* Expand immediate invocations.  */
+case CALL_EXPR:
+case AGGR_INIT_EXPR:
+  if (tree fn = cp_get_callee (stmt))
+   if (TREE_CODE (fn) != ADDR_EXPR || ADDR_EXPR_DENOTES_CALL_P (fn))
+ if (tree fndecl = cp_get_fndecl_from_callee (fn, /*fold*/false))
+   if (DECL_IMMEDIATE_FUNCTION_P (fndecl))
+ {
+   *stmt_p = stmt = cxx_constant_value (stmt, complain);


Likewise.


I think we have to keep setting *stmt_p to actually evaluate consteval
functions.


But only when it succeeds; we don't want to set it to error_mark_node if 
we aren't complaining.


Jason



Re: [PATCH 2/2 v2] Ada: Finalization of constrained subtypes of unconstrained synchronized private extensions

2023-09-13 Thread Gary Dismukes via Gcc-patches
Hi Richard,

I hope you're doing well.

I'm just following up on the patch (second version) that you sent us
recently for the problem you ran into with the generation of the
address finalization routine for synchronized private extensions.

Thanks very much for finding this fix and submitting your patch.
Your patch looks good, and we'll look into applying that to the
compiler, though no guarantees about when it will be added.  

Best regards,
  Gary Dismukes



[PATCH v4] c++: Move consteval folding to cp_fold_r

2023-09-13 Thread Marek Polacek via Gcc-patches
On Tue, Sep 12, 2023 at 05:26:25PM -0400, Jason Merrill wrote:
> On 9/8/23 14:24, Marek Polacek wrote:
> > +  if (!in_immediate_context ()
> > +  /* P2564: a subexpression of a manifestly constant-evaluated 
> > expression
> > +or conversion is an immediate function context.  */
> > +  && ctx->manifestly_const_eval != mce_true
> 
> I might check this first as a tiny optimization.

Done.
 
> > +  switch (TREE_CODE (stmt))
> > +{
> > +/* Unfortunately we must handle code like
> > +false ? bar () : 42
> > +   where we have to check bar too.  */
> > +case COND_EXPR:
> > +  if (cp_fold_immediate_r (&TREE_OPERAND (stmt, 1), walk_subtrees, 
> > data))
> > +   return error_mark_node;
> > +  if (TREE_OPERAND (stmt, 2)
> > + && cp_fold_immediate_r (&TREE_OPERAND (stmt, 2), walk_subtrees, data))
> > +   return error_mark_node;
> 
> Is this necessary?  Doesn't walk_tree already walk into the arms of
> COND_EXPR?

Unfortunately yes.  The cp_fold call in cp_fold_r could fold the ?: into
a constant before we see it here.  I've added a comment saying just that.

> > +  break;
> > +
> >   case PTRMEM_CST:
> > if (TREE_CODE (PTRMEM_CST_MEMBER (stmt)) == FUNCTION_DECL
> >   && DECL_IMMEDIATE_FUNCTION_P (PTRMEM_CST_MEMBER (stmt)))
> > {
> > - if (!data->pset.add (stmt))
> > + if (!data->pset.add (stmt) && (complain & tf_error))
> > error_at (PTRMEM_CST_LOCATION (stmt),
> >   "taking address of an immediate function %qD",
> >   PTRMEM_CST_MEMBER (stmt));
> >   stmt = *stmt_p = build_zero_cst (TREE_TYPE (stmt));
> 
> It looks like this will overwrite *stmt_p even if we didn't give an error.

I suppose that could result in missing errors, adjusted.  And there's no
point in setting stmt.
 
> > - break;
> > + return error_mark_node;
> > }
> > break;
> > +/* Expand immediate invocations.  */
> > +case CALL_EXPR:
> > +case AGGR_INIT_EXPR:
> > +  if (tree fn = cp_get_callee (stmt))
> > +   if (TREE_CODE (fn) != ADDR_EXPR || ADDR_EXPR_DENOTES_CALL_P (fn))
> > + if (tree fndecl = cp_get_fndecl_from_callee (fn, /*fold*/false))
> > +   if (DECL_IMMEDIATE_FUNCTION_P (fndecl))
> > + {
> > +   *stmt_p = stmt = cxx_constant_value (stmt, complain);
> 
> Likewise.

I think we have to keep setting *stmt_p to actually evaluate consteval
functions.  But again, no need to set stmt.

> > +   if (stmt == error_mark_node)
> > + return error_mark_node;
> > + }
> > +  break;
> > +
> >   case ADDR_EXPR:
> > if (TREE_CODE (TREE_OPERAND (stmt, 0)) == FUNCTION_DECL
> >   && DECL_IMMEDIATE_FUNCTION_P (TREE_OPERAND (stmt, 0)))
> > {
> > - error_at (EXPR_LOCATION (stmt),
> > -   "taking address of an immediate function %qD",
> > -   TREE_OPERAND (stmt, 0));
> > + if (complain & tf_error)
> > +   error_at (EXPR_LOCATION (stmt),
> > + "taking address of an immediate function %qD",
> > + TREE_OPERAND (stmt, 0));
> >   stmt = *stmt_p = build_zero_cst (TREE_TYPE (stmt));
> 
> Likewise.

Adjusted like case PTRMEM_CST.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In the review of P2564:

it turned out that in order to correctly handle an example in the paper,
we should stop doing immediate evaluation in build_over_call and
bot_replace, and instead do it in cp_fold_r.  This patch does that.

Another benefit is that this is a pretty significant simplification, at
least in my opinion.  Also, this fixes the c++/110997 ICE (but the test
doesn't compile yet).

The main drawback seems to be that cp_fold_r doesn't process
uninstantiated templates.  We still have to handle things like
"false ? foo () : 1".  To that end, I've added cp_fold_immediate, called
on dead branches in cxx_eval_conditional_expression.  Since in cxx_*
I can't rely on current_function_decl being available, I've added
another walk: a new overload for in_immediate_context that looks into
constexpr_ctx.

You'll see that I've reintroduced ADDR_EXPR_DENOTES_CALL_P here.  This
is to detect

  (*(&f)) ()
  (s.*&S::foo) ()

which were deemed ill-formed.

gcc/cp/ChangeLog:

* call.cc (in_immediate_context): No longer static.
(build_over_call): Set ADDR_EXPR_DENOTES_CALL_P.  Don't handle
immediate_invocation_p here.
* constexpr.cc (in_immediate_context): New overload.
(cxx_eval_call_expression): Use mce_true for DECL_IMMEDIATE_FUNCTION_P.
(cxx_eval_conditional_expression): Call cp_fold_immediate.
* cp-gimplify.cc (maybe_replace_decl): Make static.
(cp_fold_r): Expand immediate invocations.
(cp_fold_immediate_r): New.
(cp_fold_immediate): New.
* cp-tree.h (ADDR_EXPR_DENOTES_CALL_P): Define.
(cp_fold_immediate): Declare.
* tree.cc 

Re: [PATCH][_GLIBCXX_INLINE_VERSION] Fix friend declarations

2023-09-13 Thread Jonathan Wakely via Gcc-patches
On Wed, 13 Sept 2023 at 21:47, François Dumont  wrote:
>
> It's working and it's what I've committed.

Nice, thanks!


>
> Thanks
>
> On 12/09/2023 19:04, Jonathan Wakely wrote:
> > On Tue, 12 Sept 2023 at 17:47, Jonathan Wakely  wrote:
> >> On Wed, 23 Aug 2023 at 18:35, François Dumont via Libstdc++ wrote:
> >>> Hi
> >>>
> >>> The few tests that are failing in versioned namespace mode are due to
> >>> those friend declarations.
> >>>
> >>> This is a fix proposal even if I considered 2 other options:
> >>>
> >>> 1. Make __format::_Arg_store a struct and so do not bother with friend
> >>> declarations.
> >>>
> >>> 2. Consider it as a compiler bug and do nothing. In this case I think we
> >>> might still need this patch to avoid a non-working format library in
> >>> versioned namespace mode in gcc 14 if compiler bug is not fixed.
> >> It definitely is a compiler bug, this is PR c++/59256.
> >>
> >> Please add a comment to the new macro definition, so we remember to
> >> remove it when it's not needed:
> >>
> >>
> >> #if _GLIBCXX_INLINE_VERSION
> >> // Needed because of PR c++/59526
> >> # define _GLIBCXX_STD_V std::__8
> >> #else
> >> # define _GLIBCXX_STD_V std
> >> #endif
> >>
> >>
> >> OK with that change, thanks.
> > Actually, are you sure the friend std::basic_format_args declaration
> > needs to change?
> >
> > I only see errors for the friend function, not the friend class. So
> > this seems to fix it:
> >
> > --- a/libstdc++-v3/include/std/format
> > +++ b/libstdc++-v3/include/std/format
> > @@ -3437,7 +3437,13 @@ namespace __format
> >
> >template
> > friend auto
> > -   std::make_format_args(_Argz&&...) noexcept;
> > +#if _GLIBCXX_INLINE_VERSION
> > +   // Needed for PR c++/59526
> > +   std::__8::
> > +#else
> > +   std::
> > +#endif
> > +   make_format_args(_Argz&&...) noexcept;
> >
> >// For a sufficiently small number of arguments we only store values.
> >// basic_format_args can get the types from the _Args pack.
> >
> >
> >
> >
> >>
> >>> I can also define _GLIBCXX_STD_V at <format> level to limit impact.
> >>>
> >>>   libstdc++: [_GLIBCXX_INLINE_VERSION] Fix <format> friend
> >>> declarations
> >>>
> >>>   GCC does not consider the inline namespace in friend declarations.
> >>> We need to make this namespace explicit.
> >>>
> >>>   libstdc++-v3/ChangeLog:
> >>>
> >>>   * include/bits/c++config (_GLIBCXX_STD_V): New macro giving
> >>> current
> >>>   std namespace with optionally the version namespace.
> >>>   * include/std/format (std::__format::_Arg_store): Use
> >>> latter on friend
> >>>   declarations.
> >>>
> >>> Tested under versioned mode.
> >>>
> >>> Ok to commit ?
> >>>
> >>> François
>



Re: [PATCH V4, rs6000] Disable generation of scalar modulo instructions

2023-09-13 Thread Segher Boessenkool
Hi!

On Fri, Jun 30, 2023 at 02:26:35PM -0500, Pat Haugen wrote:
> gcc/
> * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
> scalar modulo.

"Check whether the modulo instruction is disabled?"

> * config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
> (define_expand umod<mode>3): New.
> (define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
> (umodti3, modti3): Disable.

None of these patterns are disabled!  Instead, the new TARGET_* thing
is used.

> +/* Disable generation of scalar modulo instructions due to performance issues
> +   with certain input values.  This can be removed in the future when the
> +   issues have been resolved.  */
> +#define RS6000_DISABLE_SCALAR_MODULO 1

I think that is a bit optimistic -- in the future we will still want to
support older cores ;-)

> -  "TARGET_POWER10 && TARGET_POWERPC64"
> +  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
>"vmoduq %0,%1,%2"

Did we ever test if this insn in fact is slower as well?  I don't mean
either way, orthogonality is good, but just for my enlightenment.

> +++ b/gcc/testsuite/gcc.target/powerpc/clone1.c

> +/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */

Xfail, but heh.  No need to change that.

> +/* { dg-final { scan-assembler-times {\mdivd\M}  1 { xfail *-*-* } } } */

> +/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */
> +/* { dg-final { scan-assembler-times {\mmodsw\M} 1 { xfail *-*-* } } } */

Thanks for the \m \M, it is much harder to determine if the tests
actually work, without that :-)

With improved changelog: okay for trunk.  Okay for all backports as
well (after some soak time).

Thank you!


Segher


Re: [PATCH][_GLIBCXX_INLINE_VERSION] Fix friend declarations

2023-09-13 Thread François Dumont via Gcc-patches

It's working and it's what I've committed.

Thanks

On 12/09/2023 19:04, Jonathan Wakely wrote:

On Tue, 12 Sept 2023 at 17:47, Jonathan Wakely  wrote:

On Wed, 23 Aug 2023 at 18:35, François Dumont via Libstdc++ wrote:

Hi

The few tests that are failing in versioned namespace mode are due to
those friend declarations.

This is a fix proposal even if I considered 2 other options:

1. Make __format::_Arg_store a struct and so do not bother with friend
declarations.

2. Consider it as a compiler bug and do nothing. In this case I think we
might still need this patch to avoid a non-working format library in
versioned namespace mode in gcc 14 if compiler bug is not fixed.

It definitely is a compiler bug, this is PR c++/59256.

Please add a comment to the new macro definition, so we remember to
remove it when it's not needed:


#if _GLIBCXX_INLINE_VERSION
// Needed because of PR c++/59526
# define _GLIBCXX_STD_V std::__8
#else
# define _GLIBCXX_STD_V std
#endif


OK with that change, thanks.

Actually, are you sure the friend std::basic_format_args declaration
needs to change?

I only see errors for the friend function, not the friend class. So
this seems to fix it:

--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -3437,7 +3437,13 @@ namespace __format

   template
friend auto
-   std::make_format_args(_Argz&&...) noexcept;
+#if _GLIBCXX_INLINE_VERSION
+   // Needed for PR c++/59526
+   std::__8::
+#else
+   std::
+#endif
+   make_format_args(_Argz&&...) noexcept;

   // For a sufficiently small number of arguments we only store values.
   // basic_format_args can get the types from the _Args pack.







I can also define _GLIBCXX_STD_V at <format> level to limit impact.

  libstdc++: [_GLIBCXX_INLINE_VERSION] Fix <format> friend declarations

  GCC does not consider the inline namespace in friend declarations.
We need to make this namespace explicit.

  libstdc++-v3/ChangeLog:

  * include/bits/c++config (_GLIBCXX_STD_V): New macro giving
current
  std namespace with optionally the version namespace.
  * include/std/format (std::__format::_Arg_store): Use
latter on friend
  declarations.

Tested under versioned mode.

Ok to commit ?

François


Re: [PATCHSET] Reintroduce targetrustm hooks

2023-09-13 Thread Iain Buclaw via Gcc-patches
Excerpts from Arthur Cohen's message of September 7, 2023 3:41 pm:
> Alright, was not expecting to mess up this patchset so bad so here we go:
> 
> This patchset reintroduces proper targetrustm hooks without the old
> problematic mess of macros we had, which had been removed for the first
> merge of gccrs upstream.
> 
> Tested on x86-64 GNU Linux, and has also been present in our development
> repository for a long time - added by this pull-request from Iain [1]
> which was merged in October 2022.
> 
> Ok for trunk?
> 
> [PATCH 01/14] rust: Add skeleton support and documentation for
> [PATCH 02/14] rust: Reintroduce TARGET_RUST_CPU_INFO hook
> [PATCH 03/14] rust: Reintroduce TARGET_RUST_OS_INFO hook
> [PATCH 04/14] rust: Implement TARGET_RUST_CPU_INFO for i[34567]86-*-*
> [PATCH 05/14] rust: Implement TARGET_RUST_OS_INFO for *-*-darwin*
> [PATCH 06/14] rust: Implement TARGET_RUST_OS_INFO for *-*-freebsd*
> [PATCH 07/14] rust: Implement TARGET_RUST_OS_INFO for *-*-netbsd*
> [PATCH 08/14] rust: Implement TARGET_RUST_OS_INFO for *-*-openbsd*
> [PATCH 09/14] rust: Implement TARGET_RUST_OS_INFO for *-*-solaris2*.
> [PATCH 10/14] rust: Implement TARGET_RUST_OS_INFO for *-*-dragonfly*
> [PATCH 11/14] rust: Implement TARGET_RUST_OS_INFO for *-*-vxworks*
> [PATCH 12/14] rust: Implement TARGET_RUST_OS_INFO for *-*-fuchsia*.
> [PATCH 13/14] rust: Implement TARGET_RUST_OS_INFO for
> [PATCH 14/14] rust: Implement TARGET_RUST_OS_INFO for *-*-*linux*.
> 

Thanks for eventually getting round to this.

As the co-author of this patch series, I'm not going to look at it.

FWIW, these being Rust-specific target changes isolated to just
Rust-specific files, you should have the autonomy to commit without
needing any request for review - at least this is my understanding from
when I have made D-specific target changes in the past that have not touched
common back-end headers.

I'll let someone else confirm and check over the shared parts touched by
the patch however.

For reviewers, this is pretty much a mirror of the D front-end's CPU and
OS-specific target hooks (D has built-in version identifiers, not
built-in attributes, but both Rust and D are otherwise the same in the
kind of information exposed by them).

> [1]: https://github.com/Rust-GCC/gccrs/pull/1543
> 

The other GitHub pull request that added these is here.

https://github.com/Rust-GCC/gccrs/pull/1596

Regards,
Iain.


Re: [PATCH] Allow target attributes in non-gnu namespaces

2023-09-13 Thread Iain Buclaw via Gcc-patches
Excerpts from Richard Sandiford via Gcc-patches's message of September 8, 2023 
6:29 pm:
> Currently there are four static sources of attributes:
> 
> - LANG_HOOKS_ATTRIBUTE_TABLE
> - LANG_HOOKS_COMMON_ATTRIBUTE_TABLE
> - LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE
> - TARGET_ATTRIBUTE_TABLE
> 
> All of the attributes in these tables go in the "gnu" namespace.
> This means that they can use the traditional GNU __attribute__((...))
> syntax and the standard [[gnu::...]] syntax.
> 
> Standard attributes are registered dynamically with a null namespace.
> There are no supported attributes in other namespaces (clang, vendor
> namespaces, etc.).
> 
> This patch tries to generalise things by making the namespace
> part of the attribute specification.
> 
> It's usual for multiple attributes to be defined in the same namespace,
> so rather than adding the namespace to each individual definition,
> it seemed better to group attributes in the same namespace together.
> This would also allow us to reuse the same table for clang attributes
> that are written with the GNU syntax, or other similar situations
> where the attribute can be accessed via multiple "spellings".
> 
> The patch therefore adds a scoped_attribute_specs that contains
> a namespace and a list of attributes in that namespace.
> 

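A hypothetical sketch of the grouping described above (simplified; the
actual GCC declarations differ in detail):

#include <cstddef>

struct attribute_spec { const char *name; /* handler, flags, ...  */ };

struct scoped_attribute_specs
{
  const char *ns;               // attribute namespace, e.g. "gnu"
  const attribute_spec *specs;  // the attributes living in that namespace
  std::size_t nspecs;
};

static const attribute_spec gnu_attrs[] = { { "noreturn" }, { "leaf" } };
const scoped_attribute_specs gnu_attr_table = { "gnu", gnu_attrs, 2 };
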
Changes to the D front-end in this patch look reasonable to me.

Regards,
Iain.

> 
> 
> gcc/d/
>   * d-tree.h (d_langhook_attribute_table): Replace with...
>   (d_langhook_gnu_attribute_table): ...this.
>   (d_langhook_common_attribute_table): Change type to
>   scoped_attribute_specs.
>   * d-attribs.cc (d_langhook_common_attribute_table): Change type to
>   scoped_attribute_specs, using...
>   (d_langhook_common_attributes): ...this as the underlying array.
>   (d_langhook_attribute_table): Replace with...
>   (d_langhook_gnu_attributes, d_langhook_gnu_attribute_table): ...these
>   new globals.
>   (uda_attribute_p): Update accordingly, and update for new
>   targetm.attribute_table type.
>   * d-lang.cc (d_langhook_attribute_table): New global.
>   (LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Delete.
> 
> ---
>  gcc/d/d-attribs.cc  |  35 ++---
>  gcc/d/d-lang.cc |   8 +-
>  gcc/d/d-tree.h  |   4 +-
> 
> diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
> index cc46220ddc2..78215bc88bc 100644
> --- a/gcc/d/d-attribs.cc
> +++ b/gcc/d/d-attribs.cc
> @@ -162,7 +162,7 @@ extern const struct attribute_spec::exclusions 
> attr_cold_hot_exclusions[] =
>  
>  /* Table of machine-independent attributes.
> For internal use (marking of built-ins) only.  */
> -const attribute_spec d_langhook_common_attribute_table[] =
> +static const attribute_spec d_langhook_common_attributes[] =
>  {
>ATTR_SPEC ("noreturn", 0, 0, true, false, false, false,
>handle_noreturn_attribute, attr_noreturn_exclusions),
> @@ -190,11 +190,15 @@ const attribute_spec 
> d_langhook_common_attribute_table[] =
>handle_fnspec_attribute, NULL),
>ATTR_SPEC ("omp declare simd", 0, -1, true,  false, false, false,
>handle_omp_declare_simd_attribute, NULL),
> -  ATTR_SPEC (NULL, 0, 0, false, false, false, false, NULL, NULL),
> +};
> +
> +const scoped_attribute_specs d_langhook_common_attribute_table =
> +{
> +  "gnu", d_langhook_common_attributes
>  };
>  
>  /* Table of D language attributes exposed by `gcc.attribute' UDAs.  */
> -const attribute_spec d_langhook_attribute_table[] =
> +static const attribute_spec d_langhook_gnu_attributes[] =
>  {
>ATTR_SPEC ("noinline", 0, 0, true, false, false, false,
>d_handle_noinline_attribute, attr_noinline_exclusions),
> @@ -238,9 +242,12 @@ const attribute_spec d_langhook_attribute_table[] =
>d_handle_used_attribute, NULL),
>ATTR_SPEC ("visibility", 1, 1, false, false, false, false,
>d_handle_visibility_attribute, NULL),
> -  ATTR_SPEC (NULL, 0, 0, false, false, false, false, NULL, NULL),
>  };
>  
> +const scoped_attribute_specs d_langhook_gnu_attribute_table =
> +{
> +  "gnu", d_langhook_gnu_attributes
> +};
>  
>  /* Insert the type attribute ATTRNAME with value VALUE into TYPE.
> Returns a new variant of the original type declaration.  */
> @@ -283,20 +290,14 @@ uda_attribute_p (const char *name)
>  
>/* Search both our language, and target attribute tables.
>   Common and format attributes are kept internal.  */
> -  for (const attribute_spec *p = d_langhook_attribute_table; p->name; p++)
> -{
> -  if (get_identifier (p->name) == ident)
> - return true;
> -}
> +  for (const attribute_spec  : d_langhook_gnu_attributes)
> +if (get_identifier (p.name) == ident)
> +  return true;
>  
> -  if (targetm.attribute_table)
> -{
> -  for (const attribute_spec *p = targetm.attribute_table; p->name; p++)
> - {
> -   if (get_identifier (p->name) == ident)
> - return true;
> - }
> -}
> 

Re: [PATCH v4] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-09-13 Thread Fangrui Song via Gcc-patches
On Tue, Aug 22, 2023 at 12:19 AM Fangrui Song  wrote:
>
> On Tue, Aug 1, 2023 at 12:51 PM Fangrui Song  wrote:
> >
> > When using -mcmodel=medium, large data objects larger than the
> > -mlarge-data-threshold threshold are placed into large data sections
> > (.lrodata, .ldata, .lbss and some variants).  GNU ld and ld.lld 17 place
> > .l* sections into separate output sections.  If small and medium code
> > model object files are mixed, the .l* sections won't exert relocation
> > overflow pressure on sections in object files built with -mcmodel=small.
> >
> > However, when using -mcmodel=large, -mlarge-data-threshold doesn't
> > apply.  This means that the .rodata/.data/.bss sections may exert
> > relocation overflow pressure on sections in -mcmodel=small object files.
> >
> > This patch allows -mcmodel=large to generate .l* sections and drops an
> > unneeded documentation restriction that the value must be the same.
> >
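A hypothetical example (not the testcase from the patch): compiled with
-mcmodel=medium (or, after this patch, -mcmodel=large) and
-mlarge-data-threshold=65536, the object below exceeds the threshold and
should be placed in .lbss rather than .bss.

// big_buffer is 1 MiB, well above a 64 KiB threshold.
static char big_buffer[1 << 20];

int main () { return big_buffer[0]; }
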
> > Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
> > ("Large data sections for the large code model")
> >
> > Signed-off-by: Fangrui Song 
> >
> > ---
> > Changes from v1 
> > (https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616947.html):
> > * Clarify commit message. Add link to 
> > https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
> >
> > Changes from v2
> > * Drop an uneeded limitation in the documentation.
> >
> > Changes from v3
> > * Change scan-assembler directives to use \. to match literal .
> > ---
> >  gcc/config/i386/i386.cc| 15 +--
> >  gcc/config/i386/i386.opt   |  2 +-
> >  gcc/doc/invoke.texi|  6 +++---
> >  gcc/testsuite/gcc.target/i386/large-data.c | 13 +
> >  4 files changed, 26 insertions(+), 10 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/large-data.c
> >
> > [...]
>
> Ping:)

Ping:) https://gcc.gnu.org/pipermail/gcc-patches/2023-August/625993.html

(I don't have write access to gcc.)


-- 
宋方睿


[PATCH] c++: optimize unification of class specializations [PR89231]

2023-09-13 Thread Patrick Palka via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

Since the LHS of a qualified-id is a non-deduced context, it effectively
means we can't deduce from outer template arguments of a class template
specialization.  And checking for equality between the TI_TEMPLATE of a
class specialization parm/arg already implies that the outer template
arguments are the same.  Hence recursing into outer template arguments
during unification of class specializations is redundant, so this patch
makes unify recurse only into innermost arguments.

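A hypothetical illustration of the non-deduced-context rule mentioned
above (not code from the patch): a template parameter appearing only to
the left of :: cannot be deduced.

template<typename T> struct Id { using type = T; };

template<typename T>
void f (typename Id<T>::type) { }

int main ()
{
  // f (42);     // error: T is in a non-deduced context
  f<int> (42);   // OK: T supplied explicitly
}
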
This incidentally fixes the testcase from PR89231 because there
more_specialized_partial_inst considers the two partial specializations
to be unordered ultimately because unify for identical
parm=arg=A::Collect gets confused when it recurses into
parm=arg={Ps...} since the level of Ps doesn't match the innermost level
of tparms that we're actually deducing.

PR c++/89231

gcc/cp/ChangeLog:

* pt.cc (try_class_unification): Strengthen TI_TEMPLATE equality
test by not calling most_general_template.  Only unify the
innermost levels of template arguments.
(unify) : Only unify the innermost levels of
template arguments.  Don't unify template arguments if the
template is not primary.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/variadic-partial3.C: New test.
---
 gcc/cp/pt.cc  | 17 +++--
 .../g++.dg/cpp0x/variadic-partial3.C  | 19 +++
 2 files changed, 30 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/variadic-partial3.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 838179d5fe3..c88e9cd0fa6 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23999,8 +23999,7 @@ try_class_unification (tree tparms, tree targs, tree 
parm, tree arg,
 return NULL_TREE;
   else if (TREE_CODE (parm) == BOUND_TEMPLATE_TEMPLATE_PARM)
 /* Matches anything.  */;
-  else if (most_general_template (CLASSTYPE_TI_TEMPLATE (arg))
-  != most_general_template (CLASSTYPE_TI_TEMPLATE (parm)))
+  else if (CLASSTYPE_TI_TEMPLATE (arg) != CLASSTYPE_TI_TEMPLATE (parm))
 return NULL_TREE;
 
   /* We need to make a new template argument vector for the call to
@@ -24041,8 +24040,10 @@ try_class_unification (tree tparms, tree targs, tree 
parm, tree arg,
   if (TREE_CODE (parm) == BOUND_TEMPLATE_TEMPLATE_PARM)
 err = unify_bound_ttp_args (tparms, targs, parm, arg, explain_p);
   else
-err = unify (tparms, targs, CLASSTYPE_TI_ARGS (parm),
-CLASSTYPE_TI_ARGS (arg), UNIFY_ALLOW_NONE, explain_p);
+err = unify (tparms, targs,
+INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (parm)),
+INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (arg)),
+UNIFY_ALLOW_NONE, explain_p);
 
   return err ? NULL_TREE : arg;
 }
@@ -25167,11 +25168,15 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
/* There's no chance of unification succeeding.  */
return unify_type_mismatch (explain_p, parm, arg);
 
- return unify (tparms, targs, CLASSTYPE_TI_ARGS (parm),
-   CLASSTYPE_TI_ARGS (t), UNIFY_ALLOW_NONE, explain_p);
+ if (PRIMARY_TEMPLATE_P (CLASSTYPE_TI_TEMPLATE (t)))
+   return unify (tparms, targs,
+ INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (parm)),
+ INNERMOST_TEMPLATE_ARGS (CLASSTYPE_TI_ARGS (t)),
+ UNIFY_ALLOW_NONE, explain_p);
}
   else if (!same_type_ignoring_top_level_qualifiers_p (parm, arg))
return unify_type_mismatch (explain_p, parm, arg);
+
   return unify_success (explain_p);
 
 case METHOD_TYPE:
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-partial3.C 
b/gcc/testsuite/g++.dg/cpp0x/variadic-partial3.C
new file mode 100644
index 000..5af60711320
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic-partial3.C
@@ -0,0 +1,19 @@
+// PR c++/89231
+// { dg-do compile { target c++11 } }
+
+template<class T>
+struct A {
+  template<int... Ps>
+  struct Collect { };
+
+  template<int N, int I = 1, class = Collect<>>
+  struct Seq;
+
+  template<int N, int I, int... Ps>
+  struct Seq<N, I, Collect<Ps...>> : Seq<N-1, I+1, Collect<Ps..., I>> { };
+
+  template<int I, int... Ps>
+  struct Seq<0, I, Collect<Ps...>> : Collect<Ps...> { };
+};
+
+A<int>::Seq<4> test;
-- 
2.42.0.158.g94e83dcf5b



[PATCH] c++: unifying identical tmpls from current inst [PR108347]

2023-09-13 Thread Patrick Palka via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

Here more_specialized_partial_spec considers the two partial
specializations to be unordered, ultimately because unify for identical
parm=arg=A<T>::C returns failure due to C being dependent.

This patch fixes this by relaxing unify's early-exit identity test to
also accept dependent decls; we can't deduce anything further from them
anyway.  In passing this patch removes the CONST_DECL case of unify:
we should never see the CONST_DECL version of a template parameter here,
and for other CONST_DECLs (such as enumerators) it seems we can rely on
them already having been folded to their DECL_INITIAL.

PR c++/108347

gcc/cp/ChangeLog:

* pt.cc (unify): Return unify_success for identical dependent
DECL_P 'arg' and 'parm'.
<case CONST_DECL>: Remove handling.

gcc/testsuite/ChangeLog:

* g++.dg/template/ttp41.C: New test.
---
 gcc/cp/pt.cc  | 10 ++
 gcc/testsuite/g++.dg/template/ttp41.C | 17 +
 2 files changed, 19 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/ttp41.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 838179d5fe3..c311a6b88f5 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -24568,7 +24568,8 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
  even if ARG == PARM, since we won't record unifications for the
  template parameters.  We might need them if we're trying to
  figure out which of two things is more specialized.  */
-  if (arg == parm && !uses_template_parms (parm))
+  if (arg == parm
+  && (DECL_P (parm) || !uses_template_parms (parm)))
 return unify_success (explain_p);
 
   /* Handle init lists early, so the rest of the function can assume
@@ -25286,13 +25287,6 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
   return unify (tparms, targs, TREE_TYPE (parm), TREE_TYPE (arg),
strict, explain_p);
 
-case CONST_DECL:
-  if (DECL_TEMPLATE_PARM_P (parm))
-   return unify (tparms, targs, DECL_INITIAL (parm), arg, strict, 
explain_p);
-  if (arg != scalar_constant_value (parm))
-   return unify_template_argument_mismatch (explain_p, parm, arg);
-  return unify_success (explain_p);
-
 case FIELD_DECL:
 case TEMPLATE_DECL:
   /* Matched cases are handled by the ARG == PARM test above.  */
diff --git a/gcc/testsuite/g++.dg/template/ttp41.C 
b/gcc/testsuite/g++.dg/template/ttp41.C
new file mode 100644
index 000..c81e5dd2405
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ttp41.C
@@ -0,0 +1,17 @@
+// PR c++/108347
+
+template<class T>
+struct A {
+  template<class U> struct C { };
+
+  template<template<class> class TT, class U>
+  struct B;
+
+  template<class U>
+  struct B<C, U*>;
+
+  template<class U>
+  struct B<C, const U*> { };
+};
+
+A<int>::B<A<int>::C, const int*> b;
-- 
2.42.0.158.g94e83dcf5b



Re: [PATCH] AArch64: List official cores before codenames

2023-09-13 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> List official cores first so that -mcpu=native does not show a codename with -v
> or in errors/warnings.

Nice spot.

> Passes regress, OK for commit?
>
> gcc/ChangeLog:
> * config/aarch64/aarch64-cores.def (neoverse-n1): Place before ares.
> (neoverse-v1): Place before zeus.
> (neoverse-v2): Place before demeter.
> * config/aarch64/aarch64-tune.md: Regenerate.

OK, thanks.  OK for backports too from my POV.

Richard

> ---
>
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index 
> dbac497ef3aab410eb81db185b2e9532186888bb..3894f2afc27e71523e5a413fa45c144222082934
>  100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -115,8 +115,8 @@ AARCH64_CORE("cortex-a65",  cortexa65, cortexa53, V8_2A,  
> (F16, RCPC, DOTPROD, S
>  AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, V8_2A,  (F16, RCPC, 
> DOTPROD, SSBS), cortexa73, 0x41, 0xd43, -1)
>  AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
> SSBS, PROFILE), neoversen1, 0x41, 0xd44, -1)
>  AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, 
> DOTPROD, SSBS, PROFILE, PAUTH), neoversen1, 0x41, 0xd4c, -1)
> -AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
> PROFILE), neoversen1, 0x41, 0xd0c, -1)
>  AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, V8_2A,  (F16, RCPC, 
> DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
> +AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
> PROFILE), neoversen1, 0x41, 0xd0c, -1)
>  AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, V8_2A,  (F16, RCPC, 
> DOTPROD, SSBS), cortexa73, 0x41, 0xd4a, -1)
>
>  /* Cavium ('C') cores. */
> @@ -143,8 +143,8 @@ AARCH64_CORE("thunderx3t110",  thunderx3t110,  
> thunderx3t110, V8_3A,  (CRYPTO, S
>  /* ARMv8.4-A Architecture Processors.  */
>
>  /* Arm ('A') cores.  */
> -AARCH64_CORE("zeus", zeus, cortexa57, V8_4A,  (SVE, I8MM, BF16, PROFILE, 
> SSBS, RNG), neoversev1, 0x41, 0xd40, -1)
>  AARCH64_CORE("neoverse-v1", neoversev1, cortexa57, V8_4A,  (SVE, I8MM, BF16, 
> PROFILE, SSBS, RNG), neoversev1, 0x41, 0xd40, -1)
> +AARCH64_CORE("zeus", zeus, cortexa57, V8_4A,  (SVE, I8MM, BF16, PROFILE, 
> SSBS, RNG), neoversev1, 0x41, 0xd40, -1)
>  AARCH64_CORE("neoverse-512tvb", neoverse512tvb, cortexa57, V8_4A,  (SVE, 
> I8MM, BF16, PROFILE, SSBS, RNG), neoverse512tvb, INVALID_IMP, INVALID_CORE, 
> -1)
>
>  /* Qualcomm ('Q') cores. */
> @@ -182,7 +182,7 @@ AARCH64_CORE("cortex-x3",  cortexx3, cortexa57, V9A,  
> (SVE2_BITPERM, MEMTAG, I8M
>
>  AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, 
> SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x41, 0xd49, -1)
>
> -AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, 
> RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
>  AARCH64_CORE("neoverse-v2", neoversev2, cortexa57, V9A, (I8MM, BF16, 
> SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
> +AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, 
> RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
>
>  #undef AARCH64_CORE
> diff --git a/gcc/config/aarch64/aarch64-tune.md 
> b/gcc/config/aarch64/aarch64-tune.md
> index 
> 2170980dddb0d5d410a49631ad26ff2e346b39dd..69e5357fa814e4733b05f7164bfa11e4aa04
>  100644
> --- a/gcc/config/aarch64/aarch64-tune.md
> +++ b/gcc/config/aarch64/aarch64-tune.md
> @@ -1,5 +1,5 @@
>  ;; -*- buffer-read-only: t -*-
>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>  (define_attr "tune"
> -   
> "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexx2,cortexx3,neoversen2,demeter,neoversev2"
> +   
> 

Re: gcc-patches From rewriting mailman settings (Was: [Linaro-TCWG-CI] gcc patch #75674: FAIL: 68 regressions)

2023-09-13 Thread Iain Sandoe
Hi Mark,

> On 12 Sep 2023, at 16:00, Mark Wielaard  wrote:

> Adding Jeff to CC who is the official gcc-patches mailinglist admin.
> 
> On Tue, 2023-09-12 at 11:08 +0400, Maxim Kuvyrkov wrote:
>> Normally, notifications from Linaro TCWG precommit CI are sent only to
>> patch author and patch submitter.  In this case the sender was rewritten
>> to "Benjamin Priour via Gcc-patches ",
>> which was detected by Patchwork [1] as patch submitter.
> 
> BTW. Really looking forward to your talk at Cauldron about this!
> 
>> Is "From:" re-write on gcc-patches@ mailing list a side-effect of [2]?
>> I see that some, but not all messages to gcc-patches@ have their
>> "From:" re-written.
>> 
>> Also, do you know if re-write of "From:" on gcc-patches@ is expected?
> 
> Yes, it is expected for emails that come from domains with a dmarc
> policy. That is because the current settings of the gcc-patches
> mailinglist might slightly alter the message or headers in a way that
> invalidates the DKIM signature. Without From rewriting those messages
> would be bounced by recipients that check the dmarc policy/dkim
> signature.
> 
> As you noticed the glibc hackers have recently worked together with the
> sourceware overseers to upgrade mailman and alter the postfix and the
> libc-alpha mailinglist setting so it doesn't require From rewriting
> anymore (the message and header aren't altered anymore to invalidate
> the DKIM signatures).
> 
> We (Jeff or anyone else with mailman admin privs) could use the same
> settings for gcc-patches. The settings that need to be set are in that
> bug:
> 
> - subject_prefix (general): (empty)
> - from_is_list (general): No
> - anonymous_list (general): No
> - first_strip_reply_to (general): No
> - reply_goes_to_list (general): Poster
> - reply_to_address (general): (empty)
> - include_sender_header (general): No
> - drop_cc (general): No
> - msg_header (nondigest): (empty)
> - msg_footer (nondigest): (empty)
> - scrub_nondigest (nondigest): No
> - dmarc_moderation_action (privacy): Accept
> - filter_content (contentfilter): No
> 
> The only visible change (apart from no more From rewriting) is that
> HTML multi-parts aren't scrubbed anymore (that would be a message
> altering issue). The html part is still scrubbed from the
> inbox.sourceware.org archive, so b4 works just fine. But I don't know
> what patchwork.sourceware.org does with HTML attachments. Of course
> people really shouldn't sent HTML attachments to gcc-patches, so maybe
> this is no real problem.
> 
> Let me know if you want Jeff (or me or one of the other overseers) make
> the above changes to the gcc-patches mailman settings.

yes, please!
Iain

> 
> Cheers,
> 
> Mark
> 
>> [1] https://patchwork.sourceware.org/project/gcc/list/
>> [2] https://sourceware.org/bugzilla/show_bug.cgi?id=29713
> 



Re: [PATCH 2/2] libstdc++: Add dg-require-thread-fence in several tests

2023-09-13 Thread Jonathan Wakely via Gcc-patches
On Wed, 13 Sept 2023 at 16:38, Christophe Lyon via Libstdc++
 wrote:
>
> On Tue, 12 Sept 2023 at 11:07, Jonathan Wakely  wrote:
>
> > On Tue, 12 Sept 2023 at 08:59, Christophe Lyon
> >  wrote:
> > > I've noticed several undefined references to
> > __glibcxx_backtrace_create_state too
> > > 19_diagnostics/stacktrace/current.cc
> > > 19_diagnostics/stacktrace/entry.cc
> > > 19_diagnostics/stacktrace/stacktrace.cc
> >
> > Odd. These were changed in r14-3812-gb96b554592c5cb to link to
> > libstdc++exp.a instead of libstdc++_libbacktrace.a, and
> > __glibcxx_backtrace_create_state should be part of libstdc++exp.a now.
> > If the target doesn't support libbacktrace then the symbols will be
> > missing from libstdc++exp.a, but then the test should fail to match
> > the effective target "stacktrace".
> >
> > Strange, it looks like these libs were not correctly rebuilt after I
> rebased to have your patches.
> I've rebuilt from scratch and these undefined references are not present
> indeed.

Great! Thanks for checking.

I sent a proposed patch that should remove most of the other
unnecessary uses of atomics, as suggested yesterday:
https://patchwork.sourceware.org/project/gcc/patch/20230913123226.2083892-1-jwak...@redhat.com/
Would you be able to check whether that improves the results further for arm4t?
I think with that, you should only need the dg-require-thread-fence
for the 9 or so tests under 29_atomics/ which really do require atomic
synchronization.
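
For reference, the directive in question as it would appear at the top of
such a test (the placement shown here is illustrative):

// { dg-do compile }
// { dg-require-thread-fence "" }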


Re: [PATCH 2/2] libstdc++: Add dg-require-thread-fence in several tests

2023-09-13 Thread Christophe Lyon via Gcc-patches
On Tue, 12 Sept 2023 at 11:07, Jonathan Wakely  wrote:

> On Tue, 12 Sept 2023 at 08:59, Christophe Lyon
>  wrote:
> >
> >
> >
> > On Mon, 11 Sept 2023 at 18:11, Jonathan Wakely 
> wrote:
> >>
> >> On Mon, 11 Sept 2023 at 16:40, Christophe Lyon
> >>  wrote:
> >> >
> >> >
> >> >
> >> > On Mon, 11 Sept 2023 at 17:22, Jonathan Wakely 
> wrote:
> >> >>
> >> >> On Mon, 11 Sept 2023 at 14:57, Christophe Lyon
> >> >>  wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Mon, 11 Sept 2023 at 15:12, Jonathan Wakely 
> wrote:
> >> >> >>
> >> >> >> On Mon, 11 Sept 2023 at 13:36, Christophe Lyon
> >> >> >>  wrote:
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > On Mon, 11 Sept 2023 at 12:59, Jonathan Wakely <
> jwak...@redhat.com> wrote:
> >> >> >> >>
> >> >> >> >> On Sun, 10 Sept 2023 at 20:31, Christophe Lyon
> >> >> >> >>  wrote:
> >> >> >> >> >
> >> >> >> >> > Some targets like arm-eabi with newlib and default settings
> rely on
> >> >> >> >> > __sync_synchronize() to ensure synchronization.  Newlib does
> not
> >> >> >> >> > implement it by default, to make users aware they have to
> take special
> >> >> >> >> > care.
> >> >> >> >> >
> >> >> >> >> > This makes a few tests fail to link.
> >> >> >> >>
> >> >> >> >> Does this mean those features are unusable on the target, or
> just that
> >> >> >> >> users need to provide their own __sync_synchronize to use them?
> >> >> >> >
> >> >> >> >
> >> >> >> > IIUC the user is expected to provide them.
> >> >> >> > Looks like we discussed this in the past :-)
> >> >> >> > In
> https://gcc.gnu.org/legacy-ml/gcc-patches/2016-10/msg01632.html,
> >> >> >> > see the pointer to Ramana's comment:
> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02751.html
> >> >> >>
> >> >> >> Oh yes, thanks for the reminder!
> >> >> >>
> >> >> >> >
> >> >> >> > The default arch for arm-eabi is armv4t which is very old.
> >> >> >> > When running the testsuite with something more recent (either
> as default by configuring GCC --with-arch=XXX or by forcing -march/-mcpu
> via dejagnu's target-board), the compiler generates barrier instructions
> and there are no such errors.
> >> >> >>
> >> >> >> Ah yes, that's fine then.
> >> >> >>
> >> >> >> > For instance, here is a log with the defaults:
> >> >> >> >
> https://git.linaro.org/toolchain/ci/base-artifacts/tcwg_gnu_embed_check_gcc/master-arm_eabi.git/tree/00-sumfiles?h=linaro-local/ci/tcwg_gnu_embed_check_gcc/master-arm_eabi
> >> >> >> > and a log when we target cortex-m0 which is still a very small
> cpu but has barriers:
> >> >> >> >
> https://git.linaro.org/toolchain/ci/base-artifacts/tcwg_gnu_embed_check_gcc/master-thumb_m0_eabi.git/tree/00-sumfiles?h=linaro-local/ci/tcwg_gnu_embed_check_gcc/master-thumb_m0_eabi
> >> >> >> >
> >> >> >> > I somehow wanted to get rid of such errors with the default
> configuration
> >> >> >>
> >> >> >> Yep, that makes sense, and we'll still be testing them for newer
> >> >> >> arches on the target, so it's not completely disabling those
> parts of
> >> >> >> the testsuite.
> >> >> >>
> >> >> >> But I'm still curious why some of those tests need this change. I
> >> >> >> think the ones I noted below are probably failing for some other
> >> >> >> reasons.
> >> >> >>
> >> >> > Just looked at  23_containers/span/back_assert_neg.cc, the linker
> says it needs
> >> >> > arm-eabi/libstdc++-v3/src/.libs/libstdc++.a(debug.o) to resolve
> >> >> > ./back_assert_neg-back_assert_neg.o
> (std::__glibcxx_assert_fail(char const*, int, char const*, char const*))
> >> >> > and indeed debug.o has a reference to __sync_synchronize
> >> >>
> >> >> Aha, that's just because I put __glibcxx_assert_fail in debug.o, but
> >> >> there are no dependencies on anything else in that file, including
> the
> >> >> _M_detach member function that uses atomics.
> >> >
> >> > indeed
> >> >
> >> >
> >> >>
> >> >> This would also be solved by -Wl,--gc-sections :-)
> >> >
> >> > :-)
> >> >
> >> >>
> >> >> I think it would be better to move __glibcxx_assert_fail to a new
> >> >> file, so that it doesn't make every assertion unnecessarily depend on
> >> >> __sync_synchronize. I'll do that now.
> >> >
> >> > Sounds like a good idea, thanks.
> >>
> >> Done now at r14-3846-g4a2766ed00a479
> >> >
> >> >>
> >> >> We could also make the atomics in debug.o conditional, so that debug
> >> >> mode doesn't depend on __sync_synchronize for single-threaded
> targets.
> >> >> Does the arm4t arch have pthreads support in newlib?  I didn't bother
> >> >
> >> > No ( grep _GLIBCXX_HAS_GTHREADS
> $objdir/arm-eabi/libstdc++-v3/include/arm-eabi/bits/c++config returns:
> >> > /* #undef _GLIBCXX_HAS_GTHREADS */
> >> >
> >> >> making the use of atomics conditional, because performance is not
> >> >> really a priority for debug mode bookkeeping. But the problem here
> >> >> isn't just a slight performance overhead of atomics, it's that they
> >> >> aren't even supported for arm4t.
> >> >
> >> > OK thanks.
> >> >
> >> > So finally, this uncovered at least a "bug" that
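
For background, a sketch of the kind of stub newlib users are expected to
supply on such targets; this is a hypothetical definition, and an empty
barrier is only plausible on a single-core system with no other observers
(DMA, other masters, etc.):

// Hypothetical user-supplied barrier for a bare-metal, single-core
// ARMv4T target; correct only if nothing else observes memory ordering.
extern "C" void __sync_synchronize (void) { }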
> 

Re: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]

2023-09-13 Thread Wilco Dijkstra via Gcc-patches

ping

From: Wilco Dijkstra
Sent: 02 June 2023 18:28
To: GCC Patches 
Cc: Richard Sandiford ; Kyrylo Tkachov 

Subject: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 
[PR110061] 
 

Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible with
existing binaries, gives better performance than locking atomics and is what
most users expect.

Note 128-bit atomic loads use a load/store exclusive loop if LSE2 is not
supported.  This results in an implicit store which is invisible to software
as long as the given address is writeable (which will be true when using
atomics in actual code).

A simple test on an old Cortex-A72 showed 2.7x speedup of 128-bit atomics.
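
As an illustration (not part of the patch), the kind of operation this makes
lock-free; the function name below is made up:

// Resolves to libat_load_16 above; with this patch it no longer
// falls back to a locked implementation on cores without LSE2.
__int128 load128 (__int128 *p)
{
  return __atomic_load_n (p, __ATOMIC_SEQ_CST);
}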

Passes regress, OK for commit?

libatomic/
	PR target/110061
	* config/linux/aarch64/atomic_16.S: Implement lock-free ARMv8.0 atomics.
	* config/linux/aarch64/host-config.h: Use atomic_16.S for baseline v8.0.
	State we have lock-free atomics.

---

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index 
05439ce394b9653c9bcb582761ff7aaa7c8f9643..0485c284117edf54f41959d2fab9341a9567b1cf
 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -22,6 +22,21 @@
    .  */
 
 
+/* AArch64 128-bit lock-free atomic implementation.
+
+   128-bit atomics are now lock-free for all AArch64 architecture versions.
+   This is backwards compatible with existing binaries and gives better
+   performance than locking atomics.
+
+   128-bit atomic loads use an exclusive loop if LSE2 is not supported.
+   This results in an implicit store which is invisible to software as long
+   as the given address is writeable.  Since all other atomics have explicit
+   writes, this will be true when using atomics in actual code.
+
+   The libat_<op>_16 entry points are ARMv8.0.
+   The libat_<op>_16_i1 entry points are used when LSE2 is available.  */
+
+
 .arch   armv8-a+lse
 
 #define ENTRY(name) \
@@ -37,6 +52,10 @@ name:    \
 .cfi_endproc;   \
 .size name, .-name;
 
+#define ALIAS(alias,name)  \
+   .global alias;  \
+   .set alias, name;
+
 #define res0 x0
 #define res1 x1
 #define in0  x2
@@ -70,6 +89,24 @@ name:    \
 #define SEQ_CST 5
 
 
+ENTRY (libat_load_16)
+   mov x5, x0
+   cbnz    w1, 2f
+
+   /* RELAXED.  */
+1: ldxp    res0, res1, [x5]
+   stxp    w4, res0, res1, [x5]
+   cbnz    w4, 1b
+   ret
+
+   /* ACQUIRE/CONSUME/SEQ_CST.  */
+2: ldaxp   res0, res1, [x5]
+   stxp    w4, res0, res1, [x5]
+   cbnz    w4, 2b
+   ret
+END (libat_load_16)
+
+
 ENTRY (libat_load_16_i1)
 cbnz    w1, 1f
 
@@ -93,6 +130,23 @@ ENTRY (libat_load_16_i1)
 END (libat_load_16_i1)
 
 
+ENTRY (libat_store_16)
+   cbnz    w4, 2f
+
+   /* RELAXED.  */
+1: ldxp    xzr, tmp0, [x0]
+   stxp    w4, in0, in1, [x0]
+   cbnz    w4, 1b
+   ret
+
+   /* RELEASE/SEQ_CST.  */
+2: ldxp    xzr, tmp0, [x0]
+   stlxp   w4, in0, in1, [x0]
+   cbnz    w4, 2b
+   ret
+END (libat_store_16)
+
+
 ENTRY (libat_store_16_i1)
 cbnz    w4, 1f
 
@@ -101,14 +155,14 @@ ENTRY (libat_store_16_i1)
 ret
 
 /* RELEASE/SEQ_CST.  */
-1: ldaxp   xzr, tmp0, [x0]
+1: ldxp    xzr, tmp0, [x0]
 stlxp   w4, in0, in1, [x0]
 cbnz    w4, 1b
 ret
 END (libat_store_16_i1)
 
 
-ENTRY (libat_exchange_16_i1)
+ENTRY (libat_exchange_16)
 mov x5, x0
 cbnz    w4, 2f
 
@@ -126,22 +180,55 @@ ENTRY (libat_exchange_16_i1)
 stxp    w4, in0, in1, [x5]
 cbnz    w4, 3b
 ret
-4:
-   cmp w4, RELEASE
-   b.ne    6f
 
-   /* RELEASE.  */
-5: ldxp    res0, res1, [x5]
+   /* RELEASE/ACQ_REL/SEQ_CST.  */
+4: ldaxp   res0, res1, [x5]
 stlxp   w4, in0, in1, [x5]
-   cbnz    w4, 5b
+   cbnz    w4, 4b
 ret
+END (libat_exchange_16)
 
-   /* ACQ_REL/SEQ_CST.  */
-6: ldaxp   res0, res1, [x5]
-   stlxp   w4, in0, in1, [x5]
-   cbnz    w4, 6b
+
+ENTRY (libat_compare_exchange_16)
+   ldp exp0, exp1, [x1]
+   cbz w4, 3f
+   cmp w4, RELEASE
+   b.hs    4f
+
+   /* ACQUIRE/CONSUME.  */
+1: ldaxp   tmp0, tmp1, [x0]
+   cmp tmp0, exp0
+   ccmp    tmp1, exp1, 0, eq
+   bne 2f
+   stxp    w4, in0, in1, [x0]
+   cbnz    w4, 1b
+   mov x0, 1
 ret
-END (libat_exchange_16_i1)
+
+2: stp tmp0, tmp1, [x1]
+   mov x0, 0
+   ret
+
+   /* RELAXED.  */
+3: ldxp    tmp0, tmp1, [x0]
+   cmp tmp0, exp0
+   ccmp    tmp1, exp1, 0, eq
+   bne 2b
+   stxp    w4, in0, in1, [x0]
+   cbnz    w4, 3b
+   mov x0, 1
+   ret
+
+   /* RELEASE/ACQ_REL/SEQ_CST.  */
+4: ldaxp   tmp0, tmp1, 

Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-09-13 Thread Wilco Dijkstra via Gcc-patches

ping


From: Wilco Dijkstra
Sent: 04 August 2023 16:05
To: GCC Patches ; Richard Sandiford 

Cc: Kyrylo Tkachov 
Subject: [PATCH] libatomic: Improve ifunc selection on AArch64 
 

Add support for ifunc selection based on CPUID register.  Neoverse N1 supports
atomic 128-bit load/store, so use the FEAT_USCAT ifunc like newer Neoverse
cores.

Passes regress, OK for commit?

libatomic/
	* config/linux/aarch64/host-config.h (ifunc1): Use CPUID in ifunc
	selection.

---

diff --git a/libatomic/config/linux/aarch64/host-config.h 
b/libatomic/config/linux/aarch64/host-config.h
index 
851c78c01cd643318aaa52929ce4550266238b79..e5dc33c030a4bab927874fa6c69425db463fdc4b
 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -26,7 +26,7 @@
 
 #ifdef HWCAP_USCAT
 # if N == 16
-#  define IFUNC_COND_1 (hwcap & HWCAP_USCAT)
+#  define IFUNC_COND_1 ifunc1 (hwcap)
 # else
 #  define IFUNC_COND_1  (hwcap & HWCAP_ATOMICS)
 # endif
@@ -50,4 +50,28 @@
 #undef MAYBE_HAVE_ATOMIC_EXCHANGE_16
 #define MAYBE_HAVE_ATOMIC_EXCHANGE_16   1
 
+#ifdef HWCAP_USCAT
+
+#define MIDR_IMPLEMENTOR(midr) (((midr) >> 24) & 255)
+#define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff)
+
+static inline bool
+ifunc1 (unsigned long hwcap)
+{
+  if (hwcap & HWCAP_USCAT)
+    return true;
+  if (!(hwcap & HWCAP_CPUID))
+    return false;
+
+  unsigned long midr;
+  asm volatile ("mrs %0, midr_el1" : "=r" (midr));
+
+  /* Neoverse N1 supports atomic 128-bit load/store.  */
+  if (MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM(midr) == 0xd0c)
+    return true;
+
+  return false;
+}
+#endif
+
 #include_next <host-config.h>


[PATCH] AArch64: Fix __sync_val_compare_and_swap [PR111404]

2023-09-13 Thread Wilco Dijkstra via Gcc-patches

__sync_val_compare_and_swap may be used on 128-bit types and either calls the
outline atomic code or uses an inline loop.  On AArch64 LDXP is only atomic if
the value is stored successfully using STXP, but the current implementations
do not perform the store if the comparison fails.  In this case the value
returned is not read atomically.
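
A hypothetical caller showing the requirement (the names are made up):

// Even when the comparison fails, the 128-bit value returned here must
// be a single-copy-atomic read of *p; on AArch64 that only holds if
// the LDXP is confirmed by a successful STXP.
__int128 observe_or_set (__int128 *p, __int128 expected, __int128 desired)
{
  return __sync_val_compare_and_swap (p, expected, desired);
}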

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog/
PR target/111404
* config/aarch64/aarch64.cc (aarch64_split_compare_and_swap):
For 128-bit store the loaded value and loop if needed.

libgcc/ChangeLog/
PR target/111404
* config/aarch64/lse.S (__aarch64_cas16_acq_rel): Execute STLXP using
either new value or loaded value.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
5e8d0a0c91bc7719de2a8c5627b354cf905a4db0..c44c0b979d0cc3755c61dcf566cfddedccebf1ea
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23413,11 +23413,11 @@ aarch64_split_compare_and_swap (rtx operands[])
   mem = operands[1];
   oldval = operands[2];
   newval = operands[3];
-  is_weak = (operands[4] != const0_rtx);
   model_rtx = operands[5];
   scratch = operands[7];
   mode = GET_MODE (mem);
   model = memmodel_from_int (INTVAL (model_rtx));
+  is_weak = operands[4] != const0_rtx && mode != TImode;
 
   /* When OLDVAL is zero and we want the strong version we can emit a tighter
 loop:
@@ -23478,6 +23478,33 @@ aarch64_split_compare_and_swap (rtx operands[])
   else
 aarch64_gen_compare_reg (NE, scratch, const0_rtx);
 
+  /* 128-bit LDAXP is not atomic unless STLXP succeeds.  So for a mismatch,
+ store the returned value and loop if the STLXP fails.  */
+  if (mode == TImode)
+{
+  rtx_code_label *label3 = gen_label_rtx ();
+  emit_jump_insn (gen_rtx_SET (pc_rtx, gen_rtx_LABEL_REF (Pmode, label3)));
+  emit_barrier ();
+
+  emit_label (label2);
+  aarch64_emit_store_exclusive (mode, scratch, mem, rval, model_rtx);
+
+  if (aarch64_track_speculation)
+   {
+ /* Emit an explicit compare instruction, so that we can correctly
+track the condition codes.  */
+ rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
+ x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+   }
+  else
+   x = gen_rtx_NE (VOIDmode, scratch, const0_rtx);
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+   gen_rtx_LABEL_REF (Pmode, label1), pc_rtx);
+  aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+
+  label2 = label3;
+}
+
   emit_label (label2);
 
   /* If we used a CBNZ in the exchange loop emit an explicit compare with RVAL
diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
index 
dde3a28e07b13669533dfc5e8fac0a9a6ac33dbd..ba05047ff02b6fc5752235bffa924fc4a2f48c04
 100644
--- a/libgcc/config/aarch64/lse.S
+++ b/libgcc/config/aarch64/lse.S
@@ -160,6 +160,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 #define tmp0   16
 #define tmp1   17
 #define tmp2   15
+#define tmp3   14
+#define tmp4   13
 
 #define BTI_C  hint34
 
@@ -233,10 +235,11 @@ STARTFN   NAME(cas)
 0: LDXPx0, x1, [x4]
cmp x0, x(tmp0)
ccmpx1, x(tmp1), #0, eq
-   bne 1f
-   STXPw(tmp2), x2, x3, [x4]
-   cbnzw(tmp2), 0b
-1: BARRIER
+   cselx(tmp2), x2, x0, eq
+   cselx(tmp3), x3, x1, eq
+   STXPw(tmp4), x(tmp2), x(tmp3), [x4]
+   cbnzw(tmp4), 0b
+   BARRIER
ret
 
 #endif



[AArch64][testsuite] Adjust vect_copy_lane_1.c for new code-gen

2023-09-13 Thread Prathamesh Kulkarni via Gcc-patches
Hi,
After 27de9aa152141e7f3ee66372647d0f2cd94c4b90, there's the following regression:
FAIL: gcc.target/aarch64/vect_copy_lane_1.c scan-assembler-times
ins\\tv0.s\\[1\\], v1.s\\[0\\] 3

This happens because for the following function from vect_copy_lane_1.c:
float32x2_t
__attribute__((noinline, noclone)) test_copy_lane_f32 (float32x2_t a,
float32x2_t b)
{
  return vcopy_lane_f32 (a, 1, b, 0);
}

Before 27de9aa152141e7f3ee66372647d0f2cd94c4b90,
it got lowered to the following sequence in the .optimized dump:
  <bb 2> [local count: 1073741824]:
  _4 = BIT_FIELD_REF <b_3(D), 32, 0>;
  __a_5 = BIT_INSERT_EXPR <a_2(D), _4, 32>;
  return __a_5;

The above commit simplifies BIT_FIELD_REF + BIT_INSERT_EXPR
to a vector permutation, and it now gets lowered to:

  <bb 2> [local count: 1073741824]:
  __a_4 = VEC_PERM_EXPR <a_2(D), b_3(D), { 0, 2 }>;
  return __a_4;

Since we give higher priority to aarch64_evpc_zip over aarch64_evpc_ins
in aarch64_expand_vec_perm_const_1, it now generates:

test_copy_lane_f32:
zip1v0.2s, v0.2s, v1.2s
ret

Similarly for test_copy_lane_[us]32.
The attached patch adjusts the tests to reflect the change in code-gen
and the tests pass.
OK to commit ?

Thanks,
Prathamesh
diff --git a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c 
b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
index 2848be564d5..811dc678b92 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
@@ -22,7 +22,7 @@ BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
 BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
 BUILD_TEST (int32x2_t,   int32x2_t,   , , s32, 1, 0)
 BUILD_TEST (uint32x2_t,  uint32x2_t,  , , u32, 1, 0)
-/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } 
} */
+/* { dg-final { scan-assembler-times "zip1\\tv0.2s, v0.2s, v1.2s" 3 } } */
 BUILD_TEST (int64x1_t,   int64x1_t,   , , s64, 0, 0)
 BUILD_TEST (uint64x1_t,  uint64x1_t,  , , u64, 0, 0)
 BUILD_TEST (float64x1_t, float64x1_t, , , f64, 0, 0)


[PATCH] AArch64: List official cores before codenames

2023-09-13 Thread Wilco Dijkstra via Gcc-patches
List official cores first so that -mcpu=native does not show a codename with -v
or in errors/warnings.

Passes regress, OK for commit?

gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (neoverse-n1): Place before ares.
(neoverse-v1): Place before zeus.
(neoverse-v2): Place before demeter.
* config/aarch64/aarch64-tune.md: Regenerate.

---

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
dbac497ef3aab410eb81db185b2e9532186888bb..3894f2afc27e71523e5a413fa45c144222082934
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -115,8 +115,8 @@ AARCH64_CORE("cortex-a65",  cortexa65, cortexa53, V8_2A,  
(F16, RCPC, DOTPROD, S
 AARCH64_CORE("cortex-a65ae",  cortexa65ae, cortexa53, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), cortexa73, 0x41, 0xd43, -1)
 AARCH64_CORE("cortex-x1",  cortexx1, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE), neoversen1, 0x41, 0xd44, -1)
 AARCH64_CORE("cortex-x1c",  cortexx1c, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, 
SSBS, PROFILE, PAUTH), neoversen1, 0x41, 0xd4c, -1)
-AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), 
neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-n1",  neoversen1, cortexa57, V8_2A,  (F16, RCPC, 
DOTPROD, PROFILE), neoversen1, 0x41, 0xd0c, -1)
+AARCH64_CORE("ares",  ares, cortexa57, V8_2A,  (F16, RCPC, DOTPROD, PROFILE), 
neoversen1, 0x41, 0xd0c, -1)
 AARCH64_CORE("neoverse-e1",  neoversee1, cortexa53, V8_2A,  (F16, RCPC, 
DOTPROD, SSBS), cortexa73, 0x41, 0xd4a, -1)
 
 /* Cavium ('C') cores. */
@@ -143,8 +143,8 @@ AARCH64_CORE("thunderx3t110",  thunderx3t110,  
thunderx3t110, V8_3A,  (CRYPTO, S
 /* ARMv8.4-A Architecture Processors.  */
 
 /* Arm ('A') cores.  */
-AARCH64_CORE("zeus", zeus, cortexa57, V8_4A,  (SVE, I8MM, BF16, PROFILE, SSBS, 
RNG), neoversev1, 0x41, 0xd40, -1)
 AARCH64_CORE("neoverse-v1", neoversev1, cortexa57, V8_4A,  (SVE, I8MM, BF16, 
PROFILE, SSBS, RNG), neoversev1, 0x41, 0xd40, -1)
+AARCH64_CORE("zeus", zeus, cortexa57, V8_4A,  (SVE, I8MM, BF16, PROFILE, SSBS, 
RNG), neoversev1, 0x41, 0xd40, -1)
 AARCH64_CORE("neoverse-512tvb", neoverse512tvb, cortexa57, V8_4A,  (SVE, I8MM, 
BF16, PROFILE, SSBS, RNG), neoverse512tvb, INVALID_IMP, INVALID_CORE, -1)
 
 /* Qualcomm ('Q') cores. */
@@ -182,7 +182,7 @@ AARCH64_CORE("cortex-x3",  cortexx3, cortexa57, V9A,  
(SVE2_BITPERM, MEMTAG, I8M
 
 AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, 
SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x41, 0xd49, -1)
 
-AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, 
RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
 AARCH64_CORE("neoverse-v2", neoversev2, cortexa57, V9A, (I8MM, BF16, 
SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
+AARCH64_CORE("demeter", demeter, cortexa57, V9A, (I8MM, BF16, SVE2_BITPERM, 
RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
 
 #undef AARCH64_CORE
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 
2170980dddb0d5d410a49631ad26ff2e346b39dd..69e5357fa814e4733b05f7164bfa11e4aa04
 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,ares,neoversen1,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,zeus,neoversev1,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexx2,cortexx3,neoversen2,demeter,neoversev2"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexx2,cortexx3,neoversen2,neoversev2,demeter"
(const (symbol_ref "((enum 

Re: [PATCH] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-13 Thread juzhe.zh...@rivai.ai
Just realized this patch causes some unexpected ICE FAILs in the GCC
regression testsuite.

Now, V2:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630194.html
has fully passed the regression tests.




juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-09-13 21:01
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Expand VLS mode to scalar mode move[PR111391]
This patch fixes PR111391: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391
 
PR target/111391
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_legitimize_move): Expand VLS to scalar move.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
* gcc.target/riscv/rvv/autovec/pr111391.c: New test.
 
---
gcc/config/riscv/riscv.cc | 29 +++
.../riscv/rvv/autovec/partial/slp-9.c |  1 -
.../gcc.target/riscv/rvv/autovec/pr111391.c   | 28 ++
3 files changed, 57 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9d04ddd69e0..b7daad7cbb5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2513,6 +2513,35 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx 
src)
}
   return true;
 }
+  /* Expand
+   (set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
+ Expand this data movement instead of simply forbid it since
+ we can improve the code generation for this following scenario
+ by RVV auto-vectorization:
+   (set (reg:V8QI 149) (vec_duplicate:V8QI (reg:QI)))
+   (set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
+ Since RVV mode and scalar mode are in different REG_CLASS,
+ we need to explicitly move data from V_REGS to GR_REGS by scalar move.  */
+  if (SUBREG_P (src) && riscv_v_ext_mode_p (GET_MODE (SUBREG_REG (src))))
+{
+  rtx subreg = force_reg (GET_MODE (SUBREG_REG (src)), SUBREG_REG (src));
+  machine_mode imode = GET_MODE_INNER (GET_MODE (subreg));
+  unsigned int ratio = GET_MODE_SIZE (mode).to_constant ()
+/ GET_MODE_SIZE (imode).to_constant ();
+  poly_int64 nunits = GET_MODE_NUNITS (GET_MODE (subreg));
+  nunits = exact_div (nunits, ratio);
+  scalar_mode smode = as_a <scalar_mode> (mode);
+  machine_mode vmode
+ = riscv_vector::get_vector_mode (smode, nunits).require ();
+  rtx tmp = gen_reg_rtx (mode);
+  rtx index
+ = gen_int_mode (exact_div (SUBREG_BYTE (src), GET_MODE_SIZE (smode)),
+ Pmode);
+  emit_insn (gen_vec_extract (vmode, vmode, tmp,
+   gen_lowpart (vmode, subreg), index));
+  emit_move_insn (dest, tmp);
+  return true;
+}
   /* Expand
(set (reg:QI target) (mem:QI (address)))
  to
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
index 5fba27c7a35..7c42438c9d9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
@@ -29,4 +29,3 @@
TEST_ALL (VEC_PERM)
/* { dg-final { scan-assembler-times {viota.m} 2 } } */
-/* { dg-final { scan-assembler-not {vmv\.v\.i} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
new file mode 100644
index 000..a7f64c937c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -Wno-int-conversion 
-Wno-implicit-function -Wno-incompatible-pointer-types 
-Wno-implicit-function-declaration -Ofast -ftree-vectorize" } */
+
+int d ();
+typedef struct
+{
+  int b;
+} c;
+int
+e (char *f, long g)
+{
+  f += g;
+  while (g--)
+*--f = d;
+}
+
+int
+d (c * f)
+{
+  while (h ())
+switch (f->b)
+  case 'Q':
+  {
+ long a;
+ e (&a, sizeof (a));
+ i (a);
+  }
+}
-- 
2.36.3
 


[PATCH V2] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-13 Thread Juzhe-Zhong
This patch fixes PR111391: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391

PR target/111391

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Expand VLS to scalar 
move.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
* gcc.target/riscv/rvv/autovec/pr111391.c: New test.

---
 gcc/config/riscv/riscv.cc | 29 +++
 .../riscv/rvv/autovec/partial/slp-9.c |  1 -
 .../gcc.target/riscv/rvv/autovec/pr111391.c   | 28 ++
 3 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9d04ddd69e0..becaec9ebb6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2513,6 +2513,35 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx 
src)
}
   return true;
 }
+  /* Expand
+   (set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
+ Expand this data movement instead of simply forbid it since
+ we can improve the code generation for this following scenario
+ by RVV auto-vectorization:
+   (set (reg:V8QI 149) (vec_duplicate:V8QI (reg:QI)))
+   (set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
+ Since RVV mode and scalar mode are in different REG_CLASS,
+ we need to explicitly move data from V_REGS to GR_REGS by scalar move.  */
+  if (SUBREG_P (src) && riscv_v_ext_mode_p (GET_MODE (SUBREG_REG (src))))
+{
+  rtx reg = SUBREG_REG (src);
+  machine_mode rmode = GET_MODE (reg);
+  if (!register_operand (reg, rmode))
+   reg = force_reg (rmode, SUBREG_REG (src));
+  unsigned int nunits = GET_MODE_SIZE (rmode).to_constant ()
+   / GET_MODE_SIZE (mode).to_constant ();
+  machine_mode vmode
+   = riscv_vector::get_vector_mode (as_a <scalar_mode> (mode), nunits)
+   .require ();
+  rtx tmp = gen_reg_rtx (mode);
+  rtx index = gen_int_mode (SUBREG_BYTE (src).to_constant ()
+ / GET_MODE_SIZE (mode).to_constant (),
+   Pmode);
+  emit_insn (
+   gen_vec_extract (vmode, vmode, tmp, gen_lowpart (vmode, reg), index));
+  emit_move_insn (dest, tmp);
+  return true;
+}
   /* Expand
(set (reg:QI target) (mem:QI (address)))
  to
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
index 5fba27c7a35..7c42438c9d9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
@@ -29,4 +29,3 @@
 TEST_ALL (VEC_PERM)
 
 /* { dg-final { scan-assembler-times {viota.m} 2 } } */
-/* { dg-final { scan-assembler-not {vmv\.v\.i} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
new file mode 100644
index 000..a7f64c937c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -Wno-int-conversion 
-Wno-implicit-function -Wno-incompatible-pointer-types 
-Wno-implicit-function-declaration -Ofast -ftree-vectorize" } */
+
+int d ();
+typedef struct
+{
+  int b;
+} c;
+int
+e (char *f, long g)
+{
+  f += g;
+  while (g--)
+*--f = d;
+}
+
+int
+d (c * f)
+{
+  while (h ())
+switch (f->b)
+  case 'Q':
+  {
+   long a;
+   e (&a, sizeof (a));
+   i (a);
+  }
+}
-- 
2.36.3



Re: [PATCH] Checking undefined_p before using the vr

2023-09-13 Thread Andrew MacLeod via Gcc-patches



On 9/12/23 21:42, Jiufu Guo wrote:

Hi,

Richard Biener  writes:


On Thu, 7 Sep 2023, Jiufu Guo wrote:


Hi,

As discussed in PR111303:

For pattern "(X + C) / N": "(div (plus@3 @0 INTEGER_CST@1) INTEGER_CST@2)",
Even if "X" has value-range and "X + C" does not overflow, "@3" may still
be undefined. Like below example:

_3 = _2 + -5;
if (0 != 0)
   goto <bb 3>; [34.00%]
else
   goto <bb 4>; [66.00%]
;;  succ:   3
;;  4

;; basic block 3, loop depth 0
;;  pred:   2
_5 = _3 / 5;
;;  succ:   4

The whole pattern "(_2 + -5 ) / 5" is in "bb 3", but "bb 3" would be
unreachable (because "if (0 != 0)" is always false).
And "get_range_query (cfun)->range_of_expr (vr3, @3)" is checked in
"bb 3", where "range_of_expr" gets an "undefined vr3" ("@3" is "_5").

So, before using "vr3", it would be safe to check "!vr3.undefined_p ()".

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this ok for trunk?

OK, but I wonder why ->range_of_expr () doesn't return false for
undefined_p ()?  While "undefined" technically means we can treat
it as nonnegative_p (or not, maybe but maybe not both), we seem to
not want to do that.  So why expose it at all to ranger users
(yes, internally we in some places want to handle undefined).

I guess, currently, it returns true and then lets the user check
undefined_p, maybe because it tries to only return false if the
type of EXPR is unsupported.


false is returned if no range can be calculated for any reason. The most 
common ones are unsupported types or in some cases, statements that are 
not understood.  FALSE means you cannot use the range being passed in.




Letting "range_of_expr" return false for undefined_p would save checking
undefined_p again when using the APIs.

undefined is a perfectly acceptable range.  It can be used to represent
either values which have not been initialized, or more frequently it
identifies values that cannot occur due to conflicting/unreachable
code.  VARYING means it can be any range, UNDEFINED means this is
unusable, so treat it accordingly.  It's propagated like any other range.


The only reason you are having issues is you are then asking for the 
type of the range, and an undefined range currently has no type, for 
historical reasons.
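
A sketch of the caller-side idiom that follows from this, assuming the
ranger API as discussed above:

// range_of_expr returning true does not rule out an UNDEFINED range,
// so check undefined_p () before relying on the range or its type.
int_range_max vr;
if (get_range_query (cfun)->range_of_expr (vr, expr, stmt)
    && !vr.undefined_p ())
  {
    // vr is a usable, typed range here.
  }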


Andrew

Andrew




[PATCH] RISC-V: Expand VLS mode to scalar mode move[PR111391]

2023-09-13 Thread Juzhe-Zhong
This patch fixes PR111391: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391

PR target/111391

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Expand VLS to scalar 
move.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/slp-9.c: Adapt test.
* gcc.target/riscv/rvv/autovec/pr111391.c: New test.

---
 gcc/config/riscv/riscv.cc | 29 +++
 .../riscv/rvv/autovec/partial/slp-9.c |  1 -
 .../gcc.target/riscv/rvv/autovec/pr111391.c   | 28 ++
 3 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9d04ddd69e0..b7daad7cbb5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2513,6 +2513,35 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx 
src)
}
   return true;
 }
+  /* Expand
+   (set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
+ Expand this data movement instead of simply forbid it since
+ we can improve the code generation for this following scenario
+ by RVV auto-vectorization:
+   (set (reg:V8QI 149) (vec_duplicate:V8QI (reg:QI)))
+   (set (reg:DI target) (subreg:DI (reg:V8QI reg) 0))
+ Since RVV mode and scalar mode are in different REG_CLASS,
+ we need to explicitly move data from V_REGS to GR_REGS by scalar move.  */
+  if (SUBREG_P (src) && riscv_v_ext_mode_p (GET_MODE (SUBREG_REG (src))))
+{
+  rtx subreg = force_reg (GET_MODE (SUBREG_REG (src)), SUBREG_REG (src));
+  machine_mode imode = GET_MODE_INNER (GET_MODE (subreg));
+  unsigned int ratio = GET_MODE_SIZE (mode).to_constant ()
+  / GET_MODE_SIZE (imode).to_constant ();
+  poly_int64 nunits = GET_MODE_NUNITS (GET_MODE (subreg));
+  nunits = exact_div (nunits, ratio);
+  scalar_mode smode = as_a <scalar_mode> (mode);
+  machine_mode vmode
+   = riscv_vector::get_vector_mode (smode, nunits).require ();
+  rtx tmp = gen_reg_rtx (mode);
+  rtx index
+   = gen_int_mode (exact_div (SUBREG_BYTE (src), GET_MODE_SIZE (smode)),
+   Pmode);
+  emit_insn (gen_vec_extract (vmode, vmode, tmp,
+ gen_lowpart (vmode, subreg), index));
+  emit_move_insn (dest, tmp);
+  return true;
+}
   /* Expand
(set (reg:QI target) (mem:QI (address)))
  to
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
index 5fba27c7a35..7c42438c9d9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
@@ -29,4 +29,3 @@
 TEST_ALL (VEC_PERM)
 
 /* { dg-final { scan-assembler-times {viota.m} 2 } } */
-/* { dg-final { scan-assembler-not {vmv\.v\.i} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
new file mode 100644
index 000..a7f64c937c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr111391.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -Wno-int-conversion 
-Wno-implicit-function -Wno-incompatible-pointer-types 
-Wno-implicit-function-declaration -Ofast -ftree-vectorize" } */
+
+int d ();
+typedef struct
+{
+  int b;
+} c;
+int
+e (char *f, long g)
+{
+  f += g;
+  while (g--)
+*--f = d;
+}
+
+int
+d (c * f)
+{
+  while (h ())
+switch (f->b)
+  case 'Q':
+  {
+   long a;
+   e (&a, sizeof (a));
+   i (a);
+  }
+}
-- 
2.36.3



RE: [PATCH] RISC-V: Support VLS modes VEC_EXTRACT auto-vectorization

2023-09-13 Thread Li, Pan2 via Gcc-patches
Committed, thanks Robin.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Robin Dapp via Gcc-patches
Sent: Wednesday, September 13, 2023 8:46 PM
To: juzhe.zh...@rivai.ai; gcc-patches 
Cc: rdapp@gmail.com; kito.cheng ; Kito.cheng 
; jeffreyalaw 
Subject: Re: [PATCH] RISC-V: Support VLS modes VEC_EXTRACT auto-vectorization

> Yes. We need the additional helper function since I will call emit_insn
> (gen_vec_extract (mode, mode)
> in the following patch which fixes PR111391 ICE.

OK.

Regards
 Robin



Re: [PATCH] RISC-V: Support VLS modes VEC_EXTRACT auto-vectorization

2023-09-13 Thread Robin Dapp via Gcc-patches
> Yes. We need the additional helper function since I will call emit_insn
> (gen_vec_extract (mode, mode)
> in the following patch which fixes PR111391 ICE.

OK.

Regards
 Robin



Re: Re: [PATCH] RISC-V: Support VLS modes VEC_EXTRACT auto-vectorization

2023-09-13 Thread juzhe.zh...@rivai.ai
>> Do we need the additional helper function? 
Yes. We need the additional helper function since I will call emit_insn
(gen_vec_extract (mode, mode)
in the following patch which fixes PR111391 ICE.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-09-13 20:31
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support VLS modes VEC_EXTRACT auto-vectorization
> -(define_expand "vec_extract<mode><vel>"
> +(define_expand "@vec_extract<mode><vel>"
 
Do we need the additional helper function?  If not let's rather not
add them for build-time reasons.  The rest is OK, no need for v2.
 
Regards
Robin
 


[PATCH] libstdc++: Remove some more unconditional uses of atomics

2023-09-13 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux and aarch64-linux. I intend to push this to trunk.

-- >8 --

These atomics cause linker errors on arm4t where __sync_synchronize is
not defined. For single-threaded targets we don't need the atomics.

libstdc++-v3/ChangeLog:

* include/experimental/io_context (io_context) [!_GLIBCXX_HAS_GTHREADS]:
Use a plain integer for _M_work_count for single-threaded
targets.
* src/c++17/memory_resource.cc [!_GLIBCXX_HAS_GTHREADS]
(atomic_mem_res): Use unsynchronized type for single-threaded
targets.
---
 libstdc++-v3/include/experimental/io_context |  4 ++
 libstdc++-v3/src/c++17/memory_resource.cc| 49 ++--
 2 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/libstdc++-v3/include/experimental/io_context 
b/libstdc++-v3/include/experimental/io_context
index c59f8c8e73b..c878d5a7025 100644
--- a/libstdc++-v3/include/experimental/io_context
+++ b/libstdc++-v3/include/experimental/io_context
@@ -562,7 +562,11 @@ inline namespace v1
}
   };
 
+#ifdef _GLIBCXX_HAS_GTHREADS
    atomic<count_type> _M_work_count;
+#else
+count_type _M_work_count;
+#endif
 mutable execution_context::mutex_type  _M_mtx;
 queue>_M_op;
 bool   _M_stopped = false;
diff --git a/libstdc++-v3/src/c++17/memory_resource.cc 
b/libstdc++-v3/src/c++17/memory_resource.cc
index c0c7cf0cf83..63856eadaf5 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -27,9 +27,9 @@
 #include <atomic>
 #include <bit>  // has_single_bit, bit_ceil, bit_width
 #include <new>
+#include <bits/move.h>  // std::__exchange
 #if ATOMIC_POINTER_LOCK_FREE != 2
 # include <mutex>  // std::mutex, std::lock_guard
-# include <bits/move.h>  // std::__exchange
 #endif
 
 #if __has_cpp_attribute(clang::require_constant_initialization)
@@ -94,10 +94,31 @@ namespace pmr
 
 __constinit constant_init newdel_res{};
 __constinit constant_init null_res{};
-#if ATOMIC_POINTER_LOCK_FREE == 2
+
+#ifndef _GLIBCXX_HAS_GTHREADS
+# define _GLIBCXX_ATOMIC_MEM_RES_CAN_BE_CONSTANT_INITIALIZED
+// Single-threaded, no need for synchronization
+struct atomic_mem_res
+{
+  constexpr
+  atomic_mem_res(memory_resource* r) : val(r) { }
+
+  memory_resource* val;
+
+  memory_resource* load(std::memory_order) const
+  {
+   return val;
+  }
+
+  memory_resource* exchange(memory_resource* r, std::memory_order)
+  {
+   return std::__exchange(val, r);
+  }
+};
+#elif ATOMIC_POINTER_LOCK_FREE == 2
 using atomic_mem_res = atomic<memory_resource*>;
 # define _GLIBCXX_ATOMIC_MEM_RES_CAN_BE_CONSTANT_INITIALIZED
-#elif defined(_GLIBCXX_HAS_GTHREADS)
+#else
 // Can't use pointer-width atomics, define a type using a mutex instead:
 struct atomic_mem_res
 {
@@ -123,27 +144,7 @@ namespace pmr
return std::__exchange(val, r);
   }
 };
-#else
-# define _GLIBCXX_ATOMIC_MEM_RES_CAN_BE_CONSTANT_INITIALIZED
-// Single-threaded, no need for synchronization
-struct atomic_mem_res
-{
-  constexpr
-  atomic_mem_res(memory_resource* r) : val(r) { }
-
-  memory_resource* val;
-
-  memory_resource* load(std::memory_order) const
-  {
-   return val;
-  }
-
-  memory_resource* exchange(memory_resource* r, std::memory_order)
-  {
-   return std::__exchange(val, r);
-  }
-};
-#endif // ATOMIC_POINTER_LOCK_FREE == 2
+#endif
 
 #ifdef _GLIBCXX_ATOMIC_MEM_RES_CAN_BE_CONSTANT_INITIALIZED
 __constinit constant_init default_res{_res.obj};
-- 
2.41.0



[PATCH 1/2] RISC-V: Cleanup redundant reduction patterns after refactor vector mode

2023-09-13 Thread Lehua Ding
This patch cleans up redundant reduction patterns after Juzhe changed the
vector modes from fixed-size to scalable-size. For example, whether the
target is zvl32b, zvl64b, or zvl128b, RVVM1SI always occupies exactly one
vector register. Therefore, it is easy to map each vector mode to its LMUL1
vector mode with a define_mode_attr instead of creating a separate pattern
for each LMUL1 mode. For example, this patch combines the four per-LMUL1
variants of @pred_reduc_ into a single @pred_reduc_ pattern.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_reduction): Adjust call.
* config/riscv/riscv-vector-builtins-bases.cc: Adjust call.
* config/riscv/vector-iterators.md: New iterators and attrs.
* config/riscv/vector.md 
(@pred_reduc_):
Removed.
(@pred_reduc_): Removed.
(@pred_reduc_): Removed.
(@pred_reduc_): Removed.
(@pred_reduc_): Added.
(@pred_widen_reduc_plus): Removed.
(@pred_widen_reduc_plus): Removed.
(@pred_widen_reduc_plus): Added.
(@pred_widen_reduc_plus): Removed.
(@pred_reduc_): Removed.
(@pred_reduc_): Removed.
(@pred_reduc_): Removed.
(@pred_reduc_plus): Removed.
(@pred_reduc_plus): Removed.
(@pred_reduc_plus): Added.
(@pred_reduc_plus): Removed.
(@pred_widen_reduc_plus): Removed.
(@pred_widen_reduc_plus): Removed.
(@pred_widen_reduc_plus): Added.

---
 gcc/config/riscv/riscv-v.cc   |   4 +-
 .../riscv/riscv-vector-builtins-bases.cc  |  15 +-
 gcc/config/riscv/vector-iterators.md  |  47 ++-
 gcc/config/riscv/vector.md| 369 +++---
 4 files changed, 101 insertions(+), 334 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 76e6094f45b..68b36d9dc4f 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3230,7 +3230,7 @@ expand_reduction (rtx_code code, rtx *ops, rtx init, 
reduction_type type)
= code_for_pred_reduc_plus (type == reduction_type::UNORDERED
  ? UNSPEC_UNORDERED
  : UNSPEC_ORDERED,
-   vmode, m1_mode);
+   vmode);
   if (type == reduction_type::MASK_LEN_FOLD_LEFT)
{
  rtx mask = ops[3];
@@ -3243,7 +3243,7 @@ expand_reduction (rtx_code code, rtx *ops, rtx init, 
reduction_type type)
 }
   else
 {
-  insn_code icode = code_for_pred_reduc (code, vmode, m1_mode);
+  insn_code icode = code_for_pred_reduc (code, vmode);
   emit_vlmax_insn (icode, REDUCE_OP, reduc_ops);
 }
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ee218a03017..c54ea6f0560 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1541,8 +1541,7 @@ public:
 
   rtx expand (function_expander ) const override
   {
-return e.use_exact_insn (
-  code_for_pred_reduc (CODE, e.vector_mode (), e.ret_mode ()));
+return e.use_exact_insn (code_for_pred_reduc (CODE, e.vector_mode ()));
   }
 };
 
@@ -1555,9 +1554,8 @@ public:
 
   rtx expand (function_expander ) const override
   {
-return e.use_exact_insn (code_for_pred_widen_reduc_plus (UNSPEC,
-e.vector_mode (),
-e.ret_mode ()));
+return e.use_exact_insn (
+  code_for_pred_widen_reduc_plus (UNSPEC, e.vector_mode ()));
   }
 };
 
@@ -1576,7 +1574,7 @@ public:
   rtx expand (function_expander ) const override
   {
 return e.use_exact_insn (
-  code_for_pred_reduc_plus (UNSPEC, e.vector_mode (), e.ret_mode ()));
+  code_for_pred_reduc_plus (UNSPEC, e.vector_mode ()));
   }
 };
 
@@ -1594,9 +1592,8 @@ public:
 
   rtx expand (function_expander ) const override
   {
-return e.use_exact_insn (code_for_pred_widen_reduc_plus (UNSPEC,
-e.vector_mode (),
-e.ret_mode ()));
+return e.use_exact_insn (
+  code_for_pred_widen_reduc_plus (UNSPEC, e.vector_mode ()));
   }
 };
 
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index e70a9bc5c74..deb89cbcedc 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -671,6 +671,15 @@
   RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
 ])
 
+(define_mode_iterator VF_HS [
+  (RVVM8HF "TARGET_ZVFH") (RVVM4HF "TARGET_ZVFH") (RVVM2HF "TARGET_ZVFH")
+  (RVVM1HF "TARGET_ZVFH") (RVVMF2HF "TARGET_ZVFH")
+  (RVVMF4HF "TARGET_ZVFH && TARGET_MIN_VLEN > 32")
+
+  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
(RVVM2SF 

[PATCH 2/2] RISC-V: Refactor vector reduction patterns

2023-09-13 Thread Lehua Ding
This patch adjusts the structure of the reduction patterns, changing them from:
   (any_reduc:VI
 (vec_duplicate:VI
   (vec_select:
 (match_operand: 4 "register_operand"  "   vr,   
vr")
 (parallel [(const_int 0)])))
 (match_operand:VI   3 "register_operand"  "   vr,   
vr"))
to:
   (unspec: [
 (match_operand:VI3 "register_operand"  "   vr,   
vr")
 (match_operand: 4 "register_operand"  "   vr,   
vr")
   ] ANY_REDUC)

The reason for the change is that the semantics of the previous pattern are
incorrect: GCC does not have a standard rtx code to express the reduction
calculation, so it makes more sense to use an UNSPEC.

Further, all reduction icodes are now obtained from the UNSPEC and the MODE
(code_for_pred (unspec, mode)), so that all reduction patterns can have a
uniform icode name.  After this adjustment, widen_reducop and widen_freducop
are redundant.
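
As a rough sketch, a reduction insn under the new scheme has this shape
(operand numbers, predication operands, and the assembly template are
simplified here, not copied from vector.md):

  (define_insn "@pred_reduc_<reduc>"
    [(set (match_operand:<V_LMUL1> 0 "register_operand"    "=vr")
          (unspec:<V_LMUL1>
            [(match_operand:VI 3 "register_operand"        " vr")
             (match_operand:<V_LMUL1> 4 "register_operand" " vr")]
            ANY_REDUC))]
    "TARGET_VECTOR"
    "vred<reduc>.vs\t%0,%3,%4")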

gcc/ChangeLog:

* config/riscv/autovec.md: Change rtx code to unspec.
* config/riscv/riscv-protos.h (expand_reduction): Change prototype.
* config/riscv/riscv-v.cc (expand_reduction): Change prototype.
* config/riscv/riscv-vector-builtins-bases.cc (class widen_reducop):
Removed.
(class widen_freducop): Removed.
* config/riscv/vector-iterators.md (minu): Add reduc unspec, iterators, 
attrs.
* config/riscv/vector.md (@pred_reduc_): Change name.
(@pred_): New name.
(@pred_widen_reduc_plus): Change name.
(@pred_reduc_plus): Change name.
(@pred_widen_reduc_plus): Change name.

---
 gcc/config/riscv/autovec.md   |  27 ++--
 gcc/config/riscv/riscv-protos.h   |   2 +-
 gcc/config/riscv/riscv-v.cc   |  13 +-
 .../riscv/riscv-vector-builtins-bases.cc  |  82 
 gcc/config/riscv/vector-iterators.md  |  62 +++--
 gcc/config/riscv/vector.md| 118 +-
 6 files changed, 152 insertions(+), 152 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 4a6b8f8c939..16ac125f53f 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2091,7 +2091,7 @@
(match_operand:VI 1 "register_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::expand_reduction (PLUS, operands, CONST0_RTX (mode));
+  riscv_vector::expand_reduction (UNSPEC_REDUC_SUM, operands, CONST0_RTX 
(mode));
   DONE;
 })
 
@@ -2102,7 +2102,7 @@
 {
   int prec = GET_MODE_PRECISION (mode);
   rtx min = immed_wide_int_const (wi::min_value (prec, SIGNED), mode);
-  riscv_vector::expand_reduction (SMAX, operands, min);
+  riscv_vector::expand_reduction (UNSPEC_REDUC_MAX, operands, min);
   DONE;
 })
 
@@ -2111,7 +2111,7 @@
(match_operand:VI 1 "register_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::expand_reduction (UMAX, operands, CONST0_RTX (mode));
+  riscv_vector::expand_reduction (UNSPEC_REDUC_MAXU, operands, CONST0_RTX 
(mode));
   DONE;
 })
 
@@ -2122,7 +2122,7 @@
 {
   int prec = GET_MODE_PRECISION (mode);
   rtx max = immed_wide_int_const (wi::max_value (prec, SIGNED), mode);
-  riscv_vector::expand_reduction (SMIN, operands, max);
+  riscv_vector::expand_reduction (UNSPEC_REDUC_MIN, operands, max);
   DONE;
 })
 
@@ -2133,7 +2133,7 @@
 {
   int prec = GET_MODE_PRECISION (mode);
   rtx max = immed_wide_int_const (wi::max_value (prec, UNSIGNED), mode);
-  riscv_vector::expand_reduction (UMIN, operands, max);
+  riscv_vector::expand_reduction (UNSPEC_REDUC_MINU, operands, max);
   DONE;
 })
 
@@ -2142,7 +2142,7 @@
(match_operand:VI 1 "register_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::expand_reduction (AND, operands, CONSTM1_RTX (mode));
+  riscv_vector::expand_reduction (UNSPEC_REDUC_AND, operands, CONSTM1_RTX 
(mode));
   DONE;
 })
 
@@ -2151,7 +2151,7 @@
(match_operand:VI 1 "register_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::expand_reduction (IOR, operands, CONST0_RTX (mode));
+  riscv_vector::expand_reduction (UNSPEC_REDUC_OR, operands, CONST0_RTX 
(mode));
   DONE;
 })
 
@@ -2160,7 +2160,7 @@
(match_operand:VI 1 "register_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::expand_reduction (XOR, operands, CONST0_RTX (mode));
+  riscv_vector::expand_reduction (UNSPEC_REDUC_XOR, operands, CONST0_RTX 
(mode));
   DONE;
 })
 
@@ -2178,7 +2178,8 @@
(match_operand:VF 1 "register_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::expand_reduction (PLUS, operands, CONST0_RTX (mode));
+  riscv_vector::expand_reduction (UNSPEC_REDUC_SUM_UNORDERED, operands,
+  CONST0_RTX (mode));
   DONE;
 })
 
@@ -2190,7 +2191,7 @@
   REAL_VALUE_TYPE rv;
   real_inf (, true);
   rtx f = const_double_from_real_value (rv, mode);
-  riscv_vector::expand_reduction (SMAX, operands, f);
+  riscv_vector::expand_reduction (UNSPEC_REDUC_MAX, operands, f);
   DONE;
 })
 
@@ -2202,7 +2203,7 @@
   

Re: [PATCH] RISC-V: Support VLS modes VEC_EXTRACT auto-vectorization

2023-09-13 Thread Robin Dapp via Gcc-patches
> -(define_expand "vec_extract"
> +(define_expand "@vec_extract"

Do we need the additional helper function?  If not let's rather not
add them for build-time reasons.  The rest is OK, no need for v2.

Regards
 Robin


[PATCH] RISC-V: Support VLS modes VEC_EXTRACT auto-vectorization

2023-09-13 Thread Juzhe-Zhong
This patch supports VLS modes in VEC_EXTRACT to fix PR111391:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111391

I need VLS modes VEC_EXTRACT to fix this issue.

I have run the whole gcc testsuite and noticed that this patch introduces these 4 FAILs:
FAIL: c-c++-common/vector-subscript-4.c  -std=gnu++14  scan-tree-dump-not 
optimized "vector"
FAIL: c-c++-common/vector-subscript-4.c  -std=gnu++17  scan-tree-dump-not 
optimized "vector"
FAIL: c-c++-common/vector-subscript-4.c  -std=gnu++20  scan-tree-dump-not 
optimized "vector"
FAIL: c-c++-common/vector-subscript-4.c  -std=gnu++98  scan-tree-dump-not 
optimized "vector"

After analysis and comparison with LLVM:
https://godbolt.org/z/ozhfKhj5Y

with this patch, GCC generates codegen similar to LLVM's (previously the code
could not be vectorized).

This patch is the prerequisite patch to fix an ICE.

So let's ignore those 4 new dump IR FAILs, since an ICE is unacceptable
whereas dump FAILs are acceptable (but we should remember and eventually fix
the dump IR FAILs too).
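
For reference, the kind of scalar extraction this enables on fixed-length
(VLS) vector types looks like this (a minimal sketch, not one of the new
tests verbatim):

  typedef int v4si __attribute__ ((vector_size (16)));

  int
  f (v4si v, int i)
  {
    /* Subscripting a vector with a variable index goes through the
       vec_extract pattern; with this patch that also covers VLS modes.  */
    return v[i];
  }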

gcc/ChangeLog:

* config/riscv/autovec.md (vec_extract): Add VLS modes.
(@vec_extract): Ditto.
* config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add more def.
* gcc.target/riscv/rvv/autovec/vls/extract-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/extract-2.c: New test.

---
 gcc/config/riscv/autovec.md   |   4 +-
 gcc/config/riscv/vector.md|  10 +-
 .../gcc.target/riscv/rvv/autovec/vls/def.h|  57 +++-
 .../riscv/rvv/autovec/vls/extract-1.c | 122 +
 .../riscv/rvv/autovec/vls/extract-2.c | 123 ++
 5 files changed, 307 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/extract-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/extract-2.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 50c0104550b..223f6b2d6e7 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1442,10 +1442,10 @@
 ;; -
 ;;  [INT,FP] Extract a vector element.
 ;; -
-(define_expand "vec_extract"
+(define_expand "@vec_extract"
   [(set (match_operand: 0 "register_operand")
  (vec_select:
-   (match_operand:V  1 "register_operand")
+   (match_operand:V_VLS  1 "register_operand")
(parallel
 [(match_operand  2 "nonmemory_operand")])))]
   "TARGET_VECTOR"
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 58e659e5cd4..4630af6cbff 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -8096,7 +8096,7 @@
   [(set (match_operand: 0 "register_operand")
(unspec:
  [(vec_select:
-(match_operand:VI 1 "reg_or_mem_operand")
+(match_operand:V_VLSI 1 "reg_or_mem_operand")
 (parallel [(const_int 0)]))
   (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE))]
   "TARGET_VECTOR"
@@ -8113,7 +8113,7 @@
   [(set (match_operand: 0 "register_operand"   "=r")
(unspec:
  [(vec_select:
-(match_operand:VI 1 "register_operand" "vr")
+(match_operand:V_VLSI 1 "register_operand" "vr")
 (parallel [(const_int 0)]))
   (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE))]
   "TARGET_VECTOR"
@@ -8147,7 +8147,7 @@
 (truncate:SI
  (unspec:
[(vec_select:
-  (match_operand:VI_D 1 "register_operand" "vr")
+  (match_operand:V_VLSI_D 1 "register_operand" "vr")
   (parallel [(const_int 0)]))
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)))]
   "TARGET_VECTOR"
@@ -8159,7 +8159,7 @@
   [(set (match_operand: 0 "register_operand")
(unspec:
  [(vec_select:
-(match_operand:VF 1 "reg_or_mem_operand")
+(match_operand:V_VLSF 1 "reg_or_mem_operand")
 (parallel [(const_int 0)]))
   (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE))]
   "TARGET_VECTOR"
@@ -8176,7 +8176,7 @@
   [(set (match_operand: 0 "register_operand"   "=f")
(unspec:
  [(vec_select:
-(match_operand:VF 1 "register_operand" "vr")
+(match_operand:V_VLSF 1 "register_operand" "vr")
 (parallel [(const_int 0)]))
   (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE))]
   "TARGET_VECTOR"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h
index c7df879dbde..79b4fbc6d93 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h
@@ -45,7 +45,53 @@ typedef int64_t v32di __attribute__ ((vector_size (256)));
 typedef int64_t v64di __attribute__ ((vector_size (512)));
 typedef int64_t 

Re: [PATCH 1/2] MATCH: [PR111364] Add some more minmax cmp operand simplifications

2023-09-13 Thread Andrew Pinski via Gcc-patches
On Tue, Sep 12, 2023 at 11:45 PM Richard Biener via Gcc-patches
 wrote:
>
> On Tue, Sep 12, 2023 at 5:31 PM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > This adds a few more minmax cmp operand simplifications which were missed 
> > before.
> > `MIN(a,b) < a` -> `a > b`
> > `MIN(a,b) >= a` -> `a <= b`
> > `MAX(a,b) > a` -> `a < b`
> > `MAX(a,b) <= a` -> `a >= b`
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu.
>
> OK.  I wonder if any of these are also valid for FP types?

I was thinking about that too. I will look into that later this week.

Thanks,
Andrew

>
> > Note gcc.dg/pr96708-negative.c needed to updated to remove the
> > check for MIN/MAX as they have been optimized (correctly) away.
> >
> > PR tree-optimization/111364
> >
> > gcc/ChangeLog:
> >
> > * match.pd (`MIN (X, Y) == X`): Extend
> > to min/lt, min/ge, max/gt, max/le.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.c-torture/execute/minmaxcmp-1.c: New test.
> > * gcc.dg/tree-ssa/minmaxcmp-2.c: New test.
> > * gcc.dg/pr96708-negative.c: Update testcase.
> > * gcc.dg/pr96708-positive.c: Add comment about `return 0`.
> > ---
> >  gcc/match.pd  |  8 +--
> >  .../gcc.c-torture/execute/minmaxcmp-1.c   | 51 +++
> >  gcc/testsuite/gcc.dg/pr96708-negative.c   |  4 +-
> >  gcc/testsuite/gcc.dg/pr96708-positive.c   |  1 +
> >  gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c   | 30 +++
> >  5 files changed, 89 insertions(+), 5 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 51985c1bad4..36e3da4841b 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3902,9 +3902,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >(maxmin @0 (bit_not @1
> >
> >  /* MIN (X, Y) == X -> X <= Y  */
> > -(for minmax (min min max max)
> > - cmp(eq  ne  eq  ne )
> > - out(le  gt  ge  lt )
> > +/* MIN (X, Y) < X -> X > Y  */
> > +/* MIN (X, Y) >= X -> X <= Y  */
> > +(for minmax (min min min min max max max max)
> > + cmp(eq  ne  lt  ge  eq  ne  gt  le )
> > + out(le  gt  gt  le  ge  lt  lt  ge )
> >   (simplify
> >(cmp:c (minmax:c @0 @1) @0)
> >(if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0)))
> > diff --git a/gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c 
> > b/gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c
> > new file mode 100644
> > index 000..6705a053768
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c
> > @@ -0,0 +1,51 @@
> > +#define func(vol, op1, op2)\
> > +_Bool op1##_##op2##_##vol (int a, int b)   \
> > +{  \
> > + vol int x = op_##op1(a, b);   \
> > + return op_##op2(x, a);\
> > +}
> > +
> > +#define op_lt(a, b) ((a) < (b))
> > +#define op_le(a, b) ((a) <= (b))
> > +#define op_eq(a, b) ((a) == (b))
> > +#define op_ne(a, b) ((a) != (b))
> > +#define op_gt(a, b) ((a) > (b))
> > +#define op_ge(a, b) ((a) >= (b))
> > +#define op_min(a, b) ((a) < (b) ? (a) : (b))
> > +#define op_max(a, b) ((a) > (b) ? (a) : (b))
> > +
> > +
> > +#define funcs(a) \
> > + a(min,lt) \
> > + a(max,lt) \
> > + a(min,gt) \
> > + a(max,gt) \
> > + a(min,le) \
> > + a(max,le) \
> > + a(min,ge) \
> > + a(max,ge) \
> > + a(min,ne) \
> > + a(max,ne) \
> > + a(min,eq) \
> > + a(max,eq)
> > +
> > +#define funcs1(a,b) \
> > +func(,a,b) \
> > +func(volatile,a,b)
> > +
> > +funcs(funcs1)
> > +
> > +#define test(op1,op2)   \
> > +do {\
> > +  if (op1##_##op2##_(x,y) != op1##_##op2##_volatile(x,y))   \
> > +__builtin_abort();  \
> > +} while(0);
> > +
> > +int main()
> > +{
> > +  for(int x = -10; x < 10; x++)
> > +for(int y = -10; y < 10; y++)
> > +{
> > +funcs(test)
> > +}
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/pr96708-negative.c 
> > b/gcc/testsuite/gcc.dg/pr96708-negative.c
> > index 91964d3b971..c9c1aa85558 100644
> > --- a/gcc/testsuite/gcc.dg/pr96708-negative.c
> > +++ b/gcc/testsuite/gcc.dg/pr96708-negative.c
> > @@ -42,7 +42,7 @@ int main()
> >  return 0;
> >  }
> >
> > -/* { dg-final { scan-tree-dump-times "MAX_EXPR" 2 "optimized" } } */
> > -/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" } } */
> > +/* Even though test[1-4] originally has MIN/MAX, those can be optimized 
> > away
> > +   into just comparing a and b arguments. */
> >  /* { dg-final { scan-tree-dump-times "return 0;" 1 "optimized" } } */
> >  /* { dg-final { scan-tree-dump-not { "return 1;" } "optimized" } } */
> > diff --git a/gcc/testsuite/gcc.dg/pr96708-positive.c 
> > b/gcc/testsuite/gcc.dg/pr96708-positive.c
> > index 65af85344b6..12c5fedfd30 100644
> > --- a/gcc/testsuite/gcc.dg/pr96708-positive.c
> > +++ 

gimple-match: Do not try UNCOND optimization with COND_LEN.

2023-09-13 Thread juzhe.zh...@rivai.ai
Thanks Robin for fixing it.

-  : cond (cond_in), else_value (else_value_in)
+  : cond (cond_in), else_value (else_value_in), len (NULL_TREE),
+    bias (NULL_TREE)

It seems that you shouldn't include this fix in the patch?

+  if (len)
+{
+  /* If we had a COND_LEN before we need to ensure that it stays that
+way.  */
+  gimple_match_op old_op = *res_op;
+  *res_op = cond_op;
+  maybe_resimplify_conditional_op (seq, res_op, valueize);
+
+  auto cfn = combined_fn (res_op->code);
+  if (internal_fn_p (cfn)
+ && internal_fn_len_index (as_internal_fn (cfn)) != -1)
+   return true;
+
+  *res_op = old_op;
+  return false;
+}
+  else
+{
+  *res_op = cond_op;
+  maybe_resimplify_conditional_op (seq, res_op, valueize);
+  return true;
+}
This looks odd to me. 

Currently, we never have a cond_len_xxx with a dummy length (length = VF); we
always use cond_xxx if we don't have a loop mask.
So the length of a cond_len_xxx is always generated by MIN or SELECT_VL.
I think we don't need a gimple simplification that turns a cond_len into its
argument value.

But we need the following optimization:

negate + cond_len_fma -> cond_len_fnma/cond_len_fms/cond_len_fnms.
That's what I want to support in gimple fold.
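
Roughly, the fold wanted is the following (GIMPLE pseudocode; the exact
argument order of the internal calls is illustrative):

  neg_1 = -a_2;
  r_3 = .COND_LEN_FMA (mask_4, neg_1, b_5, c_6, else_7, len_8, bias_9);
    -->
  r_3 = .COND_LEN_FNMA (mask_4, a_2, b_5, c_6, else_7, len_8, bias_9);

(Negating the addend instead would give .COND_LEN_FMS, and negating both
.COND_LEN_FNMS.)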

Let's see more comments from Richard and Richi.



juzhe.zh...@rivai.ai


[PATCH] tree-optimization/111387 - BB SLP and irreducible regions

2023-09-13 Thread Richard Biener via Gcc-patches
When we split an irreducible region for BB vectorization analysis
the defensive handling of external backedge defs in
vect_get_and_check_slp_defs doesn't work since that relies on
dominance info to identify a backedge.  The testcase also shows
we are iterating over the function in a sub-optimal way which is
why we split the irreducible region in the first place.  The fix
is to mark backedges and use EDGE_DFS_BACK to identify them and
to use the region RPO compute which can produce a RPO order keeping
cycles in a better order (and as side effect marks backedges).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111387
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Check
EDGE_DFS_BACK when doing BB vectorization.
(vect_slp_function): Use rev_post_order_and_mark_dfs_back_seme
to compute RPO and mark backedges.

* gcc.dg/torture/pr111387.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr111387.c | 34 +
 gcc/tree-vect-slp.cc| 16 +---
 2 files changed, 46 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr111387.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr111387.c 
b/gcc/testsuite/gcc.dg/torture/pr111387.c
new file mode 100644
index 000..e14eeef6e4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr111387.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ftree-vectorize -fno-vect-cost-model" } */
+
+struct {
+  unsigned a;
+  unsigned b;
+} c;
+int d, e, f, g, h;
+int main()
+{
+  if (c.b && g && g > 7)
+goto i;
+ j:
+  if (c.a) {
+int k = 0;
+unsigned l = c.b;
+if (0) {
+m:
+  k = l = c.b;
+}
+c.a = k;
+c.b = l;
+  }
+  if (0) {
+  i:
+goto m;
+  }
+  if (d)
+goto j;
+  for (f = 5; f; f--)
+if (h)
+  e = 0;
+  return 0;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 0cf6e02285e..a3e54ebf62a 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -628,9 +628,13 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
char swap,
{
  oprnd = gimple_arg (stmt_info->stmt, opno);
  if (gphi *stmt = dyn_cast  (stmt_info->stmt))
-   backedge = dominated_by_p (CDI_DOMINATORS,
-  gimple_phi_arg_edge (stmt, opno)->src,
-  gimple_bb (stmt_info->stmt));
+   {
+ edge e = gimple_phi_arg_edge (stmt, opno);
+ backedge = (is_a  (vinfo)
+ ? e->flags & EDGE_DFS_BACK
+ : dominated_by_p (CDI_DOMINATORS, e->src,
+   gimple_bb (stmt_info->stmt)));
+   }
}
   if (TREE_CODE (oprnd) == VIEW_CONVERT_EXPR)
oprnd = TREE_OPERAND (oprnd, 0);
@@ -7771,7 +7775,11 @@ vect_slp_function (function *fun)
 {
   bool r = false;
   int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (fun));
-  unsigned n = pre_and_rev_post_order_compute_fn (fun, NULL, rpo, false);
+  auto_bitmap exit_bbs;
+  bitmap_set_bit (exit_bbs, EXIT_BLOCK);
+  edge entry = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (fun));
+  unsigned n = rev_post_order_and_mark_dfs_back_seme (fun, entry, exit_bbs,
+ true, rpo, NULL);
 
   /* For the moment split the function into pieces to avoid making
  the iteration on the vector mode moot.  Split at points we know
-- 
2.35.3


Re: [PATCH] RISC-V: Support cond vmulh.vv and vmulu.vv

2023-09-13 Thread Lehua Ding

Committed, thanks Kito.

On 2023/9/13 16:50, Kito Cheng wrote:

LGTM, thanks :)

On Wed, Sep 13, 2023 at 12:25 AM Lehua Ding  wrote:


This patch adds combine patterns to combine vmulh[u].vv + vcond_mask
into a masked vmulh[u].vv. For vmulsu.vv, it cannot be produced in the midend
currently; we will send another patch to address this issue.
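
For reference, a conditional high-part multiply of this shape is what the new
pattern should catch after vectorization (a simplified sketch, not one of the
new tests verbatim):

  #include <stdint.h>

  void
  f (int32_t *r, int32_t *a, int32_t *b, int32_t *pred, int n)
  {
    for (int i = 0; i < n; i++)
      if (pred[i])
        r[i] = ((int64_t) a[i] * b[i]) >> 32; /* signed high-part multiply */
  }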

gcc/ChangeLog:

 * config/riscv/autovec-opt.md (*cond_3_highpart):
 New combine pattern.
* config/riscv/autovec.md (smul3_highpart): Merge smul and umul.
 (3_highpart): Merged pattern.
(umul3_highpart): Merge smul and umul.
 * config/riscv/vector-iterators.md (umul): New iterators.
 (UNSPEC_VMULHU): New iterators.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/autovec/cond/cond_mulh-1.c: New test.
 * gcc.target/riscv/rvv/autovec/cond/cond_mulh-2.c: New test.
 * gcc.target/riscv/rvv/autovec/cond/cond_mulh_run-1.c: New test.
 * gcc.target/riscv/rvv/autovec/cond/cond_mulh_run-2.c: New test.

---
  gcc/config/riscv/autovec-opt.md   | 23 -
  gcc/config/riscv/autovec.md   | 22 ++--
  gcc/config/riscv/vector-iterators.md  |  4 +++
  .../riscv/rvv/autovec/cond/cond_mulh-1.c  | 29 
  .../riscv/rvv/autovec/cond/cond_mulh-2.c  | 30 
  .../riscv/rvv/autovec/cond/cond_mulh_run-1.c  | 32 +
  .../riscv/rvv/autovec/cond/cond_mulh_run-2.c  | 34 +++
  7 files changed, 154 insertions(+), 20 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_mulh-1.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_mulh-2.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_mulh_run-1.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_mulh_run-2.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 0d2721f0b29..552be48bf73 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -970,6 +970,28 @@
  }
   [(set_attr "type" "vnshift")])

+;; Combine vmulh.vv/vmulhu.vv + vcond_mask
+(define_insn_and_split "*cond_3_highpart"
+   [(set (match_operand:VFULLI 0 "register_operand")
+(if_then_else:VFULLI
+  (match_operand: 1 "register_operand")
+  (mulh:VFULLI
+(match_operand:VFULLI 2 "register_operand")
+(match_operand:VFULLI 3 "register_operand"))
+  (match_operand:VFULLI 4 "register_operand")))]
+   "TARGET_VECTOR && can_create_pseudo_p ()"
+   "#"
+   "&& 1"
+   [(const_int 0)]
+{
+  insn_code icode = code_for_pred_mulh (, mode);
+  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[4],
+   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
+  riscv_vector::expand_cond_len_binop (icode, ops);
+   DONE;
+}
+[(set_attr "type" "vector")])
+
  ;; 
=
  ;; Combine extend + binop to widen_binop
  ;; 
=
@@ -1172,7 +1194,6 @@
  }
  [(set_attr "type" "vfwmul")])

-
  ;; 
=
  ;; Misc combine patterns
  ;; 
=
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index e9dd40af935..b4ac22bb97b 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1569,9 +1569,9 @@
  ;; - vmulhu.vv
  ;; -

-(define_insn_and_split "smul3_highpart"
+(define_insn_and_split "3_highpart"
[(set (match_operand:VFULLI 0 "register_operand")
-(smul_highpart:VFULLI
+(mulh:VFULLI
(match_operand:VFULLI 1 "register_operand")
(match_operand:VFULLI 2 "register_operand")))]
"TARGET_VECTOR && can_create_pseudo_p ()"
@@ -1579,23 +1579,7 @@
"&& 1"
[(const_int 0)]
  {
-  insn_code icode = code_for_pred_mulh (UNSPEC_VMULHS, mode);
-  riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
-  DONE;
-}
-[(set_attr "type" "vimul")])
-
-(define_insn_and_split "umul3_highpart"
-  [(set (match_operand:VFULLI 0 "register_operand")
-(umul_highpart:VFULLI
-  (match_operand:VFULLI 1 "register_operand")
-  (match_operand:VFULLI 2 "register_operand")))]
-  "TARGET_VECTOR && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-{
-  insn_code icode = code_for_pred_mulh (UNSPEC_VMULHU, mode);
+  insn_code icode = code_for_pred_mulh (, mode);
riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
DONE;
  }
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 2f7f7cbe08c..e70a9bc5c74 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ 

Re: [PATCH] RISC-V: Support cond vnsrl/vnsra

2023-09-13 Thread Lehua Ding

Committed, thanks Kito.

On 2023/9/13 15:56, Kito Cheng wrote:

LGTM

On Wed, Sep 13, 2023 at 12:25 AM Lehua Ding  wrote:


This patch adds combine patterns to combine vnsra.w[vxi] + vcond_mask
into a masked vnsra.w[vxi].
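
For reference, a conditional narrowing shift of this shape is what these
patterns should catch (a simplified sketch, not one of the new tests
verbatim):

  #include <stdint.h>

  void
  f (int32_t *r, int64_t *a, int32_t *pred, int n)
  {
    for (int i = 0; i < n; i++)
      if (pred[i])
        r[i] = (int32_t) (a[i] >> 5); /* narrowing right shift */
  }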

gcc/ChangeLog:

 * config/riscv/autovec-opt.md 
(*cond_vtrunc):
 New combine pattern.
 (*cond_trunc): Ditto.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-1.c: New test.
 * gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-2.c: New test.
 * gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-3.c: New test.
 * gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-1.c: New 
test.
 * gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-2.c: New 
test.
 * gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-3.c: New 
test.

---
  gcc/config/riscv/autovec-opt.md   | 46 +++
  .../rvv/autovec/cond/cond_narrow_shift-1.c| 27 +++
  .../rvv/autovec/cond/cond_narrow_shift-2.c| 30 
  .../rvv/autovec/cond/cond_narrow_shift-3.c| 30 
  .../autovec/cond/cond_narrow_shift_run-1.c| 29 
  .../autovec/cond/cond_narrow_shift_run-2.c| 30 
  .../autovec/cond/cond_narrow_shift_run-3.c| 31 +
  7 files changed, 223 insertions(+)
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-1.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-2.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-3.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-1.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-2.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-3.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index f759525f96b..0d2721f0b29 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -924,6 +924,52 @@
 DONE;
  })

+;; Combine vnsra + vcond_mask
+(define_insn_and_split 
"*cond_vtrunc"
+  [(set (match_operand: 0 "register_operand")
+ (if_then_else:
+   (match_operand: 1 "register_operand")
+   (truncate:
+ (any_shiftrt:VWEXTI
+   (match_operand:VWEXTI 2 "register_operand")
+  (any_extend:VWEXTI
+ (match_operand: 3 "vector_shift_operand"
+   (match_operand: 4 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_narrow (, mode);
+  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[4],
+   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
+  riscv_vector::expand_cond_len_binop (icode, ops);
+  DONE;
+}
+ [(set_attr "type" "vnshift")])
+
+(define_insn_and_split "*cond_trunc"
+  [(set (match_operand: 0 "register_operand")
+ (if_then_else:
+   (match_operand: 1 "register_operand")
+   (truncate:
+ (any_shiftrt:VWEXTI
+   (match_operand:VWEXTI 2 "register_operand")
+  (match_operand: 3 "csr_operand")))
+   (match_operand: 4 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_narrow_scalar (, 
mode);
+  rtx ops[] = {operands[0], operands[1], operands[2], gen_lowpart (Pmode, 
operands[3]),
+   operands[4], gen_int_mode (GET_MODE_NUNITS (mode), 
Pmode)};
+  riscv_vector::expand_cond_len_binop (icode, ops);
+  DONE;
+}
+ [(set_attr "type" "vnshift")])
+
  ;; 
=
  ;; Combine extend + binop to widen_binop
  ;; 
=
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-1.c
new file mode 100644
index 000..d068110a8a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d 
--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include 
+
+#define DEF_LOOP(TYPE1, TYPE2) 
\
+  void __attribute__ ((noipa)) 
\
+  test_##TYPE1##_##TYPE2 (TYPE2 *__restrict r, TYPE2 *__restrict a,
  \
+   TYPE1 *__restrict b, int n)\
+  {
\
+for (int i = 0; i < n; ++i)
\
+  r[i] = a[i] > 20 ? (TYPE2) (b[i] >> 3) : r[i]; 

Re: [PATCH] RISC-V: Support cond vfsgnj.vv autovec pattern

2023-09-13 Thread Lehua Ding

Committed, thanks Kito.

On 2023/9/13 16:49, Kito Cheng wrote:

LGTM

On Wed, Sep 13, 2023 at 12:25 AM Lehua Ding  wrote:


This patch adds combine patterns to combine vfsgnj.vv + vcond_mask
into a masked vfsgnj.vv. For vfsgnjx.vv, it cannot be produced in the midend
currently; we will send another patch to address this issue.
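
For reference, a conditional copysign of this shape is what the new pattern
should catch (a simplified sketch, not one of the new tests verbatim):

  void
  f (float *r, float *a, float *b, int *pred, int n)
  {
    for (int i = 0; i < n; i++)
      if (pred[i])
        r[i] = __builtin_copysignf (a[i], b[i]);
  }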

gcc/ChangeLog:

 * config/riscv/autovec-opt.md (*copysign_neg): Move.
 (*cond_copysign): New combine pattern.
 * config/riscv/riscv-v.cc (needs_fp_rounding): Extend.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/autovec/cond/cond_copysign-run.c: New test.
 * gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: New test.
 * gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv64gcv.c: New test.
 * gcc.target/riscv/rvv/autovec/cond/cond_copysign-template.h: New test.
 * gcc.target/riscv/rvv/autovec/cond/cond_copysign-zvfh-run.c: New test.

---
  gcc/config/riscv/autovec-opt.md   | 68 +
  gcc/config/riscv/riscv-v.cc   |  4 +-
  .../rvv/autovec/cond/cond_copysign-run.c  | 99 +++
  .../rvv/autovec/cond/cond_copysign-rv32gcv.c  | 12 +++
  .../rvv/autovec/cond/cond_copysign-rv64gcv.c  | 12 +++
  .../rvv/autovec/cond/cond_copysign-template.h | 81 +++
  .../rvv/autovec/cond/cond_copysign-zvfh-run.c | 93 +
  7 files changed, 349 insertions(+), 20 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_copysign-run.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv64gcv.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_copysign-template.h
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_copysign-zvfh-run.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 58e80044f1e..f759525f96b 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -609,6 +609,10 @@
 (set_attr "mode" "")
 (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])

+;; 
=
+;; Combine op + vmerge to cond_op
+;; 
=
+
  ;; Combine  and vcond_mask generated by midend into cond_len_
  ;; Currently supported operations:
  ;;   abs(FP)
@@ -651,25 +655,6 @@
DONE;
  })

-;; Combine vlmax neg and UNSPEC_VCOPYSIGN
-(define_insn_and_split "*copysign_neg"
-  [(set (match_operand:VF 0 "register_operand")
-(neg:VF
-  (unspec:VF [
-(match_operand:VF 1 "register_operand")
-(match_operand:VF 2 "register_operand")
-  ] UNSPEC_VCOPYSIGN)))]
-  "TARGET_VECTOR && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-{
-  riscv_vector::emit_vlmax_insn (code_for_pred_ncopysign (mode),
-  riscv_vector::BINARY_OP, operands);
-  DONE;
-}
-[(set_attr "type" "vector")])
-
  ;; Combine sign_extend/zero_extend(vf2) and vcond_mask
  (define_insn_and_split "*cond_"
[(set (match_operand:VWEXTI 0 "register_operand")
@@ -918,6 +903,27 @@
  }
  [(set_attr "type" "vector")])

+;; Combine vfsgnj.vv + vcond_mask
+(define_insn_and_split "*cond_copysign"
+   [(set (match_operand:VF 0 "register_operand")
+(if_then_else:VF
+  (match_operand: 1 "register_operand")
+  (unspec:VF
+   [(match_operand:VF 2 "register_operand")
+(match_operand:VF 3 "register_operand")] UNSPEC_VCOPYSIGN)
+  (match_operand:VF 4 "register_operand")))]
+   "TARGET_VECTOR && can_create_pseudo_p ()"
+   "#"
+   "&& 1"
+   [(const_int 0)]
+{
+  insn_code icode = code_for_pred (UNSPEC_VCOPYSIGN, mode);
+  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[4],
+   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
+  riscv_vector::expand_cond_len_binop (icode, ops);
+   DONE;
+})
+
  ;; 
=
  ;; Combine extend + binop to widen_binop
  ;; 
=
@@ -1119,3 +1125,27 @@
DONE;
  }
  [(set_attr "type" "vfwmul")])
+
+
+;; 
=
+;; Misc combine patterns
+;; 
=
+
+;; Combine vlmax neg and UNSPEC_VCOPYSIGN
+(define_insn_and_split "*copysign_neg"
+  [(set (match_operand:VF 0 "register_operand")
+(neg:VF
+  (unspec:VF [
+(match_operand:VF 1 "register_operand")
+(match_operand:VF 2 "register_operand")
+  ] UNSPEC_VCOPYSIGN)))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  

Re: [PATCH] Tweak language choice in config-list.mk

2023-09-13 Thread Christophe Lyon via Gcc-patches
Hi!


On Thu, 7 Sept 2023 at 11:30, Richard Sandiford via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> When I tried to use config-list.mk, the build for every triple except
> the build machine's failed for m2.  This is because, unlike other
> languages, m2 builds target objects during all-gcc.  The build will
> therefore fail unless you have access to an appropriate binutils
> (or an equivalent).  That's quite a big ask for over 100 targets. :)
>
> This patch therefore makes m2 an optional inclusion.
>
> Doing that wasn't entirely straightforward though.  The current
> configure line includes "--enable-languages=all,...", which means
> that the "..." can only force languages to be added that otherwise
> wouldn't have been.  (I.e. the only effect of the "..." is to
> override configure autodetection.)
>
> The choice of all,ada and:
>
>   # Make sure you have a recent enough gcc (with ada support) in your path
> so
>   # that --enable-werror-always will work.
>
> make it clear that lack of GNAT should be a build failure rather than
> silently ignored.  This predates the D frontend, which requires GDC
> in the same way that Ada requires GNAT.  I don't know of a reason
> why D should be treated differently.
>
> The patch therefore expands the "all" into a specific list of
> languages.
>
> That in turn meant that Fortran had to be handled specially,
> since bpf and mmix don't support Fortran.
>
> Perhaps there's an argument that m2 shouldn't build target objects
> during all-gcc, but (a) it works for practical usage and (b) the
> patch is an easy workaround.  I'd be happy for the patch to be
> reverted if the build system changes.
>
> OK to install?
>
> Richard
>
>
> gcc/
> * contrib/config-list.mk (OPT_IN_LANGUAGES): New variable.
> ($(LIST)): Replace --enable-languages=all with a specifc list.
> Disable fortran on bpf and mmix.  Enable the languages in
> OPT_IN_LANGUAGES.
> ---
>  contrib/config-list.mk | 17 ++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/contrib/config-list.mk b/contrib/config-list.mk
> index e570b13c71b..50ecb014bc0 100644
> --- a/contrib/config-list.mk
> +++ b/contrib/config-list.mk
> @@ -12,6 +12,11 @@ TEST=all-gcc
>  # supply an absolute path.
>  GCC_SRC_DIR=../../gcc
>
> +# Define this to ,m2 if you want to build Modula-2.  Modula-2 builds
> target
> +# objects during all-gcc, so it can only be included if you've installed
> +# binutils (or an equivalent) for each target.
> +OPT_IN_LANGUAGES=
> +
>  # Use -j / -l make arguments and nice to assure a smooth
> resource-efficient
>  # load on the build machine, e.g. for 24 cores:
>  # svn co svn://gcc.gnu.org/svn/gcc/branches/foo-branch gcc
> @@ -126,17 +131,23 @@ $(LIST): make-log-dir
> TGT=`echo $@ | awk 'BEGIN { FS = "OPT" }; { print $$1 }'`
> &&\
> TGT=`$(GCC_SRC_DIR)/config.sub $$TGT` &&
>   \
> case $$TGT in
>  \
> -   *-*-darwin* | *-*-cygwin* | *-*-mingw* | *-*-aix*
> | bpf-*-*)\
> +   bpf-*-*)
>   \
> ADDITIONAL_LANGUAGES="";
>   \
> ;;
>   \
> -   *)
>   \
> +   *-*-darwin* | *-*-cygwin* | *-*-mingw* | *-*-aix*
> | bpf-*-*)\
>
Am I misreading, or are you matching bpf both here and above? From your commit
message, I think bpf should either be matched only above (defining
ADDITIONAL_LANGUAGES to ""), or along with mmix (if it supports Go)?



> +   ADDITIONAL_LANGUAGES=",fortran";
>   \
> +   ;;
>   \
> +   mmix-*-*)
>  \
> ADDITIONAL_LANGUAGES=",go";
>  \
> ;;
>   \
> +   *)
>   \
> +   ADDITIONAL_LANGUAGES=",fortran,go";
>  \
> +   ;;
>   \
> esac &&
>  \
> $(GCC_SRC_DIR)/configure
>   \
> --target=$(subst SCRIPTS,`pwd`/../scripts/,$(subst
> OPT,$(empty) -,$@))  \
> --enable-werror-always ${host_options}
>   \
> -   --enable-languages=all,ada$$ADDITIONAL_LANGUAGES;
>  \
> +
>  
> --enable-languages=c,ada,c++,d,lto,objc,obj-c++,rust$$ADDITIONAL_LANGUAGES$(OPT_IN_LANGUAGES);
> \
> ) > log/$@-config.out 2>&1
>
>  $(LOGFILES) : log/%-make.out : %
> --
> 2.25.1
>
>


[PATCH] LoongArch: Reimplement multilib build option handling.

2023-09-13 Thread Yang Yujie
Library build options from --with-multilib-list used to be processed with
*self_spec, which missed the driver's initial canonicalization.  This
caused limitations on overriding CFLAGS and on the use of driver-only options
like -m[no-]lsx.

The problem is solved by promoting the injection rules of --with-multilib-list
options to the first element of DRIVER_SELF_SPECS, so that they execute before
the canonicalization.  The library-build options are also hard-coded in
the driver and can be used conveniently by the builders of other non-gcc
libraries via the use of -fmultiflags.

Bootstrapped and tested on loongarch64-linux-gnu.
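
As a usage sketch (the ABI names below are illustrative; see the install
documentation for the supported --with-multilib-list values):

  $ .../gcc/configure --target=loongarch64-linux-gnu \
      --with-multilib-list=lp64d,lp64f,lp64s

Because the injection rules now run as the first element of
DRIVER_SELF_SPECS, the per-multilib library options go through the same
canonicalization as user-supplied ones.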

ChangeLog:

* config-ml.in: Remove unneeded loongarch clause.
* configure.ac: Register custom makefile fragments mt-loongarch-*
for loongarch targets.
* configure: Regenerate.

config/ChangeLog:

* mt-loongarch-mlib: New file.  Pass -fmultiflags when building
target libraries (FLAGS_FOR_TARGET).
* mt-loongarch-elf: New file.
* mt-loongarch-gnu: New file.

gcc/ChangeLog:

* config.gcc: Pass the default ABI via TM_MULTILIB_CONFIG.
* config/loongarch/loongarch-driver.h: Invoke MLIB_SELF_SPECS
before the driver canonicalization routines.
* config/loongarch/loongarch.h: Move definitions of CC1_SPEC etc.
to loongarch-driver.h
* config/loongarch/t-linux: Move multilib-related definitions to
t-multilib.
* config/loongarch/t-multilib: New file.  Inject library build
options obtained from --with-multilib-list.
* config/loongarch/t-loongarch: Same.
---
 config-ml.in| 10 
 config/mt-loongarch-elf |  1 +
 config/mt-loongarch-gnu |  2 +
 config/mt-loongarch-mlib|  1 +
 configure   |  6 +++
 configure.ac|  6 +++
 gcc/config.gcc  |  6 +--
 gcc/config/loongarch/loongarch-driver.h | 42 +++
 gcc/config/loongarch/loongarch.h| 50 --
 gcc/config/loongarch/t-linux| 66 +++-
 gcc/config/loongarch/t-loongarch|  2 +-
 gcc/config/loongarch/t-multilib | 68 +
 12 files changed, 137 insertions(+), 123 deletions(-)
 create mode 100644 config/mt-loongarch-elf
 create mode 100644 config/mt-loongarch-gnu
 create mode 100644 config/mt-loongarch-mlib
 create mode 100644 gcc/config/loongarch/t-multilib

diff --git a/config-ml.in b/config-ml.in
index ad0db078171..68854a4f16c 100644
--- a/config-ml.in
+++ b/config-ml.in
@@ -301,16 +301,6 @@ arm-*-*)
  done
fi
;;
-loongarch*-*)
-   old_multidirs="${multidirs}"
-   multidirs=""
-   for x in ${old_multidirs}; do
-   case "$x" in
-   `${CC-gcc} --print-multi-directory`) : ;;
-   *) multidirs="${multidirs} ${x}" ;;
-   esac
-   done
-   ;;
 m68*-*-*)
if [ x$enable_softfloat = xno ]
then
diff --git a/config/mt-loongarch-elf b/config/mt-loongarch-elf
new file mode 100644
index 000..bbf29bb578a
--- /dev/null
+++ b/config/mt-loongarch-elf
@@ -0,0 +1 @@
+include $(srcdir)/config/mt-loongarch-mlib
diff --git a/config/mt-loongarch-gnu b/config/mt-loongarch-gnu
new file mode 100644
index 000..dfefb44ede3
--- /dev/null
+++ b/config/mt-loongarch-gnu
@@ -0,0 +1,2 @@
+include $(srcdir)/config/mt-gnu
+include $(srcdir)/config/mt-loongarch-mlib
diff --git a/config/mt-loongarch-mlib b/config/mt-loongarch-mlib
new file mode 100644
index 000..4cfe568f1fc
--- /dev/null
+++ b/config/mt-loongarch-mlib
@@ -0,0 +1 @@
+FLAGS_FOR_TARGET += -fmultiflags
diff --git a/configure b/configure
index 28f0913bdd4..8fc163d36bd 100755
--- a/configure
+++ b/configure
@@ -9683,6 +9683,12 @@ case "${target}" in
   spu-*-*)
 target_makefile_frag="config/mt-spu"
 ;;
+  loongarch*-*linux* | loongarch*-*gnu*)
+target_makefile_frag="config/mt-loongarch-gnu"
+;;
+  loongarch*-*elf*)
+target_makefile_frag="config/mt-loongarch-elf"
+;;
   mips*-sde-elf* | mips*-mti-elf* | mips*-img-elf*)
 target_makefile_frag="config/mt-sde"
 ;;
diff --git a/configure.ac b/configure.ac
index 5d25dc864c3..1d16530140a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2810,6 +2810,12 @@ case "${target}" in
   spu-*-*)
 target_makefile_frag="config/mt-spu"
 ;;
+  loongarch*-*linux* | loongarch*-*gnu*)
+target_makefile_frag="config/mt-loongarch-gnu"
+;;
+  loongarch*-*elf*)
+target_makefile_frag="config/mt-loongarch-elf"
+;;
   mips*-sde-elf* | mips*-mti-elf* | mips*-img-elf*)
 target_makefile_frag="config/mt-sde"
 ;;
diff --git a/gcc/config.gcc b/gcc/config.gcc
index b2fe7c7ceef..3a70e64ccd2 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2482,7 +2482,7 @@ loongarch*-*-linux*)
tm_file="elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h 
${tm_file}"

Re: [PATCH] RISC-V: Support cond vmulh.vv and vmulu.vv

2023-09-13 Thread Kito Cheng via Gcc-patches
LGTM, thanks :)

On Wed, Sep 13, 2023 at 12:25 AM Lehua Ding  wrote:
>
> This patch adds combine patterns to combine vmulh[u].vv + vcond_mask
> into a masked vmulh[u].vv. For vmulsu.vv, it cannot be produced in the midend
> currently; we will send another patch to address this issue.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md (*cond_3_highpart):
> New combine pattern.
> * config/riscv/autovec.md (smul3_highpart): Merge smul and umul.
> (3_highpart): Merged pattern.
> (umul3_highpart): Merge smul and umul.
> * config/riscv/vector-iterators.md (umul): New iterators.
> (UNSPEC_VMULHU): New iterators.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/cond/cond_mulh-1.c: New test.
> * gcc.target/riscv/rvv/autovec/cond/cond_mulh-2.c: New test.
> * gcc.target/riscv/rvv/autovec/cond/cond_mulh_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/cond/cond_mulh_run-2.c: New test.
>
> ---
>  gcc/config/riscv/autovec-opt.md   | 23 -
>  gcc/config/riscv/autovec.md   | 22 ++--
>  gcc/config/riscv/vector-iterators.md  |  4 +++
>  .../riscv/rvv/autovec/cond/cond_mulh-1.c  | 29 
>  .../riscv/rvv/autovec/cond/cond_mulh-2.c  | 30 
>  .../riscv/rvv/autovec/cond/cond_mulh_run-1.c  | 32 +
>  .../riscv/rvv/autovec/cond/cond_mulh_run-2.c  | 34 +++
>  7 files changed, 154 insertions(+), 20 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_mulh-1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_mulh-2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_mulh_run-1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_mulh_run-2.c
>
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 0d2721f0b29..552be48bf73 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -970,6 +970,28 @@
>  }
>   [(set_attr "type" "vnshift")])
>
> +;; Combine vmulh.vv/vmulhu.vv + vcond_mask
> +(define_insn_and_split "*cond_3_highpart"
> +   [(set (match_operand:VFULLI 0 "register_operand")
> +(if_then_else:VFULLI
> +  (match_operand: 1 "register_operand")
> +  (mulh:VFULLI
> +(match_operand:VFULLI 2 "register_operand")
> +(match_operand:VFULLI 3 "register_operand"))
> +  (match_operand:VFULLI 4 "register_operand")))]
> +   "TARGET_VECTOR && can_create_pseudo_p ()"
> +   "#"
> +   "&& 1"
> +   [(const_int 0)]
> +{
> +  insn_code icode = code_for_pred_mulh (, mode);
> +  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
> operands[4],
> +   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
> +  riscv_vector::expand_cond_len_binop (icode, ops);
> +   DONE;
> +}
> +[(set_attr "type" "vector")])
> +
>  ;; 
> =
>  ;; Combine extend + binop to widen_binop
>  ;; 
> =
> @@ -1172,7 +1194,6 @@
>  }
>  [(set_attr "type" "vfwmul")])
>
> -
>  ;; 
> =
>  ;; Misc combine patterns
>  ;; 
> =
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index e9dd40af935..b4ac22bb97b 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1569,9 +1569,9 @@
>  ;; - vmulhu.vv
>  ;; -
>
> -(define_insn_and_split "smul3_highpart"
> +(define_insn_and_split "3_highpart"
>[(set (match_operand:VFULLI 0 "register_operand")
> -(smul_highpart:VFULLI
> +(mulh:VFULLI
>(match_operand:VFULLI 1 "register_operand")
>(match_operand:VFULLI 2 "register_operand")))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
> @@ -1579,23 +1579,7 @@
>"&& 1"
>[(const_int 0)]
>  {
> -  insn_code icode = code_for_pred_mulh (UNSPEC_VMULHS, mode);
> -  riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
> -  DONE;
> -}
> -[(set_attr "type" "vimul")])
> -
> -(define_insn_and_split "umul3_highpart"
> -  [(set (match_operand:VFULLI 0 "register_operand")
> -(umul_highpart:VFULLI
> -  (match_operand:VFULLI 1 "register_operand")
> -  (match_operand:VFULLI 2 "register_operand")))]
> -  "TARGET_VECTOR && can_create_pseudo_p ()"
> -  "#"
> -  "&& 1"
> -  [(const_int 0)]
> -{
> -  insn_code icode = code_for_pred_mulh (UNSPEC_VMULHU, mode);
> +  insn_code icode = code_for_pred_mulh (, mode);
>riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
>DONE;
>  }
> diff --git a/gcc/config/riscv/vector-iterators.md 
> 

Re: [PATCH] RISC-V: Support cond vfsgnj.vv autovec pattern

2023-09-13 Thread Kito Cheng via Gcc-patches
LGTM

On Wed, Sep 13, 2023 at 12:25 AM Lehua Ding  wrote:
>
> This patch adds combine patterns to combine vfsgnj.vv + vcond_mask
> into a masked vfsgnj.vv. For vfsgnjx.vv, it cannot be produced in the midend
> currently; we will send another patch to address this issue.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md (*copysign_neg): Move.
> (*cond_copysign): New combine pattern.
> * config/riscv/riscv-v.cc (needs_fp_rounding): Extend.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/cond/cond_copysign-run.c: New test.
> * gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: New test.
> * gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv64gcv.c: New test.
> * gcc.target/riscv/rvv/autovec/cond/cond_copysign-template.h: New 
> test.
> * gcc.target/riscv/rvv/autovec/cond/cond_copysign-zvfh-run.c: New 
> test.
>
> ---
>  gcc/config/riscv/autovec-opt.md   | 68 +
>  gcc/config/riscv/riscv-v.cc   |  4 +-
>  .../rvv/autovec/cond/cond_copysign-run.c  | 99 +++
>  .../rvv/autovec/cond/cond_copysign-rv32gcv.c  | 12 +++
>  .../rvv/autovec/cond/cond_copysign-rv64gcv.c  | 12 +++
>  .../rvv/autovec/cond/cond_copysign-template.h | 81 +++
>  .../rvv/autovec/cond/cond_copysign-zvfh-run.c | 93 +
>  7 files changed, 349 insertions(+), 20 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_copysign-run.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv64gcv.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_copysign-template.h
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_copysign-zvfh-run.c
>
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 58e80044f1e..f759525f96b 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -609,6 +609,10 @@
> (set_attr "mode" "")
> (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
> +;; 
> =
> +;; Combine op + vmerge to cond_op
> +;; 
> =
> +
>  ;; Combine  and vcond_mask generated by midend into cond_len_
>  ;; Currently supported operations:
>  ;;   abs(FP)
> @@ -651,25 +655,6 @@
>DONE;
>  })
>
> -;; Combine vlmax neg and UNSPEC_VCOPYSIGN
> -(define_insn_and_split "*copysign_neg"
> -  [(set (match_operand:VF 0 "register_operand")
> -(neg:VF
> -  (unspec:VF [
> -(match_operand:VF 1 "register_operand")
> -(match_operand:VF 2 "register_operand")
> -  ] UNSPEC_VCOPYSIGN)))]
> -  "TARGET_VECTOR && can_create_pseudo_p ()"
> -  "#"
> -  "&& 1"
> -  [(const_int 0)]
> -{
> -  riscv_vector::emit_vlmax_insn (code_for_pred_ncopysign (mode),
> -  riscv_vector::BINARY_OP, operands);
> -  DONE;
> -}
> -[(set_attr "type" "vector")])
> -
>  ;; Combine sign_extend/zero_extend(vf2) and vcond_mask
>  (define_insn_and_split "*cond_"
>[(set (match_operand:VWEXTI 0 "register_operand")
> @@ -918,6 +903,27 @@
>  }
>  [(set_attr "type" "vector")])
>
> +;; Combine vfsgnj.vv + vcond_mask
> +(define_insn_and_split "*cond_copysign"
> +   [(set (match_operand:VF 0 "register_operand")
> +(if_then_else:VF
> +  (match_operand: 1 "register_operand")
> +  (unspec:VF
> +   [(match_operand:VF 2 "register_operand")
> +(match_operand:VF 3 "register_operand")] UNSPEC_VCOPYSIGN)
> +  (match_operand:VF 4 "register_operand")))]
> +   "TARGET_VECTOR && can_create_pseudo_p ()"
> +   "#"
> +   "&& 1"
> +   [(const_int 0)]
> +{
> +  insn_code icode = code_for_pred (UNSPEC_VCOPYSIGN, mode);
> +  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
> operands[4],
> +   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
> +  riscv_vector::expand_cond_len_binop (icode, ops);
> +   DONE;
> +})
> +
>  ;; 
> =
>  ;; Combine extend + binop to widen_binop
>  ;; 
> =
> @@ -1119,3 +1125,27 @@
>DONE;
>  }
>  [(set_attr "type" "vfwmul")])
> +
> +
> +;; 
> =
> +;; Misc combine patterns
> +;; 
> =
> +
> +;; Combine vlmax neg and UNSPEC_VCOPYSIGN
> +(define_insn_and_split "*copysign_neg"
> +  [(set (match_operand:VF 0 "register_operand")
> +(neg:VF
> +  (unspec:VF [
> +(match_operand:VF 1 "register_operand")
> +(match_operand:VF 2 

[PATCH] tree-optimization/111397 - missed copy propagation involving abnormal dest

2023-09-13 Thread Richard Biener via Gcc-patches
The following extends the previous enhancement to copy propagation
involving abnormals.  We can easily replace abnormal uses by not
abnormal uses and only need to preserve the abnormals in PHI arguments
flowing in from abnormal edges.  This changes the may_propagate_copy
argument indicating we are not propagating into a PHI node to indicate
whether we know we are not propagating into a PHI argument from an
abnormal PHI instead.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111397
* tree-ssa-propagate.cc (may_propagate_copy): Change optional
argument to specify whether the PHI destination doesn't flow in
from an abnormal PHI.
(propagate_value): Adjust.
* tree-ssa-forwprop.cc (pass_forwprop::execute): Indicate abnormal
PHI dest.
* tree-ssa-sccvn.cc (eliminate_dom_walker::before_dom_children):
Likewise.
(process_bb): Likewise.

* gcc.dg/uninit-pr111397.c: New testcase.
---
 gcc/testsuite/gcc.dg/uninit-pr111397.c | 15 +++
 gcc/tree-ssa-forwprop.cc   |  2 +-
 gcc/tree-ssa-propagate.cc  | 20 +---
 gcc/tree-ssa-sccvn.cc  |  5 +++--
 4 files changed, 32 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/uninit-pr111397.c

diff --git a/gcc/testsuite/gcc.dg/uninit-pr111397.c 
b/gcc/testsuite/gcc.dg/uninit-pr111397.c
new file mode 100644
index 000..ec12f9d642a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/uninit-pr111397.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wuninitialized" } */
+
+int globalVar = 1;
+int __attribute__ ((__returns_twice__)) test_setjmpex(void *context);
+
+void testfn()
+{
+  int localVar = globalVar;
+  while (!localVar) {
+  test_setjmpex(__builtin_frame_address (0)); // { dg-bogus 
"uninitialized" }
+  if (globalVar)
+   break;
+  }
+}
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 047f9237dd4..94ca47a9726 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -4070,7 +4070,7 @@ pass_forwprop::execute (function *fun)
  continue;
tree val = fwprop_ssa_val (arg);
if (val != arg
-   && may_propagate_copy (arg, val))
+   && may_propagate_copy (arg, val, !(e->flags & EDGE_ABNORMAL)))
  propagate_value (use_p, val);
  }
 
diff --git a/gcc/tree-ssa-propagate.cc b/gcc/tree-ssa-propagate.cc
index cb68b419b8c..a29c49328ad 100644
--- a/gcc/tree-ssa-propagate.cc
+++ b/gcc/tree-ssa-propagate.cc
@@ -1032,11 +1032,12 @@ substitute_and_fold_engine::substitute_and_fold 
(basic_block block)
 
 
 /* Return true if we may propagate ORIG into DEST, false otherwise.
-   If DEST_NOT_PHI_ARG_P is true then assume the propagation does
-   not happen into a PHI argument which relaxes some constraints.  */
+   If DEST_NOT_ABNORMAL_PHI_EDGE_P is true then assume the propagation does
+   not happen into a PHI argument which flows in from an abnormal edge
+   which relaxes some constraints.  */
 
 bool
-may_propagate_copy (tree dest, tree orig, bool dest_not_phi_arg_p)
+may_propagate_copy (tree dest, tree orig, bool dest_not_abnormal_phi_edge_p)
 {
   tree type_d = TREE_TYPE (dest);
   tree type_o = TREE_TYPE (orig);
@@ -1056,9 +1057,9 @@ may_propagate_copy (tree dest, tree orig, bool 
dest_not_phi_arg_p)
   && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (orig))
 return false;
   /* Similarly if DEST flows in from an abnormal edge then the copy cannot be
- propagated.  If we know we do not propagate into a PHI argument this
+ propagated.  If we know we do not propagate into such a PHI argument this
  does not apply.  */
-  else if (!dest_not_phi_arg_p
+  else if (!dest_not_abnormal_phi_edge_p
   && TREE_CODE (dest) == SSA_NAME
   && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (dest))
 return false;
@@ -1162,8 +1163,13 @@ void
 propagate_value (use_operand_p op_p, tree val)
 {
   if (flag_checking)
-gcc_assert (may_propagate_copy (USE_FROM_PTR (op_p), val,
-   !is_a  (USE_STMT (op_p;
+{
+  bool ab = (is_a  (USE_STMT (op_p))
+&& (gimple_phi_arg_edge (as_a  (USE_STMT (op_p)),
+ PHI_ARG_INDEX_FROM_USE (op_p))
+->flags & EDGE_ABNORMAL));
+  gcc_assert (may_propagate_copy (USE_FROM_PTR (op_p), val, !ab));
+}
   replace_exp (op_p, val);
 }
 
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index d9487be302b..1eaf5f6a363 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -7399,7 +7399,8 @@ eliminate_dom_walker::before_dom_children (basic_block b)
  || virtual_operand_p (arg))
continue;
  tree sprime = eliminate_avail (b, arg);
- if (sprime && may_propagate_copy (arg, sprime))
+ if (sprime && may_propagate_copy (arg, sprime,
+

Re: [PATCH] RISC-V: Support cond vnsrl/vnsra

2023-09-13 Thread Kito Cheng via Gcc-patches
LGTM

On Wed, Sep 13, 2023 at 12:25 AM Lehua Ding  wrote:
>
> This patch adds combine patterns to combine vnsra.w[vxi] + vcond_mask
> into a masked vnsra.w[vxi].
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md 
> (*cond_vtrunc):
> New combine pattern.
> (*cond_trunc): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-1.c: New test.
> * gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-2.c: New test.
> * gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-3.c: New test.
> * gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-1.c: New 
> test.
> * gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-2.c: New 
> test.
> * gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-3.c: New 
> test.
>
> ---
>  gcc/config/riscv/autovec-opt.md   | 46 +++
>  .../rvv/autovec/cond/cond_narrow_shift-1.c| 27 +++
>  .../rvv/autovec/cond/cond_narrow_shift-2.c| 30 
>  .../rvv/autovec/cond/cond_narrow_shift-3.c| 30 
>  .../autovec/cond/cond_narrow_shift_run-1.c| 29 
>  .../autovec/cond/cond_narrow_shift_run-2.c| 30 
>  .../autovec/cond/cond_narrow_shift_run-3.c| 31 +
>  7 files changed, 223 insertions(+)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-3.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift_run-3.c
>
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index f759525f96b..0d2721f0b29 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -924,6 +924,52 @@
> DONE;
>  })
>
> +;; Combine vnsra + vcond_mask
> +(define_insn_and_split "*cond_v<any_shiftrt:optab>trunc<mode>"
> +  [(set (match_operand:<V_DOUBLE_TRUNC> 0 "register_operand")
> +     (if_then_else:<V_DOUBLE_TRUNC>
> +       (match_operand:<VM> 1 "register_operand")
> +       (truncate:<V_DOUBLE_TRUNC>
> +         (any_shiftrt:VWEXTI
> +           (match_operand:VWEXTI 2 "register_operand")
> +           (any_extend:VWEXTI
> +             (match_operand:<V_DOUBLE_TRUNC> 3 "vector_shift_operand"))))
> +       (match_operand:<V_DOUBLE_TRUNC> 4 "register_operand")))]
> +  "TARGET_VECTOR && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +{
> +  insn_code icode = code_for_pred_narrow (<CODE>, <MODE>mode);
> +  rtx ops[] = {operands[0], operands[1], operands[2], operands[3], operands[4],
> +               gen_int_mode (GET_MODE_NUNITS (<MODE>mode), Pmode)};
> +  riscv_vector::expand_cond_len_binop (icode, ops);
> +  DONE;
> +}
> + [(set_attr "type" "vnshift")])
> +
> +(define_insn_and_split "*cond_<any_shiftrt:optab>trunc<mode>"
> +  [(set (match_operand:<V_DOUBLE_TRUNC> 0 "register_operand")
> +     (if_then_else:<V_DOUBLE_TRUNC>
> +       (match_operand:<VM> 1 "register_operand")
> +       (truncate:<V_DOUBLE_TRUNC>
> +         (any_shiftrt:VWEXTI
> +           (match_operand:VWEXTI 2 "register_operand")
> +           (match_operand:<VEL> 3 "csr_operand")))
> +       (match_operand:<V_DOUBLE_TRUNC> 4 "register_operand")))]
> +  "TARGET_VECTOR && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +{
> +  insn_code icode = code_for_pred_narrow_scalar (<CODE>, <MODE>mode);
> +  rtx ops[] = {operands[0], operands[1], operands[2],
> +               gen_lowpart (Pmode, operands[3]), operands[4],
> +               gen_int_mode (GET_MODE_NUNITS (<MODE>mode), Pmode)};
> +  riscv_vector::expand_cond_len_binop (icode, ops);
> +  DONE;
> +}
> + [(set_attr "type" "vnshift")])
> +
>  ;; =============================================================================
>  ;; Combine extend + binop to widen_binop
>  ;; =============================================================================
> diff --git 
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-1.c
> new file mode 100644
> index 000..d068110a8a8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_narrow_shift-1.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
> +
> +#include <stdint-gcc.h>
> +
> +#define DEF_LOOP(TYPE1, TYPE2)                                       \
> +  void __attribute__ ((noipa))                                       \
> +  test_##TYPE1##_##TYPE2 (TYPE2 *__restrict r, TYPE2 *__restrict a,  \
> +                          TYPE1 *__restrict b, int n)                \
> +  {
Re: [PATCH 1/2 v2] Ada: Synchronized private extensions are always limited

2023-09-13 Thread Arnaud Charlet via Gcc-patches
> No worries, and sorry for the trouble. I’m going to try using a different
> client for the gcc mailing list; it doesn’t seem to like Outlook. Thanks for
> catching that mistake!
> 
> Please advise how I can get this patch actually applied, given my lack of 
> commit privilege.

You first need to follow instructions from https://gcc.gnu.org/contribute.html
and in particular meet the legal requirements.

Then get someone with write approval to commit the change.

Arno


Re: [PATCH] MATCH: Simplify `(X % Y) < Y` pattern.

2023-09-13 Thread Richard Biener via Gcc-patches
On Wed, Sep 13, 2023 at 12:11 AM Andrew Pinski via Gcc-patches
 wrote:
>
> This merges the two patterns that catch
> `(X % Y) < Y` and `Y > (X % Y)` into one by
> using :c on the comparison operator.
> It does not change code generation or
> anything else; it is purely to make this
> pattern easier to maintain.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK.

> gcc/ChangeLog:
>
> * match.pd (`Y > (X % Y)`): Merge
> into ...
> (`(X % Y) < Y`): Pattern by adding `:c`
> on the comparison.
> ---
>  gcc/match.pd | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 39c7ea1088f..24fd29863fb 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1483,14 +1483,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* X % Y is smaller than Y.  */
>  (for cmp (lt ge)
>   (simplify
> -  (cmp (trunc_mod @0 @1) @1)
> +  (cmp:c (trunc_mod @0 @1) @1)
>(if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> { constant_boolean_node (cmp == LT_EXPR, type); })))
> -(for cmp (gt le)
> - (simplify
> -  (cmp @1 (trunc_mod @0 @1))
> -  (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> -   { constant_boolean_node (cmp == GT_EXPR, type); })))
>
>  /* x | ~0 -> ~0  */
>  (simplify
> --
> 2.31.1
>
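
The underlying fact: for unsigned operands with y != 0 (y == 0 makes the
modulo undefined anyway), x % y always lies in [0, y - 1]. Both spellings
now hit the single :c pattern; a sketch:

/* Each folds to the constant 1 via the pattern above.  */
unsigned f1 (unsigned x, unsigned y) { return (x % y) < y; }
unsigned f2 (unsigned x, unsigned y) { return y > (x % y); }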


Re: [PATCH v2 08/11] Native complex ops: Add explicit vector of complex

2023-09-13 Thread Richard Biener via Gcc-patches
On Tue, Sep 12, 2023 at 7:26 PM Joseph Myers  wrote:
>
> On Tue, 12 Sep 2023, Sylvain Noiry via Gcc-patches wrote:
>
> > Summary:
> > Allow the creation and usage of builtins vectors of complex
> > in C, using __attribute__ ((vector_size ()))
>
> If you're adding a new language feature like this, you need to update
> extend.texi to explain the valid uses of the attribute for complex types,
> and (under "Vector Extensions") the valid uses of the resulting vectors.
> You also need to add testcases to the testsuite for such vectors - both
> execution tests covering valid uses of the vectors, and tests that invalid
> declarations or uses of such vectors (uses with any operator, or other
> operand to such operator, that aren't valid) are properly rejected - go
> through all cases of operators, with one or two complex vector operands,
> of the same or different types, and with different choices for what type
> the other operand might be when one has complex vector type, and make sure
> they are all properly tested and do have the desired and documented
> semantics.
>
> If the intended semantics are the same for C and C++, the tests should be
> c-c++-common tests.  Any cases where the intended semantics are different
> will need separate tests for each language or appropriately conditional
> test assertions in c-c++-common.

And to add - in other related discussions we have always rejected adding vector
types of composite types.  I realize that if the hardware supports vector complex
arithmetic instructions, this might be the first genuinely good reason to allow them.

Richard.

> --
> Joseph S. Myers
> jos...@codesourcery.com
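
For reference, the kind of declaration and use the series proposes to accept
is sketched below (the typedef name is illustrative; these semantics are what
the patch proposes, not what current GCC accepts):

/* A vector of two _Complex float elements via vector_size.  */
typedef _Complex float vcf2
  __attribute__ ((vector_size (2 * sizeof (_Complex float))));

vcf2
vec_cadd (vcf2 a, vcf2 b)
{
  return a + b;  /* element-wise complex addition */
}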


Re: [PATCH 2/2] MATCH: Move `X <= MAX(X, Y)` before `MIN (X, C1) < C2` pattern

2023-09-13 Thread Richard Biener via Gcc-patches
On Tue, Sep 12, 2023 at 5:41 PM Andrew Pinski via Gcc-patches
 wrote:
>
> Matching C1 as C2 here decreases how many other simplifications
> need to happen to get the final answer.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK

Richard.

> gcc/ChangeLog:
>
> * match.pd (`X <= MAX(X, Y)`):
> Move before `MIN (X, C1) < C2` pattern.
> ---
>  gcc/match.pd | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 36e3da4841b..34b67df784e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3931,13 +3931,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (if (wi::lt_p (wi::to_wide (@1), wi::to_wide (@2),
>   TYPE_SIGN (TREE_TYPE (@0
>  (cmp @0 @2)
> -/* MIN (X, C1) < C2 -> X < C2 || C1 < C2  */
> -(for minmax (min min max max min min max max)
> - cmp(lt  le  gt  ge  gt  ge  lt  le )
> - comb   (bit_ior bit_ior bit_ior bit_ior bit_and bit_and bit_and bit_and)
> - (simplify
> -  (cmp (minmax @0 INTEGER_CST@1) INTEGER_CST@2)
> -  (comb (cmp @0 @2) (cmp @1 @2
>
>  /* X <= MAX(X, Y) -> true
> X > MAX(X, Y) -> false
> @@ -3949,6 +3942,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(cmp:c @0 (minmax:c @0 @1))
>{ constant_boolean_node (cmp == GE_EXPR || cmp == LE_EXPR, type); } ))
>
> +/* MIN (X, C1) < C2 -> X < C2 || C1 < C2  */
> +(for minmax (min min max max min min max max)
> + cmp(lt  le  gt  ge  gt  ge  lt  le )
> + comb   (bit_ior bit_ior bit_ior bit_ior bit_and bit_and bit_and bit_and)
> + (simplify
> +  (cmp (minmax @0 INTEGER_CST@1) INTEGER_CST@2)
> +  (comb (cmp @0 @2) (cmp @1 @2
> +
>  /* Undo fancy ways of writing max/min or other ?: expressions, like
> a - ((a - b) & -(a < b))  and  a - (a - b) * (a < b) into (a < b) ? b : a.
> People normally use ?: and that is what we actually try to optimize.  */
> --
> 2.31.1
>
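
Concretely, when C1 == C2 the reordering lets the (newly extended)
minmax-versus-operand pattern fire first and produce the final form in one
step; a sketch:

/* MIN (x, 5) < 5: matched by the `X <= MAX(X, Y)` family (extended in
   patch 1/2 to lt/ge/gt/le), this folds directly to x < 5.  Matched by
   `MIN (X, C1) < C2` first, it would become x < 5 || 5 < 5 and need a
   further round of folding.  */
int f (int x) { return (x < 5 ? x : 5) < 5; }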


Re: [PATCH 1/2] MATCH: [PR111364] Add some more minmax cmp operand simplifications

2023-09-13 Thread Richard Biener via Gcc-patches
On Tue, Sep 12, 2023 at 5:31 PM Andrew Pinski via Gcc-patches
 wrote:
>
> This adds a few more minmax cmp operand simplifications which were missed 
> before.
> `MIN(a,b) < a` -> `a > b`
> `MIN(a,b) >= a` -> `a <= b`
> `MAX(a,b) > a` -> `a < b`
> `MAX(a,b) <= a` -> `a >= b`
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK.  I wonder if any of these are also valid for FP types?

> Note gcc.dg/pr96708-negative.c needed to be updated to remove the
> check for MIN/MAX, as they are now (correctly) optimized away.
>
> PR tree-optimization/111364
>
> gcc/ChangeLog:
>
> * match.pd (`MIN (X, Y) == X`): Extend
> to min/lt, min/ge, max/gt, max/le.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/execute/minmaxcmp-1.c: New test.
> * gcc.dg/tree-ssa/minmaxcmp-2.c: New test.
> * gcc.dg/pr96708-negative.c: Update testcase.
> * gcc.dg/pr96708-positive.c: Add comment about `return 0`.
> ---
>  gcc/match.pd  |  8 +--
>  .../gcc.c-torture/execute/minmaxcmp-1.c   | 51 +++
>  gcc/testsuite/gcc.dg/pr96708-negative.c   |  4 +-
>  gcc/testsuite/gcc.dg/pr96708-positive.c   |  1 +
>  gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c   | 30 +++
>  5 files changed, 89 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 51985c1bad4..36e3da4841b 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3902,9 +3902,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(maxmin @0 (bit_not @1
>
>  /* MIN (X, Y) == X -> X <= Y  */
> -(for minmax (min min max max)
> - cmp(eq  ne  eq  ne )
> - out(le  gt  ge  lt )
> +/* MIN (X, Y) < X -> X > Y  */
> +/* MIN (X, Y) >= X -> X <= Y  */
> +(for minmax (min min min min max max max max)
> + cmp(eq  ne  lt  ge  eq  ne  gt  le )
> + out(le  gt  gt  le  ge  lt  lt  ge )
>   (simplify
>(cmp:c (minmax:c @0 @1) @0)
>(if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0)))
> diff --git a/gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c
> new file mode 100644
> index 000..6705a053768
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/minmaxcmp-1.c
> @@ -0,0 +1,51 @@
> +#define func(vol, op1, op2)\
> +_Bool op1##_##op2##_##vol (int a, int b)   \
> +{  \
> + vol int x = op_##op1(a, b);   \
> + return op_##op2(x, a);\
> +}
> +
> +#define op_lt(a, b) ((a) < (b))
> +#define op_le(a, b) ((a) <= (b))
> +#define op_eq(a, b) ((a) == (b))
> +#define op_ne(a, b) ((a) != (b))
> +#define op_gt(a, b) ((a) > (b))
> +#define op_ge(a, b) ((a) >= (b))
> +#define op_min(a, b) ((a) < (b) ? (a) : (b))
> +#define op_max(a, b) ((a) > (b) ? (a) : (b))
> +
> +
> +#define funcs(a) \
> + a(min,lt) \
> + a(max,lt) \
> + a(min,gt) \
> + a(max,gt) \
> + a(min,le) \
> + a(max,le) \
> + a(min,ge) \
> + a(max,ge) \
> + a(min,ne) \
> + a(max,ne) \
> + a(min,eq) \
> + a(max,eq)
> +
> +#define funcs1(a,b) \
> +func(,a,b) \
> +func(volatile,a,b)
> +
> +funcs(funcs1)
> +
> +#define test(op1,op2)   \
> +do {\
> +  if (op1##_##op2##_(x,y) != op1##_##op2##_volatile(x,y))   \
> +__builtin_abort();  \
> +} while(0);
> +
> +int main()
> +{
> +  for(int x = -10; x < 10; x++)
> +for(int y = -10; y < 10; y++)
> +{
> +funcs(test)
> +}
> +}
> diff --git a/gcc/testsuite/gcc.dg/pr96708-negative.c 
> b/gcc/testsuite/gcc.dg/pr96708-negative.c
> index 91964d3b971..c9c1aa85558 100644
> --- a/gcc/testsuite/gcc.dg/pr96708-negative.c
> +++ b/gcc/testsuite/gcc.dg/pr96708-negative.c
> @@ -42,7 +42,7 @@ int main()
>  return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "MAX_EXPR" 2 "optimized" } } */
> -/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" } } */
> +/* Even though test[1-4] originally have MIN/MAX, those can be optimized away
> +   into just comparing the a and b arguments.  */
>  /* { dg-final { scan-tree-dump-times "return 0;" 1 "optimized" } } */
>  /* { dg-final { scan-tree-dump-not { "return 1;" } "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/pr96708-positive.c 
> b/gcc/testsuite/gcc.dg/pr96708-positive.c
> index 65af85344b6..12c5fedfd30 100644
> --- a/gcc/testsuite/gcc.dg/pr96708-positive.c
> +++ b/gcc/testsuite/gcc.dg/pr96708-positive.c
> @@ -42,6 +42,7 @@ int main()
>  return 0;
>  }
>
> +/* Note main has one `return 0`. */
>  /* { dg-final { scan-tree-dump-times "return 0;" 3 "optimized" } } */
>  /* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
>  /* { dg-final { scan-tree-dump-not { "MAX_EXPR" } "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmaxcmp-2.c
> new file 
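
For integers each identity follows from a two-case split, e.g. for
`MIN(a,b) < a`: if a <= b the minimum is a and both sides are false; if
a > b the minimum is b and both sides are true. A small brute-force check
of two of the folds, a sketch alongside the new execute test:

#include <assert.h>

int
main (void)
{
  for (int a = -3; a <= 3; a++)
    for (int b = -3; b <= 3; b++)
      {
        assert (((a < b ? a : b) < a) == (a > b));  /* MIN(a,b) < a -> a > b */
        assert (((a > b ? a : b) > a) == (a < b));  /* MAX(a,b) > a -> a < b */
      }
  return 0;
}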

RE: [PATCH v1] RISC-V: Bugfix PR111362 for incorrect frm emit

2023-09-13 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Wednesday, September 13, 2023 2:16 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Bugfix PR111362 for incorrect frm emit

LGTM :)

On Wed, Sep 13, 2023 at 2:07 PM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> When the mode switches from NONE to CALL, we restore the frm but
> lack a check for whether cfun contains a static frm insn.
>
> This patch fixes that by adding the missing static frm insn check.
>
> gcc/ChangeLog:
>
> * PR target/111362
> * config/riscv/riscv.cc (riscv_emit_frm_mode_set): Bugfix.
>
> gcc/testsuite/ChangeLog:
>
> * PR target/111362
> * gcc.target/riscv/rvv/base/no-honor-frm-1.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv.cc|  2 +-
>  .../gcc.target/riscv/rvv/base/no-honor-frm-1.c   | 12 
>  2 files changed, 13 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/no-honor-frm-1.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 9d04ddd69e0..762937b0e37 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -9173,7 +9173,7 @@ riscv_emit_frm_mode_set (int mode, int prev_mode)
>rtx frm = gen_int_mode (mode, SImode);
>
>if (mode == riscv_vector::FRM_DYN_CALL
> -   && prev_mode != riscv_vector::FRM_DYN)
> +   && prev_mode != riscv_vector::FRM_DYN && STATIC_FRM_P (cfun))
> /* No need to emit when prev mode is DYN already.  */
> emit_insn (gen_fsrmsi_restore_volatile (backup_reg));
>else if (mode == riscv_vector::FRM_DYN_EXIT && STATIC_FRM_P (cfun)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/no-honor-frm-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/no-honor-frm-1.c
> new file mode 100644
> index 000..b2e0f217bfa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/no-honor-frm-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
> +
> +void foo (void) {
> +  for (unsigned i = 0; i < sizeof(foo); i++)
> +__builtin_printf("%d", i);
> +}
> +
> +/* { dg-final { scan-assembler-not {fsrmi\s+[axs][0-9]+,\s*[01234]} } } */
> +/* { dg-final { scan-assembler-not {fsrmi\s+[01234]} } } */
> +/* { dg-final { scan-assembler-not {fsrm\s+[axs][0-9]+} } } */
> +/* { dg-final { scan-assembler-not {frrm\s+[axs][0-9]+} } } */
> --
> 2.34.1
>


Re: [PATCH v1] RISC-V: Bugfix PR111362 for incorrect frm emit

2023-09-13 Thread Kito Cheng via Gcc-patches
LGTM :)

On Wed, Sep 13, 2023 at 2:07 PM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> When the mode switches from NONE to CALL, we restore the frm but
> lack a check for whether cfun contains a static frm insn.
>
> This patch fixes that by adding the missing static frm insn check.
>
> gcc/ChangeLog:
>
> * PR target/111362
> * config/riscv/riscv.cc (riscv_emit_frm_mode_set): Bugfix.
>
> gcc/testsuite/ChangeLog:
>
> * PR target/111362
> * gcc.target/riscv/rvv/base/no-honor-frm-1.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv.cc|  2 +-
>  .../gcc.target/riscv/rvv/base/no-honor-frm-1.c   | 12 
>  2 files changed, 13 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/no-honor-frm-1.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 9d04ddd69e0..762937b0e37 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -9173,7 +9173,7 @@ riscv_emit_frm_mode_set (int mode, int prev_mode)
>rtx frm = gen_int_mode (mode, SImode);
>
>if (mode == riscv_vector::FRM_DYN_CALL
> -   && prev_mode != riscv_vector::FRM_DYN)
> +   && prev_mode != riscv_vector::FRM_DYN && STATIC_FRM_P (cfun))
> /* No need to emit when prev mode is DYN already.  */
> emit_insn (gen_fsrmsi_restore_volatile (backup_reg));
>else if (mode == riscv_vector::FRM_DYN_EXIT && STATIC_FRM_P (cfun)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/no-honor-frm-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/no-honor-frm-1.c
> new file mode 100644
> index 000..b2e0f217bfa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/no-honor-frm-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
> +
> +void foo (void) {
> +  for (unsigned i = 0; i < sizeof(foo); i++)
> +__builtin_printf("%d", i);
> +}
> +
> +/* { dg-final { scan-assembler-not {fsrmi\s+[axs][0-9]+,\s*[01234]} } } */
> +/* { dg-final { scan-assembler-not {fsrmi\s+[01234]} } } */
> +/* { dg-final { scan-assembler-not {fsrm\s+[axs][0-9]+} } } */
> +/* { dg-final { scan-assembler-not {frrm\s+[axs][0-9]+} } } */
> --
> 2.34.1
>


[PATCH v1] RISC-V: Bugfix PR111362 for incorrect frm emit

2023-09-13 Thread Pan Li via Gcc-patches
From: Pan Li 

When the mode switches from NONE to CALL, we restore the frm but
lack a check for whether cfun contains a static frm insn.

This patch fixes that by adding the missing static frm insn check.

gcc/ChangeLog:

* PR target/111362
* config/riscv/riscv.cc (riscv_emit_frm_mode_set): Bugfix.

gcc/testsuite/ChangeLog:

* PR target/111362
* gcc.target/riscv/rvv/base/no-honor-frm-1.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc|  2 +-
 .../gcc.target/riscv/rvv/base/no-honor-frm-1.c   | 12 
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/no-honor-frm-1.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9d04ddd69e0..762937b0e37 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -9173,7 +9173,7 @@ riscv_emit_frm_mode_set (int mode, int prev_mode)
   rtx frm = gen_int_mode (mode, SImode);
 
   if (mode == riscv_vector::FRM_DYN_CALL
-   && prev_mode != riscv_vector::FRM_DYN)
+   && prev_mode != riscv_vector::FRM_DYN && STATIC_FRM_P (cfun))
/* No need to emit when prev mode is DYN already.  */
emit_insn (gen_fsrmsi_restore_volatile (backup_reg));
   else if (mode == riscv_vector::FRM_DYN_EXIT && STATIC_FRM_P (cfun)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/no-honor-frm-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/no-honor-frm-1.c
new file mode 100644
index 000..b2e0f217bfa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/no-honor-frm-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+void foo (void) {
+  for (unsigned i = 0; i < sizeof(foo); i++)
+__builtin_printf("%d", i);
+}
+
+/* { dg-final { scan-assembler-not {fsrmi\s+[axs][0-9]+,\s*[01234]} } } */
+/* { dg-final { scan-assembler-not {fsrmi\s+[01234]} } } */
+/* { dg-final { scan-assembler-not {fsrm\s+[axs][0-9]+} } } */
+/* { dg-final { scan-assembler-not {frrm\s+[axs][0-9]+} } } */
-- 
2.34.1
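
For contrast with the new test (which contains no static frm insn, so nothing
should be emitted), a function that does set a static rounding mode still
needs the bookkeeping around calls. A sketch assuming the fixed rounding-mode
intrinsics from the RVV intrinsic specification (__riscv_vfadd_vv_f32m1_rm
and __RISCV_FRM_RNE are taken from that spec, not from this patch):

#include <riscv_vector.h>

/* The _rm intrinsic makes STATIC_FRM_P (cfun) hold, so the mode-switching
   pass still emits the frm save/restore around the call.  */
void
bar (float *r, const float *a, const float *b, int n)
{
  size_t vl = __riscv_vsetvl_e32m1 (n);
  vfloat32m1_t va = __riscv_vle32_v_f32m1 (a, vl);
  vfloat32m1_t vb = __riscv_vle32_v_f32m1 (b, vl);
  vfloat32m1_t vr = __riscv_vfadd_vv_f32m1_rm (va, vb, __RISCV_FRM_RNE, vl);
  __builtin_printf ("%d", n);  /* the call that triggers the CALL mode switch */
  __riscv_vse32_v_f32m1 (r, vr, vl);
}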