git gcc-commit-mklog doesn't extract PR number to ChangeLog

2021-06-09 Thread Xionghu Luo via Gcc-patches
Hi,
I noticed that the "git gcc-commit-mklog" command doesn't automatically
extract the PR number from the title into the ChangeLog, so the committed
patch doesn't update the related Bugzilla PR page after it is checked in.
Martin, what's your opinion on this, since you are much more familiar
with it?  Thanks.


-- 
BR,
Xionghu


[PATCH] c++: Extend std::is_constant_evaluated in if warning [PR100995]

2021-06-09 Thread Marek Polacek via Gcc-patches
Jakub pointed me at

which shows that our existing warning could be extended to handle more
cases.  This patch implements that.

A minor annoyance was handling macros, in libstdc++ we have

  reference operator[](size_type __pos) {
  __glibcxx_assert(__pos <= size());
  ...
  }

wherein __glibcxx_assert expands to

  if (__builtin_is_constant_evaluated() && !bool(__pos <= size())
...

but I'm of a mind to not warn on that.

Possible tweaks: merge the "always true" warnings and say something
about a manifestly evaluated context, and perhaps add an early exit
for TREE_CONSTANT trees because for constexpr if we are going to call
maybe_warn_for_constant_evaluated twice.

Once consteval if makes it in, we should tweak this warning one more
time.
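
For illustration, a minimal sketch of the contexts the extended warning is
meant to tell apart.  It is not taken from the patch's testsuite and the
function names are made up.  It calls __builtin_is_constant_evaluated()
directly (the builtin that the __glibcxx_assert expansion above uses) so
that it also builds as C++17; with -std=c++20, std::is_constant_evaluated()
should behave the same way in these positions.

```cpp
// "Always true": the condition of `if constexpr` is manifestly
// constant-evaluated, so the else branch can never be taken.
constexpr int in_if_constexpr()
{
  if constexpr (__builtin_is_constant_evaluated())
    return 1;
  else
    return 0;
}

// "Always false": the function is neither constexpr nor consteval, and
// the condition of a plain `if` is never constant-evaluated here.
int in_plain_function()
{
  if (__builtin_is_constant_evaluated())
    return 1;
  return 0;
}

// The useful case: in a constexpr function the answer depends on how
// the call is evaluated, so no diagnostic is wanted.
constexpr int in_constexpr_function()
{
  if (__builtin_is_constant_evaluated())
    return 1;
  return 0;
}

// A consteval function (C++20) would be a second "always true" context;
// it is left out so the sketch stays within C++17.
static_assert(in_if_constexpr() == 1, "always true in if constexpr");
static_assert(in_constexpr_function() == 1, "true during constant evaluation");
```

At run time in_if_constexpr() still returns 1 and in_plain_function()
returns 0, which is what makes both checks tautological.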

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/100995

gcc/cp/ChangeLog:

* semantics.c (find_std_constant_evaluated_r): New.
(maybe_warn_for_constant_evaluated): New.
(finish_if_stmt_cond): Call it.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/is-constant-evaluated9.C: Add dg-warning.
* g++.dg/cpp2a/is-constant-evaluated12.C: New test.
---
 gcc/cp/semantics.c| 85 ---
 .../g++.dg/cpp2a/is-constant-evaluated12.C| 79 +
 .../g++.dg/cpp2a/is-constant-evaluated9.C |  4 +-
 3 files changed, 152 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/is-constant-evaluated12.C

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index f506a239864..07459a357e2 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -927,6 +927,75 @@ is_std_constant_evaluated_p (tree fn)
   return name && id_equal (name, "is_constant_evaluated");
 }
 
+/* Callback function for maybe_warn_for_constant_evaluated that looks
+   for calls to std::is_constant_evaluated in TP.  */
+
+static tree
+find_std_constant_evaluated_r (tree *tp, int *walk_subtrees, void *)
+{
+  tree t = *tp;
+
+  if (TYPE_P (t) || TREE_CONSTANT (t))
+{
+  *walk_subtrees = false;
+  return NULL_TREE;
+}
+
+  switch (TREE_CODE (t))
+{
+case CALL_EXPR:
+  if (is_std_constant_evaluated_p (t))
+   return t;
+  break;
+case EXPR_STMT:
+  /* Don't warn in statement expressions.  */
+  *walk_subtrees = false;
+  return NULL_TREE;
+default:
+  break;
+}
+
+  return NULL_TREE;
+}
+
+/* In certain contexts, std::is_constant_evaluated() is always true (for
+   instance, in a consteval function or in a constexpr if), or always false
+   (e.g., in a non-constexpr non-consteval function) so give the user a clue.  */
+
+static void
+maybe_warn_for_constant_evaluated (tree cond, bool constexpr_if)
+{
+  if (!warn_tautological_compare)
+return;
+
+  /* Suppress warning for std::is_constant_evaluated if the conditional
+ comes from a macro.  */
+  if (from_macro_expansion_at (EXPR_LOCATION (cond)))
+return;
+
+  cond = cp_walk_tree_without_duplicates (&cond, find_std_constant_evaluated_r,
+ NULL);
+  if (cond)
+{
+  if (constexpr_if)
+   warning_at (EXPR_LOCATION (cond), OPT_Wtautological_compare,
+   "%<std::is_constant_evaluated%> always evaluates to "
+   "true in %<if constexpr%>");
+  else if (!DECL_DECLARED_CONSTEXPR_P (current_function_decl)
+  /* C++17 lambda op() is implicitly constexpr but finish_function
+ may not have marked it as such.  */
+  && !(cxx_dialect >= cxx17
+   && LAMBDA_TYPE_P (CP_DECL_CONTEXT (current_function_decl))))
+   warning_at (EXPR_LOCATION (cond), OPT_Wtautological_compare,
+   "%<std::is_constant_evaluated%> always evaluates to "
+   "false in a non-%<constexpr%> function");
+  else if (DECL_IMMEDIATE_FUNCTION_P (current_function_decl))
+   warning_at (EXPR_LOCATION (cond), OPT_Wtautological_compare,
+   "%<std::is_constant_evaluated%> always evaluates to "
+   "true in a %<consteval%> function");
+}
+}
+
 /* Process the COND of an if-statement, which may be given by
IF_STMT.  */
 
@@ -942,23 +1011,11 @@ finish_if_stmt_cond (tree cond, tree if_stmt)
 converted to bool.  */
   && TYPE_MAIN_VARIANT (TREE_TYPE (cond)) == boolean_type_node)
 {
-  /* if constexpr (std::is_constant_evaluated()) is always true,
-so give the user a clue.  */
-  if (warn_tautological_compare)
-   {
- tree t = cond;
- if (TREE_CODE (t) == CLEANUP_POINT_EXPR)
-   t = TREE_OPERAND (t, 0);
- if (TREE_CODE (t) == CALL_EXPR
- && is_std_constant_evaluated_p (t))
-   warning_at (EXPR_LOCATION (cond), OPT_Wtautological_compare,
-   "%qs always evaluates to true in %<if constexpr%>",
-   "std::is_constant_evaluated");
-   }
-
+  maybe_warn_for_constant_evaluated (cond, /*constexpr_if=*/true);
 

Re: [PATCH v4 2/2] x86: Add vec_duplicate expander

2021-06-09 Thread H.J. Lu via Gcc-patches
On Wed, Jun 9, 2021 at 4:39 PM H.J. Lu wrote:
>
> 1. Update vec_duplicate to allow to fail so that backend can only allow
> broadcasting an integer constant to a vector when broadcast instruction
> is available.  This can be used by memset expander to avoid vec_duplicate
> when loading from constant pool is more efficient.
> 2. Add vec_duplicate expander and enable vec_duplicate from a
> non-standard SSE constant integer only if vector broadcast is available.
>
> * config/i386/i386-expand.c (ix86_expand_integer_vec_duplicate):
> New function.
> * config/i386/i386-protos.h (ix86_expand_integer_vec_duplicate):
> New prototype.
> * config/i386/sse.md (INT_BROADCAST_MODE): New mode iterator.
> (vec_duplicate<mode>): New expander.
> * doc/md.texi: Update vec_duplicate.
> ---
>  gcc/config/i386/i386-expand.c | 21 +
>  gcc/config/i386/i386-protos.h |  1 +
>  gcc/config/i386/sse.md| 21 +
>  gcc/doc/md.texi   |  2 --
>  4 files changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 7b62e163ab5..e89f116627a 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -15664,6 +15664,27 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, 
> rtx vec, int elt)
>  }
>  }
>
> +/* Expand integer vec_duplicate.  Return true if successful.  */
> +
> +bool
> +ix86_expand_integer_vec_duplicate (rtx *operands)
> +{
> +  /* Enable VEC_DUPLICATE from a non-standard SSE constant integer only
> + if vector broadcast is available.  */
> +  if (CONST_INT_P (operands[1])
> +  && (!TARGET_AVX2
> + || standard_sse_constant_p (operands[1],
> + GET_MODE (operands[0]))))
> +return false;
> +
> +  if (!ix86_expand_vector_init_duplicate (false,
> + GET_MODE (operands[0]),
> + operands[0], operands[1]))
> +gcc_unreachable ();
> +
> +  return true;
> +}
> +
>  /* Generate code to copy vector bits i / 2 ... i - 1 from vector SRC

Small update to use

+  bool ok = ix86_expand_vector_init_duplicate (false,
+GET_MODE (operands[0]),
+operands[0],
+operands[1]);
+  gcc_assert (ok);
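
One practical benefit of this form is that the call with side effects
stays outside the assertion.  A stand-alone sketch, with the standard
assert macro standing in for gcc_assert; emit_broadcast and
expand_with_idiom are made-up names, not GCC functions:

```cpp
#include <cassert>

// Counts how often the stand-in expander helper actually ran.
static int insns_emitted = 0;

// Hypothetical stand-in for an expander helper: it has a side effect
// and reports whether it succeeded.
bool emit_broadcast()
{
  ++insns_emitted;
  return true;
}

// The call is made unconditionally; only the check goes into the
// assertion.  With NDEBUG defined, assert(emit_broadcast()) would drop
// the call altogether, whereas this form still emits.
void expand_with_idiom()
{
  bool ok = emit_broadcast();
  assert(ok);
  (void) ok;  // keeps -Wunused-variable quiet when NDEBUG is defined
}
```

Calling expand_with_idiom() once leaves insns_emitted at 1 whether or
not assertions are compiled in.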

-- 
H.J.
From 66e4cc1a22c00afd09abc30220a189de7840e6e6 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 7 Jun 2021 14:23:04 -0700
Subject: [PATCH v4 2/2] x86: Add vec_duplicate expander

1. Allow vec_duplicate to fail so that a backend can restrict broadcasting
an integer constant to a vector to cases where a broadcast instruction
is available.  The memset expander can use this to avoid vec_duplicate
when loading from the constant pool is more efficient.
2. Add vec_duplicate expander and enable vec_duplicate from a
non-standard SSE constant integer only if vector broadcast is available.

	* config/i386/i386-expand.c (ix86_expand_integer_vec_duplicate):
	New function.
	* config/i386/i386-protos.h (ix86_expand_integer_vec_duplicate):
	New prototype.
	* config/i386/sse.md (INT_BROADCAST_MODE): New mode iterator.
	(vec_duplicate<mode>): New expander.
	* doc/md.texi: Update vec_duplicate.
---
 gcc/config/i386/i386-expand.c | 22 ++
 gcc/config/i386/i386-protos.h |  1 +
 gcc/config/i386/sse.md| 21 +
 gcc/doc/md.texi   |  2 --
 4 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 7b62e163ab5..d92a9841023 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -15664,6 +15664,28 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
 }
 }
 
+/* Expand integer vec_duplicate.  Return true if successful.  */
+
+bool
+ix86_expand_integer_vec_duplicate (rtx *operands)
+{
+  /* Enable VEC_DUPLICATE from a non-standard SSE constant integer only
+ if vector broadcast is available.  */
+  if (CONST_INT_P (operands[1])
+  && (!TARGET_AVX2
+	  || standard_sse_constant_p (operands[1],
+  GET_MODE (operands[0]))))
+return false;
+
+  bool ok = ix86_expand_vector_init_duplicate (false,
+	   GET_MODE (operands[0]),
+	   operands[0],
+	   operands[1]);
+  gcc_assert (ok);
+
+  return true;
+}
+
 /* Generate code to copy vector bits i / 2 ... i - 1 from vector SRC
to bits 0 ... i / 2 - 1 of vector DEST, which has the same mode.
The upper bits of DEST are undefined, though they shouldn't cause
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 6050242085f..8b740078344 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -260,6 +260,7 @@ extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
 extern void 

[PATCH v4 1/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

2021-06-09 Thread H.J. Lu via Gcc-patches
1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR
operands to vector broadcast from an integer with AVX2.
2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
won't increase stack alignment requirement and blocks transformation by
the combine pass.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

shows that broadcast is a little bit faster on Intel Core i7-8559U:

$ make
gcc -g -I. -O2   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
./test
memory  : 147215
broadcast   : 121213
vec_dup_sse2: 171366
$

broadcast is also smaller:

$ size memory.o broadcast.o
   text    data     bss     dec     hex filename
    132       0       0     132      84 memory.o
    122       0       0     122      7a broadcast.o
$

3. Update PR 87767 tests to expect integer broadcast instead of broadcast
from memory.
4. Update avx512f_cond_move.c to expect integer broadcast.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast

shows that integer broadcast is faster than embedded memory broadcast:

$ make
gcc -g -I. -O2 -march=skylake-avx512   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -o test test.o memory.o broadcast.o
./test
memory  : 425538
broadcast   : 375260
$
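
Item 1 depends on recognising that a wide constant is one narrower
integer repeated.  A rough, stand-alone sketch of that check on a single
64-bit word follows; repeats_every is a hypothetical name, and the
patch's own helpers work on RTL constants and are not reproduced here.

```cpp
#include <cstdint>

// Returns true if VALUE consists of the same WIDTH-bit element repeated
// across all 64 bits, and stores that element in ELEMENT.
// Assumption of this sketch: WIDTH is 8, 16 or 32.
bool repeats_every(std::uint64_t value, unsigned width, std::uint64_t &element)
{
  const std::uint64_t mask = (std::uint64_t{1} << width) - 1;
  element = value & mask;
  for (unsigned shift = width; shift < 64; shift += width)
    if (((value >> shift) & mask) != element)
      return false;
  return true;
}
```

For example, 0x0101010101010101 is a byte broadcast of 0x01, while
0x0000000100000001 repeats only at 32-bit granularity, so it would need
a 32-bit broadcast rather than a byte broadcast.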

gcc/

PR target/100865
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
New prototype.
(ix86_byte_broadcast): New function.
(ix86_convert_const_wide_int_to_broadcast): Likewise.
(ix86_expand_move): Convert CONST_WIDE_INT to broadcast if mode
size is 16 bytes or bigger.
(ix86_broadcast_from_integer_constant): New function.
(ix86_expand_vector_move): Convert CONST_WIDE_INT and CONST_VECTOR
to broadcast if mode size is 16 bytes or bigger.
* config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
prototype.
* config/i386/i386.c (ix86_gen_scratch_sse_rtx): New function.

gcc/testsuite/

PR target/100865
* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer
broadcast.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512f_cond_move.c: Also pass
-mprefer-vector-width=512 and expect integer broadcast.
* gcc.target/i386/pr100865-1.c: New test.
* gcc.target/i386/pr100865-2.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr100865-4a.c: Likewise.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-5a.c: Likewise.
* gcc.target/i386/pr100865-5b.c: Likewise.
* gcc.target/i386/pr100865-6a.c: Likewise.
* gcc.target/i386/pr100865-6b.c: Likewise.
* gcc.target/i386/pr100865-7a.c: Likewise.
* gcc.target/i386/pr100865-7b.c: Likewise.
* gcc.target/i386/pr100865-8a.c: Likewise.
* gcc.target/i386/pr100865-8b.c: Likewise.
* gcc.target/i386/pr100865-9a.c: Likewise.
* gcc.target/i386/pr100865-9b.c: Likewise.
* gcc.target/i386/pr100865-10a.c: Likewise.
* gcc.target/i386/pr100865-10b.c: Likewise.
* gcc.target/i386/pr100865-11a.c: Likewise.
* gcc.target/i386/pr100865-11b.c: Likewise.
* gcc.target/i386/pr100865-12a.c: Likewise.
* gcc.target/i386/pr100865-12b.c: Likewise.
---
 gcc/config/i386/i386-expand.c | 187 --
 gcc/config/i386/i386-protos.h |   2 +
 gcc/config/i386/i386.c|  13 ++
 .../i386/avx512f-broadcast-pr87767-1.c|   7 +-
 .../i386/avx512f-broadcast-pr87767-5.c|   5 +-
 .../gcc.target/i386/avx512f_cond_move.c   |   4 +-
 .../i386/avx512vl-broadcast-pr87767-1.c   |  12 +-
 .../i386/avx512vl-broadcast-pr87767-5.c   |   9 +-
 gcc/testsuite/gcc.target/i386/pr100865-1.c|  13 ++
 gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  33 
 gcc/testsuite/gcc.target/i386/pr100865-10b.c  |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-11a.c  |  23 +++
 gcc/testsuite/gcc.target/i386/pr100865-11b.c  |   8 +
 gcc/testsuite/gcc.target/i386/pr100865-12a.c  |  20 ++
 gcc/testsuite/gcc.target/i386/pr100865-12b.c  |   8 +
 gcc/testsuite/gcc.target/i386/pr100865-2.c|  14 ++
 gcc/testsuite/gcc.target/i386/pr100865-3.c|  15 ++
 gcc/testsuite/gcc.target/i386/pr100865-4a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-4b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-5a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-5b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-6a.c   |  16 ++
 

[PATCH v4 0/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

2021-06-09 Thread H.J. Lu via Gcc-patches
1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR
operands to vector broadcast from an integer with AVX2.
2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
won't increase stack alignment requirement and blocks transformation by
the combine pass.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

shows that broadcast is a little bit faster on Intel Core i7-8559U:

$ make
gcc -g -I. -O2   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
./test
memory  : 147215
broadcast   : 121213
vec_dup_sse2: 171366
$

broadcast is also smaller:

$ size memory.o broadcast.o
   text    data     bss     dec     hex filename
    132       0       0     132      84 memory.o
    122       0       0     122      7a broadcast.o
$

3. Update PR 87767 tests to expect integer broadcast instead of broadcast
from memory.
4. Update avx512f_cond_move.c to expect integer broadcast.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast

shows that integer broadcast is faster than embedded memory broadcast:

$ make
gcc -g -I. -O2 -march=skylake-avx512   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -o test test.o memory.o broadcast.o
./test
memory  : 425538
broadcast   : 375260
$

5. Allow vec_duplicate to fail so that a backend can restrict broadcasting
an integer constant to a vector to cases where a broadcast instruction
is available.  The memset expander can use this to avoid vec_duplicate
when loading from the constant pool is more efficient.
6. Add vec_duplicate expander and enable vec_duplicate from a
non-standard SSE constant integer only if vector broadcast is available.

H.J. Lu (2):
  x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast
  x86: Add vec_duplicate expander

 gcc/config/i386/i386-expand.c | 208 +-
 gcc/config/i386/i386-protos.h |   3 +
 gcc/config/i386/i386.c|  13 ++
 gcc/config/i386/sse.md|  21 ++
 gcc/doc/md.texi   |   2 -
 .../i386/avx512f-broadcast-pr87767-1.c|   7 +-
 .../i386/avx512f-broadcast-pr87767-5.c|   5 +-
 .../gcc.target/i386/avx512f_cond_move.c   |   4 +-
 .../i386/avx512vl-broadcast-pr87767-1.c   |  12 +-
 .../i386/avx512vl-broadcast-pr87767-5.c   |   9 +-
 gcc/testsuite/gcc.target/i386/pr100865-1.c|  13 ++
 gcc/testsuite/gcc.target/i386/pr100865-10a.c  |  33 +++
 gcc/testsuite/gcc.target/i386/pr100865-10b.c  |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-11a.c  |  23 ++
 gcc/testsuite/gcc.target/i386/pr100865-11b.c  |   8 +
 gcc/testsuite/gcc.target/i386/pr100865-12a.c  |  20 ++
 gcc/testsuite/gcc.target/i386/pr100865-12b.c  |   8 +
 gcc/testsuite/gcc.target/i386/pr100865-2.c|  14 ++
 gcc/testsuite/gcc.target/i386/pr100865-3.c|  15 ++
 gcc/testsuite/gcc.target/i386/pr100865-4a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-4b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-5a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-5b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-6a.c   |  16 ++
 gcc/testsuite/gcc.target/i386/pr100865-6b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-7a.c   |  17 ++
 gcc/testsuite/gcc.target/i386/pr100865-7b.c   |   9 +
 gcc/testsuite/gcc.target/i386/pr100865-8a.c   |  24 ++
 gcc/testsuite/gcc.target/i386/pr100865-8b.c   |   7 +
 gcc/testsuite/gcc.target/i386/pr100865-9a.c   |  25 +++
 gcc/testsuite/gcc.target/i386/pr100865-9b.c   |   7 +
 31 files changed, 563 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-10b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-11a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-11b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-12a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-12b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-5b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-6b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-7b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100865-8a.c
 create mode 100644 

[PATCH v4 2/2] x86: Add vec_duplicate expander

2021-06-09 Thread H.J. Lu via Gcc-patches
1. Allow vec_duplicate to fail so that a backend can restrict broadcasting
an integer constant to a vector to cases where a broadcast instruction
is available.  The memset expander can use this to avoid vec_duplicate
when loading from the constant pool is more efficient.
2. Add vec_duplicate expander and enable vec_duplicate from a
non-standard SSE constant integer only if vector broadcast is available.
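
The decision in item 2 can be modelled outside GCC as a small predicate.
This is only a sketch: ScalarOperand, is_standard_constant and
should_expand_vec_duplicate are hypothetical names, and "standard SSE
constant" is simplified here to all-zeros and all-ones, which is an
assumption about what standard_sse_constant_p accepts for integer
vectors.

```cpp
#include <cstdint>

// Hypothetical model of the scalar operand being broadcast.
struct ScalarOperand {
  bool is_constant;    // true if the operand is an integer constant
  std::int64_t value;  // meaningful only when is_constant is true
};

// Simplifying assumption: all-zeros and all-ones are the constants that
// can be materialised without a load or a broadcast.
bool is_standard_constant(std::int64_t value)
{
  return value == 0 || value == -1;
}

// Mirrors the early exit in the patch: a constant operand is rejected
// when no broadcast instruction is available (no AVX2) or when it is a
// standard constant that is cheaper to generate another way.
bool should_expand_vec_duplicate(const ScalarOperand &op, bool have_avx2)
{
  if (op.is_constant && (!have_avx2 || is_standard_constant(op.value)))
    return false;
  return true;
}
```

A false result corresponds to the expander taking the FAIL path, which
is why the "not allowed to FAIL" sentence has to go from md.texi.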

* config/i386/i386-expand.c (ix86_expand_integer_vec_duplicate):
New function.
* config/i386/i386-protos.h (ix86_expand_integer_vec_duplicate):
New prototype.
* config/i386/sse.md (INT_BROADCAST_MODE): New mode iterator.
(vec_duplicate<mode>): New expander.
* doc/md.texi: Update vec_duplicate.
---
 gcc/config/i386/i386-expand.c | 21 +
 gcc/config/i386/i386-protos.h |  1 +
 gcc/config/i386/sse.md| 21 +
 gcc/doc/md.texi   |  2 --
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 7b62e163ab5..e89f116627a 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -15664,6 +15664,27 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
 }
 }
 
+/* Expand integer vec_duplicate.  Return true if successful.  */
+
+bool
+ix86_expand_integer_vec_duplicate (rtx *operands)
+{
+  /* Enable VEC_DUPLICATE from a non-standard SSE constant integer only
+ if vector broadcast is available.  */
+  if (CONST_INT_P (operands[1])
+  && (!TARGET_AVX2
+ || standard_sse_constant_p (operands[1],
+ GET_MODE (operands[0]))))
+return false;
+
+  if (!ix86_expand_vector_init_duplicate (false,
+ GET_MODE (operands[0]),
+ operands[0], operands[1]))
+gcc_unreachable ();
+
+  return true;
+}
+
 /* Generate code to copy vector bits i / 2 ... i - 1 from vector SRC
to bits 0 ... i / 2 - 1 of vector DEST, which has the same mode.
The upper bits of DEST are undefined, though they shouldn't cause
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 6050242085f..8b740078344 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -260,6 +260,7 @@ extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_abs (rtx, rtx);
+extern bool ix86_expand_integer_vec_duplicate (rtx *);
 
 /* In i386-c.c  */
 extern void ix86_target_macros (void);
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2a34756be2a..f094e5a2586 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -24570,3 +24570,24 @@
   "TARGET_WIDEKL"
   "aes\t{%0}"
   [(set_attr "type" "other")])
+
+;; Modes handled by broadcast patterns.
+(define_mode_iterator INT_BROADCAST_MODE
+  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
+   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
+   (V8DI "TARGET_AVX512F && TARGET_64BIT")
+   (V4DI "TARGET_AVX && TARGET_64BIT") (V2DI "TARGET_64BIT")])
+
+;; Broadcast from an integer.  NB: Enable broadcast only if we can move
+;; from GPR to SSE register directly.
+(define_expand "vec_duplicate<mode>"
+  [(set (match_operand:INT_BROADCAST_MODE 0 "register_operand")
+   (vec_duplicate:INT_BROADCAST_MODE
+ (match_operand:<ssescalarmode> 1 "general_operand")))]
+  "TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_TO_VEC"
+{
+  if (!ix86_expand_integer_vec_duplicate (operands))
+FAIL;
+  DONE;
+})
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 00caf3844cc..e66c41c4779 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5077,8 +5077,6 @@ the mode appropriate for one element of @var{m}.
 This pattern only handles duplicates of non-constant inputs.  Constant
 vectors go through the @code{mov@var{m}} pattern instead.
 
-This pattern is not allowed to @code{FAIL}.
-
 @cindex @code{vec_series@var{m}} instruction pattern
 @item @samp{vec_series@var{m}}
 Initialize vector output operand 0 so that element @var{i} is equal to
-- 
2.31.1



Re: [PATCH v3 1/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

2021-06-09 Thread H.J. Lu via Gcc-patches
On Wed, Jun 9, 2021 at 12:44 AM Uros Bizjak wrote:
>
> [For some reason this message didn't reach my gmail account]
>
> > 1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR
> > operands to vector broadcast from an integer with AVX2.
> > 2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
> > won't increase stack alignment requirement and blocks transformation by
> > the combine pass.
> > 3. Update PR 87767 tests to expect integer broadcast instead of broadcast
> > from memory.
> > 4. Update avx512f_cond_move.c to expect integer broadcast.
>
> +  else if (TARGET_64BIT
> +   && ix86_broadcast (val, GET_MODE_BITSIZE (DImode),
> +  val_broadcast))
> +{
> +  /* NB: MOVQ takes a 32-bit signed immediate operand.  */
> +  if (trunc_int_for_mode (val_broadcast, SImode) != val_broadcast)
> + return nullptr;
> +  broadcast_mode = DImode;
> +}
> +  else
> +return nullptr;
>
> We have MOVABS insn and movdi_internal knows when to switch between
> MOVQ and MOVABS.

Fixed in the v4 patch.
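
For reference, the check in the quoted v3 hunk amounts to asking whether
a 64-bit value survives sign-extension from 32 bits.  A stand-alone
sketch (fits_signed_32 is a made-up name, not a GCC function):

```cpp
#include <cstdint>

// True if VALUE can be encoded as a sign-extended 32-bit immediate,
// i.e. truncating it to 32 bits and sign-extending gives VALUE back.
bool fits_signed_32(std::int64_t value)
{
  return value >= INT32_MIN && value <= INT32_MAX;
}
```

Values for which this returns false, such as 0x80000000, are the ones
the v3 code refused to broadcast and that, per the comment above, can
still be loaded with MOVABS.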

> +  if (!ix86_expand_vector_init_duplicate (false, vector_mode, target,
> +  GEN_INT (val_broadcast)))
> +gcc_unreachable ();
>
> We are using:
>
> bool ok = ix86_expand_vector_init_duplicate (...);
> gcc_assert (ok);
>
> idiom throughout i386/. Let's keep it this way.

Fixed in the v4 patch.

> +  if (REGNO (target) < FIRST_PSEUDO_REGISTER)
> +target = gen_rtx_REG (mode, REGNO (target));
> +  else
> +target = convert_to_mode (mode, target, 1);
> +
>
> This is not needed. lowpart_subreg should do the trick when changing
> mode of hard regs (also see comment for ix86_gen_scratch_sse_rtx).

Fixed in the v4 patch.

> +  rtx first;
> +
> +  if (can_create_pseudo_p ()
> +  && GET_MODE_SIZE (mode) >= 16
> +  && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> +  && (MEM_P (op1)
> +  && SYMBOL_REF_P (XEXP (op1, 0))
> +  && CONSTANT_POOL_ADDRESS_P (XEXP (op1, 0)))
> +  && (first = ix86_broadcast_from_integer_constant (mode, op1)))
> +{
> +  /* Broadcast to XMM/YMM/ZMM register from an integer constant.  */
> +  op1 = ix86_gen_scratch_sse_rtx (mode, false);
> +  if (!ix86_expand_vector_init_duplicate (false, mode, op1, first))
> + gcc_unreachable ();
> +  emit_move_insn (op0, op1);
> +  return;
>
> Please try to avoid assignment inside the condition. And also use
> "gcc_assert (ok)" here.

Fixed in the v4 patch.

> +/* Return a scratch register in MODE for vector load and store.  If
> +   CONSTANT_INT_BROADCAST is true, it is used to hold constant integer
> +   broadcast result.  */
> +
> +rtx
> +ix86_gen_scratch_sse_rtx (machine_mode mode,
> +  bool constant_int_broadcast)
>
> This function should always return hard reg, simply:
>
> return gen_rtx_REG (mode, (TARGET_64BIT
>   ? LAST_REX_SSE_REG : LAST_SSE_REG));
>
> The complications with pseudo does not bring us anything (at the end
> we need a hard reg anyway, and I guess reload knows quite well how to
> avoid used temporary).

Fixed in the v4 patch.

> The function can then be renamed to ix86_gen_scratch_sse_reg.

I'd like to keep the ix86_gen_scratch_sse_rtx name:

rtx
ix86_gen_scratch_sse_rtx (machine_mode mode)
{
  if (TARGET_SSE)
return gen_rtx_REG (mode, (TARGET_64BIT
   ? LAST_REX_SSE_REG
   : LAST_SSE_REG));
  else
return gen_reg_rtx (mode);
}

so that it can be used to implement the target hook for memset later.

> * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer
> broadcast.
> * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
> * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
> * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
> * gcc.target/i386/avx512f_cond_move.c: Also pass
> -mprefer-vector-width=512 and expect integer broadcast.
>
> No review for the above changes for AVX512 tests, someone else should
> check if the new code is better here.

A small benchmark:

https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast

shows that integer broadcast is faster than embedded memory broadcast:

$ make
gcc -g -I. -O2 -march=skylake-avx512   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -o test test.o memory.o broadcast.o
./test
memory  : 425538
broadcast   : 375260
$

Thanks.

-- 
H.J.


Re: [PATCH v3 2/2] x86: Add vec_duplicate expander

2021-06-09 Thread H.J. Lu via Gcc-patches
On Tue, Jun 8, 2021 at 11:41 PM Uros Bizjak wrote:
>
> On Tue, Jun 8, 2021 at 7:59 PM H.J. Lu wrote:
> >
> > 1. Update vec_duplicate to allow to fail so that backend can only allow
> > broadcasting an integer constant to a vector when broadcast instruction
> > is available.  This can be used by memset expander to avoid vec_duplicate
> > when loading from constant pool is more efficient.
> > 2. Add vec_duplicate expander and enable vec_duplicate from a
> > non-standard SSE constant integer only if vector broadcast is available.
> >
> > * config/i386/i386-expand.c (ix86_expand_integer_vec_duplicate):
> > New function.
> > * config/i386/i386-protos.h (ix86_expand_integer_vec_duplicate):
> > New prototype.
> > * config/i386/sse.md (INT_BROADCAST_MODE): New mode iterator.
> > (vec_duplicate<mode>): New expander.
> > * doc/md.texi: Update vec_duplicate.
> > ---
> >  gcc/config/i386/i386-expand.c | 21 +
> >  gcc/config/i386/i386-protos.h |  1 +
> >  gcc/config/i386/sse.md| 20 
> >  gcc/doc/md.texi   |  2 --
> >  4 files changed, 42 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > index 29d96805d9d..145e028353c 100644
> > --- a/gcc/config/i386/i386-expand.c
> > +++ b/gcc/config/i386/i386-expand.c
> > @@ -15669,6 +15669,27 @@ ix86_expand_vector_extract (bool mmx_ok, rtx 
> > target, rtx vec, int elt)
> >  }
> >  }
> >
> > +/* Expand integer vec_duplicate.  Return true if successful.  */
> > +
> > +bool
> > +ix86_expand_integer_vec_duplicate (rtx *operands)
> > +{
> > +  /* Enable VEC_DUPLICATE from a non-standard SSE constant integer only
> > + if vector broadcast is available.  */
> > +  if (CONST_INT_P (operands[1])
> > +  && (!TARGET_AVX2
> > + || standard_sse_constant_p (operands[1],
> > + GET_MODE (operands[0]))))
> > +return false;
> > +
> > +  if (!ix86_expand_vector_init_duplicate (false,
> > + GET_MODE (operands[0]),
> > + operands[0], operands[1]))
> > +gcc_unreachable ();
> > +
> > +  return true;
> > +}
> > +
> >  /* Generate code to copy vector bits i / 2 ... i - 1 from vector SRC
> > to bits 0 ... i / 2 - 1 of vector DEST, which has the same mode.
> > The upper bits of DEST are undefined, though they shouldn't cause
> > diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> > index 578750a2532..dc191dc18ec 100644
> > --- a/gcc/config/i386/i386-protos.h
> > +++ b/gcc/config/i386/i386-protos.h
> > @@ -260,6 +260,7 @@ extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, 
> > bool, bool);
> >  extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
> >  extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
> >  extern void ix86_expand_sse2_abs (rtx, rtx);
> > +extern bool ix86_expand_integer_vec_duplicate (rtx *);
> >
> >  /* In i386-c.c  */
> >  extern void ix86_target_macros (void);
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 2a34756be2a..a227295cc1d 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -24570,3 +24570,23 @@
> >"TARGET_WIDEKL"
> >"aes\t{%0}"
> >[(set_attr "type" "other")])
> > +
> > +;; Modes handled by broadcast patterns.
> > +(define_mode_iterator INT_BROADCAST_MODE
> > +  [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
> > +   (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI
> > +   (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI
> > +   (V8DI "TARGET_AVX512F") (V4DI "TARGET_64BIT") V2DI])
>
> In ix86_convert_const_wide_int_to_broadcast, there is:
>
> +  else if (TARGET_64BIT
> +   && ix86_broadcast (val, GET_MODE_BITSIZE (DImode),
> +  val_broadcast))
>
> So, shouldn't all modes with DImode inner mode have TARGET_64BIT predicate?

Fixed in the v4 patch.

Thanks.

> Uros.
>
> > +;; Broadcast from an integer.  NB: Enable broadcast only if we can move
> > +;; from GPR to SSE register directly.
> > +(define_expand "vec_duplicate<mode>"
> > +  [(set (match_operand:INT_BROADCAST_MODE 0 "register_operand")
> > +   (vec_duplicate:INT_BROADCAST_MODE
> > + (match_operand:<ssescalarmode> 1 "general_operand")))]
> > +  "TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_TO_VEC"
> > +{
> > +  if (!ix86_expand_integer_vec_duplicate (operands))
> > +FAIL;
> > +  DONE;
> > +})
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 00caf3844cc..e66c41c4779 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5077,8 +5077,6 @@ the mode appropriate for one element of @var{m}.
> >  This pattern only handles duplicates of non-constant inputs.  Constant
> >  vectors go through the @code{mov@var{m}} pattern instead.
> >
> > -This pattern is not allowed to @code{FAIL}.
> > -
> >  @cindex @code{vec_series@var{m}} instruction pattern
> >  @item @samp{vec_series@var{m}}
> >  

Re: [PATCH] rs6000: Support more short/char to float conversion

2021-06-09 Thread Segher Boessenkool
Hi!

On Fri, May 07, 2021 at 10:30:38AM +0800, Kewen.Lin wrote:
> In some cases, when we load unsigned char/short values from
> unsigned char/short memory and convert them to a double or single
> precision floating point value, there is an implicit conversion
> to int first.  This prevents GCC from using the
> P9 instructions lxsibzx/lxsihzx.  This patch adds the related
> define_insn_and_split patterns to support this kind of scenario.

> +/* { dg-final { scan-assembler "lxsibzx"  } } */
> +/* { dg-final { scan-assembler "lxsihzx"  } } */
> +/* { dg-final { scan-assembler "vextsb2d" } } */
> +/* { dg-final { scan-assembler "vextsh2d" } } */

On my unpatched compiler all these already work, but you say they don't?

For the first two I get
lxsibzx 33,0,3
vextsb2d 0,1
xscvsxddp 0,32
fadd 1,0,1
blr
and
lbz 9,0(3)
mtvsrwa 0,9
fcfid 0,0
fadd 1,0,1
blr
is that different for you?

In either case, use \m and \M please.

> +/* { dg-final { scan-assembler-not "mfvsrd"   } } */
> +/* { dg-final { scan-assembler-not "mfvsrwz"  } } */
> +/* { dg-final { scan-assembler-not "mtvsrd"   } } */
> +/* { dg-final { scan-assembler-not "mtvsrwa"  } } */
> +/* { dg-final { scan-assembler-not "mtvsrwz"  } } */

Here as well, or you could just do

/* { dg-final { scan-assembler-not "\mm[tf]vsr"  } } */

in this case, since no VSR<->GPR moves should happen at all.


Segher


Re: [r12-1330 Regression] FAIL: libgomp.fortran/pr100981-2.f90 -Os execution test on Linux/x86_64

2021-06-09 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 09, 2021 at 12:08:39PM -0700, sunil.k.pandey via Gcc-patches wrote:
> On Linux/x86_64,
> 
> 374f93da97fb0378453d503f3cfea4d7a923a89c is the first bad commit
> commit 374f93da97fb0378453d503f3cfea4d7a923a89c
> Author: Richard Biener 
> Date:   Wed Jun 9 14:48:35 2021 +0200
> 
> tree-optimization/100981 - fix SLP patterns involving reductions
> 
> caused
> 
> FAIL: libgomp.fortran/pr100981-2.f90   -O0  execution test
> FAIL: libgomp.fortran/pr100981-2.f90   -O1  execution test
> FAIL: libgomp.fortran/pr100981-2.f90   -O2  execution test
> FAIL: libgomp.fortran/pr100981-2.f90   -O3 -g  execution test
> FAIL: libgomp.fortran/pr100981-2.f90   -Os  execution test

Aren't the dsdotr and dsdoti variables uninitialized?
Shouldn't they be initialized to zero?

The following fixes it for me on i686-linux, but I haven't checked whether it
fails with the patch reverted.

2021-06-10  Jakub Jelinek  

* testsuite/libgomp.fortran/pr100981-2.f90 (cdcdot): Initialize
dsdotr and dsdoti to 0.

--- libgomp/testsuite/libgomp.fortran/pr100981-2.f90.jj	2021-06-09 22:51:44.548834216 +0200
+++ libgomp/testsuite/libgomp.fortran/pr100981-2.f90	2021-06-10 01:08:18.056464950 +0200
@@ -9,6 +9,8 @@ complex function cdcdot(n, cx)
   double precision :: dsdotr, dsdoti, dt1, dt3
 
   kx = 1
+  dsdotr = 0
+  dsdoti = 0
   do i = 1, n
  dt1 = real(cx(kx))
  dt3 = aimag(cx(kx))


Jakub



[committed] analyzer: make various region_model member functions const

2021-06-09 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as 53cb324cb4f9475d4eabcd9f5a858c5edaacc0cf.

gcc/analyzer/ChangeLog:
* region-model.cc (region_model::get_lvalue_1): Make const.
(region_model::get_lvalue): Likewise.
(region_model::get_rvalue_1): Likewise.
(region_model::get_rvalue): Likewise.
(region_model::deref_rvalue): Likewise.
(region_model::get_rvalue_for_bits): Likewise.
* region-model.h (region_model::get_lvalue): Likewise.
(region_model::get_rvalue): Likewise.
(region_model::deref_rvalue): Likewise.
(region_model::get_rvalue_for_bits): Likewise.
(region_model::get_lvalue_1): Likewise.
(region_model::get_rvalue_1): Likewise.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc | 16 
 gcc/analyzer/region-model.h  | 16 
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 0d363fb15d3..551ee796b11 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1213,7 +1213,7 @@ region_model::handle_phi (const gphi *phi,
emitting any diagnostics to CTXT.  */
 
 const region *
-region_model::get_lvalue_1 (path_var pv, region_model_context *ctxt)
+region_model::get_lvalue_1 (path_var pv, region_model_context *ctxt) const
 {
   tree expr = pv.m_tree;
 
@@ -1312,7 +1312,7 @@ assert_compat_types (tree src_type, tree dst_type)
emitting any diagnostics to CTXT.  */
 
 const region *
-region_model::get_lvalue (path_var pv, region_model_context *ctxt)
+region_model::get_lvalue (path_var pv, region_model_context *ctxt) const
 {
   if (pv.m_tree == NULL_TREE)
 return NULL;
@@ -1326,7 +1326,7 @@ region_model::get_lvalue (path_var pv, region_model_context *ctxt)
recent stack frame if it's a local).  */
 
 const region *
-region_model::get_lvalue (tree expr, region_model_context *ctxt)
+region_model::get_lvalue (tree expr, region_model_context *ctxt) const
 {
   return get_lvalue (path_var (expr, get_stack_depth () - 1), ctxt);
 }
@@ -1337,7 +1337,7 @@ region_model::get_lvalue (tree expr, region_model_context *ctxt)
emitting any diagnostics to CTXT.  */
 
 const svalue *
-region_model::get_rvalue_1 (path_var pv, region_model_context *ctxt)
+region_model::get_rvalue_1 (path_var pv, region_model_context *ctxt) const
 {
   gcc_assert (pv.m_tree);
 
@@ -1441,7 +1441,7 @@ region_model::get_rvalue_1 (path_var pv, region_model_context *ctxt)
emitting any diagnostics to CTXT.  */
 
 const svalue *
-region_model::get_rvalue (path_var pv, region_model_context *ctxt)
+region_model::get_rvalue (path_var pv, region_model_context *ctxt) const
 {
   if (pv.m_tree == NULL_TREE)
 return NULL;
@@ -1457,7 +1457,7 @@ region_model::get_rvalue (path_var pv, region_model_context *ctxt)
recent stack frame if it's a local).  */
 
 const svalue *
-region_model::get_rvalue (tree expr, region_model_context *ctxt)
+region_model::get_rvalue (tree expr, region_model_context *ctxt) const
 {
   return get_rvalue (path_var (expr, get_stack_depth () - 1), ctxt);
 }
@@ -1624,7 +1624,7 @@ region_model::region_exists_p (const region *reg) const
 
 const region *
 region_model::deref_rvalue (const svalue *ptr_sval, tree ptr_tree,
-   region_model_context *ctxt)
+   region_model_context *ctxt) const
 {
   gcc_assert (ptr_sval);
   gcc_assert (POINTER_TYPE_P (ptr_sval->get_type ()));
@@ -1705,7 +1705,7 @@ region_model::deref_rvalue (const svalue *ptr_sval, tree ptr_tree,
 const svalue *
 region_model::get_rvalue_for_bits (tree type,
   const region *reg,
-  const bit_range )
+  const bit_range ) const
 {
   const svalue *sval = get_store_value (reg);
   if (const compound_svalue *compound_sval = sval->dyn_cast_compound_svalue ())
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 5e43e547199..e251a5b245c 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -501,17 +501,17 @@ class region_model
   int get_stack_depth () const;
   const frame_region *get_frame_at_index (int index) const;
 
-  const region *get_lvalue (path_var pv, region_model_context *ctxt);
-  const region *get_lvalue (tree expr, region_model_context *ctxt);
-  const svalue *get_rvalue (path_var pv, region_model_context *ctxt);
-  const svalue *get_rvalue (tree expr, region_model_context *ctxt);
+  const region *get_lvalue (path_var pv, region_model_context *ctxt) const;
+  const region *get_lvalue (tree expr, region_model_context *ctxt) const;
+  const svalue *get_rvalue (path_var pv, region_model_context *ctxt) const;
+  const svalue *get_rvalue (tree expr, region_model_context *ctxt) const;
 
   const region *deref_rvalue (const svalue *ptr_sval, tree ptr_tree,
-  

[PATCH] PR fortran/100950 - ICE in output_constructor_regular_field, at varasm.c:5514

2021-06-09 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

we should be able to simplify the length of a substring with known
constant bounds.  The attached patch adds this.

Regtested on x86_64-pc-linux-gnu.

OK for mainline?  Since this should be rather safe, also for at least the 11-branch?

Thanks,
Harald


Fortran - simplify length of substring with constant bounds

gcc/fortran/ChangeLog:

PR fortran/100950
* simplify.c (substring_has_constant_len): New.
(gfc_simplify_len): Handle case of substrings with constant
bounds.

gcc/testsuite/ChangeLog:

PR fortran/100950
* gfortran.dg/pr100950.f90: New test.

diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index c27b47aa98f..016ec259518 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -4512,6 +4512,60 @@ gfc_simplify_leadz (gfc_expr *e)
 }


+/* Check for constant length of a substring.  */
+
+static bool
+substring_has_constant_len (gfc_expr *e)
+{
+  ptrdiff_t istart, iend;
+  size_t length;
+  bool equal_length = false;
+
+  if (e->ts.type != BT_CHARACTER
+  || !(e->ref && e->ref->type == REF_SUBSTRING)
+  || !e->ref->u.ss.start
+  || e->ref->u.ss.start->expr_type != EXPR_CONSTANT
+  || !e->ref->u.ss.end
+  || e->ref->u.ss.end->expr_type != EXPR_CONSTANT
+  || !e->ref->u.ss.length
+  || !e->ref->u.ss.length->length
+  || e->ref->u.ss.length->length->expr_type != EXPR_CONSTANT)
+return false;
+
+  /* Basic checks on substring starting and ending indices.  */
+  if (!gfc_resolve_substring (e->ref, &equal_length))
+return false;
+
+  istart = gfc_mpz_get_hwi (e->ref->u.ss.start->value.integer);
+  iend = gfc_mpz_get_hwi (e->ref->u.ss.end->value.integer);
+  length = gfc_mpz_get_hwi (e->ref->u.ss.length->length->value.integer);
+
+  if (istart <= iend)
+{
+  if (istart < 1)
+	{
+	  gfc_error ("Substring start index (%ld) at %L below 1",
+		 (long) istart, &e->ref->u.ss.start->where);
+	  return false;
+	}
+  if (iend > (ssize_t) length)
+	{
+	  gfc_error ("Substring end index (%ld) at %L exceeds string "
+		 "length", (long) iend, &e->ref->u.ss.end->where);
+	  return false;
+	}
+  length = iend - istart + 1;
+}
+  else
+length = 0;
+
+  /* Fix substring length.  */
+  e->value.character.length = length;
+
+  return true;
+}
+
+
 gfc_expr *
 gfc_simplify_len (gfc_expr *e, gfc_expr *kind)
 {
@@ -4547,6 +4601,13 @@ gfc_simplify_len (gfc_expr *e, gfc_expr *kind)
of the unlimited polymorphic entity.  To get the _len component the last
_data ref needs to be stripped and a ref to the _len component added.  */
 return gfc_get_len_component (e->symtree->n.sym->assoc->target, k);
+  else if (substring_has_constant_len (e))
+{
+  result = gfc_get_constant_expr (BT_INTEGER, k, &e->where);
+  mpz_set_si (result->value.integer,
+		  e->value.character.length);
+  return range_check (result, "LEN");
+}
   else
 return NULL;
 }
diff --git a/gcc/testsuite/gfortran.dg/pr100950.f90 b/gcc/testsuite/gfortran.dg/pr100950.f90
new file mode 100644
index 000..f06db45b0b4
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr100950.f90
@@ -0,0 +1,18 @@
+! { dg-do run }
+! PR fortran/100950 - ICE in output_constructor_regular_field, at varasm.c:5514
+
+program p
+  character(8), parameter :: u = "123"
+  character(8):: x = "", s
+  character(2):: w(2) = [character(len(x(3:4))) :: 'a','b' ]
+  character(*), parameter :: y(*) = [character(len(u(3:4))) :: 'a','b' ]
+  character(*), parameter :: z(*) = [character(len(x(3:4))) :: 'a','b' ]
+  if (len (y) /= 2) stop 1
+  if (len (z) /= 2) stop 2
+  if (any (w /= y)) stop 3
+  if (len ([character(len(u(3:4))) :: 'a','b' ]) /= 2)  stop 4
+  if (len ([character(len(x(3:4))) :: 'a','b' ]) /= 2)  stop 5
+  if (any ([character(len(x(3:4))) :: 'a','b' ]  /= y)) stop 6
+  write(s,*) [character(len(x(3:4))) :: 'a','b' ]
+  if (s /= " a b") stop 7
+end
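
The length rule that substring_has_constant_len applies can be restated
outside the compiler.  This is only a minimal sketch under stated
assumptions: substring_len is a hypothetical helper, not gfortran code,
it returns -1 where the patch calls gfc_error, and it does not model the
earlier gfc_resolve_substring checks.

```cpp
// Hypothetical standalone restatement of the rule in the patch above;
// not gfortran code.  START and END are the constant substring bounds,
// STRING_LEN is the declared length of the parent string.
// Returns the substring length, or -1 where the patch reports an error.
constexpr long
substring_len (long start, long end, long string_len)
{
  if (start > end)
    return 0;			// Zero-length substring; bounds are not checked.
  if (start < 1 || end > string_len)
    return -1;			// Start below 1, or end beyond the string.
  return end - start + 1;
}
```

With the declarations from the new test, x(3:4) on a character(8) variable
corresponds to substring_len (3, 4, 8), which gives 2 and matches the
"len (...) /= 2" checks.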


Re: [PATCH] rs6000: Add new __builtin_vsx_build_pair and __builtin_mma_build_acc built-ins

2021-06-09 Thread Segher Boessenkool
Hi!

On Wed, Jun 09, 2021 at 04:05:37PM -0500, Peter Bergner wrote:
> The __builtin_vsx_assemble_pair and __builtin_mma_assemble_acc built-ins
> currently assign their first source operand to the first VSX register
> in a pair/quad, their second operand to the second register in a pair/quad, etc.
> This is not endian friendly and forces the user to generate different calls
> depending on endianness.  In agreement with the POWER LLVM team, we've
> decided to lightly deprecate the assemble built-ins and replace them with
> "build" built-ins that automatically handle endianness so the same built-in
> call can be used for both little-endian and big-endian compiles.  We are not
> removing the assemble built-ins, since there is code in the wild that uses
> them, but we are removing their documentation to encourage the use of the
> new "build" variants.

It is better if you *do* document the old names, but say "use the new
stuff", I think?  Or is there so little material with the old names
out there that no one will notice?

> +  /* The ASSEMBLE builtin source operands are reversed in little-endian
> +  mode, so reorder them.  */
> +  if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN)
> + pat = GEN_FCN (icode) (op[0], op[2], op[1]);
> +  else
> + pat = GEN_FCN (icode) (op[0], op[1], op[2]);

I think this reads simpler as
  /* The ASSEMBLE builtin source operands are reversed in little-endian
     mode, so reorder them.  */
  if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN)
    std::swap (op[1], op[2]);
  pat = GEN_FCN (icode) (op[0], op[1], op[2]);
do you agree?

> +  /* The ASSEMBLE builtin source operands are reversed in little-endian
> +  mode, so reorder them.  */
> +  if (fcode == MMA_BUILTIN_ASSEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN)
> + pat = GEN_FCN (icode) (op[0], op[4], op[3], op[2], op[1]);
> +  else
> + pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);

And
  /* The ASSEMBLE builtin source operands are reversed in little-endian
     mode, so reorder them.  */
  if (fcode == MMA_BUILTIN_ASSEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN)
    {
      std::swap (op[1], op[4]);
      std::swap (op[2], op[3]);
    }
  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
for that then of course.
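
To illustrate why the two forms are interchangeable, here is a minimal
standalone sketch, not GCC code.  The struct ops and the plain ints are
hypothetical stand-ins for the rtx operand array, and big_endian stands
in for WORDS_BIG_ENDIAN.

```cpp
// Hypothetical stand-in for the operand array; ints replace the rtx
// operands and v[0] plays the role of the destination.
struct ops { int v[5]; };

// Variant 1: choose between two argument orders (as in the patch).
constexpr ops
pick_by_branch (ops op, bool big_endian)
{
  return big_endian
	 ? ops{{op.v[0], op.v[1], op.v[2], op.v[3], op.v[4]}}
	 : ops{{op.v[0], op.v[4], op.v[3], op.v[2], op.v[1]}};
}

// Variant 2: swap in place, then use a single argument order.
constexpr ops
pick_by_swap (ops op, bool big_endian)
{
  if (!big_endian)
    {
      int t = op.v[1]; op.v[1] = op.v[4]; op.v[4] = t;
      t = op.v[2]; op.v[2] = op.v[3]; op.v[3] = t;
    }
  return op;
}

// True if both variants produce the same operand order.
constexpr bool
same_order (ops op, bool big_endian)
{
  ops a = pick_by_branch (op, big_endian);
  ops b = pick_by_swap (op, big_endian);
  for (int i = 0; i < 5; i++)
    if (a.v[i] != b.v[i])
      return false;
  return true;
}
```

The one behavioural difference is that the swap form modifies op[] itself,
which only matters if the operands are read again afterwards.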

> @@ -14151,7 +14161,8 @@ mma_init_builtins (void)
> if (gimple_func && mode == XOmode)
>   op[nopnds++] = build_pointer_type (vector_quad_type_node);
> else if (gimple_func && mode == OOmode

Please write the
   && mode == OOmode
on a new line as well then?

> +   int index = WORDS_BIG_ENDIAN ? i: nvecs - 1 - i;

Space before colon.

Okay for trunk and 11 with at least that space fixed.  Thanks!


Segher


Re: [patch] Reuse non-gimple_reg variable for inlining

2021-06-09 Thread Eric Botcazou
> I'm afraid the inliner would need to prove the to be inlined callee doesn't
> modify its own copy of the variable too, because if it modifies it (at least
> in C/C++ const can be cast away), then this introduces wrong-code, see
> PR100994 for details.

Then please remove the TREE_READONLY marker in C/C++ if this is a lie.

-- 
Eric Botcazou




Re: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering

2021-06-09 Thread H.J. Lu via Gcc-patches
On Wed, Jun 9, 2021 at 2:03 PM Jeff Law  wrote:
>
>
>
> On 6/7/2021 2:33 PM, H.J. Lu via Gcc-patches wrote:
>
> On Mon, Jun 7, 2021 at 11:10 AM Richard Biener
>  wrote:
>
> On Mon, Jun 7, 2021 at 7:59 PM Richard Biener
>  wrote:
>
> On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu  wrote:
>
> On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
>  wrote:
>
> "H.J. Lu"  writes:
>
> Update vec_duplicate to allow to fail so that backend can only allow
> broadcasting an integer constant to a vector when broadcast instruction
> is available.
>
> I'm not sure why we need this to fail though.  Once the optab is defined
> for target X, the optab should handle all duplicates for target X,
> even if there are different strategies it can use.
>
> AIUI the case you want to make conditional is the constant case.
> I guess the first question is: why don't we simplify those CONSTRUCTORs
> to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
> as a constructor here.
>
> The particular testcase for vec_duplicate is gcc.dg/pr100239.c.
>
> If we can't rely on that happening, then would it work to change:
>
> /* Try using vec_duplicate_optab for uniform vectors.  */
> if (!TREE_SIDE_EFFECTS (exp)
> && VECTOR_MODE_P (mode)
> && eltmode == GET_MODE_INNER (mode)
> && ((icode = optab_handler (vec_duplicate_optab, mode))
> != CODE_FOR_nothing)
> && (elt = uniform_vector_p (exp)))
>
> to something like:
>
> /* Try using vec_duplicate_optab for uniform vectors.  */
> if (!TREE_SIDE_EFFECTS (exp)
> && VECTOR_MODE_P (mode)
> && eltmode == GET_MODE_INNER (mode)
> && (elt = uniform_vector_p (exp)))
>   {
> if (TREE_CODE (elt) == INTEGER_CST
> || TREE_CODE (elt) == POLY_INT_CST
> || TREE_CODE (elt) == REAL_CST
> || TREE_CODE (elt) == FIXED_CST)
>   {
> rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
> emit_move_insn (target, src);
> break;
>   }
> …
>   }
>
> I will give it a try.
>
> I can confirm that veclower leaves us with an unfolded constant CTOR.
> If you file a PR to remind me I'll fix that.
>
> The attached untested patch fixes this for the testcase.
>
> Here is the patch + the testcase.
>
>
> 0001-middle-end-100951-make-sure-to-generate-VECTOR_CST-i.patch
>
> From aac56894719b59e552b493c970946225ed8c27f6 Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Mon, 7 Jun 2021 20:08:13 +0200
> Subject: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in
>  lowering
>
> When vector lowering creates piecewise ops make sure to create
> VECTOR_CSTs instead of CONSTRUCTORs when possible.
>
> gcc/
>
> 2021-06-07  Richard Biener  
>
> PR middle-end/100951
> * tree-vect-generic.c (): Build a VECTOR_CST if all
> elements are constant.
>
> gcc/testsuite/
>
> 2021-06-07  H.J. Lu  
>
> PR middle-end/100951
> * gcc.target/i386/pr100951.c: New test.
>
> Assuming this passed testing it is OK.
> jeff

Richard has committed:

commit ffe3a37f54ab866d85bdde48c2a32be5e09d8515
Author: Richard Biener 
Date:   Mon Jun 7 20:08:13 2021 +0200

middle-end/100951 - make sure to generate VECTOR_CST in lowering

When vector lowering creates piecewise ops make sure to create
VECTOR_CSTs instead of CONSTRUCTORs when possible.


-- 
H.J.


Re: [PATCH 1/2] c-family: Copy DECL_USER_ALIGN even if DECL_ALIGN is similar.

2021-06-09 Thread Jason Merrill via Gcc-patches

On 6/9/21 4:47 AM, Robin Dapp wrote:
As you say, the logic is convoluted.  Let's simplify it rather than make
it more convoluted.  One possibility would be to change || to | to avoid
the shortcut, and then

bool note = lastalign > curalign;
if (note)
 curalign = lastalign;


I went with your suggestion in the attached v2.  Regtested and
bootstrapped on s390x, x86 and ppc64le.


OK.


urg, I did not commit yet because I fiddled around with my tests again.
Using dg-notes checks (and thus being "forced" to handle all cases
individually) I realized that the above is not enough, as
it reintroduces the same problem that I was originally trying to avoid:

We want to warn if lastalign == curalign but DECL_USER_ALIGN (last_decl) 
!= DECL_USER_ALIGN (decl), i.e. when the user explicitly specified 
__attribute__ ((aligned (8))) on s390 (see example in my first mail).


What about checking

lastalign > curalign || (lastalign == curalign && DECL_USER_ALIGN 
(last_decl) != DECL_USER_ALIGN (decl))


instead and changing the associated comment?


I might use > instead of !=, but sure.
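
A minimal sketch of the condition under discussion, using the ">" form.
This is not the c-family code: align_info and should_note are
hypothetical names, and the two fields stand in for DECL_ALIGN and
DECL_USER_ALIGN of a declaration.

```cpp
// Hypothetical, simplified view of the two values the check looks at:
// the alignment of a declaration and whether the user requested it
// explicitly (DECL_ALIGN and DECL_USER_ALIGN in the real code).
struct align_info
{
  unsigned align;
  bool user_align;
};

// Emit the note when the previous declaration is more strictly aligned,
// or equally aligned but only the previous one was user-specified.
constexpr bool
should_note (align_info last, align_info cur)
{
  return last.align > cur.align
	 || (last.align == cur.align
	     && last.user_align && !cur.user_align);
}
```

The second clause covers the s390 case described above, where the
alignments are equal and only the earlier declaration carries an explicit
aligned attribute.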


Even then it's not really satisfactory to emit a note that complains
about "aligned (8)" at a declaration where it was not even explicitly
specified.  But then, the situation is similar for other attributes as 
well and we would need to explicitly store the location where an 
attribute was specified first.




[PATCH] rs6000: Add new __builtin_vsx_build_pair and __builtin_mma_build_acc built-ins

2021-06-09 Thread Peter Bergner via Gcc-patches
The __builtin_vsx_assemble_pair and __builtin_mma_assemble_acc built-ins
currently assign their first source operand to the first VSX register
in a pair/quad, their second operand to the second register in a pair/quad, etc.
This is not endian friendly and forces the user to generate different calls
depending on endianness.  In agreement with the POWER LLVM team, we've
decided to lightly deprecate the assemble built-ins and replace them with
"build" built-ins that automatically handle endianness so the same built-in
call can be used for both little-endian and big-endian compiles.  We are not
removing the assemble built-ins, since there is code in the wild that uses
them, but we are removing their documentation to encourage the use of the
new "build" variants.

This passed bootstrap and regtesting on both powerpc64-linux and
powerpc64le-linux with no regressions.  I also verified that we continue
to generate the same code as before on some small unit tests and that we
agree with what LLVM generates.  Ok for trunk?

I'd also like to backport this in time for gcc 11.2.  Ok for the release
branch after it has baked on trunk for a few days?

Peter


gcc/
* config/rs6000/rs6000-builtin.def (build_pair): New built-in.
(build_acc): Likewise.
* config/rs6000/rs6000-call.c (mma_expand_builtin): Swap assemble
source operands in little-endian mode.
(rs6000_gimple_fold_mma_builtin): Handle VSX_BUILTIN_BUILD_PAIR.
(mma_init_builtins): Likewise.
* config/rs6000/rs6000.c (rs6000_split_multireg_move): Handle endianness
ordering for the MMA assemble and build source operands.
* doc/extend.texi (__builtin_vsx_build_pair, __builtin_mma_build_acc):
Document.
(__builtin_mma_assemble_acc, __builtin_mma_assemble_pair): Remove
documentation.

gcc/testsuite
* gcc.target/powerpc/mma-builtin-4.c (__builtin_vsx_build_pair): Add
tests.  Update expected counts.
* gcc.target/powerpc/mma-builtin-5.c (__builtin_mma_build_acc): Add
tests.  Update expected counts.

diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 609bebdfd74..4043e14ed3f 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -3207,6 +3207,7 @@ BU_MMA_2 (DISASSEMBLE_ACC, "disassemble_acc", QUAD, mma_disassemble_acc)
 BU_MMA_V2 (DISASSEMBLE_PAIR, "disassemble_pair", PAIR, vsx_disassemble_pair)
 BU_COMPAT (VSX_BUILTIN_DISASSEMBLE_PAIR, "mma_disassemble_pair")
 
+BU_MMA_V3 (BUILD_PAIR, "build_pair",   MISC, vsx_assemble_pair)
 BU_MMA_V3 (ASSEMBLE_PAIR,   "assemble_pair",   MISC, vsx_assemble_pair)
 BU_COMPAT (VSX_BUILTIN_ASSEMBLE_PAIR, "mma_assemble_pair")
 BU_MMA_3 (XVBF16GER2,  "xvbf16ger2",   MISC, mma_xvbf16ger2)
@@ -3239,6 +3240,7 @@ BU_MMA_3 (XVI8GER4SPP,"xvi8ger4spp",  QUAD, mma_xvi8ger4spp)
 BU_MMA_3 (XVI16GER2PP, "xvi16ger2pp",  QUAD, mma_xvi16ger2pp)
 BU_MMA_3 (XVI16GER2SPP,"xvi16ger2spp", QUAD, mma_xvi16ger2spp)
 
+BU_MMA_5 (BUILD_ACC,   "build_acc",MISC, mma_assemble_acc)
 BU_MMA_5 (ASSEMBLE_ACC, "assemble_acc",MISC, mma_assemble_acc)
 BU_MMA_5 (PMXVF32GER,  "pmxvf32ger",   MISC, mma_pmxvf32ger)
 BU_MMA_5 (PMXVF64GER,  "pmxvf64ger",   PAIR, mma_pmxvf64ger)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index b4e13af4dc6..8deb9a3d207 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -10118,13 +10118,23 @@ mma_expand_builtin (tree exp, rtx target, bool *expandedp)
   pat = GEN_FCN (icode) (op[0], op[1]);
   break;
 case 3:
-  pat = GEN_FCN (icode) (op[0], op[1], op[2]);
+  /* The ASSEMBLE builtin source operands are reversed in little-endian
+mode, so reorder them.  */
+  if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN)
+   pat = GEN_FCN (icode) (op[0], op[2], op[1]);
+  else
+   pat = GEN_FCN (icode) (op[0], op[1], op[2]);
   break;
 case 4:
   pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
   break;
 case 5:
-  pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
+  /* The ASSEMBLE builtin source operands are reversed in little-endian
+mode, so reorder them.  */
+  if (fcode == MMA_BUILTIN_ASSEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN)
+   pat = GEN_FCN (icode) (op[0], op[4], op[3], op[2], op[1]);
+  else
+   pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
   break;
 case 6:
   pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5]);
@@ -11835,7 +11845,7 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi)
   gcc_unreachable ();
 }
 
-  if (fncode == VSX_BUILTIN_ASSEMBLE_PAIR)
+  if (fncode == VSX_BUILTIN_BUILD_PAIR || fncode == VSX_BUILTIN_ASSEMBLE_PAIR)
 lhs = make_ssa_name (vector_pair_type_node);
   else
 lhs = 

Re: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering

2021-06-09 Thread Jeff Law via Gcc-patches




On 6/7/2021 2:33 PM, H.J. Lu via Gcc-patches wrote:

On Mon, Jun 7, 2021 at 11:10 AM Richard Biener
 wrote:

On Mon, Jun 7, 2021 at 7:59 PM Richard Biener
 wrote:

On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu  wrote:

On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
 wrote:

"H.J. Lu"  writes:

Update vec_duplicate to allow to fail so that backend can only allow
broadcasting an integer constant to a vector when broadcast instruction
is available.

I'm not sure why we need this to fail though.  Once the optab is defined
for target X, the optab should handle all duplicates for target X,
even if there are different strategies it can use.

AIUI the case you want to make conditional is the constant case.
I guess the first question is: why don't we simplify those CONSTRUCTORs
to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
as a constructor here.

The particular testcase for vec_duplicate is gcc.dg/pr100239.c.


If we can't rely on that happening, then would it work to change:

 /* Try using vec_duplicate_optab for uniform vectors.  */
 if (!TREE_SIDE_EFFECTS (exp)
 && VECTOR_MODE_P (mode)
 && eltmode == GET_MODE_INNER (mode)
 && ((icode = optab_handler (vec_duplicate_optab, mode))
 != CODE_FOR_nothing)
 && (elt = uniform_vector_p (exp)))

to something like:

 /* Try using vec_duplicate_optab for uniform vectors.  */
 if (!TREE_SIDE_EFFECTS (exp)
 && VECTOR_MODE_P (mode)
 && eltmode == GET_MODE_INNER (mode)
 && (elt = uniform_vector_p (exp)))
   {
 if (TREE_CODE (elt) == INTEGER_CST
 || TREE_CODE (elt) == POLY_INT_CST
 || TREE_CODE (elt) == REAL_CST
 || TREE_CODE (elt) == FIXED_CST)
   {
 rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
 emit_move_insn (target, src);
 break;
   }
 …
   }

I will give it a try.

I can confirm that veclower leaves us with an unfolded constant CTOR.
If you file a PR to remind me I'll fix that.

The attached untested patch fixes this for the testcase.


Here is the patch + the testcase.


0001-middle-end-100951-make-sure-to-generate-VECTOR_CST-i.patch

 From aac56894719b59e552b493c970946225ed8c27f6 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Mon, 7 Jun 2021 20:08:13 +0200
Subject: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in
  lowering

When vector lowering creates piecewise ops make sure to create
VECTOR_CSTs instead of CONSTRUCTORs when possible.

gcc/

2021-06-07  Richard Biener  

PR middle-end/100951
* tree-vect-generic.c (): Build a VECTOR_CST if all
elements are constant.

gcc/testsuite/

2021-06-07  H.J. Lu  

PR middle-end/100951
* gcc.target/i386/pr100951.c: New test.

Assuming this passed testing it is OK.
jeff


[committed] libstdc++: Only support atomic_ref::wait tests which are always lockfree

2021-06-09 Thread Thomas Rodgers
Fixes a regression on arm32 targets.

libstdc++/ChangeLog:
* testsuite/29_atomics/atomic_ref/wait_notify.cc: Guard
test logic with constexpr check for is_always_lock_free.

As discussed on IRC.

Tested x86_64-pc-linux-gnu, committed to master, backported to
releases/gcc-11.
---
 .../29_atomics/atomic_ref/wait_notify.cc  | 23 +++
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc b/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
index 2500dddf884..c21c3a11ab5 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
@@ -30,17 +30,20 @@ template
   void
   test (S va, S vb)
   {
-S aa{ va };
-S bb{ vb };
-std::atomic_ref<S> a{ aa };
-a.wait(bb);
-std::thread t([&]
+if constexpr (std::atomic_ref<S>::is_always_lock_free)
   {
-a.store(bb);
-a.notify_one();
-  });
-a.wait(aa);
-t.join();
+S aa{ va };
+S bb{ vb };
+std::atomic_ref<S> a{ aa };
+a.wait(bb);
+std::thread t([&]
+  {
+a.store(bb);
+a.notify_one();
+  });
+a.wait(aa);
+t.join();
+  }
   }
 
 int
-- 
2.26.2



Re: [PATCH/RFC] combine: Tweak the condition of last_set invalidation

2021-06-09 Thread Segher Boessenkool
Hi!

On Wed, Dec 16, 2020 at 04:49:49PM +0800, Kewen.Lin wrote:
> Currently we have the check:
> 
>   if (!insn
> || (value && rsp->last_set_table_tick >= label_tick_ebb_start))
>   rsp->last_set_invalid = 1; 
> 
> which means that if we want to record some value for some reg and
> this reg was referred to before in a valid scope,

If we already know it is *set* in this same extended basic block.
Possibly by the same instruction btw.

> we invalidate the
> set of the reg (set last_set_invalid to 1).  This avoids finding the
> wrong set for a reg reference, as in a case like:
> 
>... op regX  // this regX could find wrong last_set below
>regX = ...   // if we think this set is valid
>... op regX

Yup, exactly.

> But because combine retries, last_set_table_tick could have been
> set by some later reference insn, and we then see it as set when
> retrying on the insn that sets that reg, such as:
> 
>insn 1
>insn 2
> 
>regX = ... --> (a)
>... op regX--> (b)
>
>insn 3
> 
>// assume all in the same BB.
> 
> Assuming we combine 1, 2 -> 3 successfully and replace them with two
> (3 insns -> 2 insns),

This will delete insn 1 and write the combined result to insns 2 and 3.

> retrying from insn1 or insn2 again:

Always 2, but your point remains valid.

> it will scan insn (a) again, the below condition holds for regX:
> 
>   (value && rsp->last_set_table_tick >= label_tick_ebb_start)
> 
> it will mark this set as invalid set.  But actually the
> last_set_table_tick here is set by insn (b) before retrying, so it
> should be safe to be taken as valid set.

Yup.

> This proposal is to check whether the last_set_table update safely happens
> after the current set, and to keep the set valid if so.

> A full SPEC2017 build shows this patch gets more successful combines,
> from 1902208 to 1902243 (trivial though).

Do you have some example, or maybe even a testcase?  :-)

> +  /* Record the luid of the insn whose expression involving register n.  */
> +
> +  int last_set_table_luid;

"Record the luid of the insn for which last_set_table_tick was set",
right?

> -static void update_table_tick (rtx);
> +static void update_table_tick (rtx, int);

Please remove this declaration instead, the function is not used until
after its actual definition :-)

> @@ -13243,7 +13247,21 @@ update_table_tick (rtx x)
>for (r = regno; r < endregno; r++)
>   {
> reg_stat_type *rsp = &reg_stat[r];
> -   rsp->last_set_table_tick = label_tick;
> +   if (rsp->last_set_table_tick >= label_tick_ebb_start)
> + {
> +   /* Later references should not have lower ticks.  */
> +   gcc_assert (label_tick >= rsp->last_set_table_tick);

This should be obvious, but checking it won't hurt, okay.

> +   /* Should pick up the lowest luid if the references
> +  are in the same block.  */
> +   if (label_tick == rsp->last_set_table_tick
> +   && rsp->last_set_table_luid > insn_luid)
> + rsp->last_set_table_luid = insn_luid;

Why?  Is it conservative for the check you will do later?  Please spell
this out, it is crucial!

> @@ -13359,7 +13378,10 @@ record_value_for_reg (rtx reg, rtx_insn *insn, rtx value)
>  
>/* Mark registers that are being referenced in this value.  */
>if (value)
> -update_table_tick (value);
> +{
> +  gcc_assert (insn);
> +  update_table_tick (value, DF_INSN_LUID (insn));
> +}

Don't add that assert please.  If you really want one it should come
right at the start of the function, not 60 lines later :-)

Looks good if I understood this correctly :-)


Segher


Re: [PATCH] tree-optimization/97832 - handle associatable chains in SLP discovery

2021-06-09 Thread Christophe Lyon via Gcc-patches
On Wed, 9 Jun 2021 at 18:56, Alex Coplan via Gcc-patches
 wrote:
>
> Hi Richi,
>
> On 09/06/2021 14:42, Richard Biener via Gcc-patches wrote:
> > On Mon, May 31, 2021 at 5:00 PM Richard Biener  wrote:
> > >
> > > This makes SLP discovery handle associatable (including mixed
> > > plus/minus) chains better by swapping operands across the whole
> > > chain.  To work this adds caching of the 'matches' lanes for
> > > failed SLP discovery attempts, thereby fixing a failed SLP
> > > discovery for the slp-pr98855.cc testcase which results in
> > > building an operand from scalars as expected.  Unfortunately
> > > this makes us trip over the cost threshold so I'm XFAILing the
> > > testcase for now.
> > >
> > > For BB vectorization all this doesn't work because we have no way
> > > to distinguish good from bad associations as we eventually build
> > > operands from scalars and thus not fail in the classical sense.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, I'll re-do
> > > > last year's SPEC tests as well.  Now that it is stage1 I'm considering
> > > to push this if there are no further comments given I plan to
> > > re-use some of the machinery for vectorization of BB reductions.
> >
> > Now finally pushed as ce670e4faafb296d1f1a7828d20f8c8ba4686797
>
> Looks like this introduces an ICE on aarch64:

And on arm too, if that helps reproducing it.

>
> spawn -ignore SIGHUP /data/ajc/toolchain/builds/rel/gcc/xgcc 
> -B/data/ajc/toolchain/builds/rel/gcc/ 
> /home/alecop01/toolchain/src/gcc/gcc/testsuite/gcc.dg/pr86179.c 
> -fdiagnostics-plain-output -O3 -S -o pr86179.s
> during GIMPLE pass: vect
> /home/alecop01/toolchain/src/gcc/gcc/testsuite/gcc.dg/pr86179.c: In function 
> 'c':
> /home/alecop01/toolchain/src/gcc/gcc/testsuite/gcc.dg/pr86179.c:7:6: internal 
> compiler error: in vect_slp_analyze_node_operations, at tree-vect-slp.c:
> 0x1132edb vect_slp_analyze_node_operations
> /home/alecop01/toolchain/src/gcc/gcc/tree-vect-slp.c:4442
> 0x1132757 vect_slp_analyze_node_operations
> /home/alecop01/toolchain/src/gcc/gcc/tree-vect-slp.c:4385
> 0x1132757 vect_slp_analyze_node_operations
> /home/alecop01/toolchain/src/gcc/gcc/tree-vect-slp.c:4385
> 0x1132757 vect_slp_analyze_node_operations
> /home/alecop01/toolchain/src/gcc/gcc/tree-vect-slp.c:4385
> 0x1132757 vect_slp_analyze_node_operations
> /home/alecop01/toolchain/src/gcc/gcc/tree-vect-slp.c:4385
> 0x11355cf vect_slp_analyze_operations(vec_info*)
> /home/alecop01/toolchain/src/gcc/gcc/tree-vect-slp.c:4592
> 0x110cbe3 vect_analyze_loop_2
> /home/alecop01/toolchain/src/gcc/gcc/tree-vect-loop.c:2396
> 0x110e4af vect_analyze_loop(loop*, vec_info_shared*)
> /home/alecop01/toolchain/src/gcc/gcc/tree-vect-loop.c:2986
> 0x114381b try_vectorize_loop_1
> /home/alecop01/toolchain/src/gcc/gcc/tree-vectorizer.c:1009
> 0x11442d3 vectorize_loops()
> /home/alecop01/toolchain/src/gcc/gcc/tree-vectorizer.c:1243
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
> compiler exited with status 1
> FAIL: gcc.dg/pr86179.c (internal compiler error)
>
> Alex
>
> >
> > > Richard.
> > >
> > > 2021-05-31  Richard Biener  
> > >
> > > PR tree-optimization/97832
> > > * tree-vectorizer.h (_slp_tree::failed): New.
> > > * tree-vect-slp.c (_slp_tree::_slp_tree): Initialize
> > > failed member.
> > > (_slp_tree::~_slp_tree): Free failed.
> > > (vect_build_slp_tree): Retain failed nodes and record
> > > matches in them, copying that back out when running
> > > into a cached fail.  Dump start and end of discovery.
> > > (dt_sort_cmp): New.
> > > (vect_build_slp_tree_2): Handle associatable chains
> > > together doing more aggressive operand swapping.
> > >
> > > * gcc.dg/vect/pr97832-1.c: New testcase.
> > > * gcc.dg/vect/pr97832-2.c: Likewise.
> > > * gcc.dg/vect/pr97832-3.c: Likewise.
> > > * g++.dg/vect/slp-pr98855.cc: XFAIL.
> > > ---
> > >  gcc/testsuite/g++.dg/vect/slp-pr98855.cc |   4 +-
> > >  gcc/testsuite/gcc.dg/vect/pr97832-1.c|  17 +
> > >  gcc/testsuite/gcc.dg/vect/pr97832-2.c|  29 ++
> > >  gcc/testsuite/gcc.dg/vect/pr97832-3.c|  50 +++
> > >  gcc/testsuite/gcc.dg/vect/slp-50.c   |  20 +
> > >  gcc/tree-vect-slp.c  | 445 ++-
> > >  gcc/tree-vectorizer.h|   5 +
> > >  7 files changed, 560 insertions(+), 10 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr97832-1.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr97832-2.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr97832-3.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-50.c


Re: [PATCH] c++: matching deduced template template parameters [PR67829]

2021-06-09 Thread Patrick Palka via Gcc-patches
On Wed, 9 Jun 2021, Patrick Palka wrote:

> During deduction, when the template of a BOUND_TEMPLATE_TEMPLATE_PARM is

Ah sorry, this should instead say "when the template of _the argument for_
a BOUND_TEMPLATE_TEMPLATE_PARM is ..."

> a template template parameter, we need to consider the
> TEMPLATE_TEMPLATE_PARAMETER rather than the TEMPLATE_DECL thereof,
> because the canonical form of a template template parameter in a
> template argument list is the TEMPLATE_TEMPLATE_PARAMETER tree.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
> 
>   PR c++/67829
> 
> gcc/cp/ChangeLog:
> 
>   * pt.c (unify) : When
>   the TEMPLATE_DECL of a BOUND_TEMPLATE_TEMPLATE_PARM argument is
>   a template template parameter, adjust to the
>   TEMPLATE_TEMPLATE_PARAMETER before falling through.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/template/ttp34.C: New test.
>   * g++.dg/template/ttp34a.C: New test.
>   * g++.dg/template/ttp34b.C: New test.
> ---
>  gcc/cp/pt.c|  4 
>  gcc/testsuite/g++.dg/template/ttp34.C  | 14 ++
>  gcc/testsuite/g++.dg/template/ttp34a.C | 14 ++
>  gcc/testsuite/g++.dg/template/ttp34b.C | 14 ++
>  4 files changed, 46 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/template/ttp34.C
>  create mode 100644 gcc/testsuite/g++.dg/template/ttp34a.C
>  create mode 100644 gcc/testsuite/g++.dg/template/ttp34b.C
> 
> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> index 05679b12973..963a182b9e5 100644
> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -23555,6 +23555,10 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
> int strict,
>   return 1;
>  
> arg = TYPE_TI_TEMPLATE (arg);
> +   if (TREE_CODE (TREE_TYPE (arg)) == TEMPLATE_TEMPLATE_PARM)
> + /* If the template is a template template parameter, use the
> +TEMPLATE_TEMPLATE_PARM for matching.  */
> + arg = TREE_TYPE (arg);
>  
> /* Fall through to deduce template name.  */
>   }
> diff --git a/gcc/testsuite/g++.dg/template/ttp34.C 
> b/gcc/testsuite/g++.dg/template/ttp34.C
> new file mode 100644
> index 000..67094063ba5
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/ttp34.C
> @@ -0,0 +1,14 @@
> +// PR c++/67829
> +
> +template class Purr;
> +
> +template class, class, class>
> +class Meow;
> +
> +template class P>
> +class Meow, int> { }; // 1
> +
> +template class P, class T>
> +class Meow, T>; // 2
> +
> +Meow, int> kitty;
> diff --git a/gcc/testsuite/g++.dg/template/ttp34a.C 
> b/gcc/testsuite/g++.dg/template/ttp34a.C
> new file mode 100644
> index 000..e3303dcf212
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/ttp34a.C
> @@ -0,0 +1,14 @@
> +// PR c++/67829
> +
> +template class Purr;
> +
> +template class, class>
> +class Meow;
> +
> +template class P>
> +class Meow > { }; // 1
> +
> +template class P, class T>
> +class Meow >; // 2
> +
> +Meow > kitty;
> diff --git a/gcc/testsuite/g++.dg/template/ttp34b.C 
> b/gcc/testsuite/g++.dg/template/ttp34b.C
> new file mode 100644
> index 000..ed3b3e8ab05
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/ttp34b.C
> @@ -0,0 +1,14 @@
> +// PR c++/67829
> +
> +template class Purr;
> +
> +template class>
> +class Meow;
> +
> +template class P>
> +class Meow, P> { }; // 1
> +
> +template class P, class T>
> +class Meow, P>; // 2
> +
> +Meow, Purr> kitty;
> -- 
> 2.32.0.rc2
> 
> 



[PATCH] c++: normalization of non-templated return-type-req [PR100946]

2021-06-09 Thread Patrick Palka via Gcc-patches
Here the satisfaction cache is conflating the satisfaction value of the
two return-type-requirements because the corresponding constrained
'auto's have level 2, but they capture an empty current_template_parms.
This ultimately causes the satisfaction cache to think the type
constraint doesn't depend on the deduced type of the expression.

When normalizing the constraints on an 'auto', the assumption made by
normalize_placeholder_type_constraints is that the level of the 'auto'
is one greater than the captured current_template_parms, an assumption
which is not holding here.  To fix this, this patch adds a dummy level
to current_template_parms to match processing_template_decl when parsing
a non-templated return-type-requirement.  This patch also makes us
verify this assumption upon creation of a constrained 'auto'.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?

PR c++/100946

gcc/cp/ChangeLog:

* parser.c (cp_parser_compound_requirement): When parsing a
non-templated return-type-requirement, add a dummy level
to current_template_parms.
* pt.c (make_constrained_placeholder_type): Verify the depth
of current_template_parms is consistent with the level of
the 'auto'.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-return-req3.C: New test.
---
 gcc/cp/parser.c   | 12 
 gcc/cp/pt.c   |  8 
 gcc/testsuite/g++.dg/cpp2a/concepts-return-req3.C |  6 ++
 3 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-return-req3.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d59a829d0b9..8278a5608ae 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -29181,6 +29181,18 @@ cp_parser_compound_requirement (cp_parser *parser)
   cp_lexer_consume_token (parser->lexer);
   cp_token *tok = cp_lexer_peek_token (parser->lexer);
 
+  auto ctp = make_temp_override (current_template_parms);
+  if (!current_template_parms)
+   {
+ /* We're parsing a return-type-requirement within a non-templated
+requires-expression.  Update current_template_parms to agree with
+processing_template_decl so that the normalization context that's
+captured by the corresponding constrained 'auto' is sensible.  */
+ gcc_checking_assert (processing_template_decl == 1);
+ current_template_parms
+   = build_tree_list (size_int (1), make_tree_vec (0));
+   }
+
   bool saved_result_type_constraint_p = 
parser->in_result_type_constraint_p;
   parser->in_result_type_constraint_p = true;
   /* C++20 allows either a type-id or a type-constraint. Parsing
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index b0155a9c370..05679b12973 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -28131,6 +28131,14 @@ make_constrained_placeholder_type (tree type, tree 
con, tree args)
   expr = build_concept_check (expr, type, args, tf_warning_or_error);
   --processing_template_decl;
 
+  /* Verify the normalization context is consistent with the level of
+ this 'auto'.  */
+  if (TEMPLATE_TYPE_LEVEL (type) == 1)
+gcc_checking_assert (!current_template_parms);
+  else
+gcc_checking_assert (1 + TMPL_PARMS_DEPTH (current_template_parms)
+== TEMPLATE_TYPE_LEVEL (type));
+
   PLACEHOLDER_TYPE_CONSTRAINTS_INFO (type)
 = build_tree_list (current_template_parms, expr);
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-return-req3.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-return-req3.C
new file mode 100644
index 000..a546c6457be
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-return-req3.C
@@ -0,0 +1,6 @@
+// PR c++/100946
+// { dg-do compile { target c++20 } }
+
+template concept C = __is_same(T, int);
+static_assert(requires { { 0 } -> C; });
+static_assert(requires { { true } -> C; }); // { dg-error "failed" }
-- 
2.32.0.rc2



[PATCH] c++: matching deduced template template parameters [PR67829]

2021-06-09 Thread Patrick Palka via Gcc-patches
During deduction, when the template of a BOUND_TEMPLATE_TEMPLATE_PARM is
a template template parameter, we need to consider the
TEMPLATE_TEMPLATE_PARAMETER rather than the TEMPLATE_DECL thereof,
because the canonical form of a template template parameter in a
template argument list is the TEMPLATE_TEMPLATE_PARAMETER tree.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/67829

gcc/cp/ChangeLog:

* pt.c (unify) : When
the TEMPLATE_DECL of a BOUND_TEMPLATE_TEMPLATE_PARM argument is
a template template parameter, adjust to the
TEMPLATE_TEMPLATE_PARAMETER before falling through.

gcc/testsuite/ChangeLog:

* g++.dg/template/ttp34.C: New test.
* g++.dg/template/ttp34a.C: New test.
* g++.dg/template/ttp34b.C: New test.
---
 gcc/cp/pt.c|  4 
 gcc/testsuite/g++.dg/template/ttp34.C  | 14 ++
 gcc/testsuite/g++.dg/template/ttp34a.C | 14 ++
 gcc/testsuite/g++.dg/template/ttp34b.C | 14 ++
 4 files changed, 46 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/template/ttp34.C
 create mode 100644 gcc/testsuite/g++.dg/template/ttp34a.C
 create mode 100644 gcc/testsuite/g++.dg/template/ttp34b.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 05679b12973..963a182b9e5 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -23555,6 +23555,10 @@ unify (tree tparms, tree targs, tree parm, tree arg, 
int strict,
return 1;
 
  arg = TYPE_TI_TEMPLATE (arg);
+ if (TREE_CODE (TREE_TYPE (arg)) == TEMPLATE_TEMPLATE_PARM)
+   /* If the template is a template template parameter, use the
+  TEMPLATE_TEMPLATE_PARM for matching.  */
+   arg = TREE_TYPE (arg);
 
  /* Fall through to deduce template name.  */
}
diff --git a/gcc/testsuite/g++.dg/template/ttp34.C 
b/gcc/testsuite/g++.dg/template/ttp34.C
new file mode 100644
index 000..67094063ba5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ttp34.C
@@ -0,0 +1,14 @@
+// PR c++/67829
+
+template class Purr;
+
+template class, class, class>
+class Meow;
+
+template class P>
+class Meow, int> { }; // 1
+
+template class P, class T>
+class Meow, T>; // 2
+
+Meow, int> kitty;
diff --git a/gcc/testsuite/g++.dg/template/ttp34a.C 
b/gcc/testsuite/g++.dg/template/ttp34a.C
new file mode 100644
index 000..e3303dcf212
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ttp34a.C
@@ -0,0 +1,14 @@
+// PR c++/67829
+
+template class Purr;
+
+template class, class>
+class Meow;
+
+template class P>
+class Meow > { }; // 1
+
+template class P, class T>
+class Meow >; // 2
+
+Meow > kitty;
diff --git a/gcc/testsuite/g++.dg/template/ttp34b.C 
b/gcc/testsuite/g++.dg/template/ttp34b.C
new file mode 100644
index 000..ed3b3e8ab05
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ttp34b.C
@@ -0,0 +1,14 @@
+// PR c++/67829
+
+template class Purr;
+
+template class>
+class Meow;
+
+template class P>
+class Meow, P> { }; // 1
+
+template class P, class T>
+class Meow, P>; // 2
+
+Meow, Purr> kitty;
-- 
2.32.0.rc2



Re: [PATCH] Document that -fno-trampolines is for Ada only [PR100735]

2021-06-09 Thread Jeff Law via Gcc-patches




On 6/9/2021 12:15 PM, Thomas Rodgers wrote:


On 2021-06-09 09:23, Jeff Law via Gcc-patches wrote:




On 5/25/2021 2:23 PM, Paul Eggert wrote:

The GCC manual's documentation of -fno-trampolines was apparently
written from an Ada point of view. However, when I read it I
understandably mistook it to say that -fno-trampolines also works for
C, C++, etc. It doesn't: it is silently ignored for these languages,
and I assume for any language other than Ada.

This confusion caused me to go in the wrong direction in a Gnulib
discussion, as I mistakenly thought that entire C apps with nested
functions could be compiled with -fno-trampolines and then use nested
C functions in stack overflow handlers where the alternate stack
is allocated via malloc. I was wrong, as this won't work on common
platforms like x86-64 where malloc yields non-executable storage.

gcc/
* doc/invoke.texi (Code Gen Options):
* doc/tm.texi.in (Trampolines):
Document that -fno-trampolines and -ftrampolines work
only with Ada.
So Martin Uecker probably has the most state on this.  IIRC when we 
last discussed -fno-trampolines the belief was that it could be 
easily made to work independent of the language, but that it was 
ultimately an ABI change.   That ultimately derailed plans to use 
-fno-trampolines for other languages in the immediate term.


The patch is fine, I just wanted to give you a bit of background on 
the state.   I'll go ahead and commit it for you.


Jeff

This patch (commit 4a0c4eaea32) is currently breaking the compilation 
with "Verify that you have permission to grant a GFDL license for 
all". It appears that tm.texi and tm.texi.in are out of sync.


My fault.  I applied the change to tm.texi by hand and mucked up a 
newline.  HJ fixed it for me, then I mucked up again, then fixed it 
again...  I'll refrain from complaining about the GFDL for everyone's sanity...


jeff


Re: [patch] Reuse non-gimple_reg variable for inlining

2021-06-09 Thread Jakub Jelinek via Gcc-patches
On Mon, May 03, 2021 at 10:04:20AM +0200, Eric Botcazou wrote:
> Hi,
> 
> when a call to a function is inlined and takes a parameter whose type is not
> gimple_reg, a variable is created in the caller to hold a copy of the argument
> passed in the call with the following comment:
> 
>   /* We may produce non-gimple trees by adding NOPs or introduce
>invalid sharing when operand is not really constant.
>It is not big deal to prohibit constant propagation here as
>we will constant propagate in DOM1 pass anyway.  */
> 
> Of course the second sentence of the comment does not apply to non-gimple_reg
> values, unless they get SRAed later, because we do not do constant propagation
> for them.  This for example prevents two identical calls to a pure function
> from being merged in the attached Ada testcase.
> 
> Therefore the attached patch attempts to reuse a read-only or non-addressable
> local DECL of the caller, the hitch being that expand_call_inline needs to be
> prevented from creating a CLOBBER for the cases where it ends up being reused.

I'm afraid the inliner would need to prove that the to-be-inlined callee
doesn't modify its own copy of the variable either, because if it does (at
least in C/C++ const can be cast away), this introduces wrong code; see
PR100994 for details.
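
A small illustration of the concern, as a sketch only: it is not the PR100994 testcase, and the names are made up. The callee legitimately writes to its own by-value copy of an aggregate argument. If inlining reused the caller's variable instead of creating a copy, the caller would observe that write.

```cpp
#include <cassert>

// An aggregate with an array member, so it is not passed around as a
// single scalar (hypothetical example type).
struct big { int v[8]; };

// The callee is free to scribble on its own by-value copy.
static inline int callee(big p)
{
  p.v[0] += 1;
  return p.v[0];
}

int caller_sketch()
{
  big local = {};   // local variable whose address is never taken
  local.v[0] = 41;
  int r = callee(local);
  // 'local' must still hold 41 here.  If 'p' had been made an alias of
  // 'local' during inlining, this would read 42 instead.
  return r * 100 + local.v[0];
}
```

A correct compiler returns 4241 from caller_sketch(); reusing the caller's variable without proving the callee leaves its copy alone would yield 4242.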

> Tested on x86-64/Linux, OK for the mainline?
> 
> 
> 2021-05-03  Eric Botcazou  
> 
>   * tree-inline.c (setup_one_parameter): Do not create a variable if the
>   value is either a read-only DECL or a non-addressable local variable.
>   Register the variable thus reused instead of creating a new one.
>   (expand_call_inline): Do not generate a CLOBBER for these variables.
> 
> 
> 2021-05-03  Eric Botcazou  
> 
>   * gnat.dg/opt94.adb: New test.
>   * gnat.dg/opt94_pkg.ads, opt94.adb: New helper.

Jakub



[r12-1330 Regression] FAIL: libgomp.fortran/pr100981-2.f90 -Os execution test on Linux/x86_64

2021-06-09 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

374f93da97fb0378453d503f3cfea4d7a923a89c is the first bad commit
commit 374f93da97fb0378453d503f3cfea4d7a923a89c
Author: Richard Biener 
Date:   Wed Jun 9 14:48:35 2021 +0200

tree-optimization/100981 - fix SLP patterns involving reductions

caused

FAIL: libgomp.fortran/pr100981-2.f90   -O0  execution test
FAIL: libgomp.fortran/pr100981-2.f90   -O1  execution test
FAIL: libgomp.fortran/pr100981-2.f90   -O2  execution test
FAIL: libgomp.fortran/pr100981-2.f90   -O3 -g  execution test
FAIL: libgomp.fortran/pr100981-2.f90   -Os  execution test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-1330/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/pr100981-2.f90 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.fortran/pr100981-2.f90 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for questions about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PATCH] Implement a context aware points-to analyzer for use in evrp.

2021-06-09 Thread Andrew MacLeod via Gcc-patches

On 6/9/21 7:32 AM, Richard Biener wrote:

On Tue, Jun 8, 2021 at 4:31 PM Andrew MacLeod  wrote:


an iteration causes a relation
to become "better" it should be updated.

do a very simplistic thing when trying to simplify downstream conditions
based on earlier ones, abusing their known-expressions hash tables
by, for example, registering (a < b) == 1, (a > b) == 0, (a == b) == 0,
(a != b) == 1 for an earlier a < b condition on the true edge.  So I wonder
if this relation code can be somehow used there.  In VN there's the
extra complication that it iterates, but DOM is just a DOM-walk and
the VN code also has a non-iterating mode (but not a DOM walk).

Note VN does optimistic iteration thus relations will become only "worse",
thus we somehow need to be able to remove relations we added during
the last iteration.  That is, in the first iteration a if (a > b) might be
registered as a > 1 when b has (optimistic) value b but in the second
we have to make it a > b when b dropped to varying for example.

The optimistic part of VN is that it treats all edges as not executable
initially and thus it ignores values on non-executable edges in PHI
nodes.

Yeah, haven't had to do that yet.  The add method currently does an 
intersection with any relation it finds already present, it shouldn't be 
too much work to add an alternate add routine that says "replace".  In 
fact, that may be also be adequate for my purposes, since typically the 
second time a relation is added, it *should* be either the same, or 
"better" for me.   The tricky part is I think it may already include any 
relation that dominates the current location, but that is addressable.


I'll keep an eye on this as I'm prepping the code and writing it up.


The API for registering is pretty straightforward:

void register_relation (gimple *stmt, relation_kind k, tree op1, tree
op2);
void register_relation (edge e, relation_kind k, tree op1, tree op2);

so all you'd have to do when a < b is encountered is to register  (a
LT_EXPR b) on the true edge, and (a GE_EXPR b) on the false edge.  Then
any queries downstream should be reflected.
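
To make the registration pattern concrete, here is a toy sketch. It is not the relation oracle itself: the class, the enum and the integer edge ids are hypothetical, and operands are plain strings rather than trees. It also ignores dominance, so a query only sees relations registered on exactly that edge, whereas the real oracle is meant to answer queries anywhere downstream.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical relation kinds, standing in for LT_EXPR / GE_EXPR.
enum relation_kind_sketch { REL_NONE, REL_LT, REL_GE };

class toy_oracle {
  // Key: (edge id, first operand, second operand).
  std::map<std::tuple<int, std::string, std::string>,
           relation_kind_sketch> m_rel;

public:
  // Record that OP1 <K> OP2 holds on edge EDGE.
  void register_relation(int edge, relation_kind_sketch k,
                         const std::string &op1, const std::string &op2)
  {
    m_rel[std::make_tuple(edge, op1, op2)] = k;
  }

  // Return the relation recorded on EDGE, or REL_NONE if there is none.
  relation_kind_sketch query(int edge, const std::string &op1,
                             const std::string &op2) const
  {
    auto it = m_rel.find(std::make_tuple(edge, op1, op2));
    return it == m_rel.end() ? REL_NONE : it->second;
  }
};

// What a client would do on seeing "if (a < b)": LT on the true edge,
// GE on the false edge.
void on_less_than(toy_oracle &o, int true_edge, int false_edge,
                  const std::string &a, const std::string &b)
{
  o.register_relation(true_edge, REL_LT, a, b);
  o.register_relation(false_edge, REL_GE, a, b);
}
```

After on_less_than(o, 1, 2, "a", "b"), a query on edge 1 reports REL_LT, a query on edge 2 reports REL_GE, and any other edge reports REL_NONE.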




Of course the code is also used to simplify

   if (a > b)
  c = a != b;

but the relation oracle should be able to handle that as well I guess.



As we start using the code more, we may find we want/need a few more
wrappers around some of this so that you can transparently ask what the
RHS folds to without any ranger present, just with relations.  Those'll
be fairly trivial to add...

The relation oracle is going to be directly accessible from the
get_range_query(cfun) range_query class.  I'll do a big writeup when I
submit it and we should be able to make it usable in any of those places.

OK, so using this from DOM should be pretty straight-forward (you may
want to try to use it there as proof of API sanity).  When it's in I'll see if
it fits (iterative) VN.

Yeah, it should be quite straightforward as part of DOM, and should be 
easy to set and query in parallel with what's there to verify it's getting 
the same results.  Famous last words.  But yes, this would be further 
proof it's evaluating what we expect on top of doing what EVRP did.


Slight delay while I'm sorting out some minor API issues to enable using 
this without ranger right off the bat, as well as interaction with some 
recent changes I made.


Andrew





Re: [PATCH] Implement a context aware points-to analyzer for use in evrp.

2021-06-09 Thread Martin Sebor via Gcc-patches

On 6/9/21 12:50 PM, Aldy Hernandez wrote:



On 6/9/21 7:10 PM, Martin Sebor wrote:

On 6/7/21 12:29 PM, Aldy Hernandez via Gcc-patches wrote:





Mostly just a question of the type choices in the implementation
of the ssa_equiv_stack class: m_stack is an auto_vec while
m_replacements is a plain array.  I'd expect both to be the same
(auto_vec).  Is there a reason for this choice?

If not, I'd suggest to use auto_vec.  That way the ssa_equiv_stack
class shouldn't need a dtor (auto_vec doesn't need to be explicitly
released before it's destroyed), though it should delete its copy
and assignment.


TBH, the ssa_equiv_stack was lifted from evrp_range_analyzer (there are 
also 2 more copies of the same mechanism in tree-ssa-scopedtables.h). 
The code there already used an auto_vec, so I left that.  However, for a 
mere array of trees, I thought it'd be simpler to use new/delete.  Which 
in retrospect was easy to get wrong, as can be seen by:


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100984

Would something like the code below work?  It should also fix the PR.

Thanks.
Aldy

diff --git a/gcc/gimple-ssa-evrp.c b/gcc/gimple-ssa-evrp.c
index 7e1cf51239a..808d47506be 100644
--- a/gcc/gimple-ssa-evrp.c
+++ b/gcc/gimple-ssa-evrp.c
@@ -61,19 +61,18 @@ public:

  private:
    auto_vec> m_stack;
-  tree *m_replacements;
+  auto_vec m_replacements;
    const std::pair  m_marker = std::make_pair (NULL, NULL);
  };

  ssa_equiv_stack::ssa_equiv_stack ()
  {
-  m_replacements = new tree[num_ssa_names] ();
+  m_replacements.safe_grow_cleared (num_ssa_names);
  }

  ssa_equiv_stack::~ssa_equiv_stack ()
  {
    m_stack.release ();
-  delete m_replacements;
  }


Yes, this is what I was suggesting, with the additional removal of
the dtor above (since release() is called from the auto_vec dtor).

(Once auto_vec is safe to copy and assign we'll also get copy and
assignment for nothing.)

Martin

PS Enhancing -Wmismatched-new-delete to detect the bug in PR 100984
is on my to do list.
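
For readers outside GCC, a rough sketch of the shape being suggested, with std::vector standing in for auto_vec. The class name and element type are hypothetical. The bug in the removed lines was pairing an array allocation (new tree[n] ()) with a scalar delete instead of delete[]; letting a container own the storage removes both the manual allocation and the destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the fixed class.  The vector owns the
// storage, so no user-written destructor is needed.
class equiv_table_sketch {
  std::vector<const void *> m_replacements;  // all entries start as null

public:
  explicit equiv_table_sketch(std::size_t n) : m_replacements(n) {}

  // Copying is disabled explicitly, as suggested in the review.
  equiv_table_sketch(const equiv_table_sketch &) = delete;
  equiv_table_sketch &operator=(const equiv_table_sketch &) = delete;

  std::size_t size() const { return m_replacements.size(); }
  const void *get(std::size_t i) const { return m_replacements[i]; }
  void set(std::size_t i, const void *v) { m_replacements[i] = v; }
};
```

Constructing the table with a size gives that many null entries, much like safe_grow_cleared in the patch, and the storage is released automatically when the object goes out of scope.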


[PATCH] Update doc/tm.texi.in to fix commit 4a0c4eaea32

2021-06-09 Thread H.J. Lu via Gcc-patches
PR other/100735
* doc/tm.texi.in (Trampolines): Add a missing blank line.
---
 gcc/doc/tm.texi.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 20501607716..33532f092b6 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3828,6 +3828,7 @@ addresses.  Since GCC's generic function descriptors are
 not ABI-compliant, this option is typically used only on a
 per-language basis (notably by Ada) or when it can otherwise be
 applied to the whole program.
+
 For languages other than Ada, the @code{-ftrampolines} and
 @code{-fno-trampolines} options currently have no effect, and
 trampolines are always generated on platforms that need them
-- 
2.31.1



Re: [PATCH] Document that -fno-trampolines is for Ada only [PR100735]

2021-06-09 Thread H.J. Lu via Gcc-patches
On Wed, Jun 9, 2021 at 11:15 AM Thomas Rodgers
 wrote:
>
> On 2021-06-09 09:23, Jeff Law via Gcc-patches wrote:
>
> > On 5/25/2021 2:23 PM, Paul Eggert wrote:
> >
> >> The GCC manual's documentation of -fno-trampolines was apparently
> >> written from an Ada point of view. However, when I read it I
> >> understandably mistook it to say that -fno-trampolines also works for
> >> C, C++, etc. It doesn't: it is silently ignored for these languages,
> >> and I assume for any language other than Ada.
> >>
> >> This confusion caused me to go in the wrong direction in a Gnulib
> >> discussion, as I mistakenly thought that entire C apps with nested
> >> functions could be compiled with -fno-trampolines and then use nested
> >> C functions in stack overflow handlers where the alternate stack
> >> is allocated via malloc. I was wrong, as this won't work on common
> >> platforms like x86-64 where malloc yields non-executable storage.
> >>
> >> gcc/
> >> * doc/invoke.texi (Code Gen Options):
> >> * doc/tm.texi.in (Trampolines):
> >> Document that -fno-trampolines and -ftrampolines work
> >> only with Ada.
> > So Martin Uecker probably has the most state on this.  IIRC when we
> > last discussed -fno-trampolines the belief was that it could be easily
> > made to work independent of the language, but that it was ultimately an
> > ABI change.   That ultimately derailed plans to use -fno-trampolines
> > for other languages in the immediate term.
> >
> > The patch is fine, I just wanted to give you a bit of background on the
> > state.   I'll go ahead and commit it for you.
> >
> > Jeff
>
> This patch (commit 4a0c4eaea32) is currently breaking the compilation
> with "Verify that you have permission to grant a GFDL license for all".
> It appears that tm.texi and tm.texi.in are out of sync.

I am checking in this to unbreak it:

diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 20501607716..33532f092b6 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3828,6 +3828,7 @@ addresses.  Since GCC's generic function descriptors are
 not ABI-compliant, this option is typically used only on a
 per-language basis (notably by Ada) or when it can otherwise be
 applied to the whole program.
+
 For languages other than Ada, the @code{-ftrampolines} and
 @code{-fno-trampolines} options currently have no effect, and
 trampolines are always generated on platforms that need them

-- 
H.J.


Re: [PATCH] Implement a context aware points-to analyzer for use in evrp.

2021-06-09 Thread Aldy Hernandez via Gcc-patches




On 6/9/21 7:10 PM, Martin Sebor wrote:

On 6/7/21 12:29 PM, Aldy Hernandez via Gcc-patches wrote:





Mostly just a question of the type choices in the implementation
of the ssa_equiv_stack class: m_stack is an auto_vec while
m_replacements is a plain array.  I'd expect both to be the same
(auto_vec).  Is there a reason for this choice?

If not, I'd suggest to use auto_vec.  That way the ssa_equiv_stack
class shouldn't need a dtor (auto_vec doesn't need to be explicitly
released before it's destroyed), though it should delete its copy
and assignment.


TBH, the ssa_equiv_stack was lifted from evrp_range_analyzer (there are 
also 2 more copies of the same mechanism in tree-ssa-scopedtables.h). 
The code there already used an auto_vec, so I left that.  However, for a 
mere array of trees, I thought it'd be simpler to use new/delete.  Which 
in retrospect was easy to get wrong, as can be seen by:


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100984

Would something like the code below work?  It should also fix the PR.

Thanks.
Aldy

diff --git a/gcc/gimple-ssa-evrp.c b/gcc/gimple-ssa-evrp.c
index 7e1cf51239a..808d47506be 100644
--- a/gcc/gimple-ssa-evrp.c
+++ b/gcc/gimple-ssa-evrp.c
@@ -61,19 +61,18 @@ public:

 private:
   auto_vec> m_stack;
-  tree *m_replacements;
+  auto_vec m_replacements;
   const std::pair  m_marker = std::make_pair (NULL, NULL);
 };

 ssa_equiv_stack::ssa_equiv_stack ()
 {
-  m_replacements = new tree[num_ssa_names] ();
+  m_replacements.safe_grow_cleared (num_ssa_names);
 }

 ssa_equiv_stack::~ssa_equiv_stack ()
 {
   m_stack.release ();
-  delete m_replacements;
 }

 // Pushes a marker at the given point.



Re: [PATCH] Document that -fno-trampolines is for Ada only [PR100735]

2021-06-09 Thread Iain Sandoe via Gcc-patches
Jeff et al.

> On 9 Jun 2021, at 17:23, Jeff Law via Gcc-patches  
> wrote:
> On 5/25/2021 2:23 PM, Paul Eggert wrote:
>> The GCC manual's documentation of -fno-trampolines was apparently
>> written from an Ada point of view. However, when I read it I
>> understandably mistook it to say that -fno-trampolines also works for
>> C, C++, etc. It doesn't: it is silently ignored for these languages,
>> and I assume for any language other than Ada.
>> 
>> This confusion caused me to go in the wrong direction in a Gnulib
>> discussion, as I mistakenly thought that entire C apps with nested
>> functions could be compiled with -fno-trampolines and then use nested
>> C functions in stack overflow handlers where the alternate stack
>> is allocated via malloc. I was wrong, as this won't work on common
>> platforms like x86-64 where malloc yields non-executable storage.
>> 
>> gcc/
>> * doc/invoke.texi (Code Gen Options):
>> * doc/tm.texi.in (Trampolines):
>> Document that -fno-trampolines and -ftrampolines work
>> only with Ada.
> So Martin Uecker probably has the most state on this.  IIRC when we last 
> discussed -fno-trampolines the belief was that it could be easily made to 
> work independent of the language, but that it was ultimately an ABI change.   
> That ultimately derailed plans to use -fno-trampolines for other languages in 
> the immediate term.

This is correct; it’s not technically too hard to make it work for another 
language (I have a hack in my arm64-darwin branch that does this for gfortran). 
As noted, for most ports it is an ABI break and thus not usable outside such 
a work-around.

For the record (for the arm64-darwin port in the first instance), together with 
some of my friends at Embecosm we plan to implement a solution to the 
trampoline problem that does not require an executable stack and does not 
require an ABI break.  Perhaps such a solution will be of interest to other 
ports that do not want an executable stack.

We’re not quite ready to post the design yet - but will do so in the next few 
weeks (all being well).

cheers
Iain



Re: [PATCH] Document that -fno-trampolines is for Ada only [PR100735]

2021-06-09 Thread Thomas Rodgers

On 2021-06-09 09:23, Jeff Law via Gcc-patches wrote:


On 5/25/2021 2:23 PM, Paul Eggert wrote:


The GCC manual's documentation of -fno-trampolines was apparently
written from an Ada point of view. However, when I read it I
understandably mistook it to say that -fno-trampolines also works for
C, C++, etc. It doesn't: it is silently ignored for these languages,
and I assume for any language other than Ada.

This confusion caused me to go in the wrong direction in a Gnulib
discussion, as I mistakenly thought that entire C apps with nested
functions could be compiled with -fno-trampolines and then use nested
C functions in stack overflow handlers where the alternate stack
is allocated via malloc. I was wrong, as this won't work on common
platforms like x86-64 where malloc yields non-executable storage.

gcc/
* doc/invoke.texi (Code Gen Options):
* doc/tm.texi.in (Trampolines):
Document that -fno-trampolines and -ftrampolines work
only with Ada.
So Martin Uecker probably has the most state on this.  IIRC when we 
last discussed -fno-trampolines the belief was that it could be easily 
made to work independent of the language, but that it was ultimately an 
ABI change.   That ultimately derailed plans to use -fno-trampolines 
for other languages in the immediate term.


The patch is fine, I just wanted to give you a bit of background on the 
state.   I'll go ahead and commit it for you.


Jeff


This patch (commit 4a0c4eaea32) is currently breaking the compilation 
with "Verify that you have permission to grant a GFDL license for all". 
It appears that tm.texi and tm.texi.in are out of sync.


[committed] d: TypeInfo error when using slice copy on Structs (PR100964)

2021-06-09 Thread Iain Buclaw via Gcc-patches
Hi,

This patch fixes a compiler error when using a slice copy on structs.
Known limitation: does not work for structs with a postblit or dtor.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32,
committed to mainline, and backported to the gcc-9, gcc-10, and gcc-11
release branches.

Regards,
Iain.

---
gcc/d/ChangeLog:

PR d/100964
* dmd/MERGE: Merge upstream dmd 4a4e46a6f.
---
 gcc/d/dmd/MERGE  |  2 +-
 gcc/d/dmd/expression.c   |  7 +--
 gcc/testsuite/gdc.test/compilable/betterCarray.d | 10 ++
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index e22e3d1ebf7..a617f285eac 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-f3fdeb578f8cc6d9426d47d2fa144d2078f9ab29
+4a4e46a6f304a667e0c05d4455706ec2056ffddc
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/expression.c b/gcc/d/dmd/expression.c
index 2592b38d961..88f13e9669b 100644
--- a/gcc/d/dmd/expression.c
+++ b/gcc/d/dmd/expression.c
@@ -1044,8 +1044,11 @@ bool Expression::checkPostblit(Scope *sc, Type *t)
 t = t->baseElemOf();
 if (t->ty == Tstruct)
 {
-// Bugzilla 11395: Require TypeInfo generation for array concatenation
-semanticTypeInfo(sc, t);
+if (global.params.useTypeInfo)
+{
+// Bugzilla 11395: Require TypeInfo generation for array concatenation
+semanticTypeInfo(sc, t);
+}
 
 StructDeclaration *sd = ((TypeStruct *)t)->sym;
 if (sd->postblit)
diff --git a/gcc/testsuite/gdc.test/compilable/betterCarray.d b/gcc/testsuite/gdc.test/compilable/betterCarray.d
index 74c80be3b95..3f48b042bde 100644
--- a/gcc/testsuite/gdc.test/compilable/betterCarray.d
+++ b/gcc/testsuite/gdc.test/compilable/betterCarray.d
@@ -15,3 +15,13 @@ int foo(int[] a, int i)
 {
 return a[i];
 }
+
+/**/
+// https://issues.dlang.org/show_bug.cgi?id=19234
+void issue19234()
+{
+static struct A {}
+A[10] a;
+A[10] b;
+b[] = a[];
+}
-- 
2.27.0



[committed] d: Respect explicit align(N) type alignment (PR100935)

2021-06-09 Thread Iain Buclaw via Gcc-patches
Hi,

This patch fixes the .alignof property to report the correct alignment
if an explicit one was given for the type.

It was previously the natural type alignment, defined as the maximum of
the field alignments for an aggregate.  Make sure an explicit align(N)
overrides it.
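
The selection rule described above can be sketched in a few lines. This is only an illustration, not the dmd code: the sentinel kDefaultAlign and the function reported_alignment are hypothetical names standing in for "no explicit align(N) was given" and the value that .alignof reports.

```cpp
#include <cassert>

// Hypothetical sentinel meaning "no explicit align(N) was given".
constexpr unsigned kDefaultAlign = ~0u;

// Pick the alignment to report: the explicit one if present, otherwise
// the natural alignment derived from the fields.
constexpr unsigned
reported_alignment (unsigned explicit_align, unsigned natural_align)
{
  return explicit_align == kDefaultAlign ? natural_align : explicit_align;
}

// No explicit alignment: fall back to the natural one.
static_assert (reported_alignment (kDefaultAlign, 4) == 4, "natural");
// An explicit alignment may raise or lower the reported value.
static_assert (reported_alignment (8, 2) == 8, "explicit raises");
static_assert (reported_alignment (1, 4) == 1, "explicit lowers");
```

The last two cases mirror the intent of the new tests: an explicit align(8) or align(1) wins over whatever the fields would imply.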

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32,
committed to mainline, and backported to the gcc-9, gcc-10, and gcc-11
release branches.

Regards,
Iain.

---
gcc/d/ChangeLog:

PR d/100935
* dmd/MERGE: Merge upstream dmd f3fdeb578.
---
 gcc/d/dmd/MERGE   |  2 +-
 gcc/d/dmd/mtype.c |  5 -
 .../gdc.test/compilable/aggr_alignment.d  | 20 +++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index d29d462f42f..e22e3d1ebf7 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-b7d146c4c34469f876a63f26ff19091a7f9d54d7
+f3fdeb578f8cc6d9426d47d2fa144d2078f9ab29
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/mtype.c b/gcc/d/dmd/mtype.c
index 9ef8ab4e5f4..6cccf40df98 100644
--- a/gcc/d/dmd/mtype.c
+++ b/gcc/d/dmd/mtype.c
@@ -2040,7 +2040,10 @@ Expression *Type::getProperty(Loc loc, Identifier *ident, int flag)
 }
 else if (ident == Id::__xalignof)
 {
-e = new IntegerExp(loc, alignsize(), Type::tsize_t);
+unsigned explicitAlignment = alignment();
+unsigned naturalAlignment = alignsize();
+unsigned actualAlignment = (explicitAlignment == STRUCTALIGN_DEFAULT ? naturalAlignment : explicitAlignment);
+e = new IntegerExp(loc, actualAlignment, Type::tsize_t);
 }
 else if (ident == Id::_init)
 {
diff --git a/gcc/testsuite/gdc.test/compilable/aggr_alignment.d b/gcc/testsuite/gdc.test/compilable/aggr_alignment.d
index bf602ff31a4..0c727e2fec5 100644
--- a/gcc/testsuite/gdc.test/compilable/aggr_alignment.d
+++ b/gcc/testsuite/gdc.test/compilable/aggr_alignment.d
@@ -27,6 +27,26 @@ static assert(C2.int1.offsetof == payloadOffset + 8);
 static assert(C2.alignof == size_t.sizeof);
 static assert(__traits(classInstanceSize, C2) == payloadOffset + 12);
 
+align(8) struct PaddedStruct
+{
+bool flag;
+align(2) S1 s1;
+}
+
+static assert(PaddedStruct.s1.offsetof == 2);
+static assert(PaddedStruct.alignof == 8);
+static assert(PaddedStruct.sizeof == 16);
+
+align(1) struct UglyStruct
+{
+bool flag;
+int i;
+ubyte u;
+}
+
+static assert(UglyStruct.i.offsetof == 4);
+static assert(UglyStruct.alignof == 1);
+static assert(UglyStruct.sizeof == 9);
 
 /***/
 // https://issues.dlang.org/show_bug.cgi?id=19914
-- 
2.27.0



Re: [PATCH] Implement a context aware points-to analyzer for use in evrp.

2021-06-09 Thread Martin Sebor via Gcc-patches

On 6/7/21 12:29 PM, Aldy Hernandez via Gcc-patches wrote:



On 6/7/21 3:30 PM, Richard Biener wrote:

On Mon, Jun 7, 2021 at 12:10 PM Aldy Hernandez via Gcc-patches
 wrote:


The substitute_and_fold_engine which evrp uses is expecting symbolics
from value_of_expr / value_on_edge / etc, which ranger does not provide.
In some cases, these provide important folding cues, as in the case of
aliases for pointers.  For example, legacy evrp may return [&foo, &foo]
for the value of "bar" where bar is on an edge where bar == &foo, or
when bar has been globally set to &foo.  This information is then used
by the subst & fold engine to propagate the known value of bar.

Currently this is a major source of discrepancies between evrp and
ranger.  Of the 284 cases legacy evrp is getting over ranger, 237 are
for pointer equality as discussed above.

This patch implements a context aware points-to class which
ranger-evrp can use to query what a pointer is currently pointing to.
With it, we reduce the 284 cases legacy evrp is getting to 47.

The API for the points-to analyzer is the following:

class points_to_analyzer
{
public:
   points_to_analyzer (gimple_ranger *r);
   ~points_to_analyzer ();
   void enter (basic_block);
   void leave (basic_block);
   void visit_stmt (gimple *stmt);
   tree get_points_to (tree name) const;
...
};

The enter(), leave(), and visit_stmt() methods are meant to be called
from a DOM walk.   At any point throughout the walk, one can call
get_points_to() to get whatever an SSA is pointing to.
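
As a rough picture of how enter(), leave() and a lookup can cooperate during such a walk, here is a minimal standalone sketch. It is an assumption-laden illustration, not the code from the patch: the class scoped_equiv is hypothetical, strings stand in for SSA names and their replacements, and an empty pair serves as the scope marker.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical scoped table: names map to replacements, and every change
// is logged so that it can be undone when a block is left.
class scoped_equiv
{
public:
  // Called when the walk enters a block: push a marker.
  void enter () { m_stack.push_back (marker ()); }

  // Called when the walk leaves a block: undo everything recorded since
  // the matching marker, then drop the marker itself.
  void leave ()
  {
    while (!m_stack.empty ())
      {
        std::pair<std::string, std::string> entry = m_stack.back ();
        m_stack.pop_back ();
        if (entry == marker ())
          break;
        if (entry.second.empty ())
          m_map.erase (entry.first);          // There was no earlier value.
        else
          m_map[entry.first] = entry.second;  // Restore the earlier value.
      }
  }

  // Record that NAME is equivalent to REPLACEMENT in the current scope,
  // remembering the previous value so leave () can restore it.
  void push_replacement (const std::string &name,
                         const std::string &replacement)
  {
    assert (!name.empty () && !replacement.empty ());
    m_stack.push_back (std::make_pair (name, get_replacement (name)));
    m_map[name] = replacement;
  }

  // Return the current replacement for NAME, or "" if there is none.
  std::string get_replacement (const std::string &name) const
  {
    auto it = m_map.find (name);
    return it == m_map.end () ? std::string () : it->second;
  }

private:
  // An empty pair marks the start of a scope; real entries never have an
  // empty name, so the two cannot be confused.
  static std::pair<std::string, std::string> marker () { return {}; }

  std::vector<std::pair<std::string, std::string>> m_stack;
  std::unordered_map<std::string, std::string> m_map;
};
```

A replacement pushed inside a nested enter()/leave() pair is visible only until the matching leave(), after which the outer scope's value (or none) is seen again.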

If this class is useful to others, we could place it in a more generic
location.

Tested on x86-64 Linux with a regular bootstrap/tests and by comparing
EVRP folds over ranger before and after this patch.


Hmm, but why call it "points-to" - when I look at the implementation
it's really about equivalences.  Thus,

  if (var1_2 == var2_3)

could be handled the same way.  Also "points-to" implies (to me)
that [1] and [2] point to the same object but your points-to
is clearly tracking equivalences only.

So maybe at least rename it to pointer_equiv_analyzer?  ISTR


Good point.  Renaming done.  I've adjusted the changelog and commit 
message as well.


Mostly just a question of the type choices in the implementation
of the ssa_equiv_stack class: m_stack is an auto_vec while
m_replacements is a plain array.  I'd expect both to be the same
(auto_vec).  Is there a reason for this choice?

If not, I'd suggest to use auto_vec.  That way the ssa_equiv_stack
class shouldn't need a dtor (auto_vec doesn't need to be explicitly
released before it's destroyed), though it should delete its copy
and assignment.

Martin



Thanks.
Aldy




RE: [PATCH] PR middle-end/53267: Constant fold BUILT_IN_FMOD.

2021-06-09 Thread Roger Sayle


Hi Jeff (and Richard),
Many thanks to you both.  Fingers crossed my write-access (after approval)
still works (I think it's just maintainer status that I've lost over time),
but finding time to contribute is getting much harder, so my response time
is much slower than it was a decade ago.
Having someone help with committing patches is always very much appreciated.
Cheers,
Roger
--

-Original Message-
From: Jeff Law  
Sent: 09 June 2021 16:27
To: Richard Biener ; Roger Sayle 

Cc: GCC Patches 
Subject: Re: [PATCH] PR middle-end/53267: Constant fold BUILT_IN_FMOD.



On 6/9/2021 4:51 AM, Richard Biener via Gcc-patches wrote:
> On Tue, Jun 8, 2021 at 9:36 PM Roger Sayle  wrote:
>>
>> Here's a three line patch to implement constant folding for fmod, 
>> fmodf and fmodl, which resolves an enhancement request from 2012.
>>
>> The following patch has been tested on x86_64-pc-linux-gnu with a 
>> make bootstrap and make -k check with no new failures.
>>
>> Ok for mainline?
> OK.  I double-checked and mpfr_fmod appeared in mpfr 2.4.0 and we 
> require at least 3.1.0.
I'm not sure if Roger has write access these days, so I went ahead and 
committed the patch on his behalf.

jeff




Re: [PATCH] tree-optimization/97832 - handle associatable chains in SLP discovery

2021-06-09 Thread Alex Coplan via Gcc-patches
Hi Richi,

On 09/06/2021 14:42, Richard Biener via Gcc-patches wrote:
> On Mon, May 31, 2021 at 5:00 PM Richard Biener  wrote:
> >
> > This makes SLP discovery handle associatable (including mixed
> > plus/minus) chains better by swapping operands across the whole
> > chain.  To work this adds caching of the 'matches' lanes for
> > failed SLP discovery attempts, thereby fixing a failed SLP
> > discovery for the slp-pr98855.cc testcase which results in
> > building an operand from scalars as expected.  Unfortunately
> > this makes us trip over the cost threshold so I'm XFAILing the
> > testcase for now.
> >
> > For BB vectorization all this doesn't work because we have no way
> > to distinguish good from bad associations as we eventually build
> > operands from scalars and thus not fail in the classical sense.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, I'll re-do
> > last year's SPEC tests as well.  Now that it is stage1 I'm considering
> > pushing this if there are no further comments, given I plan to
> > re-use some of the machinery for vectorization of BB reductions.
> 
> Now finally pushed as ce670e4faafb296d1f1a7828d20f8c8ba4686797

Looks like this introduces an ICE on aarch64:

spawn -ignore SIGHUP /data/ajc/toolchain/builds/rel/gcc/xgcc 
-B/data/ajc/toolchain/builds/rel/gcc/ 
/home/alecop01/toolchain/src/gcc/gcc/testsuite/gcc.dg/pr86179.c 
-fdiagnostics-plain-output -O3 -S -o pr86179.s
during GIMPLE pass: vect
/home/alecop01/toolchain/src/gcc/gcc/testsuite/gcc.dg/pr86179.c: In function 
'c':
/home/alecop01/toolchain/src/gcc/gcc/testsuite/gcc.dg/pr86179.c:7:6: internal 
compiler error: in vect_slp_analyze_node_operations, at tree-vect-slp.c:
0x1132edb vect_slp_analyze_node_operations
/home/alecop01/toolchain/src/gcc/gcc/tree-vect-slp.c:4442
0x1132757 vect_slp_analyze_node_operations
/home/alecop01/toolchain/src/gcc/gcc/tree-vect-slp.c:4385
0x1132757 vect_slp_analyze_node_operations
/home/alecop01/toolchain/src/gcc/gcc/tree-vect-slp.c:4385
0x1132757 vect_slp_analyze_node_operations
/home/alecop01/toolchain/src/gcc/gcc/tree-vect-slp.c:4385
0x1132757 vect_slp_analyze_node_operations
/home/alecop01/toolchain/src/gcc/gcc/tree-vect-slp.c:4385
0x11355cf vect_slp_analyze_operations(vec_info*)
/home/alecop01/toolchain/src/gcc/gcc/tree-vect-slp.c:4592
0x110cbe3 vect_analyze_loop_2
/home/alecop01/toolchain/src/gcc/gcc/tree-vect-loop.c:2396
0x110e4af vect_analyze_loop(loop*, vec_info_shared*)
/home/alecop01/toolchain/src/gcc/gcc/tree-vect-loop.c:2986
0x114381b try_vectorize_loop_1
/home/alecop01/toolchain/src/gcc/gcc/tree-vectorizer.c:1009
0x11442d3 vectorize_loops()
/home/alecop01/toolchain/src/gcc/gcc/tree-vectorizer.c:1243
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
compiler exited with status 1
FAIL: gcc.dg/pr86179.c (internal compiler error)

Alex

> 
> > Richard.
> >
> > 2021-05-31  Richard Biener  
> >
> > PR tree-optimization/97832
> > * tree-vectorizer.h (_slp_tree::failed): New.
> > * tree-vect-slp.c (_slp_tree::_slp_tree): Initialize
> > failed member.
> > (_slp_tree::~_slp_tree): Free failed.
> > (vect_build_slp_tree): Retain failed nodes and record
> > matches in them, copying that back out when running
> > into a cached fail.  Dump start and end of discovery.
> > (dt_sort_cmp): New.
> > (vect_build_slp_tree_2): Handle associatable chains
> > together doing more aggressive operand swapping.
> >
> > * gcc.dg/vect/pr97832-1.c: New testcase.
> > * gcc.dg/vect/pr97832-2.c: Likewise.
> > * gcc.dg/vect/pr97832-3.c: Likewise.
> > * g++.dg/vect/slp-pr98855.cc: XFAIL.
> > ---
> >  gcc/testsuite/g++.dg/vect/slp-pr98855.cc |   4 +-
> >  gcc/testsuite/gcc.dg/vect/pr97832-1.c|  17 +
> >  gcc/testsuite/gcc.dg/vect/pr97832-2.c|  29 ++
> >  gcc/testsuite/gcc.dg/vect/pr97832-3.c|  50 +++
> >  gcc/testsuite/gcc.dg/vect/slp-50.c   |  20 +
> >  gcc/tree-vect-slp.c  | 445 ++-
> >  gcc/tree-vectorizer.h|   5 +
> >  7 files changed, 560 insertions(+), 10 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr97832-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr97832-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr97832-3.c
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-50.c


Re: [PATCH v2] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-09 Thread Segher Boessenkool
On Wed, Jun 09, 2021 at 11:06:31AM +0800, Xionghu Luo wrote:
> On 2021/6/9 05:07, Segher Boessenkool wrote:
> >> -/* { dg-final { scan-assembler "lxvd2x 34"  } } */
> >> -/* { dg-final { scan-assembler "stxvd2x 34" } } */
> >> +/* { dg-final { scan-assembler "lvx 2"  } } */
> >> +/* { dg-final { scan-assembler "stvx 2" } } */
> > 
> > Huh.  Is that correct?  Where did the other 32 loads and stores go?  Are
> > there now other insns generated that you should scan for?
> 
> This is an expected change.  lxvd2x+xxpermdi is replaced by lvx, so there is
> no need to scan for other instructions.  Similarly for stvx.  34 and 2 are
> *vector register names* instead of counts.

Oh!  Oh my.  I read that as "scan-assembler-times" for some reason, as
you expected.

> Thanks for all the other comments, updated and committed with r12-1316.

Thank you!


Segher


Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-09 Thread Segher Boessenkool
On Wed, Jun 09, 2021 at 11:20:20AM +0800, Xionghu Luo wrote:
> On 2021/6/9 04:11, Segher Boessenkool wrote:
> > On Fri, Jun 04, 2021 at 09:40:58AM +0800, Xionghu Luo wrote:
>  rejecting combination of insns 6 and 7
>  original costs 4 + 4 = 8
>  replacement cost 12
> >>>
> >>> So what instructions were these?  Why did the store cost 4 but the new
> >>> one costs 12?
> > 
> > The *vsx_le_perm_store_<mode> instruction has the *preferred*
> > alternative with cost 12, while the other alternative has cost 8.  Why
> > is that?  That looks like a bug.
> > (set_attr "length" "12,8")
> 
> 12 was introduced by Mike's commit c477a6674364(r6-2577), and all the 5
> vsx_le_perm_store_<mode> are set to 12 for modes VSX_D/VSX_W/V8HI/V16QI
> /VSX_LE_128, I guess it is split to two rs6000_emit_le_vsx_permute before
> reload, but 3 rs6000_emit_le_vsx_permute after reload, so the length is
> 12, then it seems also not reasonable to change it from 12 to 8?  And I am
> not sure when the alternative 1 will be chosen?

This is the instruction *length*, not the cost directly.  The length
has to be correct (that is, not lower than it will turn out to be), or on
some big testcases you will get branches that cannot reach their target,
and the resulting ICEs.

Alternatives are chosen by register allocation.  Before register
allocation attributes are taken as if alternative 0 is selected (well,
the first enabled alternative is selected, same thing here).

Which alternative is the expected (or wanted) one?  Either put that one
first, or if it is the longer one, give it an explicit cost.

> ;; The post-reload split requires that we re-permute the source
> ;; register in case it is still live.
> (define_split
>   [(set (match_operand:VSX_LE_128 0 "memory_operand")
> (match_operand:VSX_LE_128 1 "vsx_register_operand"))]
>   "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR
>&& !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)"
>   [(const_int 0)]
> {
>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>   rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode);
>   rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode);
>   DONE;
> }) 

So it seems like it is only 3 insns in the very unlucky case?  Normally
it will end up as just one simple store?  So you want an explicit cost
here then:

  ;; What is the insn_cost for this insn?  The target hook can still override
  ;; this.  For optimizing for size the "length" attribute is used instead.
  (define_attr "cost" "" (const_int 0))

So you would use something like

 (set_attr "cost" "4,*")

here (if I got that right, please check :-) )

HtH,


Segher


Re: [PATCH] Document that -fno-trampolines is for Ada only [PR100735]

2021-06-09 Thread Jeff Law via Gcc-patches




On 5/25/2021 2:23 PM, Paul Eggert wrote:

The GCC manual's documentation of -fno-trampolines was apparently
written from an Ada point of view. However, when I read it I
understandably mistook it to say that -fno-trampolines also works for
C, C++, etc. It doesn't: it is silently ignored for these languages,
and I assume for any language other than Ada.

This confusion caused me to go in the wrong direction in a Gnulib
discussion, as I mistakenly thought that entire C apps with nested
functions could be compiled with -fno-trampolines and then use nested
C functions in stack overflow handlers where the alternate stack
is allocated via malloc. I was wrong, as this won't work on common
platforms like x86-64 where malloc yields non-executable storage.

gcc/
* doc/invoke.texi (Code Gen Options):
* doc/tm.texi.in (Trampolines):
Document that -fno-trampolines and -ftrampolines work
only with Ada.
So Martin Uecker probably has the most state on this.  IIRC when we last 
discussed -fno-trampolines the belief was that it could be easily made 
to work independent of the language, but that it was ultimately an ABI 
change.   That ultimately derailed plans to use -fno-trampolines for 
other languages in the immediate term.


The patch is fine, I just wanted to give you a bit of background on the 
state.   I'll go ahead and commit it for you.


Jeff



Re: [PATCH] Introduce -Wcoverage-invalid-line-number

2021-06-09 Thread Jeff Law via Gcc-patches




On 6/1/2021 8:09 AM, Martin Liška wrote:

Hello.

As seen in the PR, one can easily corrupt line number information and
we can end up with a function that ends before it starts ;)
I'm adding a new warning for that instead of the ICE.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

PR gcov-profile/100788

gcc/ChangeLog:

* common.opt: Add new option.
* coverage.c (coverage_begin_function): Emit warning instead of
the internal compiler error.
* doc/invoke.texi: Document the option.
* toplev.c (process_options): Enable it by default.

gcc/testsuite/ChangeLog:

* gcc.dg/pr100788.c: New test.

OK
jeff



Re: [PATCH] libgcc libiberty: optimize and modernize standard string and memory functions

2021-06-09 Thread Jeff Law via Gcc-patches




On 6/3/2021 12:51 PM, Seija K. via Gcc-patches wrote:

This patch optimizes and simplifies many of the standard string functions.

Since C99, some of the standard string functions have been changed to use
the restrict modifier.

diff --git a/libgcc/memcmp.c b/libgcc/memcmp.c
index 2348afe1d27f7..74195cf6baf13 100644
--- a/libgcc/memcmp.c
+++ b/libgcc/memcmp.c
@@ -7,10 +7,11 @@ memcmp (const void *str1, const void *str2, size_t count)
const unsigned char *s1 = str1;
const unsigned char *s2 = str2;

-  while (count-- > 0)
+  while (count--)
  {
-  if (*s1++ != *s2++)
- return s1[-1] < s2[-1] ? -1 : 1;
+  if (*s1 != *s2)
+ return *s1 < *s2 ? -1 : 1;
+  s1++, s2++;
  }
return 0;
  }
I don't see the point behind this change and the similar ones in other 
files.  I'd like to see some justification for the change beyond just 
"this looks cleaner to me".

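For reference, the two loop shapes from the memcmp hunk can be put side by side in a small standalone sketch. The function names memcmp_lookback and memcmp_inplace are hypothetical, and the casts are only there so the code compiles as C++. On the inputs tried here both forms give the same result, which is consistent with the request for a justification beyond style.

```cpp
#include <cassert>
#include <cstddef>

// Existing form: advance both pointers inside the comparison, then look
// back one element to decide the ordering.
int
memcmp_lookback (const void *str1, const void *str2, std::size_t count)
{
  const unsigned char *s1 = static_cast<const unsigned char *> (str1);
  const unsigned char *s2 = static_cast<const unsigned char *> (str2);

  while (count-- > 0)
    {
      if (*s1++ != *s2++)
        return s1[-1] < s2[-1] ? -1 : 1;
    }
  return 0;
}

// Proposed form: compare in place and advance only after a match.
int
memcmp_inplace (const void *str1, const void *str2, std::size_t count)
{
  const unsigned char *s1 = static_cast<const unsigned char *> (str1);
  const unsigned char *s2 = static_cast<const unsigned char *> (str2);

  while (count--)
    {
      if (*s1 != *s2)
        return *s1 < *s2 ? -1 : 1;
      s1++, s2++;
    }
  return 0;
}
```

Since count is unsigned, `count-- > 0` and `count--` test the same condition, so the difference is only in where the pointers are advanced.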






diff --git a/libgcc/memcpy.c b/libgcc/memcpy.c
index 58b1e405627aa..616df78fd2969 100644
--- a/libgcc/memcpy.c
+++ b/libgcc/memcpy.c
@@ -2,7 +2,7 @@
  #include <stddef.h>

  void *
-memcpy (void *dest, const void *src, size_t len)
+memcpy (void * restrict dest, const void * restrict src, size_t len)
I would expect prototype fixes like this within libgcc to be reasonably 
safe.



diff --git a/libiberty/memcpy.c b/libiberty/memcpy.c
index 7f67d0bd1f26c..d388ae7f3506b 100644
--- a/libiberty/memcpy.c
+++ b/libiberty/memcpy.c
@@ -19,7 +19,7 @@ Copies @var{length} bytes from memory region @var{in} to region
  void bcopy (const void*, void*, size_t);

  PTR
-memcpy (PTR out, const PTR in, size_t length)
+memcpy (PTR restrict out, const PTR restrict in, size_t length)
It's not entirely clear that using "restrict" is safe because libiberty 
is used by so many other projects which may be building with old 
compilers, non-gcc compilers, etc.


Generally the way to handle this is to use an autoconf check to confirm 
that the tools can handle the feature you want to use and ensure there's 
a fallback when they can't.


Jeff


Re: [PATCH] arm: Auto-vectorization for MVE and Neon: vhadd/vrhadd

2021-06-09 Thread Christophe Lyon via Gcc-patches
On Wed, 9 Jun 2021 at 17:39, Richard Sandiford
 wrote:
>
> Christophe Lyon  writes:
> > This patch adds support for auto-vectorization of average value
> > computation using vhadd or vrhadd, for both MVE and Neon.
> >
> > The patch adds the needed [u]avg<mode>3_[floor|ceil] patterns to
> > vec-common.md, I'm not sure how to factorize them without introducing
> > an unspec iterator?
> >
> > It also adds tests for 'floor' and for 'ceil', each for MVE and Neon.
> >
> > Vectorization works with 8-bit and 16-bit input/output vectors, but
> > not with 32-bit ones because the vectorizer expects wider types
> > availability for the intermediate values, but int32_t + int32_t does
> > not involve wider types in the IR.
> >
> > The testcases use a cast to int64_t to work around this and enable
> > vectorization with vectors of 32-bit elements.
>
> We're probably in violent agreement here and just phrasing it
> differently :-), but IMO this isn't a workaround.  It would be actively
> wrong to use V(R)HADD for the (u)int32_t version without the cast:
> V(R)HADD effectively computes a 33-bit addition result, instead of
> the 32-bit addition result in the original (cast-free) source code.
>
> I guess one could argue that overflow is undefined for int32_t + int32_t
> and so we could compute the full 33-bit result instead of using modulo
> arithmetic.  That might be too surprising though.  It also wouldn't
> help with uint32_t, since we do need to use modulo arithmetic for
> uint32_t + uint32_t.

Right. It would have been clearer if I had written different functions
for the different types, only needing the cast for the int32_t.
I used "workaround" because the int64_t cast looks strange with int8_t
and int16_t.

> So personally I think it would be better to drop the last two paragraphs.
>
> The patch LGTM though, thanks.
>
OK, thanks.

> Richard


Re: [PATCH 1/2] arm: Auto-vectorization for MVE: vclz

2021-06-09 Thread Richard Sandiford via Gcc-patches
Christophe Lyon  writes:
> On Tue, 8 Jun 2021 at 13:58, Richard Sandiford
>  wrote:
>>
>> Christophe Lyon  writes:
>> > This patch adds support for auto-vectorization of clz for MVE.
>> >
>> > It does so by removing the unspec from mve_vclzq_ and uses
>> > 'clz' instead. It moves the neon_vclz expander from neon.md to
>> > vec-common.md and renames it into the standard name clz2.
>> >
>> > 2021-06-03  Christophe Lyon  
>> >
>> >   gcc/
>> >   * config/arm/iterators.md (): Remove VCLZQ_U, VCLZQ_S.
>> >   (VCLZQ): Remove.
>> >   * config/arm/mve.md (mve_vclzq_): Add '@' prefix,
>> >   remove  iterator.
>> >   (mve_vclzq_u): New.
>> >   * config/arm/neon.md (clz2): Rename to neon_vclz.
>> >   (neon_vclz> >   * config/arm/unspecs.md (VCLZQ_U, VCLZQ_S): Remove.
>> >   * config/arm/vec-common.md ... here. Add support for MVE.
>> >
>> >   gcc/testsuite/
>> >   * gcc.target/arm/simd/mve-vclz.c: New test.
>> > ---
>> >  gcc/config/arm/iterators.md  |  3 +--
>> >  gcc/config/arm/mve.md| 12 ++---
>> >  gcc/config/arm/neon.md   | 11 +---
>> >  gcc/config/arm/unspecs.md|  2 --
>> >  gcc/config/arm/vec-common.md | 13 +
>> >  gcc/testsuite/gcc.target/arm/simd/mve-vclz.c | 28 
>> >  6 files changed, 52 insertions(+), 17 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vclz.c
>> >
>> > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
>> > index 3042bafc6c6..5c4fe895268 100644
>> > --- a/gcc/config/arm/iterators.md
>> > +++ b/gcc/config/arm/iterators.md
>> > @@ -1288,7 +1288,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") 
>> > (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
>> >  (VMOVLBQ_U "u") (VCVTQ_FROM_F_S "s") (VCVTQ_FROM_F_U 
>> > "u")
>> >  (VCVTPQ_S "s") (VCVTPQ_U "u") (VCVTNQ_S "s")
>> >  (VCVTNQ_U "u") (VCVTMQ_S "s") (VCVTMQ_U "u")
>> > -(VCLZQ_U "u") (VCLZQ_S "s") (VREV32Q_U "u")
>> > +(VREV32Q_U "u")
>> >  (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")
>> >  (VCVTQ_N_TO_F_S "s") (VCVTQ_N_TO_F_U "u")
>> >  (VCREATEQ_U "u") (VCREATEQ_S "s") (VSHRQ_N_S "s")
>> > @@ -1538,7 +1538,6 @@ (define_int_iterator VCVTQ_FROM_F [VCVTQ_FROM_F_S 
>> > VCVTQ_FROM_F_U])
>> >  (define_int_iterator VREV16Q [VREV16Q_U VREV16Q_S])
>> >  (define_int_iterator VCVTAQ [VCVTAQ_U VCVTAQ_S])
>> >  (define_int_iterator VDUPQ_N [VDUPQ_N_U VDUPQ_N_S])
>> > -(define_int_iterator VCLZQ [VCLZQ_U VCLZQ_S])
>> >  (define_int_iterator VADDVQ [VADDVQ_U VADDVQ_S])
>> >  (define_int_iterator VREV32Q [VREV32Q_U VREV32Q_S])
>> >  (define_int_iterator VMOVLBQ [VMOVLBQ_S VMOVLBQ_U])
>> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
>> > index 04aa612331a..99e46d0bc69 100644
>> > --- a/gcc/config/arm/mve.md
>> > +++ b/gcc/config/arm/mve.md
>> > @@ -435,16 +435,22 @@ (define_insn "mve_vdupq_n_"
>> >  ;;
>> >  ;; [vclzq_u, vclzq_s])
>> >  ;;
>> > -(define_insn "mve_vclzq_"
>> > +(define_insn "@mve_vclzq_s"
>> >[
>> > (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>> > - (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")]
>> > -  VCLZQ))
>> > + (clz:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")))
>> >]
>> >"TARGET_HAVE_MVE"
>> >"vclz.i%#  %q0, %q1"
>> >[(set_attr "type" "mve_move")
>> >  ])
>> > +(define_expand "mve_vclzq_u"
>> > +  [
>> > +   (set (match_operand:MVE_2 0 "s_register_operand")
>> > + (clz:MVE_2 (match_operand:MVE_2 1 "s_register_operand")))
>> > +  ]
>> > +  "TARGET_HAVE_MVE"
>> > +)
>> >
>> >  ;;
>> >  ;; [vclsq_s])
>> > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
>> > index 18571d819eb..0fdffaf4ec4 100644
>> > --- a/gcc/config/arm/neon.md
>> > +++ b/gcc/config/arm/neon.md
>> > @@ -3018,7 +3018,7 @@ (define_insn "neon_vcls"
>> >[(set_attr "type" "neon_cls")]
>> >  )
>> >
>> > -(define_insn "clz2"
>> > +(define_insn "neon_vclz"
>> >[(set (match_operand:VDQIW 0 "s_register_operand" "=w")
>> >  (clz:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")))]
>> >"TARGET_NEON"
>> > @@ -3026,15 +3026,6 @@ (define_insn "clz2"
>> >[(set_attr "type" "neon_cnt")]
>> >  )
>> >
>> > -(define_expand "neon_vclz"
>> > -  [(match_operand:VDQIW 0 "s_register_operand")
>> > -   (match_operand:VDQIW 1 "s_register_operand")]
>> > -  "TARGET_NEON"
>> > -{
>> > -  emit_insn (gen_clz2 (operands[0], operands[1]));
>> > -  DONE;
>> > -})
>> > -
>> >  (define_insn "popcount2"
>> >[(set (match_operand:VE 0 "s_register_operand" "=w")
>> >  (popcount:VE (match_operand:VE 1 "s_register_operand" "w")))]
>> > diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
>> > index ed1bc293b78..ad1c6edd005 100644
>> > --- a/gcc/config/arm/unspecs.md
>> > +++ 

PING [PATCH] teach compute_objsize about placement new (PR 100876)

2021-06-09 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571777.html

On 6/2/21 3:40 PM, Martin Sebor wrote:

The two forms of placement operator new defined in <new> return their
pointer argument and may not be displaced by user-defined functions.
But because they are ordinary (not built-in) functions this property
isn't reflected in their declarations alone, and there's no user-
level attribute to annotate them with.  When they are inlined
the property is transparent in the IL but when they are not (without
inlining such as -O0), calls to the operators appear in the IL and
cause -Wmismatched-new-delete to try to match them with the functions
called to deallocate memory.  When the pointer to the memory was
obtained from a function that matches the deallocator but not
the placement new, the warning falsely triggers.

The attached patch solves this by detecting calls to placement new
and treating them the same as those to other pass-through calls (such
as memset).  In addition, it also teaches -Wfree-nonheap-object about
placement delete, for a similar reason as above.  Finally, it also
adds a test for attribute fn spec indicating a function returns its
argument.  It's not necessary for the fix (I had initially thought
placement new might have the attribute) but it seems appropriate
to check.

Tested on x86_64-linux.

Martin
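
For illustration, a minimal sketch of the property described above: the
non-allocating form of operator new declared in <new> hands back the pointer
it was given, so the object lives in caller-provided storage and must never
be passed to operator delete.  The names Point, construct_in and
placement_new_returns_argument are hypothetical and are not taken from the
patch.

```cpp
#include <cassert>
#include <new>

// Hypothetical type used only for this illustration.
struct Point
{
  int x;
  int y;
};

// Construct a Point in caller-provided storage.  Placement new returns the
// address it was passed, which is the "pass-through" behaviour the message
// above describes.
Point *
construct_in (void *storage, int x, int y)
{
  assert (storage != nullptr);
  return new (storage) Point{x, y};
}

// Returns true when placement new returned exactly the buffer address.
bool
placement_new_returns_argument ()
{
  alignas (Point) unsigned char buf[sizeof (Point)];
  Point *p = construct_in (buf, 1, 2);
  bool same = static_cast<void *> (p) == static_cast<void *> (buf);
  p->~Point ();   // destroy explicitly; 'delete p' would be wrong here
  return same;
}
```

Because the storage here is a local array, the only valid cleanup is the
explicit destructor call; pairing such a pointer with a deallocation function
is the kind of mismatch the warnings are meant to reason about.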




Re: [PATCH 1/2] arm: Auto-vectorization for MVE: vclz

2021-06-09 Thread Christophe Lyon via Gcc-patches
On Tue, 8 Jun 2021 at 13:58, Richard Sandiford
 wrote:
>
> Christophe Lyon  writes:
> > This patch adds support for auto-vectorization of clz for MVE.
> >
> > It does so by removing the unspec from mve_vclzq_ and uses
> > 'clz' instead. It moves the neon_vclz expander from neon.md to
> > vec-common.md and renames it to the standard name clz2.
> >
> > 2021-06-03  Christophe Lyon  
> >
> >   gcc/
> >   * config/arm/iterators.md (): Remove VCLZQ_U, VCLZQ_S.
> >   (VCLZQ): Remove.
> >   * config/arm/mve.md (mve_vclzq_): Add '@' prefix,
> >   remove  iterator.
> >   (mve_vclzq_u): New.
> >   * config/arm/neon.md (clz2): Rename to neon_vclz.
> >   (neon_vclz): Move to ...
> >   * config/arm/unspecs.md (VCLZQ_U, VCLZQ_S): Remove.
> >   * config/arm/vec-common.md ... here. Add support for MVE.
> >
> >   gcc/testsuite/
> >   * gcc.target/arm/simd/mve-vclz.c: New test.
> > ---
> >  gcc/config/arm/iterators.md  |  3 +--
> >  gcc/config/arm/mve.md| 12 ++---
> >  gcc/config/arm/neon.md   | 11 +---
> >  gcc/config/arm/unspecs.md|  2 --
> >  gcc/config/arm/vec-common.md | 13 +
> >  gcc/testsuite/gcc.target/arm/simd/mve-vclz.c | 28 
> >  6 files changed, 52 insertions(+), 17 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vclz.c
> >
> > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > index 3042bafc6c6..5c4fe895268 100644
> > --- a/gcc/config/arm/iterators.md
> > +++ b/gcc/config/arm/iterators.md
> > @@ -1288,7 +1288,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") 
> > (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
> >  (VMOVLBQ_U "u") (VCVTQ_FROM_F_S "s") (VCVTQ_FROM_F_U 
> > "u")
> >  (VCVTPQ_S "s") (VCVTPQ_U "u") (VCVTNQ_S "s")
> >  (VCVTNQ_U "u") (VCVTMQ_S "s") (VCVTMQ_U "u")
> > -(VCLZQ_U "u") (VCLZQ_S "s") (VREV32Q_U "u")
> > +(VREV32Q_U "u")
> >  (VREV32Q_S "s") (VADDLVQ_U "u") (VADDLVQ_S "s")
> >  (VCVTQ_N_TO_F_S "s") (VCVTQ_N_TO_F_U "u")
> >  (VCREATEQ_U "u") (VCREATEQ_S "s") (VSHRQ_N_S "s")
> > @@ -1538,7 +1538,6 @@ (define_int_iterator VCVTQ_FROM_F [VCVTQ_FROM_F_S 
> > VCVTQ_FROM_F_U])
> >  (define_int_iterator VREV16Q [VREV16Q_U VREV16Q_S])
> >  (define_int_iterator VCVTAQ [VCVTAQ_U VCVTAQ_S])
> >  (define_int_iterator VDUPQ_N [VDUPQ_N_U VDUPQ_N_S])
> > -(define_int_iterator VCLZQ [VCLZQ_U VCLZQ_S])
> >  (define_int_iterator VADDVQ [VADDVQ_U VADDVQ_S])
> >  (define_int_iterator VREV32Q [VREV32Q_U VREV32Q_S])
> >  (define_int_iterator VMOVLBQ [VMOVLBQ_S VMOVLBQ_U])
> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> > index 04aa612331a..99e46d0bc69 100644
> > --- a/gcc/config/arm/mve.md
> > +++ b/gcc/config/arm/mve.md
> > @@ -435,16 +435,22 @@ (define_insn "mve_vdupq_n_"
> >  ;;
> >  ;; [vclzq_u, vclzq_s])
> >  ;;
> > -(define_insn "mve_vclzq_"
> > +(define_insn "@mve_vclzq_s"
> >[
> > (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> > - (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")]
> > -  VCLZQ))
> > + (clz:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")))
> >]
> >"TARGET_HAVE_MVE"
> >"vclz.i%#  %q0, %q1"
> >[(set_attr "type" "mve_move")
> >  ])
> > +(define_expand "mve_vclzq_u"
> > +  [
> > +   (set (match_operand:MVE_2 0 "s_register_operand")
> > + (clz:MVE_2 (match_operand:MVE_2 1 "s_register_operand")))
> > +  ]
> > +  "TARGET_HAVE_MVE"
> > +)
> >
> >  ;;
> >  ;; [vclsq_s])
> > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> > index 18571d819eb..0fdffaf4ec4 100644
> > --- a/gcc/config/arm/neon.md
> > +++ b/gcc/config/arm/neon.md
> > @@ -3018,7 +3018,7 @@ (define_insn "neon_vcls"
> >[(set_attr "type" "neon_cls")]
> >  )
> >
> > -(define_insn "clz2"
> > +(define_insn "neon_vclz"
> >[(set (match_operand:VDQIW 0 "s_register_operand" "=w")
> >  (clz:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")))]
> >"TARGET_NEON"
> > @@ -3026,15 +3026,6 @@ (define_insn "clz2"
> >[(set_attr "type" "neon_cnt")]
> >  )
> >
> > -(define_expand "neon_vclz"
> > -  [(match_operand:VDQIW 0 "s_register_operand")
> > -   (match_operand:VDQIW 1 "s_register_operand")]
> > -  "TARGET_NEON"
> > -{
> > -  emit_insn (gen_clz2 (operands[0], operands[1]));
> > -  DONE;
> > -})
> > -
> >  (define_insn "popcount2"
> >[(set (match_operand:VE 0 "s_register_operand" "=w")
> >  (popcount:VE (match_operand:VE 1 "s_register_operand" "w")))]
> > diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
> > index ed1bc293b78..ad1c6edd005 100644
> > --- a/gcc/config/arm/unspecs.md
> > +++ b/gcc/config/arm/unspecs.md
> > @@ -556,8 +556,6 @@ (define_c_enum "unspec" [
> >VQABSQ_S
> >VDUPQ_N_U
> >VDUPQ_N_S
> > -  VCLZQ_U
> > -  VCLZQ_S

Re: [PATCH] arm: Auto-vectorization for MVE and Neon: vhadd/vrhadd

2021-06-09 Thread Richard Sandiford via Gcc-patches
Christophe Lyon  writes:
> This patch adds support for auto-vectorization of average value
> computation using vhadd or vrhadd, for both MVE and Neon.
>
> The patch adds the needed [u]avg3_[floor|ceil] patterns to
> vec-common.md, I'm not sure how to factorize them without introducing
> an unspec iterator?
>
> It also adds tests for 'floor' and for 'ceil', each for MVE and Neon.
>
> Vectorization works with 8-bit and 16-bit input/output vectors, but
> not with 32-bit ones because the vectorizer expects wider types to
> be available for the intermediate values, whereas int32_t + int32_t
> does not involve wider types in the IR.
>
> The testcases use a cast to int64_t to work around this and enable
> vectorization with vectors of 32-bit elements.

We're probably in violent agreement here and just phrasing it
differently :-), but IMO this isn't a workaround.  It would be actively
wrong to use V(R)HADD for the (u)int32_t version without the cast:
V(R)HADD effectively computes a 33-bit addition result, instead of
the 32-bit addition result in the original (cast-free) source code.

I guess one could argue that overflow is undefined for int32_t + int32_t
and so we could compute the full 33-bit result instead of using modulo
arithmetic.  That might be too surprising though.  It also wouldn't
help with uint32_t, since we do need to use modulo arithmetic for
uint32_t + uint32_t.

So personally I think it would be better to drop the last two paragraphs.

The patch LGTM though, thanks.

Richard
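
A small sketch of the point made above, assuming a target where int is 32
bits wide: the sum of two uint32_t values wraps modulo 2^32 before the shift,
whereas a sum widened to 64 bits keeps the 33rd bit that a halving add would
also keep.  The function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Floor average computed entirely in 32 bits.  The sum wraps modulo 2^32,
// so the carry out of bit 31 is lost before the shift.
uint32_t
avg_floor_u32_wrapping (uint32_t a, uint32_t b)
{
  return (a + b) >> 1;
}

// Floor average with the sum widened to 64 bits, which preserves the
// 33-bit intermediate result.
uint32_t
avg_floor_u32_widened (uint32_t a, uint32_t b)
{
  return static_cast<uint32_t> ((static_cast<uint64_t> (a) + b) >> 1);
}

// 8-bit operands are promoted to int before the addition, so the wider
// intermediate type appears without any explicit cast.
uint8_t
avg_floor_u8 (uint8_t a, uint8_t b)
{
  return static_cast<uint8_t> ((a + b) >> 1);
}

// The two 32-bit versions disagree as soon as the sum overflows.
void
check_avg_examples ()
{
  assert (avg_floor_u32_wrapping (0xFFFFFFFFu, 1u) == 0u);
  assert (avg_floor_u32_widened (0xFFFFFFFFu, 1u) == 0x80000000u);
  assert (avg_floor_u8 (255, 1) == 128);
}
```

Only the widened form (and the promoted 8-bit form) has the semantics of a
halving add, which is why the cast is part of the meaning of the source
rather than a trick to coax the vectorizer.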


Re: [PATCH] libgomp: Compile tests with -march=i486 if needed

2021-06-09 Thread Jeff Law via Gcc-patches




On 6/6/2021 7:03 AM, H.J. Lu via Gcc-patches wrote:

Don't add -march=i486 if atomic compare-and-swap is supported on 'int'.
This fixes libgomp tests with "-march=x86-64 -m32 -fcf-protection".

* testsuite/lib/libgomp.exp (libgomp_init): Don't add -march=i486
if atomic compare-and-swap is supported on 'int'.

OK
jeff
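
For readers unfamiliar with the operation the check refers to, here is a
minimal sketch of an atomic compare-and-swap on 'int' using std::atomic.  It
only illustrates the operation itself; it is not the test that libgomp.exp
performs, and the helper names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Atomically replace the contents of VALUE with DESIRED, but only if it
// currently holds EXPECTED.  Returns true when the swap happened.
bool
cas_int (std::atomic<int> &value, int expected, int desired)
{
  return value.compare_exchange_strong (expected, desired);
}

void
check_cas_int ()
{
  std::atomic<int> v{1};
  assert (cas_int (v, 1, 2));    // succeeds: v held 1
  assert (v.load () == 2);
  assert (!cas_int (v, 1, 3));   // fails: v holds 2, not 1
  assert (v.load () == 2);
}
```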



Re: [PATCH] doc/typo: mthread -> mthreads

2021-06-09 Thread Jeff Law via Gcc-patches




On 6/7/2021 8:38 PM, imba-tjd via Gcc-patches wrote:

unrecognized command line option '-mthread'; did you mean '-mthreads'?

---
  gcc/doc/invoke.texi | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks.  Installed on the trunk.
jeff



Re: [PATCH] PR middle-end/53267: Constant fold BUILT_IN_FMOD.

2021-06-09 Thread Jeff Law via Gcc-patches




On 6/9/2021 4:51 AM, Richard Biener via Gcc-patches wrote:

On Tue, Jun 8, 2021 at 9:36 PM Roger Sayle  wrote:


Here's a three line patch to implement constant folding for fmod,
fmodf and fmodl, which resolves an enhancement request from 2012.

The following patch has been tested on x86_64-pc-linux-gnu with
a make bootstrap and make -k check with no new failures.

Ok for mainline?

OK.  I double-checked and mpfr_fmod appeared in mpfr 2.4.0 and
we require at least 3.1.0.
I'm not sure if Roger has write access these days, so I went ahead and 
committed the patch on his behalf.


jeff
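
As an illustration of the values such folding should yield (a sketch of the
library semantics, not of the patch itself): fmod returns the remainder of
x / y with the quotient truncated toward zero, so the result carries the sign
of x.  The wrapper name remainder_of is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Thin wrapper so the example has a named entry point; with constant
// arguments a compiler that folds fmod can evaluate this at compile time.
double
remainder_of (double x, double y)
{
  return std::fmod (x, y);
}

void
check_fmod_examples ()
{
  // 5.5 = 2 * 2.0 + 1.5
  assert (remainder_of (5.5, 2.0) == 1.5);
  // The result takes the sign of the first argument.
  assert (remainder_of (-5.5, 2.0) == -1.5);
}
```

The operands were chosen to be exactly representable in binary floating
point so that the comparisons are exact.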



RE: [PATCH] tree-optimization/100981 - fix SLP patterns involving reductions

2021-06-09 Thread Tamar Christina via Gcc-patches
Hi Richi,

> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Richard
> Biener
> Sent: Wednesday, June 9, 2021 1:53 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford 
> Subject: [PATCH] tree-optimization/100981 - fix SLP patterns involving
> reductions
> 
> The following fixes the SLP FMA patterns to preserve reduction info and the
> reduction vectorization to consider internal function call defs for the
> reduction stmt.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu, Andre verified
> we're not turning an ICE into a wrong-code bug (.COMPLEX_MUL now
> appears in the reduction chain).
> 
> Note there's a testcase for the ICE which adds -march=armv8.3-a and a
> testcase for correctness which doesn't since I didn't find any dg effective
> target verifying armv8.3-a code can run.

It's called arm_v8_3a_complex_neon_hw

Regards,
Tamar

> 
> 2021-06-09  Richard Biener  
> 
>   PR tree-optimization/100981
>   * tree-vect-loop.c (vect_create_epilog_for_reduction): Use
>   gimple_get_lhs to also handle calls.
>   * tree-vect-slp-patterns.c (complex_pattern::build): Transfer
>   reduction info.
> 
>   * gfortran.dg/vect/pr100981-1.f90: New testcase.
>   * gfortran.dg/vect/pr100981-2.f90: Likewise.
> ---
>  gcc/tree-vect-loop.c | 2 +-
>  gcc/tree-vect-slp-patterns.c | 5 -
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> ba36348b835..ee79808472c 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -5247,7 +5247,7 @@ vect_create_epilog_for_reduction (loop_vec_info
> loop_vinfo,
>gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info) == stmt_info);
>  }
> 
> -  scalar_dest = gimple_assign_lhs (orig_stmt_info->stmt);
> +  scalar_dest = gimple_get_lhs (orig_stmt_info->stmt);
>scalar_type = TREE_TYPE (scalar_dest);
>scalar_results.create (group_size);
>new_scalar_dest = vect_create_destination_var (scalar_dest, NULL); diff --
> git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c index
> b25655c9876..2ed49cd9edc 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -544,6 +544,8 @@ complex_pattern::build (vec_info *vinfo)
>  {
>/* Calculate the location of the statement in NODE to replace.  */
>stmt_info = SLP_TREE_REPRESENTATIVE (node);
> +  stmt_vec_info reduc_def
> + = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info));
>gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
>tree lhs_old_stmt = gimple_get_lhs (old_stmt);
>tree type = TREE_TYPE (lhs_old_stmt); @@ -568,9 +570,10 @@
> complex_pattern::build (vec_info *vinfo)
>   = vinfo->add_pattern_stmt (call_stmt, stmt_info);
> 
>/* Make sure to mark the representative statement pure_slp and
> -  relevant. */
> +  relevant and transfer reduction info. */
>STMT_VINFO_RELEVANT (call_stmt_info) = vect_used_in_scope;
>STMT_SLP_TYPE (call_stmt_info) = pure_slp;
> +  STMT_VINFO_REDUC_DEF (call_stmt_info) = reduc_def;
> 
>gimple_set_bb (call_stmt, gimple_bb (stmt_info->stmt));
>STMT_VINFO_VECTYPE (call_stmt_info) = SLP_TREE_VECTYPE (node);
> --
> 2.26.2


Re: [PATCH] PR tree-optimization/100781 - Do not calculate new values when evaluating a debug, statement.

2021-06-09 Thread Andrew MacLeod via Gcc-patches

On 6/9/21 7:48 AM, Richard Biener wrote:

On Tue, Jun 8, 2021 at 4:48 PM Andrew MacLeod  wrote:



Richard.


Andrew


OK, so this would be the simple way I'd tackle this in gcc11. This
should be quite safe.  Just treat debug_stmts as if they are not stmts..
and make a global query.   EVRP will still provide a contextual range as
good as it ever did, but it won't trigger ranger lookups on debug uses
any more.

It bootstraps on x86_64-pc-linux-gnu.  Is there a process other than
getting the OK to check this into the gcc 11 branch?  Does it go into
releases/gcc-11 ?

it would go into releases/gcc-11, yes.

Now,

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 6158a754dd6..fd7fa5e3dbb 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -945,7 +945,7 @@ gimple_ranger::range_of_expr (irange &r, tree
expr, gimple *stmt)
  return get_tree_range (r, expr);

// If there is no statement, just get the global value.
-  if (!stmt)
+  if (!stmt || is_gimple_debug (stmt))
  {

unfortunately the function is not documented so I'm just guessing here - why
do we end up passing in a debug stmt as 'stmt'?  (how should expr and stmt
relate?)  So isn't it better to do this check before

   if (!gimple_range_ssa_p (expr))
 return get_tree_range (r, expr);
This part just handles the non-ssa names, so constants, types, things 
for which there is no lookup involved.. At least in GCC 11.


or even more better, assert we don't get a debug stmt here and fixup whoever
calls range_of_expr to not do that for debug stmts?  When I add this
assertion not even libgcc can configure...  backtraces look like


range_of_expr is the basic API for asking for the range of EXPR as if it 
occurs as a use on STMT.  STMT provides the context for a location in 
the IL.  if STMT isn't provided, it picks up the global range. EXPR does 
not necessarily have to occur on stmt, it's just the context point for 
finding the range.   It should be documented in value-query.h where it 
is initially declared, but I see it is not.  Sorry about that.. It seems 
to have gotten lost in the myriad of moves that were made. We have a 
definite lack of documentation on everything... that is next in 
priority,  once I get the remaining relation code in.


I don't think it's wrong to supply a debug stmt.  stmt is simply the 
location in the IL for which we are querying the range of EXPR. So this 
is something like


# DEBUG d => d_10

and the query is asking for the range of d_10 at this point in the IL.. 
ie, what would it be on this stmt.   There isn't anything wrong with 
that..  and we certainly make no attempt to stop it for that reason..  
This change does prevent any analytics from happening (as does the one 
on trunk).




#0  fancy_abort (file=0x2a71420 "../../src/gcc-11-branch/gcc/gimple-range.cc",
 line=944,
 function=0x2a71638  "range_of_expr")
 at ../../src/gcc-11-branch/gcc/diagnostic.c:1884
#1  0x01f28275 in gimple_ranger::range_of_expr (this=0x3274eb0, r=...,
 expr=, stmt=)
 at ../../src/gcc-11-branch/gcc/gimple-range.cc:944
#2  0x0151ab7c in range_query::value_of_expr (this=0x3274eb0,
 name=, stmt=)
 at ../../src/gcc-11-branch/gcc/value-query.cc:86
#3  0x01f36ce3 in hybrid_folder::value_of_expr (this=0x7fffd990,
 op=, stmt=)
 at ../../src/gcc-11-branch/gcc/gimple-ssa-evrp.c:235
#4  0x01387804 in substitute_and_fold_engine::replace_uses_in (
 this=0x7fffd990, stmt=)
 at ../../src/gcc-11-branch/gcc/tree-ssa-propagate.c:871

so after EVRP we substitute and fold - but note we're not expecting to do
any more analysis in this phase but simply use the computed lattice,
since we don't substitute in unreachable code regions and thus SSA form
is temporarily broken that might otherwise cause issues.

But yes, substitute and fold does substitute into debug stmts (but we don't
analyze debug stmts).  So maybe somehow arrange for the substitute_and_fold
In which case this change is exactly what is needed. S&F will call 
range_of_expr asking for the range on the debug_stmt, and this change 
returns the global range instead of looking it up.

phase to always only use global ranges?  Maybe add the ability to
"lock" a ranger instance (disabling any further on-demand processing)?


The change on trunk is better as it effectively makes debug stmts always 
use whatever the best value we know is without doing anything new. It 
only reverts to the global range if there is nothing better.  It depends 
on a bunch of other structural changes I wouldn't want to try to port 
back to gcc 11, too much churn.


It might be possible to "lock" ranger, but the concept of a lattice 
doesn't really apply. There is no lattice..  there are only values as 
they appear at various points in the IL. We track that mostly by 
propagating ranges to basic blocks.  When a query is made, if there is a 
readily available value it will use it. Unless it has reason to believe 
it is 

Re: [PATCH] s390: Add more vcond_mask patterns.

2021-06-09 Thread Andreas Krebbel via Gcc-patches
On 6/9/21 2:47 PM, Robin Dapp wrote:
>> I think the real problem is the expander name. That's why it could not be 
>> found by optab. The second
>> mode needs to be the int vector mode of op3. With that change the testcases 
>> work as expected:
>>
>> diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
>> index c80d582a300d..ab605b3d2cf3 100644
>> --- a/gcc/config/s390/vector.md
>> +++ b/gcc/config/s390/vector.md
>> @@ -715,7 +715,7 @@
>> DONE;
>>   })
>>
>> -(define_expand "vcond_mask_"
>> +(define_expand "vcond_mask_"
>> [(set (match_operand:V 0 "register_operand" "")
>>  (if_then_else:V
>>   (eq (match_operand: 3 "register_operand" "")
> 
> Ah, yes, it's indeed much simpler that way.  Attached the revised 
> version with the small change and the new tests as a single patch now.
> 
> Regtest and bootstrap was successful.

Ok. Thanks!

Andreas


Re: [PATCH] libstdc++: Fix Wrong param type in :atomic_ref<_Tp*>::wait [PR100889]

2021-06-09 Thread Jonathan Wakely via Gcc-patches
For other tests that don't link to libatomic we use if-constexpr to
limit which types we test e.g.

--- a/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
@@ -33,14 +33,17 @@ template
S aa{ va };
S bb{ vb };
std::atomic_ref a{ aa };
-a.wait(bb);
-std::thread t([&]
+if constexpr (std::atomic_ref::is_always_lock_free)
  {
-   a.store(bb);
-   a.notify_one();
-  });
-a.wait(aa);
-t.join();
+   a.wait(bb);
+   std::thread t([&]
+ {
+   a.store(bb);
+   a.notify_one();
+ });
+   a.wait(aa);
+   t.join();
+  }
  }

int


Alternatively we could add arm*-*-* to the targets in
add_options_for_libatomic in testsuite/lib/dg-options.exp
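
The if-constexpr guard shown in the diff above can be sketched in isolation
as follows.  This is a simplified stand-in, not the libstdc++ test: it uses
std::atomic (C++17) rather than std::atomic_ref so that it builds without
C++20, it leaves out wait/notify and threads, and the types Small and Large
are invented for the example.  Which types are always lock-free depends on
the target.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical payload types: one small enough to be lock-free on common
// targets, one far too large.
struct Small { int v; };
struct Large { char bytes[64]; };

// Exercise the atomic only when T is always lock-free.  For other types the
// first branch is discarded and never instantiated, so it cannot introduce
// calls that would need libatomic.
template<typename T>
bool
exercise_if_lock_free (T initial, T updated)
{
  if constexpr (std::atomic<T>::is_always_lock_free)
    {
      std::atomic<T> a{initial};
      a.store (updated);
      (void) a.load ();
      return true;    // the atomic operations were compiled and run
    }
  else
    return false;     // skipped at compile time
}

// The expected values below assume a typical 64-bit target such as
// x86_64 or aarch64.
void
check_lock_free_guard ()
{
  assert (exercise_if_lock_free (Small{1}, Small{2}));
  assert (!exercise_if_lock_free (Large{}, Large{}));
}
```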



Re: [PATCH] arm: Auto-vectorization for MVE and Neon: vhadd/vrhadd

2021-06-09 Thread Christophe Lyon via Gcc-patches
On Tue, 8 Jun 2021 at 13:50, Richard Sandiford
 wrote:
>
> Christophe Lyon  writes:
> > On Wed, 2 Jun 2021 at 20:19, Richard Sandiford
> >  wrote:
> >>
> >> Christophe Lyon  writes:
> >> > This patch adds support for auto-vectorization of average value
> >> > computation using vhadd or vrhadd, for both MVE and Neon.
> >> >
> >> > The patch adds the needed [u]avg3_[floor|ceil] patterns to
> >> > vec-common.md, I'm not sure how to factorize them without introducing
> >> > an unspec iterator?
> >>
> >> Yeah, an int iterator would be one way, but I'm not sure it would
> >> make things better given the differences in how Neon and MVE handle
> >> their unspecs.
> >>
> >> > It also adds tests for 'floor' and for 'ceil', each for MVE and Neon.
> >> >
> >> > Vectorization works with 8-bit and 16-bit input/output vectors, but
> >> > not with 32-bit ones because the vectorizer expects wider types to
> >> > be available for the intermediate values, whereas int32_t + int32_t
> >> > does not involve wider types in the IR.
> >>
> >> Right.  Like you say, it's only valid to use V(R)HADD if, in the source
> >> code, the addition and shift have a wider precision than the operands.
> >> That happens naturally for 8-bit and 16-bit operands, since C arithmetic
> >> promotes them to "int" first.  But for 32-bit operands, the C code needs
> >> to do the addition and shift in 64 bits.  Doing them in 64 bits should
> >> be fine for narrower operands too.
> >>
> >> So:
> >>
> >> > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-1.c 
> >> > b/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-1.c
> >> > new file mode 100644
> >> > index 000..40489ecc67d
> >> > --- /dev/null
> >> > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vhadd-1.c
> >> > @@ -0,0 +1,31 @@
> >> > +/* { dg-do compile } */
> >> > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> >> > +/* { dg-add-options arm_v8_1m_mve } */
> >> > +/* { dg-additional-options "-O3" } */
> >> > +
> >> > +#include 
> >> > +
> >> > +#define FUNC(SIGN, TYPE, BITS, OP, NAME) \
> >> > +  void test_ ## NAME ##_ ## SIGN ## BITS (TYPE##BITS##_t * __restrict__ 
> >> > dest, \
> >> > +   TYPE##BITS##_t *a, 
> >> > TYPE##BITS##_t *b) { \
> >> > +int i;   \
> >> > +for (i=0; i < (128 / BITS); i++) {  
> >> >  \
> >> > +  dest[i] = (a[i] OP b[i]) >> 1; \
> >> > +}   
> >> >  \
> >> > +}
> >> > +
> >>
> >> …it should work if you make this "((int64_t) a[i] OP b[i]) >> 1".
> >
> > Indeed. However, this may not be obvious for end-users :-(
> >
> > I've updated my patch as attached: added the (int64_t) cast and
> > removed the xfail clauses.
> >
> > OK for trunk?
> >
> > Thanks,
> >
> > Christophe
> >
> >>
> >> > As noted in neon-vhadd-1.c, I couldn't write a test able to use Neon
> >> > vectorization with 64-bit vectors: we default to
> >> > -mvectorize-with-neon-quad, and attempts to use
> >> > -mvectorize-with-neon-double resulted in much worse code, which this
> >> > patch does not aim at improving.
> >>
> >> I guess this is because the MVE_2 mode iterators only include 128-bit 
> >> types.
> >> Leaving Neon double as future work sounds good though.
> > Note that I am focusing on MVE enablement at the moment.
>
> Right.  I meant “possible future work by someone somewhere”. :-)
>
> >> And yeah, the code for V(R)HADD-equivalent operations is much worse when
> >> V(R)HADD isn't available, since the compiler really does need to double
> >> the precision of the operands, do double-precision addition,
> >> do double-precision shifts, and then truncate back.  So this looks
> >> like the expected behaviour.
> >>
> >> Thanks,
> >> Richard
> >
> > From 493693b5c2f4e5fee7408062785930f723f2bd85 Mon Sep 17 00:00:00 2001
> > From: Christophe Lyon 
> > Date: Thu, 27 May 2021 20:11:28 +
> > Subject: [PATCH v2] arm: Auto-vectorization for MVE and Neon: vhadd/vrhadd
> >
> > This patch adds support for auto-vectorization of average value
> > computation using vhadd or vrhadd, for both MVE and Neon.
> >
> > The patch adds the needed [u]avg3_[floor|ceil] patterns to
> > vec-common.md, I'm not sure how to factorize them without introducing
> > an unspec iterator?
> >
> > It also adds tests for 'floor' and for 'ceil', each for MVE and Neon.
> >
> > Vectorization works with 8-bit and 16-bit input/output vectors, but
> > not with 32-bit ones because the vectorizer expects wider types to
> > be available for the intermediate values, whereas int32_t + int32_t
> > does not involve wider types in the IR.
> >
> > As noted in neon-vhadd-1.c, I couldn't write a test able to use Neon
> > vectorization with 64-bit vectors: we default to
> > -mvectorize-with-neon-quad, and attempts to use
> > -mvectorize-with-neon-double resulted in much worse code, which 

Re: [PATCH 1/2] arm: Fix vcond_mask expander for MVE (PR target/100757)

2021-06-09 Thread Richard Sandiford via Gcc-patches
Christophe Lyon  writes:
> The problem in this PR is that we call VPSEL with a mask of vector
> type instead of HImode. This happens because operand 3 in vcond_mask
> is the pre-computed vector comparison and has vector type. The fix is
> to transfer this value to VPR.P0 by comparing operand 3 with a vector
> of constant 1 of the same type as operand 3.

The alternative is to implement TARGET_VECTORIZE_GET_MASK_MODE
and return HImode for MVE.  This is how AVX512 handles masks.

It might be worth trying that to see how it works.  I'm not sure
off-hand whether it'll produce worse code or better code.  However,
using HImode as the mask mode would help when defining other
predicated optabs in future.

Thanks,
Richard
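
To make the two mask representations concrete, here is a toy scalar model,
written under the assumption that the MVE predicate holds one bit per byte of
a 128-bit vector (so four bits per 32-bit lane) and that a "true" lane of the
pre-computed vector comparison holds the value 1, mirroring the comparison
against a vector of ones described in the quoted text below.  All names are
hypothetical and nothing here is GCC or intrinsic code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using vec4 = std::array<uint32_t, 4>;

// Convert a vector-typed mask (1 in true lanes) into a 16-bit predicate
// with four bits set for each true 32-bit lane.
uint16_t
predicate_from_vector_mask (const vec4 &mask)
{
  uint16_t pred = 0;
  for (unsigned lane = 0; lane < 4; ++lane)
    if (mask[lane] == 1)
      pred = static_cast<uint16_t> (pred | (0xFu << (4 * lane)));
  return pred;
}

// Lane-wise select driven by the 16-bit predicate: take IF_TRUE where the
// lane's predicate bits are set, IF_FALSE otherwise.
vec4
select_by_predicate (uint16_t pred, const vec4 &if_true, const vec4 &if_false)
{
  vec4 result{};
  for (unsigned lane = 0; lane < 4; ++lane)
    {
      bool take_true = ((pred >> (4 * lane)) & 1u) != 0;
      result[lane] = take_true ? if_true[lane] : if_false[lane];
    }
  return result;
}

void
check_predicate_model ()
{
  const vec4 mask = {1, 0, 1, 0};
  const vec4 a = {10, 11, 12, 13};
  const vec4 b = {20, 21, 22, 23};
  const vec4 expected = {10, 21, 12, 23};
  uint16_t pred = predicate_from_vector_mask (mask);
  assert (pred == 0x0F0F);
  assert (select_by_predicate (pred, a, b) == expected);
}
```

In this model, a vectorizer that produced the 16-bit form directly would not
need the conversion step at all, which is the appeal of returning an integer
mask mode from the target hook.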

> The pr100757*.c testcases are derived from
> gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
> different types and return values different from 0 and 1 to avoid
> commonalization with boolean masks.
>
> Reducing the number of iterations in pr100757-3.c from 32 to 8, we
> generate the code below:
>
> float a[32];
> float fn1(int d) {
>   int c = 4;
>   for (int b = 0; b < 8; b++)
> if (a[b] != 2.0f)
>   c = 5;
>   return c;
> }
>
> fn1:
>   ldr         r3, .L4+80
>   vpush.64    {d8, d9}
>   vldrw.32    q3, [r3]        // q3=a[0..3]
>   vldr.64     d8, .L4         // q4=(2.0,2.0,2.0,2.0)
>   vldr.64     d9, .L4+8
>   adds        r3, r3, #16
>   vcmp.f32    eq, q3, q4      // cmp a[0..3] == (2.0,2.0,2.0,2.0)
>   vldr.64     d2, .L4+16      // q1=(1,1,1,1)
>   vldr.64     d3, .L4+24
>   vldrw.32    q3, [r3]        // q3=a[4..7]
>   vldr.64     d4, .L4+32      // q2=(0,0,0,0)
>   vldr.64     d5, .L4+40
>   vpsel       q0, q1, q2      // q0=select (a[0..3])
>   vcmp.f32    eq, q3, q4      // cmp a[4..7] == (2.0,2.0,2.0,2.0)
>   vldm        sp!, {d8-d9}
>   vpsel       q2, q1, q2      // q2=select (a[4..7])
>   vand        q2, q0, q2      // q2=select (a[0..3]) && select (a[4..7])
>   vldr.64     d6, .L4+48      // q3=(4.0,4.0,4.0,4.0)
>   vldr.64     d7, .L4+56
>   vldr.64     d0, .L4+64      // q0=(5.0,5.0,5.0,5.0)
>   vldr.64     d1, .L4+72
>   vcmp.i32    eq, q2, q1      // cmp mask(a[0..7]) == (1,1,1,1)
>   vpsel       q3, q3, q0      // q3= vcond_mask(4.0,5.0)
>   vmov.32     r3, q3[0]       // keep the scalar max
>   vmov.32     r1, q3[1]
>   vmov.32     r0, q3[3]
>   vmov.32     r2, q3[2]
>   vmov        s14, r1
>   vmov        s15, r3
>   vmaxnm.f32  s15, s15, s14
>   vmov        s14, r2
>   vmaxnm.f32  s15, s15, s14
>   vmov        s14, r0
>   vmaxnm.f32  s15, s15, s14
>   vmov        r0, s15
>   bx          lr
>   .L5:
>   .align  3
>   .L4:
>   .word   1073741824
>   .word   1073741824
>   .word   1073741824
>   .word   1073741824
>   .word   1
>   .word   1
>   .word   1
>   .word   1
>   .word   0
>   .word   0
>   .word   0
>   .word   0
>   .word   1082130432
>   .word   1082130432
>   .word   1082130432
>   .word   1082130432
>   .word   1084227584
>   .word   1084227584
>   .word   1084227584
>   .word   1084227584
>
> 2021-06-09  Christophe Lyon  
>
>   PR target/100757
>   gcc/
>   * config/arm/vec-common.md (vcond_mask_): Fix
>   expansion for MVE.
>
>   gcc/testsuite/
>   * gcc.target/arm/simd/pr100757.c: New test.
>   * gcc.target/arm/simd/pr100757-2.c: New test.
>   * gcc.target/arm/simd/pr100757-3.c: New test.
> ---
>  gcc/config/arm/vec-common.md  | 24 +--
>  .../gcc.target/arm/simd/pr100757-2.c  | 20 
>  .../gcc.target/arm/simd/pr100757-3.c  | 20 
>  gcc/testsuite/gcc.target/arm/simd/pr100757.c  | 19 +++
>  4 files changed, 81 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757.c
>
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index 0ffc7a9322c..ccdfaa8321f 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -478,8 +478,28 @@ (define_expand "vcond_mask_"
>  }
>else if (TARGET_HAVE_MVE)
>  {
> -  emit_insn (gen_mve_vpselq (VPSELQ_S, mode, operands[0],
> - operands[1], operands[2], operands[3]));
> +  /* Convert pre-computed vector comparison into VPR.P0 by comparing
> + operand 3 with a vector of '1', then use VPSEL.  */
> +  machine_mode cmp_mode = GET_MODE (operands[3]);
> +  rtx vpr_p0 = gen_reg_rtx (HImode);
> +  rtx one = gen_reg_rtx (cmp_mode);
> +  emit_move_insn (one, CONST1_RTX (cmp_mode));
> +  emit_insn (gen_mve_vcmpq (EQ, cmp_mode, vpr_p0, 

Re: [committed] analyzer: bitfield fixes [PR99212]

2021-06-09 Thread David Malcolm via Gcc-patches
On Wed, 2021-06-09 at 16:17 +0200, Christophe Lyon wrote:
> On Tue, 8 Jun 2021 at 21:34, David Malcolm via Gcc-patches
>  wrote:
> > 
> > This patch verifies the previous fix for bitfield sizes by
> > implementing
> > enough support for bitfields in the analyzer to get the test cases
> > to pass.
> > 
> > The patch implements support in the analyzer for reading from a
> > BIT_FIELD_REF, and support for folding BIT_AND_EXPR of a mask, to
> > handle
> > the cases generated in tests.
> > 
> > The existing bitfields tests in data-model-1.c turned out to rely
> > on
> > undefined behavior, in that they were assigning values to a signed
> > bitfield that were outside of the valid range of values.  I believe
> > that
> > that's why we were seeing target-specific differences in the test
> > results (PR analyzer/99212).  The patch updates the test to remove
> > the
> > undefined behaviors.
> > 
> > Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
> > Lightly tested with cris-elf.
> > 
> > Pushed to trunk as r12-1303-
> > gd3b1ef7a83c0c0cd5b20a1dd1714b868f3d2b442.
> > 
> > gcc/analyzer/ChangeLog:
> >     PR analyzer/99212
> >     * region-model-manager.cc
> >     (region_model_manager::maybe_fold_binop): Add support for
> > folding
> >     BIT_AND_EXPR of compound_svalue and a mask constant.
> >     * region-model.cc (region_model::get_rvalue_1): Implement
> >     BIT_FIELD_REF in terms of...
> >     (region_model::get_rvalue_for_bits): New function.
> >     * region-model.h (region_model::get_rvalue_for_bits): New
> > decl.
> >     * store.cc (bit_range::from_mask): New function.
> >     (selftest::test_bit_range_intersects_p): New selftest.
> >     (selftest::assert_bit_range_from_mask_eq): New.
> >     (ASSERT_BIT_RANGE_FROM_MASK_EQ): New macro.
> >     (selftest::assert_no_bit_range_from_mask_eq): New.
> >     (ASSERT_NO_BIT_RANGE_FROM_MASK): New macro.
> >     (selftest::test_bit_range_from_mask): New selftest.
> >     (selftest::analyzer_store_cc_tests): Call the new
> > selftests.
> >     * store.h (bit_range::intersects_p): New.
> >     (bit_range::from_mask): New decl.
> >     (concrete_binding::get_bit_range): New accessor.
> >     (store_manager::get_concrete_binding): New overload taking
> >     const bit_range &.
> > 
> > gcc/testsuite/ChangeLog:
> >     PR analyzer/99212
> >     * gcc.dg/analyzer/bitfields-1.c: New test.
> >     * gcc.dg/analyzer/data-model-1.c (struct sbits): Make
> > bitfields
> >     explicitly signed.
> >     (test_44): Update test values assigned to the bits to ones
> > that
> >     fit in the range of the bitfield type.  Remove xfails.
> >     (test_45): Remove xfails.
> > 
> 
> Hi,
> 
> This patch is causing regressions / new failures on armeb (and other
> targets according to gcc-testresults):
> 
> FAIL: gcc.dg/analyzer/bitfields-1.c (test for excess errors)
> Excess errors:
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:24:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:26:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:29:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:31:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:36:3: warning: FALSE
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:41:3: warning: FALSE
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:81:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:83:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:85:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:87:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:92:3: warning: FALSE
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:94:3: warning: FALSE
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:96:3: warning: FALSE
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:113:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:115:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:117:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:119:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:121:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:123:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:125:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:127:3: warning: UNKNOWN
> 
> FAIL: gcc.dg/analyzer/data-model-1.c (test for excess errors)
> Excess errors:
> /gcc/testsuite/gcc.dg/analyzer/data-model-1.c:947:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/data-model-1.c:950:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/data-model-1.c:965:3: warning: UNKNOWN
> /gcc/testsuite/gcc.dg/analyzer/data-model-1.c:968:3: warning: UNKNOWN
> 
> For instance with target armeb-none-linux-gnueabihf
> 
> Can you check?

Sorry about this; I can reproduce the behavior and am investigating.

Dave



Re: [PATCH] Use range based loops to iterate over vec<> in various places

2021-06-09 Thread Martin Sebor via Gcc-patches

On 6/8/21 6:48 PM, Trevor Saunders wrote:

Hello,

This makes things a good bit shorter, and reduces complexity by removing
a bunch of index variables.
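
To illustrate the kind of rewrite described here, a minimal sketch follows. It uses std::vector as a stand-in for GCC's vec<> (an assumption made so the example stands alone), and the function names are hypothetical, not taken from the patch.

```cpp
#include <cassert>
#include <vector>

// Index-based form: needs an explicit counter and a bounds comparison.
int sum_indexed(const std::vector<int> &v) {
  int total = 0;
  for (unsigned i = 0; i < v.size(); ++i)
    total += v[i];
  return total;
}

// Range-based form: the index variable disappears entirely.
int sum_ranged(const std::vector<int> &v) {
  int total = 0;
  for (int x : v)
    total += x;
  return total;
}
```

Both functions visit the elements in the same order, so the change is purely a reduction in bookkeeping.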


Very nice cleanup!  Thank you!

Martin



bootstrapped and regtested on x86_64-linux-gnu, ok?

Trev

gcc/analyzer/ChangeLog:

* call-string.cc (call_string::call_string): Iterate over vec<>
with range based for.
(call_string::operator=): Likewise.
(call_string::to_json): Likewise.
(call_string::hash): Likewise.
(call_string::calc_recursion_depth): Likewise.
* checker-path.cc (checker_path::fixup_locations): Likewise.
* constraint-manager.cc (equiv_class::equiv_class): Likewise.
(equiv_class::to_json): Likewise.
(equiv_class::hash): Likewise.
(constraint_manager::constraint_manager): Likewise.
(constraint_manager::operator=): Likewise.
(constraint_manager::hash): Likewise.
(constraint_manager::to_json): Likewise.
(constraint_manager::add_unknown_constraint): Likewise.
* engine.cc (impl_region_model_context::on_svalue_leak):
Likewise.
(on_liveness_change): Likewise.
(impl_region_model_context::on_unknown_change): Likewise.
* program-state.cc (extrinsic_state::to_json): Likewise.
(sm_state_map::set_state): Likewise.
* region-model.cc (make_test_compound_type): Likewise.
(test_canonicalization_4): Likewise.

gcc/ChangeLog:

* auto-profile.c (afdo_find_equiv_class): Iterate over vec<>
with range based for.
* cgraphclones.c (cgraph_node::create_clone): Likewise.
(cgraph_node::create_version_clone): Likewise.
* dwarf2out.c (output_call_frame_info): Likewise.
* gcc.c (do_specs_vec): Likewise.
(do_spec_1): Likewise.
(driver::set_up_specs): Likewise.
* gimple-loop-jam.c (any_access_function_variant_p): Likewise.
* ifcvt.c (cond_move_process_if_block): Likewise.
* ipa-modref.c (modref_lattice::add_escape_point): Likewise.
(analyze_parms): Likewise.
(modref_write_escape_summary): Likewise.
(update_escape_summary_1): Likewise.
* ipa-prop.h (ipa_copy_agg_values): Likewise.
(ipa_release_agg_values): Likewise.
* lower-subreg.c (decompose_multiword_subregs): Likewise.
* lto-streamer-out.c (DFS::DFS_write_tree_body): Likewise.
(hash_tree): Likewise.
(prune_offload_funcs): Likewise.
* sel-sched-dump.c (dump_insn_vector): Likewise.
* timevar.c (timer::named_items::print): Likewise.
* tree-cfgcleanup.c (cleanup_control_flow_pre): Likewise.
(cleanup_tree_cfg_noloop): Likewise.
* tree-data-ref.c (dump_data_references): Likewise.
(print_dir_vectors): Likewise.
(print_dist_vectors): Likewise.
(dump_data_dependence_relation): Likewise.
(dump_data_dependence_relations): Likewise.
(dump_dist_dir_vectors): Likewise.
(dump_ddrs): Likewise.
(prune_runtime_alias_test_list): Likewise.
(create_runtime_alias_checks): Likewise.
(free_subscripts): Likewise.
(save_dist_v): Likewise.
(save_dir_v): Likewise.
(invariant_access_functions): Likewise.
(same_access_functions): Likewise.
(access_functions_are_affine_or_constant_p): Likewise.
(compute_all_dependences): Likewise.
(find_data_references_in_stmt): Likewise.
(graphite_find_data_references_in_stmt): Likewise.
(free_dependence_relations): Likewise.
(free_data_refs): Likewise.
* tree-into-ssa.c (dump_currdefs): Likewise.
(rewrite_update_phi_arguments): Likewise.
* tree-ssa-phiopt.c (cond_if_else_store_replacement): Likewise.
* tree-ssa-propagate.c (clean_up_loop_closed_phi): Likewise.
* tree-ssa-structalias.c (constraint_set_union): Likewise.
(merge_node_constraints): Likewise.
(move_complex_constraints): Likewise.
(do_deref): Likewise.
(get_constraint_for_address_of): Likewise.
(get_constraint_for_1): Likewise.
(process_all_all_constraints): Likewise.
(make_constraints_to): Likewise.
(handle_rhs_call): Likewise.
* tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr):
Likewise.
(vect_slp_analyze_node_dependences): Likewise.
(vect_slp_analyze_instance_dependence): Likewise.
(vect_record_base_alignments): Likewise.
(vect_get_peeling_costs_all_drs): Likewise.
(vect_peeling_supportable): Likewise.
* tree-vectorizer.c (vec_info::~vec_info): Likewise.
(vec_info::free_stmt_vec_infos): Likewise.

gcc/c/ChangeLog:

* c-parser.c (c_parser_translation_unit): Iterate over vec<>
with range based for.
(c_parser_postfix_expression): Likewise.

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_call_expression): Iterate over 

Re: [PATCH] libstdc++: Fix Wrong param type in :atomic_ref<_Tp*>::wait [PR100889]

2021-06-09 Thread Thomas Rodgers via Gcc-patches
Pretty sure I know what this is; I'll work on a fix today.

On Wed, Jun 9, 2021 at 7:30 AM Christophe Lyon 
wrote:

> Hi,
>
>
> On Wed, 9 Jun 2021 at 01:05, Thomas Rodgers via Gcc-patches
>  wrote:
> >
> > Tested x86_64-pc-linux-gnu, committed to master, backported to
> > releases/gcc-11.
> >
> > On Tue, Jun 8, 2021 at 8:44 AM Jonathan Wakely 
> wrote:
> >
> > > On Tue, 8 Jun 2021 at 01:29, Thomas Rodgers wrote:
> > >
> > >> This time without the repeated [PR] in the subject line.
> > >>
> > >> Fixes libstdc++/100889
> > >>
> > >
> > > This should be part of the ChangeLog entry instead, preceded by PR so
> it
> > > updates bugzilla, i.e.
> > >
> > >
> > >
> > >> libstdc++-v3/ChangeLog:
> > >>
> > >
> > > PR libstdc++/100889
> > >
> > >
> > >> * include/bits/atomic_base.h (atomic_ref<_Tp*>::wait):
> > >> Change parameter type from _Tp to _Tp*.
> > >> * testsuite/29_atomics/atomic_ref/wait_notify.cc: Extend
> > >> coverage of types tested.
> > >>
> > >
> > >
> > > OK for trunk and gcc-11 with that change, thanks.
> > >
> > >
>
> This is causing a regression on old arm targets:
> --target arm-none-linux-gnueabi
> RUNTESTFLAGS: -march=armv5t
>
> FAIL: 29_atomics/atomic_ref/wait_notify.cc (test for excess errors)
> Excess errors:
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-linux-gnueabi/bin/ld:
> /ccaaHfBz.o: in function `void
> std::__atomic_impl::store(double*,
> std::remove_volatile::type, std::memory_order)':
>
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/arm-none-linux-gnueabi/libstdc++-v3/include/bits/atomic_base.h:971:
> undefined reference to `__atomic_store_8'
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-linux-gnueabi/bin/ld:
> /ccaaHfBz.o: in function `std::remove_volatile::type
> std::__atomic_impl::load(double const*, std::memory_order)':
>
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/arm-none-linux-gnueabi/libstdc++-v3/include/bits/atomic_base.h:979:
> undefined reference to `__atomic_load_8'
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-linux-gnueabi/bin/ld:
>
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/arm-none-linux-gnueabi/libstdc++-v3/include/bits/atomic_base.h:979:
> undefined reference to `__atomic_load_8'
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-linux-gnueabi/bin/ld:
>
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/arm-none-linux-gnueabi/libstdc++-v3/include/bits/atomic_base.h:979:
> undefined reference to `__atomic_load_8'
> collect2: error: ld returned 1 exit status
>
> Can you check?
>
> Thanks
>
>


Re: [PATCH] arc: Add --with-fpu support for ARCv2 cpus

2021-06-09 Thread Jeff Law via Gcc-patches




On 6/9/2021 6:26 AM, Bernhard Reutner-Fischer wrote:

On Wed, 9 Jun 2021 14:35:01 +0300
Claudiu Zissulescu  wrote:


ISTM you only set the expected flags in the switch so i would have
set only that variable and have grepped only once after the switch for
brevity.

ARC has various FPU extensions, some of them are common to EM and HS
architectures, others are specific for only one of them. Hence, the grep
commands are ensuring that we accept the right fpu extension for the
right ARC architecture.

Right. You could accomplish that more tersely by setting flags_ok in
the switch cited below and running one single grep after the switch, à la

if [ -n "$flags_ok" ] \
   && ! grep -q -E "^ARC_CPU[[:blank:]]*\($new_cpu,[[:blank:]]*$flags_ok," \
  ${srcdir}/config/arc/arc-cpus.def
then
   echo "Unknown floating point type used in "\
"--with-fpu=$with_fpu for cpu $new_cpu" 1>&2
   exit 1
fi

The reason is that all case statements in the $with_fpu switch just
differ in the allowed flags AFAICS. But as you prefer.

last comments below.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 13c2004e3c52..09886c8635e0 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4258,18 +4258,61 @@ case "${target}" in
;;
  
  	arc*-*-*)

-   supported_defaults="cpu"
+   supported_defaults="cpu fpu"
  
+		new_cpu=hs38_linux

if [ x"$with_cpu" = x ] \
-   || grep "^ARC_CPU ($with_cpu," \
+   || grep -E "^ARC_CPU \($with_cpu," \
   ${srcdir}/config/arc/arc-cpus.def \
   > /dev/null; then

Cosmetics: You may want to keep the non "-E" version but use -q
and drop the redirect to /dev/null.


 # Ok
-true
+new_cpu=$with_cpu
else
 echo "Unknown cpu used in --with-cpu=$with_cpu" 1>&2
 exit 1
fi
+
+   # see if --with-fpu matches any of the supported FPUs
+   case "$with_fpu" in
+   "")
+   # OK
+   ;;
+   fpus | fpus_div | fpus_fma | fpus_all)
+   # OK if em or hs
+   if ! grep -q -E 
"^ARC_CPU[[:blank:]]*\($new_cpu,[[:space:]]*[emhs]+," \

you changed only the first :space: to :blank: (everywhere)


+  ${srcdir}/config/arc/arc-cpus.def
+   then
+echo "Unknown floating point type used in "\
+"--with-fpu=$with_fpu for cpu $new_cpu" 1>&2
+exit 1
+   fi
+   ;;
+   fpuda | fpuda_div | fpuda_fma | fpuda_all)
+   # OK only em
+   if ! grep -q -E 
"^ARC_CPU[[:blank:]]*\($new_cpu,[[:space:]]*em," \
+  ${srcdir}/config/arc/arc-cpus.def
+   then
+echo "Unknown floating point type used in "\
+ "--with-fpu=$with_fpu for cpu $new_cpu" 1>&2
+exit 1
+   fi
+   ;;
+   fpud | fpud_div | fpud_fma | fpud_all)
+   # OK only hs
+   if ! grep -q -E 
"^ARC_CPU[[:blank:]]*\($new_cpu,[[:space:]]*hs," \
+  ${srcdir}/config/arc/arc-cpus.def
+   then
+echo "Unknown floating point type used in"\
+ "--with-fpu=$with_fpu for cpu $new_cpu" 1>&2

missing trailing space after 'in' in "used in"\

LGTM with that fixed with or without cutting down on grep lines,
but of course i cannot approve it.

But your comments are greatly appreciated ;-)

ACK'd for the trunk once the issues Bernhard mentioned are addressed.

jeff



[PATCH] gcc/configure.ac: fix register issue for global_load assembler functions

2021-06-09 Thread Marcel Vollweiler

This patch fixes an issue with global_load assembler functions that leads
to an "invalid operand for instruction" error, because different LLVM
versions expect either one or two registers for those functions.

In this patch a compatibility check is added to the configure.ac.
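
As a rough sketch of the two operand spellings involved, the helper below formats a VGPR offset either as a single register or as a register pair. The function name and the boolean parameter are hypothetical; in the patch itself the choice is made at build time by the HAVE_GCN_ASM_GLOBAL_LOAD_FIXED macro rather than at run time.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: spell the VGPR offset operand for a global_load.
// single_register_syntax == true  -> "vN"       (one register)
// single_register_syntax == false -> "v[N:N+1]" (register pair)
std::string vgpr_offset_operand(int regno, bool single_register_syntax) {
  if (single_register_syntax)
    return "v" + std::to_string(regno);
  return "v[" + std::to_string(regno) + ":" + std::to_string(regno + 1) + "]";
}
```

For register 1 this yields "v1" in the single-register form and "v[1:2]" in the pair form, matching the two strings printed in the gcn.c hunk.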

Marcel
gcc/ChangeLog: adapt configuration according to assembler fix of global_load 
functions.

* config.in: Regenerate.
* config/gcn/gcn.c (print_operand_address): Fix for global_load 
assembler
functions.
* configure: Regenerate.
* configure.ac: Fix for global_load assembler functions. 

diff --git a/gcc/config.in b/gcc/config.in
index e54f59c..18e6271 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1431,6 +1431,12 @@
 #endif
 
 
+/* Define if your assembler has fixed global_load functions. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GCN_ASM_GLOBAL_LOAD_FIXED
+#endif
+
+
 /* Define to 1 if you have the `getchar_unlocked' function. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_GETCHAR_UNLOCKED
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 283a91f..2d27296 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -5481,13 +5481,24 @@ print_operand_address (FILE *file, rtx mem)
  if (vgpr_offset == NULL_RTX)
/* In this case, the vector offset is zero, so we use the first
   lane of v1, which is initialized to zero.  */
-   fprintf (file, "v[1:2]");
+   {
+#if HAVE_GCN_ASM_GLOBAL_LOAD_FIXED == 1
+   fprintf (file, "v1"); 
+#else
+   fprintf (file, "v[1:2]");
+#endif
+   }
  else if (REG_P (vgpr_offset)
   && VGPR_REGNO_P (REGNO (vgpr_offset)))
{
- fprintf (file, "v[%d:%d]",
-  REGNO (vgpr_offset) - FIRST_VGPR_REG,
-  REGNO (vgpr_offset) - FIRST_VGPR_REG + 1);
+#if HAVE_GCN_ASM_GLOBAL_LOAD_FIXED == 1
+   fprintf (file, "v%d",
+REGNO (vgpr_offset) - FIRST_VGPR_REG);
+#else
+   fprintf (file, "v[%d:%d]",
+REGNO (vgpr_offset) - FIRST_VGPR_REG,
+REGNO (vgpr_offset) - FIRST_VGPR_REG + 1);
+#endif
}
  else
output_operand_lossage ("bad ADDR_SPACE_GLOBAL address");
diff --git a/gcc/configure b/gcc/configure
index 4a9e4fa..8e044c3 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -28909,6 +28909,36 @@ case "$target" in
 ;;
 esac
 
+# This tests if the assembler supports two registers for global_load functions
+# (like in LLVM versions <12) or one register (like in LLVM 12).
+case "$target" in
+  amdgcn-* | gcn-*)
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler fix for 
global_load functions" >&5
+$as_echo_n "checking assembler fix for global_load functions... " >&6; }
+gcc_cv_as_global_load_fixed=yes
+if test x$gcc_cv_as != x; then
+  cat > conftest.s < /dev/null 2>&1; then
+gcc_cv_as_global_load_fixed=no
+  fi
+  rm -f conftest.s conftest.o conftest
+fi
+if test x$gcc_cv_as_global_load_fixed = xyes; then
+
+$as_echo "#define HAVE_GCN_ASM_GLOBAL_LOAD_FIXED 1" >>confdefs.h
+
+else
+
+$as_echo "#define HAVE_GCN_ASM_GLOBAL_LOAD_FIXED 0" >>confdefs.h
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$gcc_cv_as_global_load_fixed" >&5
+$as_echo "$gcc_cv_as_global_load_fixed" >&6; }
+;;
+esac
+
 # ??? Not all targets support dwarf2 debug_line, even within a version
 # of gas.  Moreover, we need to emit a valid instruction to trigger any
 # info to the output file.  So, as supported targets are added to gas 2.11,
diff --git a/gcc/configure.ac b/gcc/configure.ac
index d9fc3c2..d7ea224 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -5357,6 +5357,30 @@ case "$target" in
 ;;
 esac
 
+# This tests if the assembler supports two registers for global_load functions
+# (like in LLVM versions <12) or one register (like in LLVM 12).
+case "$target" in
+  amdgcn-* | gcn-*)
+AC_MSG_CHECKING(assembler fix for global_load functions)
+gcc_cv_as_global_load_fixed=yes
+if test x$gcc_cv_as != x; then
+  cat > conftest.s < /dev/null 2>&1; then
+gcc_cv_as_global_load_fixed=no
+  fi
+  rm -f conftest.s conftest.o conftest
+fi
+if test x$gcc_cv_as_global_load_fixed = xyes; then
+  AC_DEFINE(HAVE_GCN_ASM_GLOBAL_LOAD_FIXED, 1, [Define if your assembler 
has fixed global_load functions.])
+else
+  AC_DEFINE(HAVE_GCN_ASM_GLOBAL_LOAD_FIXED, 0, [Define if your assembler 
has fixed global_load functions.])
+fi
+AC_MSG_RESULT($gcc_cv_as_global_load_fixed)
+;;
+esac
+
 # ??? Not all targets support dwarf2 debug_line, even within a 

[PATCH][RFC] Vectorize BB reductions

2021-06-09 Thread Richard Biener
This adds a simple reduction vectorization capability to the
non-loop vectorizer.  Simple meaning it lacks any of the fancy
ways to generate the reduction epilogue but only supports
those we can handle via a direct internal function reducing
a vector to a scalar.  One of the main reasons is to avoid
massive refactoring at this point but also that more complex
epilogue operations are hardly profitable.

Mixed sign reductions are fended off for now, and I have not yet
settled on whether we want an explicit SLP node for the
reduction epilogue operation.  Handling mixed signs could be
done by multiplying with a { 1, -1, .. } vector.
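
The sign-vector idea can be sketched in scalar code as follows. This is only an illustration of the arithmetic identity, assuming a four-lane float vector modeled as std::array; the function names are hypothetical and nothing here reflects how the vectorizer would actually emit the code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Mixed-sign chain written directly: v0 - v1 + v2 - v3.
float mixed_sign_scalar(const std::array<float, 4> &v) {
  return v[0] - v[1] + v[2] - v[3];
}

// Same value as a plain sum reduction after an element-wise multiply
// by the sign vector { 1, -1, 1, -1 }.
float mixed_sign_via_multiply(const std::array<float, 4> &v) {
  const std::array<float, 4> signs = {1.f, -1.f, 1.f, -1.f};
  float sum = 0.f;
  for (std::size_t i = 0; i < v.size(); ++i)
    sum += v[i] * signs[i];
  return sum;
}
```

Once the signs are folded into a multiply, the remaining operation is a uniform addition, which is the shape a direct sum reduction can handle.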

What's missing and visible in testcases is more appropriate
costing and consuming of load permutations at the SLP instance
root (which is where maybe an explicit SLP node could make
things prettier).

Posting this as an RFC in case somebody has ideas or opinions
around the representation of this.  This is not the final version
as I do intend to add some capabilities as I see fit.
The version passes bootstrap & regtest on x86_64-unknown-linux-gnu,
next up is some statistics on SPEC where we vectorize these
cases and where we give up (and for what reasons and how often).

Richard.

2021-06-08  Richard Biener   

PR tree-optimization/54400
* tree-vectorizer.h (enum slp_instance_kind): Add
slp_inst_kind_bb_reduc.
(reduction_fn_for_scalar_code): Declare.
* tree-vect-loop.c (reduction_fn_for_scalar_code): Export.
* tree-vect-slp.c (vect_slp_linearize_chain): Split out
chain linearization from vect_build_slp_tree_2.
(vect_slp_check_for_constructors): Recognize associatable
chains.
...
(vectorize_slp_instance_root_stmt): Generate code for the
BB reduction epilogue.

* gcc.dg/vect/bb-slp-pr54400.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c |  26 ++
 gcc/tree-vect-loop.c   |   2 +-
 gcc/tree-vect-slp.c| 304 +
 gcc/tree-vectorizer.h  |   2 +
 4 files changed, 277 insertions(+), 57 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c
new file mode 100644
index 000..eda85104dc7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_float} */
+/* { dg-additional-options "-w -Wno-psabi -ffast-math" } */
+
+#include "tree-vect.h"
+
+typedef float v4sf __attribute__((vector_size(sizeof(float)*4)));
+
+float __attribute__((noipa))
+f(v4sf v)
+{
+  return v[0]+v[1]+v[2]+v[3];
+}
+
+int
+main ()
+{
+  check_vect ();
+  v4sf v = (v4sf) { 1.f, 3.f, 4.f, 2.f };
+  if (f (v) != 10.f)
+abort ();
+  return 0;
+}
+
+/* We are lacking an effective target for .REDUC_PLUS support.  */
+/* { dg-final { scan-tree-dump "basic block part vectorized" "slp2" { target 
x86_64-*-* } } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ee79808472c..51a46a6d852 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3209,7 +3209,7 @@ fold_left_reduction_fn (tree_code code, internal_fn 
*reduc_fn)
 
Return FALSE if CODE currently cannot be vectorized as reduction.  */
 
-static bool
+bool
 reduction_fn_for_scalar_code (enum tree_code code, internal_fn *reduc_fn)
 {
   switch (code)
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 6b78e8feb82..58a968ba222 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1442,6 +1442,77 @@ dt_sort_cmp (const void *op1_, const void *op2_, void *)
   return (int)op1->code - (int)op2->code;
 }
 
+/* Linearize the associatable expression chain at START with the
+   associatable operation CODE (where PLUS_EXPR also allows MINUS_EXPR),
+   filling CHAIN with the result and using WORKLIST as intermediate storage.
+   CODE_STMT and ALT_CODE_STMT are filled with the first stmt using CODE
+   or MINUS_EXPR.  *CHAIN_STMTS if not NULL is filled with all computation
+   stmts, starting with START.  */
+
+static void
+vect_slp_linearize_chain (vec_info *vinfo,
+ vec > ,
+ vec ,
+ enum tree_code code, gimple *start,
+ gimple *_stmt, gimple *_code_stmt,
+ vec *chain_stmts)
+{
+  /* For each lane linearize the addition/subtraction (or other
+ uniform associatable operation) expression tree.  */
+  worklist.safe_push (std::make_pair (code, start));
+  while (!worklist.is_empty ())
+{
+  auto entry = worklist.pop ();
+  gassign *stmt = as_a <gassign *> (entry.second);
+  enum tree_code in_code = entry.first;
+  enum tree_code this_code = gimple_assign_rhs_code (stmt);
+  /* Pick some stmts suitable for SLP_TREE_REPRESENTATIVE.  */
+  if (!code_stmt
+ && gimple_assign_rhs_code (stmt) 

Re: [PATCH,rs6000] Fix p10 fusion test cases for -m32

2021-06-09 Thread Segher Boessenkool
On Wed, May 26, 2021 at 04:53:24PM -0500, Aaron Sawdey wrote:
> > The counts of fusion insns are slightly different for 32-bit compiles
> > so we need different scan-assembler-times counts for 32 and 64 bit
> > in the test cases for p10 fusion.

Have you checked that all of these actually make sense?  There seem to be
two themes: long long being two regs / two insns on 32-bit, and that
you get cmpld etc. only on 64-bit?

Okay for trunk if this is actually the expected output.  Thanks!


Segher


Re: [PATCH] libstdc++: Fix Wrong param type in :atomic_ref<_Tp*>::wait [PR100889]

2021-06-09 Thread Christophe Lyon via Gcc-patches
Hi,


On Wed, 9 Jun 2021 at 01:05, Thomas Rodgers via Gcc-patches
 wrote:
>
> Tested x86_64-pc-linux-gnu, committed to master, backported to
> releases/gcc-11.
>
> On Tue, Jun 8, 2021 at 8:44 AM Jonathan Wakely  wrote:
>
> > On Tue, 8 Jun 2021 at 01:29, Thomas Rodgers wrote:
> >
> >> This time without the repeated [PR] in the subject line.
> >>
> >> Fixes libstdc++/100889
> >>
> >
> > This should be part of the ChangeLog entry instead, preceded by PR so it
> > updates bugzilla, i.e.
> >
> >
> >
> >> libstdc++-v3/ChangeLog:
> >>
> >
> > PR libstdc++/100889
> >
> >
> >> * include/bits/atomic_base.h (atomic_ref<_Tp*>::wait):
> >> Change parameter type from _Tp to _Tp*.
> >> * testsuite/29_atomics/atomic_ref/wait_notify.cc: Extend
> >> coverage of types tested.
> >>
> >
> >
> > OK for trunk and gcc-11 with that change, thanks.
> >
> >

This is causing a regression on old arm targets:
--target arm-none-linux-gnueabi
RUNTESTFLAGS: -march=armv5t

FAIL: 29_atomics/atomic_ref/wait_notify.cc (test for excess errors)
Excess errors:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-linux-gnueabi/bin/ld:
/ccaaHfBz.o: in function `void
std::__atomic_impl::store(double*,
std::remove_volatile::type, std::memory_order)':
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/arm-none-linux-gnueabi/libstdc++-v3/include/bits/atomic_base.h:971:
undefined reference to `__atomic_store_8'
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-linux-gnueabi/bin/ld:
/ccaaHfBz.o: in function `std::remove_volatile::type
std::__atomic_impl::load(double const*, std::memory_order)':
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/arm-none-linux-gnueabi/libstdc++-v3/include/bits/atomic_base.h:979:
undefined reference to `__atomic_load_8'
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-linux-gnueabi/bin/ld:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/arm-none-linux-gnueabi/libstdc++-v3/include/bits/atomic_base.h:979:
undefined reference to `__atomic_load_8'
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/arm-none-linux-gnueabi/bin/ld:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/arm-none-linux-gnueabi/libstdc++-v3/include/bits/atomic_base.h:979:
undefined reference to `__atomic_load_8'
collect2: error: ld returned 1 exit status

Can you check?

Thanks
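
For context on the link errors above: when a target has no native 8-byte atomic instructions, GCC lowers such accesses to library calls like __atomic_load_8 and __atomic_store_8, which are normally provided by libatomic. The sketch below is one way to probe this; the function names are hypothetical and it is not part of the test in question.

```cpp
#include <atomic>
#include <cassert>

// Reports whether atomic accesses to a double are lock-free on this target.
// Where they are not, the compiler emits out-of-line __atomic_* calls
// instead of inline instructions.
bool double_atomics_are_lock_free() {
  std::atomic<double> probe{0.0};
  return probe.is_lock_free();
}

// Round-trips a value through an atomic store and load.
double store_then_load(double value) {
  std::atomic<double> a{0.0};
  a.store(value);
  return a.load();
}
```

On a target where double_atomics_are_lock_free() returns false, linking code like store_then_load typically requires -latomic.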


Re: [committed] analyzer: bitfield fixes [PR99212]

2021-06-09 Thread Christophe Lyon via Gcc-patches
On Tue, 8 Jun 2021 at 21:34, David Malcolm via Gcc-patches
 wrote:
>
> This patch verifies the previous fix for bitfield sizes by implementing
> enough support for bitfields in the analyzer to get the test cases to pass.
>
> The patch implements support in the analyzer for reading from a
> BIT_FIELD_REF, and support for folding BIT_AND_EXPR of a mask, to handle
> the cases generated in tests.
>
> The existing bitfields tests in data-model-1.c turned out to rely on
> undefined behavior, in that they were assigning values to a signed
> bitfield that were outside of the valid range of values.  I believe that
> that's why we were seeing target-specific differences in the test
> results (PR analyzer/99212).  The patch updates the test to remove the
> undefined behaviors.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> Lightly tested with cris-elf.
>
> Pushed to trunk as r12-1303-gd3b1ef7a83c0c0cd5b20a1dd1714b868f3d2b442.
>
> gcc/analyzer/ChangeLog:
> PR analyzer/99212
> * region-model-manager.cc
> (region_model_manager::maybe_fold_binop): Add support for folding
> BIT_AND_EXPR of compound_svalue and a mask constant.
> * region-model.cc (region_model::get_rvalue_1): Implement
> BIT_FIELD_REF in terms of...
> (region_model::get_rvalue_for_bits): New function.
> * region-model.h (region_model::get_rvalue_for_bits): New decl.
> * store.cc (bit_range::from_mask): New function.
> (selftest::test_bit_range_intersects_p): New selftest.
> (selftest::assert_bit_range_from_mask_eq): New.
> (ASSERT_BIT_RANGE_FROM_MASK_EQ): New macro.
> (selftest::assert_no_bit_range_from_mask_eq): New.
> (ASSERT_NO_BIT_RANGE_FROM_MASK): New macro.
> (selftest::test_bit_range_from_mask): New selftest.
> (selftest::analyzer_store_cc_tests): Call the new selftests.
> * store.h (bit_range::intersects_p): New.
> (bit_range::from_mask): New decl.
> (concrete_binding::get_bit_range): New accessor.
> (store_manager::get_concrete_binding): New overload taking
> const bit_range &.
>
> gcc/testsuite/ChangeLog:
> PR analyzer/99212
> * gcc.dg/analyzer/bitfields-1.c: New test.
> * gcc.dg/analyzer/data-model-1.c (struct sbits): Make bitfields
> explicitly signed.
> (test_44): Update test values assigned to the bits to ones that
> fit in the range of the bitfield type.  Remove xfails.
> (test_45): Remove xfails.
>
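
The quoted test change restricts the assigned values to ones that fit in the bitfield type. A minimal sketch of that range check is given below, assuming a two's-complement signed field; the helper name is hypothetical and the widths used in struct sbits are not shown in this thread.

```cpp
#include <cassert>

// Hypothetical helper: does VALUE fit in a signed bit-field of WIDTH bits?
// Assumes 1 <= width < number of bits in long.
bool fits_signed_bitfield(long value, unsigned width) {
  // A two's-complement field of WIDTH bits holds
  // [-2^(WIDTH-1), 2^(WIDTH-1) - 1].
  const long hi = (1L << (width - 1)) - 1;
  const long lo = -hi - 1;
  return value >= lo && value <= hi;
}
```

For a 3-bit signed field this accepts -4 through 3 and rejects everything else.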

Hi,

This patch is causing regressions / new failures on armeb (and other
targets according to gcc-testresults):

FAIL: gcc.dg/analyzer/bitfields-1.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:24:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:26:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:29:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:31:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:36:3: warning: FALSE
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:41:3: warning: FALSE
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:81:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:83:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:85:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:87:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:92:3: warning: FALSE
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:94:3: warning: FALSE
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:96:3: warning: FALSE
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:113:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:115:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:117:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:119:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:121:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:123:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:125:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/bitfields-1.c:127:3: warning: UNKNOWN

FAIL: gcc.dg/analyzer/data-model-1.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.dg/analyzer/data-model-1.c:947:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/data-model-1.c:950:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/data-model-1.c:965:3: warning: UNKNOWN
/gcc/testsuite/gcc.dg/analyzer/data-model-1.c:968:3: warning: UNKNOWN

For instance with target armeb-none-linux-gnueabihf

Can you check?

Thanks,

Christophe


> Signed-off-by: David Malcolm 
> ---
>  gcc/analyzer/region-model-manager.cc |  46 -
>  gcc/analyzer/region-model.cc |  65 ++-
>  gcc/analyzer/region-model.h  |   4 +
>  gcc/analyzer/store.cc| 186 +++
>  gcc/analyzer/store.h |  18 ++
>  

Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags

2021-06-09 Thread Richard Biener via Gcc-patches
On Wed, Jun 9, 2021 at 2:53 PM Matthias Kretz  wrote:
>
> On Wednesday, 9 June 2021 14:22:00 CEST Richard Biener wrote:
> > On Tue, Jun 8, 2021 at 2:23 PM Matthias Kretz  wrote:
> > > From: Matthias Kretz 
> > >
> > > Explicitly support use of the stdx::simd implementation in situations
> > > where the user links TUs that were compiled with different -m flags. In
> > > general, this is always a (quasi) ODR violation for inline functions
> > > because at least codegen may differ in important ways. However, in the
> > > resulting executable only one (unspecified which one) of them might be
> > > used. For simd we want to support users to compile code multiple times,
> > > with different -m flags and have a runtime dispatch to the TU matching
> > > the target CPU. But if internal functions are not inlined this may lead
> > > to unexpected performance loss or execution of illegal instructions.
> > > Therefore, inline functions that are not marked as always_inline must
> > > use an additional template parameter somewhere in their name, to
> > > disambiguate between the different -m translations.
> >
> > Note that excessive use of always_inline can cause compile-time issues
> > (see for example PR99785).
>
> Ah, I should verify whether that's also the reason my stdx::simd
> implementation is slow to compile.
>
> However, I really must have the always_inline semantics in most of the places
> stdx::simd uses it. Because most of these functions compile to either a single
> function call or a single instruction (often f0 -> f1 -> f2 -> single
> instruction). If the inliner even makes one single wrong inlining decision,
> the whole program might slow down by integral factors, not only small
> percentages. And without inlining these functions, -fno-inline builds (i.e.
> many debug builds) become unbearably slow (aka useless).

Understood.  Note I think that the slow compile is a bug and there must be
a way to address it; the testcases are just too large at the moment to get
a handle on what kind of callgraphs cause which problem, why, and how
we might want to address this.

> > I wonder whether the inlines can be
> > placed in an anonymous namespace instead of the difficult to maintain
> > explict list of SIMD features?
>
> It's possible, and part of the patch:
>
> +  namespace
> +  {
> +struct _OdrEnforcer {};
> +  }
> [...]
> +  using __odr_helper
> += conditional_t<__machine_flags() == 0, _OdrEnforcer,
> +   _MachineFlagsTemplate<__machine_flags(), __floating_point_flags()>>;
>
> It can potentially blow up the code size and the instruction cache usage,
> though. The trade-off isn't obvious to make. I guess I can't promise that
> mixing different compiler flags is ODR violation free.
>
> > It also doesn't solve the issue when
> > instantiating the functions from a TU which contains #pragma GCC target
> > sections to switch options, of course.
>
> Yes. Can I get PR83875? ;-)

heh ;)
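
The disambiguation trick quoted above can be sketched roughly as follows. All names are hypothetical simplifications of the ones in the patch (OdrEnforcer, MachineFlagsTag, machine_flags, add_lanes), and the flag bits are chosen arbitrarily for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

namespace {
// A type in an anonymous namespace is a distinct type in every TU.
struct OdrEnforcer {};
}

// Tag type parameterized on a bitmask of machine flags.
template <std::uint64_t Flags>
struct MachineFlagsTag {};

// Hypothetical stand-in for __machine_flags(): one bit per ISA macro.
constexpr std::uint64_t machine_flags() {
  return 0
#ifdef __SSE2__
         | (std::uint64_t{1} << 0)
#endif
#ifdef __AVX2__
         | (std::uint64_t{1} << 1)
#endif
      ;
}

// With no flags detected, fall back to the per-TU anonymous type.
using odr_helper = std::conditional_t<machine_flags() == 0, OdrEnforcer,
                                      MachineFlagsTag<machine_flags()>>;

// The extra defaulted parameter becomes part of the mangled name, so TUs
// built with different -m flags instantiate different functions.
template <typename T, typename = odr_helper>
T add_lanes(T a, T b) {
  return a + b;
}
```

Callers never name the second template argument; it only serves to keep instantiations from differently compiled TUs apart at link time.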

Richard.

> - Matthias
>
> > > Signed-off-by: Matthias Kretz 
> > >
> > > libstdc++-v3/ChangeLog:
> > > * include/experimental/bits/simd.h: Move feature detection bools
> > > and add __have_avx512bitalg, __have_avx512vbmi2,
> > > __have_avx512vbmi, __have_avx512ifma, __have_avx512cd,
> > > __have_avx512vnni, __have_avx512vpopcntdq.
> > > (__detail::__machine_flags): New function which returns a unique
> > > uint64 depending on relevant -m and -f flags.
> > > (__detail::__odr_helper): New type alias for either an anonymous
> > > type or a type specialized with the __machine_flags number.
> > > (_SimdIntOperators): Change template parameters from _Impl to
> > > _Tp, _Abi because _Impl now has an __odr_helper parameter which
> > > may be _OdrEnforcer from the anonymous namespace, which makes
> > > for a bad base class.
> > > (many): Either add __odr_helper template parameter or mark as
> > > always_inline.
> > > * include/experimental/bits/simd_detail.h: Add defines for
> > > AVX512BITALG, AVX512VBMI2, AVX512VBMI, AVX512IFMA, AVX512CD,
> > > AVX512VNNI, AVX512VPOPCNTDQ, and AVX512VP2INTERSECT.
> > > * include/experimental/bits/simd_builtin.h: Add __odr_helper
> > > template parameter or mark as always_inline.
> > > * include/experimental/bits/simd_fixed_size.h: Ditto.
> > > * include/experimental/bits/simd_math.h: Ditto.
> > > * include/experimental/bits/simd_scalar.h: Ditto.
> > > * include/experimental/bits/simd_neon.h: Add __odr_helper
> > > template parameter.
> > > * include/experimental/bits/simd_ppc.h: Ditto.
> > > * include/experimental/bits/simd_x86.h: Ditto.
> > >
> > > ---
> > >
> > >  libstdc++-v3/include/experimental/bits/simd.h | 380 --
> > >  .../include/experimental/bits/simd_builtin.h  |  41 +-
> > >  .../include/experimental/bits/simd_detail.h   |  40 ++
> > >  

Re: [PATCH v3 1/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

2021-06-09 Thread H.J. Lu via Gcc-patches
On Wed, Jun 9, 2021 at 1:17 AM Hongtao Liu  wrote:
>
> On Wed, Jun 9, 2021 at 2:02 AM H.J. Lu via Gcc-patches
>  wrote:
> >
> > 1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR
> > operands to vector broadcast from an integer with AVX2.
> > 2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
> > won't increase stack alignment requirement and blocks transformation by
> > the combine pass.
> > 3. Update PR 87767 tests to expect integer broadcast instead of broadcast
> > from memory.
> > 4. Update avx512f_cond_move.c to expect integer broadcast.
> >
> > A small benchmark:
> >
> > https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast
> >
> > shows that broadcast is a little bit faster on Intel Core i7-8559U:
> >
> > $ make
> > gcc -g -I. -O2   -c -o test.o test.c
> > gcc -g   -c -o memory.o memory.S
> > gcc -g   -c -o broadcast.o broadcast.S
> > gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
> > gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
> > ./test
> > memory  : 147215
> > broadcast   : 121213
> > vec_dup_sse2: 171366
> > $
> >
> > broadcast is also smaller:
> >
> > $ size memory.o broadcast.o
> >    text    data     bss     dec     hex filename
> >     132       0       0     132      84 memory.o
> >     122       0       0     122      7a broadcast.o
> > $
> Only the mov scenario was measured, when it comes to avx512 embedded
> broadcast it's 1 avx512 embedded broadcast instruction vs at least 3
> instructions: mov + broadcast + op. I'm not sure which is better?
>
> take pr87767 for example.
> vpaddd .LC1(%rip){1to16}, %zmm0, %zmm0
> .LC1:
> .long   3
>
> vs
>
> movl $3, %eax
> vpbroadcastd %eax, %zmm1
> vpaddd %zmm1, %zmm0, %zmm0
>

https://gitlab.com/x86-benchmarks/microbenchmark/-/commits/vpaddd/broadcast

shows that vpbroadcastd is faster:

[hjl@gnu-skx-1 microbenchmark]$ make
gcc -g -I. -O2 -march=skylake-avx512   -c -o test.o test.c
gcc -g   -c -o memory.o memory.S
gcc -g   -c -o broadcast.o broadcast.S
gcc -o test test.o memory.o broadcast.o
./test
memory  : 425538
broadcast   : 375260
[hjl@gnu-skx-1 microbenchmark]$


-- 
H.J.


Re: [PATCH 02/57] Support scanning of build-time GC roots in gengtype

2021-06-09 Thread Bill Schmidt via Gcc-patches

On 6/9/21 5:54 AM, Richard Biener wrote:

On Wed, Jun 9, 2021 at 12:53 PM Richard Biener
 wrote:

On Tue, Jun 8, 2021 at 10:45 PM Bill Schmidt  wrote:

On 6/7/21 12:48 PM, Bill Schmidt wrote:

On 6/7/21 12:45 PM, Richard Biener wrote:

On Mon, Jun 7, 2021 at 5:38 PM Bill Schmidt 
wrote:

On 6/7/21 8:36 AM, Richard Biener wrote:

Some maybe obvious issue - what about DOS-style path hosts?
You seem to build ../ strings to point to parent dirs... I'm not sure
what we do elsewhere - I suppose we arrange for appropriate
-I command line arguments?


Well, actually it's just using "./" to identify the build directory,
though I see what you mean about potential Linux bias. There is
precedent for this syntax identifying the build directory in config.gcc
for target macro files:

#  tm_file          A list of target macro files, if different from
#                   "$cpu_type/$cpu_type.h". Usually it's constructed
#                   per target in a way like this:
#                   tm_file="${tm_file} dbxelf.h elfos.h ${cpu_type.h}/elf.h"
#                   Note that the preferred order is:
#                   - specific target header "${cpu_type}/${cpu_type.h}"
#                   - generic headers like dbxelf.h elfos.h, etc.
#                   - specializing target headers like ${cpu_type.h}/elf.h
#                   This helps to keep OS specific stuff out of the CPU
#                   defining header ${cpu_type}/${cpu_type.h}.
#
#                   It is possible to include automatically-generated
#                   build-directory files by prefixing them with "./".
#                   All other files should relative to $srcdir/config.

...so I thought I would try to be consistent with this change. In patch
0025 I use this as follows:

--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -491,6 +491,7 @@ powerpc*-*-*)
   extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
   target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c"
   target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.c"
+   target_gtfiles="$target_gtfiles ./rs6000-builtins.h"
;;
pru-*-*)
cpu_type=pru

I'm open to trying to do something different if you think that's
appropriate.

Well, I'm not sure whether/how to resolve this.  You could try
building a cross to powerpc-linux from a x86_64-mingw host ...
maybe there's one on the CF?  Or some of your fellow RedHat
people have access to mingw or the like envs to try whether it
just works with your change ...

Otherwise it looks OK.

I'll see what I can find.  Thanks again for reviewing the patch!


Hm.  Ultimately, I think the cross compiler case is doomed unless mingw
already handles converting forward slashes to back slashes. There's no
single syntax that works on both Windows and Linux. (There's no mingw
server in the compile farm to play with.)

I'm inclined to accept both "./" and ".\" for native builds, and kick
the can down the road beyond that.  What do you think?

Can't you use PATH_SEPARATOR somehow?  See file-find.c / incpath.c
or gcc.c for uses and system.h for where it is defined.

Err - DIR_SEPARATOR of course.


Ah -- following the breadcrumbs a little further, it appears that 
IS_DIR_SEPARATOR is the proper way to handle both Linux- and 
Windows-style syntax.  Thanks for the pointer!  That should work. Will test.
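
The prefix check discussed above could look roughly like the following.
This is only a minimal sketch, not gengtype code: is_dir_sep is a local
stand-in for libiberty's IS_DIR_SEPARATOR (which, as far as I recall,
accepts '\' only on DOS-based hosts), and is_build_dir_file is a
hypothetical helper name.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for IS_DIR_SEPARATOR.  Assumption: unlike the real macro,
// this always accepts both '/' and '\\', regardless of the host.
bool is_dir_sep (char c)
{
  return c == '/' || c == '\\';
}

// Return true if NAME starts with a '.' followed by a directory
// separator, i.e. it names a file in the build directory rather than
// one relative to $srcdir.
bool is_build_dir_file (const char *name)
{
  assert (name != nullptr);
  return std::strlen (name) >= 2 && name[0] == '.' && is_dir_sep (name[1]);
}
```

With a check of this shape, "./rs6000-builtins.h" and the
backslash-separated spelling of the same name are classified alike, while
ordinary $(srcdir) entries are left untouched.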


Bill






Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags

2021-06-09 Thread Matthias Kretz
On Wednesday, 9 June 2021 14:22:00 CEST Richard Biener wrote:
> On Tue, Jun 8, 2021 at 2:23 PM Matthias Kretz  wrote:
> > From: Matthias Kretz 
> > 
> > Explicitly support use of the stdx::simd implementation in situations
> > where the user links TUs that were compiled with different -m flags. In
> > general, this is always a (quasi) ODR violation for inline functions
> > because at least codegen may differ in important ways. However, in the
> > resulting executable only one (unspecified which one) of them might be
> > used. For simd we want to support users to compile code multiple times,
> > with different -m flags and have a runtime dispatch to the TU matching
> > the target CPU. But if internal functions are not inlined this may lead
> > to unexpected performance loss or execution of illegal instructions.
> > Therefore, inline functions that are not marked as always_inline must
> > use an additional template parameter somewhere in their name, to
> > disambiguate between the different -m translations.
> 
> Note that excessive use of always_inline can cause compile-time issues
> (see for example PR99785).

Ah, I should verify whether that's also the reason my stdx::simd 
implementation is slow to compile.

However, I really must have the always_inline semantics in most of the places 
stdx::simd uses it. Because most of these functions compile to either a single 
function call or a single instruction (often f0 -> f1 -> f2 -> single 
instruction). If the inliner even makes one single wrong inlining decision, 
the whole program might slow down by integral factors, not only small 
percentages. And without inlining these functions, -fno-inline builds (i.e. 
many debug builds) become unbearably slow (aka useless).

> I wonder whether the inlines can be
> placed in an anonymous namespace instead of the difficult to maintain
> explicit list of SIMD features?

It's possible, and part of the patch:

+  namespace
+  {
+struct _OdrEnforcer {};
+  }
[...]
+  using __odr_helper
+= conditional_t<__machine_flags() == 0, _OdrEnforcer,
+   _MachineFlagsTemplate<__machine_flags(), __floating_point_flags()>>;

It can potentially blow up the code size and the instruction cache usage, 
though. The trade-off isn't obvious to make. I guess I can't promise that 
mixing different compiler flags is ODR violation free 
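
For readers who want to see the idea in isolation, here is a minimal
sketch of the tagging technique. It is an illustration under stated
assumptions, not the code from the patch: machine_flags, MachineFlagsTag,
odr_helper and add_impl are hypothetical names, and the three feature
macros tested are just examples of what such a function might look at.

```cpp
#include <cstdint>
#include <type_traits>

namespace {
  // Internal linkage: a distinct type in every translation unit.
  struct OdrEnforcer {};
}

// Hypothetical stand-in for __machine_flags(): one bit per enabled ISA
// extension, derived from the predefined macros of this TU.
constexpr std::uint64_t machine_flags ()
{
  std::uint64_t flags = 0;
#ifdef __SSE2__
  flags |= 1u << 0;
#endif
#ifdef __AVX2__
  flags |= 1u << 1;
#endif
#ifdef __AVX512F__
  flags |= 1u << 2;
#endif
  return flags;
}

template <std::uint64_t Flags>
struct MachineFlagsTag {};

// With no recognised flags fall back to the TU-local type, otherwise
// encode the flags in the type (same shape as the conditional_t above).
using odr_helper
  = std::conditional_t<machine_flags () == 0, OdrEnforcer,
                       MachineFlagsTag<machine_flags ()>>;

// The tag is part of the template arguments, so add_impl<odr_helper>
// is a different specialization in TUs built with different -m flags.
template <typename Tag = odr_helper>
int add_impl (int a, int b)
{
  return a + b;
}
```

Two TUs compiled with, say, -mavx2 and -mavx512f would then instantiate
differently named specializations of add_impl, so the linker has no
opportunity to merge them.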

> It also doesn't solve the issue when
> instantiating the functions from a TU which contains #pragma GCC target
> sections to switch options, of course.

Yes. Can I get PR83875? ;-)

- Matthias

> > Signed-off-by: Matthias Kretz 
> > 
> > libstdc++-v3/ChangeLog:
> > * include/experimental/bits/simd.h: Move feature detection bools
> > and add __have_avx512bitalg, __have_avx512vbmi2,
> > __have_avx512vbmi, __have_avx512ifma, __have_avx512cd,
> > __have_avx512vnni, __have_avx512vpopcntdq.
> > (__detail::__machine_flags): New function which returns a unique
> > uint64 depending on relevant -m and -f flags.
> > (__detail::__odr_helper): New type alias for either an anonymous
> > type or a type specialized with the __machine_flags number.
> > (_SimdIntOperators): Change template parameters from _Impl to
> > _Tp, _Abi because _Impl now has an __odr_helper parameter which
> > may be _OdrEnforcer from the anonymous namespace, which makes
> > for a bad base class.
> > (many): Either add __odr_helper template parameter or mark as
> > always_inline.
> > * include/experimental/bits/simd_detail.h: Add defines for
> > AVX512BITALG, AVX512VBMI2, AVX512VBMI, AVX512IFMA, AVX512CD,
> > AVX512VNNI, AVX512VPOPCNTDQ, and AVX512VP2INTERSECT.
> > * include/experimental/bits/simd_builtin.h: Add __odr_helper
> > template parameter or mark as always_inline.
> > * include/experimental/bits/simd_fixed_size.h: Ditto.
> > * include/experimental/bits/simd_math.h: Ditto.
> > * include/experimental/bits/simd_scalar.h: Ditto.
> > * include/experimental/bits/simd_neon.h: Add __odr_helper
> > template parameter.
> > * include/experimental/bits/simd_ppc.h: Ditto.
> > * include/experimental/bits/simd_x86.h: Ditto.
> > 
> > ---
> > 
> >  libstdc++-v3/include/experimental/bits/simd.h | 380 --
> >  .../include/experimental/bits/simd_builtin.h  |  41 +-
> >  .../include/experimental/bits/simd_detail.h   |  40 ++
> >  .../experimental/bits/simd_fixed_size.h   |  39 +-
> >  .../include/experimental/bits/simd_math.h |  45 ++-
> >  .../include/experimental/bits/simd_neon.h |   4 +-
> >  .../include/experimental/bits/simd_ppc.h  |   4 +-
> >  .../include/experimental/bits/simd_scalar.h   |  71 +++-
> >  .../include/experimental/bits/simd_x86.h  |   4 +-
> >  9 files changed, 440 insertions(+), 188 deletions(-)
> > 

[PATCH] tree-optimization/100981 - fix SLP patterns involving reductions

2021-06-09 Thread Richard Biener
The following fixes the SLP FMA patterns to preserve reduction
info and the reduction vectorization to consider internal function
call defs for the reduction stmt.

Bootstrap & regtest running on x86_64-unknown-linux-gnu, Andre
verified we're not turning an ICE into a wrong-code bug
(.COMPLEX_MUL now appears in the reduction chain).

Note there's a testcase for the ICE which adds -march=armv8.3-a
and a testcase for correctness which doesn't since I didn't find
any dg effective target verifying armv8.3-a code can run.
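
To illustrate why the accessor matters, here is a toy model of the
difference between an assignment-only LHS getter and a generic one. It is
a sketch under loud assumptions: Assign, Call, Stmt, assign_lhs and
get_lhs are made-up names and do not reflect GCC's gimple classes; only
the analogy to gimple_assign_lhs versus gimple_get_lhs is intended.

```cpp
#include <optional>
#include <string>
#include <variant>

// Toy statement kinds; these are NOT GCC's gimple classes.
struct Assign { std::string lhs, rhs; };
struct Call   { std::optional<std::string> lhs; std::string fn; };
using Stmt = std::variant<Assign, Call>;

// Analogous to gimple_assign_lhs: only meaningful for assignments.
// In this toy model a wrong statement kind throws
// std::bad_variant_access.
const std::string &assign_lhs (const Stmt &s)
{
  return std::get<Assign> (s).lhs;
}

// Analogous to gimple_get_lhs: works for assignments and calls alike
// and reports "no LHS" instead of failing.
std::optional<std::string> get_lhs (const Stmt &s)
{
  if (const Assign *a = std::get_if<Assign> (&s))
    return a->lhs;
  return std::get<Call> (s).lhs;
}
```

Once the reduction statement can be an internal function call such as
.COMPLEX_MUL, only the generic accessor is safe to use on it.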

2021-06-09  Richard Biener  

PR tree-optimization/100981
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use
gimple_get_lhs to also handle calls.
* tree-vect-slp-patterns.c (complex_pattern::build): Transfer
reduction info.

* gfortran.dg/vect/pr100981-1.f90: New testcase.
* gfortran.dg/vect/pr100981-2.f90: Likewise.
---
 gcc/tree-vect-loop.c | 2 +-
 gcc/tree-vect-slp-patterns.c | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ba36348b835..ee79808472c 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5247,7 +5247,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
   gcc_assert (STMT_VINFO_RELATED_STMT (orig_stmt_info) == stmt_info);
 }
   
-  scalar_dest = gimple_assign_lhs (orig_stmt_info->stmt);
+  scalar_dest = gimple_get_lhs (orig_stmt_info->stmt);
   scalar_type = TREE_TYPE (scalar_dest);
   scalar_results.create (group_size); 
   new_scalar_dest = vect_create_destination_var (scalar_dest, NULL);
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index b25655c9876..2ed49cd9edc 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -544,6 +544,8 @@ complex_pattern::build (vec_info *vinfo)
 {
   /* Calculate the location of the statement in NODE to replace.  */
   stmt_info = SLP_TREE_REPRESENTATIVE (node);
+  stmt_vec_info reduc_def
+   = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info));
   gimple* old_stmt = STMT_VINFO_STMT (stmt_info);
   tree lhs_old_stmt = gimple_get_lhs (old_stmt);
   tree type = TREE_TYPE (lhs_old_stmt);
@@ -568,9 +570,10 @@ complex_pattern::build (vec_info *vinfo)
= vinfo->add_pattern_stmt (call_stmt, stmt_info);
 
   /* Make sure to mark the representative statement pure_slp and
-relevant. */
+relevant and transfer reduction info. */
   STMT_VINFO_RELEVANT (call_stmt_info) = vect_used_in_scope;
   STMT_SLP_TYPE (call_stmt_info) = pure_slp;
+  STMT_VINFO_REDUC_DEF (call_stmt_info) = reduc_def;
 
   gimple_set_bb (call_stmt, gimple_bb (stmt_info->stmt));
   STMT_VINFO_VECTYPE (call_stmt_info) = SLP_TREE_VECTYPE (node);
-- 
2.26.2


Re: [PATCH] s390: Add more vcond_mask patterns.

2021-06-09 Thread Robin Dapp via Gcc-patches

I think the real problem is the expander name. That's why it could not be found
by optab. The second mode needs to be the int vector mode of op3. With that
change the testcases work as expected:

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index c80d582a300d..ab605b3d2cf3 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -715,7 +715,7 @@
DONE;
  })

-(define_expand "vcond_mask_"
+(define_expand "vcond_mask_"
[(set (match_operand:V 0 "register_operand" "")
 (if_then_else:V
  (eq (match_operand: 3 "register_operand" "")


Ah, yes, it's indeed much simpler that way.  Attached the revised 
version with the small change and the new tests as a single patch now.


Regtest and bootstrap were successful.

Regards
 Robin
From 790feb49a6494c33f0fd4386a8148e0a4880e33b Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Tue, 8 Jun 2021 11:49:26 +0200
Subject: [PATCH] s390: Add more vcond_mask patterns.

Add vcond_mask patterns that allow another mode for the condition/mask
than the source and target so e.g. boolean conditions become possible:

  vtarget = bool_cond ? vsource1 : vsource2.
---
 gcc/config/s390/vector.md |  2 +-
 .../s390/vector/vcond-mixed-double.c  | 41 +++
 .../s390/vector/vcond-mixed-float.c   | 41 +++
 3 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vcond-mixed-double.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vcond-mixed-float.c

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index c80d582a300..ab605b3d2cf 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -715,7 +715,7 @@
   DONE;
 })
 
-(define_expand "vcond_mask_"
+(define_expand "vcond_mask_"
   [(set (match_operand:V 0 "register_operand" "")
 	(if_then_else:V
 	 (eq (match_operand: 3 "register_operand" "")
diff --git a/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-double.c b/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-double.c
new file mode 100644
index 000..015bc8ab473
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-double.c
@@ -0,0 +1,41 @@
+/* Check for vectorization of mixed conditionals.  */
+/* { dg-do compile { target { s390*-*-* } } } */
+/* { dg-options "-O2 -march=z14 -mzarch -ftree-vectorize -fdump-tree-vect-details" } */
+
+double xd[1024];
+double zd[1024];
+double wd[1024];
+
+long xl[1024];
+long zl[1024];
+long wl[1024];
+
+void foold ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zd[i] = xl[i] ? zd[i] : wd[i];
+}
+
+void foodl ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zl[i] = xd[i] ? zl[i] : wl[i];
+}
+
+void foold2 ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zd[i] = (xd[i] > 0) ? zd[i] : wd[i];
+}
+
+void foold3 ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zd[i] = (xd[i] > 0. & wd[i] < 0.) ? zd[i] : wd[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-float.c b/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-float.c
new file mode 100644
index 000..ba40ffe8660
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vcond-mixed-float.c
@@ -0,0 +1,41 @@
+/* Check for vectorization of mixed conditionals.  */
+/* { dg-do compile { target { s390*-*-* } } } */
+/* { dg-options "-O2 -march=z14 -mzarch -ftree-vectorize -fdump-tree-vect-details" } */
+
+float xf[1024];
+float zf[1024];
+float wf[1024];
+
+int xi[1024];
+int zi[1024];
+int wi[1024];
+
+void fooif ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zf[i] = xi[i] ? zf[i] : wf[i];
+}
+
+void foofi ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zi[i] = xf[i] ? zi[i] : wi[i];
+}
+
+void fooif2 ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zf[i] = (xf[i] > 0) ? zf[i] : wf[i];
+}
+
+void fooif3 ()
+{
+  int i;
+  for (i = 0; i < 1024; ++i)
+zf[i] = (xf[i] > 0.f & wf[i] < 0.f) ? zf[i] : wf[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
-- 
2.23.0



Re: [PATCH] tree-optimization/97832 - handle associatable chains in SLP discovery

2021-06-09 Thread Richard Biener via Gcc-patches
On Mon, May 31, 2021 at 5:00 PM Richard Biener  wrote:
>
> This makes SLP discovery handle associatable (including mixed
> plus/minus) chains better by swapping operands across the whole
> chain.  To make this work it adds caching of the 'matches' lanes for
> failed SLP discovery attempts, thereby fixing a failed SLP
> discovery for the slp-pr98855.cc testcase which results in
> building an operand from scalars as expected.  Unfortunately
> this makes us trip over the cost threshold so I'm XFAILing the
> testcase for now.
>
> For BB vectorization all this doesn't work because we have no way
> to distinguish good from bad associations as we eventually build
> operands from scalars and thus not fail in the classical sense.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, I'll re-do
> last year's SPEC tests as well.  Now that it is stage1 I'm considering
> pushing this if there are no further comments, given I plan to
> re-use some of the machinery for vectorization of BB reductions.

Now finally pushed as ce670e4faafb296d1f1a7828d20f8c8ba4686797

> Richard.
>
> 2021-05-31  Richard Biener  
>
> PR tree-optimization/97832
> * tree-vectorizer.h (_slp_tree::failed): New.
> * tree-vect-slp.c (_slp_tree::_slp_tree): Initialize
> failed member.
> (_slp_tree::~_slp_tree): Free failed.
> (vect_build_slp_tree): Retain failed nodes and record
> matches in them, copying that back out when running
> into a cached fail.  Dump start and end of discovery.
> (dt_sort_cmp): New.
> (vect_build_slp_tree_2): Handle associatable chains
> together doing more aggressive operand swapping.
>
> * gcc.dg/vect/pr97832-1.c: New testcase.
> * gcc.dg/vect/pr97832-2.c: Likewise.
> * gcc.dg/vect/pr97832-3.c: Likewise.
> * g++.dg/vect/slp-pr98855.cc: XFAIL.
> ---
>  gcc/testsuite/g++.dg/vect/slp-pr98855.cc |   4 +-
>  gcc/testsuite/gcc.dg/vect/pr97832-1.c|  17 +
>  gcc/testsuite/gcc.dg/vect/pr97832-2.c|  29 ++
>  gcc/testsuite/gcc.dg/vect/pr97832-3.c|  50 +++
>  gcc/testsuite/gcc.dg/vect/slp-50.c   |  20 +
>  gcc/tree-vect-slp.c  | 445 ++-
>  gcc/tree-vectorizer.h|   5 +
>  7 files changed, 560 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr97832-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr97832-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr97832-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-50.c
>
> diff --git a/gcc/testsuite/g++.dg/vect/slp-pr98855.cc b/gcc/testsuite/g++.dg/vect/slp-pr98855.cc
> index 0b4e479b513..b1010326698 100644
> --- a/gcc/testsuite/g++.dg/vect/slp-pr98855.cc
> +++ b/gcc/testsuite/g++.dg/vect/slp-pr98855.cc
> @@ -81,4 +81,6 @@ void encrypt_n(const uint8_t in[], uint8_t out[], size_t blocks, uint32_t *EK)
>  }
>  }
>
> -// { dg-final { scan-tree-dump-times "not vectorized: vectorization is not profitable" 2 "slp1" { target x86_64-*-* i?86-*-* } } }
> +// This used to work on { target x86_64-*-* i?86-*-* } but a fix in SLP
> +// discovery makes us trip over the threshold again.
> +// { dg-final { scan-tree-dump-times "not vectorized: vectorization is not profitable" 2 "slp1" { xfail *-*-* } } }
> diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-1.c b/gcc/testsuite/gcc.dg/vect/pr97832-1.c
> new file mode 100644
> index 000..063fc7bd717
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr97832-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Ofast" } */
> +/* { dg-require-effective-target vect_double } */
> +
> +double a[1024], b[1024], c[1024];
> +
> +void foo()
> +{
> +  for (int i = 0; i < 256; ++i)
> +{
> +  a[2*i] = a[2*i] + b[2*i] - c[2*i];
> +  a[2*i+1] = a[2*i+1] - b[2*i+1] - c[2*i+1];
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
> +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-2.c b/gcc/testsuite/gcc.dg/vect/pr97832-2.c
> new file mode 100644
> index 000..4f0578120ee
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr97832-2.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Ofast" } */
> +/* { dg-require-effective-target vect_double } */
> +
> +void foo1x1(double* restrict y, const double* restrict x, int clen)
> +{
> +  int xi = clen & 2;
> +  double f_re = x[0+xi+0];
> +  double f_im = x[4+xi+0];
> +  int clen2 = (clen+xi) * 2;
> +#pragma GCC unroll 0
> +  for (int c = 0; c < clen2; c += 8) {
> +// y[c] = y[c] - x[c]*conj(f);
> +#pragma GCC unroll 4
> +for (int k = 0; k < 4; ++k) {
> +  double x_re = x[c+0+k];
> +  double x_im = x[c+4+k];
> +  double y_re = y[c+0+k];
> +  double y_im = y[c+4+k];
> +  y_re = y_re - x_re * f_re - x_im * f_im;;
> +  y_im = y_im + x_re * f_im - x_im * f_re;
> +  

Re: [PATCH] testsuite: Add vect_floatint_cvt to gcc.dg/vect/pr56541.c

2021-06-09 Thread Robin Dapp via Gcc-patches

gcc/testsuite/ChangeLog:

     * gcc.dg/vect/pr56541.c: Add vect_floatint_cvt.


OK.  I'd tend to use XFAIL for a compiler bug that we haven't fixed.  In
this case the target doesn't support what the test is trying to do.  So
skipping the test in one manner or another seems better.


Just realized I seem to have committed the only option that is 
definitely wrong...  Probably got confused because of switching back and 
forth between the two correct alternatives too often.  Therefore I will 
be committing


diff --git a/gcc/testsuite/gcc.dg/vect/pr56541.c b/gcc/testsuite/gcc.dg/vect/pr56541.c
index e1cee6d0b0e..fa86142716b 100644
--- a/gcc/testsuite/gcc.dg/vect/pr56541.c
+++ b/gcc/testsuite/gcc.dg/vect/pr56541.c
@@ -24,4 +24,4 @@ void foo()
 }
 }

-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! vect_floatint_cvt } xfail { ! vect_cond_mixed } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_floatint_cvt } xfail { ! vect_cond_mixed } } } } */


as obvious (the target should support vect_floatint_cvt for the test).

Regards
 Robin

--

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr56541.c: Do not skip vect_cond_mixed.


Re: [PATCH] arc: Add --with-fpu support for ARCv2 cpus

2021-06-09 Thread Bernhard Reutner-Fischer via Gcc-patches
On Wed, 9 Jun 2021 14:35:01 +0300
Claudiu Zissulescu  wrote:

> > ISTM you only set the expected flags in the switch so i would have
> > set only that variable and have grepped only once after the switch for
> > brevity.  
> 
> ARC has various FPU extensions, some of them are common to EM and HS 
> architectures, others are specific for only one of them. Hence, the grep 
> commands are ensuring that we accept the right fpu extension for the 
> right ARC architecture.

Right. Which you'd accomplish more tersely if you set flags_ok in
the switch cited below and after the switch have one single grep à la

if [ -n "$flags_ok" ] \
  && ! grep -q -E "^ARC_CPU[[:blank:]]*\($new_cpu,[[:blank:]]*$flags_ok," \
 ${srcdir}/config/arc/arc-cpus.def
then
  echo "Unknown floating point type used in "\
   "--with-fpu=$with_fpu for cpu $new_cpu" 1>&2
  exit 1
fi

The reason is that all case statements in the $with_fpu switch just
differ in the allowed flags AFAICS. But as you prefer.

last comments below.

> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 13c2004e3c52..09886c8635e0 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -4258,18 +4258,61 @@ case "${target}" in
>   ;;
>  
>   arc*-*-*)
> - supported_defaults="cpu"
> + supported_defaults="cpu fpu"
>  
> + new_cpu=hs38_linux
>   if [ x"$with_cpu" = x ] \
> - || grep "^ARC_CPU ($with_cpu," \
> + || grep -E "^ARC_CPU \($with_cpu," \
>  ${srcdir}/config/arc/arc-cpus.def \
>  > /dev/null; then

Cosmetics: You may want to keep the non "-E" version but use -q
and drop the redirect to /dev/null.

># Ok
> -  true
> +  new_cpu=$with_cpu
>   else
>echo "Unknown cpu used in --with-cpu=$with_cpu" 1>&2
>exit 1
>   fi
> +
> + # see if --with-fpu matches any of the supported FPUs
> + case "$with_fpu" in
> + "")
> + # OK
> + ;;
> + fpus | fpus_div | fpus_fma | fpus_all)
> + # OK if em or hs
> + if ! grep -q -E 
> "^ARC_CPU[[:blank:]]*\($new_cpu,[[:space:]]*[emhs]+," \

you changed only the first :space: to :blank: (everywhere)

> +${srcdir}/config/arc/arc-cpus.def
> + then
> +  echo "Unknown floating point type used in "\
> +  "--with-fpu=$with_fpu for cpu $new_cpu" 1>&2
> +  exit 1
> + fi
> + ;;
> + fpuda | fpuda_div | fpuda_fma | fpuda_all)
> + # OK only em
> + if ! grep -q -E 
> "^ARC_CPU[[:blank:]]*\($new_cpu,[[:space:]]*em," \
> +${srcdir}/config/arc/arc-cpus.def
> + then
> +  echo "Unknown floating point type used in "\
> +   "--with-fpu=$with_fpu for cpu $new_cpu" 1>&2
> +  exit 1
> + fi
> + ;;
> + fpud | fpud_div | fpud_fma | fpud_all)
> + # OK only hs
> + if ! grep -q -E 
> "^ARC_CPU[[:blank:]]*\($new_cpu,[[:space:]]*hs," \
> +${srcdir}/config/arc/arc-cpus.def
> + then
> +  echo "Unknown floating point type used in"\
> +   "--with-fpu=$with_fpu for cpu $new_cpu" 1>&2

missing trailing space after 'in' in "used in"\

LGTM with that fixed with or without cutting down on grep lines,
but of course i cannot approve it.
thanks,

> +  exit 1
> + fi
> + ;;
> + *)
> + echo "Unknown floating point type used in "\
> +  "--with-fpu=$with_fpu" 1>&2
> + exit 1
> + ;;
> + esac
>   ;;
>  
>  csky-*-*)



Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags

2021-06-09 Thread Richard Biener via Gcc-patches
On Tue, Jun 8, 2021 at 2:23 PM Matthias Kretz  wrote:
>
>
> From: Matthias Kretz 
>
> Explicitly support use of the stdx::simd implementation in situations
> where the user links TUs that were compiled with different -m flags. In
> general, this is always a (quasi) ODR violation for inline functions
> because at least codegen may differ in important ways. However, in the
> resulting executable only one (unspecified which one) of them might be
> used. For simd we want to support users to compile code multiple times,
> with different -m flags and have a runtime dispatch to the TU matching
> the target CPU. But if internal functions are not inlined this may lead
> to unexpected performance loss or execution of illegal instructions.
> Therefore, inline functions that are not marked as always_inline must
> use an additional template parameter somewhere in their name, to
> disambiguate between the different -m translations.

Note that excessive use of always_inline can cause compile-time issues
(see for example PR99785).  I wonder whether the inlines can be
placed in an anonymous namespace instead of the difficult to maintain
explicit list of SIMD features?  It also doesn't solve the issue when
instantiating the functions from a TU which contains #pragma GCC target
sections to switch options, of course.

Richard.

> Signed-off-by: Matthias Kretz 
>
> libstdc++-v3/ChangeLog:
>
> * include/experimental/bits/simd.h: Move feature detection bools
> and add __have_avx512bitalg, __have_avx512vbmi2,
> __have_avx512vbmi, __have_avx512ifma, __have_avx512cd,
> __have_avx512vnni, __have_avx512vpopcntdq.
> (__detail::__machine_flags): New function which returns a unique
> uint64 depending on relevant -m and -f flags.
> (__detail::__odr_helper): New type alias for either an anonymous
> type or a type specialized with the __machine_flags number.
> (_SimdIntOperators): Change template parameters from _Impl to
> _Tp, _Abi because _Impl now has an __odr_helper parameter which
> may be _OdrEnforcer from the anonymous namespace, which makes
> for a bad base class.
> (many): Either add __odr_helper template parameter or mark as
> always_inline.
> * include/experimental/bits/simd_detail.h: Add defines for
> AVX512BITALG, AVX512VBMI2, AVX512VBMI, AVX512IFMA, AVX512CD,
> AVX512VNNI, AVX512VPOPCNTDQ, and AVX512VP2INTERSECT.
> * include/experimental/bits/simd_builtin.h: Add __odr_helper
> template parameter or mark as always_inline.
> * include/experimental/bits/simd_fixed_size.h: Ditto.
> * include/experimental/bits/simd_math.h: Ditto.
> * include/experimental/bits/simd_scalar.h: Ditto.
> * include/experimental/bits/simd_neon.h: Add __odr_helper
> template parameter.
> * include/experimental/bits/simd_ppc.h: Ditto.
> * include/experimental/bits/simd_x86.h: Ditto.
> ---
>  libstdc++-v3/include/experimental/bits/simd.h | 380 --
>  .../include/experimental/bits/simd_builtin.h  |  41 +-
>  .../include/experimental/bits/simd_detail.h   |  40 ++
>  .../experimental/bits/simd_fixed_size.h   |  39 +-
>  .../include/experimental/bits/simd_math.h |  45 ++-
>  .../include/experimental/bits/simd_neon.h |   4 +-
>  .../include/experimental/bits/simd_ppc.h  |   4 +-
>  .../include/experimental/bits/simd_scalar.h   |  71 +++-
>  .../include/experimental/bits/simd_x86.h  |   4 +-
>  9 files changed, 440 insertions(+), 188 deletions(-)
>
>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  std::experimental::simd  https://github.com/VcDevel/std-simd
> ──


Re: [PATCH 0.5/2] ipa-sra: Restructure how cloning and call redirection communicate (PR 93385)

2021-06-09 Thread Martin Jambor
Hi Honza,

I'd like to ping this patch.  I know it is big but I do believe it makes
the management of information passed from clone materialization to edge
redirection much more straightforward.  And I need it to fix PR 93385
with a follow-up that has already been approved by Richi.

A recently re-based and re-tested version is below.

Thanks,

Martin

On Mon, May 10 2021, Martin Jambor wrote:
> Hi,
>
> On Mon, May 10 2021, Richard Biener wrote:
>> I've tried to have a look at this patch but it does a lot of IPA specific
>> refactoring(?), so the actual DCE bits are hard to find.  Is it possible
>> to split the patch up or is it too entangled?
>>
>
> Yes:

I was asked by Richi to split my fix for PR 93385 for easier review
into IPA-SRA materialization refactoring and the actual DCE addition.
Fortunately it was mostly natural except for a temporary weird
condition in ipa_param_body_adjustments::modify_call_stmt.
In addition to the patch I posted previously, this one also
deallocates the newly added summary in toplev::finalize and fixes
a mistakenly uninitialized field.

This is the first part which basically replaces performed_splits in
clone_info and the code which generates it, keeps it up-to-date and
consumes it with new edge summaries which are much nicer.  It simply
contains 1) a mapping from the original argument indices to the actual
indices in the call statement as it is now, 2) information needed to
identify arguments representing pass-through IPA-SRA splits which
have been added to the call arguments in place of an original
argument/reference and 3) a delta to the index where va_args may start
- so basically directly all the information that the consumer of
performed_splits had to compute and we also do not need the weird
dummy declarations.
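
As a rough illustration of the three pieces of information listed above,
here is a minimal, self-contained C++ sketch.  It is not the code from the
patch: all type and member names (edge_modification_sketch, index_map,
pass_through_split, va_start_delta, make_example_summary) are made up for
this example, and it uses standard containers rather than GCC's vec<>.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for one pass-through split piece; the field names
// are illustrative only and do not match the types added by the patch.
struct pass_through_split
{
  int base_index;        // index of the original (split) parameter
  unsigned unit_offset;  // offset of the piece within the original aggregate
  int new_index;         // position of the piece in the current call statement
};

// Hypothetical per-call-edge summary holding the three kinds of data.
struct edge_modification_sketch
{
  // 1) index_map[i] is the position of original argument i in the call
  //    statement as it is now, or -1 if the argument is no longer passed.
  std::vector<int> index_map;
  // 2) Pieces passed in place of an original argument or reference.
  std::vector<pass_through_split> pass_through_map;
  // 3) Shift to apply to the index where variadic arguments may start.
  int va_start_delta = 0;

  // Look up where an original argument lives now, if it survived.
  std::optional<int> current_index (std::size_t orig) const
  {
    if (orig >= index_map.size () || index_map[orig] < 0)
      return std::nullopt;
    return index_map[orig];
  }
};

// Summary for a call f (a, s, b) whose aggregate argument s (original
// index 1) was split into two scalar pieces: f (a, s.x, s.y, b).
static edge_modification_sketch
make_example_summary ()
{
  edge_modification_sketch sum;
  sum.index_map = {0, -1, 3};
  sum.pass_through_map = {{1, 0, 1}, {1, 4, 2}};
  sum.va_start_delta = 1;  // the call has one more argument than before
  return sum;
}
```

With a summary shaped like this, a consumer can read the current argument
positions directly instead of recomputing them from dummy declarations.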

The main disadvantage is that the information has to be created (and
kept up-to-date) for all call graph edges associated with the given
statement from all clones (including inline clones) of the clone where
splitting or removal happened first.  But all of this happens during
clone materialization so the only effect on WPA memory consumption is
the removal of a pointer from clone_info.

The statement modification code also has to know the statement from
the original function in order to be able to locate the edge summaries
which at this point are still keyed to these.  However, the code is
already quite heavily dependent on how things are structured in
tree-inline.c and in order to fix bugs like these it probably has to
be.

The subsequent patch needs this new information to be able to remove
arguments from calls during materialization and communicate this
information to the call redirection.

2021-05-17  Martin Jambor  

PR ipa/93385
* symtab-clones.h (clone_info): Removed member param_adjustments.
* ipa-param-manipulation.h: Adjust initial comment to reflect how we
deal with pass-through splits now.
(ipa_param_performed_split): Removed.
(ipa_param_adjustments::modify_call): Adjusted parameters.
(class ipa_param_body_adjustments): Adjusted parameters of
register_replacement, modify_gimple_stmt and modify_call_stmt.
(ipa_verify_edge_has_no_modifications): Declare.
(ipa_edge_modifications_finalize): Declare.
* cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Remove
performed_splits processing, pass only edge to padjs->modify_call,
check that call arguments were not modified if they should not have
been.
* cgraphclones.c (cgraph_node::create_clone): Do not copy performed
splits.
* ipa-param-manipulation.c (struct pass_through_split_map): New type.
(ipa_edge_modification_info): Likewise.
(ipa_edge_modification_sum): Likewise.
(ipa_edge_modifications): New edge summary.
(ipa_verify_edge_has_no_modifications): New function.
(transitive_split_p): Removed.
(transitive_split_map): Likewise.
(init_transitive_splits): Likewise.
(ipa_param_adjustments::modify_call): Adjusted to use the new edge
summary instead of performed_splits.
(ipa_param_body_adjustments::register_replacement): Drop dummy
parameter, set base_index of the created ipa_param_body_replacement.
(phi_arg_will_live_p): New function.
(ipa_param_body_adjustments::common_initialization): Do not create
IPA_SRA dummy decls.
(simple_tree_swap_info): Removed.
(remap_split_decl_to_dummy): Likewise.
(record_argument_state_1): New function.
(record_argument_state): Likewise.
(ipa_param_body_adjustments::modify_call_stmt): New parameter
orig_stmt.  Do not work with dummy decls, save necessary info about
changes to ipa_edge_modifications.
(ipa_param_body_adjustments::modify_gimple_stmt): New parameter
orig_stmt, pass it to modify_call_stmt.

[Patch ]Fortran/OpenMP: Extend defaultmap clause for OpenMP 5 [PR92568]

2021-06-09 Thread Tobias Burnus

This patch adds OpenMP 5.1's defaultmap extensions to Fortran.

There is one odd thing,
  integer :: ii, it
  target :: it
both count as nonallocatable, nonpointer scalars (i.e. category 'scalar').
But with implicit mapping (and 'defaultmap(default)'), 'it' is mapped
tofrom due to the TARGET attribute (cf. quote in the PR).

I also had fun with scalar vs. pointer, but solved it by adding an
additional argument, which solves the problems with different use.
(nonpointer/nonallocatable scalar vs. all scalars).

Tobias

PS: The run-time testcase (libgomp.fortran/defaultmap-8.f90) shows two
issues (cf. PR fortran/100991 + PR fortran/90742). Namely,
(a) optional scalars are not recognized as scalar (first PR),
(b) firstprivate does not handle absent scalars nor allocatables
(and all complex objects).

PPS:  There seem to be also issues with character handling, at least I get some
run-time errors with -fsanitize=address,undefined for the 'dg-do compile' tests,
which I have not debugged. (I think they are real issues and not test-case
issues, but I have not yet checked this.)

PPPS: In principle, also OpenACC should use 'copy'/'map(tofrom:' instead of
'firstprivate' for allocatable/pointer scalars – but testing shows that
this patch does not affect OpenACC (at least not for my testcase).

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
Fortran/OpenMP: Extend defaultmap clause for OpenMP 5 [PR92568]

	PR fortran/92568

gcc/fortran/ChangeLog:

	* dump-parse-tree.c (show_omp_clauses): Update for defaultmap.
	* f95-lang.c (LANG_HOOKS_OMP_ALLOCATABLE_P,
	LANG_HOOKS_OMP_SCALAR_TARGET_P): New.
	* gfortran.h (enum gfc_omp_defaultmap,
	enum gfc_omp_dfltmpap_category): New.
	* openmp.c (gfc_match_omp_clauses): Update defaultmap matching.
	* trans-decl.c (gfc_finish_decl_attrs): Set GFC_DECL_SCALAR_TARGET.
	* trans-openmp.c (gfc_omp_allocatable_p, gfc_omp_scalar_target_p): New.
	(gfc_omp_scalar_p): Take 'ptr_alloc_ok' argument.
	(gfc_trans_omp_clauses, gfc_split_omp_clauses): Update for
	defaultmap changes.
	* trans.h (gfc_omp_scalar_p): Update prototype.
	(gfc_omp_allocatable_p, gfc_omp_scalar_target_p): New.
	(struct lang_decl): Add scalar_target.
	(GFC_DECL_SCALAR_TARGET, GFC_DECL_GET_SCALAR_TARGET): New.

gcc/ChangeLog:

	* gimplify.c (enum gimplify_defaultmap_kind): Add GDMK_SCALAR_TARGET.
	(struct gimplify_omp_ctx): Extend defaultmap array by one.
	(new_omp_context): Init defaultmap[GDMK_SCALAR_TARGET].
	(omp_notice_variable): Update type classification for Fortran.
	(gimplify_scan_omp_clauses): Update calls for new argument; handle
	GDMK_SCALAR_TARGET; for Fortran, GDMK_POINTER avoid GOVD_MAP_0LEN_ARRAY.
	* langhooks-def.h (lhd_omp_scalar_p): Add 'ptr_ok' argument.
	* langhooks.c (lhd_omp_scalar_p): Likewise.
	(LANG_HOOKS_OMP_ALLOCATABLE_P, LANG_HOOKS_OMP_SCALAR_TARGET_P): New.
	(LANG_HOOKS_DECLS): Add them.
	* langhooks.h (struct lang_hooks_for_decls): Add new hooks, update
	omp_scalar_p pointer type to include the new bool argument.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/defaultmap-8.f90: New test.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/pr99928-1.f90: Uncomment 'defaultmap(none)'.
	* gfortran.dg/gomp/pr99928-2.f90: Uncomment 'defaultmap(none)'.
	* gfortran.dg/gomp/pr99928-3.f90: Uncomment 'defaultmap(none)'.
	* gfortran.dg/gomp/pr99928-4.f90: Uncomment 'defaultmap(none)'.
	* gfortran.dg/gomp/pr99928-5.f90: Uncomment 'defaultmap(none)'.
	* gfortran.dg/gomp/pr99928-6.f90: Uncomment 'defaultmap(none)'.
	* gfortran.dg/gomp/pr99928-8.f90: Uncomment 'defaultmap(none)'.
	* gfortran.dg/gomp/defaultmap-1.f90: New test.
	* gfortran.dg/gomp/defaultmap-2.f90: New test.
	* gfortran.dg/gomp/defaultmap-3.f90: New test.
	* gfortran.dg/gomp/defaultmap-4.f90: New test.
	* gfortran.dg/gomp/defaultmap-5.f90: New test.
	* gfortran.dg/gomp/defaultmap-6.f90: New test.
	* gfortran.dg/gomp/defaultmap-7.f90: New test.

 gcc/fortran/dump-parse-tree.c  |  38 ++-
 gcc/fortran/f95-lang.c |   4 +
 gcc/fortran/gfortran.h |  26 +-
 gcc/fortran/openmp.c   |  83 +-
 gcc/fortran/trans-decl.c   |   5 +
 gcc/fortran/trans-openmp.c |  91 ++-
 gcc/fortran/trans.h|   9 +-
 gcc/gimplify.c |  35 ++-
 gcc/langhooks-def.h|   6 +-
 gcc/langhooks.c|   7 +-
 gcc/langhooks.h|  13 +-
 gcc/testsuite/gfortran.dg/gomp/defaultmap-1.f90|  19 ++
 gcc/testsuite/gfortran.dg/gomp/defaultmap-2.f90| 108 
 gcc/testsuite/gfortran.dg/gomp/defaultmap-3.f90|  60 +
 gcc/testsuite/gfortran.dg/gomp/defaultmap-4.f90| 141 +++
 gcc/testsuite/gfortran.dg/gomp/defaultmap-5.f90| 145 

Re: [PATCH v2] Always enable DT_INIT_ARRAY/DT_FINI_ARRAY on Linux

2021-06-09 Thread Richard Biener via Gcc-patches
On Wed, Jun 9, 2021 at 2:05 PM H.J. Lu  wrote:
>
> On Wed, Jun 9, 2021 at 4:00 AM Richard Biener
>  wrote:
> >
> > On Wed, Jun 9, 2021 at 1:13 AM H.J. Lu via Gcc-patches
> >  wrote:
> > >
> > > DT_INIT_ARRAY/DT_FINI_ARRAY support was added to glibc by
> > >
> > > commit fcf70d4114db9ff7923f5dfeb3fea6e2d623e5c2
> > > Author: Ulrich Drepper 
> > > Date:   Sat Jul 24 19:45:13 1999 +
> > >
> > > Update.
> > >
> > > 1999-07-24  Ulrich Drepper  
> > >
> > > * elf/dl-fini.c: Handle DT_FINI_ARRAY.
> > > * elf/link.h (struct link_map): Remove l_init_running.  Add 
> > > l_runcount
> > > and l_initcount.
> > > * elf/dl-init.c: Handle DT_INIT_ARRAY.
> > > ...
> > >
> > > PR target/100896
> > > * config.gcc (gcc_cv_initfini_array): Set to yes for Linux and
> > > GNU targets.
> > > ---
> > >  gcc/config.gcc | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > > index 6833a6c13d9..4dc4fe0b65c 100644
> > > --- a/gcc/config.gcc
> > > +++ b/gcc/config.gcc
> > > @@ -848,6 +848,8 @@ case ${target} in
> > >tmake_file="${tmake_file} t-glibc"
> > >target_has_targetcm=yes
> > >target_has_targetdm=yes
> > > +  # Linux targets always support .init_array.
> >
> > Other *linux targets specifically mention
> >
> > # Force .init_array support.  The configure script cannot always
> > # automatically detect that GAS supports it, yet we require it.
> > gcc_cv_initfini_array=yes
> >
> > and thus involve binutils.  Can you please change the comment
> > to mention the glibc and binutils versions required?  It might
>
> Done.  They are glibc 2.1 and binutils 2.12.
>
> > be good to mention those in install.texi as minimal versions given
>
> Fixed.
>
> > they are not trumped by already stricter requirements.
> >
> > Otherwise I think this is OK.
> >
> > > +  gcc_cv_initfini_array=yes
> > >;;
> > >  *-*-netbsd*)
> > >tm_p_file="${tm_p_file} netbsd-protos.h"
> > > --
> > > 2.31.1
> > >
>
> Here is the v2 patch.  OK for master and release branches?

OK for master, this isn't appropriate for release branches.

Richard.

> Thanks.
>
> --
> H.J.


[PATCH v2] Always enable DT_INIT_ARRAY/DT_FINI_ARRAY on Linux

2021-06-09 Thread H.J. Lu via Gcc-patches
On Wed, Jun 9, 2021 at 4:00 AM Richard Biener
 wrote:
>
> On Wed, Jun 9, 2021 at 1:13 AM H.J. Lu via Gcc-patches
>  wrote:
> >
> > DT_INIT_ARRAY/DT_FINI_ARRAY support was added to glibc by
> >
> > commit fcf70d4114db9ff7923f5dfeb3fea6e2d623e5c2
> > Author: Ulrich Drepper 
> > Date:   Sat Jul 24 19:45:13 1999 +
> >
> > Update.
> >
> > 1999-07-24  Ulrich Drepper  
> >
> > * elf/dl-fini.c: Handle DT_FINI_ARRAY.
> > * elf/link.h (struct link_map): Remove l_init_running.  Add 
> > l_runcount
> > and l_initcount.
> > * elf/dl-init.c: Handle DT_INIT_ARRAY.
> > ...
> >
> > PR target/100896
> > * config.gcc (gcc_cv_initfini_array): Set to yes for Linux and
> > GNU targets.
> > ---
> >  gcc/config.gcc | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 6833a6c13d9..4dc4fe0b65c 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -848,6 +848,8 @@ case ${target} in
> >tmake_file="${tmake_file} t-glibc"
> >target_has_targetcm=yes
> >target_has_targetdm=yes
> > +  # Linux targets always support .init_array.
>
> Other *linux targets specifically mention
>
> # Force .init_array support.  The configure script cannot always
> # automatically detect that GAS supports it, yet we require it.
> gcc_cv_initfini_array=yes
>
> and thus involve binutils.  Can you please change the comment
> to mention the glibc and binutils versions required?  It might

Done.  They are glibc 2.1 and binutils 2.12.

> be good to mention those in install.texi as minimal versions given

Fixed.

> they are not trumped by already stricter requirements.
>
> Otherwise I think this is OK.
>
> > +  gcc_cv_initfini_array=yes
> >;;
> >  *-*-netbsd*)
> >tm_p_file="${tm_p_file} netbsd-protos.h"
> > --
> > 2.31.1
> >

Here is the v2 patch.  OK for master and release branches?

Thanks.

-- 
H.J.
From 1ab48e7abf836689581ce007d920649aecebd623 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 8 Jun 2021 16:09:24 -0700
Subject: [PATCH v2] Always enable DT_INIT_ARRAY/DT_FINI_ARRAY on Linux

DT_INIT_ARRAY/DT_FINI_ARRAY support was added to glibc 2.1 by

commit fcf70d4114db9ff7923f5dfeb3fea6e2d623e5c2
Author: Ulrich Drepper 
Date:   Sat Jul 24 19:45:13 1999 +

Update.

1999-07-24  Ulrich Drepper  

* elf/dl-fini.c: Handle DT_FINI_ARRAY.
* elf/link.h (struct link_map): Remove l_init_running.  Add l_runcount
and l_initcount.
* elf/dl-init.c: Handle DT_INIT_ARRAY.
...

and added to binutils 2.12 by

commit e9682144c14fc809af72bd6c0b8c69731d38679c
Author: H.J. Lu 
Date:   Mon Mar 4 20:40:48 2002 +

2002-03-04  H.J. Lu 

* config/obj-elf.c (special_section): Add .init_array,
.fini_array and .preinit_array.

* config/tc-ia64.h (ELF_TC_SPECIAL_SECTIONS): Remove
.init_array and .fini_array.

gcc/

	PR target/100896
	* config.gcc (gcc_cv_initfini_array): Set to yes for Linux and
	GNU targets.
	* doc/install.texi: Require glibc 2.1 and binutils 2.12 for
	Linux and GNU targets.
---
 gcc/config.gcc   | 2 ++
 gcc/doc/install.texi | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6833a6c13d9..4dc4fe0b65c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -848,6 +848,8 @@ case ${target} in
   tmake_file="${tmake_file} t-glibc"
   target_has_targetcm=yes
   target_has_targetdm=yes
+  # Linux targets always support .init_array.
+  gcc_cv_initfini_array=yes
   ;;
 *-*-netbsd*)
   tm_p_file="${tm_p_file} netbsd-protos.h"
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 591ccaacbc1..94b167a3ed3 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -4126,6 +4126,9 @@ supported, so @option{--enable-threads=dce} does not work.
 @end html
 @anchor{x-x-linux-gnu}
 @heading *-*-linux-gnu
+The @code{.init_array} and @code{.fini_array} sections are enabled
+unconditionally which requires at least glibc 2.1 and binutils 2.12.
+
 Versions of libstdc++-v3 starting with 3.2.1 require bug fixes present
 in glibc 2.2.5 and later.  More information is available in the
 libstdc++-v3 documentation.
-- 
2.31.1
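
For readers unfamiliar with what .init_array is used for, here is a small
sketch, assuming GCC or Clang on an ELF target.  The names init_ran,
early_init and constructor_has_run are made up for this example.  As far as
I understand it, when .init_array support is enabled the compiler records a
pointer to such a constructor function in the .init_array section, and the
startup code runs it before main ().

```cpp
#include <cassert>

// Set by the constructor below, before main () starts.
static int init_ran = 0;

// GNU extension (GCC and Clang).  On ELF targets configured with
// .init_array support, a pointer to this function is expected to end up
// in the .init_array section rather than in .ctors.
__attribute__ ((constructor)) static void
early_init ()
{
  init_ran = 1;
}

// Helper so callers can observe that the constructor already ran.
static bool
constructor_has_run ()
{
  return init_ran == 1;
}
```

Calling constructor_has_run () from main () should return true, since the
constructor is run by the startup code and not by the program itself.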



Re: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-06-09 Thread Segher Boessenkool
Hi!

On Wed, Jun 09, 2021 at 04:03:43PM +0800, Xionghu Luo wrote:
> >>--- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
> >>+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
> >>@@ -317,10 +317,10 @@ int main ()
> >>  /* { dg-final { scan-assembler-times "vctuxs" 2 } } */
> >>  
> >>  /* { dg-final { scan-assembler-times "vmrghb" 4 { target be } } } */
> >>-/* { dg-final { scan-assembler-times "vmrghb" 5 { target le } } } */
> >>+/* { dg-final { scan-assembler-times "vmrghb" 6 { target le } } } */
> >>  /* { dg-final { scan-assembler-times "vmrghh" 8 } } */
> >>-/* { dg-final { scan-assembler-times "xxmrghw" 8 } } */
> >>-/* { dg-final { scan-assembler-times "xxmrglw" 8 } } */
> >>+/* { dg-final { scan-assembler-times "xxmrghw" 4 } } */
> >>+/* { dg-final { scan-assembler-times "xxmrglw" 4 } } */
> >>  /* { dg-final { scan-assembler-times "vmrglh" 8 } } */
> >>  /* { dg-final { scan-assembler-times "xxlnor" 6 } } */
> >>  /* { dg-final { scan-assembler-times {\mvpkudus\M} 1 } } */
> >>@@ -347,7 +347,7 @@ int main ()
> >>  /* { dg-final { scan-assembler-times "vspltb" 6 } } */
> >>  /* { dg-final { scan-assembler-times "vspltw" 0 } } */
> >>  /* { dg-final { scan-assembler-times "vmrgow" 8 } } */
> >>-/* { dg-final { scan-assembler-times "vmrglb" 5 { target le } } } */
> >>+/* { dg-final { scan-assembler-times "vmrglb" 4 { target le } } } */
> >>  /* { dg-final { scan-assembler-times "vmrglb" 6 { target be } } } */
> >>  /* { dg-final { scan-assembler-times "vmrgew" 8 } } */
> >>  /* { dg-final { scan-assembler-times "vsplth" 8 } } */
> >
> >Are those changes correct?  It looks like a vmrglb became a vmrghb, and
> >that 4 each of xxmrghw and xxmrglw disappeared?  Both seem wrong?
> 
> 
> This case is built with "-mdejagnu-cpu=power8 -O0 -mno-fold-gimple -dp"
> and it also counted the generated instruction patterns.
> 
> 1) "vsx_xxmrghw_v4si" is replaced by "altivec_vmrglw_direct_v4si/0", so 
> it decreases from 8 to 4. (Likewise for vsx_xxmrglw_v4si.)
> 
> li 9,48  # 1282 [c=4 l=4]  *movdi_internal64/3
> -   lxvd2x 0,31,9# 31   [c=8 l=4]  *vsx_lxvd2x4_le_v4si
> -   xxpermdi 0,0,0,2 # 32   [c=4 l=4]  xxswapd_v4si
> -   xxmrglw 0,0,12   # 33   [c=4 l=4]  vsx_xxmrghw_v4si
> +   lxvd2x 12,31,9   # 31   [c=8 l=4]  *vsx_lxvd2x4_le_v4si
> +   xxpermdi 12,12,12,2  # 32   [c=4 l=4]  xxswapd_v4si
> +   xxmrglw 0,12,0   # 33   [c=4 l=4]  altivec_vmrglw_direct_v4si/0
> xxpermdi 0,0,0,2 # 35   [c=4 l=4]  xxswapd_v4sf
> 
> Note that v0 and v12 are swapped in lxvd2x; these 3 new instructions
> produce the same result as before.

And there was one xxmrglw in this snippet before, and now there still
is only one.

But, the testcase uses -dp, I see.  Please use \m and \M in the scans,
it helps :-)  (And convert more than just the few that hit errors ;-) )

(You may want to do that as a separate patch before this one, to make
counting easier (also for me ;-) ).)

(I'll review the new patch later today).
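
To illustrate why word anchors matter when the output also contains
pattern names, here is a small C++ sketch.  It is only an analogy: it uses
std::regex and its \b word boundary as a stand-in for Tcl's \m and \M, and
the helper names count_substring and count_word are made up.  The sample
line is the old -dp output line quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count plain substring occurrences, roughly what an unanchored
// scan-assembler-times pattern does.
static std::size_t
count_substring (const std::string &text, const std::string &needle)
{
  std::size_t n = 0;
  for (std::size_t pos = text.find (needle); pos != std::string::npos;
       pos = text.find (needle, pos + 1))
    ++n;
  return n;
}

// Count whole-word occurrences.  std::regex's \b stands in for Tcl's
// \m (start of word) and \M (end of word).
static std::size_t
count_word (const std::string &text, const std::string &word)
{
  const std::regex re ("\\b" + word + "\\b");
  auto first = std::sregex_iterator (text.begin (), text.end (), re);
  auto last = std::sregex_iterator ();
  return static_cast<std::size_t> (std::distance (first, last));
}

// Old output line from the mail: the mnemonic is xxmrglw, but the pattern
// name in the -dp comment contains "xxmrghw".
static const std::string sample_line
  = "xxmrglw 0,0,12   # 33   [c=4 l=4]  vsx_xxmrghw_v4si";
```

Here the unanchored search for "xxmrghw" hits the pattern name even though
no xxmrghw instruction is present, while the word-anchored search does not,
because the underscore before it counts as a word character.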


Segher


Re: [PATCH] Use range based loops to iterate over vec<> in various places

2021-06-09 Thread Richard Biener via Gcc-patches
On Wed, Jun 9, 2021 at 1:32 PM Trevor Saunders  wrote:
>
> On Wed, Jun 09, 2021 at 01:06:44PM +0200, Richard Biener wrote:
> > On Wed, Jun 9, 2021 at 2:48 AM Trevor Saunders  
> > wrote:
> > >
> > > Hello,
> > >
> > > This makes things a good bit shorter, and reduces complexity by removing
> > > a bunch of index variables.
> > >
> > > bootstrapped and regtested on x86_64-linux-gnu, ok?
> >
> > I'd call the cases where you are able to remove the iterator variable
> > declarations obvious, but there are some where the element variable
> > remains declared and thus one wonders if the last elem initialization
> > is used.  Splitting the patch into the obvious (pre-approved) and
> > not-so obvious parts would be nice.  The not-so obvious pieces would
> > be more obvious if the retained decl were moved down to its first
> > use.
>
> Yeah, sorry it's a long patch, and that's a sensible idea for making it
> more manageable, sorry I didn't think of it the first time.  There's also
> cases where people use the index within the loop for something, kind of
> peeking through the "abstraction" of the macro if you want to see it
> that way.
>
> > That said - how many FOR_EACH_VEC_ELT macro invocations
> > remain?  Can we remove it?
>
> Very many, this is maybe a hundred of what started as about 1000 uses.
> There are certainly more cases that can be converted over, but I needed
> to stop at some point.  There's also a bunch of cases that use the index
> for something, usually either checking if it's the last element when
> printing a list, or using the index to index into the vector or
> something else.  However I suppose those uses might be better as for
> (unsigned i = 0; i < vec.size (); i++).  I'll see about splitting out the
> obvious cases and finding more of those and once that's done we can see
> about the rest.

OK, agreed.

Thanks,
Richard.
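
For reference, the kind of conversion discussed in this thread can be
sketched as follows.  This is an illustration only: it uses std::vector as
a stand-in for GCC's vec<>, a hand-written index loop in place of
FOR_EACH_VEC_ELT, and made-up function names (sum_indexed, sum_range,
join).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Index-based iteration, similar in spirit to a FOR_EACH_VEC_ELT loop:
// both the index and the element variable are declared outside the loop.
static int
sum_indexed (const std::vector<int> &v)
{
  int total = 0;
  unsigned i;
  int elt;
  for (i = 0; i < v.size (); i++)
    {
      elt = v[i];
      total += elt;
    }
  return total;
}

// The same loop as a range-based for: no index variable, and the element
// variable is scoped to the loop, so it cannot be used afterwards.
static int
sum_range (const std::vector<int> &v)
{
  int total = 0;
  for (int elt : v)
    total += elt;
  return total;
}

// A case where the index is needed: no separator after the last element
// when printing a list.  An explicit index loop is kept here.
static std::string
join (const std::vector<int> &v)
{
  std::string out;
  for (unsigned i = 0; i < v.size (); i++)
    {
      out += std::to_string (v[i]);
      if (i + 1 != v.size ())
        out += ", ";
    }
  return out;
}
```

The first two functions are equivalent, which is the "obvious" kind of
conversion.  The third shows a use of the index that a plain range-based
loop does not cover.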

> thanks
>
> Trev
>
> >
> > Thanks,
> > Richard.
> >
> > > Trev
> > >
> > > gcc/analyzer/ChangeLog:
> > >
> > > * call-string.cc (call_string::call_string): Iterate over vec<>
> > > with range based for.
> > > (call_string::operator=): Likewise.
> > > (call_string::to_json): Likewise.
> > > (call_string::hash): Likewise.
> > > (call_string::calc_recursion_depth): Likewise.
> > > * checker-path.cc (checker_path::fixup_locations): Likewise.
> > > * constraint-manager.cc (equiv_class::equiv_class): Likewise.
> > > (equiv_class::to_json): Likewise.
> > > (equiv_class::hash): Likewise.
> > > (constraint_manager::constraint_manager): Likewise.
> > > (constraint_manager::operator=): Likewise.
> > > (constraint_manager::hash): Likewise.
> > > (constraint_manager::to_json): Likewise.
> > > (constraint_manager::add_unknown_constraint): Likewise.
> > > * engine.cc (impl_region_model_context::on_svalue_leak):
> > > Likewise.
> > > (on_liveness_change): Likewise.
> > > (impl_region_model_context::on_unknown_change): Likewise.
> > > * program-state.cc (extrinsic_state::to_json): Likewise.
> > > (sm_state_map::set_state): Likewise.
> > > * region-model.cc (make_test_compound_type): Likewise.
> > > (test_canonicalization_4): Likewise.
> > >
> > > gcc/ChangeLog:
> > >
> > > * auto-profile.c (afdo_find_equiv_class): Iterate over vec<>
> > > with range based for.
> > > * cgraphclones.c (cgraph_node::create_clone): Likewise.
> > > (cgraph_node::create_version_clone): Likewise.
> > > * dwarf2out.c (output_call_frame_info): Likewise.
> > > * gcc.c (do_specs_vec): Likewise.
> > > (do_spec_1): Likewise.
> > > (driver::set_up_specs): Likewise.
> > > * gimple-loop-jam.c (any_access_function_variant_p): Likewise.
> > > * ifcvt.c (cond_move_process_if_block): Likewise.
> > > * ipa-modref.c (modref_lattice::add_escape_point): Likewise.
> > > (analyze_parms): Likewise.
> > > (modref_write_escape_summary): Likewise.
> > > (update_escape_summary_1): Likewise.
> > > * ipa-prop.h (ipa_copy_agg_values): Likewise.
> > > (ipa_release_agg_values): Likewise.
> > > * lower-subreg.c (decompose_multiword_subregs): Likewise.
> > > * lto-streamer-out.c (DFS::DFS_write_tree_body): Likewise.
> > > (hash_tree): Likewise.
> > > (prune_offload_funcs): Likewise.
> > > * sel-sched-dump.c (dump_insn_vector): Likewise.
> > > * timevar.c (timer::named_items::print): Likewise.
> > > * tree-cfgcleanup.c (cleanup_control_flow_pre): Likewise.
> > > (cleanup_tree_cfg_noloop): Likewise.
> > > * tree-data-ref.c (dump_data_references): Likewise.
> > > (print_dir_vectors): Likewise.
> > > (print_dist_vectors): Likewise.
> > > (dump_data_dependence_relation): Likewise.
> > > (dump_data_dependence_relations): Likewise.
> > > 

Re: [PATCH] PR tree-optimization/100781 - Do not calculate new values when evaluating a debug, statement.

2021-06-09 Thread Richard Biener via Gcc-patches
On Tue, Jun 8, 2021 at 4:48 PM Andrew MacLeod  wrote:
>
> On 6/2/21 3:29 AM, Richard Biener wrote:
> > On Tue, Jun 1, 2021 at 4:24 PM Andrew MacLeod  wrote:
> >> On 6/1/21 3:34 AM, Richard Biener wrote:
> >>> On Tue, Jun 1, 2021 at 3:38 AM Andrew MacLeod via Gcc-patches
> >>>  wrote:
>  An ongoing issue is that the order we evaluate things in can affect
>  decisions along the way. As ranger isn't a fully iterative pass, we can
>  sometimes come up with different results if back edges are processed in
>  different orders.
> 
>  One of the ways this can happen is when the cache is propagating
>  on-entry values for an SSA_NAME. It calculates outgoing edge values and
>  the gori-compute engine can flag ssa-names that were involved in a range
>  calculation that have not yet been initialized.  When the propagation
>  for the original name is done, it goes back and examines the "poor
>  values" and tries to quickly calculate a better range, and if it comes
>  up with one, immediately tries to go back  and update the location/range
>  gori_compute flagged.   This produces better ranges earlier.
> 
>  However, when we do this in different orders, we can get different
>  results.  We were processing the uses on is_gimple_debug statements just
>  like normal uses, and this would sometimes cause a difference in how
>  things were resolved.
> 
>  This patch adds a flag to enable/disable this attempt to look up new
>  values, and when range_of_expr is processing the use on a debug
>  statement, turns it off for the query.  This means the query will never
>  cause a new lookup, and this should resolve all the -fcompare-debug 
>  issues.
> 
>  Bootstrapped on x86_64-pc-linux-gnu, with no new regressions. Pushed.
> >>> Please check if such fixes also apply to the GCC 11 branch.
> >>>
> >>> Richard.
> >>>
> >>>
> >> I've checked both testcases against gcc11 release, and neither is an
> >> issue there.  Much of this was triggered by changes to the export list.
> >> That said, is there potential for it to surface? The potential is
> >> probably there.   We'd have to address it differently tho.  For the
> >> gcc11 release, since we always run in hybrid mode it doesn't really
> >> matter if ranger looks up ranges for debug statements... EVRP will still
> >> pick up what we used to get for them.  We could simply disable looking
> >> for contextual ranges for is_gimple_stmt and simply pick up the best
> >> known global/on-entry value available..   I can either provide a patch
> >> for that now, or deal with it if we ever get a PR.  I'm ok either way.
> > I think it would be good to robustify the code even w/o a PR.
> >
> >> btw, when is the next point release? I added an infrastructure patch to
> >> trunk (https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569884.html)
> >> to enable replacing the on-entry cache to deal with memory consumption
> >> issues like in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100299 .  I
> >> specifically put it in early before the other changes so that it could
> >> be directly applied to gcc11 as well, but I need to follow up with one
> >> of the replacements I have queued up to look at if we are interested in
> >> fixing this in gcc 11.  I'll bump the priority to try to hit the next
> >> release if that's the case.
> > The first point release is usually about two months from the initial release
> > which means in about a month and a half.  It would be nice to fix
> > those issues and the earlier in the release series the better.
> >
> > Richard.
> >
> >> Andrew
> >>
> OK, so this would be the simple way I'd tackle this in gcc11. This
> should be quite safe.  Just treat debug_stmts as if they are not stmts..
> and make a global query.   EVRP will still provide a contextual range as
> good as it ever did, but it won't trigger ranger lookups on debug uses
> any more.
>
> It bootstraps on x86_64-pc-linux-gnu.  Is there a process other than
> getting the OK to check this into the gcc 11 branch?  Does it go into
> releases/gcc-11 ?
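
The idea described above (answer queries for debug statements from the
global value only, without triggering new lookups) can be modelled with a
toy example.  This is a sketch under stated assumptions and not ranger
code: toy_stmt, toy_range and toy_ranger are invented stand-ins, and the
fixed ranges returned by the helpers are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy stand-ins; none of these are GCC types.
struct toy_stmt
{
  bool is_debug;
};

struct toy_range
{
  int lo, hi;
};

class toy_ranger
{
public:
  // Range for NAME as seen at STMT.  With no statement, or with a debug
  // statement, only the global value is returned and nothing is computed
  // or cached, so debug statements cannot influence later answers.
  toy_range range_of_expr (const std::string &name, const toy_stmt *stmt)
  {
    if (!stmt || stmt->is_debug)
      return global_value (name);
    auto it = m_cache.find (name);
    if (it == m_cache.end ())
      it = m_cache.emplace (name, compute_contextual (name)).first;
    return it->second;
  }

  std::size_t cached_entries () const { return m_cache.size (); }

private:
  // Conservative range that is always available.
  static toy_range global_value (const std::string &) { return {0, 255}; }
  // Pretend a contextual lookup found a tighter range.
  static toy_range compute_contextual (const std::string &) { return {0, 9}; }

  std::map<std::string, toy_range> m_cache;
};
```

In this model a query on a debug statement leaves the cache untouched, so
compiling with or without debug statements sees the same cache contents.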

it would go into releases/gcc-11, yes.

Now,

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 6158a754dd6..fd7fa5e3dbb 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -945,7 +945,7 @@ gimple_ranger::range_of_expr (irange &r, tree
expr, gimple *stmt)
 return get_tree_range (r, expr);

   // If there is no statement, just get the global value.
-  if (!stmt)
+  if (!stmt || is_gimple_debug (stmt))
 {

unfortunately the function is not documented so I'm just guessing here - why
do we end up passing in a debug stmt as 'stmt'?  (how should expr and stmt
relate?)  So isn't it better to do this check before

  if (!gimple_range_ssa_p (expr))
return get_tree_range (r, expr);

or even better, assert we don't get a debug stmt here and fixup whoever
calls range_of_expr to not do that for debug stmts?  When I add this
assertion not even libgcc 

[committed] libstdc++: Fix constraint on std::optional assignment [PR 100982]

2021-06-09 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

PR libstdc++/100982
* include/std/optional (optional::operator=(const optional&)):
Fix value category used in is_assignable check.
* testsuite/20_util/optional/assignment/100982.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit b3fce1bd45f72cc9e55fb7431762e92e30fefcf1
Author: Jonathan Wakely 
Date:   Wed Jun 9 11:03:15 2021

libstdc++: Fix constraint on std::optional assignment [PR 100982]

libstdc++-v3/ChangeLog:

PR libstdc++/100982
* include/std/optional (optional::operator=(const optional&)):
Fix value category used in is_assignable check.
* testsuite/20_util/optional/assignment/100982.cc: New test.

diff --git a/libstdc++-v3/include/std/optional 
b/libstdc++-v3/include/std/optional
index 415f8c49ef4..0a67ce24bbd 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -815,7 +815,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Up>
enable_if_t<__and_v<__not_<is_same<_Tp, _Up>>,
is_constructible<_Tp, const _Up&>,
-   is_assignable<_Tp&, _Up>,
+   is_assignable<_Tp&, const _Up&>,
__not_<__converts_from_optional<_Tp, _Up>>,
__not_<__assigns_from_optional<_Tp, _Up>>>,
optional&>
diff --git a/libstdc++-v3/testsuite/20_util/optional/assignment/100982.cc 
b/libstdc++-v3/testsuite/20_util/optional/assignment/100982.cc
new file mode 100644
index 000..ae565250d68
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/optional/assignment/100982.cc
@@ -0,0 +1,17 @@
+// { dg-do compile { target c++17 } }
+
+#include <optional>
+
+struct U {};
+
+struct T {
+  explicit T(const U&);
+  T& operator=(const U&);
+  T& operator=(U&&) = delete;
+};
+
+int main() {
+  std::optional<U> opt1;
+  std::optional<T> opt2;
+  opt2 = opt1; // PR libstdc++/100982
+}
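
The value category issue fixed here can be checked in isolation with the
type trait itself.  The sketch below mirrors the two types from the new
test under the made-up names Source and Target (U and T in the test) and
needs C++17.  The variable names from_rvalue and from_const_lvalue are
also just for this example.

```cpp
#include <type_traits>

struct Source {};

struct Target
{
  explicit Target (const Source &);
  Target &operator= (const Source &);
  Target &operator= (Source &&) = delete;
};

// Assigning from an rvalue Source selects the deleted overload, so the
// old constraint is_assignable<_Tp&, _Up> is false for these types.
constexpr bool from_rvalue = std::is_assignable_v<Target &, Source>;

// Assigning from a const lvalue selects operator= (const Source &), which
// is what copy assignment from a const optional actually does.
constexpr bool from_const_lvalue
  = std::is_assignable_v<Target &, const Source &>;
```

For these types from_rvalue is false and from_const_lvalue is true, which
is why the old constraint wrongly rejected the assignment in the test.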


[PATCH][pushed] docs: add missing @headitem in Intrinsic Procedures

2021-06-09 Thread Martin Liška

Pushed as obvious.

Martin

gcc/fortran/ChangeLog:

* intrinsic.texi: Add missing @headitem to tables with a header.
---
 gcc/fortran/intrinsic.texi | 144 ++---
 1 file changed, 72 insertions(+), 72 deletions(-)

diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index 260dbaae76b..8a92b862070 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -462,7 +462,7 @@ end program test_abs
 
 @item @emph{Specific names}:

 @multitable @columnfractions .20 .20 .20 .25
-@item Name@tab Argument@tab Return type   @tab 
Standard
+@headitem Name@tab Argument@tab Return type   @tab 
Standard
 @item @code{ABS(A)}   @tab @code{REAL(4) A}@tab @code{REAL(4)}@tab 
Fortran 77 and later
 @item @code{CABS(A)}  @tab @code{COMPLEX(4) A} @tab @code{REAL(4)}@tab 
Fortran 77 and later
 @item @code{DABS(A)}  @tab @code{REAL(8) A}@tab @code{REAL(8)}@tab 
Fortran 77 and later
@@ -627,7 +627,7 @@ end program test_acos
 
 @item @emph{Specific names}:

 @multitable @columnfractions .20 .20 .20 .25
-@item Name@tab Argument @tab Return type @tab Standard
+@headitem Name@tab Argument @tab Return type @tab 
Standard
 @item @code{ACOS(X)}  @tab @code{REAL(4) X} @tab @code{REAL(4)}  @tab Fortran 
77 and later
 @item @code{DACOS(X)} @tab @code{REAL(8) X} @tab @code{REAL(8)}  @tab Fortran 
77 and later
 @end multitable
@@ -686,7 +686,7 @@ end program test_acosd
 
 @item @emph{Specific names}:

 @multitable @columnfractions .20 .20 .20 .25
-@item Name@tab Argument @tab Return type @tab Standard
+@headitem Name@tab Argument @tab Return type @tab 
Standard
 @item @code{ACOSD(X)}  @tab @code{REAL(4) X} @tab @code{REAL(4)}  @tab GNU 
extension
 @item @code{DACOSD(X)} @tab @code{REAL(8) X} @tab @code{REAL(8)}  @tab GNU 
extension
 @end multitable
@@ -742,7 +742,7 @@ END PROGRAM
 
 @item @emph{Specific names}:

 @multitable @columnfractions .20 .20 .20 .25
-@item Name @tab Argument  @tab Return type   @tab 
Standard
+@headitem Name @tab Argument  @tab Return type   @tab 
Standard
 @item @code{DACOSH(X)} @tab @code{REAL(8) X}  @tab @code{REAL(8)}@tab GNU 
extension
 @end multitable
 
@@ -891,7 +891,7 @@ end program test_aimag
 
 @item @emph{Specific names}:

 @multitable @columnfractions .20 .20 .20 .25
-@item Name   @tab Argument@tab Return type @tab 
Standard
+@headitem Name   @tab Argument@tab Return type 
@tab Standard
 @item @code{AIMAG(Z)}@tab @code{COMPLEX Z}@tab @code{REAL} @tab 
Fortran 77 and later
 @item @code{DIMAG(Z)}@tab @code{COMPLEX(8) Z} @tab @code{REAL(8)}  @tab 
GNU extension
 @item @code{IMAG(Z)} @tab @code{COMPLEX Z}@tab @code{REAL} @tab 
GNU extension
@@ -951,7 +951,7 @@ end program test_aint
 
 @item @emph{Specific names}:

 @multitable @columnfractions .20 .20 .20 .25
-@item Name   @tab Argument @tab Return type  @tab Standard
+@headitem Name   @tab Argument @tab Return type  @tab 
Standard
 @item @code{AINT(A)} @tab @code{REAL(4) A} @tab @code{REAL(4)}   @tab Fortran 
77 and later
 @item @code{DINT(A)} @tab @code{REAL(8) A} @tab @code{REAL(8)}   @tab Fortran 
77 and later
 @end multitable
@@ -1231,7 +1231,7 @@ end program test_anint
 
 @item @emph{Specific names}:

 @multitable @columnfractions .20 .20 .20 .25
-@item Name@tab Argument @tab Return type  @tab Standard
+@headitem Name@tab Argument @tab Return type  @tab 
Standard
 @item @code{ANINT(A)}  @tab @code{REAL(4) A} @tab @code{REAL(4)}   @tab 
Fortran 77 and later
 @item @code{DNINT(A)} @tab @code{REAL(8) A} @tab @code{REAL(8)}   @tab Fortran 
77 and later
 @end multitable
@@ -1347,7 +1347,7 @@ end program test_asin
 
 @item @emph{Specific names}:

 @multitable @columnfractions .20 .20 .20 .25
-@item Name@tab Argument  @tab Return type   @tab 
Standard
+@headitem Name@tab Argument  @tab Return type   @tab 
Standard
 @item @code{ASIN(X)}  @tab @code{REAL(4) X}  @tab @code{REAL(4)}@tab 
Fortran 77 and later
 @item @code{DASIN(X)} @tab @code{REAL(8) X}  @tab @code{REAL(8)}@tab 
Fortran 77 and later
 @end multitable
@@ -1406,7 +1406,7 @@ end program test_asind
 
 @item @emph{Specific names}:

 @multitable @columnfractions .20 .20 .20 .25
-@item Name@tab Argument  @tab Return type   @tab 
Standard
+@headitem Name@tab Argument  @tab Return type   @tab 
Standard
 @item @code{ASIND(X)}  @tab @code{REAL(4) X}  @tab @code{REAL(4)}@tab GNU 
extension
 @item @code{DASIND(X)} @tab @code{REAL(8) X}  @tab @code{REAL(8)}@tab GNU 
extension
 @end multitable
@@ -1462,7 +1462,7 @@ END PROGRAM
 
 @item 

Re: [PATCH] arc: Add --with-fpu support for ARCv2 cpus

2021-06-09 Thread Claudiu Zissulescu via Gcc-patches

Hi,


I would have written [[:space:]]* instead of [[:space:]]+ to handle
potentially missing space, at least after the comma but also before the
comma to avoid surprises for new names in the future.
Furthermore | alone would be [[:blank:]]* but as you prefer.

grep ... > /dev/null would be grep -q, which has been mandated by POSIX since
at least SUSv2, so it has been safe to use for quite some time now.

Instead of the redundant 'true' calls, i'd usually write :
E.g.
if grep -q ... ; then :
else echo "nah"; exit 1
fi

Which could be shortened to
if ! grep -q ...
then
   echo "nah"
   exit 1
fi

to avoid any questions about an empty arm in the first place.


I've updated the patch using your feedback (attached). Indeed, it looks 
much better now :)




ISTM you only set the expected flags in the switch so i would have
set only that variable and have grepped only once after the switch for
brevity.


ARC has various FPU extensions, some of them are common to EM and HS 
architectures, others are specific for only one of them. Hence, the grep 
commands are ensuring that we accept the right fpu extension for the 
right ARC architecture.




Either way, thanks for not using grep -P :)
thanks,



I thank you!
Claudiu
From 1f895d277752277fb51e8436903a94949bd5c7bd Mon Sep 17 00:00:00 2001
From: Claudiu Zissulescu 
Date: Wed, 21 Oct 2020 16:11:43 +0300
Subject: [PATCH] arc: Add --with-fpu support for ARCv2 cpus

Support for a compile-time default FPU. The --with-fpu configuration
option is ignored if -mfpu compiler option is specified. The FPU
options are only available for ARCv2 cpus.

gcc/
-mm-dd  Claudiu Zissulescu  

	* config.gcc (arc): Add support for with_fpu option.
	* config/arc/arc.h (OPTION_DEFAULT_SPECS): Add fpu.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config.gcc   | 49 +---
 gcc/config/arc/arc.h |  4 
 2 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 13c2004e3c52..09886c8635e0 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4258,18 +4258,61 @@ case "${target}" in
 		;;
 
 	arc*-*-*)
-		supported_defaults="cpu"
+		supported_defaults="cpu fpu"
 
+		new_cpu=hs38_linux
 		if [ x"$with_cpu" = x ] \
-		|| grep "^ARC_CPU ($with_cpu," \
+		|| grep -E "^ARC_CPU \($with_cpu," \
 		   ${srcdir}/config/arc/arc-cpus.def \
 		   > /dev/null; then
 		 # Ok
-		 true
+		 new_cpu=$with_cpu
 		else
 		 echo "Unknown cpu used in --with-cpu=$with_cpu" 1>&2
 		 exit 1
 		fi
+
+		# see if --with-fpu matches any of the supported FPUs
+		case "$with_fpu" in
+		"")
+			# OK
+			;;
+		fpus | fpus_div | fpus_fma | fpus_all)
+			# OK if em or hs
+			if ! grep -q -E "^ARC_CPU[[:blank:]]*\($new_cpu,[[:space:]]*[emhs]+," \
+			   ${srcdir}/config/arc/arc-cpus.def
+			then
+			 echo "Unknown floating point type used in "\
+			 "--with-fpu=$with_fpu for cpu $new_cpu" 1>&2
+			 exit 1
+			fi
+		;;
+		fpuda | fpuda_div | fpuda_fma | fpuda_all)
+			# OK only em
+			if ! grep -q -E "^ARC_CPU[[:blank:]]*\($new_cpu,[[:space:]]*em," \
+			   ${srcdir}/config/arc/arc-cpus.def
+			then
+			 echo "Unknown floating point type used in "\
+			  "--with-fpu=$with_fpu for cpu $new_cpu" 1>&2
+			 exit 1
+			fi
+			;;
+		fpud | fpud_div | fpud_fma | fpud_all)
+			# OK only hs
+			if ! grep -q -E "^ARC_CPU[[:blank:]]*\($new_cpu,[[:space:]]*hs," \
+			   ${srcdir}/config/arc/arc-cpus.def
+			then
+			 echo "Unknown floating point type used in"\
+			  "--with-fpu=$with_fpu for cpu $new_cpu" 1>&2
+			 exit 1
+			fi
+			;;
+		*)
+			echo "Unknown floating point type used in "\
+			 "--with-fpu=$with_fpu" 1>&2
+			exit 1
+			;;
+		esac
 		;;
 
 csky-*-*)
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 722bb10b8813..b9c4ba0398e5 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -100,7 +100,11 @@ extern const char *arc_cpu_to_as (int argc, const char **argv);
   "%:cpu_to_as(%{mcpu=*:%*}) %{mspfp*} %{mdpfp*} "  \
   "%{mfpu=fpuda*:-mfpuda} %{mcode-density}"
 
+/* Support for a compile-time default CPU and FPU.  The rules are:
+   --with-cpu is ignored if -mcpu, mARC*, marc*, mA7, mA6 are specified.
+   --with-fpu is ignored if -mfpu is specified.  */
 #define OPTION_DEFAULT_SPECS		\
+  {"fpu", "%{!mfpu=*:-mfpu=%(VALUE)}"},	\
   {"cpu", "%{!mcpu=*:%{!mARC*:%{!marc*:%{!mA7:%{!mA6:-mcpu=%(VALUE)}" }
 
 #ifndef DRIVER_ENDIAN_SELF_SPECS
-- 
2.31.1



Re: [PATCH] Implement a context aware points-to analyzer for use in evrp.

2021-06-09 Thread Richard Biener via Gcc-patches
On Tue, Jun 8, 2021 at 4:31 PM Andrew MacLeod  wrote:
>
> On 6/8/21 3:26 AM, Richard Biener wrote:
> > On Mon, Jun 7, 2021 at 9:20 PM Andrew MacLeod  wrote:
> >>
> >> I don't think this is actually doing the propagation though... It tracks
> >> that a_2 currently points to  and returns that to either
> >> simplifier or folder thru value_of_expr().  Presumably it is up to them
> >> to determine whether the tree expression passed back is safe to
> >> propagate.   Is there any attempt in EVRP to NOT set the range of
> >> something to [, ] under some conditions?   This is what the
> >> change amounts to.  Ranger would just return a range of [1, +INF], and
> >> value_of_expr  would therefore return NULL.  This allows value_of to
> >> return  in these conditions.   Aldy, did you see any other checks in
> >> the vr-values code?
> >>
> >> Things like   if (var1_2 == var2_3) deal with just ssa-names and will be
> >> handled by an ssa_name relation oracle. It just treats equivalencies
> >> like a slightly special kind of relation. I'm just about to bring that
> >> forward this week.
> > Ah, great - I'm looking forward to this.  Currently both DOM and VN
>
> The initial code will be a bit basic, but it can be educated as we go
> along :-)
>
> It's currently tied into ranger just because, as ranger processes
> statements, it registers any relations it sees.  The oracle organizes
> these and can answer questions on anything it has seen.
>
> It is otherwise independent of ranger. It is dominance based, and there
> is no reason relations can't be queried and registered by any pass doing
> a DOM walk without ranger.  It benefits from ranger in that sometimes
> relations are refined when we know ranges (i.e. for unsigned math):
>
>  a_2 = b_4 + 6
>
> If we know the range of b_4 will not cause an overflow, then we could
> set a_2 > b_4; otherwise we can't.  Wiring it with ranger also removes
> the dependency on a DOM walk, as ranger sorts the ordering out as needed.
>
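A minimal sketch of the overflow point above, in plain C++ rather than GCC
internals.  The helper names add_implies_greater and add_wrapping are
hypothetical and not part of ranger or range-ops; the sketch only shows when
an unsigned a = b + 6 lets one conclude a > b from the upper bound of b's
range.

```cpp
#include <limits>

// Hypothetical helper, not a GCC API: for unsigned a = b + addend, decide
// whether a > b holds for every b whose value is at most b_max.
constexpr bool
add_implies_greater (unsigned b_max, unsigned addend)
{
  // The sum must not wrap for the largest possible b, and addend must be
  // non-zero, otherwise a == b.
  return addend != 0
         && b_max <= std::numeric_limits<unsigned>::max () - addend;
}

// The computation itself; unsigned arithmetic wraps modulo 2^N.
constexpr unsigned
add_wrapping (unsigned b, unsigned addend)
{
  return b + addend;
}
```

With b bounded by 100 the relation a > b can be registered; with b allowed to
reach the top six values of the type it cannot, because the sum may wrap to a
small number.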
> It is driven by data provided by range-ops and is more of a data
> propagation/lookup mechanism than anything. There are a number of cases
> where we don't currently register relations simply because we have not
> fleshed out the various tree code instructions.  We'll get to those
> eventually.  I expect a number of the PRs will eventually be fixed
> primarily by adding code to range-ops.
>
> It also only does first order relations so far...  I'll get to
> transitives and other things as well.
>
>
> > do a very simplistic thing when trying to simplify downstream conditions
> > based on earlier ones, abusing their known-expressions hash tables
> > by, for example, registering (a < b) == 1, (a > b) == 0, (a == b) == 0,
> > (a != b) == 1 for an earlier a < b condition on the true edge.  So I wonder
> > if this relation code can be somehow used there.  In VN there's the
> > extra complication that it iterates, but DOM is just a DOM-walk and
> > the VN code also has a non-iterating mode (but not a DOM walk).
>
> I don't think the iteration is an issue; ranger iterates to some degree
> as well, and some statements are registered multiple times. I believe it
> intersects with any known relation, so if an iteration causes a relation
> to become "better" it should be updated.

Note VN does optimistic iteration, so relations will only become "worse",
and we somehow need to be able to remove relations we added during
the last iteration.  That is, in the first iteration an if (a > b) might be
registered as a > 1 when b has (optimistic) value 1, but in the second
we have to make it a > b when b dropped to varying, for example.

The optimistic part of VN is that it treats all edges as not executable
initially and thus it ignores values on non-executable edges in PHI
nodes.

> The API for registering is pretty straightforward:
>
>    void register_relation (gimple *stmt, relation_kind k, tree op1, tree op2);
>    void register_relation (edge e, relation_kind k, tree op1, tree op2);
>
> so all you'd have to do when a < b is encountered is to register  (a
> LT_EXPR b) on the true edge, and (a GE_EXPR b) on the false edge.  Then
> any queries downstream should be reflected.
>
>
>
> > Of course the code is also used to simplify
> >
> >   if (a > b)
> >  c = a != b;
> >
> > but the relation oracle should be able to handle that as well I guess.
> >
> Yeah, so a GT_EXPR b is registered on the true edge.  Then when
> processing c = a != b, you can determine that a NE_EXPR b intersected
> with a GT_EXPR b results in a GT_EXPR b, which folds to 1.
>
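The intersection and folding steps described above can be sketched with a toy
model.  This is not GCC's relation_kind or the oracle's implementation; the
enum values and the functions relation_intersect, relation_negate and
fold_comparison are assumptions made for illustration.  A relation is
modelled as the set of outcomes (less, equal, greater) it still allows, so
intersection is a bitwise AND and the false-edge relation is the complement.

```cpp
// Toy model, not GCC's relation_kind: a relation between a and b is the
// set of comparison outcomes it still allows.
enum relation : unsigned
{
  REL_NONE = 0u,             // no outcome possible (contradiction)
  REL_LT = 1u,               // a < b
  REL_EQ = 2u,               // a == b
  REL_GT = 4u,               // a > b
  REL_LE = 1u | 2u,
  REL_GE = 4u | 2u,
  REL_NE = 1u | 4u,
  REL_VARYING = 1u | 2u | 4u // nothing known
};

enum fold_result { FOLD_FALSE, FOLD_TRUE, FOLD_UNKNOWN };

// Outcomes allowed by both relations.
constexpr relation
relation_intersect (relation x, relation y)
{
  return static_cast<relation> (static_cast<unsigned> (x)
                                & static_cast<unsigned> (y));
}

// Relation that holds on the false edge of a test for X.
constexpr relation
relation_negate (relation x)
{
  return static_cast<relation> (static_cast<unsigned> (REL_VARYING)
                                & ~static_cast<unsigned> (x));
}

// Fold the comparison QUERY given that KNOWN already holds.  If every
// outcome KNOWN allows also satisfies QUERY the result is true; if none
// does it is false; otherwise nothing can be concluded.  (A contradictory
// KNOWN folds to true vacuously.)
constexpr fold_result
fold_comparison (relation known, relation query)
{
  return relation_intersect (known, query) == known ? FOLD_TRUE
         : relation_intersect (known, query) == REL_NONE ? FOLD_FALSE
         : FOLD_UNKNOWN;
}
```

In this model, a < b registers REL_LT on the true edge and
relation_negate (REL_LT), which is REL_GE, on the false edge; and c = a != b
under a known REL_GT folds to true.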
> This is all also available with the range-op API additions, such that you
> can simply call:
>
> rangerop->fold_range (stmt(c = a != b), range_of_a, range_of_b, GT_EXPR
> (relation of a to b))  and the range returned will be [1,1].
>
> The actual ranges in this case are irrelevant, but aren't for some other
> kinds of stmts.
>
> Likewise, simply asking ranger for the range of c will likewise return
> [1,1], the relation processing 

Re: [PATCH] Use range based loops to iterate over vec<> in various places

2021-06-09 Thread Trevor Saunders
On Wed, Jun 09, 2021 at 01:06:44PM +0200, Richard Biener wrote:
> On Wed, Jun 9, 2021 at 2:48 AM Trevor Saunders  wrote:
> >
> > Hello,
> >
> > This makes things a good bit shorter, and reduces complexity by removing
> > a bunch of index variables.
> >
> > bootstrapped and regtested on x86_64-linux-gnu, ok?
> 
> I'd call the cases where you are able to remove the iterator variable
> declarations obvious, but there are some where the element variable
> remains declared and thus one wonders if the last elem initialization
> is used.  Splitting the patch into the obvious (pre-approved) and
> not-so obvious parts would be nice.  The not-so obvious pieces would
> be more obvious if the retained decl were moved down to its first
> use.

Yeah, sorry it's a long patch, and that's a sensible idea for making it
more manageable; sorry I didn't think of it the first time.  There are also
cases where people use the index within the loop for something, kind of
peeking through the "abstraction" of the macro if you want to see it
that way.

> That said - how many FOR_EACH_VEC_ELT macro invocations
> remain?  Can we remove it?

Very many; this is maybe a hundred of what started as about 1000 uses.
There are certainly more cases that can be converted over, but I needed
to stop at some point.  There's also a bunch of cases that use the index
for something, usually either checking if it's the last element when
printing a list, or using the index to index into the vector or
something else.  However I suppose those uses might be better as for
(unsigned i = 0; i < vec.size (); i++).  I'll see about splitting out the
obvious cases and finding more of those, and once that's done we can see
about the rest.
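
To illustrate the two kinds of loops, here is a small sketch using
std::vector as a stand-in for GCC's vec<> (an assumption; the function names
sum and join are made up for the example).  The first loop only needs the
elements, so a range-based for drops the index variable; the second needs
the position to detect the last element, so an explicit index states that
directly.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Only the elements matter: no index variable is needed.
int
sum (const std::vector<int> &values)
{
  int total = 0;
  for (int v : values)
    total += v;
  return total;
}

// The position matters (no separator after the last element), so an
// explicit index loop is the clearer form.
std::string
join (const std::vector<std::string> &names)
{
  std::string out;
  for (std::size_t i = 0; i < names.size (); i++)
    {
      out += names[i];
      if (i + 1 != names.size ())
        out += ", ";
    }
  return out;
}
```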

thanks

Trev

> 
> Thanks,
> Richard.
> 
> > Trev
> >
> > gcc/analyzer/ChangeLog:
> >
> > * call-string.cc (call_string::call_string): Iterate over vec<>
> > with range based for.
> > (call_string::operator=): Likewise.
> > (call_string::to_json): Likewise.
> > (call_string::hash): Likewise.
> > (call_string::calc_recursion_depth): Likewise.
> > * checker-path.cc (checker_path::fixup_locations): Likewise.
> > * constraint-manager.cc (equiv_class::equiv_class): Likewise.
> > (equiv_class::to_json): Likewise.
> > (equiv_class::hash): Likewise.
> > (constraint_manager::constraint_manager): Likewise.
> > (constraint_manager::operator=): Likewise.
> > (constraint_manager::hash): Likewise.
> > (constraint_manager::to_json): Likewise.
> > (constraint_manager::add_unknown_constraint): Likewise.
> > * engine.cc (impl_region_model_context::on_svalue_leak):
> > Likewise.
> > (on_liveness_change): Likewise.
> > (impl_region_model_context::on_unknown_change): Likewise.
> > * program-state.cc (extrinsic_state::to_json): Likewise.
> > (sm_state_map::set_state): Likewise.
> > * region-model.cc (make_test_compound_type): Likewise.
> > (test_canonicalization_4): Likewise.
> >
> > gcc/ChangeLog:
> >
> > * auto-profile.c (afdo_find_equiv_class): Iterate over vec<>
> > with range based for.
> > * cgraphclones.c (cgraph_node::create_clone): Likewise.
> > (cgraph_node::create_version_clone): Likewise.
> > * dwarf2out.c (output_call_frame_info): Likewise.
> > * gcc.c (do_specs_vec): Likewise.
> > (do_spec_1): Likewise.
> > (driver::set_up_specs): Likewise.
> > * gimple-loop-jam.c (any_access_function_variant_p): Likewise.
> > * ifcvt.c (cond_move_process_if_block): Likewise.
> > * ipa-modref.c (modref_lattice::add_escape_point): Likewise.
> > (analyze_parms): Likewise.
> > (modref_write_escape_summary): Likewise.
> > (update_escape_summary_1): Likewise.
> > * ipa-prop.h (ipa_copy_agg_values): Likewise.
> > (ipa_release_agg_values): Likewise.
> > * lower-subreg.c (decompose_multiword_subregs): Likewise.
> > * lto-streamer-out.c (DFS::DFS_write_tree_body): Likewise.
> > (hash_tree): Likewise.
> > (prune_offload_funcs): Likewise.
> > * sel-sched-dump.c (dump_insn_vector): Likewise.
> > * timevar.c (timer::named_items::print): Likewise.
> > * tree-cfgcleanup.c (cleanup_control_flow_pre): Likewise.
> > (cleanup_tree_cfg_noloop): Likewise.
> > * tree-data-ref.c (dump_data_references): Likewise.
> > (print_dir_vectors): Likewise.
> > (print_dist_vectors): Likewise.
> > (dump_data_dependence_relation): Likewise.
> > (dump_data_dependence_relations): Likewise.
> > (dump_dist_dir_vectors): Likewise.
> > (dump_ddrs): Likewise.
> > (prune_runtime_alias_test_list): Likewise.
> > (create_runtime_alias_checks): Likewise.
> > (free_subscripts): Likewise.
> > (save_dist_v): Likewise.
> > (save_dir_v): Likewise.
> > 

Re: [PATCH V3] Split loop for NE condition.

2021-06-09 Thread guojiufu via Gcc-patches

On 2021-06-09 17:42, guojiufu via Gcc-patches wrote:

On 2021-06-08 18:13, Richard Biener wrote:

On Fri, 4 Jun 2021, Jiufu Guo wrote:


cut...

+  gcond *cond = as_a (last);
+  enum tree_code code = gimple_cond_code (cond);
+  if (!(code == NE_EXPR
+   || (code == EQ_EXPR && (e->flags & EDGE_TRUE_VALUE


The NE_EXPR check misses a corresponding && (e->flags & EDGE_FALSE_VALUE)
check.


Thanks, checking (e->flags & EDGE_FALSE_VALUE) would be safer.


+   continue;
+
+  /* Check if bound is invarant.  */
+  tree idx = gimple_cond_lhs (cond);
+  tree bnd = gimple_cond_rhs (cond);
+  if (expr_invariant_in_loop_p (loop, idx))
+   std::swap (idx, bnd);
+  else if (!expr_invariant_in_loop_p (loop, bnd))
+   continue;
+
+  /* Only unsigned type conversion could cause wrap.  */
+  tree type = TREE_TYPE (idx);
+  if (!INTEGRAL_TYPE_P (type) || TREE_CODE (idx) != SSA_NAME
+ || !TYPE_UNSIGNED (type))
+   continue;
+
+  /* Avoid to split if bound is MAX/MIN val.  */
+  tree bound_type = TREE_TYPE (bnd);
+  if (TREE_CODE (bnd) == INTEGER_CST && INTEGRAL_TYPE_P 
(bound_type)

+ && (tree_int_cst_equal (bnd, TYPE_MAX_VALUE (bound_type))
+ || tree_int_cst_equal (bnd, TYPE_MIN_VALUE (bound_type
+   continue;


Note you do not require 'bnd' to be constant and thus at runtime those
cases still need to be handled correctly.

Yes, bnd is not required to be constant.  The above code filters out the
case where bnd is the constant max/min value of the type.  So, the code
could be updated as:

  if (tree_int_cst_equal (bnd, TYPE_MAX_VALUE (bound_type))
  || tree_int_cst_equal (bnd, TYPE_MIN_VALUE (bound_type)))




+  /* Check if there is possible wrap.  */
+  class tree_niter_desc niter;
+  if (!number_of_iterations_exit (loop, e, , false, 
false))

cut...

+
+  /* Change if (i != n) to LOOP1:if (i > n) and LOOP2:if (i < n) */


It now occurs to me that we nowhere check the evolution of IDX
(split_at_bb_p uses simple_iv for this for example).  The transform
assumes that we will actually hit i == n and that i increments, but
while you check the control IV from number_of_iterations_exit
for NE_EXPR that does not guarantee a positive evolution.


If this does not answer your question, please point it out:
like simple_iv, number_of_iterations_exit invokes simple_iv_with_niters,
which checks the evolution, and number_of_iterations_exit also checks
number_of_iterations_cond, which checks no_overflow more accurately;
this is one reason I use this function.


This transform assumes that the last run hits i==n.
Otherwise, the loop may run infinitely, wrap after wrap.
To be safe: if the step is 1 or -1, this assumption holds.  I
will add this check.

Thanks so much for pointing out I missed the negative step!


Your testcases do not include any negative step examples, but I guess
the conditions need to be swapped in this case?


I would add cases and code to support step 1/-1.



I think you also have to consider the order we split, say with

  for (i = start; i != end; ++i)
{
  push (i);
  if (a[i] != b[i])
break;
}

push (i) calls need to be in the same order for all cases of
start < end, start == end and start > end (and also cover
runtime testcases with end == 0 or end == UINT_MAX, likewise
for start).

I added tests for the above cases. If something is missing, please point it
out, thanks!
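
The ordering requirement can be checked on a small model.  The sketch below
is plain C++ under stated assumptions, not the patch's GIMPLE transform: it
uses an 8-bit unsigned counter so that every start/end pair can be
enumerated, a step of +1, and hypothetical helper names (visit_original,
visit_split, split_preserves_order).  The single i != end loop is replaced
by an i > end loop, which is the only one that can wrap, followed by an
i < end loop, which cannot.

```cpp
#include <cstdint>
#include <vector>

// Original form: one loop with a != exit test.  When start > end the
// counter runs up to the maximum value, wraps to 0, and then reaches end.
std::vector<unsigned>
visit_original (std::uint8_t start, std::uint8_t end)
{
  std::vector<unsigned> order;
  for (std::uint8_t i = start; i != end; ++i)
    order.push_back (i);
  return order;
}

// Split form: the first loop covers the part before the wrap (it exits as
// soon as i wraps to 0, since 0 > end is never true); the second loop
// counts up to end without wrapping.
std::vector<unsigned>
visit_split (std::uint8_t start, std::uint8_t end)
{
  std::vector<unsigned> order;
  std::uint8_t i = start;
  for (; i > end; ++i)
    order.push_back (i);
  for (; i < end; ++i)
    order.push_back (i);
  return order;
}

// Exhaustively compare both forms, including end == 0, end == 255 and
// start == end.
bool
split_preserves_order ()
{
  for (unsigned s = 0; s < 256; ++s)
    for (unsigned e = 0; e < 256; ++e)
      if (visit_original (static_cast<std::uint8_t> (s),
                          static_cast<std::uint8_t> (e))
          != visit_split (static_cast<std::uint8_t> (s),
                          static_cast<std::uint8_t> (e)))
        return false;
  return true;
}
```

This only models a step of +1; a step of -1 would need the two conditions
swapped, and larger steps may never hit i == end at all.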





+  bool inv = expr_invariant_in_loop_p (loop, gimple_cond_lhs (gc));
+  enum tree_code up_code = inv ? LT_EXPR : GT_EXPR;
+  enum tree_code down_code = inv ? GT_EXPR : LT_EXPR;

cut

Thanks again for the very helpful review!

BR,
Jiufu Guo.


Here is the updated patch, thanks for your time!

diff --git a/gcc/testsuite/gcc.dg/loop-split1.c 
b/gcc/testsuite/gcc.dg/loop-split1.c

new file mode 100644
index 000..dd2d03a7b96
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-split1.c
@@ -0,0 +1,101 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
+
+void
+foo (int *a, int *b, unsigned l, unsigned n)
+{
+  while (++l != n)
+a[l] = b[l] + 1;
+}
+void
+foo_1 (int *a, int *b, unsigned n)
+{
+  unsigned l = 0;
+  while (++l != n)
+a[l] = b[l] + 1;
+}
+
+void
+foo1 (int *a, int *b, unsigned l, unsigned n)
+{
+  while (l++ != n)
+a[l] = b[l] + 1;
+}
+
+/* No wrap.  */
+void
+foo1_1 (int *a, int *b, unsigned n)
+{
+  unsigned l = 0;
+  while (l++ != n)
+a[l] = b[l] + 1;
+}
+
+unsigned
+foo2 (char *a, char *b, unsigned l, unsigned n)
+{
+  while (++l != n)
+if (a[l] != b[l])
+  break;
+
+  return l;
+}
+
+unsigned
+foo2_1 (char *a, char *b, unsigned l, unsigned n)
+{
+  l = 0;
+  while (++l != n)
+if (a[l] != b[l])
+  break;
+
+  return l;
+}
+
+unsigned
+foo3 (char *a, char *b, unsigned l, unsigned n)
+{
+  while (l++ != n)
+if (a[l] != b[l])
+  break;
+
+  return l;
+}
+
+/* No wrap.  */
+unsigned
+foo3_1 

Re: [PATCH 2/2] Disallow pointer and offset types on some gimple

2021-06-09 Thread Richard Biener via Gcc-patches
On Wed, Jun 9, 2021 at 3:33 AM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> While debugging PR 100925, I found that the gimple verifiers
> don't reject NEGATE on pointer or offset type.
> This patch adds the check on some unary and binary gimple which
> should not have operated on pointer/offset types.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Are you sure we don't see sth like EXACT_DIV for OFFSET_TYPE?
(I never remember the contexts where OFFSET_TYPE appears, but
I suppose it's from virtual calls / vtable adjustments or so - but also not
sure why we need to treat it different from integer types at all...)

Thanks,
Richard.

> Thanks,
> Andrew Pinski
>
> gcc/ChangeLog:
>
> * tree-cfg.c (verify_gimple_assign_unary): Reject pointer and offset
> types on NEGATE_EXPR, ABS_EXPR, BIT_NOT_EXPR, PAREN_EXPR and CONJ_EXPR.
> (verify_gimple_assign_binary): Reject pointer and offset types on
> MULT_EXPR, MULT_HIGHPART_EXPR, TRUNC_DIV_EXPR, CEIL_DIV_EXPR,
> FLOOR_DIV_EXPR, ROUND_DIV_EXPR, TRUNC_MOD_EXPR, CEIL_MOD_EXPR,
> FLOOR_MOD_EXPR, ROUND_MOD_EXPR, RDIV_EXPR, and EXACT_DIV_EXPR.
> ---
>  gcc/tree-cfg.c | 22 ++
>  1 file changed, 22 insertions(+)
>
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index 02256580c98..90fe4775405 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -3752,6 +3752,15 @@ verify_gimple_assign_unary (gassign *stmt)
>  case BIT_NOT_EXPR:
>  case PAREN_EXPR:
>  case CONJ_EXPR:
> +  /* Disallow pointer and offset types for many of the unary gimple. */
> +  if (POINTER_TYPE_P (lhs_type)
> + || TREE_CODE (lhs_type) == OFFSET_TYPE)
> +   {
> + error ("invalid types for %qs", code_name);
> + debug_generic_expr (lhs_type);
> + debug_generic_expr (rhs1_type);
> + return true;
> +   }
>break;
>
>  case ABSU_EXPR:
> @@ -4127,6 +4136,19 @@ verify_gimple_assign_binary (gassign *stmt)
>  case ROUND_MOD_EXPR:
>  case RDIV_EXPR:
>  case EXACT_DIV_EXPR:
> +  /* Disallow pointer and offset types for many of the binary gimple. */
> +  if (POINTER_TYPE_P (lhs_type)
> + || TREE_CODE (lhs_type) == OFFSET_TYPE)
> +   {
> + error ("invalid types for %qs", code_name);
> + debug_generic_expr (lhs_type);
> + debug_generic_expr (rhs1_type);
> + debug_generic_expr (rhs2_type);
> + return true;
> +   }
> +  /* Continue with generic binary expression handling.  */
> +  break;
> +
>  case MIN_EXPR:
>  case MAX_EXPR:
>  case BIT_IOR_EXPR:
> --
> 2.27.0
>


[PATCH] aix: add Power10 flag in ASM_CPU_SPEC

2021-06-09 Thread CHIGOT, CLEMENT via Gcc-patches
Changelog:
2021-06-09  Clément Chigot  

* config/rs6000/aix71.h (ASM_CPU_SPEC): Add entry for Power10.
* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.





0001-aix-add-Power10-flag-in-ASM_CPU_SPEC.patch
Description: 0001-aix-add-Power10-flag-in-ASM_CPU_SPEC.patch


Re: [PATCH 1/2] Fix PR 100925: Limit some a?CST1:CST2 optimizations to intergal types only

2021-06-09 Thread Richard Biener via Gcc-patches
On Wed, Jun 9, 2021 at 3:32 AM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> The problem here with offset (and pointer) types is that we produce
> a negative expression when this optimization hits.
> It is easier to disable this optimization for all non-integral types
> instead of finding an integer type which is the same precision as the
> type to do the negative expression on it.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.
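
For reference, a source-level sketch of the two rewrites being restricted.
This is ordinary C++ on a 32-bit integer, not GIMPLE or match.pd syntax, and
the function names are made up for the example.  It shows why the identities
hold for integral types: both right-hand sides need a negate or a shift,
which have no meaning for pointer or offset types.

```cpp
#include <cstdint>

// a ? -1 : 0 and its rewritten form -a (with a converted to the result type).
constexpr std::int32_t
select_all_ones (bool a) { return a ? -1 : 0; }

constexpr std::int32_t
negate_form (bool a) { return -static_cast<std::int32_t> (a); }

// a ? 8 : 0 and its rewritten form a << log2 (8).
constexpr std::int32_t
select_pow2 (bool a) { return a ? 8 : 0; }

constexpr std::int32_t
shift_form (bool a) { return static_cast<std::int32_t> (a) << 3; }

static_assert (select_all_ones (true) == negate_form (true), "a = true");
static_assert (select_all_ones (false) == negate_form (false), "a = false");
static_assert (select_pow2 (true) == shift_form (true), "a = true");
static_assert (select_pow2 (false) == shift_form (false), "a = false");
```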

> gcc/ChangeLog:
>
> PR tree-optimization/100925
> * match.pd (a ? CST1 : CST2): Limit transformations
> that would produce a negative to integral types only.
> Change !POINTER_TYPE_P to INTEGRAL_TYPE_P also.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/torture/pr100925.C: New test.
> ---
>  gcc/match.pd|  8 
>  gcc/testsuite/g++.dg/torture/pr100925.C | 24 
>  2 files changed, 28 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr100925.C
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index d06ff170684..bf22bc3a198 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3733,10 +3733,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (if (integer_onep (@1))
>   (convert (convert:boolean_type_node @0)))
>  /* a ? -1 : 0 -> -a. */
> -(if (integer_all_onesp (@1))
> +(if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@1))
>   (negate (convert (convert:boolean_type_node @0
>  /* a ? powerof2cst : 0 -> a << (log2(powerof2cst)) */
> -(if (!POINTER_TYPE_P (type) && integer_pow2p (@1))
> +(if (INTEGRAL_TYPE_P (type) && integer_pow2p (@1))
>   (with {
> tree shift = build_int_cst (integer_type_node, tree_log2 (@1));
>}
> @@ -3750,10 +3750,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (integer_onep (@2))
>(convert (bit_xor (convert:boolean_type_node @0) { booltrue; } )))
>   /* a ? -1 : 0 -> -(!a). */
> - (if (integer_all_onesp (@2))
> + (if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@2))
>(negate (convert (bit_xor (convert:boolean_type_node @0) { booltrue; } 
> 
>   /* a ? powerof2cst : 0 -> (!a) << (log2(powerof2cst)) */
> - (if (!POINTER_TYPE_P (type) && integer_pow2p (@2))
> + (if (INTEGRAL_TYPE_P (type) &&  integer_pow2p (@2))
>(with {
> tree shift = build_int_cst (integer_type_node, tree_log2 (@2));
> }
> diff --git a/gcc/testsuite/g++.dg/torture/pr100925.C 
> b/gcc/testsuite/g++.dg/torture/pr100925.C
> new file mode 100644
> index 000..de13950dca0
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr100925.C
> @@ -0,0 +1,24 @@
> +// { dg-do compile }
> +
> +struct QScopedPointerDeleter {
> +  static void cleanup(int *);
> +};
> +class QScopedPointer {
> +  typedef int *QScopedPointer::*RestrictedBool;
> +
> +public:
> +  operator RestrictedBool() { return d ? nullptr : ::d; }
> +  void reset() {
> +if (d)
> +  QScopedPointerDeleter::cleanup(d);
> +  }
> +  int *d;
> +};
> +class DOpenGLPaintDevicePrivate {
> +public:
> +  QScopedPointer fbo;
> +} DOpenGLPaintDeviceresize_d;
> +void DOpenGLPaintDeviceresize() {
> +  if (DOpenGLPaintDeviceresize_d.fbo)
> +DOpenGLPaintDeviceresize_d.fbo.reset();
> +}
> --
> 2.27.0
>


[PATCH] Simplify vect_is_simple_use

2021-06-09 Thread Richard Biener
This simplifies vect_is_simple_use to always get the def-type from
the stmt_info instead of singling out some gimple stmt kinds.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-09  Richard Biener  

* tree-vect-stmts.c (vect_is_simple_use): Always get dt
from the stmt.
---
 gcc/tree-vect-stmts.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index bd2a1c89e67..eeef96a2eb6 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -11326,17 +11326,7 @@ vect_is_simple_use (tree operand, vec_info *vinfo, 
enum vect_def_type *dt,
{
  stmt_vinfo = vect_stmt_to_vectorize (stmt_vinfo);
  def_stmt = stmt_vinfo->stmt;
- switch (gimple_code (def_stmt))
-   {
-   case GIMPLE_PHI:
-   case GIMPLE_ASSIGN:
-   case GIMPLE_CALL:
- *dt = STMT_VINFO_DEF_TYPE (stmt_vinfo);
- break;
-   default:
- *dt = vect_unknown_def_type;
- break;
-   }
+ *dt = STMT_VINFO_DEF_TYPE (stmt_vinfo);
  if (def_stmt_info_out)
*def_stmt_info_out = stmt_vinfo;
}
-- 
2.26.2


Re: [PATCH] Use range based loops to iterate over vec<> in various places

2021-06-09 Thread Richard Biener via Gcc-patches
On Wed, Jun 9, 2021 at 2:48 AM Trevor Saunders  wrote:
>
> Hello,
>
> This makes things a good bit shorter, and reduces complexity by removing
> a bunch of index variables.
>
> bootstrapped and regtested on x86_64-linux-gnu, ok?

I'd call the cases where you are able to remove the iterator variable
declarations obvious, but there are some where the element variable
remains declared and thus one wonders if the last elem initialization
is used.  Splitting the patch into the obvious (pre-approved) and
not-so obvious parts would be nice.  The not-so obvious pieces would
be more obvious if the retained decl were moved down to its first
use.

That said - how many FOR_EACH_VEC_ELT macro invocations
remain?  Can we remove it?

Thanks,
Richard.

> Trev
>
> gcc/analyzer/ChangeLog:
>
> * call-string.cc (call_string::call_string): Iterate over vec<>
> with range based for.
> (call_string::operator=): Likewise.
> (call_string::to_json): Likewise.
> (call_string::hash): Likewise.
> (call_string::calc_recursion_depth): Likewise.
> * checker-path.cc (checker_path::fixup_locations): Likewise.
> * constraint-manager.cc (equiv_class::equiv_class): Likewise.
> (equiv_class::to_json): Likewise.
> (equiv_class::hash): Likewise.
> (constraint_manager::constraint_manager): Likewise.
> (constraint_manager::operator=): Likewise.
> (constraint_manager::hash): Likewise.
> (constraint_manager::to_json): Likewise.
> (constraint_manager::add_unknown_constraint): Likewise.
> * engine.cc (impl_region_model_context::on_svalue_leak):
> Likewise.
> (on_liveness_change): Likewise.
> (impl_region_model_context::on_unknown_change): Likewise.
> * program-state.cc (extrinsic_state::to_json): Likewise.
> (sm_state_map::set_state): Likewise.
> * region-model.cc (make_test_compound_type): Likewise.
> (test_canonicalization_4): Likewise.
>
> gcc/ChangeLog:
>
> * auto-profile.c (afdo_find_equiv_class): Iterate over vec<>
> with range based for.
> * cgraphclones.c (cgraph_node::create_clone): Likewise.
> (cgraph_node::create_version_clone): Likewise.
> * dwarf2out.c (output_call_frame_info): Likewise.
> * gcc.c (do_specs_vec): Likewise.
> (do_spec_1): Likewise.
> (driver::set_up_specs): Likewise.
> * gimple-loop-jam.c (any_access_function_variant_p): Likewise.
> * ifcvt.c (cond_move_process_if_block): Likewise.
> * ipa-modref.c (modref_lattice::add_escape_point): Likewise.
> (analyze_parms): Likewise.
> (modref_write_escape_summary): Likewise.
> (update_escape_summary_1): Likewise.
> * ipa-prop.h (ipa_copy_agg_values): Likewise.
> (ipa_release_agg_values): Likewise.
> * lower-subreg.c (decompose_multiword_subregs): Likewise.
> * lto-streamer-out.c (DFS::DFS_write_tree_body): Likewise.
> (hash_tree): Likewise.
> (prune_offload_funcs): Likewise.
> * sel-sched-dump.c (dump_insn_vector): Likewise.
> * timevar.c (timer::named_items::print): Likewise.
> * tree-cfgcleanup.c (cleanup_control_flow_pre): Likewise.
> (cleanup_tree_cfg_noloop): Likewise.
> * tree-data-ref.c (dump_data_references): Likewise.
> (print_dir_vectors): Likewise.
> (print_dist_vectors): Likewise.
> (dump_data_dependence_relation): Likewise.
> (dump_data_dependence_relations): Likewise.
> (dump_dist_dir_vectors): Likewise.
> (dump_ddrs): Likewise.
> (prune_runtime_alias_test_list): Likewise.
> (create_runtime_alias_checks): Likewise.
> (free_subscripts): Likewise.
> (save_dist_v): Likewise.
> (save_dir_v): Likewise.
> (invariant_access_functions): Likewise.
> (same_access_functions): Likewise.
> (access_functions_are_affine_or_constant_p): Likewise.
> (compute_all_dependences): Likewise.
> (find_data_references_in_stmt): Likewise.
> (graphite_find_data_references_in_stmt): Likewise.
> (free_dependence_relations): Likewise.
> (free_data_refs): Likewise.
> * tree-into-ssa.c (dump_currdefs): Likewise.
> (rewrite_update_phi_arguments): Likewise.
> * tree-ssa-phiopt.c (cond_if_else_store_replacement): Likewise.
> * tree-ssa-propagate.c (clean_up_loop_closed_phi): Likewise.
> * tree-ssa-structalias.c (constraint_set_union): Likewise.
> (merge_node_constraints): Likewise.
> (move_complex_constraints): Likewise.
> (do_deref): Likewise.
> (get_constraint_for_address_of): Likewise.
> (get_constraint_for_1): Likewise.
> (process_all_all_constraints): Likewise.
> (make_constraints_to): Likewise.
> (handle_rhs_call): Likewise.
> * tree-vect-data-refs.c 

Re: [PATCH] Always enable DT_INIT_ARRAY/DT_FINI_ARRAY on Linux

2021-06-09 Thread Richard Biener via Gcc-patches
On Wed, Jun 9, 2021 at 1:13 AM H.J. Lu via Gcc-patches
 wrote:
>
> DT_INIT_ARRAY/DT_FINI_ARRAY support was added to glibc by
>
> commit fcf70d4114db9ff7923f5dfeb3fea6e2d623e5c2
> Author: Ulrich Drepper 
> Date:   Sat Jul 24 19:45:13 1999 +
>
> Update.
>
> 1999-07-24  Ulrich Drepper  
>
> * elf/dl-fini.c: Handle DT_FINI_ARRAY.
> * elf/link.h (struct link_map): Remove l_init_running.  Add l_runcount
> and l_initcount.
> * elf/dl-init.c: Handle DT_INIT_ARRAY.
> ...
>
> PR target/100896
> * config.gcc (gcc_cv_initfini_array): Set to yes for Linux and
> GNU targets.
> ---
>  gcc/config.gcc | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 6833a6c13d9..4dc4fe0b65c 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -848,6 +848,8 @@ case ${target} in
>tmake_file="${tmake_file} t-glibc"
>target_has_targetcm=yes
>target_has_targetdm=yes
> +  # Linux targets always support .init_array.

Other *linux targets specifically mention

# Force .init_array support.  The configure script cannot always
# automatically detect that GAS supports it, yet we require it.
gcc_cv_initfini_array=yes

and thus involve binutils.  Can you please change the comment
to mention the glibc and binutils versions required?  It might
be good to mention those in install.texi as minimal versions given
they are not trumped by already stricter requirements.

Otherwise I think this is OK.

> +  gcc_cv_initfini_array=yes
>;;
>  *-*-netbsd*)
>tm_p_file="${tm_p_file} netbsd-protos.h"
> --
> 2.31.1
>
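
For readers unfamiliar with what .init_array support provides, here is a minimal sketch, assuming GCC or Clang on an ELF target such as Linux with glibc. The names record_startup, startup_calls and startup_call_count are hypothetical and not part of the patch; __attribute__((constructor)) is the GNU extension whose function pointers the compiler emits into .init_array when that support is enabled (and into .ctors otherwise).

```c
/* Hypothetical illustration, not part of the patch.  */
static int startup_calls;

/* With .init_array support enabled, the compiler places a pointer to
   this function in the .init_array section, and the startup code or
   dynamic loader calls it before main.  */
__attribute__ ((constructor))
static void record_startup (void)
{
  startup_calls++;
}

/* Returns how many times the constructor has run so far.  */
int startup_call_count (void)
{
  return startup_calls;
}
```

Calling startup_call_count () from main should report that the constructor has already run once, without main ever calling record_startup itself.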


Re: [PATCH 02/57] Support scanning of build-time GC roots in gengtype

2021-06-09 Thread Richard Biener via Gcc-patches
On Wed, Jun 9, 2021 at 12:53 PM Richard Biener
 wrote:
>
> On Tue, Jun 8, 2021 at 10:45 PM Bill Schmidt  wrote:
> >
> > On 6/7/21 12:48 PM, Bill Schmidt wrote:
> > > On 6/7/21 12:45 PM, Richard Biener wrote:
> > >> On Mon, Jun 7, 2021 at 5:38 PM Bill Schmidt 
> > >> wrote:
> > >>> On 6/7/21 8:36 AM, Richard Biener wrote:
> > >>>> Some maybe obvious issue - what about DOS-style path hosts?
> > >>>> You seem to build ../ strings to point to parent dirs... I'm not sure
> > >>>> what we do elsewhere - I suppose we arrange for appropriate
> > >>>> -I command line arguments?
> > >>>>
> > >>> Well, actually it's just using "./" to identify the build directory,
> > >>> though I see what you mean about potential Linux bias. There is
> > >>> precedent for this syntax identifying the build directory in config.gcc
> > >>> for target macro files:
> > >>>
> > >>> #  tm_file  A list of target macro files, if different from
> > >>> #   "$cpu_type/$cpu_type.h". Usually it's
> > >>> constructed
> > >>> #   per target in a way like this:
> > >>> #   tm_file="${tm_file} dbxelf.h elfos.h
> > >>> ${cpu_type.h}/elf.h"
> > >>> #   Note that the preferred order is:
> > >>> #   - specific target header
> > >>> "${cpu_type}/${cpu_type.h}"
> > >>> #   - generic headers like dbxelf.h elfos.h, etc.
> > >>> #   - specializing target headers like
> > >>> ${cpu_type.h}/elf.h
> > >>> #   This helps to keep OS specific stuff out of
> > >>> the CPU
> > >>> #   defining header ${cpu_type}/${cpu_type.h}.
> > >>> #
> > >>> #   It is possible to include
> > >>> automatically-generated
> > >>> #   build-directory files by prefixing them with
> > >>> "./".
> > >>> #   All other files should relative to
> > >>> $srcdir/config.
> > >>>
> > >>> ...so I thought I would try to be consistent with this change. In patch
> > >>> 0025 I use this as follows:
> > >>>
> > >>> --- a/gcc/config.gcc
> > >>> +++ b/gcc/config.gcc
> > >>> @@ -491,6 +491,7 @@ powerpc*-*-*)
> > >>>   extra_options="${extra_options} g.opt fused-madd.opt
> > >>> rs6000/rs6000-tables.opt"
> > >>>   target_gtfiles="$target_gtfiles
> > >>> \$(srcdir)/config/rs6000/rs6000-logue.c
> > >>> \$(srcdir)/config/rs6000/rs6000-call.c"
> > >>>   target_gtfiles="$target_gtfiles
> > >>> \$(srcdir)/config/rs6000/rs6000-pcrel-opt.c"
> > >>> +   target_gtfiles="$target_gtfiles ./rs6000-builtins.h"
> > >>> ;;
> > >>>pru-*-*)
> > >>> cpu_type=pru
> > >>>
> > >>> I'm open to trying to do something different if you think that's
> > >>> appropriate.
> > >> Well, I'm not sure whether/how to resolve this.  You could try
> > >> building a cross to powerpc-linux from a x86_64-mingw host ...
> > >> maybe there's one on the CF?  Or some of your fellow RedHat
> > >> people have access to mingw or the like envs to try whether it
> > >> just works with your change ...
> > >>
> > >> Otherwise it looks OK.
> > >
> > > I'll see what I can find.  Thanks again for reviewing the patch!
> >
> >
> > Hm.  Ultimately, I think the cross compiler case is doomed unless mingw
> > already handles converting forward slashes to back slashes. There's no
> > single syntax that works on both Windows and Linux. (There's no mingw
> > server in the compile farm to play with.)
> >
> > I'm inclined to accept both "./" and ".\" for native builds, and kick
> > the can down the road beyond that.  What do you think?
>
> Can't you use PATH_SEPARATOR somehow?  See file-find.c / incpath.c
> or gcc.c for uses and system.h for where it is defined.

Err - DIR_SEPARATOR of course.

Richard.

> Richard.
>
> >
> > Bill
> >
> > >
> > > Bill
> > >
> > >
> > >>
> > >> Richard.
> > >>
> > >>> Thanks for your help with this!
> > >>>
> > >>> Bill
> > >>>


Re: [PATCH 02/57] Support scanning of build-time GC roots in gengtype

2021-06-09 Thread Richard Biener via Gcc-patches
On Tue, Jun 8, 2021 at 10:45 PM Bill Schmidt  wrote:
>
> On 6/7/21 12:48 PM, Bill Schmidt wrote:
> > On 6/7/21 12:45 PM, Richard Biener wrote:
> >> On Mon, Jun 7, 2021 at 5:38 PM Bill Schmidt 
> >> wrote:
> >>> On 6/7/21 8:36 AM, Richard Biener wrote:
> >>>> Some maybe obvious issue - what about DOS-style path hosts?
> >>>> You seem to build ../ strings to point to parent dirs... I'm not sure
> >>>> what we do elsewhere - I suppose we arrange for appropriate
> >>>> -I command line arguments?
> >>>>
> >>> Well, actually it's just using "./" to identify the build directory,
> >>> though I see what you mean about potential Linux bias. There is
> >>> precedent for this syntax identifying the build directory in config.gcc
> >>> for target macro files:
> >>>
> >>> #  tm_file  A list of target macro files, if different from
> >>> #   "$cpu_type/$cpu_type.h". Usually it's
> >>> constructed
> >>> #   per target in a way like this:
> >>> #   tm_file="${tm_file} dbxelf.h elfos.h
> >>> ${cpu_type.h}/elf.h"
> >>> #   Note that the preferred order is:
> >>> #   - specific target header
> >>> "${cpu_type}/${cpu_type.h}"
> >>> #   - generic headers like dbxelf.h elfos.h, etc.
> >>> #   - specializing target headers like
> >>> ${cpu_type.h}/elf.h
> >>> #   This helps to keep OS specific stuff out of
> >>> the CPU
> >>> #   defining header ${cpu_type}/${cpu_type.h}.
> >>> #
> >>> #   It is possible to include
> >>> automatically-generated
> >>> #   build-directory files by prefixing them with
> >>> "./".
> >>> #   All other files should relative to
> >>> $srcdir/config.
> >>>
> >>> ...so I thought I would try to be consistent with this change. In patch
> >>> 0025 I use this as follows:
> >>>
> >>> --- a/gcc/config.gcc
> >>> +++ b/gcc/config.gcc
> >>> @@ -491,6 +491,7 @@ powerpc*-*-*)
> >>>   extra_options="${extra_options} g.opt fused-madd.opt
> >>> rs6000/rs6000-tables.opt"
> >>>   target_gtfiles="$target_gtfiles
> >>> \$(srcdir)/config/rs6000/rs6000-logue.c
> >>> \$(srcdir)/config/rs6000/rs6000-call.c"
> >>>   target_gtfiles="$target_gtfiles
> >>> \$(srcdir)/config/rs6000/rs6000-pcrel-opt.c"
> >>> +   target_gtfiles="$target_gtfiles ./rs6000-builtins.h"
> >>> ;;
> >>>pru-*-*)
> >>> cpu_type=pru
> >>>
> >>> I'm open to trying to do something different if you think that's
> >>> appropriate.
> >> Well, I'm not sure whether/how to resolve this.  You could try
> >> building a cross to powerpc-linux from a x86_64-mingw host ...
> >> maybe there's one on the CF?  Or some of your fellow RedHat
> >> people have access to mingw or the like envs to try whether it
> >> just works with your change ...
> >>
> >> Otherwise it looks OK.
> >
> > I'll see what I can find.  Thanks again for reviewing the patch!
>
>
> Hm.  Ultimately, I think the cross compiler case is doomed unless mingw
> already handles converting forward slashes to back slashes. There's no
> single syntax that works on both Windows and Linux. (There's no mingw
> server in the compile farm to play with.)
>
> I'm inclined to accept both "./" and ".\" for native builds, and kick
> the can down the road beyond that.  What do you think?

Can't you use PATH_SEPARATOR somehow?  See file-find.c / incpath.c
or gcc.c for uses and system.h for where it is defined.

Richard.

>
> Bill
>
> >
> > Bill
> >
> >
> >>
> >> Richard.
> >>
> >>> Thanks for your help with this!
> >>>
> >>> Bill
> >>>


Re: [PATCH] PR middle-end/53267: Constant fold BUILT_IN_FMOD.

2021-06-09 Thread Richard Biener via Gcc-patches
On Tue, Jun 8, 2021 at 9:36 PM Roger Sayle  wrote:
>
>
> Here's a three line patch to implement constant folding for fmod,
> fmodf and fmodl, which resolves an enhancement request from 2012.
>
> The following patch has been tested on x86_64-pc-linux-gnu with
> a make bootstrap and make -k check with no new failures.
>
> Ok for mainline?

OK.  I double-checked and mpfr_fmod appeared in mpfr 2.4.0 and
we require at least 3.1.0.

Thanks,
Richard.

>
> 2020-06-08  Roger Sayle  
>
> gcc/ChangeLog
> PR middle-end/53267
> * fold-const-call.c (fold_const_call_sss) [CASE_CFN_FMOD]:
> Support evaluation of fmod/fmodf/fmodl at compile-time.
>
> gcc/testsuite/ChangeLog
> * gcc.dg/builtins-70.c: New test.
>
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>


Re: [ARM] PR98435: Missed optimization in expanding vector constructor

2021-06-09 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 4 Jun 2021 at 13:15, Christophe Lyon  wrote:
>
> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
>  wrote:
> >
> > Hi,
> > As mentioned in PR, for the following test-case:
> >
> > #include <arm_neon.h>
> >
> > bfloat16x4_t f1 (bfloat16_t a)
> > {
> >   return vdup_n_bf16 (a);
> > }
> >
> > bfloat16x4_t f2 (bfloat16_t a)
> > {
> >   return (bfloat16x4_t) {a, a, a, a};
> > }
> >
> > Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
> > -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
> >
> > f1:
> > vdup.16 d16, r0
> > vmov r0, r1, d16  @ v4bf
> > bx  lr
> >
> > f2:
> > mov r3, r0  @ __bf16
> > adr r1, .L4
> > ldrd r0, [r1]
> > mov r2, r3  @ __bf16
> > mov ip, r3  @ __bf16
> > bfi r1, r2, #0, #16
> > bfi r0, ip, #0, #16
> > bfi r1, r3, #16, #16
> > bfi r0, r2, #16, #16
> > bx  lr
> >
> > This seems to happen because the vec_init pattern in neon.md uses the
> > VDQ mode iterator, which doesn't include V4BF. In the attached patch, I
> > changed the mode iterator to VDQX, which seems to work for the test-case,
> > and the compiler now generates:
> >
> > f2:
> > vdup.16 d16, r0
> > vmov r0, r1, d16  @ v4bf
> > bx  lr
> >
> > However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> > sure if either VDQ or VDQX are correct modes for MVE since MVE has
> > only 128-bit vectors ?
> >
>
> I think patterns common to both Neon and MVE should be moved to
> vec-common.md, I don't know why such patterns were left in neon.md.
Since we end up calling neon_expand_vector_init for both NEON and MVE,
I am not sure we should separate the pattern.
Would it make sense to FAIL if the mode size isn't 16 bytes for MVE, as
in the attached patch, so that it calls neon_expand_vector_init only for
128-bit vectors? Although hard-coding 16 in the pattern doesn't seem like
a good idea to me either.

Thanks,
Prathamesh
>
> That being said, I suggest you look at other similar patterns in
> vec-common.md, most of which are gated on
> ARM_HAVE_<MODE>_ARITH
> and possibly beware of issues with iwmmxt :-)
>
> Christophe
>
> > Thanks,
> > Prathamesh


pr98435-2.diff
Description: Binary data
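
For readers without an Arm toolchain, the shape of the f2 test-case can be reproduced with the generic GNU vector extension. This is a sketch under assumptions, not the PR's test: the type v4hi and the functions splat4 and sum_lanes are hypothetical names, and int16_t lanes stand in for bfloat16_t. The point is only that a constructor whose elements are all the same scalar is a candidate for a single duplicate instruction, provided the backend's vec_init pattern covers the vector mode.

```c
#include <stdint.h>

/* Four 16-bit lanes in a 64-bit vector (GNU vector extension).  */
typedef int16_t v4hi __attribute__ ((vector_size (8)));

/* Hypothetical analogue of f2: every lane holds the same scalar.  */
v4hi splat4 (int16_t a)
{
  return (v4hi) {a, a, a, a};
}

/* Sums the four lanes so the result is easy to check.  */
int sum_lanes (v4hi v)
{
  return v[0] + v[1] + v[2] + v[3];
}
```

Comparing the assembly for splat4 at -O3 on a given target shows whether the constructor was expanded lane by lane or as one duplicate operation.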

