Re: [PATCH][WIP] Add install-dvi Makefile targets

2021-10-18 Thread Thomas Koenig via Gcc-patches

Hi Eric,


Hi, I have updated this patch and tested it with more languages now; I
can now confirm that it works with ada, d, and fortran. The only
languages that remain untested are go (since I'm building on
darwin and go doesn't build on darwin anyway, as per bug 46986) and
jit (which I ran into a bug with that I brought up on IRC and will
probably need to file on bugzilla). OK to install?


Fortran parts look good.

Best regards

Thomas


[PATCH 4/4] Improve maybe_remove_writeonly_store to do a simple DCE for defining statement

2021-10-18 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

Instead of putting a full-blown DCE after execute_fixup_cfg, it makes sense
to try to remove the defining statement for the store that is being removed.
Right now we only handle PHI node statements, as those need no extra checks
beyond the result being used only once, in the store statement.
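
As an illustration (my own sketch, not part of the patch), the kind of source
where the removed store is fed by a PHI node:

/* Hypothetical example: IPA marks 'onlywritten' write-only because no
   code in the translation unit ever reads it.  The store in f is then
   removable, and once it is gone the PHI merging 'a' and 'b' is dead
   and can be removed as well.  */
static int onlywritten;

void
f (int c, int a, int b)
{
  int t = c ? a : b;   /* typically becomes a PHI node in GIMPLE */
  onlywritten = t;     /* write-only store removed by execute_fixup_cfg */
}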

gcc/ChangeLog:

* tree-cfg.c (maybe_remove_writeonly_store): Remove defining
(PHI) statement of the store if possible.
---
 gcc/tree-cfg.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index dbbf6beb6e4..d9efdc220ca 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -9692,6 +9692,41 @@ maybe_remove_writeonly_store (gimple_stmt_iterator &gsi, gimple *stmt)
   print_gimple_stmt (dump_file, stmt, 0,
 TDF_VOPS|TDF_MEMSYMS);
 }
+
+  /* Remove the statement defining the rhs if it was only
+ used by this statement. */
+  if (gimple_assign_single_p (stmt))
+{
+  tree rhs = gimple_assign_rhs1 (stmt);
+  gimple *use_stmt;
+  use_operand_p use_p;
+  gimple *stmt1;
+
+
+  if (TREE_CODE (rhs) == SSA_NAME
+ && single_imm_use (rhs, &use_p, &use_stmt)
+ && (stmt1 = SSA_NAME_DEF_STMT (rhs))
+ /* For now only handle PHI nodes.
+FIXME: this should handle more. */
+ && gimple_code (stmt1) == GIMPLE_PHI)
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Removing defining statement:\n");
+ print_gimple_stmt (dump_file, stmt1, 0,
+TDF_VOPS|TDF_MEMSYMS);
+   }
+ gimple_stmt_iterator gsi_for_def;
+ gsi_for_def = gsi_for_stmt (stmt1);
+ if (gimple_code (stmt1) == GIMPLE_PHI)
+   remove_phi_node (&gsi_for_def, true);
+ else
+   {
+ gsi_remove (&gsi_for_def, true);
+ release_defs (stmt1);
+   }
+   }
+}
   unlink_stmt_vdef (stmt);
   gsi_remove (&gsi, true);
   release_defs (stmt);
-- 
2.17.1



[PATCH 3/4] Factor out removal of write only stores from execute_fixup_cfg

2021-10-18 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

To make it easier to fix PR 102703, factor this code out into its own
function; this makes it easier to read and reduces indentation too.

gcc/ChangeLog:

* tree-cfg.c (maybe_remove_writeonly_store): New function
factored out from ...
(execute_fixup_cfg): Here. Call maybe_remove_writeonly_store.
---
 gcc/tree-cfg.c | 62 ++
 1 file changed, 37 insertions(+), 25 deletions(-)

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index c20fc4980c6..dbbf6beb6e4 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -9666,6 +9666,38 @@ make_pass_warn_unused_result (gcc::context *ctxt)
   return new pass_warn_unused_result (ctxt);
 }
 
+/* Maybe remove stores to variables we marked write-only.
+   Return true if a store was removed. */
+static bool
+maybe_remove_writeonly_store (gimple_stmt_iterator &gsi, gimple *stmt)
+{
+  /* Keep access when store has side effect, i.e. in case when source
+ is volatile.  */  
+  if (!gimple_store_p (stmt)
+  || gimple_has_side_effects (stmt)
+  || optimize_debug)
+return false;
+
+  tree lhs = get_base_address (gimple_get_lhs (stmt));
+
+  if (!VAR_P (lhs)
+  || (!TREE_STATIC (lhs) && !DECL_EXTERNAL (lhs))
+  || !varpool_node::get (lhs)->writeonly)
+return false;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Removing statement, writes"
+  " to write only var:\n");
+  print_gimple_stmt (dump_file, stmt, 0,
+TDF_VOPS|TDF_MEMSYMS);
+}
+  unlink_stmt_vdef (stmt);
+  gsi_remove (&gsi, true);
+  release_defs (stmt);
+  return true;
+}
+
 /* IPA passes, compilation of earlier functions or inlining
might have changed some properties, such as marked functions nothrow,
pure, const or noreturn.
@@ -9721,33 +9753,13 @@ execute_fixup_cfg (void)
todo |= TODO_cleanup_cfg;
 }
 
- /* Remove stores to variables we marked write-only.
-Keep access when store has side effect, i.e. in case when source
-is volatile.  */
- if (gimple_store_p (stmt)
- && !gimple_has_side_effects (stmt)
- && !optimize_debug)
+ /* Remove stores to variables we marked write-only. */
+ if (maybe_remove_writeonly_store (gsi, stmt))
{
- tree lhs = get_base_address (gimple_get_lhs (stmt));
-
- if (VAR_P (lhs)
- && (TREE_STATIC (lhs) || DECL_EXTERNAL (lhs))
- && varpool_node::get (lhs)->writeonly)
-   {
- if (dump_file && (dump_flags & TDF_DETAILS))
-   {
- fprintf (dump_file, "Removing statement, writes"
-  " to write only var:\n");
- print_gimple_stmt (dump_file, stmt, 0,
-TDF_VOPS|TDF_MEMSYMS);
-   }
- unlink_stmt_vdef (stmt);
- gsi_remove (, true);
- release_defs (stmt);
- todo |= TODO_update_ssa | TODO_cleanup_cfg;
- continue;
-   }
+ todo |= TODO_update_ssa | TODO_cleanup_cfg;
+ continue;
}
+
  /* For calls we can simply remove LHS when it is known
 to be write-only.  */
  if (is_gimple_call (stmt)
-- 
2.17.1



[PATCH 2/4] Remove outdated comment about execute_fixup_cfg

2021-10-18 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The comment about execute_fixup_cfg not being able to
run as a standalone pass has not been true for a long time
now.  It has been a standalone pass for a while.

gcc/ChangeLog:

* tree-cfg.c (execute_fixup_cfg): Remove comment
about standalone pass.
---
 gcc/tree-cfg.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index b78e4564e4d..c20fc4980c6 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -9669,10 +9669,7 @@ make_pass_warn_unused_result (gcc::context *ctxt)
 /* IPA passes, compilation of earlier functions or inlining
might have changed some properties, such as marked functions nothrow,
pure, const or noreturn.
-   Remove redundant edges and basic blocks, and create new ones if necessary.
-
-   This pass can't be executed as stand alone pass from pass manager, because
-   in between inlining and this fixup the verify_flow_info would fail.  */
+   Remove redundant edges and basic blocks, and create new ones if necessary.  */
 
 unsigned int
 execute_fixup_cfg (void)
-- 
2.17.1



[PATCH 1/4] Add dump prints when execute_fixup_cfg removes a write only var store.

2021-10-18 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

While debugging PR 102703, I found it hard to figure out where
the store was being removed, as there was no pass outputting
why the store was removed.
This adds that output to execute_fixup_cfg.
Also note that most of the removals happen when execute_fixup_cfg is called
from the inliner.

gcc/ChangeLog:

* tree-cfg.c (execute_fixup_cfg): Output when the statement
is removed when it is a write only var.
---
 gcc/tree-cfg.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 4b4b0b52d9a..b78e4564e4d 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -9737,6 +9737,13 @@ execute_fixup_cfg (void)
  && (TREE_STATIC (lhs) || DECL_EXTERNAL (lhs))
  && varpool_node::get (lhs)->writeonly)
{
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Removing statement, writes"
+  " to write only var:\n");
+ print_gimple_stmt (dump_file, stmt, 0,
+TDF_VOPS|TDF_MEMSYMS);
+   }
  unlink_stmt_vdef (stmt);
  gsi_remove (, true);
  release_defs (stmt);
-- 
2.17.1



[PATCH 0/4] Fix PR tree-opt/102703

2021-10-18 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

This patch series fixes PR tree-opt/102703 by
improving the code which deletes write-only stores so that it also
deletes the PHI node (if the defining statement was a PHI node) that was
used to define the stored value.
We need to do some factoring out of the code to make it easier
to understand and to reduce indentation.

Andrew Pinski (4):
  Add dump prints when execute_fixup_cfg removes a write only var store.
  Remove outdated comment about execute_fixup_cfg
  Factor out removal of write only stores from execute_fixup_cfg
  Improve maybe_remove_writeonly_store to do a simple DCE for defining
statement

 gcc/tree-cfg.c | 95 ++
 1 file changed, 73 insertions(+), 22 deletions(-)

-- 
2.17.1



[PATCH] tree-object-size: Make unknown a computation

2021-10-18 Thread Siddhesh Poyarekar
Compute the unknown size value as a function of the min/max bit of
object_size_type.  This transforms into a neat little branchless
sequence on x86_64:

movl%edi, %eax
sarl%eax
xorl$1, %eax
negl%eax
cltq

which should be faster than loading the value from memory.  A quick
unscientific test using

`time make check-gcc RUNTESTFLAGS="dg.exp=builtin*"`

shaves about half a second off execution time with this.  Also remove
unknown_object_size and its only use in favour of setting object_sizes
directly to unknown(object_size_type).
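
For reference, a small standalone sketch (mine, not part of the patch) of what
the new function computes for the four object_size_type values, matching the
old unknown[] table:

/* Illustrative only: mirrors the formula from the patch, with plain
   unsigned long long standing in for unsigned HOST_WIDE_INT.  */
#include <stdio.h>

static unsigned long long
unknown (int object_size_type)
{
  return (unsigned long long) -((object_size_type >> 1) ^ 1);
}

int
main (void)
{
  for (int t = 0; t < 4; t++)
    printf ("unknown (%d) = %llu\n", t, unknown (t));
  /* Prints all-ones (SIZE_MAX) for t = 0 and 1 (maximum object size)
     and 0 for t = 2 and 3 (minimum object size), just like the old
     { -1, -1, 0, 0 } table.  */
  return 0;
}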

gcc/ChangeLog:

* tree-object-size.c (unknown): Make into a function.  Adjust
all uses.
(unknown_object_size): Remove function.
(collect_object_sizes_for): Set object_sizes directly.

Signed-off-by: Siddhesh Poyarekar 
---
 gcc/tree-object-size.c | 108 +++--
 1 file changed, 39 insertions(+), 69 deletions(-)

diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
index 46a976dfe10..bd948b6f669 100644
--- a/gcc/tree-object-size.c
+++ b/gcc/tree-object-size.c
@@ -45,13 +45,6 @@ struct object_size_info
   unsigned int *stack, *tos;
 };
 
-static const unsigned HOST_WIDE_INT unknown[4] = {
-  HOST_WIDE_INT_M1U,
-  HOST_WIDE_INT_M1U,
-  0,
-  0
-};
-
 static tree compute_object_offset (const_tree, const_tree);
 static bool addr_object_size (struct object_size_info *,
  const_tree, int, unsigned HOST_WIDE_INT *);
@@ -82,6 +75,11 @@ static bitmap computed[4];
 /* Maximum value of offset we consider to be addition.  */
 static unsigned HOST_WIDE_INT offset_limit;
 
+static inline unsigned HOST_WIDE_INT
+unknown (int object_size_type)
+{
+  return ((unsigned HOST_WIDE_INT) -((object_size_type >> 1) ^ 1));
+}
 
 /* Initialize OFFSET_LIMIT variable.  */
 static void
@@ -204,7 +202,7 @@ decl_init_size (tree decl, bool min)
 
 /* Compute __builtin_object_size for PTR, which is a ADDR_EXPR.
OBJECT_SIZE_TYPE is the second argument from __builtin_object_size.
-   If unknown, return unknown[object_size_type].  */
+   If unknown, return unknown (object_size_type).  */
 
 static bool
 addr_object_size (struct object_size_info *osi, const_tree ptr,
@@ -216,7 +214,7 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
 
   /* Set to unknown and overwrite just before returning if the size
  could be determined.  */
-  *psize = unknown[object_size_type];
+  *psize = unknown (object_size_type);
 
   pt_var = TREE_OPERAND (ptr, 0);
   while (handled_component_p (pt_var))
@@ -244,9 +242,9 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
SSA_NAME_VERSION (var)))
sz = object_sizes[object_size_type][SSA_NAME_VERSION (var)];
  else
-   sz = unknown[object_size_type];
+   sz = unknown (object_size_type);
}
-  if (sz != unknown[object_size_type])
+  if (sz != unknown (object_size_type))
{
  offset_int mem_offset;
   if (mem_ref_offset (pt_var).is_constant (&mem_offset))
@@ -257,13 +255,13 @@ addr_object_size (struct object_size_info *osi, 
const_tree ptr,
  else if (wi::fits_uhwi_p (dsz))
sz = dsz.to_uhwi ();
  else
-   sz = unknown[object_size_type];
+   sz = unknown (object_size_type);
}
  else
-   sz = unknown[object_size_type];
+   sz = unknown (object_size_type);
}
 
-  if (sz != unknown[object_size_type] && sz < offset_limit)
+  if (sz != unknown (object_size_type) && sz < offset_limit)
pt_var_size = size_int (sz);
 }
   else if (DECL_P (pt_var))
@@ -445,7 +443,7 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
 /* Compute __builtin_object_size for CALL, which is a GIMPLE_CALL.
Handles calls to functions declared with attribute alloc_size.
OBJECT_SIZE_TYPE is the second argument from __builtin_object_size.
-   If unknown, return unknown[object_size_type].  */
+   If unknown, return unknown (object_size_type).  */
 
 static unsigned HOST_WIDE_INT
 alloc_object_size (const gcall *call, int object_size_type)
@@ -459,7 +457,7 @@ alloc_object_size (const gcall *call, int object_size_type)
 calltype = gimple_call_fntype (call);
 
   if (!calltype)
-return unknown[object_size_type];
+return unknown (object_size_type);
 
   /* Set to positions of alloc_size arguments.  */
   int arg1 = -1, arg2 = -1;
@@ -479,7 +477,7 @@ alloc_object_size (const gcall *call, int object_size_type)
   || (arg2 >= 0
  && (arg2 >= (int)gimple_call_num_args (call)
  || TREE_CODE (gimple_call_arg (call, arg2)) != INTEGER_CST)))
-return unknown[object_size_type];
+return unknown (object_size_type);
 
   tree bytes = NULL_TREE;
   if (arg2 >= 0)
@@ -492,7 +490,7 @@ alloc_object_size (const gcall *call, int object_size_type)
   if (bytes && 

[PATCH v5 2/2] Don't move cold code out of loop by checking bb count

2021-10-18 Thread Xionghu Luo via Gcc-patches



On 2021/10/18 12:29, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/10/15 16:11, Richard Biener wrote:
>> On Sat, Oct 9, 2021 at 5:45 AM Xionghu Luo  wrote:
>>>
>>> Hi,
>>>
>>> On 2021/9/28 20:09, Richard Biener wrote:
 On Fri, Sep 24, 2021 at 8:29 AM Xionghu Luo  wrote:
>
> Update the patch to v3, not sure whether you prefer the paste style
> and continue to link the previous thread as Segher dislikes this...
>
>
> [PATCH v3] Don't move cold code out of loop by checking bb count
>
>
> Changes:
> 1. Handle max_loop in determine_max_movement instead of
> outermost_invariant_loop.
> 2. Remove unnecessary changes.
> 3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in 
> can_sm_ref_p.
> 4. "gsi_next ();" in move_computations_worker is kept since it caused
> infinite loop when implementing v1 and the iteration is missed to be
> updated actually.
>
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
>
> There was a patch trying to avoid moving cold blocks out of loops:
>
> https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
>
> Richard suggested to "never hoist anything from a bb with lower execution
> frequency to a bb with higher one in LIM invariantness_dom_walker
> before_dom_children".
>
> In gimple LIM analysis, add find_coldest_out_loop to move invariants to
> the expected target loop: if the profile count of the loop bb is colder
> than the target loop preheader, the invariant won't be hoisted out of the loop.
> Likewise for store motion: if all locations of the REF in the loop are cold,
> don't do store motion of it.
>
> SPEC2017 performance evaluation shows 1% performance improvement for
> intrate GEOMEAN and no obvious regression for others.  Especially,
> 500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is
> largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00%
> on P8LE.
>
> gcc/ChangeLog:
>
> * loop-invariant.c (find_invariants_bb): Check profile count
> before motion.
> (find_invariants_body): Add argument.
> * tree-ssa-loop-im.c (find_coldest_out_loop): New function.
> (determine_max_movement): Use find_coldest_out_loop.
> (move_computations_worker): Adjust and fix iteration update.
> (execute_sm_exit): Check pointer validness.
> (class ref_in_loop_hot_body): New functor.
> (ref_in_loop_hot_body::operator): New.
> (can_sm_ref_p): Use for_all_locs_in_loop.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/recip-3.c: Adjust.
> * gcc.dg/tree-ssa/ssa-lim-18.c: New test.
> * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
> * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
> ---
>  gcc/loop-invariant.c   | 10 ++--
>  gcc/tree-ssa-loop-im.c | 61 --
>  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c|  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c | 20 +++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 27 ++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 25 +
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c | 28 ++
>  7 files changed, 165 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
>
> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> index fca0c2b24be..5c3be7bf0eb 100644
> --- a/gcc/loop-invariant.c
> +++ b/gcc/loop-invariant.c
> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool 
> always_reached, bool always_executed)
> call.  */
>
>  static void
> -find_invariants_bb (basic_block bb, bool always_reached, bool 
> always_executed)
> +find_invariants_bb (class loop *loop, basic_block bb, bool 
> always_reached,
> +   bool always_executed)
>  {
>rtx_insn *insn;
> +  basic_block preheader = loop_preheader_edge (loop)->src;
> +
> +  if (preheader->count > bb->count)
> +return;
>
>FOR_BB_INSNS (bb, insn)
>  {
> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block 
> *body,
>unsigned i;
>
>for (i = 0; i < loop->num_nodes; i++)
> -find_invariants_bb (body[i],
> -   bitmap_bit_p (always_reached, i),
> +find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
> bitmap_bit_p (always_executed, i));
>  

Re: [PATCH][WIP] Add install-dvi Makefile targets

2021-10-18 Thread Eric Gallager via Gcc-patches
On Tue, Oct 12, 2021 at 5:09 PM Eric Gallager  wrote:
>
> On Thu, Oct 6, 2016 at 10:41 AM Eric Gallager  wrote:
> >
> > Currently the build machinery handles install-pdf and install-html
> > targets, but no install-dvi target. This patch is a step towards
> > fixing that. Note that I have only tested with
> > --enable-languages=c,c++,lto,objc,obj-c++. Thus, target hooks will
> > probably also have to be added for the languages I skipped.
> > Also, please note that this patch applies on top of:
> > https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00370.html
> >
> > ChangeLog:
> >
> > 2016-10-06  Eric Gallager  
> >
> > * Makefile.def: Handle install-dvi target.
> > * Makefile.tpl: Likewise.
> > * Makefile.in: Regenerate.
> >
> > gcc/ChangeLog:
> >
> > 2016-10-06  Eric Gallager  
> >
> > * Makefile.in: Handle dvidir and install-dvi target.
> > * ./[c|cp|lto|objc|objcp]/Make-lang.in: Add dummy install-dvi
> > target hooks.
> > * configure.ac: Handle install-dvi target.
> > * configure: Regenerate.
> >
> > libiberty/ChangeLog:
> >
> > 2016-10-06  Eric Gallager  
> >
> > * Makefile.in: Handle dvidir and install-dvi target.
> > * functions.texi: Regenerate.
>
> Ping. The prerequisite patch that I linked to previously has gone in now.
> I'm not sure if this specific patch still applies, though.
> Also note that I've opened a bug to track this issue:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102663

Hi, I have updated this patch and tested it with more languages now; I
can now confirm that it works with ada, d, and fortran. The only
languages that remain untested are go (since I'm building on
darwin and go doesn't build on darwin anyway, as per bug 46986) and
jit (which I ran into a bug with that I brought up on IRC and will
probably need to file on bugzilla). OK to install?


patch-install-dvi.diff
Description: Binary data


[PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics

2021-10-18 Thread Paul A. Clarke via Gcc-patches
Suppress exceptions (when specified), by saving, manipulating, and
restoring the FPSCR.  Similarly, save, set, and restore the floating-point
rounding mode when required.

No attempt is made to optimize writing the FPSCR (by checking if the new
value would be the same), other than using lighter weight instructions
when possible. Note that explicit instruction scheduling "barriers" are
added to prevent floating-point computations from being moved before or
after the explicit FPSCR manipulations.  (That these are required has
been reported as an issue in GCC: PR102783.)

The scalar versions naively use the parallel versions to compute the
single scalar result and then construct the remainder of the result.

Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
are swapped from the corresponding values on x86 so as to match the
corresponding rounding mode values in the Power ISA.

Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
convert _mm_ceil* and _mm_floor* into macros. This matches the current
analogous implementations in config/i386/smmintrin.h.

Function signatures match the analogous functions in config/i386/smmintrin.h.

Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
modeled after the very similar "floor" and "ceil" tests.

Include basic tests, plus tests at the boundaries for floating-point
representation, positive and negative, test all of the parameterized
rounding modes as well as the C99 rounding modes and interactions
between the two.

Exceptions are not explicitly tested.
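
For context, a minimal usage sketch (mine, not from the patch) of the interface
these functions provide; the intrinsic and macro names are the SSE4.1 API that
this smmintrin.h implements, while the function name is hypothetical:

#include <smmintrin.h>

__m128d
round_down_no_exc (__m128d x)
{
  /* Round each element toward negative infinity while suppressing
     exceptions (like _mm_floor_pd, except that the floor macro requests
     _MM_FROUND_RAISE_EXC instead of _MM_FROUND_NO_EXC).  */
  return _mm_round_pd (x, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
}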

2021-10-18  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
Convert from function to macro.

gcc/testsuite
* gcc.target/powerpc/sse4_1-round3.h: New.
* gcc.target/powerpc/sse4_1-roundpd.c: New.
* gcc.target/powerpc/sse4_1-roundps.c: New.
* gcc.target/powerpc/sse4_1-roundsd.c: New.
* gcc.target/powerpc/sse4_1-roundss.c: New.
---
 gcc/config/rs6000/smmintrin.h | 292 ++
 .../gcc.target/powerpc/sse4_1-round3.h|  81 +
 .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 +
 .../gcc.target/powerpc/sse4_1-roundps.c   |  98 ++
 .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
 .../gcc.target/powerpc/sse4_1-roundss.c   | 208 +
 6 files changed, 1014 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 90ce03d22709..6bb03e6e20ac 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -42,6 +42,234 @@
 #include 
 #include 
 
+/* Rounding mode macros. */
+#define _MM_FROUND_TO_NEAREST_INT   0x00
+#define _MM_FROUND_TO_ZERO  0x01
+#define _MM_FROUND_TO_POS_INF   0x02
+#define _MM_FROUND_TO_NEG_INF   0x03
+#define _MM_FROUND_CUR_DIRECTION0x04
+
+#define _MM_FROUND_NINT\
+  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_FLOOR   \
+  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_CEIL\
+  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_TRUNC   \
+  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_RINT\
+  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_NEARBYINT   \
+  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
+
+#define _MM_FROUND_RAISE_EXC0x00
+#define _MM_FROUND_NO_EXC   0x08
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_round_pd (__m128d __A, int __rounding)
+{
+  __v2df __r;
+  union {
+double __fr;
+long long __fpscr;
+  } __enables_save, __fpscr_save;
+
+  if (__rounding & _MM_FROUND_NO_EXC)
+{
+  /* Save enabled exceptions, disable all exceptions,
+and preserve the rounding mode.  */
+#ifdef _ARCH_PWR9
+  __asm__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
+  __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
+#else
+  __fpscr_save.__fr = __builtin_mffs ();
+  

[PATCH v4 3/3] rs6000: Guard some x86 intrinsics implementations

2021-10-18 Thread Paul A. Clarke via Gcc-patches
Some compatibility implementations of x86 intrinsics include
Power intrinsics which require POWER8.  Guard them.

emmintrin.h:
- _mm_cmpord_pd: Remove code which was ostensibly for pre-POWER8,
  but which indeed depended on POWER8 (vec_cmpgt(v2du)/vcmpgtud).
  The "POWER8" version works fine on pre-POWER8.
- _mm_mul_epu32: vec_mule(v4su) uses vmuleuw.
pmmintrin.h:
- _mm_movehdup_ps: vec_mergeo(v4su) uses vmrgow.
- _mm_moveldup_ps: vec_mergee(v4su) uses vmrgew.
smmintrin.h:
- _mm_cmpeq_epi64: vec_cmpeq(v2di) uses vcmpequd.
- _mm_mul_epi32: vec_mule(v4si) uses vmuluwm.
- _mm_cmpgt_epi64: vec_cmpgt(v2di) uses vcmpgtsd.
tmmintrin.h:
- _mm_sign_epi8: vec_neg(v4si) uses vsububm.
- _mm_sign_epi16: vec_neg(v4si) uses vsubuhm.
- _mm_sign_epi32: vec_neg(v4si) uses vsubuwm.
  Note that the above three could actually be supported pre-POWER8,
  but current GCC does not support them before POWER8.
- _mm_sign_pi8: depends on _mm_sign_epi8.
- _mm_sign_pi16: depends on _mm_sign_epi16.
- _mm_sign_pi32: depends on _mm_sign_epi32.

2021-10-18  Paul A. Clarke  

gcc
PR target/101893
PR target/102719
* config/rs6000/emmintrin.h: Guard POWER8 intrinsics.
* config/rs6000/pmmintrin.h: Same.
* config/rs6000/smmintrin.h: Same.
* config/rs6000/tmmintrin.h: Same.
---
 gcc/config/rs6000/emmintrin.h | 12 ++--
 gcc/config/rs6000/pmmintrin.h |  4 
 gcc/config/rs6000/smmintrin.h |  4 
 gcc/config/rs6000/tmmintrin.h | 12 
 gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c |  4 ++--
 5 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
index ce1287edf782..32ad72b4cc35 100644
--- a/gcc/config/rs6000/emmintrin.h
+++ b/gcc/config/rs6000/emmintrin.h
@@ -430,20 +430,10 @@ _mm_cmpnge_pd (__m128d __A, __m128d __B)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_cmpord_pd (__m128d __A, __m128d __B)
 {
-#if _ARCH_PWR8
   __v2du c, d;
   /* Compare against self will return false (0's) if NAN.  */
   c = (__v2du)vec_cmpeq (__A, __A);
   d = (__v2du)vec_cmpeq (__B, __B);
-#else
-  __v2du a, b;
-  __v2du c, d;
-  const __v2du double_exp_mask  = {0x7ff0, 0x7ff0};
-  a = (__v2du)vec_abs ((__v2df)__A);
-  b = (__v2du)vec_abs ((__v2df)__B);
-  c = (__v2du)vec_cmpgt (double_exp_mask, a);
-  d = (__v2du)vec_cmpgt (double_exp_mask, b);
-#endif
   /* A != NAN and B != NAN.  */
   return ((__m128d)vec_and(c, d));
 }
@@ -1472,6 +1462,7 @@ _mm_mul_su32 (__m64 __A, __m64 __B)
   return ((__m64)a * (__m64)b);
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_mul_epu32 (__m128i __A, __m128i __B)
 {
@@ -1498,6 +1489,7 @@ _mm_mul_epu32 (__m128i __A, __m128i __B)
   return (__m128i) vec_mule ((__v4su)__A, (__v4su)__B);
 #endif
 }
+#endif
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_slli_epi16 (__m128i __A, int __B)
diff --git a/gcc/config/rs6000/pmmintrin.h b/gcc/config/rs6000/pmmintrin.h
index eab712fdfa66..83dff1d85666 100644
--- a/gcc/config/rs6000/pmmintrin.h
+++ b/gcc/config/rs6000/pmmintrin.h
@@ -123,17 +123,21 @@ _mm_hsub_pd (__m128d __X, __m128d __Y)
vec_mergel ((__v2df) __X, (__v2df)__Y));
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_movehdup_ps (__m128 __X)
 {
   return (__m128)vec_mergeo ((__v4su)__X, (__v4su)__X);
 }
+#endif
 
+#ifdef _ARCH_PWR8
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_moveldup_ps (__m128 __X)
 {
   return (__m128)vec_mergee ((__v4su)__X, (__v4su)__X);
 }
+#endif
 
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_loaddup_pd (double const *__P)
diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 6bb03e6e20ac..24adc95589ad 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -324,6 +324,7 @@ _mm_extract_ps (__m128 __X, const int __N)
   return ((__v4si)__X)[__N & 3];
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
 {
@@ -335,6 +336,7 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
   #endif
   return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
 }
+#endif
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
@@ -395,6 +397,7 @@ _mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
   return (__m128d) __r;
 }
 
+#ifdef _ARCH_PWR8
 __inline __m128d
 __attribute__ ((__gnu_inline__, 

[PATCH v4 1/3] rs6000: Add nmmintrin.h to extra_headers

2021-10-18 Thread Paul A. Clarke via Gcc-patches
Fix an omission in commit 29fb1e831bf1c25e4574bf2f98a9f534e5c67665.

2021-10-18  Paul A. Clarke  

gcc
* config/config.gcc (extra_headers): Add nmmintrin.h.
---
 gcc/config.gcc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index aa5bd5d14590..1cb9303b3a85 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -490,6 +490,7 @@ powerpc*-*-*)
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
extra_headers="${extra_headers} mmintrin.h x86intrin.h"
extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h"
+   extra_headers="${extra_headers} nmmintrin.h"
extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h 
si2vmx.h"
extra_headers="${extra_headers} amo.h"
case x$with_cpu in
-- 
2.27.0



[PATCH v4 0/3] rs6000: Support more SSE4 intrinsics

2021-10-18 Thread Paul A. Clarke via Gcc-patches
v4:
- Of original 6 patches in this series, I committed patches 2-5.
- Found an issue from v3. New file "nmmintrin.h" also needs to be added
to gcc/config.gcc "extra_headers".  Unfortunately, I discovered this
after committing the patch which added "nmmintrin.h", so I've added a
new patch here.
- Added scheduling "barriers" to patch 2 after review from Segher.
- Noted additional PR fixed by patch 3.

v3: Add "nmmintrin.h". _mm_cmpgt_epi64 is part of SSE4.2
and users will expect to be able to include "nmmintrin.h",
even though "nmmintrin.h" just includes "smmintrin.h"
where all of the SSE4.2 implementations actually appear.
Only patch 5/6 changed from v2.

Tested ppc64le (POWER9) and ppc64/32 (POWER7).

OK for trunk?

Paul A. Clarke (3):
  rs6000: Add nmmintrin.h to extra_headers
  rs6000: Support SSE4.1 "round" intrinsics
  rs6000: Guard some x86 intrinsics implementations

 gcc/config.gcc|   1 +
 gcc/config/rs6000/emmintrin.h |  12 +-
 gcc/config/rs6000/pmmintrin.h |   4 +
 gcc/config/rs6000/smmintrin.h | 296 ++
 gcc/config/rs6000/tmmintrin.h |  12 +
 .../gcc.target/powerpc/sse4_1-round3.h|  81 +
 .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 +
 .../gcc.target/powerpc/sse4_1-roundps.c   |  98 ++
 .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
 .../gcc.target/powerpc/sse4_1-roundss.c   | 208 
 .../gcc.target/powerpc/sse4_2-pcmpgtq.c   |   4 +-
 11 files changed, 1039 insertions(+), 76 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c

-- 
2.27.0



Re: [PATCH v3 1/6] rs6000: Support SSE4.1 "round" intrinsics

2021-10-18 Thread Paul A. Clarke via Gcc-patches
On Tue, Oct 12, 2021 at 05:25:32PM -0500, Segher Boessenkool wrote:
> On Tue, Oct 12, 2021 at 02:35:57PM -0500, Paul A. Clarke wrote:
> > static __inline __attribute__ ((__always_inline__)) void
> > libc_feholdsetround_ppc_ctx (struct rm_ctx *ctx, int r)
> > {
> >   fenv_union_t old;
> >   register fenv_union_t __fr;
> >   __asm__ __volatile__ ("mffscrni %0,%1" : "=f" (__fr.fenv) : "i" (r));
> >   ctx->env = old.fenv = __fr.fenv; 
> >   ctx->updated_status = (r != (old.l & 3));
> > }
> 
> (Should use "n", not "i", only numbers are allowed, not e.g. the address
> of something.  This actually can matter, in unusual cases.)

Noted, will submit a change to glibc when I get a chance. Thanks!

> This orders the updating of RN before the store to __fr.fenv .  There is
> no other ordering ensured here.
> 
> The store to __fr.env obviously has to stay in order with anything that
> can alias it, if that store isn't optimised away completely later.
> 
> > static __inline __attribute__ ((__always_inline__)) void
> > libc_feresetround_ppc (fenv_t *envp)
> > { 
> >   fenv_union_t new = { .fenv = *envp };
> >   register fenv_union_t __fr;
> >   __fr.l = new.l & 3;
> >   __asm__ __volatile__ ("mffscrn %0,%1" : "=f" (__fr.fenv) : "f" 
> > (__fr.fenv));
> > }
> 
> This both reads from and stores to __fr.fenv, the asm has to stay
> between those two accesses (in the machine code).  If the code that
> actually depends on the modified RN depends onb that __fr.fenv some way,
> all will be fine.
> 
> > double
> > __sin (double x)
> > {
> >   struct rm_ctx ctx __attribute__ ((cleanup (libc_feresetround_ppc_ctx)));
> >   libc_feholdsetround_ppc_ctx (&ctx, (0));
> >   /* floating point intensive code.  */
> >   return retval;
> > }
> 
> ... but there is no such dependency.  The cleanup attribute does not
> give any such ordering either afaik.
> 
> > There's not much to it, really.  "mffscrni" on the way in to save and set
> > a required rounding mode, and "mffscrn" on the way out to restore it.
> 
> Yes.  But the code making use of the modified RN needs to have some
> artificial dependencies with the RN setters, perhaps via __fr.fenv .
> 
> > > Calling a real function (that does not even need a stack frame, just a
> > > blr) is not terribly expensive, either.
> > 
> > Not ideal, better would be better.
> 
> Yes.  But at least it *works* :-)  I'll take a stupid, simple, stupidly
> simple, *robust* solution over some nice, faster, nicely faster way of
> doing the wrong thing.

Understand, and agree. 

> > > > > > Would creating a __builtin_mffsce be another solution?
> > > > > 
> > > > > Yes.  And not a bad idea in the first place.
> > > > 
> > > > The previous "Nope" and this "Yes" seem in contradiction. If there is no
> > > > difference between "asm" and builtin, how does using a builtin solve the
> > > > problem?
> > > 
> > > You will have to make the builtin solve it.  What a builtin can do is
> > > virtually unlimited.  What an asm can do is not: it just outputs some
> > > assembler language, and does in/out/clobber constraints.  You can do a
> > > *lot* with that, but it is much more limited than everything you can do
> > > in the compiler!  :-)
> > > 
> > > The fact remains that there is no way in RTL (or Gimple for that matter)
> > > to express things like rounding mode changes.  You will need to
> > > artificially make some barriers.
> > 
> > I know there is __builtin_set_fpscr_rn that generates mffscrn.
> 
> Or some mtfsb[01]'s, or nasty mffs/mtfsf code, yeah.  And it does not
> provide the ordering either.  It *cannot*: you need to cooperate with
> whatever you are ordering against.  There is no way in GCC to say "this
> is an FP insn and has to stay in order with all FP control writes and FP
> status reads".
> 
> Maybe now you see why I like external functions for this :-)
> 
> > This
> > is not used in the code above because I believe it first appears in
> > GCC 9.1 or so, and glibc still supports GCC 6.2 (and it doesn't define
> > a return value, which would be handy in this case).  Does the
> > implementation of that builtin meet the requirements needed here,
> > to prevent reordering of FP computation across instantiations of the
> > builtin?  If not, is there a model on which to base an implementation
> > of __builtin_mffsce (or some preferred name)?
> 
> It depends on what you are actually ordering, unfortunately.

What I hear is that for the specific requirements and restrictions here,
there is nothing special that another builtin, like a theoretical
__builtin_mffsce implemented like __builtin_set_fpscr_rn, can provide
to solve the issue under discussion.  The dependencies need to be expressed
such that the compiler understands them, and there is no way to do so
with the current implementation of __builtin_set_fpscr_rn.

With some effort, and proper visibility, the dependencies can be expressed
using "asm". I believe that's the case here, and will submit a v2 for
review shortly.

For the general case of inlines, 

Re: [PATCH v3 6/6] rs6000: Guard some x86 intrinsics implementations

2021-10-18 Thread Paul A. Clarke via Gcc-patches
On Wed, Oct 13, 2021 at 06:47:21PM -0500, Segher Boessenkool wrote:
> On Wed, Oct 13, 2021 at 12:04:39PM -0500, Paul A. Clarke wrote:
> > On Mon, Oct 11, 2021 at 07:11:13PM -0500, Segher Boessenkool wrote:
> > > > - _mm_mul_epu32: vec_mule(v4su) uses vmuleuw.
> > > 
> > > Did this fail on p7?  If not, add a test that *does*?
> > 
> > Do you mean fail if not for "dg-require-effective-target p8vector_hw"?
> > We have that, in gcc/testsuite/gcc.target/powerpc/sse2-pmuludq-1.c.
> 
> "Some compatibility implementations of x86 intrinsics include
> Power intrinsics which require POWER8."
> 
> Plus, everything this patch does.  None of that would be needed if it
> worked on p7!

The tests that are permitted to compile/link on P7, gated by dg directives,
work on P7.

> So things in this patch are either not needed (so add noise only, and
> reduce functionality on older systems for no reason), or they do fix a
> bug.  It would be nice if we could have detected such bugs earlier.

Most, if not all of the intrinsics tests were originally limited to
P8 and up, 64bit, and little-endian. At your request, I have lowered
many of those restrictions in areas that are capable of support.
Such is the case here, to enable compiling and running as much as
possible on P7.

If you want a different approach, do let me know.

> > > > gcc
> > > > PR target/101893
> > > 
> > > This is a different bug (the vgbdd one)?
> > 
> > PR 101893 is the same issue: things not being properly masked by
> > #ifdefs.
> 
> But PR101893 does not mention anything you touch here, and this patch
> does not fix PR101893.  The main purpose of bug tracking systems is the
> tracking part!

The error message in PR101893 is in smmintrin.h:
| gcc/include/smmintrin.h:103:3: error: AltiVec argument passed to unprototyped 
function
| 
| That line is
| 
|   __charmask = vec_gb (__charmask);

smmintrin.h is changed by this patch, including `#ifdef _ARCH_PWR8` around
the code which has vec_gb.

PC


Re: FW: [PING] Re: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under -ffast-math on aarch64

2021-10-18 Thread Joseph Myers
On Fri, 15 Oct 2021, Richard Biener via Gcc-patches wrote:

> On Fri, Sep 24, 2021 at 2:59 PM Jirui Wu via Gcc-patches
>  wrote:
> >
> > Hi,
> >
> > Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577846.html
> >
> > The patch is attached as text for ease of use. Is there anything that needs 
> > to change?
> >
> > Ok for master? If OK, can it be committed for me, I have no commit rights.
> 
> I'm still not sure about the correctness.  I suppose the
> flag_fp_int_builtin_inexact && !flag_trapping_math is supposed to guard
> against spurious inexact exceptions, shouldn't that be
> !flag_fp_int_builtin_inexact || !flag_trapping_math instead?

The following remarks may be relevant here, but are not intended as an 
assertion of what is correct in this case.

1. flag_fp_int_builtin_inexact is the more permissive case ("inexact" may 
or may not be raised).  All existing uses in back ends are 
"flag_fp_int_builtin_inexact || !flag_trapping_math" or equivalent.

2. flag_fp_int_builtin_inexact only applies to certain built-in functions 
(as listed in invoke.texi).  It's always unspecified, even in C2X, whether 
casts of non-integer values from floating-point to integer types raise 
"inexact".  So flag_fp_int_builtin_inexact should not be checked in insn 
patterns corresponding to simple casts from floating-point to integer, 
only in insn patterns corresponding to the built-in functions listed for 
-fno-fp-int-builtin-inexact in invoke.texi (or for operations that combine 
such a built-in function with a cast of the *result* to integer type).
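
To illustrate point 2 with a sketch of my own (not part of the original mail;
the function name is hypothetical):

void
example (double d, int *ip, double *dp)
{
  *ip = (int) d;              /* plain cast: whether "inexact" is raised is
                                 always unspecified, so the option does not
                                 apply to insn patterns for simple casts */
  *dp = __builtin_trunc (d);  /* listed built-in: with
                                 -fno-fp-int-builtin-inexact GCC must not
                                 introduce a spurious "inexact" here */
}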

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v2] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-10-18 Thread David Edelsohn via Gcc-patches
On Tue, Oct 12, 2021 at 9:50 PM Xionghu Luo  wrote:
>
> Resend this patch.  Previous discussion is:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html
>
> vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
> 5, 21, 6, 22, 7, 23} no matter for BE or LE in ISA, similarly for vmrglb.
> Remove UNSPEC_VMRGH_DIRECT/UNSPEC_VMRGL_DIRECT pattern as vec_select
> + vec_concat as normal RTL.
>
> Tested pass on P8LE, P9LE and P8BE{m32}, ok for trunk?
>
> gcc/ChangeLog:
>
> * config/rs6000/altivec.md (*altivec_vmrghb_internal): Delete.
> (altivec_vmrghb_direct): New.
> (*altivec_vmrghh_internal): Delete.
> (altivec_vmrghh_direct): New.
> (*altivec_vmrghw_internal): Delete.
> (altivec_vmrghw_direct_): New.
> (altivec_vmrghw_direct): Delete.
> (*altivec_vmrglb_internal): Delete.
> (altivec_vmrglb_direct): New.
> (*altivec_vmrglh_internal): Delete.
> (altivec_vmrglh_direct): New.
> (*altivec_vmrglw_internal): Delete.
> (altivec_vmrglw_direct_): New.
> (altivec_vmrglw_direct): Delete.
> * config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Adjust.
> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const):
> Adjust.
> * config/rs6000/vsx.md (vsx_xxmrghw_): Adjust.
> (vsx_xxmrglw_): Adjust.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/builtins-1.c: Update instruction counts.

This patch is okay.

Thanks, David


[COMMITTED] tree-optimization/102796 - Process EH edges again.

2021-10-18 Thread Andrew MacLeod via Gcc-patches

Sorry for the breakage, we need to continue processing EH edges..

Bootstrapped on x86_64-pc-linux-gnu (including Go :-)  with no 
regressions as of the original checkin.   I hope this catches all the 
other ripple PRs too.  Pushed.




Returning NULL in gimple_range_ssa_p is probably not a good idea.  The
name does carry a range it just has to be considered VARYING.

The issue with abnormal edges is that they do not have a jump
associated with them and thus we cannot insert code on the edge
because we cannot split it.  That has implications for coalescing
since we cannot even insert copies there so the PHI argument
and the PHI result have to be the same register for the arguments
on abnormal edges.

Otherwise they do carry a value and a range but forcing that to be
VARYING makes sense to avoid propagating constants to where
it is not allowed (though the substitution phase should be the one
checking).
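
As an illustration of the abnormal-edge restriction (my sketch, not from the
mail; the function names are hypothetical):

/* Abnormal edges arise e.g. from setjmp/longjmp: the edge back to the
   setjmp receiver has no branch instruction we could split, so no copy
   can be inserted on it, and the PHI arguments and PHI result flowing
   across it must coalesce into the same register.  */
#include <setjmp.h>

extern void may_longjmp (jmp_buf);

int
f (int x)
{
  jmp_buf env;
  int v = x + 1;
  if (setjmp (env))   /* control can re-enter here along an abnormal edge */
    return v;
  may_longjmp (env);
  return 0;
}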


gimple_range_ssa_p is meant to be more of a gateway into processing.  If
it's false, we won't try to find any additional range for it.  This keeps
it out of the tables and caches, reducing processing time as well as the
memory footprint.


We can find ranges for anything for which supports_type_p() is true, and
it is likely to be VARYING if gimple_range_ssa_p is false now.


That said, this is the first time it's been super heavily exercised in
this regard, and there are a couple of places where we were returning
FALSE, which was less than ideal.  I should have been calling
get_tree_range, which would then return false for non-supported types,
or the global/varying value if they are supported.


And we could probably do better for at least calculating a range for
such SSA_NAMEs under these circumstances: there is no reason we can't
fold the stmt and get a range.  I'll tweak/audit that in a follow-up so
there is better consistency between when we check gimple_range_ssa_p and
irange::supports_type_p ().



Andrew


commit 4d92a69fc5882c86aab63d52382b393d4f20b3ed
Author: Andrew MacLeod 
Date:   Mon Oct 18 13:52:18 2021 -0400

Process EH edges again and call get_tree_range on non gimple_range_ssa_p names.

PR tree-optimization/102796
gcc/
* gimple-range.cc (gimple_ranger::range_on_edge): Process EH edges
normally.  Return get_tree_range for non gimple_range_ssa_p names.
(gimple_ranger::range_of_stmt): Use get_tree_range for non
gimple_range_ssa_p names.

gcc/testsuite/
* g++.dg/pr102796.C: New.

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 85ef9745593..93d6da66ccb 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -180,9 +180,9 @@ gimple_ranger::range_on_edge (irange &r, edge e, tree name)
   int_range_max edge_range;
   gcc_checking_assert (irange::supports_type_p (TREE_TYPE (name)));
 
-  // Do not process values along abnormal or EH edges.
-  if (e->flags & (EDGE_ABNORMAL|EDGE_EH))
-return false;
+  // Do not process values along abnormal edges.
+  if (e->flags & EDGE_ABNORMAL)
+return get_tree_range (r, name, NULL);
 
   unsigned idx;
   if ((idx = tracer.header ("range_on_edge (")))
@@ -203,7 +203,7 @@ gimple_ranger::range_on_edge (irange &r, edge e, tree name)
 
   bool res = true;
   if (!gimple_range_ssa_p (name))
-res = range_of_expr (r, name);
+return get_tree_range (r, name, NULL);
   else
 {
   range_on_exit (r, e->src, name);
@@ -258,7 +258,7 @@ gimple_ranger::range_of_stmt (irange &r, gimple *s, tree name)
   if (!name)
 res = fold_range_internal (r, s, NULL_TREE);
   else if (!gimple_range_ssa_p (name))
-res = false;
+res = get_tree_range (r, name, NULL);
   // Check if the stmt has already been processed, and is not stale.
   else if (m_cache.get_non_stale_global_range (r, name))
 {
diff --git a/gcc/testsuite/g++.dg/pr102796.C b/gcc/testsuite/g++.dg/pr102796.C
new file mode 100644
index 000..6ad1008922f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr102796.C
@@ -0,0 +1,18 @@
+// { dg-do compile }
+// { dg-options "-O3 -fno-tree-ccp -fno-tree-fre -fno-tree-forwprop -std=c++17" }
+
+namespace std {
+template 
+struct initializer_list {
+  const int* __begin_;
+  decltype(sizeof(int)) __size_;
+};
+}  // namespace std
+struct destroyme1 {};
+struct witharg1 {
+  witharg1(const destroyme1&);
+  ~witharg1();
+};
+std::initializer_list globalInitList2 = {witharg1(destroyme1()),
+ witharg1(destroyme1())};
+


Re: [PATCH] openmp, fortran: Add support for declare variant in Fortran

2021-10-18 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 18, 2021 at 10:05:29PM +0100, Kwok Cheung Yeung wrote:
> On 14/10/2021 1:47 pm, Jakub Jelinek wrote:
> > What I still miss is tests for the (proc_name : variant_name) syntax
> > in places where proc_name : is optional, but is supplied and is valid, like
> > e.g. in interface, or in subroutine/function and where proc_name specifies
> > the name of the containing interface or subroutine/function.
> > I see that syntax tested in some places with dg-error on that line and
> > in spaces where it isn't optional (e.g. at module scope before contains).
> > But if you want, that can be added incrementally.
> 
> Do you mean something like these tests?

Yeah, LGTM, thanks.

Jakub



Re: [PATCH] openmp, fortran: Add support for declare variant in Fortran

2021-10-18 Thread Kwok Cheung Yeung

On 14/10/2021 1:47 pm, Jakub Jelinek wrote:

What I still miss is tests for the (proc_name : variant_name) syntax
in places where proc_name : is optional, but is supplied and is valid, like
e.g. in interface, or in subroutine/function and where proc_name specifies
the name of the containing interface or subroutine/function.
I see that syntax tested in some places with dg-error on that line and
in spaces where it isn't optional (e.g. at module scope before contains).
But if you want, that can be added incrementally.


Do you mean something like these tests?

Thanks

Kwok
From 38733234024697d2144613c4a992e970f40afad8 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Mon, 18 Oct 2021 13:56:59 -0700
Subject: [PATCH] openmp: Add additional tests for declare variant in Fortran

Add tests to check that explicitly specifying the containing procedure as the
base name for declare variant works.

2021-10-18  Kwok Cheung Yeung  

gcc/testsuite/

* gfortran.dg/gomp/declare-variant-15.f90 (variant2, base2, test2):
Add tests.
* gfortran.dg/gomp/declare-variant-16.f90 (base2, variant2, test2):
Add tests.
---
 .../gfortran.dg/gomp/declare-variant-15.f90| 13 +
 .../gfortran.dg/gomp/declare-variant-16.f90| 14 +-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-15.f90 
b/gcc/testsuite/gfortran.dg/gomp/declare-variant-15.f90
index b2ad96a8998..4a88e3e46c7 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-15.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-15.f90
@@ -14,6 +14,13 @@ contains
   subroutine base ()
   end subroutine
 
+  subroutine variant2 ()
+  end subroutine
+
+  subroutine base2 ()
+!$omp declare variant (base2: variant2) match (construct={parallel})
+  end subroutine
+
   subroutine test1 ()
 !$omp target
   !$omp parallel
@@ -21,4 +28,10 @@ contains
   !$omp end parallel
 !$omp end target
   end subroutine
+
+  subroutine test2 ()
+!$omp parallel
+   call base2 ()   ! { dg-final { scan-tree-dump-times "variant2 
\\\(\\\);" 1 "gimple" } }
+!$omp end parallel
+  end subroutine
 end module
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-16.f90 
b/gcc/testsuite/gfortran.dg/gomp/declare-variant-16.f90
index fc97322e667..5e34d474da4 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-16.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-16.f90
@@ -10,15 +10,27 @@ module main
 subroutine base ()
   !$omp declare variant (variant) match (construct={parallel})
 end subroutine
+
+subroutine base2 ()
+  !$omp declare variant (base2: variant2) match (construct={target})
+end subroutine
   end interface
-
 contains
   subroutine variant ()
   end subroutine
 
+  subroutine variant2 ()
+  end subroutine
+
   subroutine test ()
 !$omp parallel
   call base ()  ! { dg-final { scan-tree-dump-times "variant \\\(\\\);" 1 
"gimple" } }
 !$omp end parallel
   end subroutine
+
+  subroutine test2 ()
+!$omp target
+  call base2 ()  ! { dg-final { scan-tree-dump-times "variant2 \\\(\\\);" 
1 "gimple" } }
+!$omp end target
+  end subroutine
 end module
-- 
2.30.0.335.ge636282



Re: [PATCH] Add a simulate_record_decl lang hook

2021-10-18 Thread Richard Sandiford via Gcc-patches
Jason Merrill  writes:
> On 9/24/21 13:53, Richard Sandiford wrote:
>> This patch adds a lang hook for defining a struct/RECORD_TYPE
>> “as if” it had appeared directly in the source code.  It follows
>> the similar existing hook for enums.
>> 
>> It's the caller's responsibility to create the fields
>> (as FIELD_DECLs) but the hook's responsibility to create
>> and declare the associated RECORD_TYPE.
>> 
>> For now the hook is hard-coded to do the equivalent of:
>> 
>>typedef struct NAME { FIELDS } NAME;
>> 
>> but this could be controlled by an extra parameter if some callers
>> want a different behaviour in future.
>> 
>> The motivating use case is to allow the long list of struct
>> definitions in arm_neon.h to be provided by the compiler,
>> which in turn unblocks various arm_neon.h optimisations.
>> 
>> Tested on aarch64-linux-gnu, individually and with a follow-on
>> patch from Jonathan that makes use of the hook.  OK to install?
>> 
>> Richard
>> 
>> 
>> gcc/
>>  * langhooks.h (lang_hooks_for_types::simulate_record_decl): New hook.
>>  * langhooks-def.h (lhd_simulate_record_decl): Declare.
>>  (LANG_HOOKS_SIMULATE_RECORD_DECL): Define.
>>  (LANG_HOOKS_FOR_TYPES_INITIALIZER): Include it.
>>  * langhooks.c (lhd_simulate_record_decl): New function.
>> 
>> gcc/c/
>>  * c-tree.h (c_simulate_record_decl): Declare.
>>  * c-objc-common.h (LANG_HOOKS_SIMULATE_RECORD_DECL): Override.
>>  * c-decl.c (c_simulate_record_decl): New function.
>> 
>> gcc/cp/
>>  * decl.c: Include langhooks-def.h.
>>  (cxx_simulate_record_decl): New function.
>>  * cp-objcp-common.h (cxx_simulate_record_decl): Declare.
>>  (LANG_HOOKS_SIMULATE_RECORD_DECL): Override.
>> ---
>>   gcc/c/c-decl.c   | 31 +++
>>   gcc/c/c-objc-common.h|  2 ++
>>   gcc/c/c-tree.h   |  2 ++
>>   gcc/cp/cp-objcp-common.h |  4 
>>   gcc/cp/decl.c| 38 ++
>>   gcc/langhooks-def.h  |  4 
>>   gcc/langhooks.c  | 21 +
>>   gcc/langhooks.h  | 10 ++
>>   8 files changed, 112 insertions(+)
>> 
>> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
>> index 771efa3eadf..8d1324b118c 100644
>> --- a/gcc/c/c-decl.c
>> +++ b/gcc/c/c-decl.c
>> @@ -9436,6 +9436,37 @@ c_simulate_enum_decl (location_t loc, const char 
>> *name,
>> input_location = saved_loc;
>> return enumtype;
>>   }
>> +
>> +/* Implement LANG_HOOKS_SIMULATE_RECORD_DECL.  */
>> +
>> +tree
>> +c_simulate_record_decl (location_t loc, const char *name,
>> +array_slice fields)
>> +{
>> +  location_t saved_loc = input_location;
>> +  input_location = loc;
>> +
>> +  class c_struct_parse_info *struct_info;
>> +  tree ident = get_identifier (name);
>> +  tree type = start_struct (loc, RECORD_TYPE, ident, &struct_info);
>> +
>> +  for (unsigned int i = 0; i < fields.size (); ++i)
>> +{
>> +  DECL_FIELD_CONTEXT (fields[i]) = type;
>> +  if (i > 0)
>> +DECL_CHAIN (fields[i - 1]) = fields[i];
>> +}
>> +
>> +  finish_struct (loc, type, fields[0], NULL_TREE, struct_info);
>> +
>> +  tree decl = build_decl (loc, TYPE_DECL, ident, type);
>> +  TYPE_NAME (type) = decl;
>> +  TYPE_STUB_DECL (type) = decl;
>> +  lang_hooks.decls.pushdecl (decl);
>> +
>> +  input_location = saved_loc;
>> +  return type;
>> +}
>>
>>   /* Create the FUNCTION_DECL for a function definition.
>>  DECLSPECS, DECLARATOR and ATTRIBUTES are the parts of
>> diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
>> index 7d35a0621e4..f4e8271f06c 100644
>> --- a/gcc/c/c-objc-common.h
>> +++ b/gcc/c/c-objc-common.h
>> @@ -81,6 +81,8 @@ along with GCC; see the file COPYING3.  If not see
>>   
>>   #undef LANG_HOOKS_SIMULATE_ENUM_DECL
>>   #define LANG_HOOKS_SIMULATE_ENUM_DECL c_simulate_enum_decl
>> +#undef LANG_HOOKS_SIMULATE_RECORD_DECL
>> +#define LANG_HOOKS_SIMULATE_RECORD_DECL c_simulate_record_decl
>>   #undef LANG_HOOKS_TYPE_FOR_MODE
>>   #define LANG_HOOKS_TYPE_FOR_MODE c_common_type_for_mode
>>   #undef LANG_HOOKS_TYPE_FOR_SIZE
>> diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
>> index d50d0cb7f2d..8578d2d1e77 100644
>> --- a/gcc/c/c-tree.h
>> +++ b/gcc/c/c-tree.h
>> @@ -598,6 +598,8 @@ extern tree finish_struct (location_t, tree, tree, tree,
>> class c_struct_parse_info *);
>>   extern tree c_simulate_enum_decl (location_t, const char *,
>>vec *);
>> +extern tree c_simulate_record_decl (location_t, const char *,
>> +array_slice);
>>   extern struct c_arg_info *build_arg_info (void);
>>   extern struct c_arg_info *get_parm_info (bool, tree);
>>   extern tree grokfield (location_t, struct c_declarator *,
>> diff --git a/gcc/cp/cp-objcp-common.h b/gcc/cp/cp-objcp-common.h
>> index f1704aad557..d5859406e8f 100644
>> --- a/gcc/cp/cp-objcp-common.h
>> +++ b/gcc/cp/cp-objcp-common.h
>> @@ -39,6 +39,8 @@ extern bool 

[Version 2][Patch][PR102281]do not add BUILTIN_CLEAR_PADDING for variables that are gimple registers

2021-10-18 Thread Qing Zhao via Gcc-patches
Hi, Jakub,

This is the 2nd version of the patch based on your comment.

Bootstrapped on both x86 and aarch64. Regression testing is ongoing.

Please let me know if this is ready for committing.

Thanks a lot.

Qing.

==

From d6f60370dee69b5deb3d7ef51873a5e986490782 Mon Sep 17 00:00:00 2001
From: Qing Zhao 
Date: Mon, 18 Oct 2021 19:04:39 +
Subject: [PATCH] PR 102281 (-ftrivial-auto-var-init=zero causes ice)

Do not add a call to __builtin_clear_padding when a variable is a gimple
register or it might not have padding.
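
As an illustration (my example, not from the patch; the function name is
hypothetical), the kind of variable the new check excludes:

/* With -ftrivial-auto-var-init=pattern, 'd' below is a gimple register:
   its address is never taken, so inserting
   __builtin_clear_padding (&d, ...) would force it into memory just to
   clear padding it only acquires if it is later spilled.  The call is
   therefore skipped for such variables.  */
long double
f (long double x)
{
  long double d;   /* auto variable subject to -ftrivial-auto-var-init */
  d = x + 1.0L;
  return d;
}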

gcc/ChangeLog:

2021-10-18  qing zhao  

* gimplify.c (gimplify_decl_expr): Do not add call to
__builtin_clear_padding when a variable is a gimple register
or it might not have padding.
(gimplify_init_constructor): Likewise.

gcc/testsuite/ChangeLog:

2021-10-18  qing zhao  

* c-c++-common/pr102281.c: New test.
* gcc.target/i386/auto-init-2.c: Adjust testing case.
* gcc.target/i386/auto-init-4.c: Likewise.
* gcc.target/i386/auto-init-6.c: Likewise.
* gcc.target/aarch64/auto-init-6.c: Likewise.
---
 gcc/gimplify.c| 25 ++-
 gcc/testsuite/c-c++-common/pr102281.c | 17 +
 .../gcc.target/aarch64/auto-init-6.c  |  4 +--
 gcc/testsuite/gcc.target/i386/auto-init-2.c   |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-4.c   | 10 +++-
 gcc/testsuite/gcc.target/i386/auto-init-6.c   |  7 +++---
 6 files changed, 47 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr102281.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index d8e4b139349..b27dc0ed308 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1784,8 +1784,8 @@ gimple_add_init_for_auto_var (tree decl,
that padding is initialized to zero. So, we always initialize paddings
to zeroes regardless INIT_TYPE.
To do the padding initialization, we insert a call to
-   __BUILTIN_CLEAR_PADDING (, 0, for_auto_init = true).
-   Note, we add an additional dummy argument for __BUILTIN_CLEAR_PADDING,
+   __builtin_clear_padding (, 0, for_auto_init = true).
+   Note, we add an additional dummy argument for __builtin_clear_padding,
'for_auto_init' to distinguish whether this call is for automatic
variable initialization or not.
*/
@@ -1954,8 +1954,14 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
 pattern initialization.
 In order to make the paddings as zeroes for pattern init, We
 should add a call to __builtin_clear_padding to clear the
-paddings to zero in compatiple with CLANG.  */
- if (flag_auto_var_init == AUTO_INIT_PATTERN)
+paddings to zero in compatiple with CLANG.
+We cannot insert this call if the variable is a gimple register
+since __builtin_clear_padding will take the address of the
+variable.  As a result, if a long double/_Complex long double
+variable will spilled into stack later, its padding is 0XFE.  */
+ if (flag_auto_var_init == AUTO_INIT_PATTERN
+ && !is_gimple_reg (decl)
+ && clear_padding_type_may_have_padding_p (TREE_TYPE (decl)))
gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p);
}
 }
@@ -5384,12 +5390,19 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
*pre_p, gimple_seq *post_p,
 
   /* If the user requests to initialize automatic variables, we
  should initialize paddings inside the variable.  Add a call to
- __BUILTIN_CLEAR_PADDING (, 0, for_auto_init = true) to
+ __builtin_clear_pading (, 0, for_auto_init = true) to
  initialize paddings of object always to zero regardless of
  INIT_TYPE.  Note, we will not insert this call if the aggregate
  variable has be completely cleared already or it's initialized
- with an empty constructor.  */
+ with an empty constructor.  We cannot insert this call if the
+ variable is a gimple register since __builtin_clear_padding will take
+ the address of the variable.  As a result, if a long double/_Complex long
+ double variable will be spilled into stack later, its padding cannot
+ be cleared with __builtin_clear_padding.  We should clear its padding
+ when it is spilled into memory.  */
   if (is_init_expr
+  && !is_gimple_reg (object)
+  && clear_padding_type_may_have_padding_p (type)
   && ((AGGREGATE_TYPE_P (type) && !cleared && !is_empty_ctor)
  || !AGGREGATE_TYPE_P (type))
   && is_var_need_auto_init (object))
diff --git a/gcc/testsuite/c-c++-common/pr102281.c 
b/gcc/testsuite/c-c++-common/pr102281.c
new file mode 100644
index 000..a961451b5a7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr102281.c
@@ -0,0 +1,17 @@
+/* PR102281  */
+/* { dg-do compile } */
+/* { dg-options "-ftrivial-auto-var-init=zero -Wno-psabi" } */
+long long var1;
+float var2;
+typedef long long V 

Re: [PATCH] PR target/102785: Correct addsub/subadd patterns on bfin.

2021-10-18 Thread Jeff Law via Gcc-patches




On 10/18/2021 9:07 AM, Roger Sayle wrote:

This patch resolves PR target/102785 where my recent patch to constant
fold saturating addition/subtraction exposed a latent bug in the bfin
backend.  The patterns used for blackfin's V2HI ssaddsub and sssubadd
instructions had the indices/operations swapped.  This was harmless
until we started evaluating these expressions at compile-time, when
the mismatch was caught by the testsuite.

Many thanks to Jeff Law for confirming that this patch fixes these
regressions on bfin-elf.  Ok for mainline?


2021-10-18  Roger Sayle  

gcc/ChangeLog
 PR target/102785
 * config/bfin/bfin.md (addsubv2hi3, subaddv2hi3, ssaddsubv2hi3,
 sssubaddv2hi3):  Swap the order of operators in vec_concat.

OK.  Thanks for taking care of this.

jeff



Re: [Patch][PR102281]do not add BUILTIN_CLEAR_PADDING for variables that are gimple registers.

2021-10-18 Thread Qing Zhao via Gcc-patches


> On Oct 18, 2021, at 12:15 PM, Jakub Jelinek  wrote:
> 
> On Mon, Oct 18, 2021 at 05:01:55PM +, Qing Zhao wrote:
>>> The where is typically somewhere in the FEs.
>>> But, there are two things.
>>> One is that in order to gimplify it properly, it needs to be marked earlier.
>>> But the other is that if it is not addressable, then clearing padding in it
>>> makes not much sense as I've tried to explain, all it could do is making it
>>> slightly less likely that the var will be optimized into a gimple reg later,
>>> but if it does, then the padding will not be cleared anyway.
>>> And it is only at RTL expansion or RA that the var could be assigned into a
>>> stack location or spilled and at that point is the right time to clear
>>> padding in there if needed.
>>> So while the FEs could make it addressable and then you could clear padding,
>>> it would just penalize code and nothing else, later on the optimizations
>>> would figure out it is no longer addressable and optimize it into gimple
>>> reg.
>> 
>> So, from my understanding so far, adding “!is_gimple_reg(decl)” to decide 
>> whether to add __builtin_clear_padding call
>> is a good and simple solution to this bug? 
> 
> Yes.
> And if you want to do something during RTL expansion or RA time,
> do it there incrementally after reasoning why it is needed at those points.

I already created 

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102781

For this RTL phase padding clearing. (Mostly for long double/_Complex long 
double variables that have explicit initializer, I guess).

It’s hard to come up with a simple testing case yet for this PR.

> E.g. for the spill slots, it doesn't have to be any kind of user variable
> but can be any kind of intermediate temporary...

However, for -ftrivial-auto-var-init, we are supposed to clear the user variables;
shall we clear intermediate temporary variables with this option?

Qing

> 
>   Jakub
> 



Re: [Patch][PR102281]do not add BUILTIN_CLEAR_PADDING for variables that are gimple registers.

2021-10-18 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 18, 2021 at 05:01:55PM +, Qing Zhao wrote:
> > The where is typically somewhere in the FEs.
> > But, there are two things.
> > One is that in order to gimplify it properly, it needs to be marked earlier.
> > But the other is that if it is not addressable, then clearing padding in it
> > makes not much sense as I've tried to explain, all it could do is making it
> > slightly less likely that the var will be optimized into a gimple reg later,
> > but if it does, then the padding will not be cleared anyway.
> > And it is only at RTL expansion or RA that the var could be assigned into a
> > stack location or spilled and at that point is the right time to clear
> > padding in there if needed.
> > So while the FEs could make it addressable and then you could clear padding,
> > it would just penalize code and nothing else, later on the optimizations
> > would figure out it is no longer addressable and optimize it into gimple
> > reg.
> 
> So, from my understanding so far, adding “!is_gimple_reg(decl)” to decide 
> whether to add __builtin_clear_padding call
> is a good and simple solution to this bug? 

Yes.
And if you want to do something during RTL expansion or RA time,
do it there incrementally after reasoning why it is needed at those points.
E.g. for the spill slots, it doesn't have to be any kind of user variable
but can be any kind of intermediate temporary...

Jakub



Re: [PATCH 4/4] ipa-cp: Select saner profile count to base heuristics on

2021-10-18 Thread Martin Jambor
Hi,

On Wed, Oct 06 2021, Jan Hubicka wrote:
>> 2021-08-23  Martin Jambor  
>> 
>>  * params.opt (param_ipa_cp_profile_count_base): New parameter.
>>  * ipa-cp.c (max_count): Replace with base_count, replace all
>>  occurrences too, unless otherwise stated.
>>  (ipcp_cloning_candidate_p): identify mostly-directly called
>>  functions based on their counts, not max_count.
>>  (compare_edge_profile_counts): New function.
>>  (ipcp_propagate_stage): Instead of setting max_count, find the
>>  appropriate edge count in a sorted vector of counts of eligible
>>  edges and make it the base_count.
>> ---
>>  gcc/ipa-cp.c   | 82 +-
>>  gcc/params.opt |  4 +++
>>  2 files changed, 78 insertions(+), 8 deletions(-)
>> 
>> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
>> index 53cca7aa804..6ab74f61e83 100644
>> --- a/gcc/ipa-cp.c
>> +++ b/gcc/ipa-cp.c
>> @@ -400,9 +400,9 @@ object_allocator > 
>> ipcp_sources_pool
>>  object_allocator ipcp_agg_lattice_pool
>>("IPA_CP aggregate lattices");
>>  
>> -/* Maximal count found in program.  */
>> +/* Base count to use in heuristics when using profile feedback.  */
>>  
>> -static profile_count max_count;
>> +static profile_count base_count;
>>  
>>  /* Original overall size of the program.  */
>>  
>> @@ -809,7 +809,8 @@ ipcp_cloning_candidate_p (struct cgraph_node *node)
>>/* When profile is available and function is hot, propagate into it even 
>> if
>>   calls seems cold; constant propagation can improve function's speed
>>   significantly.  */
>> -  if (max_count > profile_count::zero ())
>> +  if (stats.count_sum > profile_count::zero ()
>> +  && node->count.ipa ().initialized_p ())
>>  {
>>if (stats.count_sum > node->count.ipa ().apply_scale (90, 100))
>>  {
>> @@ -3310,10 +3311,10 @@ good_cloning_opportunity_p (struct cgraph_node 
>> *node, sreal time_benefit,
>>  
>>ipa_node_params *info = ipa_node_params_sum->get (node);
>>int eval_threshold = opt_for_fn (node->decl, param_ipa_cp_eval_threshold);
>> -  if (max_count > profile_count::zero ())
>> +  if (base_count > profile_count::zero ())
>>  {
>>  
>> -  sreal factor = count_sum.probability_in (max_count).to_sreal ();
>> +  sreal factor = count_sum.probability_in (base_count).to_sreal ();
>
> If you have part of program built with -fprofile-use and part built without 
> this will
> disable cloning for functions called only from places w/o profile.
> I think we want to count frequencies when ipa profile is uninitialized
> and then pass the cloning opportunity if either count or freqs says it is
> good idea.

OK, I would like to address that by a separate follow-up patch below.

>
>>sreal evaluation = (time_benefit * factor) / size_cost;
>>evaluation = incorporate_penalties (node, info, evaluation);
>>evaluation *= 1000;
>> @@ -3950,6 +3951,21 @@ value_topo_info::propagate_effects ()
>>  }
>>  }
>>  
>> +/* Callback for qsort to sort counts of all edges.  */
>> +
>> +static int
>> +compare_edge_profile_counts (const void *a, const void *b)
>> +{
>> +  const profile_count *cnt1 = (const profile_count *) a;
>> +  const profile_count *cnt2 = (const profile_count *) b;
>> +
>> +  if (*cnt1 < *cnt2)
>> +return 1;
>> +  if (*cnt1 > *cnt2)
>> +return -1;
>> +  return 0;
>> +}
>> +
>>  
>>  /* Propagate constants, polymorphic contexts and their effects from the
>> summaries interprocedurally.  */
>> @@ -3962,8 +3978,10 @@ ipcp_propagate_stage (class ipa_topo_info *topo)
>>if (dump_file)
>>  fprintf (dump_file, "\n Propagating constants:\n\n");
>>  
>> -  max_count = profile_count::uninitialized ();
>> +  base_count = profile_count::uninitialized ();
>>  
>> +  bool compute_count_base = false;
>> +  unsigned base_count_pos_percent = 0;
>>FOR_EACH_DEFINED_FUNCTION (node)
>>{
>>  if (node->has_gimple_body_p ()
>> @@ -3981,9 +3999,57 @@ ipcp_propagate_stage (class ipa_topo_info *topo)
>>  ipa_size_summary *s = ipa_size_summaries->get (node);
>>  if (node->definition && !node->alias && s != NULL)
>>overall_size += s->self_size;
>> -max_count = max_count.max (node->count.ipa ());
>> +if (node->count.ipa ().initialized_p ())
>> +  {
>> +compute_count_base = true;
>> +unsigned pos_percent = opt_for_fn (node->decl,
>> +   param_ipa_cp_profile_count_base);
>> +base_count_pos_percent = MAX (base_count_pos_percent, pos_percent);
>> +  }
>>}
>>  
>> +  if (compute_count_base)
>> +{
>> +  auto_vec all_edge_counts;
>> +  all_edge_counts.reserve_exact (symtab->edges_count);
>> +  FOR_EACH_DEFINED_FUNCTION (node)
>> +for (cgraph_edge *cs = node->callees; cs; cs = cs->next_callee)
>> +  {
>> +profile_count count = cs->count.ipa ();
>> +if (!(count > profile_count::zero ()))
>> +  continue;
>> +
>> +enum availability 

Re: [Patch][PR102281]do not add BUILTIN_CLEAR_PADDING for variables that are gimple registers.

2021-10-18 Thread Qing Zhao via Gcc-patches


> On Oct 18, 2021, at 11:46 AM, Jakub Jelinek  wrote:
> 
> On Mon, Oct 18, 2021 at 03:58:56PM +, Qing Zhao wrote:
>>> Furthermore, __builtin_clear_padding doesn't assume anything, but it takes
>>> an address of an object as argument and already the taking of the address
>>> that gimple_add_padding_init_for_auto_var does makes the var
>>> TREE_ADDRESABLE, which is something that needs to be done before the var is
>>> ever accessed during gimplification.
>> 
>> Yes, currently, “gimple_add_padding_init_for_auto_var” has already done the 
>> following before
>> calling __builtin_clear_padding:
>> 
>>  mark_addressable (decl);
>>  addr_of_decl = build_fold_addr_expr (decl);
>> 
>> But looks like that “making the DECL as ADDRESSABLE here” is too late. 
> 
> Wouldn't be if we could prove it wasn't gimplified already.
> 
>> Yes, If we can “make the DECL as ADDRESSABLE” before the var is accessed 
>> during gimplification,  that will also
>> fix this issue. But “Where” is the right place to make the DECL as 
>> ADDRESSABLE? (Do it in “gimplify_decl_expr” when handling
>> The variable’s declaration? )
> 
> The where is typically somewhere in the FEs.
> But, there are two things.
> One is that in order to gimplify it properly, it needs to be marked earlier.
> But the other is that if it is not addressable, then clearing padding in it
> makes not much sense as I've tried to explain, all it could do is making it
> slightly less likely that the var will be optimized into a gimple reg later,
> but if it does, then the padding will not be cleared anyway.
> And it is only at RTL expansion or RA that the var could be assigned into a
> stack location or spilled and at that point is the right time to clear
> padding in there if needed.
> So while the FEs could make it addressable and then you could clear padding,
> it would just penalize code and nothing else, later on the optimizations
> would figure out it is no longer addressable and optimize it into gimple
> reg.

So, from my understanding so far, adding “!is_gimple_reg(decl)” to decide 
whether to add __builtin_clear_padding call
is a good and simple solution to this bug? 

Qing
> 
>   Jakub
> 



Re: [PATCH] c++: Don't reject calls through PMF during constant evaluation [PR102786]

2021-10-18 Thread Jason Merrill via Gcc-patches

On 10/18/21 08:14, Jakub Jelinek wrote:

Hi!

The following testcase incorrectly rejects the c initializer,
while in the s.*a case cxx_eval_* sees .__pfn reads etc.,
in the s.*::foo case get_member_function_from_ptrfunc creates
expressions which use INTEGER_CSTs with type of pointer to METHOD_TYPE.
And cxx_eval_constant_expression rejects any INTEGER_CSTs with pointer
type if they aren't 0.
Either we'd need to make sure we defer such folding till cp_fold but the
function and pfn_from_ptrmemfunc is used from lots of places, or
the following patch just tries to reject only non-zero INTEGER_CSTs
with pointer types if they don't point to METHOD_TYPE in the hope that
all such INTEGER_CSTs with POINTER_TYPE to METHOD_TYPE are result of
folding valid pointer-to-member function expressions.
I don't immediately see how one could create such INTEGER_CSTs otherwise,
cast of integers to PMF is rejected and would have the PMF RECORD_TYPE
anyway, etc.

Regtested on x86_64-linux
(with GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++)
ok for trunk if it passes full bootstrap/regtest?

2021-10-18  Jakub Jelinek  

PR c++/102786
* constexpr.c (cxx_eval_constant_expression): Don't reject
INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE.

* g++.dg/cpp2a/constexpr-virtual19.C: New test.

--- gcc/cp/constexpr.c.jj   2021-10-15 11:59:15.917687093 +0200
+++ gcc/cp/constexpr.c  2021-10-18 13:26:49.458610657 +0200
@@ -6191,6 +6191,7 @@ cxx_eval_constant_expression (const cons
  
if (TREE_CODE (t) == INTEGER_CST

  && TYPE_PTR_P (TREE_TYPE (t))
+ && TREE_CODE (TREE_TYPE (TREE_TYPE (t))) != METHOD_TYPE


This should have a comment that, as you say, an INTEGER_CST with 
pointer-to-method type is only used for a virtual method in a pointer to 
member function.  OK with that change.



  && !integer_zerop (t))
{
  if (!ctx->quiet)
--- gcc/testsuite/g++.dg/cpp2a/constexpr-virtual19.C.jj 2021-10-18 
13:35:00.229693908 +0200
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-virtual19.C2021-10-18 
12:31:05.265747723 +0200
@@ -0,0 +1,11 @@
+// PR c++/102786
+// { dg-do compile { target c++20 } }
+
+struct S {
+  virtual constexpr int foo () const { return 42; }
+};
+
+constexpr S s;
+constexpr auto a = &S::foo;
+constexpr auto b = (s.*a) ();
+constexpr auto c = (s.*&S::foo) ();

Jakub





Re: [PATCH 3/4] ipa-cp: Fix updating of profile counts and self-gen value evaluation

2021-10-18 Thread Martin Jambor
Hi,

On Fri, Oct 08 2021, Jan Hubicka wrote:
>> For non-local nodes which can have unknown callers, the algorithm just
>> takes half of the counts - we may decide that taking just a third or
>> some other portion is more reasonable, but I do not think we can
>> attempt anything more clever.
>
> Can't you just sum the calling edges and subtract it from callee's
> count?

I guess I can, the patch below changes handling of this scenario, it now
behaves as if there was another hidden caller with the count of calls
that are calculated as you suggested.

[...]

>> +/* With partial train run we do not want to assume that original's count is
>> +   zero whenever we redurect all executed edges to clone.  Simply drop 
>> profile
>> +   to local one in this case.  In eany case, return the new value.  
>> ORIG_NODE
>> +   is the original node and its count has not been updaed yet.  */
>> +
>> +profile_count
>> +lenient_count_portion_handling (profile_count remainder, cgraph_node 
>> *orig_node)
>> +{
>> +  if (remainder.ipa_p () && !remainder.ipa ().nonzero_p ()
>> +  && orig_node->count.ipa_p () && orig_node->count.ipa ().nonzero_p ()
>> +  && opt_for_fn (orig_node->decl, flag_profile_partial_training))
>> +remainder = remainder.guessed_local ();
>
> I do not think you need partial training flag here.  You should see IPA
> profile is mising by simply testing ipa_p predicate on relevant counts.

I can take that test out, that is not a problem, but I have not done
that yet because I am still wondering whether it is the right thing to
do.  The code attempts to do the same thing which is currently in
update_profiling_info and which you added and which does test the flag.

If I understand that snippet I'm replacing correctly, the code is
supposed to make sure that, when a clone "steals" all counts from the
original node, this original node is not left with "adjusted" zero count
but with a "locally guessed" zero count when the partial training flag
is provided, which should not make it cold but rather switch back
to reasoning about it as if we did not have profile at all.

That is why I kept the flag check there, but if you really want me to remove
it, I'll do that.

>> +
>> +/* If caller edge counts of a clone created for a self-recursive arithmetic 
>> jump
>> +   function must be adjusted, do so. NODE is the node or its thunk.  */
>
> I would add comment on why it needs to be adjusted and how.

Done.  The adjusted patch which I am testing (but which has already
passed LTO profiledbootstrap and testing) is below.  Let me know what
you think.

Thanks,

Martin


IPA-CP does not do a reasonable job when it is updating profile counts
after it has created clones of recursive functions.  This patch
addresses that by:

1. Only updating counts for special-context clones.  When a clone is
created for all contexts, the original is going to be dead and the
cgraph machinery has copied counts to the new node which is the right
thing to do.  Therefore updating counts has been moved from
create_specialized_node to decide_about_value and
decide_whether_version_node.

2. The current profile updating code artificially increased the assumed
old count when the sum of counts of incoming edges to both the
original and new node were bigger than the count of the original
node.  This always happened when self-recursive edge from the clone
was also redirected to the clone because both the original edge and
its clone had original high counts.  This clutch was removed and
replaced by the next point.

3. When cloning also redirects a self-recursive clone to the clone
itself, new logic has been added to divide the counts brought by such
recursive edges between the original node and the clone.  This is
impossible to do well without special knowledge about the function and
which non-recursive entry calls are responsible for what portion of
recursion depth, so the approach taken is rather crude.

For local nodes, we detect the case when the original node is never
called (in the training run at least) with another value and if so,
steal all its counts like if it was dead.  If that is not the case, we
try to divide the count brought by recursive edges (or rather not
brought by direct edges) proportionally to the counts brought by
non-recursive edges - but with artificial limits in place so that we
do not take too many or too few, because that was happening with
detrimental effect in mcf_r.

4. When cloning creates extra clones for values brought by a formerly
self-recursive edge with an arithmetic pass-through jump function on
it, such as it does in exchange2_r, all such clones are processed at
once rather than one after another.  The counts of all such nodes are
distributed evenly (modulo even-formerly-non-recursive-edges) and the
whole situation is then fixed up so that the edge counts fit.  This is
what new function update_counts_for_self_gen_clones does.

5. When values brought by a formerly self-recursive edge with an
arithmetic 

Re: [Patch][PR102281]do not add BUILTIN_CLEAR_PADDING for variables that are gimple registers.

2021-10-18 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 18, 2021 at 03:58:56PM +, Qing Zhao wrote:
> > Furthermore, __builtin_clear_padding doesn't assume anything, but it takes
> > an address of an object as argument and already the taking of the address
> > that gimple_add_padding_init_for_auto_var does makes the var
> > TREE_ADDRESABLE, which is something that needs to be done before the var is
> > ever accessed during gimplification.
> 
> Yes, currently, “gimple_add_padding_init_for_auto_var” has already done the 
> following before
> calling __builtin_clear_padding:
> 
>   mark_addressable (decl);
>   addr_of_decl = build_fold_addr_expr (decl);
> 
> But looks like that “making the DECL as ADDRESSABLE here” is too late. 

Wouldn't be if we could prove it wasn't gimplified already.

> Yes, If we can “make the DECL as ADDRESSABLE” before the var is accessed 
> during gimplification,  that will also
> fix this issue. But “Where” is the right place to make the DECL as 
> ADDRESSABLE? (Do it in “gimplify_decl_expr” when handling
> The variable’s declaration? )

The where is typically somewhere in the FEs.
But, there are two things.
One is that in order to gimplify it properly, it needs to be marked earlier.
But the other is that if it is not addressable, then clearing padding in it
makes not much sense as I've tried to explain, all it could do is making it
slightly less likely that the var will be optimized into a gimple reg later,
but if it does, then the padding will not be cleared anyway.
And it is only at RTL expansion or RA that the var could be assigned into a
stack location or spilled and at that point is the right time to clear
padding in there if needed.
So while the FEs could make it addressable and then you could clear padding,
it would just penalize code and nothing else, later on the optimizations
would figure out it is no longer addressable and optimize it into gimple
reg.
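
To illustrate the distinction with a small example (an added sketch, not part
of the original mail; names are made up):

void
example (void)
{
  long double a = 1.0L;   /* address never taken: likely becomes a gimple
                             register, so its padding only materializes if
                             the register allocator spills it later */
  long double b = 1.0L;
  long double *p = &b;    /* address taken: b stays in memory, so clearing
                             its padding via __builtin_clear_padding up
                             front is meaningful */
  (void) a;
  (void) p;
}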

Jakub



Re: [PATCH] c++: Diagnose taking address of an immediate member function [PR102753]

2021-10-18 Thread Jason Merrill via Gcc-patches

On 10/18/21 04:12, Jakub Jelinek wrote:

Hi!

The following testcase ICEs, because while we have in cp_build_addr_expr_1
diagnostics for taking address of an immediate function (and as an exception
deal with build_address from immediate invocation), I forgot to diagnose
taking address of a member function which is done in a different place.
I hope (s.*&S::foo) () is not an immediate invocation like
(*&foo) () is not, so this patch just diagnoses taking address of a member
function when not in immediate context.

Bootstrapped/regtested on x86_64-linux and i686-linux (without go,
that seem to have some evrp issue when building libgo on both), ok for
trunk?

2021-10-18  Jakub Jelinek  

PR c++/102753
* typeck.c (cp_build_addr_expr_1): Diagnose taking address of
an immediate method.  Use t instead of TREE_OPERAND (arg, 1).

* g++.dg/cpp2a/consteval20.C: New test.

--- gcc/cp/typeck.c.jj  2021-10-05 09:53:55.382734051 +0200
+++ gcc/cp/typeck.c 2021-10-15 19:28:38.034213437 +0200
@@ -6773,9 +6773,21 @@ cp_build_addr_expr_1 (tree arg, bool str
return error_mark_node;
  }
  
+	if (TREE_CODE (t) == FUNCTION_DECL

+   && DECL_IMMEDIATE_FUNCTION_P (t)
+   && cp_unevaluated_operand == 0
+   && (current_function_decl == NULL_TREE
+   || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl)))


This doesn't cover some of the other cases of immediate context; we 
should probably factor most of immediate_invocation_p out into a 
function called something like in_immediate_context and use it here, and 
in several other places as well.
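
A rough sketch of the helper being suggested (added illustration only; the
name and the exact set of checks are assumptions, not the reviewed code):

/* Return true if we are in an immediate function context, i.e. somewhere
   an immediate function may be named without being invoked.  */
static bool
in_immediate_context ()
{
  return (cp_unevaluated_operand != 0
	  || (current_function_decl != NULL_TREE
	      && DECL_IMMEDIATE_FUNCTION_P (current_function_decl)));
}

The new check in cp_build_addr_expr_1 (and the other spots mentioned) would
then test !in_immediate_context () instead of open-coding the conditions.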



+ {
+   if (complain & tf_error)
+ error_at (loc, "taking address of an immediate function %qD",
+   t);
+   return error_mark_node;
+ }
+
type = build_ptrmem_type (context_for_name_lookup (t),
  TREE_TYPE (t));
-   t = make_ptrmem_cst (type, TREE_OPERAND (arg, 1));
+   t = make_ptrmem_cst (type, t);
return t;
}
  
--- gcc/testsuite/g++.dg/cpp2a/consteval20.C.jj	2021-10-15 19:40:38.691900472 +0200

+++ gcc/testsuite/g++.dg/cpp2a/consteval20.C2021-10-15 19:49:15.281508419 
+0200
@@ -0,0 +1,24 @@
+// PR c++/102753
+// { dg-do compile { target c++20 } }
+
+struct S {
+  consteval int foo () const { return 42; }
+};
+
+constexpr S s;
+
+int
+bar ()
+{
+  return (s.*&S::foo) ();  // { dg-error "taking address of an immediate 
function" }
+}
+
+constexpr auto a = &S::foo;// { dg-error "taking address of an 
immediate function" }
+
+consteval int
+baz ()
+{
+  return (s.*&S::foo) ();
+}
+
+static_assert (baz () == 42);

Jakub





Re: [PATCH v4] Fix ICE when mixing VLAs and statement expressions [PR91038]

2021-10-18 Thread Jason Merrill via Gcc-patches

On 10/17/21 09:52, Uecker, Martin wrote:



Here is the 4th version of the patch. I tried to implement
Jason's suggestion and this also fixes the problem. But
I am not sure I understand the condition on
the TREE_SIDE_EFFECTS ...


Checking TREE_SIDE_EFFECTS filters out many trivial cases that we don't 
need to worry about.  I think we also want to check 
variably_modified_type_p, which ought to avoid the OMP problem below.



And there is now another problem:

c_finish_omp_for in c-family/c-omp.c does not seem
to understand the expressions anymore and I get a test
failure in

testsuite/c-c++-common/gomp/for-5.c

where I now get an "invalid increment expression"
instead of the expected error.



(bootstrapping and all other tests work fine)


Martin




Fix ICE when mixing VLAs and statement expressions [PR91038]

When returning VM-types from statement expressions, this can
lead to an ICE when declarations from the statement expression
are referred to later. Most of these issues can be addressed by
gimplifying the base expression earlier in gimplify_compound_lval.


Another issue is fixed by adding SAVE_EXPRs in pointer_int_sum
in the FE to force a correct order of evaluation. This fixes
PR91038 and some of the test cases from PR29970 (structs with
VLA members need further work).
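
A reduced example of the kind of code involved (an added sketch in the spirit
of PR 91038, not one of the new tests):

int
f (void)
{
  /* The value of the statement expression has variably modified type
     int (*)[n], where n is declared inside the statement expression,
     so the base has to be gimplified before its size expression.  */
  return (*({
	   int n = 3;
	   int (*p)[n] = __builtin_malloc (sizeof *p);
	   (*p)[n - 1] = 7;
	   p;
	 }))[2];
}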

 
2021-08-01  Martin Uecker  

gcc/
	PR c/91038
	PR c/29970
	* gimplify.c (gimplify_var_or_parm_decl): Update comment.
	(gimplify_compound_lval): Gimplify base expression first.
	(gimplify_target_expr): Add comment.
	* c-family/c-common.c (pointer_int_sum): Wrap pointer
	operand in SAVE_EXPR and also add it to the integer argument.

gcc/testsuite/
	PR c/91038
	PR c/29970
	* gcc.dg/vla-stexp-3.c: New test.
	* gcc.dg/vla-stexp-4.c: New test.
	* gcc.dg/vla-stexp-5.c: New test.
	* gcc.dg/vla-stexp-6.c: New test.
	* gcc.dg/vla-stexp-7.c: New test.
	* gcc.dg/vla-stexp-8.c: New test.
	* gcc.dg/vla-stexp-9.c: New test.


diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 9d19e352725..522085664f5 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -3348,6 +3348,16 @@ pointer_int_sum (location_t loc, enum tree_code 
resultcode,
  intop = convert (c_common_type_for_size (TYPE_PRECISION (sizetype),
 TYPE_UNSIGNED (sizetype)), intop);
  
+  /* Wrap the pointer expression in a SAVE_EXPR to make sure it

+   * is evaluated first because the size expression may depend on it
+   * for VM types.
+   */


We usually don't give the trailing */ its own line.


+  if (TREE_SIDE_EFFECTS (size_exp))
+{
+ptrop = build1_loc (loc, SAVE_EXPR, TREE_TYPE (ptrop), ptrop);


Why not use the save_expr function?


+intop = build2 (COMPOUND_EXPR, TREE_TYPE (intop), ptrop, intop);
+}
+
/* Replace the integer argument with a suitable product by the object size.
   Do this multiplication as signed, then convert to the appropriate type
   for the pointer operation and disregard an overflow that occurred only
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index d8e4b139349..be5b00b6716 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2958,7 +2958,10 @@ gimplify_var_or_parm_decl (tree *expr_p)
   declaration, for which we've already issued an error.  It would
   be really nice if the front end wouldn't leak these at all.
   Currently the only known culprit is C++ destructors, as seen
- in g++.old-deja/g++.jason/binding.C.  */
+ in g++.old-deja/g++.jason/binding.C.
+ Another possible culpit are size expressions for variably modified
+ types which are lost in the FE or not gimplified correctly.
+  */


As above.


if (VAR_P (decl)
&& !DECL_SEEN_IN_BIND_EXPR_P (decl)
&& !TREE_STATIC (decl) && !DECL_EXTERNAL (decl)
@@ -3103,16 +3106,22 @@ gimplify_compound_lval (tree *expr_p, gimple_seq 
*pre_p, gimple_seq *post_p,
   expression until we deal with any variable bounds, sizes, or
   positions in order to deal with PLACEHOLDER_EXPRs.
  
- So we do this in three steps.  First we deal with the annotations

- for any variables in the components, then we gimplify the base,
- then we gimplify any indices, from left to right.  */
+ The base expression may contain a statement expression that
+ has declarations used in size expressions, so has to be
+ gimplified before gimplifying the size expressions.
+
+ So we do this in three steps.  First we deal with variable
+ bounds, sizes, and positions, then we gimplify the base,
+ then we deal with the annotations for any variables in the
+ components and any indices, from left to right.  */
+
for (i = expr_stack.length () - 1; i >= 0; i--)
  {
tree t = expr_stack[i];
  
if (TREE_CODE (t) == ARRAY_REF || 

Re: [PATCH] AArch64: Tune case-values-threshold

2021-10-18 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> Tune the case-values-threshold setting for modern cores.  A value of 11 
> improves
> SPECINT2017 by 0.2% and reduces codesize by 0.04%.  With -Os use value 8 which
> reduces codesize by 0.07%.
>
> Passes regress, OK for commit?
>
> ChangeLog:
>
> 2021-10-18  Wilco Dijkstra  
>
> * config/aarch64/aarch64.c (aarch64_case_values_threshold):
> Change to 8 with -Os, 11 otherwise.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> f5b25a7f7041645921e6ad85714efda73b993492..adc5256c5ccc1182710d87cc6a1091083d888663
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -9360,8 +9360,8 @@ aarch64_cannot_force_const_mem (machine_mode mode 
> ATTRIBUTE_UNUSED, rtx x)
> The expansion for a table switch is quite expensive due to the number
> of instructions, the table lookup and hard to predict indirect jump.
> When optimizing for speed, and -O3 enabled, use the per-core tuning if
> -   set, otherwise use tables for > 16 cases as a tradeoff between size and
> -   performance.  When optimizing for size, use the default setting.  */
> +   set, otherwise use tables for >= 11 cases as a tradeoff between size and
> +   performance.  When optimizing for size, use 8 for smallest codesize.  */

I'm just concerned that here we're using the same explanation but with
different numbers.  Why are the new numbers more right than the old ones
(especially when it comes to code size, where the trade-off hasn't
really changed)?

It would be good to have more discussion of why certain numbers are
too small or too high, and why 8 is the right pivot point for -Os.

Thanks,
Richard

>
>  static unsigned int
>  aarch64_case_values_threshold (void)
> @@ -9372,7 +9372,7 @@ aarch64_case_values_threshold (void)
>&& selected_cpu->tune->max_case_values != 0)
>  return selected_cpu->tune->max_case_values;
>else
> -return optimize_size ? default_case_values_threshold () : 17;
> +return optimize_size ? 8 : 11;
>  }
>
>  /* Return true if register REGNO is a valid index register.


Re: [PATCH] AArch64: Enable fast shifts on Neoverse V1/N2

2021-10-18 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> Enable the fast shift feature in Neoverse V1 and N2 tunings as well.
>
> ChangeLog:
> 2021-10-18  Wilco Dijkstra  
>
> * config/aarch64/aarch64.c (neoversev1_tunings):
> Enable AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND.
> (neoversen2_tunings): Likewise.

OK, thanks.

Richard

>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> c7b76a7cdeea0539cd73f7987d1a4354d0a40624..e65afe39047359a3279ae6b2047a3e9a04e6c2a9
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1832,7 +1832,8 @@ static const struct tune_params neoversev1_tunings =
>tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
>(AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> -   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),/* tune_flags.  */
> +   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> +   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),   /* tune_flags.  */
>_prefetch_tune
>  };
>
> @@ -1858,7 +1859,7 @@ static const struct tune_params neoversen2_tunings =
>2,   /* min_div_recip_mul_df.  */
>0,   /* max_case_values.  */
>tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
> -  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
> +  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags.  */
>_prefetch_tune
>  };


[PATCH] AArch64: Enable fast shifts on Neoverse V1/N2

2021-10-18 Thread Wilco Dijkstra via Gcc-patches
Enable the fast shift feature in Neoverse V1 and N2 tunings as well.

ChangeLog:
2021-10-18  Wilco Dijkstra  

* config/aarch64/aarch64.c (neoversev1_tunings):
Enable AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND.
(neoversen2_tunings): Likewise.

---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
c7b76a7cdeea0539cd73f7987d1a4354d0a40624..e65afe39047359a3279ae6b2047a3e9a04e6c2a9
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1832,7 +1832,8 @@ static const struct tune_params neoversev1_tunings =
   tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
| AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
-   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),/* tune_flags.  */
+   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
+   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),   /* tune_flags.  */
   _prefetch_tune
 };
 
@@ -1858,7 +1859,7 @@ static const struct tune_params neoversen2_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags.  */
   _prefetch_tune
 };
 

[PATCH] AArch64: Tune case-values-threshold

2021-10-18 Thread Wilco Dijkstra via Gcc-patches

Tune the case-values-threshold setting for modern cores.  A value of 11 improves
SPECINT2017 by 0.2% and reduces codesize by 0.04%.  With -Os use value 8 which
reduces codesize by 0.07%.

Passes regress, OK for commit?

ChangeLog:

2021-10-18  Wilco Dijkstra  

* config/aarch64/aarch64.c (aarch64_case_values_threshold):
Change to 8 with -Os, 11 otherwise.

---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
f5b25a7f7041645921e6ad85714efda73b993492..adc5256c5ccc1182710d87cc6a1091083d888663
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9360,8 +9360,8 @@ aarch64_cannot_force_const_mem (machine_mode mode 
ATTRIBUTE_UNUSED, rtx x)
The expansion for a table switch is quite expensive due to the number
of instructions, the table lookup and hard to predict indirect jump.
When optimizing for speed, and -O3 enabled, use the per-core tuning if 
-   set, otherwise use tables for > 16 cases as a tradeoff between size and
-   performance.  When optimizing for size, use the default setting.  */
+   set, otherwise use tables for >= 11 cases as a tradeoff between size and
+   performance.  When optimizing for size, use 8 for smallest codesize.  */
 
 static unsigned int
 aarch64_case_values_threshold (void)
@@ -9372,7 +9372,7 @@ aarch64_case_values_threshold (void)
   && selected_cpu->tune->max_case_values != 0)
 return selected_cpu->tune->max_case_values;
   else
-return optimize_size ? default_case_values_threshold () : 17;
+return optimize_size ? 8 : 11;
 }
 
 /* Return true if register REGNO is a valid index register.


[PATCH] C, C++, OpenMP: Add 'has_device_addr' clause to 'target' construct

2021-10-18 Thread Marcel Vollweiler

Hi,

This patch adds the 'has_device_addr' clause to the OpenMP 'target'
construct which was introduced in OpenMP 5.1:

"The has_device_addr clause was added to the target construct to allow
access to variables or array sections that already have a device
address" (OpenMP 5.1 Specification, p. 669)

"The has_device_addr clause indicates that its list items already have
device addresses and therefore they may be directly accessed from a
target device. If the device address of a list item is not for the
device on which the target region executes, accessing the list item
inside the region results in unspecified behavior. The list items may
include array sections." (OpenMP 5.1 Specification, p. 200)
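
A minimal usage sketch (added for illustration; it follows the pattern of the
new libgomp tests but is not copied from the patch):

void
foo (void)
{
  int x = 24;

#pragma omp target data map(x) use_device_addr(x)
  {
    /* Within this region x refers to the corresponding device address,
       so the nested target region may assert has_device_addr for it.  */
#pragma omp target has_device_addr(x)
    x = 42;
  }
}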

There are some restrictions for 'has_device_addr' (p. 202f):

1. "A list item may not be specified in both an is_device_ptr clause and
a has_device_addr clause on the directive."

2. "A list item that appears in an is_device_ptr or a has_device_addr
clause must not be specified in any data-sharing attribute clause on the
same target construct."

3. "A list item that appears in a has_device_addr clause must have a
valid device address for the device data environment."

4. As discussed on the omp-lang mailing list
(https://mailman.openmp.org/mailman/private/omp-lang/2021/017982.html),
has_device_addr is a data-sharing attribute clause (that is not yet
stated explicitly but will be corrected in OpenMP 5.2) and should not be
used together with the map clause on the same construct.

Similar restrictions hold also for the 'is_device_ptr' clause, so I
updated the code and added tests for that clause, too.

I tested the patch without regressions on powerpc64le-linux-gnu with
nvptx offloading and x86_64-linux-gnu with amdgcn offloading.

This patch only considers C/C++. The changes for Fortran will be
submitted separately later.

Marcel
C, C++, OpenMP: Add 'has_device_addr' clause to 'target' construct.

This patch adds the 'has_device_addr' clause to the OpenMP 'target' construct
which was introduced in OpenMP 5.1.

gcc/c-family/ChangeLog:

* c-omp.c (c_omp_split_clauses): Add OMP_CLAUSE_HAS_DEVICE_ADDR case.
* c-pragma.h (enum pragma_kind): Add 5.1 in comment.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR.

gcc/c/ChangeLog:

* c-parser.c (c_parser_omp_clause_name): Parse 'has_device_addr' clause.
(c_parser_omp_clause_has_device_addr): Added.
(c_parser_omp_all_clauses): Add PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR case.
(c_parser_omp_target_exit_data): Add HAS_DEVICE_ADDR to OMP_CLAUSE_MASK.
* c-typeck.c (c_finish_omp_clauses): Add check that has_device_addr and 
is_device_ptr do not appear together with map.

gcc/cp/ChangeLog:

* parser.c (cp_parser_omp_clause_name): Parse 'has_device_addr' clause.
(cp_parser_omp_all_clauses): Add PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR case.
(cp_parser_omp_target_update): Add HAS_DEVICE_ADDR to OMP_CLAUSE_MASK.
* pt.c (tsubst_omp_clauses): Add cases for OMP_CLAUSE_HAS_DEVICE_ADDR.
* semantics.c (finish_omp_clauses): Add check that has_device_addr and
is_device_ptr do not appear together with map.

gcc/ChangeLog:

* gimplify.c (gimplify_scan_omp_clauses): Add 
OMP_CLAUSE_HAS_DEVICE_ADDR case.
(gimplify_adjust_omp_clauses): Likewise.
* omp-low.c (scan_sharing_clauses): Add lowering for has_device_addr 
clause.
(lower_omp_target): Likewise.
* tree-core.h (enum omp_clause_code): Update enum.
* tree-nested.c (convert_nonlocal_omp_clauses): Add has_device_addr 
support.
(convert_local_omp_clauses): Likewise.
* tree-pretty-print.c (dump_omp_clause): Likewise.
* tree.c: Update omp_clause_num_ops array.

libgomp/ChangeLog:

* libgomp.texi: Updated entry for 'has-device-addr'.
* testsuite/libgomp.c++/target-has-device-addr-2.C: New test.
* testsuite/libgomp.c-c++-common/target-has-device-addr-1.c: New test.
* testsuite/libgomp.c-c++-common/target-has-device-addr-3.c: New test.
* testsuite/libgomp.c/target-has-device-addr-4.c: New test.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/clauses-1.c: Add has_device_addr to test cases.
* g++.dg/gomp/attrs-1.C: Likewise.
* g++.dg/gomp/attrs-2.C: Likewise.
* c-c++-common/gomp/target-has-device-addr-1.c: New test.
* c-c++-common/gomp/target-is-device-ptr.c: New test.

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index b9024cb..eb4950c 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -1837,6 +1837,7 @@ c_omp_split_clauses (location_t loc, enum tree_code code,
case OMP_CLAUSE_DEVICE:

Re: [PATCH] Ranger : Do not process abnormal ssa-names.

2021-10-18 Thread Andrew MacLeod via Gcc-patches

On 10/16/21 5:27 AM, Andrew Pinski wrote:

On Fri, Oct 15, 2021 at 6:53 AM Andrew MacLeod via Gcc-patches
 wrote:

I've been looking at the pathological time issue ranger has with the
testcase from, uh..  PR 97623 I think.  I've lost the details, but
kept the file since it was showing unpleasant behaviour.

Most of the time is spent in callbacks from substitute_and_fold to
value_on_edge()  dealing with PHI results and arguments.  Turns out, its
virtually all wasted time dealing with SSA_NAMES with the
OCCURS_IN_ABNORMAL_PHI flag set..

This patch tells ranger not to consider any SSA_NAMEs which occur in
abnormal PHIs.  This reduces the memory footprint of all the caches, and
also has a ripple effect with the new threader code which uses the GORI
exports and imports tables, making it faster as well as no ssa-name with
the abnormal flag set will be entered into the tables.
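
For context (an added illustration, not from the mail), such SSA names arise
for values that are live across abnormal edges, e.g. around setjmp:

#include <setjmp.h>

extern jmp_buf env;
extern void may_longjmp (void);

int
f (int x)
{
  int v = x;
  if (setjmp (env))   /* The abnormal edges merging here put the SSA names
                         of v into abnormal PHIs, which ranger now skips.
                         (v would need to be volatile for well-defined C
                         semantics after a longjmp; kept simple here.)  */
    return v;
  v = x + 1;
  may_longjmp ();
  return v;
}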

That alone was not quite enough, as the sheer volume of callbacks
still took time,  so I added checks in the value_of_* class of routines
used by substitute_and_fold to indicate there is no constant value
available for any SSA_NAME with that flag set.

On my x86_64 box, before this change, that test case looked like:

tree VRP   :   7.76 (  4%)   0.23 ( 5%)   8.02
(  4%)   537k (  0%)
tree VRP threader  :   7.20 (  4%)   0.08 (  2%) 7.28 (
4%)   392k (  0%)
tree Early VRP :  39.22 ( 22%)   0.07 (  2%) 39.44 (
22%)  1142k (  0%)

And with this patch , the results are:

   tree VRP   :   7.57 (  6%)   0.26 ( 5%)   7.85
(  6%)   537k (  0%)
   tree VRP threader  :   0.62 (  0%)   0.02 ( 0%)   0.65
(  0%)   392k (  0%)
   tree Early VRP :   4.00 (  3%)   0.01 ( 0%)   4.03
(  3%)  1142k (  0%)

Which is a significant improvement, both for EVRP and the threader..

The patch adjusts the ranger folder, as well as the hybrid folder.

bootstrapped on x86_64-pc-linux-gnu with no regressions and no missed
cases that I have been able to find.

Did you test it with go enabled?
Because others and myself are now running into a bootstrap failure
most likely due to this patch.
The number of SSA_NAME_OCCURS_IN_ABNORMAL_PHI in go is increased due
to -fnon-call-exceptions being true there.


I would have sworn upside down I did, but looking at my build script, 
somewhere along the way GO got turned off, so although I was building 
ada, GO was not being included.. sorry.


I'll get this resolved this afternoon.

Andrew



Re: [Patch][PR102281]do not add BUILTIN_CLEAR_PADDING for variables that are gimple registers.

2021-10-18 Thread Qing Zhao via Gcc-patches


> On Oct 18, 2021, at 10:36 AM, Jakub Jelinek  wrote:
> 
> On Mon, Oct 18, 2021 at 03:04:40PM +, Qing Zhao wrote:
>> 2021-10-16  qing zhao  
>> 
>>  * gimplify.c (gimplify_decl_expr): Do not add call to
>>  __BUILTIN_CLEAR_PADDING when a variable is a gimple register
> 
> The builtin is called __builtin_clear_padding, using __BUILTIN_CLEAR_PADDING
> makes no sense, either use the lower-case version of it, or use
> BUILT_IN_CLEAR_PADDING which is the enum built_in_function enumerator
> used to refer to it in the compiler.

Okay, will fix all the names in the patch.

> 
>> --- a/gcc/gimplify.c
>> +++ b/gcc/gimplify.c
>> @@ -1954,8 +1954,14 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
>>   pattern initialization.
>>   In order to make the paddings as zeroes for pattern init, We
>>   should add a call to __builtin_clear_padding to clear the
>> - paddings to zero in compatiple with CLANG.  */
>> -  if (flag_auto_var_init == AUTO_INIT_PATTERN)
>> + paddings to zero in compatiple with CLANG.
>> + We cannot insert this call if the variable is a gimple register
>> + since __BUILTIN_CLEAR_PADDING assumes the variable is in memory.
> 
> Likewise (and please fix the other spots in gimplify.c that do that too.

Yes, will fix that.

> Furthermore, __builtin_clear_padding doesn't assume anything, but it takes
> an address of an object as argument and already the taking of the address
> that gimple_add_padding_init_for_auto_var does makes the var
> TREE_ADDRESABLE, which is something that needs to be done before the var is
> ever accessed during gimplification.

Yes, currently, “gimple_add_padding_init_for_auto_var” has already done the 
following before
calling __builtin_clear_padding:

  mark_addressable (decl);
  addr_of_decl = build_fold_addr_expr (decl);

But looks like that “making the DECL as ADDRESSABLE here” is too late. 

Yes, If we can “make the DECL as ADDRESSABLE” before the var is accessed during 
gimplification,  that will also
fix this issue. But “Where” is the right place to make the DECL as ADDRESSABLE? 
(Do it in “gimplify_decl_expr” when handling
The variable’s declaration? )


> 
>> + As a result, if a long double/Complex long double variable will
> 
> Please change Complex to _Complex

Okay.

> 
>> + spilled into stack later, its padding is 0XFE.  */
>> +  if (flag_auto_var_init == AUTO_INIT_PATTERN
>> +  && !is_gimple_reg (decl)
>> +  && clear_padding_type_may_have_padding_p (TREE_TYPE (decl)))
>>  gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p);
>>  }
>> }
>> @@ -5388,8 +5394,15 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
>> *pre_p, gimple_seq *post_p,
>>  initialize paddings of object always to zero regardless of
>>  INIT_TYPE.  Note, we will not insert this call if the aggregate
>>  variable has be completely cleared already or it's initialized
>> - with an empty constructor.  */
>> + with an empty constructor.  We cannot insert this call if the
>> + variable is a gimple register since __BUILTIN_CLEAR_PADDING assumes
>> + the variable is in memory.  As a result, if a long double/Complex long
> 
> Likewise.

Okay.

>> + double variable will be spilled into stack later, its padding cannot
>> + be cleared with __BUILTIN_CLEAR_PADDING.  we should clear its padding
> 
> s/we/We/

Okay.
> 
>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/pr102281.c
>> @@ -0,0 +1,15 @@
>> +/* PR102281  */
>> +/* { dg-do compile } */
>> +/* { dg-options "-ftrivial-auto-var-init=zero" } */
>> +long _mm_set_epi64x___q0;
>> +__attribute__((__vector_size__(2 * sizeof(long long long 
>> _mm_set_epi64x() {
>> +  return (__attribute__((__vector_size__(2 * sizeof(long long long){
>> +  _mm_set_epi64x___q0};
>> +}
>> +
>> +float _mm_set1_ps___F;
>> +__attribute__((__vector_size__(4 * sizeof(float float
>> +__attribute___mm_set1_ps() {
>> +  return (__attribute__((__vector_size__(4 * sizeof(float float){
>> +  _mm_set1_ps___F};
>> +}
> 
> If it is a generic testcase, please change the variable and function names
> so that they don't look like x86 intrinsics, using v, w or var1, var2 etc.
> instead of _mm_set_epi64x___q0 or _mm_set1_ps___F, and
> foo/bar instead of _mm_set_epi64x or __attribute___mm_set1_ps will make it
> certainly more readable.  Please add typedefs for the vector types,
> typedef long long V __attribute__((__vector_size__ (2 * sizeof (long long))));
> typedef float W __attribute__((__vector_size__ (4 * sizeof (float))));
> and use that (note that I've changed the long in there to long long to make
> it match.
> And please use (void) instead of () in the function declarations.
> Also, the testcase will likely fail on x86-64 with
> RUNTESTFLAGS='--target_board=unix\{-m32/-mno-sse,-m32/-msse2,-m64\} 
> dg.exp=pr102281.c'
> because of -Wpsabi warnings.  So, you likely want -Wno-psabi or 

Re: [PATCH v6] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2021-10-18 Thread Joseph Myers
On Sun, 17 Oct 2021, Raoni Fassina Firmino wrote:

> First is the different arguments from the C99 functions.  I think the
> solution is a macro to correct this, like so:
> 
> #define feclearexcept(excepts) \
> __builtin_feclearexcept(excepts, FE_DIVBYZERO, FE_INEXACT, \
> FE_INVALID, FE_OVERFLOW, FE_UNDERFLOW)
> 
> That is automatically always included or included when fenv.h is
> included.  Does the preprocessor have this ability? If so, where
> should I put it?

The compiler should not be adding such macros to libc headers.  If libc 
wants to provide such optimizations based on built-in functions, they 
should go in libc headers with an appropriate condition on the compiler 
version.

However, it's better to get things right automatically without needing any 
macros or other header additions at all.  That is, define feclearexcept as 
a built-in function, *without* the extra arguments, and with the back end 
knowing about the FE_* values for the target libc.  Then you can simply 
avoid expanding the function inline when the back end doesn't know both 
the FE_* values and how to use them.

fpclassify is a fundamentally different case, because that's defined by 
the standard to be a *type-generic macro*, whereas feclearexcept is 
defined by the standard to be a *function*.  The example of fpclassify 
should not be followed for feclearexcept, feraiseexcept or fegetround.

> Second is the fallback of the expanders.  When the expanders fail it
> will leave the function call, which is great, but since the argument
> list is different, well, it not is pretty.  There is no execution

If you define __builtin_X to have different arguments to X, it should also 
be defined so it's *always* expanded inline (on all architectures) and 
never falls back to a library function call to X.  (This means, for 
example, not defining __builtin_X at all as a built-in function in cases, 
such as soft float, where you can't expand it inline, so that erroneous 
code trying to call __builtin_X in that case ends up with an undefined 
reference to the __builtin_X symbol.)

Once you avoid having different arguments to the library function, you can 
simply avoid expanding inline whenever the back end lacks the relevant 
information; you don't need to do anything to avoid the built-in function 
existing.

> +@deftypefn {Built-in Function} int __builtin_fegetround (int, int, int, int)
> +This built-in implements the C99 fegetround functionality.  The four int
> +arguments should be the target library's notion of the possible FP rouding
> +modes.  They must be constant values and they must appear in this order:
> +@code{FE_DOWNWARD}, @code{FE_TONEAREST}, @code{FE_TOWARDZERO},
> +@code{FE_UPWARD}.  In other words:

Some architectures have more rounding modes (e.g. FE_TONEARESTFROMZERO).  
Some have fewer.  I think that illustrates the essential flaw of defining 
these functions to take a fixed set of rounding mode macros as arguments.

On the other hand, there is a use for a *different* built-in function to 
get the rounding mode for FLT_ROUNDS, using the fixed set of values for 
FLT_ROUNDS specified in the C standard, and *always expanding inline 
without ever introducing a dependency on libm*.  See bug 30569 and 
 regarding that.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Patch][PR102281]do not add BUILTIN_CLEAR_PADDING for variables that are gimple registers.

2021-10-18 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 18, 2021 at 03:04:40PM +, Qing Zhao wrote:
> 2021-10-16  qing zhao  
> 
>   * gimplify.c (gimplify_decl_expr): Do not add call to
>   __BUILTIN_CLEAR_PADDING when a variable is a gimple register

The builtin is called __builtin_clear_padding, using __BUILTIN_CLEAR_PADDING
makes no sense, either use the lower-case version of it, or use
BUILT_IN_CLEAR_PADDING which is the enum built_in_function enumerator
used to refer to it in the compiler.

> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -1954,8 +1954,14 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
>pattern initialization.
>In order to make the paddings as zeroes for pattern init, We
>should add a call to __builtin_clear_padding to clear the
> -  paddings to zero in compatiple with CLANG.  */
> -   if (flag_auto_var_init == AUTO_INIT_PATTERN)
> +  paddings to zero in compatiple with CLANG.
> +  We cannot insert this call if the variable is a gimple register
> +  since __BUILTIN_CLEAR_PADDING assumes the variable is in memory.

Likewise (and please fix the other spots in gimplify.c that do that too.
Furthermore, __builtin_clear_padding doesn't assume anything, but it takes
an address of an object as argument and already the taking of the address
that gimple_add_padding_init_for_auto_var does makes the var
TREE_ADDRESABLE, which is something that needs to be done before the var is
ever accessed during gimplification.

> +  As a result, if a long double/Complex long double variable will

Please change Complex to _Complex

> +  spilled into stack later, its padding is 0XFE.  */
> +   if (flag_auto_var_init == AUTO_INIT_PATTERN
> +   && !is_gimple_reg (decl)
> +   && clear_padding_type_may_have_padding_p (TREE_TYPE (decl)))
>   gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p);
>   }
>  }
> @@ -5388,8 +5394,15 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
> *pre_p, gimple_seq *post_p,
>   initialize paddings of object always to zero regardless of
>   INIT_TYPE.  Note, we will not insert this call if the aggregate
>   variable has be completely cleared already or it's initialized
> - with an empty constructor.  */
> + with an empty constructor.  We cannot insert this call if the
> + variable is a gimple register since __BUILTIN_CLEAR_PADDING assumes
> + the variable is in memory.  As a result, if a long double/Complex long

Likewise.
> + double variable will be spilled into stack later, its padding cannot
> + be cleared with __BUILTIN_CLEAR_PADDING.  we should clear its padding

s/we/We/

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/pr102281.c
> @@ -0,0 +1,15 @@
> +/* PR102281  */
> +/* { dg-do compile } */
> +/* { dg-options "-ftrivial-auto-var-init=zero" } */
> +long _mm_set_epi64x___q0;
> +__attribute__((__vector_size__(2 * sizeof(long long long 
> _mm_set_epi64x() {
> +  return (__attribute__((__vector_size__(2 * sizeof(long long long){
> +  _mm_set_epi64x___q0};
> +}
> +
> +float _mm_set1_ps___F;
> +__attribute__((__vector_size__(4 * sizeof(float float
> +__attribute___mm_set1_ps() {
> +  return (__attribute__((__vector_size__(4 * sizeof(float float){
> +  _mm_set1_ps___F};
> +}

If it is a generic testcase, please change the variable and function names
so that they don't look like x86 intrinsics; using v, w or var1, var2 etc.
instead of _mm_set_epi64x___q0 or _mm_set1_ps___F, and foo/bar
instead of _mm_set_epi64x or __attribute___mm_set1_ps, will certainly make it
more readable.  Please add typedefs for the vector types,
typedef long long V __attribute__((__vector_size__ (2 * sizeof (long long))));
typedef float W __attribute__((__vector_size__ (4 * sizeof (float))));
and use those (note that I've changed the long in there to long long to make
it match).
And please use (void) instead of () in the function declarations.
Also, the testcase will likely fail on x86-64 with
RUNTESTFLAGS='--target_board=unix\{-m32/-mno-sse,-m32/-msse2,-m64\} 
dg.exp=pr102281.c'
because of -Wpsabi warnings.  So, you likely want -Wno-psabi or even -Wno-psabi 
-w
in dg-options.
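Putting those comments together, one possible shape of the testcase (a sketch
only, not necessarily the committed version) would be:

/* PR102281  */
/* { dg-do compile } */
/* { dg-options "-ftrivial-auto-var-init=zero -Wno-psabi -w" } */
typedef long long V __attribute__((__vector_size__ (2 * sizeof (long long))));
typedef float W __attribute__((__vector_size__ (4 * sizeof (float))));

long long var1;
float var2;

V
foo (void)
{
  return (V) { var1 };
}

W
bar (void)
{
  return (W) { var2 };
}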

Jakub



Re: [PATCH][i386] target: support spaces in target attribute.

2021-10-18 Thread Uros Bizjak via Gcc-patches
On Mon, Oct 18, 2021 at 1:23 PM Martin Liška  wrote:
>
> On 10/11/21 13:17, Martin Liška wrote:
> > On 10/4/21 23:02, Andrew Pinski wrote:
> >> It might be useful to skip tabs for the same reason as spaces really.
> >
> > Sure, be my guest.
> >
> > Martin
>
> May I please ping this i386-specific patch?

It is not i386-specific (due to system.h change). But one line in
i386-options.c is OK.

Thanks,
Uros.


Re: [PATCH] Adjust testcase for O2 vectorization.

2021-10-18 Thread Martin Sebor via Gcc-patches

On 10/17/21 10:38 PM, Hongtao Liu wrote:

On Fri, Oct 15, 2021 at 11:37 PM Martin Sebor  wrote:


On 10/14/21 1:11 AM, liuhongt wrote:

Hi Kewen:
Could you help to verify if this patch fixes those regressions
for the rs6000 port?

As discussed in [1], this patch add xfail/target selector to those
testcases, also make a copy of them so that they can be tested w/o
vectorization.


Just to make sure I understand what's happening with the tests:
the new -N-novec.c tests consist of just the cases xfailed due
to vectorization in the corresponding -N.c tests?  Or are there

Wstringop-overflow-2-novec.c is the same as Wstringop-overflow-2.c
before O2 vectorization adjustment.
Do you want me to reduce them to only contain cases for new xfail/target?


That would be helpful, thank you.  Are the others also full
copies? (If yes, then copying just the failing cases into
the new tests would be good as well.)


some other differences (e.g., new cases in them, etc.)?  I'd
hope to eventually remove the -novec.c tests once all warnings
behave as expected with vectorization as without it (maybe
keeping just one case both ways as a sanity check).

For the target-supports selectors, I confess I don't know enough
about vectorization to find their names quite intuitive enough
to know when to use each.  For instance, for vect_slp_v4qi_store:

It's 4-byte char stores with the address being 4-byte aligned,
i.e.:



+# Return the true if target support vectorization of v4qi store.
+proc check_effective_target_vect_slp_v4qi_store { } {
+set pattern {add new stmt: MEM }
+return [expr { [check_vect_slp_vnqihi_store_usage $pattern ] != 0 }]
+}

When should this selector be used?  In cases involving 4-byte
char stores?  Only naturally aligned 4-bytes stores (i.e., on
a 4 byte boundary, as the check_vect_slp_vnqihi_store_usage
suggests?) Or 4-byte stores of any types (e.g., four chars
as well as two 16-bit shorts), etc.?

Hopefully once all the warnings handle vectorization we won't
need to use them, but until then it would be good to document
this in more detail in the .exp file.

Finally, thank you for adding comments to the xfailed tests
referencing the corresponding bugs!  Can you please mention
the PR in the comment in each of the new xfails?  Like so:

index 7d29b5f48c7..cb687c69324 100644
--- a/gcc/testsuite/c-c++-common/Wstringop-overflow-2.c
+++ b/gcc/testsuite/c-c++-common/Wstringop-overflow-2.c
@@ -189,8 +189,9 @@ void ga1__ (void)

 struct A1 a = { 1 };
 a.a[0] = 0;
+  // O2 vectorization regress Wstringop-overflow case (1), refer to
pr102462.
 a.a[1] = 1;// { dg-warning
"\\\[-Wstringop-overflow" }
-  a.a[2] = 2;// { dg-warning
"\\\[-Wstringop-overflow" "" { xfail { i?86-*-* x86_64-*-* } } }
+  a.a[2] = 2;// { dg-warning
"\\\[-Wstringop-overflow" "pr102462" { xfail { vect_slp_v2qi_store } } }
 
 i.e., with the PR noted in the dg-warning comment.

This should make it easier to deal with the XFAILs once
the warnings have improved to handle vectorization.

Will do.


Great, thank you!

Martin


Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-18 Thread H.J. Lu via Gcc-patches
On Mon, Oct 18, 2021 at 8:04 AM David Edelsohn  wrote:
>
> Hi, H.J.
>
> My colleague responded that GCC Go builds and works on AIX, but it
> currently requires a special, custom version of GNU objcopy that adds
> support for the types of features that Go requires to operate on AIX
> XCOFF files.  Those changes have not yet been updated and contributed
> to GNU Binutils.
>
> I will see if I can install that version of objcopy standalone.  We
> also can ask Clement and ATOS to test GCC Go build with your proposed
> libffi patch, or is it vanilla libffi trunk?

My libffi branch:

https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/libffi/master

synced with libffi v3.4.2, not master.

BTW, the current master branch won't build libgo:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102796

Thanks.

> Thanks, David
>
>
> On Sat, Oct 16, 2021 at 3:59 PM H.J. Lu  wrote:
> >
> > On Sat, Oct 16, 2021 at 12:53 PM David Edelsohn  wrote:
> > >
> > > On Sat, Oct 16, 2021 at 1:13 PM H.J. Lu  wrote:
> > > >
> > > > On Sat, Oct 16, 2021 at 10:04 AM David Edelsohn  
> > > > wrote:
> > > > >
> > > > > On Sat, Oct 16, 2021 at 7:48 AM H.J. Lu  wrote:
> > > > > >
> > > > > > On Fri, Oct 15, 2021 at 5:22 PM David Edelsohn  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Change in the v2 patch:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. Disable static trampolines by default.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > GCC maintained a copy of libffi snapshot from 2009 
> > > > > > > > > > > > > and cherry-picked fixes
> > > > > > > > > > > > > from upstream over the last 10+ years.  In the 
> > > > > > > > > > > > > meantime, libffi upstream
> > > > > > > > > > > > > has been changed significantly with new features, bug 
> > > > > > > > > > > > > fixes and new target
> > > > > > > > > > > > > support.  Here is a set of patches to sync with 
> > > > > > > > > > > > > libffi 3.4.2 release and
> > > > > > > > > > > > > make it easier to sync with libffi upstream:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. Document how to sync with upstream.
> > > > > > > > > > > > > 2. Add scripts to help sync with upstream.
> > > > > > > > > > > > > 3. Sync with libffi 3.4.2. This patch is quite big.  
> > > > > > > > > > > > > It is availale at
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
> > > > > > > > > > > > > 4. Integrate libffi build and testsuite with GCC.
> > > > > > > > > > > >
> > > > > > > > > > > > How did you test this?  It looks like libgo is the only 
> > > > > > > > > > > > consumer of
> > > > > > > > > > > > libffi these days.
> > > > > > > > > > > > In particular go/libgo seems to be supported on almost 
> > > > > > > > > > > > all targets besides
> > > > > > > > > > > > darwin/windows - did you test cross and canadian 
> > > > > > > > > > > > configurations?
> > > > > > > > > > >
> > > > > > > > > > > I only tested it on Linux/i686 and Linux/x86-64.   My 
> > > > > > > > > > > understanding is that
> > > > > > > > > > > the upstream libffi works on Darwin and Windows.
> > > > > > > > > > >
> > > > > > > > > > > > I applaud the attempt to sync to upsteam but I fear you 
> > > > > > > > > > > > won't get any "review"
> > > > > > > > > > > > of this massive diff.
> > > > > > > > > > >
> > > > > > > > > > > I believe that it should just work.  Our libffi is very 
> > > > > > > > > > > much out of date.
> > > > > > > > > >
> > > > > > > > > > Yes, you can hope.  And yes, our libffi is out of date.
> > > > > > > > > >
> > > > > > > > > > Can you please do the extra step to test one weird 
> > > > > > > > > > architecture, namely
> > > > > > > > > > powerpc64-aix which is available on the compile-farm?
> > > > > > > > >
> > > > > > > > > I will give it a try and report back.
> > > > > > > > >
> > > > > > > > > > If that goes well I think it's good to "hope" at this point 
> > > > > > > > > > (and plenty of
> > > > > > > > > > time to fix fallout until the GCC 12 release).
> > > > > > > > > >
> > > > > > > > > > Thus OK after the extra testing dance and waiting until 
> > > > > > > > > > early next
> > > > > > > > > > week so others can throw in a veto.
> > > > > > > >
> > > > > > > > I tried to bootstrap GCC master branch on  

[PATCH] PR target/102785: Correct addsub/subadd patterns on bfin.

2021-10-18 Thread Roger Sayle

This patch resolves PR target/102785 where my recent patch to constant
fold saturating addition/subtraction exposed a latent bug in the bfin
backend.  The patterns used for blackfin's V2HI ssaddsub and sssubadd
instructions had the indices/operations swapped.  This was harmless
until we started evaluating these expressions at compile-time, when
the mismatch was caught by the testsuite.
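For reference, a minimal sketch of what evaluating the saturating 16-bit lane
operations at compile time amounts to (hypothetical helper names, not GCC's
internal routines); ssaddsub/sssubadd apply one of these per V2HI lane, and
the bug was which lane got which operation:

static inline short
ss_add_hi (short a, short b)
{
  int t = (int) a + (int) b;            /* widen, then saturate  */
  return t > 32767 ? 32767 : t < -32768 ? -32768 : t;
}

static inline short
ss_sub_hi (short a, short b)
{
  int t = (int) a - (int) b;
  return t > 32767 ? 32767 : t < -32768 ? -32768 : t;
}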

Many thanks to Jeff Law for confirming that this patch fixes these
regressions on bfin-elf.  Ok for mainline?


2021-10-18  Roger Sayle  

gcc/ChangeLog
PR target/102785
* config/bfin/bfin.md (addsubv2hi3, subaddv2hi3, ssaddsubv2hi3,
sssubaddv2hi3):  Swap the order of operators in vec_concat.

Thanks again,
Roger
--

diff --git a/gcc/config/bfin/bfin.md b/gcc/config/bfin/bfin.md
index 8b311f3..fd65f4d 100644
--- a/gcc/config/bfin/bfin.md
+++ b/gcc/config/bfin/bfin.md
@@ -3018,19 +3018,6 @@
 (define_insn "addsubv2hi3"
   [(set (match_operand:V2HI 0 "register_operand" "=d")
(vec_concat:V2HI
-(plus:HI (vec_select:HI (match_operand:V2HI 1 "register_operand" "d")
-(parallel [(const_int 0)]))
- (vec_select:HI (match_operand:V2HI 2 "register_operand" "d")
-(parallel [(const_int 0)])))
-(minus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))
-  (vec_select:HI (match_dup 2) (parallel [(const_int 1)])]
-  ""
-  "%0 = %1 +|- %2%!"
-  [(set_attr "type" "dsp32")])
-
-(define_insn "subaddv2hi3"
-  [(set (match_operand:V2HI 0 "register_operand" "=d")
-   (vec_concat:V2HI
 (minus:HI (vec_select:HI (match_operand:V2HI 1 "register_operand" "d")
  (parallel [(const_int 0)]))
   (vec_select:HI (match_operand:V2HI 2 "register_operand" "d")
@@ -3038,23 +3025,23 @@
 (plus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))
  (vec_select:HI (match_dup 2) (parallel [(const_int 1)])]
   ""
-  "%0 = %1 -|+ %2%!"
+  "%0 = %1 +|- %2%!"
   [(set_attr "type" "dsp32")])
 
-(define_insn "ssaddsubv2hi3"
+(define_insn "subaddv2hi3"
   [(set (match_operand:V2HI 0 "register_operand" "=d")
(vec_concat:V2HI
-(ss_plus:HI (vec_select:HI (match_operand:V2HI 1 "register_operand" 
"d")
-   (parallel [(const_int 0)]))
-(vec_select:HI (match_operand:V2HI 2 "register_operand" 
"d")
-   (parallel [(const_int 0)])))
-(ss_minus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))
- (vec_select:HI (match_dup 2) (parallel [(const_int 
1)])]
+(plus:HI (vec_select:HI (match_operand:V2HI 1 "register_operand" "d")
+(parallel [(const_int 0)]))
+ (vec_select:HI (match_operand:V2HI 2 "register_operand" "d")
+(parallel [(const_int 0)])))
+(minus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))
+  (vec_select:HI (match_dup 2) (parallel [(const_int 1)])]
   ""
-  "%0 = %1 +|- %2 (S)%!"
+  "%0 = %1 -|+ %2%!"
   [(set_attr "type" "dsp32")])
 
-(define_insn "sssubaddv2hi3"
+(define_insn "ssaddsubv2hi3"
   [(set (match_operand:V2HI 0 "register_operand" "=d")
(vec_concat:V2HI
 (ss_minus:HI (vec_select:HI (match_operand:V2HI 1 "register_operand" 
"d")
@@ -3064,6 +3051,19 @@
 (ss_plus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))
 (vec_select:HI (match_dup 2) (parallel [(const_int 
1)])]
   ""
+  "%0 = %1 +|- %2 (S)%!"
+  [(set_attr "type" "dsp32")])
+
+(define_insn "sssubaddv2hi3"
+  [(set (match_operand:V2HI 0 "register_operand" "=d")
+   (vec_concat:V2HI
+(ss_plus:HI (vec_select:HI (match_operand:V2HI 1 "register_operand" 
"d")
+   (parallel [(const_int 0)]))
+(vec_select:HI (match_operand:V2HI 2 "register_operand" 
"d")
+   (parallel [(const_int 0)])))
+(ss_minus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))
+ (vec_select:HI (match_dup 2) (parallel [(const_int 
1)])]
+  ""
   "%0 = %1 -|+ %2 (S)%!"
   [(set_attr "type" "dsp32")])
 


[PATCH] i386: Fix ICE in ix86_print_opreand_address [PR 102761]

2021-10-18 Thread Uros Bizjak via Gcc-patches
2021-10-18  Uroš Bizjak  

PR target/102761

gcc/ChangeLog:

* config/i386/i386.c (ix86_print_operand_address):
Error out for non-address_operand asm operands.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr102761.c: New test.

Boostrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master, will be backported to other release branches.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9cc903e826b..5ef1a92a7ce 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13921,7 +13921,10 @@ ix86_print_operand_address_as (FILE *file, rtx addr,
 static void
 ix86_print_operand_address (FILE *file, machine_mode /*mode*/, rtx addr)
 {
-  ix86_print_operand_address_as (file, addr, ADDR_SPACE_GENERIC, false);
+  if (this_is_asm_operands && ! address_operand (addr, VOIDmode))
+output_operand_lossage ("invalid constraints for operand");
+  else
+ix86_print_operand_address_as (file, addr, ADDR_SPACE_GENERIC, false);
 }
 
 /* Implementation of TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr102761.c 
b/gcc/testsuite/gcc.target/i386/pr102761.c
new file mode 100644
index 000..58ff27e4bcc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr102761.c
@@ -0,0 +1,11 @@
+/* PR target/102761 */
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+int foo (void);
+
+void
+bar (void)
+{
+  asm volatile ("%a0" : : "X"(foo () ? 2 : 1)); /* { dg-error "invalid 
constraints for operand" } */
+}


[Patch][PR102281]do not add BUILTIN_CLEAR_PADDING for variables that are gimple registers.

2021-10-18 Thread Qing Zhao via Gcc-patches
Hi,

PR102281  -ftrivial-auto-var-init=zero causes ice

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102281

This exposed multiple issues in the current padding-clearing implementation of
-ftrivial-auto-var-init:

  A. should check is_gimple_reg before adding the call to 
__builtin_clear_padding; (correctness)
  B. should check whether a type has padding before adding this call; (more 
efficient)
  C. For long double/Complex long double variables, if they are explicitly
initialized, should clear their padding during the RTL phase when the variable is
spilled onto the stack.

In the fix to this bug, A is a must and B is better to add in.  C is not needed
here and can be fixed in another bug; I have created a new
PR 102781 to record this issue and will fix it later if needed.

The patch for this bug includes A + B.
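As a rough illustration of points A and B (a sketch, not one of the added
testcases):

extern void use_ld (long double);
extern void use_ptr (void *);

void
example (void)
{
  /* Typically a gimple register: __builtin_clear_padding cannot be used on
     it, because the builtin operates on an object's address (point A).  */
  long double x = 0.0L;
  use_ld (x);

  /* Lives in memory (its address is taken) and its type has padding between
     'c' and 'l', so clearing padding is meaningful here; point B is about
     skipping types that have no padding at all.  */
  struct { char c; long long l; } s = { 0, 0 };
  use_ptr (&s);
}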

I have tested the patch on X86 and aarch64 with bootstrapping and regression
testing.

Okay for trunk?

Thanks.

Qing.

=
From ca78d82d7fe9064c0dcae845d1e4df34601fc083 Mon Sep 17 00:00:00 2001
From: Qing Zhao 
Date: Sat, 16 Oct 2021 17:15:23 +
Subject: [PATCH] PR 102281 (-ftrivial-auto-var-init=zero causes ice)

Do not add call to __BUILTIN_CLEAR_PADDING when a variable is a gimple
register or it might not have padding.

gcc/ChangeLog:

2021-10-16  qing zhao  

* gimplify.c (gimplify_decl_expr): Do not add call to
__BUILTIN_CLEAR_PADDING when a variable is a gimple register
or it might not have padding.
(gimplify_init_constructor): Likewise.

gcc/testsuite/ChangeLog:

2021-10-16  qing zhao  

* c-c++-common/pr102281.c: New test.
* gcc.target/i386/auto-init-2.c: Adjust testing case.
* gcc.target/i386/auto-init-4.c: Likewise.
* gcc.target/i386/auto-init-6.c: Likewise.
* gcc.target/aarch64/auto-init-6.c: Likewise.
---
 gcc/gimplify.c| 19 ---
 gcc/testsuite/c-c++-common/pr102281.c | 15 +++
 .../gcc.target/aarch64/auto-init-6.c  |  4 ++--
 gcc/testsuite/gcc.target/i386/auto-init-2.c   |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-4.c   | 10 --
 gcc/testsuite/gcc.target/i386/auto-init-6.c   |  7 ---
 6 files changed, 42 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr102281.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index d8e4b139349..82968017cd9 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1954,8 +1954,14 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
 pattern initialization.
 In order to make the paddings as zeroes for pattern init, We
 should add a call to __builtin_clear_padding to clear the
-paddings to zero in compatiple with CLANG.  */
- if (flag_auto_var_init == AUTO_INIT_PATTERN)
+paddings to zero in compatiple with CLANG.
+We cannot insert this call if the variable is a gimple register
+since __BUILTIN_CLEAR_PADDING assumes the variable is in memory.
+As a result, if a long double/Complex long double variable will
+spilled into stack later, its padding is 0XFE.  */
+ if (flag_auto_var_init == AUTO_INIT_PATTERN
+ && !is_gimple_reg (decl)
+ && clear_padding_type_may_have_padding_p (TREE_TYPE (decl)))
gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p);
}
 }
@@ -5388,8 +5394,15 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
*pre_p, gimple_seq *post_p,
  initialize paddings of object always to zero regardless of
  INIT_TYPE.  Note, we will not insert this call if the aggregate
  variable has be completely cleared already or it's initialized
- with an empty constructor.  */
+ with an empty constructor.  We cannot insert this call if the
+ variable is a gimple register since __BUILTIN_CLEAR_PADDING assumes
+ the variable is in memory.  As a result, if a long double/Complex long
+ double variable will be spilled into stack later, its padding cannot
+ be cleared with __BUILTIN_CLEAR_PADDING.  we should clear its padding
+ when it is spilled into memory.  */
   if (is_init_expr
+  && !is_gimple_reg (object)
+  && clear_padding_type_may_have_padding_p (type)
   && ((AGGREGATE_TYPE_P (type) && !cleared && !is_empty_ctor)
  || !AGGREGATE_TYPE_P (type))
   && is_var_need_auto_init (object))
diff --git a/gcc/testsuite/c-c++-common/pr102281.c 
b/gcc/testsuite/c-c++-common/pr102281.c
new file mode 100644
index 000..bfe9b08524b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr102281.c
@@ -0,0 +1,15 @@
+/* PR102281  */
+/* { dg-do compile } */
+/* { dg-options "-ftrivial-auto-var-init=zero" } */
+long _mm_set_epi64x___q0;
+__attribute__((__vector_size__(2 * sizeof(long long long _mm_set_epi64x() {
+  return (__attribute__((__vector_size__(2 * sizeof(long long long){
+  _mm_set_epi64x___q0};
+}
+
+float _mm_set1_ps___F;

Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-18 Thread David Edelsohn via Gcc-patches
Hi, H.J.

My colleague responded that GCC Go builds and works on AIX, but it
currently requires a special, custom version of GNU objcopy that adds
support for the types of features that Go requires to operate on AIX
XCOFF files.  Those changes have not yet been updated and contributed
to GNU Binutils.

I will see if I can install that version of objcopy standalone.  We
also can ask Clement and ATOS to test GCC Go build with your proposed
libffi patch, or is it vanilla libffi trunk?

Thanks, David


On Sat, Oct 16, 2021 at 3:59 PM H.J. Lu  wrote:
>
> On Sat, Oct 16, 2021 at 12:53 PM David Edelsohn  wrote:
> >
> > On Sat, Oct 16, 2021 at 1:13 PM H.J. Lu  wrote:
> > >
> > > On Sat, Oct 16, 2021 at 10:04 AM David Edelsohn  wrote:
> > > >
> > > > On Sat, Oct 16, 2021 at 7:48 AM H.J. Lu  wrote:
> > > > >
> > > > > On Fri, Oct 15, 2021 at 5:22 PM David Edelsohn  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu  wrote:
> > > > > > >
> > > > > > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu  
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Change in the v2 patch:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Disable static trampolines by default.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > GCC maintained a copy of libffi snapshot from 2009 and 
> > > > > > > > > > > > cherry-picked fixes
> > > > > > > > > > > > from upstream over the last 10+ years.  In the 
> > > > > > > > > > > > meantime, libffi upstream
> > > > > > > > > > > > has been changed significantly with new features, bug 
> > > > > > > > > > > > fixes and new target
> > > > > > > > > > > > support.  Here is a set of patches to sync with libffi 
> > > > > > > > > > > > 3.4.2 release and
> > > > > > > > > > > > make it easier to sync with libffi upstream:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Document how to sync with upstream.
> > > > > > > > > > > > 2. Add scripts to help sync with upstream.
> > > > > > > > > > > > 3. Sync with libffi 3.4.2. This patch is quite big.  It 
> > > > > > > > > > > > is availale at
> > > > > > > > > > > >
> > > > > > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
> > > > > > > > > > > > 4. Integrate libffi build and testsuite with GCC.
> > > > > > > > > > >
> > > > > > > > > > > How did you test this?  It looks like libgo is the only 
> > > > > > > > > > > consumer of
> > > > > > > > > > > libffi these days.
> > > > > > > > > > > In particular go/libgo seems to be supported on almost 
> > > > > > > > > > > all targets besides
> > > > > > > > > > > darwin/windows - did you test cross and canadian 
> > > > > > > > > > > configurations?
> > > > > > > > > >
> > > > > > > > > > I only tested it on Linux/i686 and Linux/x86-64.   My 
> > > > > > > > > > understanding is that
> > > > > > > > > > the upstream libffi works on Darwin and Windows.
> > > > > > > > > >
> > > > > > > > > > > I applaud the attempt to sync to upsteam but I fear you 
> > > > > > > > > > > won't get any "review"
> > > > > > > > > > > of this massive diff.
> > > > > > > > > >
> > > > > > > > > > I believe that it should just work.  Our libffi is very 
> > > > > > > > > > much out of date.
> > > > > > > > >
> > > > > > > > > Yes, you can hope.  And yes, our libffi is out of date.
> > > > > > > > >
> > > > > > > > > Can you please do the extra step to test one weird 
> > > > > > > > > architecture, namely
> > > > > > > > > powerpc64-aix which is available on the compile-farm?
> > > > > > > >
> > > > > > > > I will give it a try and report back.
> > > > > > > >
> > > > > > > > > If that goes well I think it's good to "hope" at this point 
> > > > > > > > > (and plenty of
> > > > > > > > > time to fix fallout until the GCC 12 release).
> > > > > > > > >
> > > > > > > > > Thus OK after the extra testing dance and waiting until early 
> > > > > > > > > next
> > > > > > > > > week so others can throw in a veto.
> > > > > > >
> > > > > > > I tried to bootstrap GCC master branch on  gcc119.fsffrance.org:
> > > > > > >
> > > > > > > *  MT/MODEL: 8284-22A 
> > > > > > > *
> > > > > > > * Partition: gcc119   
> > > > > > > *
> > > > > > > *System: power8-aix.osuosl.org
> > > > > > > *
> > > > > > > *   O/S: AIX V7.2 7200-04-03-2038
> > > > > > >
> > > > > > > I configured GCC with
> > > > > > >
> > > > > > > --with-as=/usr/bin/as 

[PATCH RFA] timevar: Add auto_cond_timevar class

2021-10-18 Thread Jason Merrill via Gcc-patches
The auto_timevar sentinel class for starting and stopping timevars was added
in 2014, but doesn't work for the many uses of timevar_cond_start/stop in
the C++ front end.  So let's add one that does.

This allows us to remove a lot of wrapper functions that were just used to
call timevar_cond_stop on all exits from the function.
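As a rough sketch of the usage pattern this enables (illustrative only, not
an excerpt from the patch): the sentinel stops the conditional timevar on
every exit path, so early returns need no explicit cleanup.

static tree
some_frontend_fn (tree t)
{
  /* Replaces a timevar_cond_start/timevar_cond_stop pair; the destructor
     stops the timevar on every return path, but only if this invocation
     actually started it.  */
  auto_cond_timevar tv (TV_OVERLOAD);

  if (error_operand_p (t))
    return error_mark_node;  /* no explicit timevar_cond_stop needed  */

  /* ... the real work ...  */
  return t;
}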

Tested x86_64-pc-linux-gnu, OK for trunk?

gcc/ChangeLog:

* timevar.h (class auto_cond_timevar): New.

gcc/cp/ChangeLog:

* call.c
* decl.c
* name-lookup.c:
Use auto_cond_timevar instead of timevar_cond_start/stop.
Remove wrapper functions.
---
 gcc/timevar.h|  46 -
 gcc/cp/call.c| 106 +--
 gcc/cp/decl.c|  51 +++--
 gcc/cp/name-lookup.c | 240 ---
 4 files changed, 150 insertions(+), 293 deletions(-)

diff --git a/gcc/timevar.h b/gcc/timevar.h
index 72e31adb9e6..ccaa42e5904 100644
--- a/gcc/timevar.h
+++ b/gcc/timevar.h
@@ -247,13 +247,53 @@ class auto_timevar
   m_timer->pop (m_tv);
   }
 
- private:
+  // Disallow copies.
+  auto_timevar (const auto_timevar &) = delete;
 
-  // Private to disallow copies.
-  auto_timevar (const auto_timevar &);
+ private:
+  timer *m_timer;
+  timevar_id_t m_tv;
+};
+
+// As above, but use cond_start/stop.
+class auto_cond_timevar
+{
+ public:
+  auto_cond_timevar (timer *t, timevar_id_t tv)
+: m_timer (t),
+  m_tv (tv)
+  {
+start ();
+  }
+
+  explicit auto_cond_timevar (timevar_id_t tv)
+: m_timer (g_timer)
+, m_tv (tv)
+  {
+start ();
+  }
+
+  ~auto_cond_timevar ()
+  {
+if (m_timer && !already_running)
+  m_timer->cond_stop (m_tv);
+  }
+
+  // Disallow copies.
+  auto_cond_timevar (const auto_cond_timevar &) = delete;
+
+ private:
+  void start()
+  {
+if (m_timer)
+  already_running = m_timer->cond_start (m_tv);
+else
+  already_running = false;
+  }
 
   timer *m_timer;
   timevar_id_t m_tv;
+  bool already_running;
 };
 
 extern void print_time (const char *, long);
diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index c5601d96ab8..80e618622fb 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4436,7 +4436,7 @@ build_user_type_conversion (tree totype, tree expr, int 
flags,
   struct z_candidate *cand;
   tree ret;
 
-  bool subtime = timevar_cond_start (TV_OVERLOAD);
+  auto_cond_timevar tv (TV_OVERLOAD);
   cand = build_user_type_conversion_1 (totype, expr, flags, complain);
 
   if (cand)
@@ -4452,7 +4452,6 @@ build_user_type_conversion (tree totype, tree expr, int 
flags,
   else
 ret = NULL_TREE;
 
-  timevar_cond_stop (TV_OVERLOAD, subtime);
   return ret;
 }
 
@@ -4692,7 +4691,7 @@ perform_overload_resolution (tree fn,
   tree explicit_targs;
   int template_only;
 
-  bool subtime = timevar_cond_start (TV_OVERLOAD);
+  auto_cond_timevar tv (TV_OVERLOAD);
 
   explicit_targs = NULL_TREE;
   template_only = 0;
@@ -4724,7 +4723,6 @@ perform_overload_resolution (tree fn,
   else
 cand = NULL;
 
-  timevar_cond_stop (TV_OVERLOAD, subtime);
   return cand;
 }
 
@@ -4989,8 +4987,8 @@ build_operator_new_call (tree fnname, vec 
**args,
 
 /* Build a new call to operator().  This may change ARGS.  */
 
-static tree
-build_op_call_1 (tree obj, vec **args, tsubst_flags_t complain)
+tree
+build_op_call (tree obj, vec **args, tsubst_flags_t complain)
 {
   struct z_candidate *candidates = 0, *cand;
   tree fns, convs, first_mem_arg = NULL_TREE;
@@ -4998,6 +4996,8 @@ build_op_call_1 (tree obj, vec **args, 
tsubst_flags_t complain)
   tree result = NULL_TREE;
   void *p;
 
+  auto_cond_timevar tv (TV_OVERLOAD);
+
   obj = mark_lvalue_use (obj);
 
   if (error_operand_p (obj))
@@ -5127,18 +5127,6 @@ build_op_call_1 (tree obj, vec **args, 
tsubst_flags_t complain)
   return result;
 }
 
-/* Wrapper for above.  */
-
-tree
-build_op_call (tree obj, vec **args, tsubst_flags_t complain)
-{
-  tree ret;
-  bool subtime = timevar_cond_start (TV_OVERLOAD);
-  ret = build_op_call_1 (obj, args, complain);
-  timevar_cond_stop (TV_OVERLOAD, subtime);
-  return ret;
-}
-
 /* Called by op_error to prepare format strings suitable for the error
function.  It concatenates a prefix (controlled by MATCH), ERRMSG,
and a suffix (controlled by NTYPES).  */
@@ -5330,10 +5318,10 @@ conditional_conversion (tree e1, tree e2, 
tsubst_flags_t complain)
 /* Implement [expr.cond].  ARG1, ARG2, and ARG3 are the three
arguments to the conditional expression.  */
 
-static tree
-build_conditional_expr_1 (const op_location_t ,
- tree arg1, tree arg2, tree arg3,
-  tsubst_flags_t complain)
+tree
+build_conditional_expr (const op_location_t ,
+   tree arg1, tree arg2, tree arg3,
+   tsubst_flags_t complain)
 {
   tree arg2_type;
   tree arg3_type;
@@ -5345,6 +5333,8 @@ build_conditional_expr_1 (const op_location_t ,
   void *p;
   tree orig_arg2, orig_arg3;
 
+  auto_cond_timevar tv 

[pushed] c++: improve template/crash90.C

2021-10-18 Thread Jason Merrill via Gcc-patches
In r208350 I improved the diagnostic location of the initializer-list
pedwarn in C++98 mode on crash90.C, but didn't adjust the testcase to verify
the location, so reverting that change didn't break regression testing.

gcc/testsuite/ChangeLog:

* g++.dg/template/crash90.C: Check location of pedwarn.
---
 gcc/testsuite/g++.dg/template/crash90.C | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/template/crash90.C 
b/gcc/testsuite/g++.dg/template/crash90.C
index 125ab0a9d46..fee7dc5ec46 100644
--- a/gcc/testsuite/g++.dg/template/crash90.C
+++ b/gcc/testsuite/g++.dg/template/crash90.C
@@ -4,5 +4,6 @@ template < unsigned >
 struct A ;
 template < typename >
 struct B ;
-template < typename T , A < B < T > {} // { dg-error "parse 
error|non-type|initializer" }
+template < typename T , A < B < T > {} // { dg-error "parse error|non-type" }
 // { dg-error "39:expected" "" { target *-*-* } .-1 }
+// { dg-error "37:initializer list" "" { target c++98_only } .-2 }

base-commit: 1257aad1073e1fb8989acdf7ca832fba82d10534
-- 
2.27.0



[PATCH] Apply TLC to vect_supportable_dr_alignment

2021-10-18 Thread Richard Biener via Gcc-patches
This fixes handling of the return value of vect_supportable_dr_alignment
in multiple places.  We should use the enum type and not int for
storage and not auto-convert the enum return value to bool.  It also
commonizes the read/write path in vect_supportable_dr_alignment.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-18  Richard Biener  

* tree-vect-data-refs.c (vect_peeling_hash_insert): Do
not auto-convert dr_alignment_support to bool.
(vect_peeling_supportable): Likewise.
(vect_enhance_data_refs_alignment): Likewise.
(vect_supportable_dr_alignment): Commonize read/write case.
* tree-vect-stmts.c (vect_get_store_cost): Use
dr_alignment_support, not int, for the vect_supportable_dr_alignment
result.
(vect_get_load_cost): Likewise.
---
 gcc/tree-vect-data-refs.c | 45 ++-
 gcc/tree-vect-stmts.c |  4 ++--
 2 files changed, 18 insertions(+), 31 deletions(-)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index a19045f7e46..4c9215874c9 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1471,7 +1471,8 @@ vect_peeling_hash_insert (hash_table 
*peeling_htab,
   _vect_peel_info **new_slot;
   tree vectype = STMT_VINFO_VECTYPE (dr_info->stmt);
   bool supportable_dr_alignment
-= vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype, true);
+= (vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype, true)
+   != dr_unaligned_unsupported);
 
   elem.npeel = npeel;
   slot = peeling_htab->find ();
@@ -1663,7 +1664,7 @@ vect_peeling_supportable (loop_vec_info loop_vinfo, 
dr_vec_info *dr0_info,
= vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype, false);
   SET_DR_MISALIGNMENT (dr_info, save_misalignment);
 
-  if (!supportable_dr_alignment)
+  if (supportable_dr_alignment == dr_unaligned_unsupported)
return false;
 }
 
@@ -1999,11 +2000,11 @@ vect_enhance_data_refs_alignment (loop_vec_info 
loop_vinfo)
 
  /* Check for data refs with unsupportable alignment that
 can be peeled.  */
- if (!supportable_dr_alignment)
- {
-   one_dr_unsupportable = true;
-   unsupportable_dr_info = dr_info;
- }
+ if (supportable_dr_alignment == dr_unaligned_unsupported)
+   {
+ one_dr_unsupportable = true;
+ unsupportable_dr_info = dr_info;
+   }
 
  if (!first_store && DR_IS_WRITE (dr))
{
@@ -2356,7 +2357,7 @@ vect_enhance_data_refs_alignment (loop_vec_info 
loop_vinfo)
  supportable_dr_alignment
= vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype,
 false);
-  if (!supportable_dr_alignment)
+ if (supportable_dr_alignment == dr_unaligned_unsupported)
 {
  if (known_alignment_for_access_p (dr_info, vectype)
   || LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo).length ()
@@ -6720,9 +6721,6 @@ vect_supportable_dr_alignment (vec_info *vinfo, 
dr_vec_info *dr_info,
 
   if (DR_IS_READ (dr))
 {
-  bool is_packed = false;
-  tree type = (TREE_TYPE (DR_REF (dr)));
-
   if (optab_handler (vec_realign_load_optab, mode) != CODE_FOR_nothing
  && (!targetm.vectorize.builtin_mask_for_load
  || targetm.vectorize.builtin_mask_for_load ()))
@@ -6744,26 +6742,15 @@ vect_supportable_dr_alignment (vec_info *vinfo, 
dr_vec_info *dr_info,
  else
return dr_explicit_realign_optimized;
}
-  if (!known_alignment_for_access_p (dr_info, vectype))
-   is_packed = not_size_aligned (DR_REF (dr));
-
-  if (targetm.vectorize.support_vector_misalignment
-   (mode, type, dr_misalignment (dr_info, vectype), is_packed))
-   /* Can't software pipeline the loads, but can at least do them.  */
-   return dr_unaligned_supported;
 }
-  else
-{
-  bool is_packed = false;
-  tree type = (TREE_TYPE (DR_REF (dr)));
-
-  if (!known_alignment_for_access_p (dr_info, vectype))
-   is_packed = not_size_aligned (DR_REF (dr));
 
- if (targetm.vectorize.support_vector_misalignment
-  (mode, type, dr_misalignment (dr_info, vectype), is_packed))
-   return dr_unaligned_supported;
-}
+  bool is_packed = false;
+  tree type = (TREE_TYPE (DR_REF (dr)));
+  if (!known_alignment_for_access_p (dr_info, vectype))
+is_packed = not_size_aligned (DR_REF (dr));
+  if (targetm.vectorize.support_vector_misalignment
+   (mode, type, dr_misalignment (dr_info, vectype), is_packed))
+return dr_unaligned_supported;
 
   /* Unsupported.  */
   return dr_unaligned_unsupported;
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 07123a2970f..eaf3f0abef3 100644
--- a/gcc/tree-vect-stmts.c
+++ 

Re: [RFC] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-18 Thread Aldy Hernandez via Gcc-patches




On 10/18/21 3:41 PM, Aldy Hernandez wrote:


I've been experimenting with reducing the total number of threading
passes, and I'd like to see if there's consensus/stomach for altering
the pipeline.  Note, that the goal is to remove forward threader clients,
not the other way around.  So, we should prefer to remove a VRP threader
instance over a *.thread one immediately before VRP.

After some playing, it looks like if we enable fully-resolving mode in
the *.thread passes immediately preceeding VRP, we can remove the VRP
threading passes altogether, thus removing 2 threading passes (and
forward threading passes at that!).


It occurs to me that we could also remove the threading before VRP 
passes, and enable a fully-resolving backward threader after VRP.  I 
haven't played with this scenario, but it should be just as good.  That 
being said, I don't know the intricacies of why we had both pre and post 
VRP threading passes, and if one is ideally better than the other.


Aldy



[RFC] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-18 Thread Aldy Hernandez via Gcc-patches
The jump threading bits seem to have stabilized.  The one or two open
PRs can be fixed by the pending loop threading restrictions to loop
rotation and loop headers.  With all the pieces in play, we can
finally explore altering the pipeline to reduce the jump threading
passes.

I know the jump threaders have become a confusing mess, and I have
added to this sphaghetti, but please bear with me, the goal is to
reduce the threaders to one code base.

As a quick birds-eye view, we have 2 engines:

1. The new backward threader using a path solver based on the ranger.
It can work in two modes-- a quick mode that assumes any SSA outside
the path is VARYING, and a fully resolving mode.

All the *.thread passes are running with this engine, but in quick
mode.

2. The old-old forward threader used by VRP and DOM.  The DOM client
uses its internal structures as well as evrp to resolve conditionals.
Whereas the VRP threader uses the engine in #1 in fully resolving
mode.

The VRP threaders are running with this engine, but using the solver
in #1.  That is, the VRP threaders use the old forward threader for
path discovery, but the new backward threader to solve candidate
paths (hybrid-threader).  This was always a stop-gap while we moved
everyone to #1.

The DOM threader is the last remaining threader with no changes
whatsoever from the previous release.  It uses the forward threader,
with all the evrp + DOM goo.

It doesn't matter if you're all confused, we're about to make things
simpler.

I've been experimenting with reducing the total number of threading
passes, and I'd like to see if there's consensus/stomach for altering
the pipeline.  Note, that the goal is to remove forward threader clients,
not the other way around.  So, we should prefer to remove a VRP threader
instance over a *.thread one immediately before VRP.

After some playing, it looks like if we enable fully-resolving mode in
the *.thread passes immediately preceeding VRP, we can remove the VRP
threading passes altogether, thus removing 2 threading passes (and
forward threading passes at that!).

The numbers look really good.  We get 6874 more jump threads
over my bootstrap .ii files, for a total 3.74% increase.  And we get that
while running marginally faster (0.19% faster, so noise).

The details are:

*** Mainline (with the loop rotation patch):
  ethread:64722
  dom:31246
  thread:73709
  vrp-thread:14357
  total:  184034

*** Removing all the VRP threaders.
 ethread:64722
 thread-full:76493
 dom:33648
 thread:16045
 total:  190908

Notice that not only do we get a lot more threads in thread-full
(resolving mode), but even DOM can get more jump threads.

This doesn't come without risks though.  The main issue is that we would
be removing one engine (forward threader), with another one (backward
threader).  But the good news is that (a) we've been using the new
backward threader for a while now (b) even the VRP threader in
mainline is using the backward threader solver.  So, all that would
really be changing would be the path discovery bits and custom copier
in the forward threader, with the backward threader bit and the
generic copier.

I personally don't think this is a big risk, because we've done all
the hard work already and it's all being stressed in one way or another.

The untested patch below is all that would need to happen, albeit with
copius changes to tests.

I'd like to see where we all stand on this before I start chugging away
at testing and other time consuming tasks.

Note, that all the relevant bits will still be tested in this release,
so I'm not gonna cry one way or another.  But it'd be nice to start
reducing passes, especially if we get a 3.74% increase in jump threads
for no time penalty.

Finally, even if we all agree, I think we should give ourselves a week after the
loop rotation restrictions go in, because threading changes always cause
a party of unexpected things to happen.

Shoot!

diff --git a/gcc/passes.def b/gcc/passes.def
index c11c237f6d2..96fc230e780 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -210,9 +210,8 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_return_slot);
   NEXT_PASS (pass_fre, true /* may_iterate */);
   NEXT_PASS (pass_merge_phi);
-  NEXT_PASS (pass_thread_jumps);
+  NEXT_PASS (pass_thread_jumps_full);
   NEXT_PASS (pass_vrp, true /* warn_array_bounds_p */);
-  NEXT_PASS (pass_vrp_threader);
   NEXT_PASS (pass_dse);
   NEXT_PASS (pass_dce);
   /* pass_stdarg is always run and at this point we execute
@@ -336,9 +335,9 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_thread_jumps);
   NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
   NEXT_PASS (pass_strlen);
-  NEXT_PASS (pass_thread_jumps);
+  NEXT_PASS (pass_thread_jumps_full);
   NEXT_PASS (pass_vrp, false /* warn_array_bounds_p 

[RFC] Overflow check in simplifying exit cond comparing two IVs.

2021-10-18 Thread Jiufu Guo via Gcc-patches
With reference to the discussions in:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574334.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572006.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578672.html

Based on the patches in the above discussion, we may draft a patch to fix the
issue.

In this patch, to make sure it is OK to change '{b0,s0} op {b1,s1}' to
'{b0,s0-s1} op {b1,0}', we also compute a condition under which neither of
the two IVs overflows/wraps: the niter of '{b0,s0-s1} op {b1,0}' must be
less than the niter until wrap for iv0 or iv1.
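As a hand-written illustration (not the pr100740.c testcase itself), the kind
of exit condition involved compares two induction variables with different
steps:

/* The exit test compares iv0 = {b0, 2} with iv1 = {b1, 1}.  Rewriting it as
   {b0, 1} < {b1, 0} is only valid while neither IV (nor the combined one)
   wraps before the loop exits, which is what the extra assumption computed
   above is meant to guarantee.  */
unsigned int
count_iterations (unsigned int b0, unsigned int b1)
{
  unsigned int n = 0;
  for (unsigned int i = b0, j = b1; i < j; i += 2, j += 1)
    n++;
  return n;
}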

Does this patch make sense?

BR,
Jiufu Guo

gcc/ChangeLog:

PR tree-optimization/100740
* tree-ssa-loop-niter.c (number_of_iterations_cond): Add
assume condition for combining of two IVs

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr100740.c: New test.
---
 gcc/tree-ssa-loop-niter.c | 103 +++---
 .../gcc.c-torture/execute/pr100740.c  |  11 ++
 2 files changed, 99 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr100740.c

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 75109407124..f2987a4448d 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1863,29 +1863,102 @@ number_of_iterations_cond (class loop *loop,
 
  provided that either below condition is satisfied:
 
-   a) the test is NE_EXPR;
-   b) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
+   a) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
+   b) assumptions in below table also need to be satisfied.
+
+   | iv0 | iv1 | assum (iv0step > iv1->step;
+   The second three rows: iv0->step < iv1->step.
 
  This rarely occurs in practice, but it is simple enough to manage.  */
   if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
 {
+  if (TREE_CODE (iv0->step) != INTEGER_CST
+ || TREE_CODE (iv1->step) != INTEGER_CST)
+   return false;
+  if (!iv0->no_overflow || !iv1->no_overflow)
+   return false;
+
   tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
-  tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
-  iv0->step, iv1->step);
-
-  /* No need to check sign of the new step since below code takes care
-of this well.  */
-  if (code != NE_EXPR
- && (TREE_CODE (step) != INTEGER_CST
- || !iv0->no_overflow || !iv1->no_overflow))
+  tree step
+   = fold_binary_to_constant (MINUS_EXPR, step_type, iv0->step, iv1->step);
+
+  if (code != NE_EXPR && tree_int_cst_sign_bit (step))
return false;
 
-  iv0->step = step;
-  if (!POINTER_TYPE_P (type))
-   iv0->no_overflow = false;
+  bool positive0 = !tree_int_cst_sign_bit (iv0->step);
+  bool positive1 = !tree_int_cst_sign_bit (iv1->step);
 
-  iv1->step = build_int_cst (step_type, 0);
-  iv1->no_overflow = true;
+  /* Cases in rows 2 and 4 of above table.  */
+  if ((positive0 && !positive1) || (!positive0 && positive1))
+   {
+ iv0->step = step;
+ iv1->step = build_int_cst (step_type, 0);
+ return number_of_iterations_cond (loop, type, iv0, code, iv1,
+   niter, only_exit, every_iteration);
+   }
+
+  affine_iv i_0, i_1;
+  class tree_niter_desc num;
+  i_0 = *iv0;
+  i_1 = *iv1;
+  i_0.step = step;
+  i_1.step = build_int_cst (step_type, 0);
+  if (!number_of_iterations_cond (loop, type, _0, code, _1, ,
+ only_exit, every_iteration))
+   return false;
+
+  affine_iv i0, i1;
+  class tree_niter_desc num_wrap;
+  i0 = *iv0;
+  i1 = *iv1;
+
+  /* Reset iv0 and iv1 to calculate the niter which cause overflow.  */
+  if (tree_int_cst_lt (i1.step, i0.step))
+   {
+ if (positive0 && positive1)
+   i0.step = build_int_cst (step_type, 0);
+ else if (!positive0 && !positive1)
+   i1.step = build_int_cst (step_type, 0);
+ if (code == NE_EXPR)
+   code = LT_EXPR;
+   }
+  else
+   {
+ if (positive0 && positive1)
+   i1.step = build_int_cst (step_type, 0);
+ else if (!positive0 && !positive1)
+   i0.step = build_int_cst (step_type, 0);
+ gcc_assert (code == NE_EXPR);
+ code = GT_EXPR;
+   }
+
+  /* Calculate the niter which cause overflow.  */
+  if (!number_of_iterations_cond (loop, type, , code, , _wrap,
+ only_exit, every_iteration))
+   return false;
+
+  /* Make assumption there is no overflow. */
+  tree assum
+   = fold_build2 (LE_EXPR, boolean_type_node, num.niter,
+  fold_convert (TREE_TYPE (num.niter), num_wrap.niter));
+  num.assumptions = fold_build2 (TRUTH_AND_EXPR, 

Re: [PATCH][RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

2021-10-18 Thread Richard Biener via Gcc-patches
On Mon, 18 Oct 2021, Michael Matz wrote:

> Hello,
> 
> On Mon, 18 Oct 2021, Richard Sandiford wrote:
> 
> > > (It's a really cute hack that works as a micro optimization, the question 
> > > is, do we really need to go there already, are all other less hacky 
> > > approaches not bringing similar improvements?  The cuter the hacks the 
> > > less often they pay off in the long run of production software :) )
> > 
> > FWIW, having been guilty of adding a similar hack(?) to SYMBOL_REFs
> > for block_symbol, I like the approach of concatenating/combining structures
> > based on flags.
> 
> The problem is that if you unset the flag you can't free the (now useless) 
> storage.  What's worse is that you can't even reuse it anymore, because 
> you lost the knowledge that it exists (except if you want to use another 
> flag to note that).

Yes, I suspect in the end I'd use two bits to optimize this case.

> It's of course obvious, but it helps to spell that 
> out if we want to argue about ...
> 
> > The main tree and rtl types have too much baggage and
> 
> ... baggage.  What you actually gain by associating different info pieces 
> by address (e.g. concatenate allocations) is that you don't need to refer 
> to one from the other, that's the space you spare, not anything inherent 
> in the structures (which remain to have the members they would have 
> anyway).  So, you basically trade one pointer (or index), which would 
> possibly be optional, with address association and inflexibility (with the 
> impossibility to manage both pieces individually: you can't free the 
> second piece, and you can't add the second piece post-allocation).  It 
> might be a good trade off sometimes, but in the abstract it's not a good 
> design.
> 
> Regarding trees and space: to make something a tree you need 8 bytes and 
> get a number of flags, and an arbitrary 4-byte blob in return.  I don't 
> see that as much baggage.  We could reduce it further by splitting the 
> arbitrary union and the tree_code+flags parts.  Especially for things 
> referred to from tree_exp it makes sense to try making them trees 
> themself.

So the main issue is that I consider none of the discussed approaches
nice (or well-designed), so I went for the one that appears to be
least intrusive (the concatenating and bit-indication).

That said, I'm probably going to codify the on-the-side
(optional) hashtable variant as well, which is at least well-designed
but might have a disadvantage in the larger constant overhead, in-principle
difficulties in carrying info across passes, and a necessarily
more explicit invalidation API.  Note all prototypes missed the
verification part (that info is not stale and reasonably up-to-date).

The real answer might of course be to invent the "proper" MEM_REF
tree that has fast access to ao_ref-style info as well as being
able to encode the important parts of the access path.

Richard.


Re: [PATCH][RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

2021-10-18 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 18 Oct 2021, Richard Sandiford wrote:

> > (It's a really cute hack that works as a micro optimization, the question 
> > is, do we really need to go there already, are all other less hacky 
> > approaches not bringing similar improvements?  The cuter the hacks the 
> > less often they pay off in the long run of production software :) )
> 
> FWIW, having been guilty of adding a similar hack(?) to SYMBOL_REFs
> for block_symbol, I like the approach of concatenating/combining structures
> based on flags.

The problem is that if you unset the flag you can't free the (now useless) 
storage.  What's worse is that you can't even reuse it anymore, because 
you lost the knowledge that it exists (except if you want to use another 
flag to note that).  It's of course obvious, but it helps to spell that 
out if we want to argue about ...

> The main tree and rtl types have too much baggage and

... baggage.  What you actually gain by associating different info pieces 
by address (e.g. concatenate allocations) is that you don't need to refer 
to one from the other, that's the space you spare, not anything inherent 
in the structures (which remain to have the members they would have 
anyway).  So, you basically trade one pointer (or index), which would 
possibly be optional, with address association and inflexibility (with the 
impossibility to manage both pieces individually: you can't free the 
second piece, and you can't add the second piece post-allocation).  It 
might be a good trade off sometimes, but in the abstract it's not a good 
design.

Regarding trees and space: to make something a tree you need 8 bytes and 
get a number of flags, and an arbitrary 4-byte blob in return.  I don't 
see that as much baggage.  We could reduce it further by splitting the 
arbitrary union and the tree_code+flags parts.  Especially for things 
referred to from tree_exp it makes sense to try making them trees 
themself.

> so I think there are some things that are better represented outside
> of them.
> 
> I suppose cselib VALUE rtxes are also similar, although they're more
> of a special case, since cselib data doesn't survive between passes.


Ciao,
Michael.


[PATCH] Reduce the number of aligned_access_p calls

2021-10-18 Thread Richard Biener via Gcc-patches
This uses the computed alignment scheme in vectorizable_store
much like vectorizable_load does instead of re-querying
it via aligned_access_p.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-18  Richard Biener  

* tree-vect-stmts.c (vectorizable_store): Use the
computed alignment scheme instead of querying
aligned_access_p.
---
 gcc/tree-vect-stmts.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 0e5e553ffe8..07123a2970f 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -8213,8 +8213,11 @@ vectorizable_store (vec_info *vinfo,
vec_oprnd = result_chain[i];
 
  align = known_alignment (DR_TARGET_ALIGNMENT (first_dr_info));
- if (aligned_access_p (first_dr_info, vectype))
-   misalign = 0;
+ if (alignment_support_scheme == dr_aligned)
+   {
+ gcc_assert (aligned_access_p (first_dr_info, vectype));
+ misalign = 0;
+   }
  else if (dr_misalignment (first_dr_info, vectype)
   == DR_MISALIGNMENT_UNKNOWN)
{
@@ -8299,8 +8302,8 @@ vectorizable_store (vec_info *vinfo,
  dataref_offset
  ? dataref_offset
  : build_int_cst (ref_type, 0));
- if (aligned_access_p (first_dr_info, vectype))
-   ;
+ if (alignment_support_scheme == dr_aligned)
+   gcc_assert (aligned_access_p (first_dr_info, vectype));
  else
TREE_TYPE (data_ref)
  = build_aligned_type (TREE_TYPE (data_ref),
-- 
2.31.1


[PATCH] Remove redundant alignment scheme recomputation

2021-10-18 Thread Richard Biener via Gcc-patches
The following avoids the recomputation of the alignment scheme
which is already fully determined by get_load_store_type.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-18  Richard Biener  

* tree-vect-stmts.c (vectorizable_store): Do not recompute
alignment scheme already determined by get_load_store_type.
---
 gcc/tree-vect-stmts.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f5e1941f8ad..0e5e553ffe8 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7904,14 +7904,6 @@ vectorizable_store (vec_info *vinfo,
   auto_vec dr_chain (group_size);
   oprnds.create (group_size);
 
-  /* Gather-scatter accesses perform only component accesses, alignment
- is irrelevant for them.  */
-  if (memory_access_type == VMAT_GATHER_SCATTER)
-alignment_support_scheme = dr_unaligned_supported;
-  else
-alignment_support_scheme
-  = vect_supportable_dr_alignment (vinfo, first_dr_info, vectype, false);
-
   gcc_assert (alignment_support_scheme);
   vec_loop_masks *loop_masks
 = (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
-- 
2.31.1


[committed] openmp: Fix handling of numa_domains(1)

2021-10-18 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 15, 2021 at 12:26:34PM -0700, sunil.k.pandey wrote:
> 4764049dd620affcd3e2658dc7f03a6616370a29 is the first bad commit
> commit 4764049dd620affcd3e2658dc7f03a6616370a29
> Author: Jakub Jelinek 
> Date:   Fri Oct 15 16:25:25 2021 +0200
> 
> openmp: Fix up handling of OMP_PLACES=threads(1)
> 
> caused
> 
> FAIL: libgomp.c/places-10.c execution test

Reproduced on gcc112 in CompileFarm (my ws isn't NUMA).
If numa-domains is used with num-places count, sometimes the function
could create more places than requested and crash.  This depended on the
content of /sys/devices/system/node/online file, e.g. if the file
contains
0-1,16-17
and all NUMA nodes contain at least one CPU in the cpuset of the program,
then numa_domains(2) or numa_domains(4) (or 5+) work fine while
numa_domains(1) or numa_domains(3) misbehave.  I.e. the function was able
to stop after reaching the limit at the ',' separators (or trivially at the end),
but not within the ranges.

Fixed thusly, tested on powerpc64le-linux, committed to trunk.

2021-10-18  Jakub Jelinek  

* config/linux/affinity.c (gomp_affinity_init_numa_domains): Add
&& gomp_places_list_len < count after nfirst <= nlast loop condition.

--- libgomp/config/linux/affinity.c.jj  2021-10-15 16:28:30.374460522 +0200
+++ libgomp/config/linux/affinity.c 2021-10-18 14:44:51.559667127 +0200
@@ -401,7 +401,7 @@ gomp_affinity_init_numa_domains (unsigne
break;
  q = end;
}
-  for (; nfirst <= nlast; nfirst++)
+  for (; nfirst <= nlast && gomp_places_list_len < count; nfirst++)
{
  sprintf (name + prefix_len, "node%lu/cpulist", nfirst);
  f = fopen (name, "r");


Jakub



[COMMITTED] Clone correct pass in class pass_thread_jumps_full.

2021-10-18 Thread Aldy Hernandez via Gcc-patches
The pass_thread_jumps_full pass was cloning the wrong pass.

Committed as obvious.

gcc/ChangeLog:

* tree-ssa-threadbackward.c (class pass_thread_jumps_full):
Clone corresponding pass.
---
 gcc/tree-ssa-threadbackward.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 62f936a9651..8770be88706 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -1059,7 +1059,7 @@ public:
   {}
   opt_pass * clone (void) override
   {
-return new pass_thread_jumps (m_ctxt);
+return new pass_thread_jumps_full (m_ctxt);
   }
   bool gate (function *) override
   {
-- 
2.31.1



[PATCH] 387-12.c: Require ia32 target instead of -m32

2021-10-18 Thread H.J. Lu via Gcc-patches
On x86-64,

$ make check RUNTESTFLAGS="--target_board='unix{-m32,}'"

can be used to test both 64-bit and 32-bit targets.  Require ia32 target
instead of explicit -m32 for 32-bit only test.

* gcc.target/i386/387-12.c (dg-do compile): Require ia32.
(dg-options): Remove -m32.
---
 gcc/testsuite/gcc.target/i386/387-12.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/387-12.c 
b/gcc/testsuite/gcc.target/i386/387-12.c
index 7fe50a21981..ba86536c67a 100644
--- a/gcc/testsuite/gcc.target/i386/387-12.c
+++ b/gcc/testsuite/gcc.target/i386/387-12.c
@@ -1,6 +1,6 @@
 /* PR target/26915 */
-/* { dg-do compile } */
-/* { dg-options "-O -m32 -mfpmath=387 -mfancy-math-387" } */
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O -mfpmath=387 -mfancy-math-387" } */
 
 double testm0(void)
 {
-- 
2.32.0



Re: [RFC PATCH 1/8] RISC-V: Minimal support of bitmanip extension

2021-10-18 Thread Kito Cheng
> > That's a good point, but ISA_SPEC_CLASS_FROZEN_2021 is hard to
> > reference to which spec, so I would prefer to add a -misa-spec=2021 to
> > align platform/profile spec, and then ISA_SPEC_CLASS_2021, and before
> > RISC-V platform/profile spec has released, let keep
> > ISA_SPEC_CLASS_NONE :p
>
> For sure we cannot reference a spec that is not frozen yet (i.e.
> platform/profile).
> ISA_SPEC_CLASS_FROZEN_2021 was a proposal for all groups of ISA extensions
> that have been frozen in 2021 (zb*, zk*, etc.) and will eventually be 
> ratified.
> But yes, keeping NONE until the specifications are ratified and change
> the specification
> class then is also possible.

I expect those specs can be ratified at the end of this year, and then
we still have a few months
to update before GCC 12 release.


[PATCH] c++: Don't reject calls through PMF during constant evaluation [PR102786]

2021-10-18 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase incorrectly rejects the c initializer,
while in the s.*a case cxx_eval_* sees .__pfn reads etc.,
in the s.*&S::foo case get_member_function_from_ptrfunc creates
expressions which use INTEGER_CSTs with type of pointer to METHOD_TYPE.
And cxx_eval_constant_expression rejects any INTEGER_CSTs with pointer
type if they aren't 0.
Either we'd need to make sure we defer such folding till cp_fold but the
function and pfn_from_ptrmemfunc is used from lots of places, or
the following patch just tries to reject only non-zero INTEGER_CSTs
with pointer types if they don't point to METHOD_TYPE in the hope that
all such INTEGER_CSTs with POINTER_TYPE to METHOD_TYPE are the result of
folding valid pointer-to-member function expressions.
I don't immediately see how one could create such INTEGER_CSTs otherwise,
cast of integers to PMF is rejected and would have the PMF RECORD_TYPE
anyway, etc.

Regtested on x86_64-linux
(with GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++)
ok for trunk if it passes full bootstrap/regtest?

2021-10-18  Jakub Jelinek  

PR c++/102786
* constexpr.c (cxx_eval_constant_expression): Don't reject
INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE.

* g++.dg/cpp2a/constexpr-virtual19.C: New test.

--- gcc/cp/constexpr.c.jj   2021-10-15 11:59:15.917687093 +0200
+++ gcc/cp/constexpr.c  2021-10-18 13:26:49.458610657 +0200
@@ -6191,6 +6191,7 @@ cxx_eval_constant_expression (const cons
 
   if (TREE_CODE (t) == INTEGER_CST
  && TYPE_PTR_P (TREE_TYPE (t))
+ && TREE_CODE (TREE_TYPE (TREE_TYPE (t))) != METHOD_TYPE
  && !integer_zerop (t))
{
  if (!ctx->quiet)
--- gcc/testsuite/g++.dg/cpp2a/constexpr-virtual19.C.jj 2021-10-18 
13:35:00.229693908 +0200
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-virtual19.C2021-10-18 
12:31:05.265747723 +0200
@@ -0,0 +1,11 @@
+// PR c++/102786
+// { dg-do compile { target c++20 } }
+
+struct S {
+  virtual constexpr int foo () const { return 42; }
+};
+
+constexpr S s;
+constexpr auto a = &S::foo;
+constexpr auto b = (s.*a) ();
+constexpr auto c = (s.*&S::foo) ();

Jakub



Re: [match.pd] PR83750 - CSE erf/erfc pair

2021-10-18 Thread Richard Biener via Gcc-patches
On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:

> On Mon, 18 Oct 2021 at 17:10, Richard Biener  wrote:
> >
> > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> >
> > > On Mon, 18 Oct 2021 at 16:18, Richard Biener  wrote:
> > > >
> > > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > > >
> > > > > Hi Richard,
> > > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > > to match.pd:
> > > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > > >
> > > > > This works to remove call to erfc for the following test:
> > > > > double f(double x)
> > > > > {
> > > > >   double g(double, double);
> > > > >
> > > > >   double t1 = __builtin_erf (x);
> > > > >   double t2 = __builtin_erfc (x);
> > > > >   return g(t1, t2);
> > > > > }
> > > > >
> > > > > with .optimized dump shows:
> > > > >   t1_2 = __builtin_erf (x_1(D));
> > > > >   t2_3 = 1.0e+0 - t1_2;
> > > > >
> > > > > However, for the following test:
> > > > > double f(double x)
> > > > > {
> > > > >   double g(double, double);
> > > > >
> > > > >   double t1 = __builtin_erfc (x);
> > > > >   return t1;
> > > > > }
> > > > >
> > > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > > erf(x) to erfc(x) again
> > > > > post canonicalization.
> > > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > > but then it tries to
> > > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > > with erfc(x) transformed to
> > > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > > Could you suggest how to proceed ?
> > > >
> > > > I applied your patch manually and it does the intended
> > > > simplifications so I wonder what I am missing?
> > > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > > no erf(x) in the source ?
> >
> > I do think it's reasonable to expect erfc to be available when erf
> > is and vice versa but note both are C99 specified functions (either
> > requires -lm).
> OK, thanks. Would it be OK to commit the patch after bootstrap+test ?

Yes, but I'm confused because you say the patch doesn't work for you?

Btw, please add the testcase from the PR and also a testcase that shows
the canonicalization is undone.  Maybe you can also double-check that
we handle x + erfc (x) because I see we associate that as
(x + 1) - erf (x) which is then not recognized back to erfc.
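
Something like the following (untested sketch) would do as a test for
that last case:

double h (double x)
{
  /* Ideally this stays (or is re-recognized as) x + erfc (x) rather
     than (x + 1) - erf (x).  */
  return x + __builtin_erfc (x);
}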

The less surprising (as to preserve the function called in the source)
variant for the PR would be to teach CSE to lookup erf(x) when
visiting erfc(x) and when found synthesize 1 - erf(x).

That said, a mathematician should chime in on how important it is
to preserve erfc vs. erf (precision or even speed).
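
(FWIW, the usual precision argument is about large positive x, where
erf (x) rounds to 1.0 so 1.0 - erf (x) cancels to zero while erfc (x)
still returns a tiny nonzero value; e.g. in double, roughly

  __builtin_erfc (10.0)        /* ~2.1e-45 */
  1.0 - __builtin_erf (10.0)   /* 0.0 */

though of course the whole transform is guarded by
-funsafe-math-optimizations anyway.)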

Thanks,
Richard.

> Thanks,
> Prathamesh
> 
> >
> > Richard.
> >
> > > So for the following test:
> > > double f(double x)
> > > {
> > >   t1 = __builtin_erfc(x)
> > >   return t1;
> > > }
> > >
> > > .optimized dump shows:
> > > double f (double x)
> > > {
> > >   double t1;
> > >   double _2;
> > >
> > >[local count: 1073741824]:
> > >   _2 = __builtin_erf (x_1(D));
> > >   t1_3 = 1.0e+0 - _2;
> > >   return t1_3;
> > > }
> > >
> > > while before patch, it has:
> > >   t1_4 = __builtin_erfc (x_2(D)); [tail call]
> > >   return t1_4;
> > >
> > > Thanks,
> > > Prathamesh
> > >
> > > >
> > > > Richard.
> > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > >
> > > >
> > > > --
> > > > Richard Biener 
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> >
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [match.pd] PR83750 - CSE erf/erfc pair

2021-10-18 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 18 Oct 2021 at 17:10, Richard Biener  wrote:
>
> On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
>
> > On Mon, 18 Oct 2021 at 16:18, Richard Biener  wrote:
> > >
> > > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> > >
> > > > Hi Richard,
> > > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > > to match.pd:
> > > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > > >
> > > > This works to remove call to erfc for the following test:
> > > > double f(double x)
> > > > {
> > > >   double g(double, double);
> > > >
> > > >   double t1 = __builtin_erf (x);
> > > >   double t2 = __builtin_erfc (x);
> > > >   return g(t1, t2);
> > > > }
> > > >
> > > > with .optimized dump shows:
> > > >   t1_2 = __builtin_erf (x_1(D));
> > > >   t2_3 = 1.0e+0 - t1_2;
> > > >
> > > > However, for the following test:
> > > > double f(double x)
> > > > {
> > > >   double g(double, double);
> > > >
> > > >   double t1 = __builtin_erfc (x);
> > > >   return t1;
> > > > }
> > > >
> > > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > > erf(x) to erfc(x) again
> > > > post canonicalization.
> > > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > > but then it tries to
> > > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > > with erfc(x) transformed to
> > > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > > Could you suggest how to proceed ?
> > >
> > > I applied your patch manually and it does the intended
> > > simplifications so I wonder what I am missing?
> > Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> > no erf(x) in the source ?
>
> I do think it's reasonable to expect erfc to be available when erf
> is and vice versa but note both are C99 specified functions (either
> requires -lm).
OK, thanks. Would it be OK to commit the patch after bootstrap+test ?

Thanks,
Prathamesh

>
> Richard.
>
> > So for the following test:
> > double f(double x)
> > {
> >   t1 = __builtin_erfc(x)
> >   return t1;
> > }
> >
> > .optimized dump shows:
> > double f (double x)
> > {
> >   double t1;
> >   double _2;
> >
> >[local count: 1073741824]:
> >   _2 = __builtin_erf (x_1(D));
> >   t1_3 = 1.0e+0 - _2;
> >   return t1_3;
> > }
> >
> > while before patch, it has:
> >   t1_4 = __builtin_erfc (x_2(D)); [tail call]
> >   return t1_4;
> >
> > Thanks,
> > Prathamesh
> >
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Prathamesh
> > > >
> > >
> > > --
> > > Richard Biener 
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
>
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [match.pd] PR83750 - CSE erf/erfc pair

2021-10-18 Thread Richard Biener via Gcc-patches
On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:

> On Mon, 18 Oct 2021 at 16:18, Richard Biener  wrote:
> >
> > On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
> >
> > > Hi Richard,
> > > As suggested in PR, I have attached WIP patch that adds two patterns
> > > to match.pd:
> > > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> > >
> > > This works to remove call to erfc for the following test:
> > > double f(double x)
> > > {
> > >   double g(double, double);
> > >
> > >   double t1 = __builtin_erf (x);
> > >   double t2 = __builtin_erfc (x);
> > >   return g(t1, t2);
> > > }
> > >
> > > with .optimized dump shows:
> > >   t1_2 = __builtin_erf (x_1(D));
> > >   t2_3 = 1.0e+0 - t1_2;
> > >
> > > However, for the following test:
> > > double f(double x)
> > > {
> > >   double g(double, double);
> > >
> > >   double t1 = __builtin_erfc (x);
> > >   return t1;
> > > }
> > >
> > > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > > erf(x) to erfc(x) again
> > > post canonicalization.
> > > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > > but then it tries to
> > > resimplify erfc(x), which fails post canonicalization. So we end up
> > > with erfc(x) transformed to
> > > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > > Could you suggest how to proceed ?
> >
> > I applied your patch manually and it does the intended
> > simplifications so I wonder what I am missing?
> Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
> no erf(x) in the source ?

I do think it's reasonable to expect erfc to be available when erf
is and vice versa but note both are C99 specified functions (either
requires -lm).

Richard.

> So for the following test:
> double f(double x)
> {
>   t1 = __builtin_erfc(x)
>   return t1;
> }
> 
> .optimized dump shows:
> double f (double x)
> {
>   double t1;
>   double _2;
> 
>[local count: 1073741824]:
>   _2 = __builtin_erf (x_1(D));
>   t1_3 = 1.0e+0 - _2;
>   return t1_3;
> }
> 
> while before patch, it has:
>   t1_4 = __builtin_erfc (x_2(D)); [tail call]
>   return t1_4;
> 
> Thanks,
> Prathamesh
> 
> >
> > Richard.
> >
> > > Thanks,
> > > Prathamesh
> > >
> >
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.

2021-10-18 Thread Robin Dapp via Gcc-patches

Hi Richard,

after giving it a second thought, and seeing that most of the changes to
existing code are not strictly necessary anymore, I figured it would be
easier not to change the current control flow too much, as in the
attached patch.


The remaining changes are to "outsource" the maybe_expand_insn part and
to make the emit_conditional_move variant that takes a full comparison
and rev_comparison externally available.
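
That is, roughly this overload becomes usable from ifcvt (signature
sketch, matching the attached patch):

  /* Emit a conditional move re-using an already computed CC comparison
     (and its reversed form) instead of re-emitting the compare.  */
  rtx emit_conditional_move (rtx target, rtx cc_cmp, rtx rev_cc_cmp,
                             rtx vtrue, rtx vfalse, machine_mode mode);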


I suppose straightening out the arguably somewhat baroque parts can be
deferred to a separate patch.


On s390 this works nicely but I haven't yet done a bootstrap on other archs.

Regards
 Robin
commit eb50384ee0cdeeefa61ae89bdbb2875500b7ce60
Author: Robin Dapp 
Date:   Wed Nov 27 13:53:40 2019 +0100

ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.

Currently we only ever call emit_conditional_move with the comparison
(as well as its comparands) we got from the jump.  Thus, backends are
going to emit a CC comparison for every conditional move that is being
generated instead of re-using the existing CC.
This, combined with emitting temporaries for each conditional move,
causes sky-high costs for conditional moves.

This patch allows to re-use a CC so the costing situation is improved a
bit.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 6ae883cbdd4..f7765e60548 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -772,7 +772,7 @@ static int noce_try_addcc (struct noce_if_info *);
 static int noce_try_store_flag_constants (struct noce_if_info *);
 static int noce_try_store_flag_mask (struct noce_if_info *);
 static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
-			rtx, rtx, rtx);
+			rtx, rtx, rtx, rtx = NULL, rtx = NULL);
 static int noce_try_cmove (struct noce_if_info *);
 static int noce_try_cmove_arith (struct noce_if_info *);
 static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
@@ -1711,7 +1711,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
 static rtx
 noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
-		 rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue)
+		 rtx cmp_a, rtx cmp_b, rtx vfalse, rtx vtrue, rtx cc_cmp,
+		 rtx rev_cc_cmp)
 {
   rtx target ATTRIBUTE_UNUSED;
   int unsignedp ATTRIBUTE_UNUSED;
@@ -1743,23 +1744,30 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
   end_sequence ();
 }
 
-  /* Don't even try if the comparison operands are weird
- except that the target supports cbranchcc4.  */
-  if (! general_operand (cmp_a, GET_MODE (cmp_a))
-  || ! general_operand (cmp_b, GET_MODE (cmp_b)))
-{
-  if (!have_cbranchcc4
-	  || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
-	  || cmp_b != const0_rtx)
-	return NULL_RTX;
-}
-
   unsignedp = (code == LTU || code == GEU
 	   || code == LEU || code == GTU);
 
-  target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
-  vtrue, vfalse, GET_MODE (x),
-  unsignedp);
+  if (cc_cmp != NULL_RTX && rev_cc_cmp != NULL_RTX)
+target = emit_conditional_move (x, cc_cmp, rev_cc_cmp,
+vtrue, vfalse, GET_MODE (x));
+  else
+{
+  /* Don't even try if the comparison operands are weird
+	 except that the target supports cbranchcc4.  */
+  if (! general_operand (cmp_a, GET_MODE (cmp_a))
+	  || ! general_operand (cmp_b, GET_MODE (cmp_b)))
+	{
+	  if (!have_cbranchcc4
+	  || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
+	  || cmp_b != const0_rtx)
+	return NULL_RTX;
+	}
+
+  target = emit_conditional_move (x, code, cmp_a, cmp_b, VOIDmode,
+  vtrue, vfalse, GET_MODE (x),
+  unsignedp);
+}
+
   if (target)
 return target;
 
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 019bbb62882..25eecf29ed8 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -52,6 +52,9 @@ static void prepare_float_lib_cmp (rtx, rtx, enum rtx_code, rtx *,
 static rtx expand_unop_direct (machine_mode, optab, rtx, rtx, int);
 static void emit_libcall_block_1 (rtx_insn *, rtx, rtx, rtx, bool);
 
+static rtx emit_conditional_move (rtx, rtx, rtx, rtx, machine_mode);
+rtx emit_conditional_move (rtx, rtx, rtx, rtx, rtx, machine_mode);
+
 /* Debug facility for use in GDB.  */
 void debug_optab_libfuncs (void);
 
@@ -4875,6 +4878,7 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
   /* get_condition will prefer to generate LT and GT even if the old
  comparison was against zero, so undo that canonicalization here since
  comparisons against zero are cheaper.  */
+
   if (code == LT && op1 == const1_rtx)
 code = LE, op1 = const0_rtx;
   else if (code == GT && op1 == constm1_rtx)
@@ -4925,18 +4929,10 @@ emit_conditional_move (rtx target, enum rtx_code code, rtx op0, rtx op1,
 			OPTAB_WIDEN, , );
 	  if (comparison)
 	{
-	  class expand_operand ops[4];
-
-	  create_output_operand (&ops[0], target, mode);
-	  create_fixed_operand (&ops[1], comparison);
-	  

Re: [match.pd] PR83750 - CSE erf/erfc pair

2021-10-18 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 18 Oct 2021 at 16:18, Richard Biener  wrote:
>
> On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:
>
> > Hi Richard,
> > As suggested in PR, I have attached WIP patch that adds two patterns
> > to match.pd:
> > erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> > 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> >
> > This works to remove call to erfc for the following test:
> > double f(double x)
> > {
> >   double g(double, double);
> >
> >   double t1 = __builtin_erf (x);
> >   double t2 = __builtin_erfc (x);
> >   return g(t1, t2);
> > }
> >
> > with .optimized dump shows:
> >   t1_2 = __builtin_erf (x_1(D));
> >   t2_3 = 1.0e+0 - t1_2;
> >
> > However, for the following test:
> > double f(double x)
> > {
> >   double g(double, double);
> >
> >   double t1 = __builtin_erfc (x);
> >   return t1;
> > }
> >
> > It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> > erf(x) to erfc(x) again
> > post canonicalization.
> > -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> > but then it tries to
> > resimplify erfc(x), which fails post canonicalization. So we end up
> > with erfc(x) transformed to
> > 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> > Could you suggest how to proceed ?
>
> I applied your patch manually and it does the intended
> simplifications so I wonder what I am missing?
Would it be OK to always fold erfc(x) -> 1 - erf(x) even when there's
no erf(x) in the source ?
So for the following test:
double f(double x)
{
  t1 = __builtin_erfc(x)
  return t1;
}

.optimized dump shows:
double f (double x)
{
  double t1;
  double _2;

   [local count: 1073741824]:
  _2 = __builtin_erf (x_1(D));
  t1_3 = 1.0e+0 - _2;
  return t1_3;
}

while before patch, it has:
  t1_4 = __builtin_erfc (x_2(D)); [tail call]
  return t1_4;

Thanks,
Prathamesh

>
> Richard.
>
> > Thanks,
> > Prathamesh
> >
>
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH][i386] target: support spaces in target attribute.

2021-10-18 Thread Martin Liška

On 10/11/21 13:17, Martin Liška wrote:

On 10/4/21 23:02, Andrew Pinski wrote:

It might be useful to skip tabs for the same reason as spaces really.


Sure, be my guest.

Martin


May I please ping this i386-specific patch?

Thanks,
Martin


[aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr

2021-10-18 Thread Prathamesh Kulkarni via Gcc-patches
Hi,
The attached patch emits a more verbose diagnostic for a target attribute
that is an architecture extension needing a leading '+'.

For the following test,
void calculate(void) __attribute__ ((__target__ ("sve")));

With patch, the compiler now emits:
102376.c:1:1: error: arch extension ‘sve’ should be prepended with ‘+’
1 | void calculate(void) __attribute__ ((__target__ ("sve")));
  | ^~~~

instead of:
102376.c:1:1: error: pragma or attribute ‘target("sve")’ is not valid
1 | void calculate(void) __attribute__ ((__target__ ("sve")));
  | ^~~~
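
With the leading '+' the attribute is then accepted as usual, e.g.
(assuming SVE is otherwise available on the target):

void calculate(void) __attribute__ ((__target__ ("+sve")));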

(This isn't specific to sve though).
OK to commit after bootstrap+test ?

Thanks,
Prathamesh
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a9a1800af53..975f7faf968 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17821,7 +17821,16 @@ aarch64_process_target_attr (tree args)
   num_attrs++;
   if (!aarch64_process_one_target_attr (token))
{
- error ("pragma or attribute %<target(\"%s\")%> is not valid", token);
+ /* Check if token is possibly an arch extension without
+leading '+'.  */
+ char *str = (char *) xmalloc (strlen (token) + 2);
+ str[0] = '+';
+ strcpy(str + 1, token);
+ if (aarch64_handle_attr_isa_flags (str))
+   error("arch extension %<%s%> should be prepended with %<+%>", 
token);
+ else
+   error ("pragma or attribute %<target(\"%s\")%> is not valid", 
token);
+ free (str);
  return false;
}
 


[PATCH] gcov: return proper exit code when error happens

2021-10-18 Thread Martin Liška

Hello.

The patch records error codes when something serious happens during
emission of GCOV reports.
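
For reference, the return codes used below are roughly:

  /* 1 - cannot open the notes file
     2 - not a gcov notes/data file
     3 - version mismatch
     4 - corrupted file (or overflowed counts)
     5 - stamp mismatch between data and notes files
     6 - cannot open or write an output file  */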

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

I'm going to push the change.

Thanks,
Martin

PR gcov-profile/102746
PR gcov-profile/102747

gcc/ChangeLog:

* gcov.c (main): Return return_code.
(output_gcov_file): Mark return_code when error happens.
(generate_results): Likewise.
(read_graph_file): Likewise.
(read_count_file): Likewise.
---
 gcc/gcov.c | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/gcc/gcov.c b/gcc/gcov.c
index 3672ae7a6f8..34f53ac2d78 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -638,6 +638,9 @@ static int flag_preserve_paths = 0;
 
 static int flag_counts = 0;
 
+/* Return code of the tool invocation.  */

+static int return_code = 0;
+
 /* Forward declarations.  */
 static int process_args (int, char **);
 static void print_usage (int) ATTRIBUTE_NORETURN;
@@ -907,7 +910,7 @@ main (int argc, char **argv)
   if (!flag_use_stdout)
 executed_summary (total_lines, total_executed);
 
-  return 0;

+  return return_code;
 }
 
 /* Print a usage message and exit.  If ERROR_P is nonzero, this is an error,
@@ -1467,12 +1470,18 @@ output_gcov_file (const char *file_name, source_info 
*src)
  fnotice (stdout, "Creating '%s'\n", gcov_file_name);
  output_lines (gcov_file, src);
  if (ferror (gcov_file))
-   fnotice (stderr, "Error writing output file '%s'\n",
-gcov_file_name);
+   {
+ fnotice (stderr, "Error writing output file '%s'\n",
+  gcov_file_name);
+ return_code = 6;
+   }
  fclose (gcov_file);
}
   else
-   fnotice (stderr, "Could not open output file '%s'\n", gcov_file_name);
+   {
+ fnotice (stderr, "Could not open output file '%s'\n", gcov_file_name);
+ return_code = 6;
+   }
 }
   else
 {
@@ -1594,6 +1603,7 @@ generate_results (const char *file_name)
{
  fnotice (stderr, "Cannot open JSON output file %s\n",
   gcov_intermediate_filename.c_str ());
+ return_code = 6;
  return;
}
 
@@ -1602,6 +1612,7 @@ generate_results (const char *file_name)

{
  fnotice (stderr, "Error writing JSON output file %s\n",
   gcov_intermediate_filename.c_str ());
+ return_code = 6;
  return;
}
}
@@ -1790,12 +1801,14 @@ read_graph_file (void)
   if (!gcov_open (bbg_file_name, 1))
 {
   fnotice (stderr, "%s:cannot open notes file\n", bbg_file_name);
+  return_code = 1;
   return;
 }
   bbg_file_time = gcov_time ();
   if (!gcov_magic (gcov_read_unsigned (), GCOV_NOTE_MAGIC))
 {
   fnotice (stderr, "%s:not a gcov notes file\n", bbg_file_name);
+  return_code = 2;
   gcov_close ();
   return;
 }
@@ -1810,6 +1823,7 @@ read_graph_file (void)
 
   fnotice (stderr, "%s:version '%.4s', prefer '%.4s'\n",

   bbg_file_name, v, e);
+  return_code = 3;
 }
   bbg_stamp = gcov_read_unsigned ();
   /* Read checksum.  */
@@ -1977,6 +1991,7 @@ read_graph_file (void)
{
corrupt:;
  fnotice (stderr, "%s:corrupted\n", bbg_file_name);
+ return_code = 4;
  break;
}
 }
@@ -2009,6 +2024,7 @@ read_count_file (void)
   if (!gcov_magic (gcov_read_unsigned (), GCOV_DATA_MAGIC))
 {
   fnotice (stderr, "%s:not a gcov data file\n", da_file_name);
+  return_code = 2;
 cleanup:;
   gcov_close ();
   return 1;
@@ -2023,11 +2039,13 @@ read_count_file (void)
 
   fnotice (stderr, "%s:version '%.4s', prefer version '%.4s'\n",

   da_file_name, v, e);
+  return_code = 3;
 }
   tag = gcov_read_unsigned ();
   if (tag != bbg_stamp)
 {
   fnotice (stderr, "%s:stamp mismatch with notes file\n", da_file_name);
+  return_code = 5;
   goto cleanup;
 }
 
@@ -2088,6 +2106,7 @@ read_count_file (void)

   ? N_("%s:overflowed\n")
   : N_("%s:corrupted\n"),
   da_file_name);
+ return_code = 4;
  goto cleanup;
}
 }
--
2.33.0



[PATCH] tree-optimization/102788 - avoid spurious bool pattern fails

2021-10-18 Thread Richard Biener via Gcc-patches
Bool pattern recog is required for correctness since vectorized
compares otherwise produce -1 for true so any context where bool
is used as value and not as condition or mask needs to be replaced
with CMP ? 1 : 0.  When we fail to find a vector type for the
result of such use we may not simply elide such transform since
a new bool result can emerge when for example the cast_forwprop
pattern is applied.  So the following avoids failing the
bool pattern recog process and instead does not assign a vector type
for the stmt.
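
To illustrate, a bool used as a value, e.g.

  _Bool c = a < b;
  int x = c;   /* bool used as value, not as condition/mask */

is rewritten by the pattern to roughly x = a < b ? 1 : 0, and that
rewrite still has to happen even if we cannot find a vector type for
the converted result at this point.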

Bootstrapped and tested on x86_64-unknown-linux-gnu, I also built
SPEC CPU 2017 with -Ofast -flto -march=znver2, pushed to trunk so far.

Richard.

2021-10-18  Richard Biener  

PR tree-optimization/102788
* tree-vect-patterns.c (vect_init_pattern_stmt): Allow
a NULL vectype.
(vect_pattern_recog_1): Likewise.
(vect_recog_bool_pattern): Continue matching the pattern
even if we do not have a vector type for a conversion
result.

* g++.dg/vect/pr102788.cc: New testcase.
---
 gcc/testsuite/g++.dg/vect/pr102788.cc | 32 +++
 gcc/tree-vect-patterns.c  |  8 +++
 2 files changed, 35 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr102788.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr102788.cc 
b/gcc/testsuite/g++.dg/vect/pr102788.cc
new file mode 100644
index 000..fa9c366fe56
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr102788.cc
@@ -0,0 +1,32 @@
+// { dg-do run }
+// { dg-additional-options "-O3" }
+
+unsigned long long int var_4 = 235;
+unsigned long long int var_5 = 74;
+signed char var_12 = -99;
+unsigned long long int var_349;
+unsigned char var_645;
+void test();
+
+const unsigned long long &min(const unsigned long long &a,
+ const unsigned long long &b)
+{
+  return b < a ? b : a;
+}
+
+void __attribute__((noipa)) test()
+{
+  for (short c = var_12; c; c += 5)
+;
+  for (int e = 0; e < 12; e += 1) {
+  var_349 = var_4 ? 235 : 74;
+  var_645 = min((unsigned long long)true, var_5 ? var_12 : var_4);
+  }
+}
+
+int main()
+{
+  test();
+  if (var_645 != 1)
+__builtin_abort();
+}
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index e6c5bcdad36..854cbcff390 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -119,8 +119,9 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple 
*pattern_stmt,
 = STMT_VINFO_DEF_TYPE (orig_stmt_info);
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
 {
-  gcc_assert (VECTOR_BOOLEAN_TYPE_P (vectype)
- == vect_use_mask_type_p (orig_stmt_info));
+  gcc_assert (!vectype
+ || (VECTOR_BOOLEAN_TYPE_P (vectype)
+ == vect_use_mask_type_p (orig_stmt_info)));
   STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
   pattern_stmt_info->mask_precision = orig_stmt_info->mask_precision;
 }
@@ -4283,8 +4284,6 @@ vect_recog_bool_pattern (vec_info *vinfo,
  || VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (lhs)))
return NULL;
   vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
-  if (vectype == NULL_TREE)
-   return NULL;
 
   if (check_bool_pattern (var, vinfo, bool_stmts))
{
@@ -5696,7 +5695,6 @@ vect_pattern_recog_1 (vec_info *vinfo,
 }
 
   loop_vinfo = dyn_cast  (vinfo);
-  gcc_assert (pattern_vectype);
  
   /* Found a vectorizable pattern.  */
   if (dump_enabled_p ())
-- 
2.31.1


Re: [match.pd] PR83750 - CSE erf/erfc pair

2021-10-18 Thread Richard Biener via Gcc-patches
On Mon, 18 Oct 2021, Prathamesh Kulkarni wrote:

> Hi Richard,
> As suggested in PR, I have attached WIP patch that adds two patterns
> to match.pd:
> erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
> 1 - erf(x) --> erfc(x) if !canonicalize_math_p().
> 
> This works to remove call to erfc for the following test:
> double f(double x)
> {
>   double g(double, double);
> 
>   double t1 = __builtin_erf (x);
>   double t2 = __builtin_erfc (x);
>   return g(t1, t2);
> }
> 
> with .optimized dump shows:
>   t1_2 = __builtin_erf (x_1(D));
>   t2_3 = 1.0e+0 - t1_2;
> 
> However, for the following test:
> double f(double x)
> {
>   double g(double, double);
> 
>   double t1 = __builtin_erfc (x);
>   return t1;
> }
> 
> It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
> erf(x) to erfc(x) again
> post canonicalization.
> -fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
> but then it tries to
> resimplify erfc(x), which fails post canonicalization. So we end up
> with erfc(x) transformed to
> 1 - erf(x) in .optimized dump, which I suppose isn't ideal.
> Could you suggest how to proceed ?

I applied your patch manually and it does the intended
simplifications so I wonder what I am missing?

Richard.

> Thanks,
> Prathamesh
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [SVE] [gimple-isel] PR93183 - SVE does not use neg as conditional

2021-10-18 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 18 Oct 2021 at 14:34, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c 
> > b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c
> > index 4604365fbef..cedc5b7c549 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c
> > @@ -56,7 +56,11 @@ TEST_ALL (DEF_LOOP)
> > we're relying on combine to merge a SEL and an arithmetic operation,
> > and the SEL doesn't allow the "false" value to be zero when the "true"
> > value is a register.  */
> > -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+, z[0-9]+\n} 14 } 
> > } */
> > +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+, z[0-9]+\n} 7 } } 
> > */
> > +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-9]/z, 
> > z[0-9]+\.b} 1 } } */
> > +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-9]/z, 
> > z[0-9]+\.h} 2 } } */
> > +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-9]/z, 
> > z[0-9]+\.s} 2 } } */
> > +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-9]/z, 
> > z[0-9]+\.d} 2 } } */
>
> Very minor, but: p[0-7] is more accurate than p[0-9].
Oops sorry, typo.
>
> OK with that change, thanks.
Thanks, committed as 20dcda98ed376cb61c74b2c71656f99c671ec9ce.

Thanks,
Prathamesh
>
> Richard
>
> >
> >  /* { dg-final { scan-assembler-not {\tmov\tz[^\n]*z} } } */
> >  /* { dg-final { scan-assembler-not {\tsel\t} } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c 
> > b/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c
> > new file mode 100644
> > index 000..2f92224cecb
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -mcpu=generic+sve" } */
> > +
> > +typedef unsigned char uint8_t;
> > +
> > +static inline uint8_t
> > +x264_clip_uint8(uint8_t x)
> > +{
> > +  uint8_t t = -x;
> > +  uint8_t t1 = x & ~63;
> > +  return (t1 != 0) ? t : x;
> > +}
> > +
> > +void
> > +mc_weight(uint8_t *restrict dst, uint8_t *restrict src, int n)
> > +{
> > +  for (int x = 0; x < n*16; x++)
> > +dst[x] = x264_clip_uint8(src[x]);
> > +}
> > +
> > +/* { dg-final { scan-assembler-not {\tsel} } } */


Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-18 Thread Aldy Hernandez via Gcc-patches



On 10/17/21 3:32 AM, Jeff Law wrote:


I think once we reach a consensus on the tests, this will be good to go.



diff --git a/gcc/testsuite/gcc.dg/loop-8.c b/gcc/testsuite/gcc.dg/loop-8.c
index 90ea1c45524..66318fc08dc 100644
--- a/gcc/testsuite/gcc.dg/loop-8.c
+++ b/gcc/testsuite/gcc.dg/loop-8.c
@@ -24,5 +24,9 @@ f (int *a, int *b)
  
  /* Load of 42 is moved out of the loop, introducing a new pseudo register.  */

  /* { dg-final { scan-rtl-dump-times "Decided" 1 "loop2_invariant" } } */
-/* { dg-final { scan-rtl-dump-not "without introducing a new temporary register" 
"loop2_invariant" } } */
+
+
+/* ?? The expected behavior below depends on threading the 2->3->5 path
+   in DOM2, but this is invalid since it would rotate the loop.  */
+/* { dg-final { scan-rtl-dump-not "without introducing a new temporary register" 
"loop2_invariant" { xfail *-*-* } } } */
So maybe the thing to do here since I guess we want to keep the test 
would be to manually rotate the loop in the source.  In theory that 
should restore the test to validating what we want it to validate 
(specifically the behavior of LICM).


I have rotated the loop.  This fixes the xfail I introduced, but the 
"Decided"  test fails.  I've removed that instead.  Let me know if this 
is OK.
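
(Loop rotation in the source here just means turning the while-style
loop into a do-while, i.e. roughly

  do { ...; } while (cond);

instead of

  while (cond) { ...; }

so the exit test sits in the latch and the dump check exercises LICM
as intended.)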





diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
index 0246ebf3c63..f83cefd8d89 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
@@ -22,6 +22,8 @@
  
 All the cases are picked up by VRP1 as jump threads.  */
  
-/* There used to be 6 jump threads found by thread1, but they all

-   depended on threading through distinct loops in ethread.  */
-/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp-thread1" } } */
+/* This test should be obsoleted.  We used to catch 2 total threads in
+   vrp-thread1, but after adding loop rotating restrictions, we get
+   none.  Interestingly, on x86-64 we now get 1 in DOM2, 5 in DOM3,
+   and 1 in vrp-thread2.  */
+/* { dg-final { scan-tree-dump-not "Threaded" "vrp-thread1" } } */
I think that testing nothing was threaded in vrp1 is probably best. 
Though I wouldn't lose any sleep if this just went away.


I've opted to remove these tests, since I'm testing the exact behavior 
we're disallowing in the gimple FE tests I've included in this patch.





diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
index 8f0a12c12ee..68808bd09fc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
@@ -1,10 +1,9 @@
  /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-vrp-thread1-stats -fdump-tree-dom2-stats" } */
+/* { dg-options "-O2 -fdump-statistics" } */
  
  void bla();
  
-/* In the following case, we should be able to thread edge through

-   the loop header.  */
+/* No one should thread through the loop header.  */
  
  void thread_entry_through_header (void)

  {
@@ -14,8 +13,4 @@ void thread_entry_through_header (void)
  bla ();
  }
  
-/* There's a single jump thread that should be handled by the VRP

-   jump threading pass.  */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "vrp-thread1"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 0 "vrp-thread1"} } */
-/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
+/* { dg-final { scan-tree-dump-not "Jumps threaded" "statistics"} } */

Similarly.


Same.




diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
index b0a7d423475..24de9d57d50 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
@@ -1,8 +1,12 @@
  /* { dg-do compile } */
  /* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" 
} */
  
-/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */

-/* { dg-final { scan-tree-dump-times "Registering jump" 1 "thread3" } } */
+/* ?? We should obsolete this test.  All the threads in thread1 and
+   thread3 we used to get cross the loop header but does not exit the
+   loop, so they have been deemed invalid.  */
+
+/* { dg-final { scan-tree-dump-times "Registering jump" 0 "thread1" } } */
+/* { dg-final { scan-tree-dump-times "Registering jump" 0 "thread3" } } */
  
  int sum0, sum1, sum2, sum3;

  int foo (char *s, char **ret)

Similarly.


Same.




diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
index e68a9b62535..fc3adab3fc3 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
@@ -65,5 +65,9 @@ int main (void)
return 0;
  }
  
-/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" } } */

+/* ?? The check below depends on jump threading.  There are now a
+   couple threaded paths that 

Re: [PATCH] tree-object-size: Avoid unnecessary processing of __builtin_object_size

2021-10-18 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 18, 2021 at 11:57:19AM +0200, Richard Biener via Gcc-patches wrote:
> On Mon, Oct 18, 2021 at 6:25 AM Siddhesh Poyarekar  
> wrote:
> >
> > This is a minor cleanup to bail out early if the result of
> > __builtin_object_size is not assigned to anything and avoid initializing
> > the object size arrays.
> 
> OK.

Yeah, fortunately we have expansion for __builtin_object_size so even if
the user tries hard to avoid DCE of the builtin with no LHS through
-fno-tree-dce etc., it shouldn't ICE but expand to -1, 0 or __builtin_trap ()
depending on the arguments it had.
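
I.e. even a lone call like

  void f (void *p) { __builtin_object_size (p, 0); }

kept alive until expansion (say with -fno-tree-dce) is handled there
rather than having to be folded by the tree pass.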

> > gcc/ChangeLog:
> >
> > * tree-object-size (object_sizes_execute): Consolidate LHS null
> > check and do it early.
> >
> > Signed-off-by: Siddhesh Poyarekar 

Jakub



Re: [RFC PATCH 1/8] RISC-V: Minimal support of bitmanip extension

2021-10-18 Thread Christoph Muellner
On Mon, Oct 18, 2021 at 10:48 AM Kito Cheng  wrote:
>
> Hi Christoph:
>
> > I think this needs another specification class (there is a
> > specification for the instructions and it is in public review).
> > Proposal: ISA_SPEC_CLASS_FROZEN_2021
>
> That's a good point, but ISA_SPEC_CLASS_FROZEN_2021 is hard to
> reference to which spec, so I would prefer to add a -misa-spec=2021 to
> align platform/profile spec, and then ISA_SPEC_CLASS_2021, and before
> RISC-V platform/profile spec has released, let keep
> ISA_SPEC_CLASS_NONE :p

For sure we cannot reference a spec that is not frozen yet (i.e.
platform/profile).
ISA_SPEC_CLASS_FROZEN_2021 was a proposal for all groups of ISA extensions
that have been frozen in 2021 (zb*, zk*, etc.) and will eventually be ratified.
But yes, keeping NONE until the specifications are ratified and change
the specification
class then is also possible.


[match.pd] PR83750 - CSE erf/erfc pair

2021-10-18 Thread Prathamesh Kulkarni via Gcc-patches
Hi Richard,
As suggested in PR, I have attached WIP patch that adds two patterns
to match.pd:
erfc(x) --> 1 - erf(x) if canonicalize_math_p() and,
1 - erf(x) --> erfc(x) if !canonicalize_math_p().

This works to remove call to erfc for the following test:
double f(double x)
{
  double g(double, double);

  double t1 = __builtin_erf (x);
  double t2 = __builtin_erfc (x);
  return g(t1, t2);
}

with .optimized dump shows:
  t1_2 = __builtin_erf (x_1(D));
  t2_3 = 1.0e+0 - t1_2;

However, for the following test:
double f(double x)
{
  double g(double, double);

  double t1 = __builtin_erfc (x);
  return t1;
}

It canonicalizes erfc(x) to 1 - erf(x), but does not transform 1 -
erf(x) to erfc(x) again
post canonicalization.
-fdump-tree-folding shows that 1 - erf(x) --> erfc(x) gets applied,
but then it tries to
resimplify erfc(x), which fails post canonicalization. So we end up
with erfc(x) transformed to
1 - erf(x) in .optimized dump, which I suppose isn't ideal.
Could you suggest how to proceed ?

Thanks,
Prathamesh
diff --git a/gcc/match.pd b/gcc/match.pd
index a9791ceb74a..217e46ff12c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6147,6 +6147,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(floors tree_expr_nonnegative_p@0)
(truncs @0
 
+/* Simplify,
+   erfc(x) -> 1 - erf(x) if canonicalize_math_p().
+   1 - erf(x) -> erfc(x) if !canonicalize_math_p().  */
+
+(if (flag_unsafe_math_optimizations)
+ (simplify
+  (ERFC @0)
+   (if (canonicalize_math_p ())
+(minus { build_one_cst (TREE_TYPE (@0)); } (ERF @0
+ (simplify
+  (minus real_onep (ERF @0))
+  (if (!canonicalize_math_p ())
+   (ERFC @0
+
 (match double_value_p
  @0
  (if (TYPE_MAIN_VARIANT (TREE_TYPE (@0)) == double_type_node)))


Re: [PATCH] Remove MAY_HAVE_DEBUG_MARKER_STMTS and MAY_HAVE_DEBUG_BIND_STMTS.

2021-10-18 Thread Richard Biener via Gcc-patches
On Mon, Oct 18, 2021 at 10:54 AM Martin Liška  wrote:
>
> The macros correspond 1:1 to an option flags and make it harder
> to find all usages of the flags.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

Hmm, they were introduced on purpose - since you leave around
MAY_HAVE_DEBUG_STMTS they conceptually make the code
easier to understand.
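
For reference, the three are (tree.h, quoting from memory):

  #define MAY_HAVE_DEBUG_MARKER_STMTS debug_nonbind_markers_p
  #define MAY_HAVE_DEBUG_BIND_STMTS flag_var_tracking_assignments
  #define MAY_HAVE_DEBUG_STMTS \
    (MAY_HAVE_DEBUG_MARKER_STMTS || MAY_HAVE_DEBUG_BIND_STMTS)

and it's mostly the combined one that reads worse once the other two
are expanded inline.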

So I'm not sure if we want this change.  CCed Alex so maybe he
can weigh in.

Richard.

> Thanks,
> Martin
>
> gcc/c-family/ChangeLog:
>
> * c-gimplify.c (genericize_c_loop): Use option directly.
>
> gcc/c/ChangeLog:
>
> * c-parser.c (add_debug_begin_stmt): Use option directly.
>
> gcc/ChangeLog:
>
> * cfgexpand.c (pass_expand::execute): Use option directly.
> * function.c (allocate_struct_function): Likewise.
> * gimple-low.c (lower_function_body): Likewise.
> (lower_stmt): Likewise.
> * gimple-ssa-backprop.c (backprop::prepare_change): Likewise.
> * ipa-param-manipulation.c (ipa_param_adjustments::modify_call): 
> Likewise.
> * ipa-split.c (split_function): Likewise.
> * lto-streamer-in.c (input_function): Likewise.
> * sese.c (sese_insert_phis_for_liveouts): Likewise.
> * ssa-iterators.h (num_imm_uses): Likewise.
> * tree-cfg.c (make_blocks): Likewise.
> (gimple_merge_blocks): Likewise.
> * tree-inline.c (tree_function_versioning): Likewise.
> * tree-loop-distribution.c (generate_loops_for_partition): Likewise.
> * tree-sra.c (analyze_access_subtree): Likewise.
> * tree-ssa-dce.c (remove_dead_stmt): Likewise.
> * tree-ssa-loop-ivopts.c (remove_unused_ivs): Likewise.
> * tree-ssa-phiopt.c (spaceship_replacement): Likewise.
> * tree-ssa-reassoc.c (reassoc_remove_stmt): Likewise.
> * tree-ssa-tail-merge.c (tail_merge_optimize): Likewise.
> * tree-ssa-threadedge.c (propagate_threaded_block_debug_into): 
> Likewise.
> * tree-ssa.c (gimple_replace_ssa_lhs): Likewise.
> (target_for_debug_bind): Likewise.
> (insert_debug_temp_for_var_def): Likewise.
> (insert_debug_temps_for_defs): Likewise.
> (reset_debug_uses): Likewise.
> * tree-ssanames.c (release_ssa_name_fn): Likewise.
> * tree-vect-loop-manip.c (adjust_vec_debug_stmts): Likewise.
> (adjust_debug_stmts): Likewise.
> (adjust_phi_and_debug_stmts): Likewise.
> (vect_do_peeling): Likewise.
> * tree-vect-loop.c (vect_transform_loop_stmt): Likewise.
> (vect_transform_loop): Likewise.
> * tree.h (MAY_HAVE_DEBUG_MARKER_STMTS): Remove
> (MAY_HAVE_DEBUG_BIND_STMTS): Remove.
> (MAY_HAVE_DEBUG_STMTS): Use options directly.
>
> gcc/cp/ChangeLog:
>
> * parser.c (add_debug_begin_stmt): Use option directly.
> ---
>   gcc/c-family/c-gimplify.c|  4 ++--
>   gcc/c/c-parser.c |  2 +-
>   gcc/cfgexpand.c  |  2 +-
>   gcc/cp/parser.c  |  2 +-
>   gcc/function.c   |  2 +-
>   gcc/gimple-low.c |  4 ++--
>   gcc/gimple-ssa-backprop.c|  2 +-
>   gcc/ipa-param-manipulation.c |  2 +-
>   gcc/ipa-split.c  |  6 +++---
>   gcc/lto-streamer-in.c|  4 ++--
>   gcc/sese.c   |  2 +-
>   gcc/ssa-iterators.h  |  2 +-
>   gcc/tree-cfg.c   |  4 ++--
>   gcc/tree-inline.c|  2 +-
>   gcc/tree-loop-distribution.c |  2 +-
>   gcc/tree-sra.c   |  2 +-
>   gcc/tree-ssa-dce.c   |  2 +-
>   gcc/tree-ssa-loop-ivopts.c   |  2 +-
>   gcc/tree-ssa-phiopt.c|  2 +-
>   gcc/tree-ssa-reassoc.c   |  2 +-
>   gcc/tree-ssa-tail-merge.c|  2 +-
>   gcc/tree-ssa-threadedge.c|  2 +-
>   gcc/tree-ssa.c   | 10 +-
>   gcc/tree-ssanames.c  |  2 +-
>   gcc/tree-vect-loop-manip.c   |  8 
>   gcc/tree-vect-loop.c |  4 ++--
>   gcc/tree.h   |  7 +--
>   27 files changed, 41 insertions(+), 46 deletions(-)
>
> diff --git a/gcc/c-family/c-gimplify.c b/gcc/c-family/c-gimplify.c
> index 0d38b706f4c..d9cf051a680 100644
> --- a/gcc/c-family/c-gimplify.c
> +++ b/gcc/c-family/c-gimplify.c
> @@ -295,7 +295,7 @@ genericize_c_loop (tree *stmt_p, location_t start_locus, 
> tree cond, tree body,
> finish_bc_block (_list, bc_continue, clab);
> if (incr)
>   {
> -  if (MAY_HAVE_DEBUG_MARKER_STMTS && incr_locus != UNKNOWN_LOCATION)
> +  if (debug_nonbind_markers_p && incr_locus != UNKNOWN_LOCATION)
> {
>   tree d = build0 (DEBUG_BEGIN_STMT, void_type_node);
>   SET_EXPR_LOCATION (d, expr_loc_or_loc (incr, start_locus));
> @@ -305,7 +305,7 @@ genericize_c_loop (tree *stmt_p, location_t start_locus, 
> tree cond, tree body,
>   }
> append_to_statement_list (entry, _list);
>
> -  if (MAY_HAVE_DEBUG_MARKER_STMTS && cond_locus != UNKNOWN_LOCATION)
> +  if 

Re: [PATCH][RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

2021-10-18 Thread Richard Sandiford via Gcc-patches
Michael Matz via Gcc-patches  writes:
> Hello,
>
> On Thu, 14 Oct 2021, Richard Biener wrote:
>
>> > So, at _this_ write-through of the email I think I like the above idea 
>> > best: make ao_ref be a tree (at least its storage, because it currently 
>> > is a one-member-function class), make ao_ref.volatile_p be 
>> > tree_base.volatile_flag (hence TREE_VOLATILE(ao_ref)) (this reduces 
>> > sizeof(ao_ref) by 8), increase all nr-of-operand of each tcc_reference by 
>> > 1, and make TREE_AO_REF(reftree) be "TREE_OPERAND(reftree, 
>> > TREE_CODE_LENGTH(reftree) - 1)", i.e. the last operand of such 
>> > tcc_reference tree.
>> 
>> Hmm.  I'm not sure that's really something I like - it's especially
>> quite some heavy lifting while at the same time lacking true boldness
>> as to changing the representation of memory refs ;)
>
> Well, it would at least enable such changes later in an orderly fashion.
>
>> That said - I've prototyped the TREE_ASM_WRITTEN way now because it's 
>> even simpler than the original TREE_AOREFWRAP approach, see below.
>> 
>> Note that I'm not embedding it into the tree structure, I'm merely
>> using the same allocation to store two objects, the outermost ref
>> and the ao_ref associated with it.  Quote:
>> 
>> +  size_t length = tree_code_size (TREE_CODE (lhs));
>> +  if (!TREE_ASM_WRITTEN (lhs))
>> +{
>> +  tree alt_lhs
>> +   = ggc_alloc_cleared_tree_node_stat (length + sizeof (ao_ref));
>> +  memcpy (alt_lhs, lhs, length);
>> +  TREE_ASM_WRITTEN (alt_lhs) = 1;
>> +  *ref = new ((char *)alt_lhs + length) ao_ref;
>
> You need to ensure that alt_lhs+length is properly aligned for ao_ref, but 
> yeah, for a hack that works.  If you really want to go that way you need 
> good comments about this hack.  It's really somewhat worrisome that the 
> size of the allocation depends on a bit in tree_base.
>
> (It's a really cute hack that works as a micro optimization, the question 
> is, do we really need to go there already, are all other less hacky 
> approaches not bringing similar improvements?  The cuter the hacks the 
> less often they pay off in the long run of production software :) )

FWIW, having been guilty of adding a similar hack(?) to SYMBOL_REFs
for block_symbol, I like the approach of concatenating/combining structures
based on flags.  The main tree and rtl types have too much baggage and
so I think there are some things that are better represented outside
of them.

I suppose cselib VALUE rtxes are also similar, although they're more
of a special case, since cselib data doesn't survive between passes.

Thanks,
Richard


Re: [PATCH] tree-object-size: Avoid unnecessary processing of __builtin_object_size

2021-10-18 Thread Richard Biener via Gcc-patches
On Mon, Oct 18, 2021 at 6:25 AM Siddhesh Poyarekar  wrote:
>
> This is a minor cleanup to bail out early if the result of
> __builtin_object_size is not assigned to anything and avoid initializing
> the object size arrays.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-object-size (object_sizes_execute): Consolidate LHS null
> check and do it early.
>
> Signed-off-by: Siddhesh Poyarekar 
> ---
>  gcc/tree-object-size.c | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/tree-object-size.c b/gcc/tree-object-size.c
> index 6a4dc724f34..46a976dfe10 100644
> --- a/gcc/tree-object-size.c
> +++ b/gcc/tree-object-size.c
> @@ -1298,6 +1298,10 @@ object_sizes_execute (function *fun, bool 
> insert_min_max_p)
>   if (!gimple_call_builtin_p (call, BUILT_IN_OBJECT_SIZE))
> continue;
>
> + tree lhs = gimple_call_lhs (call);
> + if (!lhs)
> +   continue;
> +
>   init_object_sizes ();
>
>   /* If insert_min_max_p, only attempt to fold
> @@ -1312,11 +1316,9 @@ object_sizes_execute (function *fun, bool 
> insert_min_max_p)
> {
>   unsigned HOST_WIDE_INT object_size_type = tree_to_uhwi 
> (ost);
>   tree ptr = gimple_call_arg (call, 0);
> - tree lhs = gimple_call_lhs (call);
>   if ((object_size_type == 1 || object_size_type == 3)
>   && (TREE_CODE (ptr) == ADDR_EXPR
> - || TREE_CODE (ptr) == SSA_NAME)
> - && lhs)
> + || TREE_CODE (ptr) == SSA_NAME))
> {
>   tree type = TREE_TYPE (lhs);
>   unsigned HOST_WIDE_INT bytes;
> @@ -1339,10 +1341,6 @@ object_sizes_execute (function *fun, bool 
> insert_min_max_p)
>   continue;
> }
>
> - tree lhs = gimple_call_lhs (call);
> - if (!lhs)
> -   continue;
> -
>   result = gimple_fold_stmt_to_constant (call, do_valueize);
>   if (!result)
> {
> --
> 2.31.1
>


Re: [r12-4457 Regression] FAIL: gfortran.dg/deferred_type_param_6.f90 -Os execution test on Linux/x86_64

2021-10-18 Thread Richard Biener via Gcc-patches
On Sat, Oct 16, 2021 at 8:24 PM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> >
> > FAIL: gfortran.dg/deferred_type_param_6.f90   -O1  execution test
> > FAIL: gfortran.dg/deferred_type_param_6.f90   -Os  execution test
> Sorry for the breakage.  This time it seems like a bug in the Fortran FE
> which was previously latent:
>
> __attribute__((fn spec (". . R ")))
> void subfunc (character(kind=1)[1:..__result] * & __result, integer(kind=8) * 
> .__result)
> {
>   # PT = nonlocal
>   character(kind=1)[1:..__result] * & __result_3(D) = __result;
>   # PT = nonlocal null
>   integer(kind=8) * .__result_5(D) = .__result;
>   integer(kind=4) _1;
>
>[local count: 1073741824]:
>   *__result_3(D) = 
>   # USE = nonlocal escaped { D.4230 } (nonlocal, escaped)
>   _1 = _gfortran_compare_string (5, , 5, &"FIVEC"[1]{lb: 1 sz: 1});
>   if (_1 != 0)
> goto ; [0.04%]
>   else
> goto ; [99.96%]
>
>[local count: 429496]:
>   # USE = nonlocal escaped null
>   # CLB = nonlocal escaped null
>   _gfortran_stop_numeric (10, 0);
>
>[local count: 1073312329]:
>   *.__result_5(D) = 5;
>   return;
> }
>
> The fnspec ". . R " specifies that .__result is readonly however we
> have:
>   *.__result_5(D) = 5;
>
> I am not sure I understand fortran FE well enough to figure out why
> it is set so.  The function is declared as:
>
>   function subfunc() result(res)
> character(len=:), pointer :: res
> res => fifec
> if (len(res) /= 5) STOP 9
> if (res /= "FIVEC") STOP 10
>   end function subfunc
>
> and we indeed optimize load of the result:
> -  # USE = nonlocal escaped { D.4252 D.4254 } (nonlocal, escaped)
> -  # CLB = nonlocal escaped { D.4254 } (escaped)
> +  # USE = nonlocal escaped
> +  # CLB = { D.4254 }
>subfunc (, );
> -  .s2_34 = slen.4;
> -  # PT = nonlocal escaped null { D.4254 } (escaped)
> -  s2_35 = pstr.5;
>pstr.5 ={v} {CLOBBER};
>
> and I think that is what breaks the testcase (I also verified that
> ignoring the fnspec 'R' fixes it).

The FE code adding the fnspec probably fails to consider the
separately passed length of the string result?

Can you open a bugreport please?

Richard.

> Honza
> >
> > with GCC configured with
> >
> > ../../gcc/configure 
> > --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4457/usr
> >  --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> > --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet 
> > --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
> >
> > To reproduce:
> >
> > $ cd {build_dir}/gcc && make check 
> > RUNTESTFLAGS="dg.exp=gfortran.dg/deferred_type_param_6.f90 
> > --target_board='unix{-m32}'"
> > $ cd {build_dir}/gcc && make check 
> > RUNTESTFLAGS="dg.exp=gfortran.dg/deferred_type_param_6.f90 
> > --target_board='unix{-m32\ -march=cascadelake}'"
> > $ cd {build_dir}/gcc && make check 
> > RUNTESTFLAGS="dg.exp=gfortran.dg/deferred_type_param_6.f90 
> > --target_board='unix{-m64}'"
> > $ cd {build_dir}/gcc && make check 
> > RUNTESTFLAGS="dg.exp=gfortran.dg/deferred_type_param_6.f90 
> > --target_board='unix{-m64\ -march=cascadelake}'"
> >
> > (Please do not reply to this email, for question about this report, contact 
> > me at skpgkp2 at gmail dot com)


Re: [PATCH] Ranger : Do not process abnormal ssa-names.

2021-10-18 Thread Richard Biener via Gcc-patches
On Fri, Oct 15, 2021 at 3:50 PM Andrew MacLeod  wrote:
>
> I've been looking at the pathological time issue ranger has with the
> testcase from, uh..  PR 97623 I think.  I've lost the details, but
> kept the file since it was showing unpleasant behaviour.
>
> Most of the time is spent in callbacks from substitute_and_fold to
> value_on_edge()  dealing with PHI results and arguments.  Turns out, its
> virtually all wasted time dealing with SSA_NAMES with the
> OCCURS_IN_ABNORMAL_PHI flag set..
>
> This patch tells ranger not to consider any SSA_NAMEs which occur in
> abnormal PHIs.  This reduces the memory footprint of all the caches, and
> also has a ripple effect with the new threader code which uses the GORI
> exports and imports tables, making it faster as well, since no ssa-name with
> the abnormal flag set will be entered into the tables.
>
> That alone was not quite enough, as the sheer volume of callbacks
> still took time, so I added checks in the value_of_* class of routines
> used by substitute_and_fold to indicate there is no constant value
> available for any SSA_NAME with that flag set.
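>
> E.g. the early-out added to those routines is (sketch) just something
> along the lines of
>
>   if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name))
>     return NULL_TREE;
>
> so substitute_and_fold simply gets no constant for such names and
> leaves them alone.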
>
> On my x86_64 box, before this change, that test case looked like:
>
> tree VRP   :   7.76 (  4%)   0.23 (  5%)   8.02 (  4%)   537k (  0%)
> tree VRP threader  :   7.20 (  4%)   0.08 (  2%)   7.28 (  4%)   392k (  0%)
> tree Early VRP :  39.22 ( 22%)   0.07 (  2%)  39.44 ( 22%)  1142k (  0%)
>
> And with this patch, the results are:
>
>   tree VRP   :   7.57 (  6%)   0.26 (  5%)   7.85 (  6%)   537k (  0%)
>   tree VRP threader  :   0.62 (  0%)   0.02 (  0%)   0.65 (  0%)   392k (  0%)
>   tree Early VRP :   4.00 (  3%)   0.01 (  0%)   4.03 (  3%)  1142k (  0%)
>
> Which is a significant improvement, both for EVRP and the threader..
>
> The patch adjusts the ranger folder, as well as the hybrid folder.
>
> bootstrapped on x86_64-pc-linux-gnu with no regressions and no missed
> cases that I have been able to find.
>
> I don't want to push it quite yet as I wanted feedback to make sure we
> don't actually do anything I'm not aware of with SSA_NAMES which have
> the ABNORMAL_PHI flag set.  Most of the code i can find in VRP and
> vr-values appears to punt, so I presume not even considering those names
> is fine?
>
> This also seems like something that might be worth back-porting,
> especially the hybrid pass parts...

Returning NULL in gimple_range_ssa_p is probably not a good idea.  The
name does carry a range it just has to be considered VARYING.

The issue with abnormal edges is that they do not have a jump
associated with them and thus we cannot insert code on the edge
because we cannot split it.  That has implications for coalescing
since we cannot even insert copies there so the PHI argument
and the PHI result have to be the same register for the arguments
on abnormal edges.

Otherwise they do carry a value and a range but forcing that to be
VARYING makes sense to avoid propagating constants to where
it is not allowed (though the substitution phase should be the one
checking).

Richard.
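
(For reference, a minimal C sketch -- not taken from the PR testcase, names
made up -- of where such abnormal SSA names come from; the abnormal edges
created for setjmp/longjmp cannot be split, so the PHI for x at the setjmp
receiver must coalesce its result with its arguments:

  #include <setjmp.h>

  jmp_buf env;
  extern void may_longjmp (void);

  int
  foo (int a)
  {
    int x = a;
    if (setjmp (env))   /* abnormal edges target this receiver block */
      x = x + 1;        /* x is merged in a PHI over abnormal edges, so its
                           SSA names carry the abnormal flag and no copies
                           may be inserted on those edges */
    may_longjmp ();     /* call that may return here via longjmp */
    return x;
  }
)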

> Andrew
>
>


[PATCH][GCC] arm: enable cortex-a710 CPU

2021-10-18 Thread Przemyslaw Wirkus via Gcc-patches
Hi, 

This patch is adding support for Cortex-A710 CPU [0].

  [0] https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a710

OK for master?

gcc/ChangeLog:

* config/arm/arm-cpus.in (cortex-a710): New CPU.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* doc/invoke.texi: Update docs.
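
(Purely as an illustration of the new entry -- the file name is made up --
the CPU can then be selected like any other core, with the optional crypto
extension enabled via the usual suffix syntax:

  arm-none-eabi-gcc -mcpu=cortex-a710 -O2 -c foo.c
  arm-none-eabi-gcc -mcpu=cortex-a710+crypto -O2 -c foo.c
)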

-- 
kind regards, 
Przemyslaw Wirkus

Staff Compiler Engineer | Arm 
. . . . . . . . . . . . . . . . . . . . . . . . . .

Arm.com

diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 3756ba56c6ea36fa9d017347bd73b27ab7752325..a6a8e4319a69be0913281701f3a85610d637922e 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1513,6 +1513,17 @@ begin cpu cortex-a78c
  part d4b
 end cpu cortex-a78c
 
+begin cpu cortex-a710
+ cname cortexa710
+ tune for cortex-a57
+ tune flags LDSCHED
+ architecture armv9-a+fp16+bf16+i8mm
+ option crypto add FP_ARMv8 CRYPTO
+ costs cortex_a57
+ vendor 41
+ part d47
+end cpu cortex-a710
+
 begin cpu cortex-x1
  cname cortexx1
  tune for cortex-a57
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index c00e252ec5aa0f1a9004718dbea3cf969a4e5be6..6e457fb250223eac22c033424dae406cb74b7df8 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -249,6 +249,9 @@ Enum(processor_type) String(cortex-a78ae) Value( TARGET_CPU_cortexa78ae)
 EnumValue
 Enum(processor_type) String(cortex-a78c) Value( TARGET_CPU_cortexa78c)
 
+EnumValue
+Enum(processor_type) String(cortex-a710) Value( TARGET_CPU_cortexa710)
+
 EnumValue
 Enum(processor_type) String(cortex-x1) Value( TARGET_CPU_cortexx1)
 
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index 6482833fc35b5758f66f2c7082e89c8ded250242..54e701f439b1a6f33267fd54248623755acef3b4 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -46,8 +46,9 @@ (define_attr "tune"
 	cortexa73cortexa53,cortexa55,cortexa75,
 	cortexa76,cortexa76ae,cortexa77,
 	cortexa78,cortexa78ae,cortexa78c,
-	cortexx1,neoversen1,cortexa75cortexa55,
-	cortexa76cortexa55,neoversev1,neoversen2,
-	cortexm23,cortexm33,cortexm35p,
-	cortexm55,cortexr52,cortexr52plus"
+	cortexa710,cortexx1,neoversen1,
+	cortexa75cortexa55,cortexa76cortexa55,neoversev1,
+	neoversen2,cortexm23,cortexm33,
+	cortexm35p,cortexm55,cortexr52,
+	cortexr52plus"
 	(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ce738e830a948016d89e456539fef5f5b18688fb..c5966de3231f9b50df68c0ad434789b0abe7f616 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20477,7 +20477,7 @@ Permissible names are: @samp{arm7tdmi}, @samp{arm7tdmi-s}, @samp{arm710t},
 @samp{cortex-a32}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
 @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75},
 @samp{cortex-a76}, @samp{cortex-a76ae}, @samp{cortex-a77},
-@samp{cortex-a78}, @samp{cortex-a78ae}, @samp{cortex-a78c},
+@samp{cortex-a78}, @samp{cortex-a78ae}, @samp{cortex-a78c}, @samp{cortex-a710},
 @samp{ares}, @samp{cortex-r4}, @samp{cortex-r4f}, @samp{cortex-r5},
 @samp{cortex-r7}, @samp{cortex-r8}, @samp{cortex-r52}, @samp{cortex-r52plus},
 @samp{cortex-m0}, @samp{cortex-m0plus}, @samp{cortex-m1}, @samp{cortex-m3},


[PATCH][GCC] arm: add armv9-a architecture to -march

2021-10-18 Thread Przemyslaw Wirkus via Gcc-patches
Hi,

This patch is adding `armv9-a` to -march in Arm GCC.

In this patch:
+ Add `armv9-a` to -march.
+ Update multilib with armv9-a and armv9-a+simd.

After this patch three additional multilib directories are available:

$ arm-none-eabi-gcc --print-multi-lib
.;
[...vanilla multi-lib dirs...]
thumb/v9-a/nofp;@mthumb@march=armv9-a@mfloat-abi=soft
thumb/v9-a+simd/softfp;@mthumb@march=armv9-a+simd@mfloat-abi=softfp
thumb/v9-a+simd/hard;@mthumb@march=armv9-a+simd@mfloat-abi=hard

New multi-lib directories under
$GCC_INSTALL_DIE/lib/gcc/arm-none-eabi/12.0.0/thumb are created:

thumb/
+--- v9-a
|    |--- nofp
|
+--- v9-a+simd
     |--- hard
     |--- softfp

Regtested on arm-none-eabi cross and no issues.

OK for master?

gcc/ChangeLog:

* config/arm/arm-cpus.in (armv9): New define.
(ARMv9a): New group.
(armv9-a): New arch definition.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.h (BASE_ARCH_9A): New arch enum value.
* config/arm/t-aprofile: Added armv9-a and armv9+simd.
* config/arm/t-arm-elf: Added arm9-a, v9_fps and all_v9_archs
to MULTILIB_MATCHES.
* config/arm/t-multilib: Added v9_a_nosimd_variants and
v9_a_simd_variants to MULTILIB_MATCHES.
* doc/invoke.texi: Update docs.

gcc/testsuite/ChangeLog:

* gcc.target/arm/multilib.exp: Update test with armv9-a entries.
* lib/target-supports.exp (v9a): Add new armflag.
(__ARM_ARCH_9A__): Add new armdef.
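
(As an illustrative sketch only, relying on the __ARM_ARCH_9A__ define
referenced in the target-supports change above being predefined when
compiling with -march=armv9-a:

  #ifdef __ARM_ARCH_9A__
  int built_for_armv9a (void) { return 1; }   /* armv9-a selected */
  #else
  int built_for_armv9a (void) { return 0; }   /* some other architecture */
  #endif
)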

-- 
kind regards, 
Przemyslaw Wirkus
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index d0d0d0f1c7e4176fc4aa30d82394fe938b083a59..3756ba56c6ea36fa9d017347bd73b27ab7752325 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -132,6 +132,9 @@ define feature cmse
 # Architecture rel 8.1-M.
 define feature armv8_1m_main
 
+# Architecture rel 9.0.
+define feature armv9
+
 # Floating point and Neon extensions.
 # VFPv1 is not supported in GCC.
 
@@ -293,6 +296,7 @@ define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r  ARMv8a
 define fgroup ARMv8_1m_main ARMv8m_main armv8_1m_main
+define fgroup ARMv9a  ARMv8_5a armv9
 
 # Useful combinations.
 define fgroup VFPv2	vfpv2
@@ -751,6 +755,21 @@ begin arch armv8.1-m.main
  option cdecp7 add cdecp7
 end arch armv8.1-m.main
 
+begin arch armv9-a
+ tune for cortex-a53
+ tune flags CO_PROC
+ base 9A
+ profile A
+ isa ARMv9a
+ option simd add FP_ARMv8 DOTPROD
+ option fp16 add fp16 fp16fml FP_ARMv8 DOTPROD
+ option crypto add FP_ARMv8 CRYPTO DOTPROD
+ option nocrypto remove ALL_CRYPTO
+ option nofp remove ALL_FP
+ option i8mm add i8mm FP_ARMv8 DOTPROD
+ option bf16 add bf16 FP_ARMv8 DOTPROD
+end arch armv9-a
+
 begin arch iwmmxt
  tune for iwmmxt
  tune flags LDSCHED STRONG XSCALE
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 8bb0c9f6a7bd9230e7b2de1e2ef4ed5177f89495..c00e252ec5aa0f1a9004718dbea3cf969a4e5be6 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -383,10 +383,13 @@ EnumValue
 Enum(arm_arch) String(armv8.1-m.main) Value(30)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(31)
+Enum(arm_arch) String(armv9-a) Value(31)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(32)
+Enum(arm_arch) String(iwmmxt) Value(32)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(33)
 
 Enum
 Name(arm_fpu) Type(enum fpu_type)
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 015299c15346f1bea59d70fdcb1d19545473b23b..3a8d223ee622ffe5b25e14ed07bfaa07835dc683 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -452,7 +452,8 @@ enum base_architecture
   BASE_ARCH_8A = 8,
   BASE_ARCH_8M_BASE = 8,
   BASE_ARCH_8M_MAIN = 8,
-  BASE_ARCH_8R = 8
+  BASE_ARCH_8R = 8,
+  BASE_ARCH_9A = 9
 };
 
 /* The major revision number of the ARM Architecture implemented by the target.  */
diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index 8574ac3e24d0d67c12bae5f88d1410ec1e0f983d..68e2251c7266712177723a7d634016f4fddaacac 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -26,8 +26,8 @@
 
 # Arch and FPU variants to build libraries with
 
-MULTI_ARCH_OPTS_A   = march=armv7-a/march=armv7-a+fp/march=armv7-a+simd/march=armv7ve+simd/march=armv8-a/march=armv8-a+simd
-MULTI_ARCH_DIRS_A   = v7-a v7-a+fp v7-a+simd v7ve+simd v8-a v8-a+simd
+MULTI_ARCH_OPTS_A   = march=armv7-a/march=armv7-a+fp/march=armv7-a+simd/march=armv7ve+simd/march=armv8-a/march=armv8-a+simd/march=armv9-a/march=armv9-a+simd
+MULTI_ARCH_DIRS_A   = v7-a v7-a+fp v7-a+simd v7ve+simd v8-a v8-a+simd v9-a v9-a+simd
 
 # ARMv7-A - build nofp, fp-d16 and SIMD variants
 
@@ -46,6 +46,11 @@ MULTILIB_REQUIRED	+= mthumb/march=armv8-a/mfloat-abi=soft
 MULTILIB_REQUIRED	+= mthumb/march=armv8-a+simd/mfloat-abi=hard
 MULTILIB_REQUIRED	+= mthumb/march=armv8-a+simd/mfloat-abi=softfp
 
+# Armv9-A - build nofp and 

Re: [PATCH] hardened conditionals

2021-10-18 Thread Richard Biener via Gcc-patches
On Fri, Oct 15, 2021 at 8:35 PM Alexandre Oliva  wrote:
>
> On Oct 14, 2021, Richard Biener  wrote:
>
> > Yeah, I think that eventually marking the operation we want to preserve
> > (with volatile?) would be the best way.  On GIMPLE that's difficult,
> > it's easier on GENERIC (we can set TREE_THIS_VOLATILE on the
> > expression at least), and possibly also on RTL (where such flag
> > might already be a thing?).
>
> Making the expr volatile would likely get gimple to deal with it like
> memory, which would completely defeat the point of trying to avoid a
> copy.
>
> RTL has support for volatile MEMs and (user-)REGs indeed, but in order
> to avoid the copy, we don't want the pseudo to be volatile, we want
> specific users thereof to be.  An unspec_volatile would accomplish that,
> but it would require RTL patterns to match it wherever a pseudo might
> appear.  Considering all forms of insns involving conditionals on all
> relevant targets, that's far too much effort for no measurable benefit.
>
>
> > So when going that way doing the hardening on RTL seems easier (if you
> > want to catch all compares, if you want to only catch compare + jump
> > that has your mentioned issue of all the different representations)
>
> It's not.  RTL has various ways to represent store-flags too.  Even now
> that we don't have to worry about implicit CC, a single boolean test may
> expand to a compare-and-set-[CC-]reg, and then a
> compare-and-store-CC-reg, or a single compare-and-set-[non-CC-]reg, and
> IIRC in some cases even more than one (pair of) conditionals.
>
> Compare-and-branches also come in such a multitude of settings.
>
> It all depends on the target, and I don't really see any benefit
> whatsoever of implementing such trivial gimple passes with all the
> potential complexity of RTL on all the architectures relevant for GCC,
> or even for this project alone.
>
> > Note that I did not look on the actual patch, I'm trying to see whether 
> > there's
> > some generic usefulness we can extract from the patchs requirement
> > which to me looks quite specific and given it's "hackish" implementation
> > way might not be the most important one to carry on trunk (I understand
> > that AdaCore will carry it in their compiler).
>
> It's also simple, no-maintenance, and entirely self-contained.

Yes, it is (though from a quick look, most of the functions in the
pass lack function-level comments).

>  A good
> example of something that could be implemented as a plugin, except for
> command-line options.
>
> Maybe we could have a plugin collection in our source tree, to hold
> stuff like this and to offer examples of plugins, and means to build
> select plugins as such, or as preloaded modules into the compiler for
> easier deployment.

I think this has been suggested before, yes.  But if those "plugins"
are as self-contained as yours here there's also no reason to not
simply compile them in as regular passes (unless they are slightly
broken and thus a maintainance burden).

Richard.

> --
> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 


[PATCH][PUSHED] gcc-changelog: update error message location

2021-10-18 Thread Martin Liška

Hello.

The patch improves location information for 'bad parentheses wrapping'.

Pushed to master.
Martin

contrib/ChangeLog:

* gcc-changelog/git_commit.py: Update location of
'bad parentheses wrapping'.
* gcc-changelog/test_email.py: Test it.
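
(To illustrate the improved location with a made-up entry modelled on the
test below:

	* config/i386/i386.md (*fix_trunc_i387_1,
	* config/i386/i386.md: Likewise.

the '(' opened on the first line is never closed, so the error now points
at that line instead of at the first line of the whole ChangeLog entry.)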
---
 contrib/gcc-changelog/git_commit.py | 14 +++---
 contrib/gcc-changelog/test_email.py |  1 +
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index f26dc3b4135..cf29f761964 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -217,7 +217,7 @@ class ChangeLogEntry:
 self.lines = []
 self.files = []
 self.file_patterns = []
-self.opened_parentheses = 0
+self.parentheses_stack = []
 
 def parse_file_names(self):

 # Whether the content currently processed is between a star prefix the
@@ -551,7 +551,7 @@ class GitCommit:
 m = star_prefix_regex.match(line)
 if m:
 if (len(m.group('spaces')) != 1 and
-last_entry.opened_parentheses == 0):
+not last_entry.parentheses_stack):
 msg = 'one space should follow asterisk'
 self.errors.append(Error(msg, line))
 else:
@@ -576,13 +576,13 @@ class GitCommit:
 def process_parentheses(self, last_entry, line):
 for c in line:
 if c == '(':
-last_entry.opened_parentheses += 1
+last_entry.parentheses_stack.append(line)
 elif c == ')':
-if last_entry.opened_parentheses == 0:
+if not last_entry.parentheses_stack:
 msg = 'bad wrapping of parenthesis'
 self.errors.append(Error(msg, line))
 else:
-last_entry.opened_parentheses -= 1
+del last_entry.parentheses_stack[-1]
 
 def parse_file_names(self):

 for entry in self.changelog_entries:
@@ -608,9 +608,9 @@ class GitCommit:
 
 def check_for_broken_parentheses(self):

 for entry in self.changelog_entries:
-if entry.opened_parentheses != 0:
+if entry.parentheses_stack:
 msg = 'bad parentheses wrapping'
-self.errors.append(Error(msg, entry.lines[0]))
+self.errors.append(Error(msg, entry.parentheses_stack[-1]))
 
 def get_file_changelog_location(self, changelog_file):

 for file in self.info.modified_files:
diff --git a/contrib/gcc-changelog/test_email.py b/contrib/gcc-changelog/test_email.py
index dae7c27c707..a4796dbbe94 100755
--- a/contrib/gcc-changelog/test_email.py
+++ b/contrib/gcc-changelog/test_email.py
@@ -415,6 +415,7 @@ class TestGccChangelog(unittest.TestCase):
 def test_multiline_bad_parentheses(self):
 email = self.from_patch_glob('0002-Wrong-macro-changelog.patch')
 assert email.errors[0].message == 'bad parentheses wrapping'
+assert email.errors[0].line == '   * config/i386/i386.md (*fix_trunc_i387_1,'
 
 def test_changelog_removal(self):

 email = self.from_patch_glob('0001-ChangeLog-removal.patch')
--
2.33.0



Re: [SVE] [gimple-isel] PR93183 - SVE does not use neg as conditional

2021-10-18 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c
> index 4604365fbef..cedc5b7c549 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_4.c
> @@ -56,7 +56,11 @@ TEST_ALL (DEF_LOOP)
> we're relying on combine to merge a SEL and an arithmetic operation,
> and the SEL doesn't allow the "false" value to be zero when the "true"
> value is a register.  */
> -/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+, z[0-9]+\n} 14 } } 
> */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+, z[0-9]+\n} 7 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.b, p[0-9]/z, 
> z[0-9]+\.b} 1 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.h, p[0-9]/z, 
> z[0-9]+\.h} 2 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.s, p[0-9]/z, 
> z[0-9]+\.s} 2 } } */
> +/* { dg-final { scan-assembler-times {\tmovprfx\tz[0-9]+\.d, p[0-9]/z, 
> z[0-9]+\.d} 2 } } */

Very minor, but: p[0-7] is more accurate than p[0-9].

OK with that change, thanks.

Richard

>  
>  /* { dg-final { scan-assembler-not {\tmov\tz[^\n]*z} } } */
>  /* { dg-final { scan-assembler-not {\tsel\t} } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c
> new file mode 100644
> index 000..2f92224cecb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr93183.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mcpu=generic+sve" } */
> +
> +typedef unsigned char uint8_t;
> +
> +static inline uint8_t
> +x264_clip_uint8(uint8_t x)
> +{
> +  uint8_t t = -x;
> +  uint8_t t1 = x & ~63; 
> +  return (t1 != 0) ? t : x; 
> +}
> +
> +void
> +mc_weight(uint8_t *restrict dst, uint8_t *restrict src, int n)
> +{
> +  for (int x = 0; x < n*16; x++)
> +dst[x] = x264_clip_uint8(src[x]);
> +}
> +
> +/* { dg-final { scan-assembler-not {\tsel} } } */


Re: [RFC] Port git gcc-descr to Python

2021-10-18 Thread Martin Liška

On 10/12/21 10:59, Martin Liška wrote:

Hello.

There's a complete patch that implements both git gcc-descr and gcc-undesrc
and sets corresponding git aliases to use them.

Ready to be installed?
Thanks,
Martin


All right, so Jakub told me on IRC that we don't support porting to Python.
However, he promised to support the changes I made in the original shell script.

Cheers,
Martin


Re: [Patch][GCN] [GCC 11] Backport GCN with LLVM-MC 13 linker fixes to GCC 11

2021-10-18 Thread Andrew Stubbs

This is fine by me.

As I said in my email on the 15th, LLVM 13 is still not considered safe 
to use. The ICE you encountered is a real problem that will affect real 
users.


I expect to work on a solution for that soon.

Andrew

On 16/10/2021 21:41, Tobias Burnus wrote:

This patch is mostly motivated by distribution needs in general
and Debian/Ubuntu needs in particular – but I think it makes
sense for all GCC 11 users.

GCC's AMD GCN support uses LLVM's assembler and linker mc/lld
and thus requires compatibility with LLVM. On mainline, support
for LLVM 13 was added – and I like to see a backport to GCC 11.

In particular, I would like to "git cherry-pick -x' the following patches:

cfa1f8226f2 gcc/configure.ac: fix register issue for global_load 
assembler functions

aad32a00b7d amdgcn: Add -mxnack and -msram-ecc [PR 100208]
5c127c4cac3 amdgcn: Mark s_mulk_i32 as clobbering SCC
  (Outside of this series as it only picks a bug fix)
6ca03ca35a5 amdgcn: Support LLVM 13 assembler syntax
  (-> see remark below)
205dafb6ede amdgcn: Implement -msram-ecc=any
81c362c7c2b amdgcn: Fix assembler version incompatibility
f3d64372d77 amdgcn: fix up offload debug linking with LLVM 13

OK for GCC 11?

[I have build GCC x86-64 with amdgcn offloading enabled both
with LLVM 9 and with LLVM 13; libgomp passes fine with LLVM 9
but with the LLVM 13 build, I see an ICE in lld for some testcases,
which I have not debugged – but GCC itself builds and several
libgomp testcases do pass.]

Tobias

PS: I attached
* the full "git log --stat" for all those patches for references
* as the 4th one, "Support LLVM 13 assembler syntax", does not
   cleanly apply, I attached the full patch.

The reason that the latter does not apply is that mainline changed:
"configure: remove version argument from gcc_GAS_CHECK_FEATURE"
in https://gcc.gnu.org/g:e0b6d0b39c69372e4a66f44d218e0244bb549d83
which was fixed for GCN a bit later in commit
"configure: Adjust several assembler checks to remove an unused parm."
https://gcc.gnu.org/g:e5d9873fcb6f90d03b7534af53de39ec65d0cdc5

The only change is "," to ",," in configure.ac; 'configure' itself
was already fine.




[PATCH] Remove MAY_HAVE_DEBUG_MARKER_STMTS and MAY_HAVE_DEBUG_BIND_STMTS.

2021-10-18 Thread Martin Liška

The macros correspond 1:1 to option flags and make it harder
to find all usages of the flags.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/c-family/ChangeLog:

* c-gimplify.c (genericize_c_loop): Use option directly.

gcc/c/ChangeLog:

* c-parser.c (add_debug_begin_stmt): Use option directly.

gcc/ChangeLog:

* cfgexpand.c (pass_expand::execute): Use option directly.
* function.c (allocate_struct_function): Likewise.
* gimple-low.c (lower_function_body): Likewise.
(lower_stmt): Likewise.
* gimple-ssa-backprop.c (backprop::prepare_change): Likewise.
* ipa-param-manipulation.c (ipa_param_adjustments::modify_call): 
Likewise.
* ipa-split.c (split_function): Likewise.
* lto-streamer-in.c (input_function): Likewise.
* sese.c (sese_insert_phis_for_liveouts): Likewise.
* ssa-iterators.h (num_imm_uses): Likewise.
* tree-cfg.c (make_blocks): Likewise.
(gimple_merge_blocks): Likewise.
* tree-inline.c (tree_function_versioning): Likewise.
* tree-loop-distribution.c (generate_loops_for_partition): Likewise.
* tree-sra.c (analyze_access_subtree): Likewise.
* tree-ssa-dce.c (remove_dead_stmt): Likewise.
* tree-ssa-loop-ivopts.c (remove_unused_ivs): Likewise.
* tree-ssa-phiopt.c (spaceship_replacement): Likewise.
* tree-ssa-reassoc.c (reassoc_remove_stmt): Likewise.
* tree-ssa-tail-merge.c (tail_merge_optimize): Likewise.
* tree-ssa-threadedge.c (propagate_threaded_block_debug_into): Likewise.
* tree-ssa.c (gimple_replace_ssa_lhs): Likewise.
(target_for_debug_bind): Likewise.
(insert_debug_temp_for_var_def): Likewise.
(insert_debug_temps_for_defs): Likewise.
(reset_debug_uses): Likewise.
* tree-ssanames.c (release_ssa_name_fn): Likewise.
* tree-vect-loop-manip.c (adjust_vec_debug_stmts): Likewise.
(adjust_debug_stmts): Likewise.
(adjust_phi_and_debug_stmts): Likewise.
(vect_do_peeling): Likewise.
* tree-vect-loop.c (vect_transform_loop_stmt): Likewise.
(vect_transform_loop): Likewise.
* tree.h (MAY_HAVE_DEBUG_MARKER_STMTS): Remove
(MAY_HAVE_DEBUG_BIND_STMTS): Remove.
(MAY_HAVE_DEBUG_STMTS): Use options directly.

gcc/cp/ChangeLog:

* parser.c (add_debug_begin_stmt): Use option directly.
---
 gcc/c-family/c-gimplify.c|  4 ++--
 gcc/c/c-parser.c |  2 +-
 gcc/cfgexpand.c  |  2 +-
 gcc/cp/parser.c  |  2 +-
 gcc/function.c   |  2 +-
 gcc/gimple-low.c |  4 ++--
 gcc/gimple-ssa-backprop.c|  2 +-
 gcc/ipa-param-manipulation.c |  2 +-
 gcc/ipa-split.c  |  6 +++---
 gcc/lto-streamer-in.c|  4 ++--
 gcc/sese.c   |  2 +-
 gcc/ssa-iterators.h  |  2 +-
 gcc/tree-cfg.c   |  4 ++--
 gcc/tree-inline.c|  2 +-
 gcc/tree-loop-distribution.c |  2 +-
 gcc/tree-sra.c   |  2 +-
 gcc/tree-ssa-dce.c   |  2 +-
 gcc/tree-ssa-loop-ivopts.c   |  2 +-
 gcc/tree-ssa-phiopt.c|  2 +-
 gcc/tree-ssa-reassoc.c   |  2 +-
 gcc/tree-ssa-tail-merge.c|  2 +-
 gcc/tree-ssa-threadedge.c|  2 +-
 gcc/tree-ssa.c   | 10 +-
 gcc/tree-ssanames.c  |  2 +-
 gcc/tree-vect-loop-manip.c   |  8 
 gcc/tree-vect-loop.c |  4 ++--
 gcc/tree.h   |  7 +--
 27 files changed, 41 insertions(+), 46 deletions(-)

diff --git a/gcc/c-family/c-gimplify.c b/gcc/c-family/c-gimplify.c
index 0d38b706f4c..d9cf051a680 100644
--- a/gcc/c-family/c-gimplify.c
+++ b/gcc/c-family/c-gimplify.c
@@ -295,7 +295,7 @@ genericize_c_loop (tree *stmt_p, location_t start_locus, 
tree cond, tree body,
   finish_bc_block (_list, bc_continue, clab);
   if (incr)
 {
-  if (MAY_HAVE_DEBUG_MARKER_STMTS && incr_locus != UNKNOWN_LOCATION)
+  if (debug_nonbind_markers_p && incr_locus != UNKNOWN_LOCATION)
{
  tree d = build0 (DEBUG_BEGIN_STMT, void_type_node);
  SET_EXPR_LOCATION (d, expr_loc_or_loc (incr, start_locus));
@@ -305,7 +305,7 @@ genericize_c_loop (tree *stmt_p, location_t start_locus, 
tree cond, tree body,
 }
   append_to_statement_list (entry, _list);
 
-  if (MAY_HAVE_DEBUG_MARKER_STMTS && cond_locus != UNKNOWN_LOCATION)

+  if (debug_nonbind_markers_p && cond_locus != UNKNOWN_LOCATION)
 {
   tree d = build0 (DEBUG_BEGIN_STMT, void_type_node);
   SET_EXPR_LOCATION (d, cond_locus);
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 80dd61d599e..1ba2b2f8342 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1791,7 +1791,7 @@ static void
 add_debug_begin_stmt (location_t loc)
 {
   /* Don't add DEBUG_BEGIN_STMTs outside of functions, see PR84721.  */
-  if (!MAY_HAVE_DEBUG_MARKER_STMTS || !building_stmt_list_p ())
+  if 

Re: [RFC PATCH 1/8] RISC-V: Minimal support of bitmanip extension

2021-10-18 Thread Kito Cheng
Hi Christoph:

> I think this needs another specification class (there is a
> specification for the instructions and it is in public review).
> Proposal: ISA_SPEC_CLASS_FROZEN_2021

That's a good point, but ISA_SPEC_CLASS_FROZEN_2021 makes it hard to tell
which spec it refers to, so I would prefer to add a -misa-spec=2021 to
align with the platform/profile spec, and then use ISA_SPEC_CLASS_2021;
until the RISC-V platform/profile spec has been released, let's keep
ISA_SPEC_CLASS_NONE :p

> BR
> Christoph


[PATCH] tree-optimization/102798 - avoid copying PTA info to old SSA names

2021-10-18 Thread Richard Biener via Gcc-patches
The vectorizer duplicates pointer-info to created pointer bases
but it has to avoid changing points-to info on existing SSA names
because there's now flow-sensitive info in there (pt->pt_null as
set from VRP).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk 
sofar.

Richard.

2021-10-18  Richard Biener  

PR tree-optimization/102798
* tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref):
Only copy points-to info to newly generated SSA names.

* gcc.dg/pr102798.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr102798.c | 41 +
 gcc/tree-vect-data-refs.c   |  8 +--
 2 files changed, 47 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr102798.c

diff --git a/gcc/testsuite/gcc.dg/pr102798.c b/gcc/testsuite/gcc.dg/pr102798.c
new file mode 100644
index 000..3a50546a16b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102798.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -fno-tree-pta" } */
+
+typedef __SIZE_TYPE__ size_t;
+
+__attribute__((__noipa__))
+void BUF_reverse (unsigned char *out, const unsigned char *in, size_t size)
+{
+  size_t i;
+  if (in)
+{
+  out += size - 1;
+  for (i = 0; i < size; i++)
+*out++ = *in++;
+}
+  else
+{
+  unsigned char *q;
+  char c;
+  q = out + size - 1;
+  for (i = 0; i < size ; i++)
+{
+  *out++ = 1;
+}
+}
+}
+
+int
+main (void)
+{
+  unsigned char buf[40];
+  unsigned char buf1[40];
+  for (unsigned i = 0; i < sizeof (buf); i++)
+buf[i] = i;
+  BUF_reverse (buf, 0, sizeof (buf));
+  for (unsigned i = 0; i < sizeof (buf); i++)
+if (buf[i] != 1)
+  __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 1e13148190c..a19045f7e46 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4785,8 +4785,12 @@ vect_create_addr_base_for_vector_ref (vec_info *vinfo, 
stmt_vec_info stmt_info,
 
   if (DR_PTR_INFO (dr)
   && TREE_CODE (addr_base) == SSA_NAME
-  && !SSA_NAME_PTR_INFO (addr_base))
-vect_duplicate_ssa_name_ptr_info (addr_base, dr_info);
+  /* We should only duplicate pointer info to newly created SSA names.  */
+  && SSA_NAME_VAR (addr_base) == dest)
+{
+  gcc_assert (!SSA_NAME_PTR_INFO (addr_base));
+  vect_duplicate_ssa_name_ptr_info (addr_base, dr_info);
+}
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location, "created %T\n", addr_base);
-- 
2.31.1


Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-18 Thread Aldy Hernandez via Gcc-patches



On 10/18/21 12:52 AM, Jeff Law wrote:



On 10/8/2021 9:12 AM, Aldy Hernandez via Gcc-patches wrote:

The following patch converts the strlen pass from evrp to ranger,
leaving DOM as the last remaining user.
So is there any reason why we can't convert DOM as well?   DOM's use of 
EVRP is pretty limited.  You've mentioned FP bits before, but my 
recollection is those are not part of the EVRP analysis DOM uses. Hell, 
give me a little guidance and I'll do the work...


Not only will I take you up on that offer, but I can provide 90% of the 
work.  Here be dragons, though (well, for me, maybe not for you ;-)).


DOM is actually an evrp pass at -O1 in disguise.  The reason it really 
is a covert evrp pass is because:


a) It calls extract_range_from_stmt on each statement.

b) It folds conditionals with simplify_using_ranges.

c) But most importantly, it exports discovered ranges when it's done 
(evrp_range_analyzer(true)).


If you look at the evrp pass, you'll notice that that's basically what 
it does, albeit with the substitute and fold engine, which also calls 
gimple fold plus other goodies.


But I could argue that we've made DOM into an evrp pass without 
noticing.  The last item (c) is particularly invasive because these 
exported ranges show up in other passes unexpectedly.  For instance, I 
saw an RTL pass at -O1 miss an optimization because it was dependent on 
some global range being set.  IMO, DOM should not export global ranges 
it discovered during its walk (do one thing and do it well), but I leave 
it to you experts to pontificate.


The attached patch is rather trivial.  It's mostly deleting state.  It 
seems DOM spends a lot of time massaging the IL so that it can fold 
conditionals or thread paths.  None of this is needed, because the 
ranger can do all of this.  Well, except floats, but...


...You'll notice that converting to the threader is an exercise in code 
deletion.  Basically, inherit from the hybrid threader while still using 
the copies/avail_exprs business.  This has the added benefit of keeping 
our float threading capabilities intact, while pulling in all the hybrid 
threader goodness.


Over the past year or so, I've been cleaning up evrp clients to use the 
common range_query API.  This makes this conversion easier, as it mostly 
involves replacing evrp_range_analyzer with the ranger and removing the 
state pushing/popping business.


That's the good news.  The bad news is that DOM changes the IL as it 
goes and the patch doesn't bootstrap.  Andrew insists that we should 
work even with DOM's changing IL, but last time we played this dance 
with the substitute_and_fold engine, there were some tweaks needed to 
the ranger.  Could be this, but I haven't investigated.  It could also 
be that the failures I was seeing were just DOM things that were no 
longer needed (shuffling the IL to simplify things for evrp).


This just needs a little shepherding from a DOM expert ;-).  If you get 
it to bootstrap, I could take care of the tests, performance, and making 
sure we're getting the same number of threads etc.






No additional cleanups have been done.  For example, the strlen pass
still has uses of VR_ANTI_RANGE, and the sprintf still passes around
pairs of integers instead of using a proper range.  Fixing this
could further improve these passes.

As a further enhancement, if the relevant maintainers deem useful,
the domwalk could be removed from strlen.  That is, unless the pass
needs it for something else.
The dom walk was strictly for the benefit of EVRP when it was added.  So 
I think it can get zapped once the pass is converted.


Jakub mentioned a while ago that the strlen pass itself needs DOM, so
perhaps this needs to stay.


Aldy
>From d0cb66e21abfccb738faabe2910b5000823f3030 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Wed, 29 Sep 2021 20:50:41 +0200
Subject: [PATCH] Convert DOM to ranger.

---
 gcc/testsuite/gcc.dg/graphite/pr69728.c   |   4 +-
 gcc/testsuite/gcc.dg/sancov/cmp0.c|   2 +-
 .../gcc.dg/tree-ssa/ssa-dom-branch-1.c|   7 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-2.c  |   6 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-4.c  |  24 +++-
 gcc/tree-ssa-dom.c| 132 --
 gcc/tree-ssa-threadedge.c |  67 +
 gcc/tree-ssa-threadedge.h |   5 +-
 8 files changed, 72 insertions(+), 175 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/graphite/pr69728.c b/gcc/testsuite/gcc.dg/graphite/pr69728.c
index 69e28318aaf..e8cd7bec0a1 100644
--- a/gcc/testsuite/gcc.dg/graphite/pr69728.c
+++ b/gcc/testsuite/gcc.dg/graphite/pr69728.c
@@ -24,6 +24,4 @@ fn1 ()
run into scheduling issues before here, not being able to handle
empty domains.  */
 
-/* XFAILed by fix for PR86865.  */
-
-/* { dg-final { scan-tree-dump "loop nest optimized" "graphite" { xfail *-*-* } } }  */
+/* { dg-final { scan-tree-dump "loop nest optimized" "graphite" } }  */
diff 

Re: [PATCH 1/2] arm: add arm bti pass

2021-10-18 Thread Andrea Corallo via Gcc-patches
Andrea Corallo via Gcc-patches  writes:

> Hi all,
>
> this patch is part of a series that enables Armv8.1-M in GCC and adds
> Branch Target Identification Mechanism [1].
>
> This patch moves and generalize the Aarch64 "bti" pass so it can be
> used also by the Arm backend.
>
> The pass iterates through the instructions and adds the necessary BTI
> instructions at the beginning of every function and at every landing
> pads targeted by indirect jumps.
>
> Regression-tested and bootstrapped on arm-linux-gnu and aarch64-linux-gnu.
>
> Best Regards
>
>   Andrea
>
> [1] 
> 

Ping

Best Regards

  Andrea


[PATCH][PUSHED] Remove unused but set variables.

2021-10-18 Thread Martin Liška

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Pushed to master as obvious.

Reported by clang13 -Wunused-but-set-variable:

gcc/ChangeLog:

* dbgcnt.c (dbg_cnt_process_opt): Remove unused but set variable.
* gcov.c (get_cycles_count): Likewise.
* lto-compress.c (lto_compression_zlib): Likewise.
(lto_uncompression_zlib): Likewise.
* targhooks.c (default_pch_valid_p): Likewise.

libcpp/ChangeLog:

* charset.c (convert_oct): Remove unused but set variable.
---
 gcc/dbgcnt.c   | 2 --
 gcc/gcov.c | 4 +---
 gcc/lto-compress.c | 4 
 gcc/targhooks.c| 4 +---
 libcpp/charset.c   | 2 --
 5 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/gcc/dbgcnt.c b/gcc/dbgcnt.c
index 6a7eb34cd3e..458341a53a0 100644
--- a/gcc/dbgcnt.c
+++ b/gcc/dbgcnt.c
@@ -208,7 +208,6 @@ void
 dbg_cnt_process_opt (const char *arg)
 {
   char *str = xstrdup (arg);
-  unsigned int start = 0;
 
   auto_vec<char *> tokens;

   for (char *next = strtok (str, ","); next != NULL; next = strtok (NULL, ","))
@@ -227,7 +226,6 @@ dbg_cnt_process_opt (const char *arg)
  if (!dbg_cnt_process_single_pair (name, ranges[j]))
break;
}
-  start += strlen (tokens[i]) + 1;
 }
 }
 
diff --git a/gcc/gcov.c b/gcc/gcov.c

index 829e955a63b..3672ae7a6f8 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -843,7 +843,6 @@ get_cycles_count (line_info &linfo)
  Therefore, operating on a permuted order (i.e., non-sorted) only
  has the effect of permuting the output cycles.  */
 
-  bool loop_found = false;

   gcov_type count = 0;
   for (vector<block_info *>::iterator it = linfo.blocks.begin ();
it != linfo.blocks.end (); it++)
@@ -851,8 +850,7 @@ get_cycles_count (line_info &linfo)
   arc_vector_t path;
   block_vector_t blocked;
   vector<block_vector_t> block_lists;
-  loop_found |= circuit (*it, path, *it, blocked, block_lists, linfo,
-count);
+  circuit (*it, path, *it, blocked, block_lists, linfo, count);
 }
 
   return count;

diff --git a/gcc/lto-compress.c b/gcc/lto-compress.c
index b5f4916b139..c40a13c8446 100644
--- a/gcc/lto-compress.c
+++ b/gcc/lto-compress.c
@@ -250,7 +250,6 @@ lto_compression_zlib (struct lto_compression_stream *stream)
   const size_t outbuf_length = Z_BUFFER_LENGTH;
   unsigned char *outbuf = (unsigned char *) xmalloc (outbuf_length);
   z_stream out_stream;
-  size_t compressed_bytes = 0;
   int status;
 
   gcc_assert (stream->is_compression);

@@ -282,7 +281,6 @@ lto_compression_zlib (struct lto_compression_stream *stream)
 
   stream->callback ((const char *) outbuf, out_bytes, stream->opaque);

   lto_stats.num_compressed_il_bytes += out_bytes;
-  compressed_bytes += out_bytes;
 
   cursor += in_bytes;

   remaining -= in_bytes;
@@ -342,7 +340,6 @@ lto_uncompression_zlib (struct lto_compression_stream 
*stream)
   size_t remaining = stream->bytes;
   const size_t outbuf_length = Z_BUFFER_LENGTH;
   unsigned char *outbuf = (unsigned char *) xmalloc (outbuf_length);
-  size_t uncompressed_bytes = 0;
 
   gcc_assert (!stream->is_compression);

   timevar_push (TV_IPA_LTO_DECOMPRESS);
@@ -378,7 +375,6 @@ lto_uncompression_zlib (struct lto_compression_stream 
*stream)
 
 	  stream->callback ((const char *) outbuf, out_bytes, stream->opaque);

  lto_stats.num_uncompressed_il_bytes += out_bytes;
- uncompressed_bytes += out_bytes;
 
 	  cursor += in_bytes;

  remaining -= in_bytes;
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index cbbcedf790f..812bbe3f16e 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -2200,7 +2200,7 @@ pch_option_mismatch (const char *option)
 /* Default version of pch_valid_p.  */
 
 const char *

-default_pch_valid_p (const void *data_p, size_t len)
+default_pch_valid_p (const void *data_p, size_t len ATTRIBUTE_UNUSED)
 {
   struct cl_option_state state;
   const char *data = (const char *)data_p;
@@ -2221,7 +2221,6 @@ default_pch_valid_p (const void *data_p, size_t len)
 
   memcpy (, data, sizeof (target_flags));

   data += sizeof (target_flags);
-  len -= sizeof (target_flags);
   r = targetm.check_pch_target_flags (tf);
   if (r != NULL)
return r;
@@ -2233,7 +2232,6 @@ default_pch_valid_p (const void *data_p, size_t len)
if (memcmp (data, state.data, state.size) != 0)
  return pch_option_mismatch (cl_options[i].opt_text);
data += state.size;
-   len -= state.size;
   }
 
   return NULL;

diff --git a/libcpp/charset.c b/libcpp/charset.c
index b84a9740165..e4e45f6d39d 100644
--- a/libcpp/charset.c
+++ b/libcpp/charset.c
@@ -1464,7 +1464,6 @@ convert_oct (cpp_reader *pfile, const uchar *from, const 
uchar *limit,
   cppchar_t c, n = 0;
   size_t width = cvt.width;
   size_t mask = width_to_mask (width);
-  bool overflow = false;
 
   /* loc_reader and ranges must either be both NULL, or both be non-NULL.  */

   gcc_assert ((loc_reader != NULL) == (ranges 

[PATCH] c++: Diagnose taking address of an immediate member function [PR102753]

2021-10-18 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase ICEs, because while we have in cp_build_addr_expr_1
diagnostics for taking address of an immediate function (and as an exception
deal with build_address from immediate invocation), I forgot to diagnose
taking address of a member function which is done in a different place.
I hope (s.*&S::foo) () is not an immediate invocation like
(*&foo) () is not, so this patch just diagnoses taking address of a member
function when not in immediate context.

Bootstrapped/regtested on x86_64-linux and i686-linux (without go,
which seems to have some evrp issue when building libgo on both), ok for
trunk?

2021-10-18  Jakub Jelinek  

PR c++/102753
* typeck.c (cp_build_addr_expr_1): Diagnose taking address of
an immediate method.  Use t instead of TREE_OPERAND (arg, 1).

* g++.dg/cpp2a/consteval20.C: New test.

--- gcc/cp/typeck.c.jj  2021-10-05 09:53:55.382734051 +0200
+++ gcc/cp/typeck.c 2021-10-15 19:28:38.034213437 +0200
@@ -6773,9 +6773,21 @@ cp_build_addr_expr_1 (tree arg, bool str
return error_mark_node;
  }
 
+   if (TREE_CODE (t) == FUNCTION_DECL
+   && DECL_IMMEDIATE_FUNCTION_P (t)
+   && cp_unevaluated_operand == 0
+   && (current_function_decl == NULL_TREE
+   || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl)))
+ {
+   if (complain & tf_error)
+ error_at (loc, "taking address of an immediate function %qD",
+   t);
+   return error_mark_node;
+ }
+
type = build_ptrmem_type (context_for_name_lookup (t),
  TREE_TYPE (t));
-   t = make_ptrmem_cst (type, TREE_OPERAND (arg, 1));
+   t = make_ptrmem_cst (type, t);
return t;
   }
 
--- gcc/testsuite/g++.dg/cpp2a/consteval20.C.jj 2021-10-15 19:40:38.691900472 
+0200
+++ gcc/testsuite/g++.dg/cpp2a/consteval20.C2021-10-15 19:49:15.281508419 
+0200
@@ -0,0 +1,24 @@
+// PR c++/102753
+// { dg-do compile { target c++20 } }
+
+struct S {
+  consteval int foo () const { return 42; }
+};
+
+constexpr S s;
+
+int
+bar ()
+{
+  return (s.*&S::foo) ();  // { dg-error "taking address of an immediate function" }
+}
+
+constexpr auto a = &S::foo;  // { dg-error "taking address of an immediate function" }
+
+consteval int
+baz ()
+{
+  return (s.*&S::foo) ();
+}
+
+static_assert (baz () == 42);

Jakub



[PATCH] gcc: implement AIX-style constructors

2021-10-18 Thread CHIGOT, CLEMENT via Gcc-patches
The AIX linker now supports constructor and destructor detection. For such
functions to be detected, their names must start with __sinit or __sterm,
and -bcdtors must be passed to the linker calls. The linker will then create
a "_cdtors" symbol which can be used to launch the initialization.

This patch creates a new RS6000 flag "-mcdtors=".
With "-mcdtors=aix", gcc will generate these new constructors/destructors.
With "-mcdtors=gcc", which is currently the default, gcc will continue
to generate "gcc" format for constructors (ie _GLOBAL__I and _GLOBAL__D
symbols).
Ideally, it would have been better to enable the AIX format by default
instead of using collect2. However, the compatibility between the
previously-built binaries and the new ones is too complex to be done.

gcc/ChangeLog:
2021-10-04  Clément Chigot  

        * collect2.c (aixbcdtors_flags): New variable.
        (main): Use it to detect -bcdtors and remove -binitfini flag.
        (write_c_file_stat): Adapt to new AIX format.
        * config/rs6000/aix.h (FILE_SINIT_FORMAT): New define.
        (FILE_STERM_FORMAT): New define.
        (TARGET_FILE_FUNCTION_FORMAT): New define.
        * config/rs6000/aix64.opt: Add -mcdtors flag.
        * config/rs6000/aix71.h (LINK_SPEC_COMMON): Pass -bcdtors when
          -mcdtors=aix is passed.
        * config/rs6000/aix72.h (LINK_SPEC_COMMON): Likewise.
        * config/rs6000/aix73.h (LINK_SPEC_COMMON): Likewise.
        * config/rs6000/rs6000-opts.h (enum rs6000_cdtors): New enum.
        * tree.c (get_file_function_name): Add
          TARGET_FILE_FUNCTION_FORMAT support.

gcc/testsuite/ChangeLog:
2021-10-04  Clément Chigot  

        * gcc.target/powerpc/constructor-aix.c: New test.
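
(A minimal sketch of the kind of translation unit this is about -- file
contents and command line are made up, not taken from the new test:

  /* ctor.c */
  #include <stdio.h>

  __attribute__((constructor)) static void
  init (void)
  {
    puts ("runs before main");
  }

  int
  main (void)
  {
    return 0;
  }

Built with the new -mcdtors=aix option, the synthesized initialization
routine gets the __sinit prefix and -bcdtors is passed to the linker,
which then provides the _cdtors symbol used to run it; with the default
-mcdtors=gcc the traditional collect2-based _GLOBAL__I/_GLOBAL__D scheme
is kept.)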


From e1297880a2abe53db6422bcf25dcd883a2658260 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Cl=C3=A9ment=20Chigot?= 
Date: Mon, 4 Oct 2021 09:24:43 +0200
Subject: [PATCH] gcc: implement AIX-style constructors
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The AIX linker now supports constructor and destructor detection. For such
functions to be detected, their names must start with __sinit or __sterm,
and -bcdtors must be passed to the linker calls. The linker will then create
a "_cdtors" symbol which can be used to launch the initialization.

This patch creates a new RS6000 flag "-mcdtors=".
With "-mcdtors=aix", gcc will generate these new constructors/destructors.
With "-mcdtors=gcc", which is currently the default, gcc will continue
to generate "gcc" format for constructors (ie _GLOBAL__I and _GLOBAL__D
symbols).
Ideally, it would have been better to enable the AIX format by default
instead of using collect2. However, the compatibility between the
previously-built binaries and the new ones is too complex to be done.

gcc/ChangeLog:
2021-10-04  Clément Chigot  

	* collect2.c (aixbcdtors_flags): New variable.
	(main): Use it to detect -bcdtors and remove -binitfini flag.
	(write_c_file_stat): Adapt to new AIX format.
	* config/rs6000/aix.h (FILE_SINIT_FORMAT): New define.
	(FILE_STERM_FORMAT): New define.
	(TARGET_FILE_FUNCTION_FORMAT): New define.
	* config/rs6000/aix64.opt: Add -mcdtors flag.
	* config/rs6000/aix71.h (LINK_SPEC_COMMON): Pass -bcdtors when
	  -mcdtors=aix is passed.
	* config/rs6000/aix72.h (LINK_SPEC_COMMON): Likewise.
	* config/rs6000/aix73.h (LINK_SPEC_COMMON): Likewise.
	* config/rs6000/rs6000-opts.h (enum rs6000_cdtors): New enum.
	* tree.c (get_file_function_name): Add
	  TARGET_FILE_FUNCTION_FORMAT support.

gcc/testsuite/ChangeLog:
2021-10-04  Clément Chigot  

	* gcc.target/powerpc/constructor-aix.c: New test.
---
 gcc/collect2.c| 91 +--
 gcc/config/rs6000/aix.h   | 56 
 gcc/config/rs6000/aix64.opt   | 17 
 gcc/config/rs6000/aix71.h |  2 +-
 gcc/config/rs6000/aix72.h |  2 +-
 gcc/config/rs6000/aix73.h |  2 +-
 gcc/config/rs6000/rs6000-opts.h   |  8 ++
 .../gcc.target/powerpc/constructor-aix.c  | 12 +++
 gcc/tree.c|  5 +
 9 files changed, 184 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/constructor-aix.c

diff --git a/gcc/collect2.c b/gcc/collect2.c
index 6f913041f26..59658cbadb7 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -186,6 +186,7 @@ static int aix64_flag;			/* true if -b64 */
 static int aixrtl_flag;			/* true if -brtl */
 static int aixlazy_flag;		/* true if -blazy */
 static int visibility_flag;		/* true if -fvisibility */
+static int aixbcdtors_flag;/* True if -bcdtors */
 #endif
 
 enum lto_mode_d {
@@ -984,6 +985,8 @@ main (int argc, char **argv)
 	  aixrtl_flag = 0;
 	else if (strcmp (argv[i], "-blazy") == 0)
 	  aixlazy_flag = 1;
+	else if (strcmp (argv[i], "-bcdtors") == 0)
+	  aixbcdtors_flag = 1;
 #endif
   }
 
@@ -1731,7 +1734,9 @@ main (int argc, char **argv)
   /* Tell the linker that we have 

Re: [Patch] libgomp.texi: Update OMP_PLACES

2021-10-18 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 18, 2021 at 09:22:51AM +0200, Tobias Burnus wrote:
> This patch updates the OMP_PLACES description for the recent
> OpenMP 5.1 changes.
> 
> OK?
> 
> I actually wonder when/whether the spec reference
> should be updated to OpenMP 5.1 or an additional
> reference to it should be added.
> 
> Tobias

> libgomp.texi: Update OMP_PLACES
> 
> libgomp/ChangeLog:
> 
>   * libgomp.texi (OMP_PLACES): Extend description for OMP 5.1 changes.
> 
> diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
> index e9fa8ba0bf7..58d63c50935 100644
> --- a/libgomp/libgomp.texi
> +++ b/libgomp/libgomp.texi
> @@ -2031,18 +2031,22 @@ When undefined, @env{OMP_PROC_BIND} defaults to 
> @code{TRUE} when
>  @table @asis
>  @item @emph{Description}:
>  The thread placement can be either specified using an abstract name or by an
> -explicit list of the places.  The abstract names @code{threads}, @code{cores}
> -and @code{sockets} can be optionally followed by a positive number in
> -parentheses, which denotes the how many places shall be created.  With
> -@code{threads} each place corresponds to a single hardware thread; 
> @code{cores}
> -to a single core with the corresponding number of hardware threads; and with
> -@code{sockets} the place corresponds to a single socket.  The resulting
> -placement can be shown by setting the @env{OMP_DISPLAY_ENV} environment
> -variable.
> +explicit list of the places.  The abstract names @code{threads}, 
> @code{cores},
> +@code{sockets}, @code{ll_caches} and @code{numa_domains} can be optionally
> +followed by a positive number in parentheses, which denotes the how many 
> places
> +shall be created.  With @code{threads} each place corresponds to a single
> +hardware thread; @code{cores} to a single core with the corresponding number 
> of
> +hardware threads; with @code{sockets} the place corresponds to a single
> +socket; with @code{ll_caches} to a set of cores that shares the last level
> +cache on the device; and @code{numa_domains} to a set of cores for which 
> their
> +closest memory on the device is the same meory and at a similar distance from
> +the cores.  The resulting placement can be shown by setting the
> +@env{OMP_DISPLAY_ENV} environment variable.
>  
>  Alternatively, the placement can be specified explicitly as comma-separated
> -list of places.  A place is specified by set of nonnegative numbers in curly
> -braces, denoting the denoting the hardware threads.  The hardware threads
> +list of places.  A place is specified by a single nonnegative number or
> +by a set of nonnegative numbers in curly braces, denoting the denoting
> +the hardware threads.  The hardware threads
>  belonging to a place can either be specified as comma-separated list of
>  nonnegative thread numbers or using an interval.  Multiple places can also be
>  either specified by a comma-separated list of places or by an interval.  To

The first paragraph looks good, but I think the latter change only adds to
confusion that the following text already has.
At least I'd say by a single nonnegative number which is treated as { number }
or ...
The confusion I see is that for both the placement list and place level the
wording first talks about comma-separated list of ..., and then later says
it can be specified either using that or by using an interval.
While actually it is always comma-separated list of ... or ... intervals,
one can mix the intervals with just mere places/numbers etc.
And there is no mention of ! and that the ! form can't use intervals.

Do you think you could try to reword it so that already in the first
sentence it shows all the 3 options ..., ... interval, !...
(where ... would be place or non-negative number)
or should I?
There is also "after after" in the paragraph.

Jakub
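
(For what it's worth, an illustrative setting showing the forms being
discussed -- the numbers are arbitrary:

  OMP_PLACES="0,1,2,3"              # each place a single hardware thread
  OMP_PLACES="{0,1},{2,3},{4,5}"    # explicit sets of hardware threads
  OMP_PLACES="{0:4}:4:4,!{12:4}"    # an interval of four places, with the
                                    # last one excluded again via !

The exclusion form takes a single place, not an interval.)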



Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-18 Thread Aldy Hernandez via Gcc-patches




On 10/18/21 12:49 AM, Jeff Law wrote:



On 10/15/2021 4:39 AM, Aldy Hernandez wrote:



On 10/15/21 2:47 AM, Andrew MacLeod wrote:

On 10/14/21 6:07 PM, Martin Sebor via Gcc-patches wrote:

On 10/9/21 12:47 PM, Aldy Hernandez via Gcc-patches wrote:

We seem to be passing a lot of context around in the strlen code.  I
certainly don't want to contribute to more.

Most of the handle_* functions are passing the gsi as well as either
ptr_qry or rvals.  That looks a bit messy.  May I suggest putting all
of that in the strlen pass object (well, the dom walker object, but we
can rename it to be less dom centric)?

Something like the attached (untested) patch could be the basis for
further cleanups.

Jakub, would this line of work interest you?


You didn't ask me but since no one spoke up against it let me add
some encouragement: this is exactly what I was envisioning and in
line with other such modernization we have been doing elsewhere.
Could you please submit it for review?

Martin


I'm willing to bet he didn't submit it for review because he doesn't 
have time this release to polish and track it...  (I think the 
threader has been quite consuming).  Rather, it was offered as a 
starting point for someone else who might be interested in continuing 
to pursue this work...  *everyone* is interested in cleanup work 
others do :-)


Exactly.  There's a lot of work that could be done in this area, and 
I'm trying to avoid the situation with the threaders where what 
started as refactoring ended up with me basically owning them ;-).

I wouldn't go that far ;-)  I'm still here, just focused on other stuff.


Heh.  Indeed, and I'm very grateful for your guidance and quick reviews. 
 They've made a huge difference!


What I should've said was that I'm trying to avoid going too deep here, 
because things that seem simple tend to grow tentacles that drag me into 
months of work, or in the case of ranger, years ;-).


I'm hoping that by next year we can unify all the threaders, and I can 
put them back in the Pandora's box where they belong.






That being said, I there are enough cleanups that are useful on their 
own.  I've removed all the passing around of GSIs, as well as ptr_qry, 
with the exception of anything dealing with the sprintf pass, since it 
has a slightly different interface.
You know, it's funny.   The 0001 patch looks a lot like what I ended up 
doing here and there when I start cleaning things up.  Pull state into
a class, make functions which need the state member functions, repeat 
until it works.


I've found that abstracting all this out not only helps understand the 
code better, but it separates functionality making glimmers of APIs 
shine through.




This is patch 0001, which I'm formally submitting for inclusion. No 
functional changes with this patch.  OK for trunk?

I'll ACK this now :-)




Also, I am PINGing patch 0002, which is the strlen pass conversion to 
the ranger.  As mentioned, this is just a change from an evrp client 
to a ranger client.  The APIs are exactly the same, and besides, the 
evrp analyzer is deprecated and slated for removal. OK for trunk?
I'll defer on this a bit.  I've got to step away and may not be back 
online tonight.  I worry more about the unintended testsuite fallout 
here more than anything.  Which argues it should go into the tester to 
see if there is any such fallout :-)


Thanks for doing this.

Aldy


