[PATCH][vect] PR92317: fix skip_epilogue creation for epilogues

2019-11-06 Thread Andre Vieira (lists)

Hi,

When investigating PR92317 I noticed that when we create the skip-epilogue 
condition (see 'if (skip_epilog)' in 'vect_do_peeling'), 
'slpeel_update_phi_nodes_for_guard2' only copies phi-nodes whose 
arguments are not constants.  This can cause problems later, when we 
create the scalar epilogue for this epilogue: if 'scalar_loop' is not 
the same as 'loop', 'slpeel_tree_duplicate_loop_to_edge_cfg' expects 
both to have identical single_exit bbs and uses that to copy the 
current_def metadata of the phi-nodes.


This patch makes sure that holds even if the phi arguments are constants, 
fixing PR92317.  I copied the failing testcase and added the options 
that originally made it fail.


Is this OK for trunk?

Cheers,
Andre

gcc/ChangeLog:

2019-11-06  Andre Vieira  

   * tree-vect-loop-manip.c (slpeel_update_phi_nodes_for_guard2): Also
   update phi's with constant phi arguments.


gcc/testsuite/ChangeLog:
2019-11-06  Andre Vieira  

   * g++.dg/opt/pr92317.C: New test.
diff --git a/gcc/testsuite/g++.dg/opt/pr92317.C b/gcc/testsuite/g++.dg/opt/pr92317.C
new file mode 100644
index ..2bb9729fc961a998db9a88045ee04a81e12b07a8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/pr92317.C
@@ -0,0 +1,51 @@
+// Copied from pr87967.C
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2 -ftree-vectorize -fno-tree-pre --param vect-epilogues-nomask=1" }
+
+void h();
+template  struct k { using d = b; };
+template  class> using e = k;
+template  class f>
+using g = typename e::d;
+struct l {
+  template  using ab = typename i::j;
+};
+struct n : l {
+  using j = g;
+};
+class o {
+public:
+  long r();
+};
+char m;
+char s() {
+  if (m)
+return '0';
+  return 'A';
+}
+class t {
+public:
+  typedef char *ad;
+  ad m_fn2();
+};
+void fn3() {
+  char *a;
+  t b;
+  bool p = false;
+  while (*a) {
+h();
+o c;
+if (*a)
+  a++;
+if (c.r()) {
+  n::j q;
+  for (t::ad d = b.m_fn2(), e; d != e; d++) {
+char f = *q;
+*d = f + s();
+  }
+  p = true;
+}
+  }
+  if (p)
+throw;
+}
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 1fbcaf2676f3a0995d4f63e18f4a689abeff..54f3ccf3ec373b5621e7778e6e80bab853a57687 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -2291,12 +2291,14 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog,
 {
   gphi *update_phi = gsi.phi ();
   tree old_arg = PHI_ARG_DEF (update_phi, 0);
-  /* This loop-closed-phi actually doesn't represent a use out of the
-	 loop - the phi arg is a constant.  */
-  if (TREE_CODE (old_arg) != SSA_NAME)
-	continue;
 
-  tree merge_arg = get_current_def (old_arg);
+  tree merge_arg = NULL_TREE;
+
+  /* If the old argument is a SSA_NAME use its current_def.  */
+  if (TREE_CODE (old_arg) == SSA_NAME)
+	merge_arg = get_current_def (old_arg);
+  /* If it's a constant or doesn't have a current_def, just use the old
+	 argument.  */
   if (!merge_arg)
 	merge_arg = old_arg;
 


Re: [1/6] Fix vectorizable_conversion costs

2019-11-06 Thread Richard Biener
On Tue, Nov 5, 2019 at 3:25 PM Richard Sandiford
 wrote:
>
> This patch makes two tweaks to vectorizable_conversion.  The first
> is to use "modifier" to distinguish between promotion, demotion,
> and neither promotion nor demotion, rather than using a code for
> some cases and "modifier" for others.  The second is to take ncopies
> into account for the promotion and demotion costs; previously we gave
> multiple copies the same cost as a single copy.
>
> Later patches test this, but it seemed worth splitting out.

OK, but does ncopies properly handle unrolling with SLP?

Richard.

>
> 2019-11-05  Richard Sandiford  
>
> gcc/
> * tree-vect-stmts.c (vect_model_promotion_demotion_cost): Take the
> number of ncopies as an additional argument.
> (vectorizable_conversion): Update call accordingly.  Use "modifier"
> to check whether a conversion is between vectors with the same
> numbers of units.
>
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2019-11-05 11:08:12.521631453 +
> +++ gcc/tree-vect-stmts.c   2019-11-05 14:17:43.330141911 +
> @@ -917,26 +917,27 @@ vect_model_simple_cost (stmt_vec_info st
>  }
>
>
> -/* Model cost for type demotion and promotion operations.  PWR is normally
> -   zero for single-step promotions and demotions.  It will be one if
> -   two-step promotion/demotion is required, and so on.  Each additional
> +/* Model cost for type demotion and promotion operations.  PWR is
> +   normally zero for single-step promotions and demotions.  It will be
> +   one if two-step promotion/demotion is required, and so on.  NCOPIES
> +   is the number of vector results (and thus number of instructions)
> +   for the narrowest end of the operation chain.  Each additional
> step doubles the number of instructions required.  */
>
>  static void
>  vect_model_promotion_demotion_cost (stmt_vec_info stmt_info,
> -   enum vect_def_type *dt, int pwr,
> +   enum vect_def_type *dt,
> +   unsigned int ncopies, int pwr,
> stmt_vector_for_cost *cost_vec)
>  {
> -  int i, tmp;
> +  int i;
>int inside_cost = 0, prologue_cost = 0;
>
>for (i = 0; i < pwr + 1; i++)
>  {
> -  tmp = (STMT_VINFO_TYPE (stmt_info) == type_promotion_vec_info_type) ?
> -   (i + 1) : i;
> -  inside_cost += record_stmt_cost (cost_vec, vect_pow2 (tmp),
> -  vec_promote_demote, stmt_info, 0,
> -  vect_body);
> +  inside_cost += record_stmt_cost (cost_vec, ncopies, vec_promote_demote,
> +  stmt_info, 0, vect_body);
> +  ncopies *= 2;
>  }
>
>/* FORNOW: Assuming maximum 2 args per stmts.  */
> @@ -4981,7 +4982,7 @@ vectorizable_conversion (stmt_vec_info s
>if (!vec_stmt)   /* transformation not required.  */
>  {
>DUMP_VECT_SCOPE ("vectorizable_conversion");
> -  if (code == FIX_TRUNC_EXPR || code == FLOAT_EXPR)
> +  if (modifier == NONE)
>  {
>   STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type;
>   vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node,
> @@ -4990,14 +4991,17 @@ vectorizable_conversion (stmt_vec_info s
>else if (modifier == NARROW)
> {
>   STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type;
> - vect_model_promotion_demotion_cost (stmt_info, dt, multi_step_cvt,
> - cost_vec);
> + /* The final packing step produces one vector result per copy.  */
> + vect_model_promotion_demotion_cost (stmt_info, dt, ncopies,
> + multi_step_cvt, cost_vec);
> }
>else
> {
>   STMT_VINFO_TYPE (stmt_info) = type_promotion_vec_info_type;
> - vect_model_promotion_demotion_cost (stmt_info, dt, multi_step_cvt,
> - cost_vec);
> + /* The initial unpacking step produces two vector results
> +per copy.  */
> + vect_model_promotion_demotion_cost (stmt_info, dt, ncopies * 2,
> + multi_step_cvt, cost_vec);
> }
>interm_types.release ();
>return true;


Re: [3/4] Don't vectorise single-iteration epilogues

2019-11-06 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Nov 4, 2019 at 4:30 PM Richard Sandiford
>  wrote:
>>
>> With a later patch I saw a case in which we peeled a single iteration
>> for gaps but didn't need to peel further iterations to make up a full
>> vector.  We then tried to vectorise the single-iteration epilogue.
>
> But when peeling for gaps we peel off a full vector iteration and thus
> have possibly VF-1 iterations in the epilogue, enough for vectorizing
> with VF/2?

Peeling for gaps just means we need to peel off one final scalar
iteration.  Often that means we need to peel more to keep the vector
loop operating on a multiple of VF, but if so, that additional peeling
counts as LOOP_VINFO_PEELING_FOR_NITER.

If we have a VF of 32 and a known iteration count of 65, we can peel a
single iteration for gaps without having to peel any more.  (Obviously
we'd peel that iteration anyway if we didn't have to peel it for gaps.)
And when using fully-masked/predicated loops, peeling one iteration for
gaps doesn't force us to peel more, even if the iteration count isn't
known.

Thanks,
Richard

>>
>> 2019-11-04  Richard Sandiford  
>>
>> gcc/
>> * tree-vect-loop.c (vect_analyze_loop): Only try to vectorize
>> the epilogue if there are peeled iterations for it to handle.
>>
>> Index: gcc/tree-vect-loop.c
>> ===
>> --- gcc/tree-vect-loop.c  2019-11-04 15:18:26.684592505 +
>> +++ gcc/tree-vect-loop.c  2019-11-04 15:18:36.608524542 +
>> @@ -2462,6 +2462,7 @@ vect_analyze_loop (class loop *loop, loo
>>   vect_epilogues = (!loop->simdlen
>> && loop->inner == NULL
>> && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
>> +   && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
>> /* For now only allow one epilogue loop.  */
>> && first_loop_vinfo->epilogue_vinfos.is_empty ());
>>


Re: [11/n] Support vectorisation with mixed vector sizes

2019-11-06 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Oct 25, 2019 at 2:43 PM Richard Sandiford
>  wrote:
>>
>> After previous patches, it's now possible to make the vectoriser
>> support multiple vector sizes in the same vector region, using
>> related_vector_mode to pick the right vector mode for a given
>> element mode.  No port yet takes advantage of this, but I have
>> a follow-on patch for AArch64.
>>
>> This patch also seemed like a good opportunity to add some more dump
>> messages: one to make it clear which vector size/mode was being used
>> when analysis passed or failed, and another to say when we've decided
>> to skip a redundant vector size/mode.
>
> OK.
>
> I wonder if, when we requested a specific size previously, we now
> have to verify we got that constraint satisfied after the change.
> Esp. the epilogue vectorization cases want to get V2DI
> from V4DI.
>
>   sz /= 2;
> - vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz);
> + vectype1 = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
> + scalar_type,
> + sz / scalar_bytes);
>
> doesn't look like an improvement in readability to me there.

Yeah, guess it isn't great.

> Maybe re-formulating the whole code in terms of lanes instead of size
> would make it easier to follow?

OK, how about this version?  It still won't win awards, but it's at
least a bit more readable.

Tested as before.

Richard


2019-11-06  Richard Sandiford  

gcc/
* machmode.h (opt_machine_mode::operator==): New function.
(opt_machine_mode::operator!=): Likewise.
* tree-vectorizer.h (vec_info::vector_mode): Update comment.
(get_related_vectype_for_scalar_type): Delete.
(get_vectype_for_scalar_type_and_size): Declare.
* tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
whether analysis passed or failed, and with what vector modes.
Use related_vector_mode to check whether trying a particular
vector mode would be redundant with the autodetected mode,
and print a dump message if we decide to skip it.
* tree-vect-loop.c (vect_analyze_loop): Likewise.
(vect_create_epilog_for_reduction): Use
get_related_vectype_for_scalar_type instead of
get_vectype_for_scalar_type_and_size.
* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
with...
(get_related_vectype_for_scalar_type): ...this new function.
Take a starting/"prevailing" vector mode rather than a vector size.
Take an optional nunits argument, with the same meaning as for
related_vector_mode.  Use related_vector_mode when not
auto-detecting a mode, falling back to mode_for_vector if no
target mode exists.
(get_vectype_for_scalar_type): Update accordingly.
(get_same_sized_vectype): Likewise.
* tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.

Index: gcc/machmode.h
===
--- gcc/machmode.h  2019-11-06 12:35:12.460201615 +
+++ gcc/machmode.h  2019-11-06 12:35:27.972093472 +
@@ -258,6 +258,9 @@ #define CLASS_HAS_WIDER_MODES_P(CLASS)
   bool exists () const;
   template bool exists (U *) const;
 
+  bool operator== (const T &m) const { return m_mode == m; }
+  bool operator!= (const T &m) const { return m_mode != m; }
+
 private:
   machine_mode m_mode;
 };
Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2019-11-06 12:35:12.764199495 +
+++ gcc/tree-vectorizer.h   2019-11-06 12:35:27.976093444 +
@@ -335,8 +335,9 @@ typedef std::pair vec_object
   /* Cost data used by the target cost model.  */
   void *target_cost_data;
 
-  /* If we've chosen a vector size for this vectorization region,
- this is one mode that has such a size, otherwise it is VOIDmode.  */
+  /* The argument we should pass to related_vector_mode when looking up
+ the vector mode for a scalar mode, or VOIDmode if we haven't yet
+ made any decisions about which vector modes to use.  */
   machine_mode vector_mode;
 
 private:
@@ -1609,8 +1610,9 @@ extern bool vect_can_advance_ivs_p (loop
 extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code);
 
 /* In tree-vect-stmts.c.  */
+extern tree get_related_vectype_for_scalar_type (machine_mode, tree,
+poly_uint64 = 0);
 extern tree get_vectype_for_scalar_type (vec_info *, tree);
-extern tree get_vectype_for_scalar_type_and_size (tree, poly_uint64);
 extern tree get_mask_type_for_scalar_type (vec_info *, tree);
 extern tree get_same_sized_vectype (tree, tree);
 extern bool vect_get_loop_mask_type (loop_vec_info);
Index: gcc/tree-vect-slp.c
===

Re: [14/n] Vectorise conversions between differently-sized integer vectors

2019-11-06 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Oct 25, 2019 at 2:51 PM Richard Sandiford
>  wrote:
>>
>> This patch adds AArch64 patterns for converting between 64-bit and
>> 128-bit integer vectors, and makes the vectoriser and expand pass
>> use them.
>
> So on GIMPLE we'll see
>
> v4si _1;
> v4di _2;
>
>  _1 = (v4si) _2;
>
> then, correct?  Likewise for float conversions.
>
> I think that's "new", can you add to tree-cfg.c:verify_gimple_assign_unary
> verification that the number of lanes of the LHS and the RHS match please?

Ah, yeah.  How's this?  Tested as before.

Richard


2019-11-06  Richard Sandiford  

gcc/
* tree-cfg.c (verify_gimple_assign_unary): Handle conversions
between vector types.
* tree-vect-stmts.c (vectorizable_conversion): Extend the
non-widening and non-narrowing path to handle standard
conversion codes, if the target supports them.
* expr.c (convert_move): Try using the extend and truncate optabs
for vectors.
* optabs-tree.c (supportable_convert_operation): Likewise.
* config/aarch64/iterators.md (Vnarrowq): New iterator.
* config/aarch64/aarch64-simd.md (2)
(trunc2): New patterns.

gcc/testsuite/
* gcc.dg/vect/bb-slp-pr69907.c: Do not expect BB vectorization
to fail for aarch64 targets.
* gcc.dg/vect/no-scevccp-outer-12.c: Expect the test to pass
on aarch64 targets.
* gcc.dg/vect/vect-double-reduc-5.c: Likewise.
* gcc.dg/vect/vect-outer-4e.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_5.c: New test.
* gcc.target/aarch64/vect_mixed_sizes_6.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_7.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_8.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_9.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_10.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_11.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_12.c: Likewise.
* gcc.target/aarch64/vect_mixed_sizes_13.c: Likewise.

Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  2019-09-05 08:49:30.829739618 +0100
+++ gcc/tree-cfg.c  2019-11-06 12:44:22.832365429 +
@@ -3553,6 +3553,24 @@ verify_gimple_assign_unary (gassign *stm
 {
 CASE_CONVERT:
   {
+   /* Allow conversions between vectors with the same number of elements,
+  provided that the conversion is OK for the element types too.  */
+   if (VECTOR_TYPE_P (lhs_type)
+   && VECTOR_TYPE_P (rhs1_type)
+   && known_eq (TYPE_VECTOR_SUBPARTS (lhs_type),
+TYPE_VECTOR_SUBPARTS (rhs1_type)))
+ {
+   lhs_type = TREE_TYPE (lhs_type);
+   rhs1_type = TREE_TYPE (rhs1_type);
+ }
+   else if (VECTOR_TYPE_P (lhs_type) || VECTOR_TYPE_P (rhs1_type))
+ {
+   error ("invalid vector types in nop conversion");
+   debug_generic_expr (lhs_type);
+   debug_generic_expr (rhs1_type);
+   return true;
+ }
+
/* Allow conversions from pointer type to integral type only if
   there is no sign or zero extension involved.
   For targets were the precision of ptrofftype doesn't match that
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-11-06 12:44:10.896448608 +
+++ gcc/tree-vect-stmts.c   2019-11-06 12:44:22.832365429 +
@@ -4869,7 +4869,9 @@ vectorizable_conversion (stmt_vec_info s
   switch (modifier)
 {
 case NONE:
-  if (code != FIX_TRUNC_EXPR && code != FLOAT_EXPR)
+  if (code != FIX_TRUNC_EXPR
+ && code != FLOAT_EXPR
+ && !CONVERT_EXPR_CODE_P (code))
return false;
   if (supportable_convert_operation (code, vectype_out, vectype_in,
 , ))
Index: gcc/expr.c
===
--- gcc/expr.c  2019-11-06 12:29:17.394677341 +
+++ gcc/expr.c  2019-11-06 12:44:22.828365457 +
@@ -250,6 +250,31 @@ convert_move (rtx to, rtx from, int unsi
 
   if (VECTOR_MODE_P (to_mode) || VECTOR_MODE_P (from_mode))
 {
+  if (GET_MODE_UNIT_PRECISION (to_mode)
+ > GET_MODE_UNIT_PRECISION (from_mode))
+   {
+ optab op = unsignedp ? zext_optab : sext_optab;
+ insn_code icode = convert_optab_handler (op, to_mode, from_mode);
+ if (icode != CODE_FOR_nothing)
+   {
+ emit_unop_insn (icode, to, from,
+ unsignedp ? ZERO_EXTEND : SIGN_EXTEND);
+ return;
+   }
+   }
+
+  if (GET_MODE_UNIT_PRECISION (to_mode)
+ < GET_MODE_UNIT_PRECISION (from_mode))
+   {
+ insn_code icode = convert_optab_handler (trunc_optab,
+  to_mode, from_mode);
+ 

Re: [3/4] Don't vectorise single-iteration epilogues

2019-11-06 Thread Richard Biener
On Wed, Nov 6, 2019 at 1:22 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Mon, Nov 4, 2019 at 4:30 PM Richard Sandiford
> >  wrote:
> >>
> >> With a later patch I saw a case in which we peeled a single iteration
> >> for gaps but didn't need to peel further iterations to make up a full
> >> vector.  We then tried to vectorise the single-iteration epilogue.
> >
> > But when peeling for gaps we peel off a full vector iteration and thus
> > have possibly VF-1 iterations in the epilogue, enough for vectorizing
> > with VF/2?
>
> Peeling for gaps just means we need to peel off one final scalar
> iteration.  Often that means we need to peel more to keep the vector
> loop operating on a multiple of VF, but if so, that additional peeling
> counts as LOOP_VINFO_PEELING_FOR_NITER.
>
> If we have a VF of 32 and a known iteration count of 65, we can peel a
> single iteration for gaps without having to peel any more.  (Obviously
> we'd peel that iteration anyway if we didn't have to peel it for gaps.)
> And when using fully-masked/predicated loops, peeling one iteration for
> gaps doesn't force us to peel more, even if the iteration count isn't
> known.

For sure, when we do not have any epilogue it's pointless to try to vectorize it.
It seems LOOP_VINFO_PEELING_FOR_NITER is set in "interesting"
ways; deciphering it shows that when we have an epilogue but
LOOP_VINFO_PEELING_FOR_NITER is not set, that epilogue always
has a single iteration only.

So, OK ...

Richard.

> Thanks,
> Richard
>
> >>
> >> 2019-11-04  Richard Sandiford  
> >>
> >> gcc/
> >> * tree-vect-loop.c (vect_analyze_loop): Only try to vectorize
> >> the epilogue if there are peeled iterations for it to handle.
> >>
> >> Index: gcc/tree-vect-loop.c
> >> ===
> >> --- gcc/tree-vect-loop.c  2019-11-04 15:18:26.684592505 +
> >> +++ gcc/tree-vect-loop.c  2019-11-04 15:18:36.608524542 +
> >> @@ -2462,6 +2462,7 @@ vect_analyze_loop (class loop *loop, loo
> >>   vect_epilogues = (!loop->simdlen
> >> && loop->inner == NULL
> >> && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
> >> +   && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
> >> /* For now only allow one epilogue loop.  */
> >> && first_loop_vinfo->epilogue_vinfos.is_empty ());
> >>


[PATCH][Arm] Only enable fsched-pressure with Ofast

2019-11-06 Thread Wilco Dijkstra
The current pressure scheduler doesn't appear to track register pressure
correctly or to avoid creating unnecessary spills when register pressure is high.
As a result, disabling the early scheduler improves integer performance
considerably and reduces code size as a bonus. Since scheduling floating-point
code is generally beneficial (more registers and higher latencies), only enable
the pressure scheduler with -Ofast.

On Cortex-A57 this gives a 0.7% performance gain on SPECINT2006 as well
as a 0.2% codesize reduction.

Bootstrapped on armhf. OK for commit?

ChangeLog:

2019-11-06  Wilco Dijkstra  

* common/config/arm/arm-common.c (arm_option_optimization_table):
Enable -fsched-pressure with -Ofast only.

--
diff --git a/gcc/common/config/arm/arm-common.c 
b/gcc/common/config/arm/arm-common.c
index 
41a920f6dc96833e778faa8dbcc19beac483734c..b761d3abd670a144a593c4b410b1e7fbdcb52f56
 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -38,7 +38,7 @@ static const struct default_options 
arm_option_optimization_table[] =
   {
 /* Enable section anchors by default at -O1 or higher.  */
 { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
-{ OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
+{ OPT_LEVELS_FAST, OPT_fsched_pressure, NULL, 1 },
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 


Re: [11a/n] Avoid retrying with the same vector modes

2019-11-06 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Nov 6, 2019 at 11:21 AM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> Patch 12/n makes the AArch64 port add four entries to
>> >> autovectorize_vector_modes.  Each entry describes a different
>> >> vector mode assignment for vector code that mixes 8-bit, 16-bit,
>> >> 32-bit and 64-bit elements.  But if (as usual) the vector code has
>> >> fewer element sizes than that, we could end up trying the same
>> >> combination of vector modes multiple times.  This patch adds a
>> >> check to prevent that.
>> >>
>> >> As before: each patch tested individually on aarch64-linux-gnu and the
>> >> series as a whole on x86_64-linux-gnu.
>> >>
>> >>
>> >> 2019-11-04  Richard Sandiford  
>> >>
>> >> gcc/
>> >> * tree-vectorizer.h (vec_info::mode_set): New typedef.
>> >> (vec_info::used_vector_mode): New member variable.
>> >> (vect_chooses_same_modes_p): Declare.
>> >> * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
>> >> chosen vector mode in vec_info::used_vector_mode.
>> >> (vect_chooses_same_modes_p): New function.
>> >> * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
>> >> the same vector statements multiple times.
>> >> * tree-vect-slp.c (vect_slp_bb_region): Likewise.
>> >>
>> >> Index: gcc/tree-vectorizer.h
>> >> ===
>> >> --- gcc/tree-vectorizer.h   2019-11-05 10:48:11.246092351 +
>> >> +++ gcc/tree-vectorizer.h   2019-11-05 10:57:41.662071145 +
>> >> @@ -298,6 +298,7 @@ typedef std::pair vec_object
>> >>  /* Vectorizer state common between loop and basic-block vectorization.  
>> >> */
>> >>  class vec_info {
>> >>  public:
>> >> +  typedef hash_set > mode_set;
>> >>enum vec_kind { bb, loop };
>> >>
>> >>vec_info (vec_kind, void *, vec_info_shared *);
>> >> @@ -335,6 +336,9 @@ typedef std::pair vec_object
>> >>/* Cost data used by the target cost model.  */
>> >>void *target_cost_data;
>> >>
>> >> +  /* The set of vector modes used in the vectorized region.  */
>> >> +  mode_set used_vector_modes;
>> >> +
>> >>/* The argument we should pass to related_vector_mode when looking up
>> >>   the vector mode for a scalar mode, or VOIDmode if we haven't yet
>> >>   made any decisions about which vector modes to use.  */
>> >> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
>> >>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
>> >>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
>> >>  extern tree get_same_sized_vectype (tree, tree);
>> >> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
>> >>  extern bool vect_get_loop_mask_type (loop_vec_info);
>> >>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
>> >> stmt_vec_info * = NULL, gimple ** = NULL);
>> >> Index: gcc/tree-vect-stmts.c
>> >> ===
>> >> --- gcc/tree-vect-stmts.c   2019-11-05 10:48:11.242092379 +
>> >> +++ gcc/tree-vect-stmts.c   2019-11-05 10:57:41.662071145 +
>> >> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
>> >>   scalar_type);
>> >>if (vectype && vinfo->vector_mode == VOIDmode)
>> >>  vinfo->vector_mode = TYPE_MODE (vectype);
>> >> +
>> >> +  if (vectype)
>> >> +vinfo->used_vector_modes.add (TYPE_MODE (vectype));
>> >> +
>> >
>> > Do we actually end up _using_ all types returned by this function?
>>
>> No, not all of them, so it's a bit crude.  E.g. some types might end up
>> not being relevant after pattern recognition, or after we've made a
>> final decision about which parts of an address calculation to include
>> in a gather or scatter op.  So we can still end up retrying the same
>> thing even after the patch.
>>
>> The problem is that we're trying to avoid pointless retries on failure
>> as well as success, so we could end up stopping at arbitrary points.
>> I wasn't sure where else to handle this.
>
> Yeah, I think this "iterating" is somewhat bogus (crude) now.

I think it was crude even before the series though. :-)  Not sure the
series is making things worse.

The problem is that there's a chicken-and-egg problem between how
we decide to vectorise and which vector subarchitecture and VF we use.
E.g. if we have:

  unsigned char *x, *y;
  ...
  x[i] = (unsigned short) (x[i] + y[i] + 1) >> 1;

do we build the SLP graph on the assumption that we need to use short
elements, or on the assumption that we can use IFN_AVG_CEIL?  This
affects the VF we get out: using IFN_AVG_CEIL gives double the VF
relative to doing unsigned short arithmetic.

And we need to know which vector subarchitecture we're targetting when
making that decision: 

Re: [PATCH][vect] PR92317: fix skip_epilogue creation for epilogues

2019-11-06 Thread Andre Vieira (lists)

Sorry for the double post, ignore please.



Re: [PATCH] [ARC] Add builtins for identifying floating point support

2019-11-06 Thread Claudiu Zissulescu
Ok, I'll push it asap.

Thank you for your help,
Claudiu

On Tue, Nov 5, 2019 at 8:19 PM Vineet Gupta  wrote:
>
> Currently for hard float we need to check for
>  __ARC_FPU_SP__ || __ARC_FPU_DP__ and for soft float inverse of that.
> So define single convenience macros for either cases
>
> gcc/
> -xx-xx  Vineet Gupta  
>
> * config/arc/arc-c.c (arc_cpu_cpp_builtins): Add
>   __arc_hard_float__, __ARC_HARD_FLOAT__,
>   __arc_soft_float__, __ARC_SOFT_FLOAT__
>
> Signed-off-by: Vineet Gupta 
> ---
>  gcc/ChangeLog  |  6 ++
>  gcc/config/arc/arc-c.c | 10 ++
>  2 files changed, 16 insertions(+)
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index f3deffc701ff..9237e81aa011 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2019-11-05  Vineet Gupta  
> +
> +   * config/arc/arc-c.c (arc_cpu_cpp_builtins): Add
> +   __arc_hard_float__, __ARC_HARD_FLOAT__,
> +   __arc_soft_float__, __ARC_SOFT_FLOAT__
> +
>  2019-11-05  Martin Sebor  
>
> PR middle-end/92333
> diff --git a/gcc/config/arc/arc-c.c b/gcc/config/arc/arc-c.c
> index cf3340d29c27..1a5ff3e88a67 100644
> --- a/gcc/config/arc/arc-c.c
> +++ b/gcc/config/arc/arc-c.c
> @@ -71,4 +71,14 @@ arc_cpu_cpp_builtins (cpp_reader * pfile)
>if (TARGET_BIG_ENDIAN)
>  builtin_define ("__big_endian__");
>
> +  if (TARGET_HARD_FLOAT)
> +{
> +  builtin_define ("__arc_hard_float__");
> +  builtin_define ("__ARC_HARD_FLOAT__");
> +}
> +  else
> +{
> +  builtin_define ("__arc_soft_float__");
> +  builtin_define ("__ARC_SOFT_FLOAT__");
> +}
>  }
> --
> 2.20.1
>


Re: [4/4] Use scan-tree-dump instead of scan-tree-dump-times for some vect tests

2019-11-06 Thread Richard Biener
On Mon, Nov 4, 2019 at 4:30 PM Richard Sandiford
 wrote:
>
> With later patches, we're able to vectorise the epilogues of these tests
> on AArch64 and so get two instances of "vectorizing stmts using SLP".
> Although it would be possible with a bit of effort to predict when
> this happens, it doesn't seem important whether we get 1 vs. 2
> occurrences.  All that matters is zero vs. nonzero.

OK.

>
> 2019-11-04  Richard Sandiford  
>
> gcc/testsuite/
> * gcc.dg/vect/slp-9.c: Use scan-tree-dump rather than
> scan-tree-dump-times.
> * gcc.dg/vect/slp-widen-mult-s16.c: Likewise.
> * gcc.dg/vect/slp-widen-mult-u8.c: Likewise.
>
> Index: gcc/testsuite/gcc.dg/vect/slp-9.c
> ===
> --- gcc/testsuite/gcc.dg/vect/slp-9.c   2019-03-08 18:15:02.276871200 +
> +++ gcc/testsuite/gcc.dg/vect/slp-9.c   2019-11-04 15:18:14.656674872 +
> @@ -44,5 +44,5 @@ int main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_hi_to_si } } }*/
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_widen_mult_hi_to_si } } } */
> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target vect_widen_mult_hi_to_si } } } */
>
> Index: gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c
> ===
> --- gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c  2019-03-08 18:15:02.304871094 +
> +++ gcc/testsuite/gcc.dg/vect/slp-widen-mult-s16.c  2019-11-04 15:18:14.656674872 +
> @@ -38,5 +38,5 @@ int main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_hi_to_si || vect_unpack } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_widen_mult_hi_to_si || vect_unpack } } } } */
> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { vect_widen_mult_hi_to_si || vect_unpack } } } } */
>
> Index: gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c
> ===
> --- gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c   2019-03-08 18:15:02.292871138 +
> +++ gcc/testsuite/gcc.dg/vect/slp-widen-mult-u8.c   2019-11-04 15:18:14.668674793 +
> @@ -38,5 +38,5 @@ int main (void)
>  }
>
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_qi_to_hi || vect_unpack } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_widen_mult_hi_to_si || vect_unpack } } } } */
> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { vect_widen_mult_hi_to_si || vect_unpack } } } } */
>


Re: [PATCH] Support multiple registers for the frame pointer

2019-11-06 Thread Kwok Cheung Yeung

On 04/11/2019 08:02 pm, Dimitar Dimitrov wrote:

On Sat, 2 Nov 2019, 19:28:38 EET Kwok Cheung Yeung wrote:

diff --git a/gcc/ira.c b/gcc/ira.c
index 9f8da67..25e9359 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -515,7 +515,13 @@ setup_alloc_regs (bool use_hard_frame_p)
   #endif
 no_unit_alloc_regs = fixed_nonglobal_reg_set;
 if (! use_hard_frame_p)
-SET_HARD_REG_BIT (no_unit_alloc_regs, HARD_FRAME_POINTER_REGNUM);
+{
+  int fp_reg_count = hard_regno_nregs (HARD_FRAME_POINTER_REGNUM, Pmode);
+  for (int reg = HARD_FRAME_POINTER_REGNUM;
+  reg < HARD_FRAME_POINTER_REGNUM + fp_reg_count;
+  reg++)
+   SET_HARD_REG_BIT (no_unit_alloc_regs, reg);
+}

Please consider using the existing helper function instead:
add_to_hard_reg_set (&no_unit_alloc_regs, Pmode, reg);
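For readers unfamiliar with the helper, a minimal standalone model of what it does might look like the following. The 64-bit mask, the function names and the register widths are illustrative stand-ins, not GCC's actual HARD_REG_SET machinery:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hard_reg_set;   /* toy stand-in for GCC's HARD_REG_SET */

/* Toy stand-in for hard_regno_nregs: how many consecutive hard registers
   a MODE_BITS-wide value occupies when each register is REG_BITS wide.  */
static int
model_hard_regno_nregs (int mode_bits, int reg_bits)
{
  return (mode_bits + reg_bits - 1) / reg_bits;
}

/* Analogue of add_to_hard_reg_set: mark every hard register that a
   multi-register frame pointer spans, not just the first one.  */
static void
model_add_to_hard_reg_set (hard_reg_set *set, int regno, int nregs)
{
  for (int i = 0; i < nregs; i++)
    *set |= (hard_reg_set) 1 << (regno + i);
}
```

The point of the helper is exactly the loop it hides: on targets where Pmode spans more than one hard register, every covered register must be excluded from allocation.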



Thank you for the suggestion - I have applied the change. As Vladimir 
has already given his approval for the patch, I will commit it later 
today if there are no objections.


Best regards

Kwok Yeung


Add support for using multiple registers to hold the frame pointer

2019-11-06  Kwok Cheung Yeung  

gcc/
* ira.c (setup_alloc_regs): Setup no_unit_alloc_regs for
frame pointer in multiple registers.
(ira_setup_eliminable_regset): Setup eliminable_regset,
ira_no_alloc_regs and regs_ever_live for frame pointer in
multiple registers.

diff --git a/gcc/ira.c b/gcc/ira.c
index 9f8da67..5df9953 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -515,7 +515,8 @@ setup_alloc_regs (bool use_hard_frame_p)
 #endif
   no_unit_alloc_regs = fixed_nonglobal_reg_set;
   if (! use_hard_frame_p)
-SET_HARD_REG_BIT (no_unit_alloc_regs, HARD_FRAME_POINTER_REGNUM);
+add_to_hard_reg_set (&no_unit_alloc_regs, Pmode,
+HARD_FRAME_POINTER_REGNUM);
   setup_class_hard_regs ();
 }

@@ -2248,6 +2249,7 @@ ira_setup_eliminable_regset (void)
 {
   int i;
   static const struct {const int from, to; } eliminables[] = 
ELIMINABLE_REGS;

+  int fp_reg_count = hard_regno_nregs (HARD_FRAME_POINTER_REGNUM, Pmode);

   /* Setup is_leaf as frame_pointer_required may use it.  This function
  is called by sched_init before ira if scheduling is enabled.  */
@@ -2276,7 +2278,8 @@ ira_setup_eliminable_regset (void)
frame pointer in LRA.  */

   if (frame_pointer_needed)
-df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM, true);
+for (i = 0; i < fp_reg_count; i++)
+  df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true);

   ira_no_alloc_regs = no_unit_alloc_regs;
   CLEAR_HARD_REG_SET (eliminable_regset);
@@ -2306,17 +2309,21 @@ ira_setup_eliminable_regset (void)
 }
   if (!HARD_FRAME_POINTER_IS_FRAME_POINTER)
 {
-  if (!TEST_HARD_REG_BIT (crtl->asm_clobbers, 
HARD_FRAME_POINTER_REGNUM))
-   {
- SET_HARD_REG_BIT (eliminable_regset, HARD_FRAME_POINTER_REGNUM);
- if (frame_pointer_needed)
-   SET_HARD_REG_BIT (ira_no_alloc_regs, HARD_FRAME_POINTER_REGNUM);
-   }
-  else if (frame_pointer_needed)
-   error ("%s cannot be used in %<asm%> here",
-  reg_names[HARD_FRAME_POINTER_REGNUM]);
-  else
-   df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM, true);
+  for (i = 0; i < fp_reg_count; i++)
+   if (!TEST_HARD_REG_BIT (crtl->asm_clobbers,
+   HARD_FRAME_POINTER_REGNUM + i))
+ {
+   SET_HARD_REG_BIT (eliminable_regset,
+ HARD_FRAME_POINTER_REGNUM + i);
+   if (frame_pointer_needed)
+ SET_HARD_REG_BIT (ira_no_alloc_regs,
+   HARD_FRAME_POINTER_REGNUM + i);
+ }
+   else if (frame_pointer_needed)
+ error ("%s cannot be used in %<asm%> here",
+reg_names[HARD_FRAME_POINTER_REGNUM + i]);
+   else
+ df_set_regs_ever_live (HARD_FRAME_POINTER_REGNUM + i, true);
 }
 }



Re: introduce -fcallgraph-info option

2019-11-06 Thread Alexandre Oliva
On Nov  4, 2019, Richard Biener  wrote:

> Please leave that part out for now, I'd rather discuss this separately
> from the bulk of the patch.  That is, I wonder why we shouldn't
> simply adjust aux_base_name to something else for -flto [in the driver].

*nod*, that makes sense to me.  After seeing your suggestion, I started
looking into how to do that, but didn't get very far yet.  For now, I've
split that bit out of the main patch.  So I'm installing the first, big
one, and not installing the latter, posted mainly so that the
documentation bit can be picked up.  Thanks!


introduce -fcallgraph-info option

This was first submitted many years ago
https://gcc.gnu.org/ml/gcc-patches/2010-10/msg02468.html

The command line option -fcallgraph-info is added and makes the
compiler generate another output file (xxx.ci) for each compilation
unit (or LTO partition), which is a valid VCG file (you can launch
your favorite VCG viewer on it unmodified) and contains the "final"
callgraph of the unit.  "final" is a bit of a misnomer as this is
actually the callgraph at RTL expansion time, but since most
high-level optimizations are done at the Tree level and RTL doesn't
usually fiddle with calls, it's final in almost all cases.  Moreover,
the nodes can be decorated with additional info: -fcallgraph-info=su
adds stack usage info and -fcallgraph-info=da dynamic allocation info.
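As a rough illustration of the VCG graph syntax such a .ci file uses, the sketch below prints a schematic two-node callgraph. The unit name, node titles and the stack-usage annotation are invented for illustration, not GCC's exact output:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print a schematic two-node callgraph in VCG syntax: one node per
   function, one edge per call, with a made-up stack-usage label of the
   kind -fcallgraph-info=su could attach.  */
static void
print_vcg_callgraph (FILE *f)
{
  fprintf (f, "graph: { title: \"unit.c\"\n");
  fprintf (f, "node: { title: \"main\" label: \"main\\n16 bytes (static)\" }\n");
  fprintf (f, "node: { title: \"helper\" label: \"helper\" }\n");
  fprintf (f, "edge: { sourcename: \"main\" targetname: \"helper\" }\n");
  fprintf (f, "}\n");
}
```

Any VCG viewer should accept a graph of this general shape unmodified, which is what makes the format convenient as a compiler side-output.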


for  gcc/ChangeLog
From Eric Botcazou, Alexandre Oliva

* common.opt (-fcallgraph-info[=]): New option.
* doc/invoke.texi (Developer options): Document it.
* opts.c (common_handle_option): Handle it.
* builtins.c (expand_builtin_alloca): Record allocation if
-fcallgraph-info=da.
* calls.c (expand_call): If -fcallgraph-info, record the call.
(emit_library_call_value_1): Likewise.
* flag-types.h (enum callgraph_info_type): New type.
* explow.c: Include stringpool.h.
(set_stack_check_libfunc): Set SET_SYMBOL_REF_DECL on the symbol.
* function.c (allocate_stack_usage_info): New.
(allocate_struct_function): Call it for -fcallgraph-info.
(prepare_function_start): Call it otherwise.
(record_final_call, record_dynamic_alloc): New.
* function.h (struct callinfo_callee): New.
(CALLEE_FROM_CGRAPH_P): New.
(struct callinfo_dalloc): New.
(struct stack_usage): Add callees and dallocs.
(record_final_call, record_dynamic_alloc): Declare.
* gimplify.c (gimplify_decl_expr): Record dynamically-allocated
object if -fcallgraph-info=da.
* optabs-libfuncs.c (build_libfunc_function): Keep SYMBOL_REF_DECL.
* print-tree.h (print_decl_identifier): Declare.
(PRINT_DECL_ORIGIN, PRINT_DECL_NAME, PRINT_DECL_UNIQUE_NAME): New.
* print-tree.c: Include print-tree.h.
(print_decl_identifier): New function.
* toplev.c: Include print-tree.h.
(callgraph_info_file): New global variable.
(callgraph_info_external_printed): Likewise.
(output_stack_usage): Rename to...
(output_stack_usage_1): ... this.  Make it static, add cf
parameter.  If -fcallgraph-info=su, print stack usage to cf.
If -fstack-usage, use print_decl_identifier for
pretty-printing.
(INDIRECT_CALL_NAME): New.
(dump_final_node_vcg_start): New.
(dump_final_callee_vcg, dump_final_node_vcg): New.
(output_stack_usage): New.
(lang_dependent_init): Open and start file if
-fcallgraph-info.  Allocate callgraph_info_external_printed.
(finalize): If callgraph_info_file is not null, finish it,
close it, and release callgraph_info_external_printed.

for  gcc/ada/ChangeLog

* gcc-interface/misc.c (callgraph_info_file): Delete.
---
 gcc/ada/gcc-interface/misc.c |3 -
 gcc/builtins.c   |4 +
 gcc/calls.c  |6 +
 gcc/common.opt   |8 ++
 gcc/doc/invoke.texi  |   23 +
 gcc/explow.c |5 +
 gcc/flag-types.h |   16 
 gcc/function.c   |   59 +-
 gcc/function.h   |   30 +++
 gcc/gimplify.c   |4 +
 gcc/optabs-libfuncs.c|4 -
 gcc/opts.c   |   26 ++
 gcc/output.h |2 
 gcc/print-tree.c |   76 ++
 gcc/print-tree.h |4 +
 gcc/toplev.c |  178 ++
 16 files changed, 397 insertions(+), 51 deletions(-)

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index 4abd4d5..d68b373 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -54,9 +54,6 @@
 #include "ada-tree.h"
 #include "gigi.h"
 
-/* This symbol needs to be defined for the front-end.  */
-void *callgraph_info_file = NULL;
-
 /* Command-line argc and argv.  These variables are global since they are

Re: [3/4] Don't vectorise single-iteration epilogues

2019-11-06 Thread Richard Biener
On Mon, Nov 4, 2019 at 4:30 PM Richard Sandiford
 wrote:
>
> With a later patch I saw a case in which we peeled a single iteration
> for gaps but didn't need to peel further iterations to make up a full
> vector.  We then tried to vectorise the single-iteration epilogue.

But when peeling for gaps we peel off a full vector iteration and thus
have possibly VF-1 iterations in the epilogue, enough for vectorizing
with VF/2?

>
> 2019-11-04  Richard Sandiford  
>
> gcc/
> * tree-vect-loop.c (vect_analyze_loop): Only try to vectorize
> the epilogue if there are peeled iterations for it to handle.
>
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2019-11-04 15:18:26.684592505 +
> +++ gcc/tree-vect-loop.c2019-11-04 15:18:36.608524542 +
> @@ -2462,6 +2462,7 @@ vect_analyze_loop (class loop *loop, loo
>   vect_epilogues = (!loop->simdlen
> && loop->inner == NULL
> && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
> +   && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
> /* For now only allow one epilogue loop.  */
> && first_loop_vinfo->epilogue_vinfos.is_empty ());
>


[PATCH] Some vectorizable_reduction TLC

2019-11-06 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-11-06  Richard Biener  

* tree-vect-loop.c (vectorizable_reduction): Remember reduction
PHI.  Use STMT_VINFO_REDUC_IDX to skip the reduction operand.
Simplify single_defuse_cycle condition.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 277874)
+++ gcc/tree-vect-loop.c(working copy)
@@ -5725,6 +5725,7 @@ vectorizable_reduction (stmt_vec_info st
 }
 
   stmt_vec_info orig_stmt_of_analysis = stmt_info;
+  stmt_vec_info phi_info = stmt_info;
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
   || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
 {
@@ -5749,8 +5750,8 @@ vectorizable_reduction (stmt_vec_info st
  bool res = single_imm_use (gimple_phi_result (stmt_info->stmt),
&use_p, &use_stmt);
  gcc_assert (res);
- stmt_info = loop_vinfo->lookup_stmt (use_stmt);
- stmt_info = vect_stmt_to_vectorize (STMT_VINFO_REDUC_DEF (stmt_info));
+ phi_info = loop_vinfo->lookup_stmt (use_stmt);
+ stmt_info = vect_stmt_to_vectorize (STMT_VINFO_REDUC_DEF (phi_info));
}
   /* STMT_VINFO_REDUC_DEF doesn't point to the first but the last
  element.  */
@@ -5760,6 +5761,8 @@ vectorizable_reduction (stmt_vec_info st
  stmt_info = REDUC_GROUP_FIRST_ELEMENT (stmt_info);
}
 }
+  /* PHIs should not participate in patterns.  */
+  gcc_assert (!STMT_VINFO_RELATED_STMT (phi_info));
 
   if (nested_in_vect_loop_p (loop, stmt_info))
 {
@@ -5822,9 +5825,6 @@ vectorizable_reduction (stmt_vec_info st
  The last use is the reduction variable.  In case of nested cycle this
  assumption is not true: we use reduc_index to record the index of the
  reduction variable.  */
-  stmt_vec_info phi_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info));
-  /* PHIs should not participate in patterns.  */
-  gcc_assert (!STMT_VINFO_RELATED_STMT (phi_info));
   gphi *reduc_def_phi = as_a <gphi *> (phi_info->stmt);
 
   /* Verify following REDUC_IDX from the latch def leads us back to the PHI
@@ -5873,11 +5873,8 @@ vectorizable_reduction (stmt_vec_info st
 "use not simple.\n");
  return false;
}
-  if ((dt == vect_reduction_def || dt == vect_nested_cycle)
- && op == reduc_def)
-   {
- continue;
-   }
+  if (i == STMT_VINFO_REDUC_IDX (stmt_info))
+   continue;
 
   /* There should be only one cycle def in the stmt, the one
  leading to reduc_def.  */
@@ -6338,14 +6335,9 @@ vectorizable_reduction (stmt_vec_info st
This only works when we see both the reduction PHI and its only consumer
in vectorizable_reduction and there are no intermediate stmts
participating.  */
-  stmt_vec_info use_stmt_info;
-  tree reduc_phi_result = gimple_phi_result (reduc_def_phi);
   if (ncopies > 1
   && (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
-  && (use_stmt_info = loop_vinfo->lookup_single_use (reduc_phi_result))
-  && (!STMT_VINFO_IN_PATTERN_P (use_stmt_info)
- || !STMT_VINFO_PATTERN_DEF_SEQ (use_stmt_info))
-  && vect_stmt_to_vectorize (use_stmt_info) == stmt_info)
+  && reduc_chain_length == 1)
 single_defuse_cycle = true;
 
   if (single_defuse_cycle || lane_reduc_code_p)


Re: [PATCH][RFC] Param to options conversion (demo).

2019-11-06 Thread Richard Biener
On Wed, Nov 6, 2019 at 9:56 AM Martin Liška  wrote:
>
> On 11/5/19 5:01 PM, Richard Biener wrote:
> > On Tue, Nov 5, 2019 at 4:22 PM Martin Liška  wrote:
> >>
> >> On 11/5/19 3:13 PM, Richard Biener wrote:
> >>> On Thu, Oct 31, 2019 at 2:17 PM Martin Liška  wrote:
> 
>  On 10/31/19 2:16 PM, Martin Liška wrote:
> > On 10/31/19 2:01 PM, Martin Liška wrote:
> >> Hi.
> >>
> >> Based on the discussion with Honza and Richard I'm sending a proposal
> >> for conversion of param machinery into the existing option machinery.
> >> Our motivation for the change is to provide per function param values,
> >> similarly what 'Optimization' keyword does for options.
> >>
> >> Right now, we support the following format:
> >> gcc --param=lto-partitions=4 /tmp/main.c -c
> >>
> >> And so that I decided to name newly the params like:
> >>
> >> -param=ipa-sra-ptr-growth-factor=
> >> Common Joined UInteger Var(param_ipa_sra_ptr_growth_factor) Init(2) 
> >> Param Optimization
> >> Maximum allowed growth of number and total size of new parameters
> >> that ipa-sra replaces a pointer to an aggregate with.
> >>
> >> And I learnt decoder to parse '--param' 'name=value' as 
> >> '--param=name=value'. Doing that
> >> the transformation works. Help provides reasonable output as well:
> >>
> >> $ ./xgcc -B. --param predictable-branch-outcome=5  /tmp/main.c -c -Q 
> >> --help=param
> >> The --param option recognizes the following as parameters:
> >> --param=ipa-sra-ptr-growth-factor= 2
> >> --param=predictable-branch-outcome=<0,50>  5
> >>
> >> Thoughts?
> >> Thanks,
> >> Martin
> >>
> >> ---
> >>gcc/common.opt| 18 +++---
> >>gcc/ipa-sra.c |  3 +--
> >>gcc/opt-functions.awk |  3 ++-
> >>gcc/opts-common.c |  9 +
> >>gcc/opts.c| 36 
> >>gcc/params.def| 10 --
> >>gcc/predict.c |  4 ++--
> >>7 files changed, 25 insertions(+), 58 deletions(-)
> >>
> >>
> >
> > I forgot to add gcc-patches to To.
> >
> > Martin
> >
> 
>  + the patch.
> >>>
> >>> Nice.
> >>
> >> Thanks.
> >>
> >>>
> >>> I wonder if we can auto-generate params.h so that
> >>> PARAM_VALUE (...) can continue to "work"?  But maybe that's too much
> >>> and against making them first-class (but "unsupported") options.  At least
> >>> it would make the final patch _much_ smaller... (one could think of
> >>> auto-generating an enum and using an array of params for the storage
> >>> again - but then possibly split for [non-]Optimization - ugh).  If we
> >>> (auto-)name
> >>> the variables all-uppercase like PARAM_IPA_SRA_PTR_GROWTH_FACTOR
> >>> we could have
> >>>
> >>> #define PARAM_VALUE (x) x
> >>>
> >>> ... (that said, everything that helps making the transition hit GCC 10
> >>> is appreciated ;))
> >>
> >> Well, to be honest I would like to get rid of the current syntax:
> >> PARAM_VALUE (PARAM_PREDICTABLE_BRANCH_OUTCOME) and replace it with 
> >> something
> >> what we have for normal options: param_predictable_branch_outcome.
> >> It will require a quite big mechanical change in usage, but I can do
> >> the replacement almost automatically.
> >
> > OK, the more interesting uses are probably maybe_set_param_value ...
>
> Which is actually for free ;) Please see the attached patch where I introduced
> a new macro SET_OPTION_IF_UNSET.

works for me.

> Do you like it so that I can transform current options with that:
>
> -  if (!opts_set->x_flag_branch_probabilities)
> -opts->x_flag_branch_probabilities = value;
> +  SET_OPTION_IF_UNSET (opts, opts_set, flag_branch_probabilities, value);
>
> ?
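A hedged sketch of the shape such an "only set if the user hasn't" macro could take, with the real options machinery's x_-prefixed fields simplified away and schematic names throughout:

```c
#include <assert.h>

/* Schematic model: OPTS_SET records which options were given explicitly
   on the command line, OPTS holds the current values.  Real GCC options
   live behind x_-prefixed fields; this is a simplified illustration.  */
struct model_opts { int flag_branch_probabilities; };

#define MODEL_SET_OPTION_IF_UNSET(opts, opts_set, field, value) \
  do {                                                          \
    if (!(opts_set)->field)                                     \
      (opts)->field = (value);                                  \
  } while (0)
```

The value of wrapping the two-line pattern is that every call site states the intent ("default unless the user overrode it") instead of repeating the opts/opts_set pairing by hand.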
>
> >
> >>>
> >>> For
> >>>
> >>> +-param=ipa-sra-ptr-growth-factor=
> >>> +Common Joined UInteger Var(param_ipa_sra_ptr_growth_factor) Init(2)
> >>> Param Optimization
> >>>
> >>> I wonder if both Var(...) and Param can be "autodetected" (aka
> >>> actually required)?
> >>
> >> Right now, Var is probably not required. Param can be handled by putting 
> >> all
> >> the params into a new param.opt file.
> >
> > Param could be also autodetected from the name of the option (so could Var).
>
> Well, looking at the current options, we are also quite explicit about 
> various option flags.
> I'll auto-generate the new params.opt file, so setting Var and Param will be 
> for free.

ok

> > Why is Var not required?
>
> You are right, it's required.
>
> >
> >>>
> >>> At least the core of the patch looks nicely small!  How do the OPT_ enum 
> >>> values
> >>> for a --param look like?
> >>
> >> Yep, I also like how small it is.
> >>
> >> OPT__param_ipa_sra_ptr_growth_factor_ = 62,/* 
> >> --param=ipa-sra-ptr-growth-factor= */
> >> OPT__param_predictable_branch_outcome_ = 63,/* 
> >> --param=predictable-branch-outcome= */
> >
> > 

Re: [4/6] Optionally pick the cheapest loop_vec_info

2019-11-06 Thread Richard Biener
On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford
 wrote:
>
> This patch adds a mode in which the vectoriser tries each available
> base vector mode and picks the one with the lowest cost.  For now
> the behaviour is behind a default-off --param, but a later patch
> enables it by default for SVE.
>
> The patch keeps the current behaviour of preferring a VF of
> loop->simdlen over any larger or smaller VF, regardless of costs
> or target preferences.

Can you avoid using a --param for this?  Instead I'd suggest to
amend the vectorize_modes target hook to return some
flags like VECT_FIRST_MODE_WINS.  We'd eventually want
to make the target able to say do-not-vectorize-epiloges-of-MODE
(I think we may not want to vectorize SSE vectorized loop
epilogues with MMX-with-SSE or GPRs for example).  I guess
for the latter we'd use a new target hook.

Otherwise looks reasonable.

Richard.

>
> 2019-11-05  Richard Sandiford  
>
> gcc/
> * params.def (vect-compare-loop-costs): New param.
> * doc/invoke.texi: Document it.
> * tree-vectorizer.h (_loop_vec_info::vec_outside_cost)
> (_loop_vec_info::vec_inside_cost): New member variables.
> * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them.
> (vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions.
> (vect_analyze_loop): When the new parameter allows, try vectorizing
> the loop with each available vector mode and picking the one with
> the lowest cost.
> (vect_estimate_min_profitable_iters): Record the computed costs
> in the loop_vec_info.
>
> Index: gcc/params.def
> ===
> --- gcc/params.def  2019-10-31 17:15:25.470517368 +
> +++ gcc/params.def  2019-11-05 14:19:58.781197820 +
> @@ -661,6 +661,13 @@ DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIG
>   "Maximum number of loop peels to enhance alignment of data 
> references in a loop.",
>   -1, -1, 64)
>
> +DEFPARAM(PARAM_VECT_COMPARE_LOOP_COSTS,
> +"vect-compare-loop-costs",
> +"Whether to try vectorizing a loop using each supported"
> +" combination of vector types and picking the version with the"
> +" lowest cost.",
> +0, 0, 1)
> +
>  DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS,
>  "max-cselib-memory-locations",
>  "The maximum memory locations recorded by cselib.",
> Index: gcc/doc/invoke.texi
> ===
> --- gcc/doc/invoke.texi 2019-11-04 21:13:57.611756365 +
> +++ gcc/doc/invoke.texi 2019-11-05 14:19:58.777197850 +
> @@ -11563,6 +11563,12 @@ doing loop versioning for alias in the v
>  The maximum number of loop peels to enhance access alignment
>  for vectorizer. Value -1 means no limit.
>
> +@item vect-compare-loop-costs
> +Whether to try vectorizing a loop using each supported combination of
> +vector types and picking the version with the lowest cost.  This parameter
> +has no effect when @option{-fno-vect-cost-model} or
> +@option{-fvect-cost-model=unlimited} are used.
> +
>  @item max-iterations-to-track
>  The maximum number of iterations of a loop the brute-force algorithm
>  for analysis of the number of iterations of the loop tries to evaluate.
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2019-11-05 14:19:33.829371745 +
> +++ gcc/tree-vectorizer.h   2019-11-05 14:19:58.781197820 +
> @@ -601,6 +601,13 @@ typedef class _loop_vec_info : public ve
>/* Cost of a single scalar iteration.  */
>int single_scalar_iteration_cost;
>
> +  /* The cost of the vector prologue and epilogue, including peeled
> + iterations and set-up code.  */
> +  int vec_outside_cost;
> +
> +  /* The cost of the vector loop body.  */
> +  int vec_inside_cost;
> +
>/* Is the loop vectorizable? */
>bool vectorizable;
>
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2019-11-05 14:19:33.829371745 +
> +++ gcc/tree-vect-loop.c2019-11-05 14:19:58.781197820 +
> @@ -830,6 +830,8 @@ _loop_vec_info::_loop_vec_info (class lo
>  scan_map (NULL),
>  slp_unrolling_factor (1),
>  single_scalar_iteration_cost (0),
> +vec_outside_cost (0),
> +vec_inside_cost (0),
>  vectorizable (false),
>  can_fully_mask_p (true),
>  fully_masked_p (false),
> @@ -2373,6 +2375,80 @@ vect_analyze_loop_2 (loop_vec_info loop_
>goto start_over;
>  }
>
> +/* Return true if vectorizing a loop using NEW_LOOP_VINFO appears
> +   to be better than vectorizing it using OLD_LOOP_VINFO.  Assume that
> +   OLD_LOOP_VINFO is better unless something specifically indicates
> +   otherwise.
> +
> +   Note that this deliberately isn't a partial order.  */
> +
> +static bool
> +vect_better_loop_vinfo_p 

Re: [11a/n] Avoid retrying with the same vector modes

2019-11-06 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Nov 6, 2019 at 12:02 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Wed, Nov 6, 2019 at 11:21 AM Richard Sandiford
>> >  wrote:
>> >>
>> >> Richard Biener  writes:
>> >> > On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
>> >> >  wrote:
>> >> >>
>> >> >> Patch 12/n makes the AArch64 port add four entries to
>> >> >> autovectorize_vector_modes.  Each entry describes a different
>> >> >> vector mode assignment for vector code that mixes 8-bit, 16-bit,
>> >> >> 32-bit and 64-bit elements.  But if (as usual) the vector code has
>> >> >> fewer element sizes than that, we could end up trying the same
>> >> >> combination of vector modes multiple times.  This patch adds a
>> >> >> check to prevent that.
>> >> >>
>> >> >> As before: each patch tested individually on aarch64-linux-gnu and the
>> >> >> series as a whole on x86_64-linux-gnu.
>> >> >>
>> >> >>
>> >> >> 2019-11-04  Richard Sandiford  
>> >> >>
>> >> >> gcc/
>> >> >> * tree-vectorizer.h (vec_info::mode_set): New typedef.
>> >> >> (vec_info::used_vector_mode): New member variable.
>> >> >> (vect_chooses_same_modes_p): Declare.
>> >> >> * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
>> >> >> chosen vector mode in vec_info::used_vector_mode.
>> >> >> (vect_chooses_same_modes_p): New function.
>> >> >> * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
>> >> >> the same vector statements multiple times.
>> >> >> * tree-vect-slp.c (vect_slp_bb_region): Likewise.
>> >> >>
>> >> >> Index: gcc/tree-vectorizer.h
>> >> >> ===
>> >> >> --- gcc/tree-vectorizer.h   2019-11-05 10:48:11.246092351 +
>> >> >> +++ gcc/tree-vectorizer.h   2019-11-05 10:57:41.662071145 +
>> >> >> @@ -298,6 +298,7 @@ typedef std::pair<tree, tree> vec_object
>> >> >>  /* Vectorizer state common between loop and basic-block 
>> >> >> vectorization.  */
>> >> >>  class vec_info {
>> >> >>  public:
>> >> >> +  typedef hash_set > 
>> >> >> mode_set;
>> >> >>enum vec_kind { bb, loop };
>> >> >>
>> >> >>vec_info (vec_kind, void *, vec_info_shared *);
>> >> >> @@ -335,6 +336,9 @@ typedef std::pair<tree, tree> vec_object
>> >> >>/* Cost data used by the target cost model.  */
>> >> >>void *target_cost_data;
>> >> >>
>> >> >> +  /* The set of vector modes used in the vectorized region.  */
>> >> >> +  mode_set used_vector_modes;
>> >> >> +
>> >> >>/* The argument we should pass to related_vector_mode when looking 
>> >> >> up
>> >> >>   the vector mode for a scalar mode, or VOIDmode if we haven't yet
>> >> >>   made any decisions about which vector modes to use.  */
>> >> >> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
>> >> >>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
>> >> >>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
>> >> >>  extern tree get_same_sized_vectype (tree, tree);
>> >> >> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
>> >> >>  extern bool vect_get_loop_mask_type (loop_vec_info);
>> >> >>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type 
>> >> >> *,
>> >> >> stmt_vec_info * = NULL, gimple ** = 
>> >> >> NULL);
>> >> >> Index: gcc/tree-vect-stmts.c
>> >> >> ===
>> >> >> --- gcc/tree-vect-stmts.c   2019-11-05 10:48:11.242092379 +
>> >> >> +++ gcc/tree-vect-stmts.c   2019-11-05 10:57:41.662071145 +
>> >> >> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
>> >> >>   scalar_type);
>> >> >>if (vectype && vinfo->vector_mode == VOIDmode)
>> >> >>  vinfo->vector_mode = TYPE_MODE (vectype);
>> >> >> +
>> >> >> +  if (vectype)
>> >> >> +vinfo->used_vector_modes.add (TYPE_MODE (vectype));
>> >> >> +
>> >> >
>> >> > Do we actually end up _using_ all types returned by this function?
>> >>
>> >> No, not all of them, so it's a bit crude.  E.g. some types might end up
>> >> not being relevant after pattern recognition, or after we've made a
>> >> final decision about which parts of an address calculation to include
>> >> in a gather or scatter op.  So we can still end up retrying the same
>> >> thing even after the patch.
>> >>
>> >> The problem is that we're trying to avoid pointless retries on failure
>> >> as well as success, so we could end up stopping at arbitrary points.
>> >> I wasn't sure where else to handle this.
>> >
>> > Yeah, I think this "iterating" is somewhat bogus (crude) now.
>>
>> I think it was crude even before the series though. :-)  Not sure the
>> series is making things worse.
>>
>> The problem is that there's a chicken-and-egg problem between how
>> we decide to vectorise and which vector subarchitecture and VF we use.
>> E.g. if we have:
>>
>>   

Re: [PATCH] Report errors on inconsistent OpenACC nested reduction clauses

2019-11-06 Thread Harwath, Frederik
Hi Thomas,

On 05.11.19 15:22, Thomas Schwinge wrote:

> For your convenience, I'm attaching an incremental patch, to be merged
> into yours.> [...]> With that addressed, OK for trunk.

Thank you. I have merged the patches and committed.

> A few more comments to address separately, later on.

I will look into your remaining questions.

Best regards,
Frederik



Re: [11a/n] Avoid retrying with the same vector modes

2019-11-06 Thread Richard Biener
On Wed, Nov 6, 2019 at 12:02 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Wed, Nov 6, 2019 at 11:21 AM Richard Sandiford
> >  wrote:
> >>
> >> Richard Biener  writes:
> >> > On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> Patch 12/n makes the AArch64 port add four entries to
> >> >> autovectorize_vector_modes.  Each entry describes a different
> >> >> vector mode assignment for vector code that mixes 8-bit, 16-bit,
> >> >> 32-bit and 64-bit elements.  But if (as usual) the vector code has
> >> >> fewer element sizes than that, we could end up trying the same
> >> >> combination of vector modes multiple times.  This patch adds a
> >> >> check to prevent that.
> >> >>
> >> >> As before: each patch tested individually on aarch64-linux-gnu and the
> >> >> series as a whole on x86_64-linux-gnu.
> >> >>
> >> >>
> >> >> 2019-11-04  Richard Sandiford  
> >> >>
> >> >> gcc/
> >> >> * tree-vectorizer.h (vec_info::mode_set): New typedef.
> >> >> (vec_info::used_vector_mode): New member variable.
> >> >> (vect_chooses_same_modes_p): Declare.
> >> >> * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
> >> >> chosen vector mode in vec_info::used_vector_mode.
> >> >> (vect_chooses_same_modes_p): New function.
> >> >> * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
> >> >> the same vector statements multiple times.
> >> >> * tree-vect-slp.c (vect_slp_bb_region): Likewise.
> >> >>
> >> >> Index: gcc/tree-vectorizer.h
> >> >> ===
> >> >> --- gcc/tree-vectorizer.h   2019-11-05 10:48:11.246092351 +
> >> >> +++ gcc/tree-vectorizer.h   2019-11-05 10:57:41.662071145 +
> >> >> @@ -298,6 +298,7 @@ typedef std::pair<tree, tree> vec_object
> >> >>  /* Vectorizer state common between loop and basic-block vectorization. 
> >> >>  */
> >> >>  class vec_info {
> >> >>  public:
> >> >> +  typedef hash_set > 
> >> >> mode_set;
> >> >>enum vec_kind { bb, loop };
> >> >>
> >> >>vec_info (vec_kind, void *, vec_info_shared *);
> >> >> @@ -335,6 +336,9 @@ typedef std::pair<tree, tree> vec_object
> >> >>/* Cost data used by the target cost model.  */
> >> >>void *target_cost_data;
> >> >>
> >> >> +  /* The set of vector modes used in the vectorized region.  */
> >> >> +  mode_set used_vector_modes;
> >> >> +
> >> >>/* The argument we should pass to related_vector_mode when looking up
> >> >>   the vector mode for a scalar mode, or VOIDmode if we haven't yet
> >> >>   made any decisions about which vector modes to use.  */
> >> >> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
> >> >>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
> >> >>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
> >> >>  extern tree get_same_sized_vectype (tree, tree);
> >> >> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
> >> >>  extern bool vect_get_loop_mask_type (loop_vec_info);
> >> >>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
> >> >> stmt_vec_info * = NULL, gimple ** = 
> >> >> NULL);
> >> >> Index: gcc/tree-vect-stmts.c
> >> >> ===
> >> >> --- gcc/tree-vect-stmts.c   2019-11-05 10:48:11.242092379 +
> >> >> +++ gcc/tree-vect-stmts.c   2019-11-05 10:57:41.662071145 +
> >> >> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
> >> >>   scalar_type);
> >> >>if (vectype && vinfo->vector_mode == VOIDmode)
> >> >>  vinfo->vector_mode = TYPE_MODE (vectype);
> >> >> +
> >> >> +  if (vectype)
> >> >> +vinfo->used_vector_modes.add (TYPE_MODE (vectype));
> >> >> +
> >> >
> >> > Do we actually end up _using_ all types returned by this function?
> >>
> >> No, not all of them, so it's a bit crude.  E.g. some types might end up
> >> not being relevant after pattern recognition, or after we've made a
> >> final decision about which parts of an address calculation to include
> >> in a gather or scatter op.  So we can still end up retrying the same
> >> thing even after the patch.
> >>
> >> The problem is that we're trying to avoid pointless retries on failure
> >> as well as success, so we could end up stopping at arbitrary points.
> >> I wasn't sure where else to handle this.
> >
> > Yeah, I think this "iterating" is somewhat bogus (crude) now.
>
> I think it was crude even before the series though. :-)  Not sure the
> series is making things worse.
>
> The problem is that there's a chicken-and-egg problem between how
> we decide to vectorise and which vector subarchitecture and VF we use.
> E.g. if we have:
>
>   unsigned char *x, *y;
>   ...
>   x[i] = (unsigned short) (x[i] + y[i] + 1) >> 1;
>
> do we build the SLP graph on the assumption that we need to use short
> 
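The quoted statement, completed into a self-contained scalar loop (the function name and the driver below are illustrative), shows why the vectorizer has to mix element sizes here: each 8-bit element is widened for the add and shifted back down before the narrowing store:

```c
#include <assert.h>

/* Rounding average of two unsigned char arrays, as in the example above.
   The additions promote to int, the cast narrows to 16 bits, and the
   shift produces a value that fits back into 8 bits.  */
static void
average_rounding (unsigned char *x, const unsigned char *y, int n)
{
  for (int i = 0; i < n; i++)
    x[i] = (unsigned short) (x[i] + y[i] + 1) >> 1;
}
```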

Re: [1/4] Restructure vect_analyze_loop

2019-11-06 Thread Richard Biener
On Mon, Nov 4, 2019 at 4:26 PM Richard Sandiford
 wrote:
>
> Once vect_analyze_loop has found a valid loop_vec_info X, we carry
> on searching for alternatives if (1) X doesn't satisfy simdlen or
> (2) we want to vectorise the epilogue of X.  I have a patch that
> optionally adds a third reason: we want to see if there are cheaper
> alternatives to X.
>
> This patch restructures vect_analyze_loop so that it's easier
> to add more reasons for continuing.  There's supposed to be no
> behavioural change.
>
> If we wanted to, we could allow vectorisation of epilogues once
> loop->simdlen has been reached by changing "loop->simdlen" to
> "simdlen" in the new vect_epilogues condition.  That should be
> a separate change though.
>
> This may conflict with Andre's fix for libgomp; I'll adjust if
> that goes in first.

OK.

>
> 2019-11-04  Richard Sandiford  
>
> gcc/
> * tree-vect-loop.c (vect_analyze_loop): Break out of the main
> loop when we've finished, rather than returning directly from
> the loop.  Use a local variable to track whether we're still
> searching for the preferred simdlen.  Make vect_epilogues
> record whether the next iteration should try to treat the
> loop as an epilogue.
>
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2019-10-31 17:15:24.838521761 +
> +++ gcc/tree-vect-loop.c2019-11-04 15:17:35.924940111 +
> @@ -2383,16 +2383,13 @@ vect_analyze_loop (class loop *loop, loo
>poly_uint64 lowest_th = 0;
>unsigned vectorized_loops = 0;
>
> -  /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is enabled, this
> - is not a simd loop and it is the most inner loop.  */
> -  bool vect_epilogues
> -= !loop->simdlen && loop->inner == NULL
> -  && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK);
> +  bool vect_epilogues = false;
> +  opt_result res = opt_result::success ();
> +  unsigned HOST_WIDE_INT simdlen = loop->simdlen;
>while (1)
>  {
>/* Check the CFG characteristics of the loop (nesting, entry/exit).  */
> -  opt_loop_vec_info loop_vinfo
> -   = vect_analyze_loop_form (loop, shared);
> +  opt_loop_vec_info loop_vinfo = vect_analyze_loop_form (loop, shared);
>if (!loop_vinfo)
> {
>   if (dump_enabled_p ())
> @@ -2407,67 +2404,70 @@ vect_analyze_loop (class loop *loop, loo
>
>if (orig_loop_vinfo)
> LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = orig_loop_vinfo;
> -  else if (vect_epilogues && first_loop_vinfo)
> +  else if (vect_epilogues)
> LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = first_loop_vinfo;
>
> -  opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
> +  res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
>if (next_size == 0)
> autodetected_vector_size = loop_vinfo->vector_size;
>
> +  loop->aux = NULL;
>if (res)
> {
>   LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1;
>   vectorized_loops++;
>
> - if ((loop->simdlen
> -  && maybe_ne (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> -   (unsigned HOST_WIDE_INT) loop->simdlen))
> - || vect_epilogues)
> + /* Once we hit the desired simdlen for the first time,
> +discard any previous attempts.  */
> + if (simdlen
> + && known_eq (LOOP_VINFO_VECT_FACTOR (loop_vinfo), simdlen))
> {
> - if (first_loop_vinfo == NULL)
> -   {
> - first_loop_vinfo = loop_vinfo;
> - lowest_th
> -   = LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo);
> - loop->aux = NULL;
> -   }
> - else
> -   {
> - /* Keep track of vector sizes that we know we can vectorize
> -the epilogue with.  Only vectorize first epilogue.  */
> - if (vect_epilogues
> - && first_loop_vinfo->epilogue_vinfos.is_empty ())
> -   {
> - loop->aux = NULL;
> - first_loop_vinfo->epilogue_vinfos.reserve (1);
> - first_loop_vinfo->epilogue_vinfos.quick_push (loop_vinfo);
> - LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = first_loop_vinfo;
> - poly_uint64 th
> -   = LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);
> - gcc_assert (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
> - || maybe_ne (lowest_th, 0U));
> - /* Keep track of the known smallest versioning
> -threshold.  */
> - if (ordered_p (lowest_th, th))
> -   lowest_th = ordered_min (lowest_th, th);
> -   }
> - else
> -   delete 

Re: [2/4] Check the VF is small enough for an epilogue loop

2019-11-06 Thread Richard Biener
On Mon, Nov 4, 2019 at 4:29 PM Richard Sandiford
 wrote:
>
> The number of iterations of an epilogue loop is always smaller than the
> VF of the main loop.  vect_analyze_loop_costing was taking this into
> account when deciding whether the loop is cheap enough to vectorise,
> but that has no effect with the unlimited cost model.  We need to use
> a separate check for correctness as well.
>
> This can happen if the sizes returned by autovectorize_vector_sizes
> happen to be out of order, e.g. because the target prefers smaller
> vectors.  It can also happen with later patches if two vectorisation
> attempts happen to end up with the same VF.

OK.

>
> 2019-11-04  Richard Sandiford  
>
> gcc/
> * tree-vect-loop.c (vect_analyze_loop_2): When vectorizing an
> epilogue loop, make sure that the VF is small enough or that
> the epilogue loop can be fully-masked.
>
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2019-11-04 15:17:35.924940111 +
> +++ gcc/tree-vect-loop.c2019-11-04 15:17:50.736838681 +
> @@ -2142,6 +2142,16 @@ vect_analyze_loop_2 (loop_vec_info loop_
>" support peeling for gaps.\n");
>  }
>
> +  /* If we're vectorizing an epilogue loop, we either need a fully-masked
> + loop or a loop that has a lower VF than the main loop.  */
> +  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +  && !LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> +  && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
> +  LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
> +return opt_result::failure_at (vect_location,
> +  "Vectorization factor too high for"
> +  " epilogue loop.\n");
> +
>/* Check the costings of the loop make vectorizing worthwhile.  */
>res = vect_analyze_loop_costing (loop_vinfo);
>if (res < 0)


Re: [3/6] Avoid accounting for non-existent vector loop versioning

2019-11-06 Thread Richard Biener
On Tue, Nov 5, 2019 at 3:28 PM Richard Sandiford
 wrote:
>
> vect_analyze_loop_costing uses two profitability thresholds: a runtime
> one and a static compile-time one.  The runtime one is simply the point
> at which the vector loop is cheaper than the scalar loop, while the
> static one also takes into account the cost of choosing between the
> scalar and vector loops at runtime.  We compare this static cost against
> the expected execution frequency to decide whether it's worth generating
> any vector code at all.
>
> However, we never reclaimed the cost of applying the runtime threshold
> if it turned out that the vector code can always be used.  And we only
> know whether that's true once we've calculated what the runtime
> threshold would be.

OK.

>
> 2019-11-04  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (vect_apply_runtime_profitability_check_p):
> New function.
> * tree-vect-loop-manip.c (vect_loop_versioning): Use it.
> * tree-vect-loop.c (vect_analyze_loop_2): Likewise.
> (vect_transform_loop): Likewise.
> (vect_analyze_loop_costing): Don't take the cost of versioning
> into account for the static profitability threshold if it turns
> out that no versioning is needed.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2019-11-05 11:14:42.786884473 +
> +++ gcc/tree-vectorizer.h   2019-11-05 14:19:33.829371745 +
> @@ -1557,6 +1557,17 @@ vect_get_scalar_dr_size (dr_vec_info *dr
>return tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_info->dr;
>  }
>
> +/* Return true if LOOP_VINFO requires a runtime check for whether the
> +   vector loop is profitable.  */
> +
> +inline bool
> +vect_apply_runtime_profitability_check_p (loop_vec_info loop_vinfo)
> +{
> +  unsigned int th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> +  return (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> + && th >= vect_vf_for_cost (loop_vinfo));
> +}
> +
>  /* Source location + hotness information. */
>  extern dump_user_location_t vect_location;
>
> Index: gcc/tree-vect-loop-manip.c
> ===
> --- gcc/tree-vect-loop-manip.c  2019-11-05 10:38:31.838181047 +
> +++ gcc/tree-vect-loop-manip.c  2019-11-05 14:19:33.825371773 +
> @@ -3173,8 +3173,7 @@ vect_loop_versioning (loop_vec_info loop
>  = LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND (loop_vinfo);
>unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
>
> -  if (th >= vect_vf_for_cost (loop_vinfo)
> -  && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +  if (vect_apply_runtime_profitability_check_p (loop_vinfo)
>&& !ordered_p (th, versioning_threshold))
>  cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
>  build_int_cst (TREE_TYPE (scalar_loop_iters),
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2019-11-05 11:14:42.782884501 +
> +++ gcc/tree-vect-loop.c2019-11-05 14:19:33.829371745 +
> @@ -1689,6 +1689,24 @@ vect_analyze_loop_costing (loop_vec_info
>return 0;
>  }
>
> +  /* The static profitablity threshold min_profitable_estimate includes
> + the cost of having to check at runtime whether the scalar loop
> + should be used instead.  If it turns out that we don't need or want
> + such a check, the threshold we should use for the static estimate
> + is simply the point at which the vector loop becomes more profitable
> + than the scalar loop.  */
> +  if (min_profitable_estimate > min_profitable_iters
> +  && !LOOP_REQUIRES_VERSIONING (loop_vinfo)
> +  && !LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> +  && !LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> +  && !vect_apply_runtime_profitability_check_p (loop_vinfo))
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_NOTE, vect_location, "no need for a runtime"
> +" choice between the scalar and vector loops\n");
> +  min_profitable_estimate = min_profitable_iters;
> +}
> +
>HOST_WIDE_INT estimated_niter;
>
>/* If we are vectorizing an epilogue then we know the maximum number of
> @@ -2225,8 +2243,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
>
>/*  Use the same condition as vect_transform_loop to decide when to use
>   the cost to determine a versioning threshold.  */
> -  if (th >= vect_vf_for_cost (loop_vinfo)
> - && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +  if (vect_apply_runtime_profitability_check_p (loop_vinfo)
>   && ordered_p (th, niters_th))
> niters_th = ordered_max (poly_uint64 (th), niters_th);
>
> @@ -8268,14 +8285,13 @@ vect_transform_loop (loop_vec_info loop_
>   run at least the (estimated) vectorization factor number of 

Re: [2/6] Don't assign a cost to vectorizable_assignment

2019-11-06 Thread Richard Biener
On Tue, Nov 5, 2019 at 3:27 PM Richard Sandiford
 wrote:
>
> vectorizable_assignment handles true SSA-to-SSA copies (which hopefully
> we don't see in practice) and no-op conversions that are required
> to maintain correct gimple, such as changes between signed and
> unsigned types.  These cases shouldn't generate any code and so
> shouldn't count against either the scalar or vector costs.
>
> Later patches test this, but it seemed worth splitting out.

Hmm, but you have to adjust vect_compute_single_scalar_iteration_cost and
possibly the SLP cost walk as well, otherwise we're artificially making
those copies cheaper when vectorized.

>
> 2019-11-04  Richard Sandiford  
>
> gcc/
> * tree-vect-stmts.c (vectorizable_assignment): Don't add a cost.
>
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2019-11-05 14:17:43.330141911 +
> +++ gcc/tree-vect-stmts.c   2019-11-05 14:18:39.169752725 +
> @@ -5305,7 +5305,7 @@ vectorizable_conversion (stmt_vec_info s
>  static bool
>  vectorizable_assignment (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>  stmt_vec_info *vec_stmt, slp_tree slp_node,
> -stmt_vector_for_cost *cost_vec)
> +stmt_vector_for_cost *)
>  {
>tree vec_dest;
>tree scalar_dest;
> @@ -5313,7 +5313,6 @@ vectorizable_assignment (stmt_vec_info s
>loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>tree new_temp;
>enum vect_def_type dt[1] = {vect_unknown_def_type};
> -  int ndts = 1;
>int ncopies;
>int i, j;
>  vec<tree> vec_oprnds = vNULL;
> @@ -5409,7 +5408,8 @@ vectorizable_assignment (stmt_vec_info s
>  {
>STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type;
>DUMP_VECT_SCOPE ("vectorizable_assignment");
> -  vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node, cost_vec);
> +  /* Don't add a cost here.  SSA copies and no-op conversions
> +shouldn't generate any code in either scalar or vector form.  */
>return true;
>  }
>


Re: [5/6] Account for the cost of generating loop masks

2019-11-06 Thread Richard Biener
On Tue, Nov 5, 2019 at 3:31 PM Richard Sandiford
 wrote:
>
> We didn't take the cost of generating loop masks into account, and so
> tended to underestimate the cost of loops that need multiple masks.

OK.

>
> 2019-11-05  Richard Sandiford  
>
> gcc/
> * tree-vect-loop.c (vect_estimate_min_profitable_iters): Include
> the cost of generating loop masks.
>
> gcc/testsuite/
> * gcc.target/aarch64/sve/mask_struct_store_3.c: Add
> -fno-vect-cost-model.
> * gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise.
> * gcc.target/aarch64/sve/peel_ind_3.c: Likewise.
> * gcc.target/aarch64/sve/peel_ind_3_run.c: Likewise.
>
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2019-11-05 14:19:58.781197820 +
> +++ gcc/tree-vect-loop.c2019-11-05 14:20:40.188909187 +
> @@ -3435,6 +3435,32 @@ vect_estimate_min_profitable_iters (loop
>   si->kind, si->stmt_info, si->misalign,
>   vect_epilogue);
> }
> +
> +  /* Calculate how many masks we need to generate.  */
> +  unsigned int num_masks = 0;
> +  rgroup_masks *rgm;
> +  unsigned int num_vectors_m1;
> +  FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), num_vectors_m1, rgm)
> +   if (rgm->mask_type)
> + num_masks += num_vectors_m1 + 1;
> +  gcc_assert (num_masks > 0);
> +
> +  /* In the worst case, we need to generate each mask in the prologue
> +and in the loop body.  One of the loop body mask instructions
> +replaces the comparison in the scalar loop, and since we don't
> +count the scalar comparison against the scalar body, we shouldn't
> +count that vector instruction against the vector body either.
> +
> +Sometimes we can use unpacks instead of generating prologue
> +masks and sometimes the prologue mask will fold to a constant,
> +so the actual prologue cost might be smaller.  However, it's
> +simpler and safer to use the worst-case cost; if this ends up
> +being the tie-breaker between vectorizing or not, then it's
> +probably better not to vectorize.  */
> +  (void) add_stmt_cost (target_cost_data, num_masks, vector_stmt,
> +   NULL, 0, vect_prologue);
> +  (void) add_stmt_cost (target_cost_data, num_masks - 1, vector_stmt,
> +   NULL, 0, vect_body);
>  }
>else if (npeel < 0)
>  {
> Index: gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3.c
> ===
> --- gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3.c  2019-03-08 18:14:29.768994780 +
> +++ gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3.c  2019-11-05 14:20:40.184909216 +
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -ftree-vectorize -ffast-math" } */
> +/* { dg-options "-O2 -ftree-vectorize -ffast-math -fno-vect-cost-model" } */
>
>  #include 
>
> Index: gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3_run.c
> ===
> --- gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3_run.c  2019-03-08 18:14:29.772994767 +
> +++ gcc/testsuite/gcc.target/aarch64/sve/mask_struct_store_3_run.c  2019-11-05 14:20:40.184909216 +
> @@ -1,5 +1,5 @@
>  /* { dg-do run { target aarch64_sve_hw } } */
> -/* { dg-options "-O2 -ftree-vectorize -ffast-math" } */
> +/* { dg-options "-O2 -ftree-vectorize -ffast-math -fno-vect-cost-model" } */
>
>  #include "mask_struct_store_3.c"
>
> Index: gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3.c
> ===
> --- gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3.c   2019-03-08 18:14:29.776994751 +
> +++ gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3.c   2019-11-05 14:20:40.184909216 +
> @@ -1,7 +1,7 @@
>  /* { dg-do compile } */
>  /* Pick an arbitrary target for which unaligned accesses are more
> expensive.  */
> -/* { dg-options "-O3 -msve-vector-bits=256 -mtune=thunderx" } */
> +/* { dg-options "-O3 -msve-vector-bits=256 -mtune=thunderx -fno-vect-cost-model" } */
>
>  #define N 32
>  #define MAX_START 8
> Index: gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3_run.c
> ===
> --- gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3_run.c   2019-03-08 18:14:29.784994721 +
> +++ gcc/testsuite/gcc.target/aarch64/sve/peel_ind_3_run.c   2019-11-05 14:20:40.184909216 +
> @@ -1,6 +1,6 @@
>  /* { dg-do run { target aarch64_sve_hw } } */
> -/* { dg-options "-O3 -mtune=thunderx" } */
> -/* { dg-options "-O3 -mtune=thunderx -msve-vector-bits=256" { target aarch64_sve256_hw } } */
> +/* { dg-options "-O3 -mtune=thunderx 

Re: [Patch] Add OpenACC 2.6's no_create

2019-11-06 Thread Thomas Schwinge
Hi Tobias!

On 2019-11-06T00:47:05+0100, I wrote:
> --- a/libgomp/testsuite/libgomp.oacc-fortran/common-block-2.f90
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/common-block-2.f90
> @@ -76,7 +76,9 @@ program main
>  
>!$acc enter data create(b)
>  
> -  !$acc parallel loop pcopy(b)
> +  !$acc parallel loop &
> +  !$acc   no_create(b) ! ... here means 'present(b)'.
> +  !TODO But we get: "libgomp: cuStreamSynchronize error: an illegal memory access was encountered".
>do i = 1, n
>   b(i) = i
>end do

Either I'm completely confused -- always possible ;-) -- or there's
something wrong; see the two attached test cases, not actually related to
Fortran common blocks at all.  If such a basic usage of the 'no_create'
clause doesn't work...?  So, again..., seems that my suspicion was right
that this patch doesn't have sufficient test coverage at all.  Or, I'm
completely confused -- we still have that option, too.  ;-\


Regards
 Thomas


From 38fcb35dcb98b0fd709db72896455895243d8e54 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 6 Nov 2019 13:39:12 +0100
Subject: [PATCH] 'libgomp.oacc-c-c++-common/common-block-2_.c',
 'libgomp.oacc-fortran/common-block-2_.f90'

---
 .../common-block-2_.c | 19 +++
 .../libgomp.oacc-fortran/common-block-2_.f90  | 23 +++
 2 files changed, 42 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/common-block-2_.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/common-block-2_.f90

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/common-block-2_.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/common-block-2_.c
new file mode 100644
index 000..5cf547049ab
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/common-block-2_.c
@@ -0,0 +1,19 @@
+// Adapted/reduced from 'libgomp.oacc-fortran/common-block-2.f90'.
+
+int main()
+{
+#define N 100
+  float b[N];
+
+#pragma acc enter data create(b)
+
+#pragma acc parallel loop \
+  /*present(b)*/ /* ... works.  */ \
+  no_create(b) /* ... here also means 'present(b)', but we get: "libgomp: cuStreamSynchronize error: an illegal memory access was encountered".  */
+  for (int i = 0; i < N; ++i)
+b[i] = i;
+
+#pragma acc exit data delete(b)
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/common-block-2_.f90 b/libgomp/testsuite/libgomp.oacc-fortran/common-block-2_.f90
new file mode 100644
index 000..f3f25869bea
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/common-block-2_.f90
@@ -0,0 +1,23 @@
+! { dg-do run }
+
+! Adapted/reduced from 'libgomp.oacc-fortran/common-block-2.f90'.
+
+program main
+  implicit none
+  integer i
+  integer, parameter :: n = 100
+  real*4 b(n)
+  !common /BLOCK/ b
+
+  !$acc enter data create(b)
+
+  !$acc parallel loop &
+  !!$acc   present(b) ! ... works.
+  !$acc   no_create(b) ! ... here also means 'present(b)', but we get: "libgomp: cuStreamSynchronize error: an illegal memory access was encountered".
+  do i = 1, n
+ b(i) = i
+  end do
+  !$acc end parallel loop
+
+  !$acc exit data delete(b)
+end program main
-- 
2.17.1





[PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses

2019-11-06 Thread frederik
From: frederik 

OpenACC (cf. OpenACC 2.7, section 2.9.11. "reduction clause";
this was first clarified by OpenACC 2.6) requires that, if a
variable is used in reduction clauses on two nested loops, then
there must be reduction clauses for that variable on all loops
that are nested in between the two loops and all these reduction
clauses must use the same operator.
This commit introduces a check for that property which reports
warnings if it is violated.

2019-11-06  Gergö Barany  
Frederik Harwath  
Thomas Schwinge  

gcc/
* omp-low.c (struct omp_context): New fields
local_reduction_clauses, outer_reduction_clauses.
(new_omp_context): Initialize these.
(scan_sharing_clauses): Record reduction clauses on OpenACC constructs.
(scan_omp_for): Check reduction clauses for incorrect nesting.
gcc/testsuite/
* c-c++-common/goacc/nested-reductions-warn.c: New test.
* c-c++-common/goacc/nested-reductions.c: New test.
* gfortran.dg/goacc/nested-reductions-warn.f90: New test.
* gfortran.dg/goacc/nested-reductions.f90: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c:
Add expected warnings about missing reduction clauses.
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
Likewise.

Reviewed-by: Thomas Schwinge 



git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@277875 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog |  10 +
 gcc/omp-low.c |  97 +++
 gcc/testsuite/ChangeLog   |   9 +
 .../goacc/nested-reductions-warn.c| 525 ++
 .../c-c++-common/goacc/nested-reductions.c| 420 +++
 .../goacc/nested-reductions-warn.f90  | 674 ++
 .../gfortran.dg/goacc/nested-reductions.f90   | 540 ++
 libgomp/ChangeLog |  11 +
 .../par-loop-comb-reduction-1.c   |   2 +-
 .../par-loop-comb-reduction-2.c   |   2 +-
 .../par-loop-comb-reduction-3.c   |   2 +-
 .../par-loop-comb-reduction-4.c   |   2 +-
 12 files changed, 2290 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-warn.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/nested-reductions-warn.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/nested-reductions.f90

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7fee0f37e9bf..38160dd631e9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2019-11-06  Gergö Barany  
+   Frederik Harwath  
+   Thomas Schwinge  
+
+   * omp-low.c (struct omp_context): New fields
+   local_reduction_clauses, outer_reduction_clauses.
+   (new_omp_context): Initialize these.
+   (scan_sharing_clauses): Record reduction clauses on OpenACC constructs.
+   (scan_omp_for): Check reduction clauses for incorrect nesting.
+   
 2019-11-06  Jakub Jelinek  
 
PR inline-asm/92352
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 122f42788813..fa76ceba33c6 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -128,6 +128,12 @@ struct omp_context
  corresponding tracking loop iteration variables.  */
  hash_map<tree, tree> *lastprivate_conditional_map;
 
+  /* A tree_list of the reduction clauses in this context.  */
+  tree local_reduction_clauses;
+
+  /* A tree_list of the reduction clauses in outer contexts.  */
+  tree outer_reduction_clauses;
+
   /* Nesting depth of this context.  Used to beautify error messages re
  invalid gotos.  The outermost ctx is depth 1, with depth 0 being
  reserved for the main body of the function.  */
@@ -910,6 +916,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->outer = outer_ctx;
   ctx->cb = outer_ctx->cb;
   ctx->cb.block = NULL;
+  ctx->local_reduction_clauses = NULL;
+  ctx->outer_reduction_clauses = ctx->outer_reduction_clauses;
   ctx->depth = outer_ctx->depth + 1;
 }
   else
@@ -925,6 +933,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->cb.transform_call_graph_edges = CB_CGE_MOVE;
   ctx->cb.adjust_array_error_bounds = true;
   ctx->cb.dont_remap_vla_if_no_change = true;
+  ctx->local_reduction_clauses = NULL;
+  ctx->outer_reduction_clauses = NULL;
   ctx->depth = 1;
 }
 
@@ -1139,6 +1149,11 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
  goto do_private;
 
case OMP_CLAUSE_REDUCTION:
+ if 

Re: [PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses

2019-11-06 Thread Jakub Jelinek
On Wed, Nov 06, 2019 at 01:41:47PM +0100, frede...@codesourcery.com wrote:
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -128,6 +128,12 @@ struct omp_context
>   corresponding tracking loop iteration variables.  */
>  hash_map<tree, tree> *lastprivate_conditional_map;
>  
> +  /* A tree_list of the reduction clauses in this context.  */
> +  tree local_reduction_clauses;
> +
> +  /* A tree_list of the reduction clauses in outer contexts.  */
> +  tree outer_reduction_clauses;

Could there be acc in the name to make it clear it is OpenACC only?

>/* Nesting depth of this context.  Used to beautify error messages re
>   invalid gotos.  The outermost ctx is depth 1, with depth 0 being
>   reserved for the main body of the function.  */
> @@ -910,6 +916,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
>ctx->outer = outer_ctx;
>ctx->cb = outer_ctx->cb;
>ctx->cb.block = NULL;
> +  ctx->local_reduction_clauses = NULL;
> +  ctx->outer_reduction_clauses = ctx->outer_reduction_clauses;
>ctx->depth = outer_ctx->depth + 1;
>  }
>else
> @@ -925,6 +933,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
>ctx->cb.transform_call_graph_edges = CB_CGE_MOVE;
>ctx->cb.adjust_array_error_bounds = true;
>ctx->cb.dont_remap_vla_if_no_change = true;
> +  ctx->local_reduction_clauses = NULL;
> +  ctx->outer_reduction_clauses = NULL;

The = NULL assignments are unnecessary in all 3 cases, ctx is allocated with
XCNEW.
>ctx->depth = 1;
>  }
>  
> @@ -1139,6 +1149,11 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> goto do_private;
>  
>   case OMP_CLAUSE_REDUCTION:
> +   if (is_oacc_parallel (ctx) || is_oacc_kernels (ctx))
> + ctx->local_reduction_clauses
> +   = tree_cons (NULL, c, ctx->local_reduction_clauses);

I'm not sure it is a good idea to use a TREE_LIST in this case; a vec would be
more natural, wouldn't it?
Or, wouldn't it be better to do this checking in the gimplifier instead of
omp-low.c?  There we have splay trees with GOVD_REDUCTION etc. for the
variables, so it wouldn't be O(#reductions^2) compile time?
It is true that the gimplifier doesn't record the reduction codes (after
all, OpenMP has UDRs and so there can be fairly arbitrary reductions).
Consider a million reduction clauses on nested loops.
If gimplifier is not the right spot, then use a splay tree + vector instead?
splay tree for the outer ones, vector for the local ones, and put into both
the clauses, so you can compare reduction code etc.

Jakub



Re: [11a/n] Avoid retrying with the same vector modes

2019-11-06 Thread Richard Biener
On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
 wrote:
>
> Patch 12/n makes the AArch64 port add four entries to
> autovectorize_vector_modes.  Each entry describes a different
> vector mode assignment for vector code that mixes 8-bit, 16-bit,
> 32-bit and 64-bit elements.  But if (as usual) the vector code has
> fewer element sizes than that, we could end up trying the same
> combination of vector modes multiple times.  This patch adds a
> check to prevent that.
>
> As before: each patch tested individually on aarch64-linux-gnu and the
> series as a whole on x86_64-linux-gnu.
>
>
> 2019-11-04  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (vec_info::mode_set): New typedef.
> (vec_info::used_vector_mode): New member variable.
> (vect_chooses_same_modes_p): Declare.
> * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
> chosen vector mode in vec_info::used_vector_mode.
> (vect_chooses_same_modes_p): New function.
> * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
> the same vector statements multiple times.
> * tree-vect-slp.c (vect_slp_bb_region): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2019-11-05 10:48:11.246092351 +
> +++ gcc/tree-vectorizer.h   2019-11-05 10:57:41.662071145 +
> @@ -298,6 +298,7 @@ typedef std::pair vec_object
>  /* Vectorizer state common between loop and basic-block vectorization.  */
>  class vec_info {
>  public:
> +  typedef hash_set<int_hash<machine_mode, E_VOIDmode> > mode_set;
>enum vec_kind { bb, loop };
>
>vec_info (vec_kind, void *, vec_info_shared *);
> @@ -335,6 +336,9 @@ typedef std::pair vec_object
>/* Cost data used by the target cost model.  */
>void *target_cost_data;
>
> +  /* The set of vector modes used in the vectorized region.  */
> +  mode_set used_vector_modes;
> +
>/* The argument we should pass to related_vector_mode when looking up
>   the vector mode for a scalar mode, or VOIDmode if we haven't yet
>   made any decisions about which vector modes to use.  */
> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
>  extern tree get_same_sized_vectype (tree, tree);
> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
>  extern bool vect_get_loop_mask_type (loop_vec_info);
>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
> stmt_vec_info * = NULL, gimple ** = NULL);
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2019-11-05 10:48:11.242092379 +
> +++ gcc/tree-vect-stmts.c   2019-11-05 10:57:41.662071145 +
> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
>   scalar_type);
>if (vectype && vinfo->vector_mode == VOIDmode)
>  vinfo->vector_mode = TYPE_MODE (vectype);
> +
> +  if (vectype)
> +vinfo->used_vector_modes.add (TYPE_MODE (vectype));
> +

Do we actually end up _using_ all types returned by this function?

Otherwise OK.

Richard.

>return vectype;
>  }
>
> @@ -11274,6 +11278,20 @@ get_same_sized_vectype (tree scalar_type
>   scalar_type, nunits);
>  }
>
> +/* Return true if replacing LOOP_VINFO->vector_mode with VECTOR_MODE
> +   would not change the chosen vector modes.  */
> +
> +bool
> +vect_chooses_same_modes_p (vec_info *vinfo, machine_mode vector_mode)
> +{
> +  for (vec_info::mode_set::iterator i = vinfo->used_vector_modes.begin ();
> +   i != vinfo->used_vector_modes.end (); ++i)
> +if (!VECTOR_MODE_P (*i)
> +   || related_vector_mode (vector_mode, GET_MODE_INNER (*i), 0) != *i)
> +  return false;
> +  return true;
> +}
> +
>  /* Function vect_is_simple_use.
>
> Input:
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2019-11-05 10:48:11.238092407 +
> +++ gcc/tree-vect-loop.c2019-11-05 10:57:41.658071173 +
> @@ -2430,6 +2430,19 @@ vect_analyze_loop (class loop *loop, vec
> }
>
>loop->aux = NULL;
> +
> +  if (!fatal)
> +   while (mode_i < vector_modes.length ()
> +  && vect_chooses_same_modes_p (loop_vinfo, vector_modes[mode_i]))
> + {
> +   if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> +  "* The result for vector mode %s would"
> +  " be the same\n",
> +  GET_MODE_NAME (vector_modes[mode_i]));
> +   mode_i += 1;
> + }
> +
>if (res)
> {
>   LOOP_VINFO_VECTORIZABLE_P 

Re: [10a/n] Require equal type sizes for vectorised calls

2019-11-06 Thread Richard Biener
On Tue, Nov 5, 2019 at 9:10 PM Richard Sandiford
 wrote:
>
> As explained in the comment, vectorizable_call needs more work to
> support mixtures of sizes.  This avoids testsuite fallout for
> later SVE patches.
>
> Was originally going to be later in the series, but applying it
> before 11/n seems safer.  As before each patch tested individually
> on aarch64-linux-gnu and the series as a whole on x86_64-linux-gnu.

OK.

>
> 2019-11-04  Richard Sandiford  
>
> gcc/
> * tree-vect-stmts.c (vectorizable_call): Require the types
> to have the same size.
>
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2019-11-05 10:38:50.718047381 +
> +++ gcc/tree-vect-stmts.c   2019-11-05 10:38:55.542013228 +
> @@ -3317,6 +3317,19 @@ vectorizable_call (stmt_vec_info stmt_in
>
>return false;
>  }
> +  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> + just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> + are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> + by a pack of the two vectors into an SI vector.  We would need
> + separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> +  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> +{
> +  if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"mismatched vector sizes %T and %T\n",
> +vectype_in, vectype_out);
> +  return false;
> +}
>
>/* FORNOW */
>nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);


Re: [11a/n] Avoid retrying with the same vector modes

2019-11-06 Thread Richard Biener
On Wed, Nov 6, 2019 at 11:21 AM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
> >  wrote:
> >>
> >> Patch 12/n makes the AArch64 port add four entries to
> >> autovectorize_vector_modes.  Each entry describes a different
> >> vector mode assignment for vector code that mixes 8-bit, 16-bit,
> >> 32-bit and 64-bit elements.  But if (as usual) the vector code has
> >> fewer element sizes than that, we could end up trying the same
> >> combination of vector modes multiple times.  This patch adds a
> >> check to prevent that.
> >>
> >> As before: each patch tested individually on aarch64-linux-gnu and the
> >> series as a whole on x86_64-linux-gnu.
> >>
> >>
> >> 2019-11-04  Richard Sandiford  
> >>
> >> gcc/
> >> * tree-vectorizer.h (vec_info::mode_set): New typedef.
> >> (vec_info::used_vector_mode): New member variable.
> >> (vect_chooses_same_modes_p): Declare.
> >> * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
> >> chosen vector mode in vec_info::used_vector_mode.
> >> (vect_chooses_same_modes_p): New function.
> >> * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
> >> the same vector statements multiple times.
> >> * tree-vect-slp.c (vect_slp_bb_region): Likewise.
> >>
> >> Index: gcc/tree-vectorizer.h
> >> ===
> >> --- gcc/tree-vectorizer.h   2019-11-05 10:48:11.246092351 +
> >> +++ gcc/tree-vectorizer.h   2019-11-05 10:57:41.662071145 +
> >> @@ -298,6 +298,7 @@ typedef std::pair vec_object
> >>  /* Vectorizer state common between loop and basic-block vectorization.  */
> >>  class vec_info {
> >>  public:
> >> +  typedef hash_set > 
> >> mode_set;
> >>enum vec_kind { bb, loop };
> >>
> >>vec_info (vec_kind, void *, vec_info_shared *);
> >> @@ -335,6 +336,9 @@ typedef std::pair vec_object
> >>/* Cost data used by the target cost model.  */
> >>void *target_cost_data;
> >>
> >> +  /* The set of vector modes used in the vectorized region.  */
> >> +  mode_set used_vector_modes;
> >> +
> >>/* The argument we should pass to related_vector_mode when looking up
> >>   the vector mode for a scalar mode, or VOIDmode if we haven't yet
> >>   made any decisions about which vector modes to use.  */
> >> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
> >>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
> >>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
> >>  extern tree get_same_sized_vectype (tree, tree);
> >> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
> >>  extern bool vect_get_loop_mask_type (loop_vec_info);
> >>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
> >> stmt_vec_info * = NULL, gimple ** = NULL);
> >> Index: gcc/tree-vect-stmts.c
> >> ===
> >> --- gcc/tree-vect-stmts.c   2019-11-05 10:48:11.242092379 +
> >> +++ gcc/tree-vect-stmts.c   2019-11-05 10:57:41.662071145 +
> >> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
> >>   scalar_type);
> >>if (vectype && vinfo->vector_mode == VOIDmode)
> >>  vinfo->vector_mode = TYPE_MODE (vectype);
> >> +
> >> +  if (vectype)
> >> +vinfo->used_vector_modes.add (TYPE_MODE (vectype));
> >> +
> >
> > Do we actually end up _using_ all types returned by this function?
>
> No, not all of them, so it's a bit crude.  E.g. some types might end up
> not being relevant after pattern recognition, or after we've made a
> final decision about which parts of an address calculation to include
> in a gather or scatter op.  So we can still end up retrying the same
> thing even after the patch.
>
> The problem is that we're trying to avoid pointless retries on failure
> as well as success, so we could end up stopping at arbitrary points.
> I wasn't sure where else to handle this.

Yeah, I think this "iterating" is somewhat bogus (crude) now.  What we'd
like to collect is for all defs the vector types we could use and then
vectorizable_ defines constraints between input and output vector types.
>From that we'd arrive at a (possibly quite large) set of "SLP graphs
with vector types"
we'd choose from.  I believe we'll never want to truly explore the whole space
but guess we want to greedily compute those "SLP graphs with vector types"
starting from what (grouped) datarefs tells us is possible (which is
kind of what
we do now).

Richard.

> Thanks,
> Richard
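The dedup discussed above can be illustrated with a standalone analogue (my own sketch using std::set and mode names as strings; the real code uses GCC's hash_set keyed on machine_mode and is only loosely mirrored here):

```cpp
#include <cassert>
#include <set>
#include <string>

// Sketch: record every vector mode an analysis chose; before retrying
// with another candidate starting mode, skip it if it is already in
// the used set, since the retry could not choose anything new.
struct vec_info_sketch
{
  std::set<std::string> used_vector_modes;

  void record (const std::string &mode)
  { used_vector_modes.insert (mode); }

  // Rough analogue of vect_chooses_same_modes_p.
  bool chooses_same_modes_p (const std::string &mode) const
  { return used_vector_modes.count (mode) != 0; }
};
```

As the discussion notes, this is crude: modes can be recorded that the final vectorized code never uses, so identical retries are reduced but not fully eliminated.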


Ping^1 [patch][avr] PR92055: Add switches to enable 64-bit [long] double.

2019-11-06 Thread Georg-Johann Lay

Ping #1

Am 31.10.19 um 22:55 schrieb Georg-Johann Lay:

Hi, this adds the possibility to enable IEEE compatible double
and long double support in avr-gcc.

It supports two configure options

--with-double={32|64|32,64|64,32}
--with-long-double={32|64|32,64|64,32|double}

which select the default layout of these types and also choose
which multilib variants are built and available.

These two configure options map to the new compiler options
-mdouble= and -mlong-double=, which are new multilib options.

The patch only deals with option handling and multilib bits,
it does not add any double functionality.  The double support
functions are supposed to be provided by avr-libc which also hosts
all the float stuff, including __addsf3 etc.

Ok for trunk?

Johann


gcc/
 Support 64-bit double and 64-bit long double configurations.

 PR target/92055
 * config.gcc (tm_defines) [avr]: Set from --with-double=,
 --with-long-double=.
 * config/avr/t-multilib: Remove.
 * config/avr/t-avr: Output of genmultilib.awk is now fully
 dynamically generated and no longer part of the repo.
 (HAVE_DOUBLE_MULTILIB, HAVE_LONG_DOUBLE_MULTILIB): New variables.
 Pass them down to...
 * config/avr/genmultilib.awk: ...here and handle them.
 * gcc/config/avr/avr.opt (-mdouble=, avr_double). New option and var.
 (-mlong-double=, avr_long_double). New option and var.
 * common/config/avr/avr-common.c (opts.h): Include.
 (diagnostic.h): Include.
 (TARGET_OPTION_OPTIMIZATION_TABLE) -mdouble=: Set default as
 requested by --with-double=.
 -mlong-double=: Set default as requested by 
--with-long-double=.

 (TARGET_OPTION_OPTIMIZATION_TABLE) -mdouble=, -mlong-double=:
 Set default as requested by --with-double=
 (TARGET_HANDLE_OPTION): Define to this...
 (avr_handle_option): ...new hook worker.
 * config/avr/avr.h (DOUBLE_TYPE_SIZE): Define to avr_double.
 (LONG_DOUBLE_TYPE_SIZE): Define to avr_long_double.
 (avr_double_lib): New proto for spec function.
 (EXTRA_SPEC_FUNCTIONS) double-lib: Add.
 (DRIVER_SELF_SPECS): Call %:double-lib.
 * config/avr/avr.c (avr_option_override): Assert
 sizeof(long double) = sizeof(double) for the target.
 * config/avr/avr-c.c (avr_cpu_cpp_builtins)
 [__HAVE_DOUBLE_MULTILIB__, __HAVE_LONG_DOUBLE_MULTILIB__]
 [__HAVE_DOUBLE64__, __HAVE_DOUBLE32__, __DEFAULT_DOUBLE__=]
 [__HAVE_LONG_DOUBLE64__, __HAVE_LONG_DOUBLE32__]
 [__HAVE_LONG_DOUBLE_IS_DOUBLE__, __DEFAULT_LONG_DOUBLE__=]:
 New built-in defined depending on --with-double=, --with-long-double=.
 * config/avr/driver-avr.c (avr_double_lib): New spec function.
 * doc/invoke.texi (AVR Options) -mdouble=,-mlong-double=: Doc.

libgcc/
 Support 64-bit double and 64-bit long double configurations.

 PR target/92055
 * config/avr/t-avr (HOST_LIBGCC2_CFLAGS): Only add -DF=SF if
 long double is a 32-bit type.
 * config/avr/t-avrlibc: Copy double64 and long-double64
 multilib(s) from the vanilla one.
 * config/avr/t-copy-libgcc: New Makefile snip.




Re: [PATCH][RFC] Param to options conversion (demo).

2019-11-06 Thread Martin Liška

On 11/5/19 5:01 PM, Richard Biener wrote:

On Tue, Nov 5, 2019 at 4:22 PM Martin Liška  wrote:


On 11/5/19 3:13 PM, Richard Biener wrote:

On Thu, Oct 31, 2019 at 2:17 PM Martin Liška  wrote:


On 10/31/19 2:16 PM, Martin Liška wrote:

On 10/31/19 2:01 PM, Martin Liška wrote:

Hi.

Based on the discussion with Honza and Richard I'm sending a proposal
for conversion of param machinery into the existing option machinery.
Our motivation for the change is to provide per function param values,
similarly what 'Optimization' keyword does for options.

Right now, we support the following format:
gcc --param=lto-partitions=4 /tmp/main.c -c

And so that I decided to name newly the params like:

-param=ipa-sra-ptr-growth-factor=
Common Joined UInteger Var(param_ipa_sra_ptr_growth_factor) Init(2) Param 
Optimization
Maximum allowed growth of number and total size of new parameters
that ipa-sra replaces a pointer to an aggregate with.

And I learnt decoder to parse '--param' 'name=value' as '--param=name=value'. 
Doing that
the transformation works. Help provides reasonable output as well:

$ ./xgcc -B. --param predictable-branch-outcome=5  /tmp/main.c -c -Q 
--help=param
The --param option recognizes the following as parameters:
--param=ipa-sra-ptr-growth-factor= 2
--param=predictable-branch-outcome=<0,50>  5

Thoughts?
Thanks,
Martin

---
   gcc/common.opt| 18 +++---
   gcc/ipa-sra.c |  3 +--
   gcc/opt-functions.awk |  3 ++-
   gcc/opts-common.c |  9 +
   gcc/opts.c| 36 
   gcc/params.def| 10 --
   gcc/predict.c |  4 ++--
   7 files changed, 25 insertions(+), 58 deletions(-)




I forgot to add gcc-patches to To.

Martin



+ the patch.


Nice.


Thanks.



I wonder if we can auto-generate params.h so that
PARAM_VALUE (...) can continue to "work"?  But maybe that's too much
and against making them first-class (but "unsupported") options.  At least
it would make the final patch _much_ smaller... (one could think of
auto-generating an enum and using an array of params for the storage
again - but then possibly split for [non-]Optimization - ugh).  If we
(auto-)name
the variables all-uppercase like PARAM_IPA_SRA_PTR_GROWTH_FACTOR
we could have

#define PARAM_VALUE (x) x

... (that said, everything that helps making the transition hit GCC 10
is appreciated ;))


Well, to be honest I would like to get rid of the current syntax:
PARAM_VALUE (PARAM_PREDICTABLE_BRANCH_OUTCOME) and replace it with something
what we have for normal options: param_predictable_branch_outcome.
It will require quite a big mechanical change in usage, but I can do
the replacement almost automatically.


OK, the more interesting uses are probably maybe_set_param_value ...


Which is actually for free ;) Please see the attached patch where I introduced
a new macro SET_OPTION_IF_UNSET.

Do you like it so that I can transform current options with that:

-  if (!opts_set->x_flag_branch_probabilities)
-opts->x_flag_branch_probabilities = value;
+  SET_OPTION_IF_UNSET (opts, opts_set, flag_branch_probabilities, value);

?





For

+-param=ipa-sra-ptr-growth-factor=
+Common Joined UInteger Var(param_ipa_sra_ptr_growth_factor) Init(2)
Param Optimization

I wonder if both Var(...) and Param can be "autodetected" (aka
actually required)?


Right now, Var is probably not required. Param can be handled by putting all
the params into a new param.opt file.


Param could be also autodetected from the name of the option (so could Var).


Well, looking at the current options, we are also quite explicit about various 
option flags.
I'll auto-generate the new params.opt file, so setting Var and Param will be 
for free.


Why is Var not required?


You are right, it's required.





At least the core of the patch looks nicely small!  How do the OPT_ enum values
for a --param look like?


Yep, I also like how small it is.

OPT__param_ipa_sra_ptr_growth_factor_ = 62,/* 
--param=ipa-sra-ptr-growth-factor= */
OPT__param_predictable_branch_outcome_ = 63,/* 
--param=predictable-branch-outcome= */


OK, looks fine.

So I guess go ahead unless somebody objects over the next weekend?  (in case not


Sure. I asked Honza about feedback and he looks happy with what I suggested.


you have another week to do the mechanical change?)  Maybe post a final patch
earlier w/o the mechanical work.


Martin



Richard.


Martin



Thanks,
Richard.


Martin




>From 0f56be4223a578268a8b7678393a3bc9609c3c1f Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 31 Oct 2019 13:53:54 +0100
Subject: [PATCH] Param to options conversion (demo).

---
 gcc/common.opt | 22 +--
 gcc/config/i386/i386-options.c |  6 ++
 gcc/ipa-sra.c  |  3 +--
 gcc/opt-functions.awk  |  3 ++-
 gcc/opts-common.c  |  9 
 gcc/opts.c | 39 +-
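The decoder change Martin describes, parsing '--param' 'name=value' as '--param=name=value', amounts to joining the two argv entries into the Joined-option form. A standalone sketch (hypothetical helper, not the actual opts-common.c code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: rewrite the two-argument form
// "--param" "name=value" into the single Joined option
// "--param=name=value" so params can reuse the normal option
// machinery; other options pass through unchanged.
static void
canonicalize_param (const char *opt, const char *arg,
                    char *buf, size_t bufsize)
{
  if (strcmp (opt, "--param") == 0 && arg && strchr (arg, '='))
    snprintf (buf, bufsize, "--param=%s", arg);
  else
    snprintf (buf, bufsize, "%s", opt);
}
```

After this normalization, `--param lto-partitions=4` and `--param=lto-partitions=4` hit the same OPT_ enum value.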
 

Re: Add object allocators to symbol and call summaries

2019-11-06 Thread Richard Biener
On Tue, Nov 5, 2019 at 6:53 PM Jan Hubicka  wrote:
>
> > On 11/5/19 3:48 PM, Jan Hubicka wrote:
> > > > >
> > > > > stringpool.c:63 (alloc_node)47M:  2.3%
> > > > > 0 :  0.0%0 :  0.0%0 :  0.0% 1217k
> > > > > ipa-prop.c:4480 (ipa_read_edge_info)51M:  2.4%
> > > > > 0 :  0.0%  260k:  0.0%  404k:  0.3%  531k
> > > > > hash-table.h:801 (expand)   81M:  3.9%
> > > > > 0 :  0.0%   80M:  4.7%   88k:  0.1% 3349
> > > > >^^^ some of memory comes here which ought to be accounted to 
> > > > > caller of
> > > > >expand.
> > > >
> > > > Yes, these all come from ggc_internal_alloc. Ideally we should register 
> > > > a mem_alloc_description
> > > > for each created symbol/call_summary and register manually every 
> > > > allocation to such descriptor.
> > >
> > > Or just pass memory stats from caller of expand and transitively pass it
> > > from caller of summary. This will get us the line info of get_create
> > > call that is IMO OK.
> >
> > The issue with this approach is that you will spread a summary allocation
> > along all the ::get_create places. Which is not ideal.
>
> We get it with other allocations, too. Not ideal, but better.
> Even better solutions are welcome :)
> >
> > Try to take a look, or we can debug that on Thursday together?
> > Martin
>
> Found it.  It turns out that ggc_prune_overhead_list is bogus.  It walks
> all active allocation objects and checks whether they were collected,
> accounting their collection, and then throws away all allocations
> (including those not collected), and those are no longer accounted later.  So we
> basically misaccount everything that survives ggc_collect.
>
> No wonder that it made me hunt ghosts 8-O
>
> Also, the last memory report was sorted by garbage, not leaks, for a reason:
> for normal compilation we care about garbage produced primarily because
> it triggers ggc collects and makes the compiler slow.
>
> BTW I like how advanced C++ gets back to lisp :)
>
> With the fix I get following stats by end of firefox WPA
>
> cfg.c:127 (alloc_block) 32M:  1.2%   12M: 
>  2.6%0 :  0.0%0 :  0.0%  446k
> symtab.c:582 (create_reference) 42M:  1.6%0 : 
>  0.0%   65M:  1.7% 1329k:  0.4%  840k
> gimple-streamer-in.c:101 (input_gimple_stmt)49M:  1.9%   17M: 
>  3.5%0 :  0.0%  375k:  0.1%  747k
> tree-ssanames.c:308 (make_ssa_name_fn)  51M:  2.0%   16M: 
>  3.4%0 :  0.0%0 :  0.0%  973k
> ipa-cp.c:5157 (ipcp_store_vr_results)   51M:  2.0% 1243k: 
>  0.2%0 :  0.0% 9561k:  3.0%  146k
> stringpool.c:63 (alloc_node)53M:  2.0%0 : 
>  0.0%0 :  0.0%0 :  0.0% 1362k
> ipa-prop.c:3988 (duplicate) 63M:  2.4% 1115k: 
>  0.2%0 :  0.0%   10M:  3.2%  264k
> toplev.c:904 (realloc_for_line_map) 72M:  2.8%0 : 
>  0.0%   71M:  1.9%   15M:  5.1%   27
> tree-ssanames.c:83 (init_ssanames)  96M:  3.7%  121M: 
> 24.4%   44M:  1.2%   87M: 27.8%  174k
> tree-ssa-operands.c:265 (ssa_operand_alloc)104M:  4.0%0 : 
>  0.0%   39M:  1.0%0 :  0.0%  105k
> stringpool.c:41 (stringpool_ggc_alloc) 106M:  4.1%0 : 
>  0.0%0 :  0.0% 7652k:  2.4% 1362k
> lto/lto-common.c:204 (lto_read_in_decl_state)  160M:  6.2%0 : 
>  0.0%  105M:  2.8%   19M:  6.1% 1731k
> cgraph.c:851 (create_edge) 248M:  9.5%0 : 
>  0.0%   70M:  1.9%0 :  0.0% 3141k
> cgraph.h:2712 (allocate_cgraph_symbol) 383M: 14.7%0 : 
>  0.0%  155M:  4.1%0 :  0.0% 1567k
> tree-streamer-in.c:631 (streamer_alloc_tree)   718M: 27.5%  136M: 
> 27.5% 1267M: 33.3%   64M: 20.6%   15M
> 
> GGC memory  Leak  Garbage 
>FreedOverheadTimes
> 
> Total 2609M:100.0%  
> 497M:100.0% 3804M:100.0%  313M:100.0%   49M
> 
>
> This looks more realistic.  ssa_operands and init_ssanames show that we
> really read a lot of bodies into memory.  I also wonder if we really want
> to compute 
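The accounting bug found above can be reduced to a small sketch (hypothetical data structure; the real code lives in GCC's memory-statistics machinery): the buggy prune accounted the collected entries but then dropped every entry, so allocations surviving a collection were never accounted again.

```cpp
#include <cassert>
#include <vector>

// Each entry tracks one live GGC allocation.
struct alloc_entry { unsigned size; bool collected; };

// Fixed prune: account collected entries as garbage and keep the
// survivors tracked.  The buggy version effectively did
// entries.clear () at the end, losing the uncollected ones.
static unsigned
prune_overhead_list (std::vector<alloc_entry> &entries)
{
  unsigned collected_bytes = 0;
  std::vector<alloc_entry> survivors;
  for (const alloc_entry &e : entries)
    {
      if (e.collected)
        collected_bytes += e.size;   // account garbage
      else
        survivors.push_back (e);     // keep live allocations tracked
    }
  entries.swap (survivors);
  return collected_bytes;
}
```

With the survivors kept, anything still live after a ggc_collect keeps showing up in later reports instead of silently disappearing.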

Re: [PATCH][vect] PR92317: fix skip_epilogue creation for epilogues

2019-11-06 Thread Richard Biener
On Tue, 5 Nov 2019, Andre Vieira (lists) wrote:

> Hi,
> 
> When investigating PR92317 I noticed that when we create the skip epilogue
> condition, see ('if (skip_epilog)' in 'vect_do_peeling'), we only copy
> phi-nodes that are not constants in 'slpeel_update_phi_nodes_for_guard2'.
> This can cause problems later when we create the scalar epilogue for this
> epilogue, since if the 'scalar_loop' is not the same as 'loop'
> 'slpeel_tree_duplicate_loop_to_edge_cfg' will expect both to have identical
> single_exit bb's and use that to copy the current_def meta_data of phi-nodes.
> 
> This makes sure that is true even if these phi-nodes are constants, fixing
> PR92317.  I copied the failing testcase and added the options that originally
> made it fail.
> 
> Is this OK for trunk?

OK.

Thanks,
Richard.


Re: [11a/n] Avoid retrying with the same vector modes

2019-11-06 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Nov 5, 2019 at 9:25 PM Richard Sandiford
>  wrote:
>>
>> Patch 12/n makes the AArch64 port add four entries to
>> autovectorize_vector_modes.  Each entry describes a different
>> vector mode assignment for vector code that mixes 8-bit, 16-bit,
>> 32-bit and 64-bit elements.  But if (as usual) the vector code has
>> fewer element sizes than that, we could end up trying the same
>> combination of vector modes multiple times.  This patch adds a
>> check to prevent that.
>>
>> As before: each patch tested individually on aarch64-linux-gnu and the
>> series as a whole on x86_64-linux-gnu.
>>
>>
>> 2019-11-04  Richard Sandiford  
>>
>> gcc/
>> * tree-vectorizer.h (vec_info::mode_set): New typedef.
>> (vec_info::used_vector_mode): New member variable.
>> (vect_chooses_same_modes_p): Declare.
>> * tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
>> chosen vector mode in vec_info::used_vector_mode.
>> (vect_chooses_same_modes_p): New function.
>> * tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
>> the same vector statements multiple times.
>> * tree-vect-slp.c (vect_slp_bb_region): Likewise.
>>
>> Index: gcc/tree-vectorizer.h
>> ===
>> --- gcc/tree-vectorizer.h   2019-11-05 10:48:11.246092351 +
>> +++ gcc/tree-vectorizer.h   2019-11-05 10:57:41.662071145 +
>> @@ -298,6 +298,7 @@ typedef std::pair vec_object
>>  /* Vectorizer state common between loop and basic-block vectorization.  */
>>  class vec_info {
>>  public:
>> +  typedef hash_set > mode_set;
>>enum vec_kind { bb, loop };
>>
>>vec_info (vec_kind, void *, vec_info_shared *);
>> @@ -335,6 +336,9 @@ typedef std::pair vec_object
>>/* Cost data used by the target cost model.  */
>>void *target_cost_data;
>>
>> +  /* The set of vector modes used in the vectorized region.  */
>> +  mode_set used_vector_modes;
>> +
>>/* The argument we should pass to related_vector_mode when looking up
>>   the vector mode for a scalar mode, or VOIDmode if we haven't yet
>>   made any decisions about which vector modes to use.  */
>> @@ -1615,6 +1619,7 @@ extern tree get_related_vectype_for_scal
>>  extern tree get_vectype_for_scalar_type (vec_info *, tree);
>>  extern tree get_mask_type_for_scalar_type (vec_info *, tree);
>>  extern tree get_same_sized_vectype (tree, tree);
>> +extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
>>  extern bool vect_get_loop_mask_type (loop_vec_info);
>>  extern bool vect_is_simple_use (tree, vec_info *, enum vect_def_type *,
>> stmt_vec_info * = NULL, gimple ** = NULL);
>> Index: gcc/tree-vect-stmts.c
>> ===
>> --- gcc/tree-vect-stmts.c   2019-11-05 10:48:11.242092379 +
>> +++ gcc/tree-vect-stmts.c   2019-11-05 10:57:41.662071145 +
>> @@ -11235,6 +11235,10 @@ get_vectype_for_scalar_type (vec_info *v
>>   scalar_type);
>>if (vectype && vinfo->vector_mode == VOIDmode)
>>  vinfo->vector_mode = TYPE_MODE (vectype);
>> +
>> +  if (vectype)
>> +vinfo->used_vector_modes.add (TYPE_MODE (vectype));
>> +
>
> Do we actually end up _using_ all types returned by this function?

No, not all of them, so it's a bit crude.  E.g. some types might end up
not being relevant after pattern recognition, or after we've made a
final decision about which parts of an address calculation to include
in a gather or scatter op.  So we can still end up retrying the same
thing even after the patch.

The problem is that we're trying to avoid pointless retries on failure
as well as success, so we could end up stopping at arbitrary points.
I wasn't sure where else to handle this.

Thanks,
Richard


Re: Free memory used by optimization/target options

2019-11-06 Thread Martin Liška

On 11/5/19 11:40 AM, Jan Hubicka wrote:

+   print "  if (ptr->" name")";
+   print "free (const_cast (ptr->" name"));";


If I'm correct, you can call free even for a NULL pointer.

Martin


Re: [PATCH, rs6000] Make load cost more in vectorization cost for P8/P9

2019-11-06 Thread Segher Boessenkool
Hi!

On Tue, Nov 05, 2019 at 10:14:46AM +0800, Kewen.Lin wrote:
> >> + benefits were observed on Power8 and up, we can unify it if similar
> >> + profits are measured on Power6 and Power7.  */
> >> +  if (TARGET_P8_VECTOR)
> >> +return 2;
> >> +  else
> >> +return 1;
> > 
> > Hrm, but you showed benchmark improvements for p9 as well?
> > 
> 
> No significant gains but no degradation as well, so I thought it's fine to 
> align
> it together.  Does it make sense?

It's a bit strange at this point to do tunings for p8 that we do not
do for later cpus.

> > What happens if you enable this for everything as well?
> 
> My concern was that if we enable it for everything, it's possible to introduce
> degradation for some benchmarks on P6 or P7 where we didn't evaluate the
> performance impact.

No one cares about p6.

We reasonably expect it will work just as well on p7 as on p8 and later.
That you haven't tested on p7 yet says something about how important that
platform is now ;-)

> Although it's reasonable from the point view of load latency,
> it's possible to get worse result in the actual benchmarks based on my fine 
> grain
> cost adjustment experiment before.  
> 
> Or do you suggest enabling it everywhere and solve the degradation issue if 
> exposed?
> I'm also fine with that.  :)

Yeah, let's just enable it everywhere.


Segher


Re: [PATCH] simplify-rtx: simplify_logical_relational_operation

2019-11-06 Thread Jeff Law
On 11/6/19 8:00 AM, Segher Boessenkool wrote:
> This introduces simplify_logical_relational_operation.  Currently the
> only thing implemented it can simplify is the IOR of two CONDs of the
> same arguments.
> 
> Tested on powerpc64-linux {-m32,-m64}.
> 
> Is this okay for trunk?
> 
> 
> Segher
> 
> 
> 2018-11-06  Segher Boessenkool  
> 
>   * simplify-rtx.c (comparison_to_mask): New function.
>   (mask_to_comparison): New function.
>   (simplify_logical_relational_operation): New function.
>   (simplify_binary_operation_1): Call
>   simplify_logical_relational_operation.
OK.

BTW, I think there's enough overlap between simplify-rtx and combine
that, if you wanted to maintain simplify-rtx as well, I'd fully
support it.  Thoughts?

jeff
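The simplification the patch implements, an IOR of two comparisons of the same operands, can be sketched with a small mask encoding (my own illustrative encoding; the actual bit assignments in simplify-rtx.c's comparison_to_mask/mask_to_comparison may differ):

```cpp
#include <cassert>

// Encode an ordered comparison as a 3-bit mask over the possible
// outcomes: less, equal, greater.  ORing two masks and mapping back
// yields the combined comparison, e.g. LT | EQ -> LE.
enum cmp
{
  CMP_LT = 1, CMP_EQ = 2, CMP_GT = 4,
  CMP_LE = CMP_LT | CMP_EQ,
  CMP_GE = CMP_GT | CMP_EQ,
  CMP_NE = CMP_LT | CMP_GT,
  CMP_ALWAYS = CMP_LT | CMP_EQ | CMP_GT
};

// (a OP1 b) || (a OP2 b) over the same operands collapses to a single
// comparison whose mask is the union of the two outcome masks.
static enum cmp
simplify_ior_of_comparisons (enum cmp a, enum cmp b)
{
  return (enum cmp) (a | b);
}
```

A mask of all three bits means the disjunction is always true, which the RTL simplifier can fold to a constant.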



Re: [PATCH V3] rs6000: Refine small loop unroll in loop_unroll_adjust hook

2019-11-06 Thread Segher Boessenkool
Hi!

On Tue, Nov 05, 2019 at 04:33:23PM +0800, Jiufu Guo wrote:
> --- gcc/common/config/rs6000/rs6000-common.c  (revision 277765)
> +++ gcc/common/config/rs6000/rs6000-common.c  (working copy)
> @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_
>  { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
>  /* Enable -fsched-pressure for first pass instruction scheduling.  */
>  { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
> -{ OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
> +/* Enable  -funroll-loops with -munroll-small-loops.  */
> +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
> +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },

I guess the comment should say what we enable here more than the generic
code does.  Something like

/* Enable -funroll-loops at -O2 already.  Also enable
   -munroll-small-loops.  */

> +  /* Explicit funroll-loops turns -munroll-small-loops off.
> +  Implicit funroll-loops does not turn fweb or frename-registers on.  */
> +  if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
> +   || (global_options_set.x_flag_unroll_all_loops
> +   && flag_unroll_all_loops))
>   {
> +   if (!global_options_set.x_unroll_small_loops)
> + unroll_small_loops = 0;
> + }
> +  else
> + {
> if (!global_options_set.x_flag_web)
> + flag_web = 0;
> if (!global_options_set.x_flag_rename_registers)
> + flag_rename_registers = 0;
>   }

So unroll-small-loops should better be called unroll-only-small-loops?

Why does explicit unroll-loops turn on web and rnreg?  Why only explicit?
Isn't it good and/or bad in all the same cases, implicit and explicit?

> +munroll-small-loops
> +Target Undocumented Var(unroll_small_loops) Init(0) Save
> +Use conservative small loop unrolling.

Undocumented means undocumented, so you don't have a comment string in
here.  But you can comment it:

; Use conservative small loop unrolling.


Segher
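The option interaction under discussion can be stated as a small decision function (a sketch over plain booleans, not the rs6000 sources; the unroll-all-loops leg is omitted for brevity):

```cpp
#include <cassert>

struct opts
{
  bool unroll_loops, unroll_loops_set;   // -funroll-loops, user-set?
  bool unroll_small_loops, small_set;    // -munroll-small-loops
  bool web, web_set;                     // -fweb
  bool rnreg, rnreg_set;                 // -frename-registers
};

// Mirror of the quoted logic: an *explicit* -funroll-loops disables
// -munroll-small-loops (unless the user set it); an *implicit* one
// leaves -fweb / -frename-registers off (unless user-set).
static void
resolve (opts *o)
{
  if (o->unroll_loops_set && o->unroll_loops)
    {
      if (!o->small_set)
        o->unroll_small_loops = false;
    }
  else
    {
      if (!o->web_set)
        o->web = false;
      if (!o->rnreg_set)
        o->rnreg = false;
    }
}
```

This makes Segher's question concrete: the explicit/implicit distinction, not just whether unrolling is on, decides which companion flags change.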


Re: [PATCH] include size and offset in -Wstringop-overflow

2019-11-06 Thread Jeff Law
On 11/6/19 11:00 AM, Martin Sebor wrote:
> The -Wstringop-overflow warnings for single-byte and multi-byte
> stores mention the amount of data being stored and the amount of
> space remaining in the destination, such as:
> 
> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>   123 |   *p = 0;
>   |   ~~~^~~
> note: destination object declared here
>    45 |   char b[N];
>   |^
> 
> A warning like this can take some time to analyze.  First, the size
> of the destination isn't mentioned and may not be easy to tell from
> the sources.  In the note above, when N's value is the result of
> some non-trivial computation, chasing it down may be a small project
> in and of itself.  Second, it's also not clear why the region size
> is zero.  It could be because the offset is exactly N, or because
> it's negative, or because it's in some range greater than N.
> 
> Mentioning both the size of the destination object and the offset
> makes the existing messages clearer, are will become essential when
> GCC starts diagnosing overflow into allocated buffers (as my
> follow-on patch does).
> 
> The attached patch enhances -Wstringop-overflow to do this by
> letting compute_objsize return the offset to its caller, doing
> something similar in get_stridx, and adding a new function to
> the strlen pass to issue this enhanced warning (eventually, I'd
> like the function to replace the -Wstringop-overflow handler in
> builtins.c).  With the change, the note above might read something
> like:
> 
> note: at offset 11 to object ‘b’ with size 8 declared here
>    45 |   char b[N];
>   |^
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> gcc-store-offset.diff
> 
> gcc/ChangeLog:
> 
>   * builtins.c (compute_objsize): Add an argument and set it to offset
>   into destination.
>   * builtins.h (compute_objsize): Add an argument.
>   * tree-object-size.c (addr_object_size): Add an argument and set it
>   to offset into destination.
>   (compute_builtin_object_size): Same.
>   * tree-object-size.h (compute_builtin_object_size): Add an argument.
>   * tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
>   to offset into destination.
>   (maybe_warn_overflow): New function.
>   (handle_store): Call maybe_warn_overflow to issue warnings.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected messages.
>   * g++.dg/warn/Wstringop-overflow-3.C: Same.
>   * gcc.dg/Wstringop-overflow-17.c: Same.
> 

> Index: gcc/tree-ssa-strlen.c
> ===
> --- gcc/tree-ssa-strlen.c (revision 277886)
> +++ gcc/tree-ssa-strlen.c (working copy)
> @@ -189,6 +189,52 @@ struct laststmt_struct
>  static int get_stridx_plus_constant (strinfo *, unsigned HOST_WIDE_INT, 
> tree);
>  static void handle_builtin_stxncpy (built_in_function, gimple_stmt_iterator 
> *);
>  
> +/* Sets MINMAX to either the constant value or the range VAL is in
> +   and returns true on success.  */
> +
> +static bool
> +get_range (tree val, wide_int minmax[2], const vr_values *rvals = NULL)
> +{
> +  if (tree_fits_uhwi_p (val))
> +{
> +  minmax[0] = minmax[1] = wi::to_wide (val);
> +  return true;
> +}
> +
> +  if (TREE_CODE (val) != SSA_NAME)
> +return false;
> +
> +  if (rvals)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (val);
> +  if (gimple_assign_single_p (def)
> +   && gimple_assign_rhs_code (def) == INTEGER_CST)
> + {
> +   /* get_value_range returns [0, N] for constant assignments.  */
> +   val = gimple_assign_rhs1 (def);
> +   minmax[0] = minmax[1] = wi::to_wide (val);
> +   return true;
> + }
Umm, something seems really off with this hunk.  If the SSA_NAME is set
via a simple constant assignment, then the range ought to be a singleton
ie [CONST,CONST].  Is there a particular test where this is not true?

The only way offhand I could see this happening is if originally the RHS
wasn't a constant, but due to optimizations it either simplified into a
constant or a constant was propagated into an SSA_NAME appearing on the
RHS.  This would have to happen between the last range analysis and the
point where you're making this query.




> +
> +  // FIXME: handle anti-ranges?
> +  return false;
Please don't if we can avoid them.  anti-ranges are on the chopping
block, so I'd prefer not to add cases where we're trying to handle them
if we can reasonably avoid it.


No objections elsewhere.  So I think we just need to figure out what's
going on with the ranges when you've got an INTEGER_CST on the RHS of an
assignment.

jeff
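The arithmetic behind the proposed note, e.g. "at offset 11 to object 'b' with size 8" implying a region of size 0, is simple clamping; a sketch (hypothetical helper, not the builtins.c/tree-ssa-strlen.c code):

```cpp
#include <cassert>
#include <cstddef>

// Remaining room in an object of SIZE bytes when writing at OFFSET:
// zero when the offset is negative or at/past the end, which is
// exactly the "region of size 0" the warning reports.
static size_t
remaining_region_size (size_t size, long offset)
{
  if (offset < 0 || (size_t) offset >= size)
    return 0;
  return size - (size_t) offset;
}
```

Reporting both inputs (size and offset) rather than only the clamped result is what makes the enhanced note easier to act on.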



[PATCH] C++20 CA378 - Remove constrained non-template functions.

2019-11-06 Thread Jason Merrill
No real use cases have arisen for constraints on non-templated
functions, and handling of them has never been entirely clear, so the
committee agreed to accept this national body comment proposing that we
remove them.

Tested x86_64-pc-linux-gnu, applying to trunk.

* decl.c (grokfndecl): Reject constraints on non-templated function.
---
 gcc/cp/decl.c | 12 -
 gcc/testsuite/g++.dg/cpp2a/concepts-friend4.C | 13 ++
 gcc/testsuite/g++.dg/cpp2a/concepts-lambda2.C | 26 +--
 gcc/testsuite/g++.dg/cpp2a/concepts-lambda3.C |  4 +--
 .../g++.dg/cpp2a/concepts-requires1.C |  4 +--
 5 files changed, 41 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-friend4.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 3bfcfb2c6b7..5c5a85e3221 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -9262,7 +9262,9 @@ grokfndecl (tree ctype,
   if (flag_concepts)
 {
   tree tmpl_reqs = NULL_TREE;
-  if (processing_template_decl > template_class_depth (ctype))
+  tree ctx = friendp ? current_class_type : ctype;
+  bool memtmpl = (processing_template_decl > template_class_depth (ctx));
+  if (memtmpl)
 tmpl_reqs = TEMPLATE_PARMS_CONSTRAINTS (current_template_parms);
   tree ci = build_constraints (tmpl_reqs, decl_reqs);
   if (concept_p && ci)
@@ -9270,6 +9272,14 @@ grokfndecl (tree ctype,
   error_at (location, "a function concept cannot be constrained");
   ci = NULL_TREE;
 }
+  /* C++20 CA378: Remove non-templated constrained functions.  */
+  if (ci && !flag_concepts_ts
+ && (!processing_template_decl
+ || (friendp && !memtmpl && !funcdef_flag)))
+   {
+ error_at (location, "constraints on a non-templated function");
+ ci = NULL_TREE;
+   }
   set_constraints (decl, ci);
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-friend4.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-friend4.C
new file mode 100644
index 000..88f9fe825f8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-friend4.C
@@ -0,0 +1,13 @@
+// C++20 NB comment US115
+// { dg-do compile { target c++2a } }
+
+template  concept Any = true;
+
+template 
+struct A
+{
+  friend void f() requires Any { } // OK
+  friend void g() requires Any;// { dg-error "" }
+};
+
+A a;
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda2.C b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda2.C
index ffad95cb77a..a7419d69a46 100644
--- a/gcc/testsuite/g++.dg/cpp2a/concepts-lambda2.C
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-lambda2.C
@@ -60,19 +60,19 @@ void test0()
   auto g0 = [](T t) { return t; };
   auto g1 = [] requires False (T t) { return t; };
   auto g2 = [](T t) requires False { return t; };
-  auto g3 = [](int t) requires False { return t; };
+  auto g3 = [](int t) requires False { return t; }; // { dg-error "non-templated" }
   auto g4 = [](False auto t) { return t; };
   auto g5 = [](auto t) requires False { return t; };
-  auto g6 = [](int t) requires False { return t; };
-  auto g7 = [](int t) requires false { return t; };
+  auto g6 = [](int t) requires False { return t; }; // { dg-error "non-templated" }
+  auto g7 = [](int t) requires false { return t; }; // { dg-error "non-templated" }
   g0(0); // { dg-error "no match" }
   g1(0); // { dg-error "no match" }
   g2(0); // { dg-error "no match" }
-  g3(0); // { dg-error "no match" }
+  g3(0);
   g4(0); // { dg-error "no match" }
   g5(0); // { dg-error "no match" }
-  g6(0); // { dg-error "no match" }
-  g7(0); // { dg-error "no match" }
+  g6(0);
+  g7(0);
 }
 
 void test1()
@@ -81,19 +81,19 @@ void test1()
   auto g0 = [&](T t) { return t; };
   auto g1 = [&] requires False (T t) { return t; };
   auto g2 = [&](T t) requires False { return t; };
-  auto g3 = [&](int t) requires False { return t; };
+  auto g3 = [&](int t) requires False { return t; }; // { dg-error "non-templated" }
   auto g4 = [&](False auto t) { return t; };
   auto g5 = [&](auto t) requires False { return t; };
-  auto g6 = [&](int t) requires False { return t; };
-  auto g7 = [&](int t) requires false { return t; };
+  auto g6 = [&](int t) requires False { return t; }; // { dg-error "non-templated" }
+  auto g7 = [&](int t) requires false { return t; }; // { dg-error "non-templated" }
   g0(0); // { dg-error "no match" }
   g1(0); // { dg-error "no match" }
   g2(0); // { dg-error "no match" }
-  g3(0); // { dg-error "no match" }
+  g3(0);
   g4(0); // { dg-error "no match" }
   g5(0); // { dg-error "no match" }
-  g6(0); // { dg-error "no match" }
-  g7(0); // { dg-error "no match" }
+  g6(0);
+  g7(0);
 }
 
 void test2()
@@ -147,7 +147,7 @@ using Func = int(*)(int);
 
 void test6()
 {
-  Func f1 = [](int a) requires false { return a; }; // { dg-error "cannot convert" }
+  Func f1 = [](int a) requires false { return a; }; // { dg-error "non-templated" }
   Func f2 = [](auto a) 

Re: [PATCH] include size and offset in -Wstringop-overflow

2019-11-06 Thread Martin Sebor

On 11/6/19 11:55 AM, Jeff Law wrote:

On 11/6/19 11:00 AM, Martin Sebor wrote:

The -Wstringop-overflow warnings for single-byte and multi-byte
stores mention the amount of data being stored and the amount of
space remaining in the destination, such as:

warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
   123 |   *p = 0;
   |   ~~~^~~
note: destination object declared here
    45 |   char b[N];
   |^

A warning like this can take some time to analyze.  First, the size
of the destination isn't mentioned and may not be easy to tell from
the sources.  In the note above, when N's value is the result of
some non-trivial computation, chasing it down may be a small project
in and of itself.  Second, it's also not clear why the region size
is zero.  It could be because the offset is exactly N, or because
it's negative, or because it's in some range greater than N.

Mentioning both the size of the destination object and the offset
makes the existing messages clearer, and will become essential when
GCC starts diagnosing overflow into allocated buffers (as my
follow-on patch does).

The attached patch enhances -Wstringop-overflow to do this by
letting compute_objsize return the offset to its caller, doing
something similar in get_stridx, and adding a new function to
the strlen pass to issue this enhanced warning (eventually, I'd
like the function to replace the -Wstringop-overflow handler in
builtins.c).  With the change, the note above might read something
like:

note: at offset 11 to object ‘b’ with size 8 declared here
    45 |   char b[N];
   |^

Tested on x86_64-linux.

Martin

gcc-store-offset.diff

gcc/ChangeLog:

* builtins.c (compute_objsize): Add an argument and set it to offset
into destination.
* builtins.h (compute_objsize): Add an argument.
* tree-object-size.c (addr_object_size): Add an argument and set it
to offset into destination.
(compute_builtin_object_size): Same.
* tree-object-size.h (compute_builtin_object_size): Add an argument.
* tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
to offset into destination.
(maybe_warn_overflow): New function.
(handle_store): Call maybe_warn_overflow to issue warnings.

gcc/testsuite/ChangeLog:

* c-c++-common/Wstringop-overflow-2.c: Adjust text of expected messages.
* g++.dg/warn/Wstringop-overflow-3.C: Same.
* gcc.dg/Wstringop-overflow-17.c: Same.




Index: gcc/tree-ssa-strlen.c
===
--- gcc/tree-ssa-strlen.c   (revision 277886)
+++ gcc/tree-ssa-strlen.c   (working copy)
@@ -189,6 +189,52 @@ struct laststmt_struct
  static int get_stridx_plus_constant (strinfo *, unsigned HOST_WIDE_INT, tree);
  static void handle_builtin_stxncpy (built_in_function, gimple_stmt_iterator *);
  
+/* Sets MINMAX to either the constant value or the range VAL is in
+   and returns true on success.  */
+
+static bool
+get_range (tree val, wide_int minmax[2], const vr_values *rvals = NULL)
+{
+  if (tree_fits_uhwi_p (val))
+{
+  minmax[0] = minmax[1] = wi::to_wide (val);
+  return true;
+}
+
+  if (TREE_CODE (val) != SSA_NAME)
+return false;
+
+  if (rvals)
+{
+  gimple *def = SSA_NAME_DEF_STMT (val);
+  if (gimple_assign_single_p (def)
+ && gimple_assign_rhs_code (def) == INTEGER_CST)
+   {
+ /* get_value_range returns [0, N] for constant assignments.  */
+ val = gimple_assign_rhs1 (def);
+ minmax[0] = minmax[1] = wi::to_wide (val);
+ return true;
+   }

Umm, something seems really off with this hunk.  If the SSA_NAME is set
via a simple constant assignment, then the range ought to be a singleton
ie [CONST,CONST].   Is there a particular test where this is not true?

The only way offhand I could see this happening is if originally the RHS
wasn't a constant, but due to optimizations it either simplified into a
constant or a constant was propagated into an SSA_NAME appearing on the
RHS.  This would have to happen between the last range analysis and the
point where you're making this query.


Yes, I think that's right.  Here's an example where it happens:

  void f (void)
  {
char s[] = "1234";
unsigned n = strlen (s);
char vla[n];   // or malloc (n)
vla[n] = 0;// n = [4, 4]
...
  }

The strlen call is folded to 4 but that's not propagated to
the access until sometime after the strlen pass is done.


+  // FIXME: handle anti-ranges?
+  return false;

Please don't if we can avoid them.  anti-ranges are on the chopping
block, so I'd prefer not to add cases where we're trying to handle them
if we can reasonably avoid it.


It's mostly a reminder that there may be room for improvement
here.  Maybe not for ranges of sizes but possibly for ranges
of offsets (e.g., if an offset's range is the union of
[-4, -1] and [7, 9] and the 

Re: [PATCH] Add if-chain to switch conversion pass.

2019-11-06 Thread Bernhard Reutner-Fischer
On Tue, 5 Nov 2019 13:38:27 +0100
Richard Biener  wrote:

> On Mon, Nov 4, 2019 at 3:49 PM Jakub Jelinek  wrote:
> >
> > On Mon, Nov 04, 2019 at 03:23:20PM +0100, Martin Liška wrote:  
> > > The patch adds a new pass that identifies a series of if-elseif
> > > statements and transform them into a GIMPLE switch (if possible).
> > > The pass runs right after tree-ssa pass and I decided to implement
> > > matching of various forms that are introduced by folder 
> > > (fold_range_test):  
> >
> > Not a review, just a few questions:  
> 
> Likewise - please do not name switches -ftree-*, 'tree' doesn't add anything
> but confusion to users.  Thus use -fif-to-switch or -fconvert-if-to-switch
> 
> +The transformation can help to produce a faster code for
> +the switch statement.
> 
> produce faster code.
> 
> Doesn't it also produce smaller code eventually?
> 
> Please do not put code transform passes into build_ssa_passes (why did
> you choose this place)?  The pass should go into pass_all_early_optimizations
> instead, and I'm quite sure you want to run _after_ CSE.  I'd even say
> that the pass should run as part of switch-conversion, so we build
> a representation of a switch internally and then code-generate the optimal
> form directly.  For now just put the pass before switch-conversion.

Also why do you punt on duplicate conditions like in

> +++ b/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-4.c

> +int main(int argc, char **argv)
> +{
> +  if (argc == 1)
> +  else if (argc == 2)
> +  else if (argc == 3)
> +  else if (argc == 4)
> +  else if (argc == 1)
> +{

This block is dead, isn't it?  Why punt instead of just discarding it?

> +  global = 2;
> +}
> +  else
> +global -= 123;

thanks,
> 
> There are functions without comments in the patch and you copied
> from DSE which shows in confusing comments left over from the original.
> 
> +  mark_virtual_operands_for_renaming (cfun);
> 
> if you did nothing renaming all vops is expensive.
> 
> I'm missing an overall comment - you are using a dominator walk
> but do nothing in the after hook which means you are not really
> gathering any data?  You're also setting visited bits on BBs which
> means you are visiting alternate BBs during the DOM walk.
> 
> > 1) what does it do if __builtin_expect* has been used, does it preserve
> >the probabilities and if in the end decides to expand as ifs, are those
> >probabilities retained through it?
> > 2) for the reassoc-*.c testcases, do you get identical or better code
> >with the patch?
> > 3) shouldn't it be gimple-if-to-switch.c instead?
> > 4) what code size effect does the patch have say on cc1plus (if you don't
> >count the code changes of the patch itself, i.e. revert the patch in the
> >stage3 and rebuild just the stage3)?
> >  
> > > +struct case_range
> > > +{
> > > +  /* Default constructor.  */
> > > +  case_range ():
> > > +m_min (NULL_TREE), m_max (NULL_TREE)  
> >
> > I admit I'm never sure about coding conventions for C++,
> > but shouldn't there be a space before :, or even better :
> > be on the next line before m_min ?
> >
> > Jakub
> >  



Re: [PATCH V3] rs6000: Refine small loop unroll in loop_unroll_adjust hook

2019-11-06 Thread Jiufu Guo
Segher Boessenkool  writes:

> Hi!
>
> On Tue, Nov 05, 2019 at 04:33:23PM +0800, Jiufu Guo wrote:
>> --- gcc/common/config/rs6000/rs6000-common.c (revision 277765)
>> +++ gcc/common/config/rs6000/rs6000-common.c (working copy)
>> @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_
>>  { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
>>  /* Enable -fsched-pressure for first pass instruction scheduling.  */
>>  { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
>> -{ OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
>> +/* Enable  -funroll-loops with -munroll-small-loops.  */
>> +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
>> +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
>
> I guess the comment should say what we enable here more than the generic
> code does.  Something like
>
> /* Enable -funroll-loops at -O2 already.  Also enable
>-munroll-small-loops.  */

updated to:
/* Enable -munroll-only-small-loops with -funroll-loops to unroll small
loops at -O2 and above by default.   */
{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
/* Disable -fweb and -frename-registers to avoid bad impacts.  */
{ OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
{ OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },

Thanks for more comments to make it better!

>
>> +  /* Explicit funroll-loops turns -munroll-small-loops off.
>> + Implicit funroll-loops does not turn fweb or frename-registers on.  */
>> +  if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
>> +  || (global_options_set.x_flag_unroll_all_loops
>> +  && flag_unroll_all_loops))
>>  {
>> +  if (!global_options_set.x_unroll_small_loops)
>> +unroll_small_loops = 0;
>> +}
>> +  else
>> +{
>>if (!global_options_set.x_flag_web)
>> +flag_web = 0;
>>if (!global_options_set.x_flag_rename_registers)
>> +flag_rename_registers = 0;
>>  }
>
> So unroll-small-loops should better be called unroll-only-small-loops?
Thanks again.  Right, unroll-only-small-loops is better.
>
> Why does explicit unroll-loops turn on web and rnreg?  Why only explicit?
> Isn't it good and/or bad in all the same cases, implicit and explicit?
Good question!

We turn them off by default because they do not help much in generic
cases, and we saw no performance gain on SPEC2017.  Turning them off
also keeps behavior consistent with the previous -O2/-O3, which did not
enable them, and so avoids regressions in test cases.
If they are not turned on together with -funroll-loops, users may see a
performance difference in some cases.  For example, for SPEC peak runs
whose options contain -funroll-loops, -frename-registers may need to be
added manually for some benchmarks.

Any suggestions?  Do you think it is a good idea to disable them by
default and let users add them when they are helpful, e.g. for some
benchmarks at `peak`?

>
>> +munroll-small-loops
>> +Target Undocumented Var(unroll_small_loops) Init(0) Save
>> +Use conservative small loop unrolling.
>
> Undocumented means undocumented, so you don't have a comment string in
> here.  But you can comment it:
>
> ; Use conservative small loop unrolling.
Thanks again for your kind review!

Jiufu,

BR.
>
>
> Segher


Re: [PATCH] include size and offset in -Wstringop-overflow

2019-11-06 Thread Jeff Law
On 11/6/19 1:27 PM, Martin Sebor wrote:
> On 11/6/19 11:55 AM, Jeff Law wrote:
>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>> stores mention the amount of data being stored and the amount of
>>> space remaining in the destination, such as:
>>>
>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>>>    123 |   *p = 0;
>>>    |   ~~~^~~
>>> note: destination object declared here
>>> 45 |   char b[N];
>>>    |^
>>>
>>> A warning like this can take some time to analyze.  First, the size
>>> of the destination isn't mentioned and may not be easy to tell from
>>> the sources.  In the note above, when N's value is the result of
>>> some non-trivial computation, chasing it down may be a small project
>>> in and of itself.  Second, it's also not clear why the region size
>>> is zero.  It could be because the offset is exactly N, or because
>>> it's negative, or because it's in some range greater than N.
>>>
>>> Mentioning both the size of the destination object and the offset
>>> makes the existing messages clearer, and will become essential when
>>> GCC starts diagnosing overflow into allocated buffers (as my
>>> follow-on patch does).
>>>
>>> The attached patch enhances -Wstringop-overflow to do this by
>>> letting compute_objsize return the offset to its caller, doing
>>> something similar in get_stridx, and adding a new function to
>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>> like the function to replace the -Wstringop-overflow handler in
>>> builtins.c).  With the change, the note above might read something
>>> like:
>>>
>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>> 45 |   char b[N];
>>>    |^
>>>
>>> Tested on x86_64-linux.
>>>
>>> Martin
>>>
>>> gcc-store-offset.diff
>>>
>>> gcc/ChangeLog:
>>>
>>> * builtins.c (compute_objsize): Add an argument and set it to offset
>>> into destination.
>>> * builtins.h (compute_objsize): Add an argument.
>>> * tree-object-size.c (addr_object_size): Add an argument and set it
>>> to offset into destination.
>>> (compute_builtin_object_size): Same.
>>> * tree-object-size.h (compute_builtin_object_size): Add an argument.
>>> * tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
>>> to offset into destination.
>>> (maybe_warn_overflow): New function.
>>> (handle_store): Call maybe_warn_overflow to issue warnings.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>>> messages.
>>> * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>> * gcc.dg/Wstringop-overflow-17.c: Same.
>>>
>>
>>> Index: gcc/tree-ssa-strlen.c
>>> ===
>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>   static int get_stridx_plus_constant (strinfo *, unsigned
>>> HOST_WIDE_INT, tree);
>>>   static void handle_builtin_stxncpy (built_in_function,
>>> gimple_stmt_iterator *);
>>>   +/* Sets MINMAX to either the constant value or the range VAL is in
>>> +   and returns true on success.  */
>>> +
>>> +static bool
>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals = NULL)
>>> +{
>>> +  if (tree_fits_uhwi_p (val))
>>> +    {
>>> +  minmax[0] = minmax[1] = wi::to_wide (val);
>>> +  return true;
>>> +    }
>>> +
>>> +  if (TREE_CODE (val) != SSA_NAME)
>>> +    return false;
>>> +
>>> +  if (rvals)
>>> +    {
>>> +  gimple *def = SSA_NAME_DEF_STMT (val);
>>> +  if (gimple_assign_single_p (def)
>>> +  && gimple_assign_rhs_code (def) == INTEGER_CST)
>>> +    {
>>> +  /* get_value_range returns [0, N] for constant assignments.  */
>>> +  val = gimple_assign_rhs1 (def);
>>> +  minmax[0] = minmax[1] = wi::to_wide (val);
>>> +  return true;
>>> +    }
>> Umm, something seems really off with this hunk.  If the SSA_NAME is set
>> via a simple constant assignment, then the range ought to be a singleton
>> ie [CONST,CONST].   Is there a particular test where this is not true?
>>
>> The only way offhand I could see this happening is if originally the RHS
>> wasn't a constant, but due to optimizations it either simplified into a
>> constant or a constant was propagated into an SSA_NAME appearing on the
>> RHS.  This would have to happen between the last range analysis and the
>> point where you're making this query.
> 
> Yes, I think that's right.  Here's an example where it happens:
> 
>   void f (void)
>   {
>     char s[] = "1234";
>     unsigned n = strlen (s);
>     char vla[n];   // or malloc (n)
>     vla[n] = 0;    // n = [4, 4]
>     ...
>   }
> 
> The strlen call is folded to 4 but that's not propagated to
> the access until sometime after the strlen pass is done.
Hmm.  Are we calling set_range_info in that case?  That 

Re: [PATCH] include size and offset in -Wstringop-overflow

2019-11-06 Thread Martin Sebor

On 11/6/19 1:39 PM, Jeff Law wrote:

On 11/6/19 1:27 PM, Martin Sebor wrote:

On 11/6/19 11:55 AM, Jeff Law wrote:

On 11/6/19 11:00 AM, Martin Sebor wrote:

The -Wstringop-overflow warnings for single-byte and multi-byte
stores mention the amount of data being stored and the amount of
space remaining in the destination, such as:

warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
    123 |   *p = 0;
    |   ~~~^~~
note: destination object declared here
 45 |   char b[N];
    |^

A warning like this can take some time to analyze.  First, the size
of the destination isn't mentioned and may not be easy to tell from
the sources.  In the note above, when N's value is the result of
some non-trivial computation, chasing it down may be a small project
in and of itself.  Second, it's also not clear why the region size
is zero.  It could be because the offset is exactly N, or because
it's negative, or because it's in some range greater than N.

Mentioning both the size of the destination object and the offset
makes the existing messages clearer, and will become essential when
GCC starts diagnosing overflow into allocated buffers (as my
follow-on patch does).

The attached patch enhances -Wstringop-overflow to do this by
letting compute_objsize return the offset to its caller, doing
something similar in get_stridx, and adding a new function to
the strlen pass to issue this enhanced warning (eventually, I'd
like the function to replace the -Wstringop-overflow handler in
builtins.c).  With the change, the note above might read something
like:

note: at offset 11 to object ‘b’ with size 8 declared here
 45 |   char b[N];
    |^

Tested on x86_64-linux.

Martin

gcc-store-offset.diff

gcc/ChangeLog:

 * builtins.c (compute_objsize): Add an argument and set it to offset
 into destination.
 * builtins.h (compute_objsize): Add an argument.
 * tree-object-size.c (addr_object_size): Add an argument and set it
 to offset into destination.
 (compute_builtin_object_size): Same.
 * tree-object-size.h (compute_builtin_object_size): Add an argument.
 * tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
 to offset into destination.
 (maybe_warn_overflow): New function.
 (handle_store): Call maybe_warn_overflow to issue warnings.

gcc/testsuite/ChangeLog:

 * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
messages.
 * g++.dg/warn/Wstringop-overflow-3.C: Same.
 * gcc.dg/Wstringop-overflow-17.c: Same.




Index: gcc/tree-ssa-strlen.c
===
--- gcc/tree-ssa-strlen.c    (revision 277886)
+++ gcc/tree-ssa-strlen.c    (working copy)
@@ -189,6 +189,52 @@ struct laststmt_struct
   static int get_stridx_plus_constant (strinfo *, unsigned
HOST_WIDE_INT, tree);
   static void handle_builtin_stxncpy (built_in_function,
gimple_stmt_iterator *);
   +/* Sets MINMAX to either the constant value or the range VAL is in
+   and returns true on success.  */
+
+static bool
+get_range (tree val, wide_int minmax[2], const vr_values *rvals = NULL)
+{
+  if (tree_fits_uhwi_p (val))
+    {
+  minmax[0] = minmax[1] = wi::to_wide (val);
+  return true;
+    }
+
+  if (TREE_CODE (val) != SSA_NAME)
+    return false;
+
+  if (rvals)
+    {
+  gimple *def = SSA_NAME_DEF_STMT (val);
+  if (gimple_assign_single_p (def)
+  && gimple_assign_rhs_code (def) == INTEGER_CST)
+    {
+  /* get_value_range returns [0, N] for constant assignments.  */
+  val = gimple_assign_rhs1 (def);
+  minmax[0] = minmax[1] = wi::to_wide (val);
+  return true;
+    }

Umm, something seems really off with this hunk.  If the SSA_NAME is set
via a simple constant assignment, then the range ought to be a singleton
ie [CONST,CONST].   Is there a particular test where this is not true?

The only way offhand I could see this happening is if originally the RHS
wasn't a constant, but due to optimizations it either simplified into a
constant or a constant was propagated into an SSA_NAME appearing on the
RHS.  This would have to happen between the last range analysis and the
point where you're making this query.


Yes, I think that's right.  Here's an example where it happens:

   void f (void)
   {
     char s[] = "1234";
     unsigned n = strlen (s);
     char vla[n];   // or malloc (n)
     vla[n] = 0;    // n = [4, 4]
     ...
   }

The strlen call is folded to 4 but that's not propagated to
the access until sometime after the strlen pass is done.

Hmm.  Are we calling set_range_info in that case?  That goes behind the
back of pass instance of vr_values.  If so, that might argue we want to
be setting it in vr_values too.


No, set_range_info is only called for ranges.  In this case,
handle_builtin_strlen replaces the strlen() call with 4:

  s = "1234";
  _1 = __builtin_strlen (&s);
  n_2 = (unsigned int) _1;
  a.1_8 = __builtin_alloca_with_align (_1, 8);
  

Re: [PATCH 1/3] libgcc: Add --disable-eh-frame-registry configure option

2019-11-06 Thread Jozef Lawrynowicz
On Wed, 6 Nov 2019 16:16:27 +
Jozef Lawrynowicz  wrote:

> The attached patch enables the EH Frame Registry to be explicitly disabled
> with a configure option "--disable-eh-frame-registry", thereby removing code to
> support it in crtstuff.c
> 
> Default behaviour is unchanged since USE_EH_FRAME_REGISTRY was previously
> referenced only internally in crtstuff.c, and now is only defined to 0
> when it would previously have not been defined at all.

I retract this patch, since I have found a better solution to the problem this
was going to solve.

Passing "-U__LIBGCC_EH_FRAME_SECTION_NAME__" when building crtstuff.c objects
completely removes references to .eh_frame.

The original patch still resulted in the .eh_frame section being created,
since code to add a 4-byte NULL sentinel to the end of the section was retained.

If someone thinks the original patch might still be useful, I can go ahead and
commit it anyway.


Re: introduce -fcallgraph-info option

2019-11-06 Thread Thomas Schwinge
Hi Alexandre!

On 2019-10-26T01:35:43-0300, Alexandre Oliva  wrote:
> This was first submitted many years ago
> https://gcc.gnu.org/ml/gcc-patches/2010-10/msg02468.html
>
> The command line option -fcallgraph-info is added and makes the
> compiler generate another output file (xxx.ci) for each compilation
> unit

Yay, for such analysis tools!  :-)


But I'm curious:

> which is a valid VCG file (you can launch your favorite VCG
> viewer on it unmodified)

What should be my "favorite VCG viewer"?

Google led me to
, where
I downloaded 'vcg.20050204.tgz' (which I understand is the 1995 sources
with just some licensing changed?), which I managed to build, but which
fails to run on a simple file:

*** stack smashing detected ***: terminated
Aborted (core dumped)

I found the patch from
 to cure
that one problem.

(It seems that vcg is not packaged in Debian/Ubuntu anymore, nowadays?)

> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi

> +@item -fcallgraph-info
> +@itemx -fcallgraph-info=@var{MARKERS}
> +@opindex fcallgraph-info
> +Makes the compiler output callgraph information for the program, on a
> +per-file basis.  The information is generated in the common VCG format.

Eh, "common VCG format" -- maybe common in the mid-90s?  ;-)
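For readers who have never seen the format: a minimal VCG file describing a two-node call graph looks roughly like the sketch below (written from memory of the VCG/GDL syntax, not generated by the patch).

```
graph: {
  title: "callgraph"
  node: { title: "main" label: "main" }
  node: { title: "f" label: "f" }
  edge: { sourcename: "main" targetname: "f" }
}
```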

> +It can be decorated with additional, per-node and/or per-edge information,
> +if a list of comma-separated markers is additionally specified.  When the
> +@code{su} marker is specified, the callgraph is decorated with stack usage
> +information; it is equivalent to @option{-fstack-usage}.  When the @code{da}
> +marker is specified, the callgraph is decorated with information about
> +dynamically allocated objects.

I tried that, but 'xvcg' didn't render anything useful for a
'-fcallgraph-info=su,da' dump, hmm.


Also, I found that many years ago, in 2012, Steven Bosscher did "Rework
RTL CFG graph dumping to dump DOT format" (that's Graphviz), and then did
"remove vcg CFG dumper".

Note that I'm not actively objecting VCG if there's a good reason to use
unmaintained mid-90s software, containing obfuscated layout/rendering
source code (as far as I understand), not really in spirit of its GPL
license?  (But I'm not a lawyer, of course.)

So I guess I'm just curious why it's now coming back.


Grüße
 Thomas




Re: [4/6] Optionally pick the cheapest loop_vec_info

2019-11-06 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford
>  wrote:
>>
>> This patch adds a mode in which the vectoriser tries each available
>> base vector mode and picks the one with the lowest cost.  For now
>> the behaviour is behind a default-off --param, but a later patch
>> enables it by default for SVE.
>>
>> The patch keeps the current behaviour of preferring a VF of
>> loop->simdlen over any larger or smaller VF, regardless of costs
>> or target preferences.
>
> Can you avoid using a --param for this?  Instead I'd suggest to
> amend the vectorize_modes target hook to return some
> flags like VECT_FIRST_MODE_WINS.  We'd eventually want
> to make the target able to say do-not-vectorize-epiloges-of-MODE
> (I think we may not want to vectorize SSE vectorized loop
> epilogues with MMX-with-SSE or GPRs for example).  I guess
> for the latter we'd use a new target hook.

The reason for using a --param was that I wanted a way of turning
this on and off on the command line, so that users can experiment
with it if necessary.  E.g. enabling the --param could be a viable
alternative to -mprefix-* in some cases.  Disabling it would be
a way of working around a bad cost model decision without going
all the way to -fno-vect-cost-model.

These kinds of --params can become useful workarounds until an
optimisation bug is fixed.

Thanks,
Richard


[PATCH][ARM] Improve max_cond_insns setting for Cortex cores

2019-11-06 Thread Wilco Dijkstra
Various CPUs have max_cond_insns set to 5 for historical reasons.
Benchmarking shows that max_cond_insns=2 is fastest on modern Cortex-A
cores, so change it to 2 for all Cortex-A cores.  Set max_cond_insns
to 4 on Thumb-2 architectures given it's already limited to that by
MAX_INSN_PER_IT_BLOCK.  Also use the CPU tuning setting when a CPU/tune
is selected if -mrestrict-it is not explicitly set.

On Cortex-A57 this gives 1.1% performance gain on SPECINT2006 as well
as a 0.4% codesize reduction.

Bootstrapped on armhf. OK for commit?

ChangeLog:

2019-08-19  Wilco Dijkstra  

* gcc/config/arm/arm.c (arm_option_override_internal):
Use max_cond_insns from CPU tuning unless -mrestrict-it is used.
(arm_v6t2_tune): Set max_cond_insns to 4.
(arm_cortex_tune): Set max_cond_insns to 2.
(arm_cortex_a8_tune): Likewise.
(arm_cortex_a7_tune): Likewise.
(arm_cortex_a35_tune): Likewise.
(arm_cortex_a53_tune): Likewise.
(arm_cortex_a5_tune): Likewise.
(arm_cortex_a9_tune): Likewise.
(arm_v6m_tune): Set max_cond_insns to 4.
---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 628cf02f23fb29392a63d87f561c3ee2fb73a515..38ac16ad1def91ca78ccfa98fd1679b2b5114851 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1943,7 +1943,7 @@ const struct tune_params arm_v6t2_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  4,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1968,7 +1968,7 @@ const struct tune_params arm_cortex_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1991,7 +1991,7 @@ const struct tune_params arm_cortex_a8_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2014,7 +2014,7 @@ const struct tune_params arm_cortex_a7_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2060,7 +2060,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2083,7 +2083,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -2167,9 +2167,6 @@ const struct tune_params arm_xgene1_tune =
   tune_params::SCHED_AUTOPREF_OFF
 };
 
-/* Branches can be dual-issued on Cortex-A5, so conditional execution is
-   less appealing.  Set max_insns_skipped to a low value.  */
-
 const struct tune_params arm_cortex_a5_tune =
 {
   &cortexa5_extra_costs,
@@ -2178,7 +2175,7 @@ const struct tune_params arm_cortex_a5_tune =
   arm_cortex_a5_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  1,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,

Re: [PATCH] Fix parser to recognize operator?:

2019-11-06 Thread Jason Merrill

On 10/14/19 11:27 AM, Matthias Kretz wrote:

This time with testcase. Is the subdir for the test ok?


Yes.


gcc/ChangeLog:

2019-10-11  Matthias Kretz  

* gcc/cp/parser.c (cp_parser_operator): Parse operator?: as an
attempt to overload the conditional operator. Then
grok_op_properties can print its useful "ISO C++ prohibits
overloading operator ?:" message instead of the cryptic error
message about a missing type-specifier before '?' token.


The first sentence is enough for a ChangeLog; the second should be an 
explanatory comment above the ChangeLog entries.


Also, this entry belongs in gcc/cp/ChangeLog.


+++ b/gcc/cp/parser.c
@@ -15502,6 +15502,15 @@ cp_parser_operator (cp_parser* parser, location_t
start_loc)


Word wrap broke the patch here.  Feel free to send patches as 
attachments if it's hard to suppress word wrap in your mailer (as it is 
in mine).


I also saw an additional failure in the testsuite:

> FAIL: g++.old-deja/g++.jason/operator.C  -std=gnu++2a  (test for 
errors, line 8)
> FAIL: g++.old-deja/g++.jason/operator.C  -std=gnu++2a (test for 
excess errors)


Here's what I'm applying:

Jason
commit c3e97f573b6e29324ef72ded63e0307cec8d2ce2
Author: Jason Merrill 
Date:   Tue Nov 5 17:31:44 2019 +

Fix parser to recognize operator?:

This change lets grok_op_properties print its useful "ISO C++ prohibits
overloading operator ?:" message instead of the cryptic error message about
a missing type-specifier before '?' token.

2019-11-06  Matthias Kretz  

* parser.c (cp_parser_operator): Parse operator?: as an
attempt to overload the conditional operator.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index cbbf946d32c..b17e0336e1c 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -15542,6 +15542,15 @@ cp_parser_operator (cp_parser* parser, location_t start_loc)
   op = COMPONENT_REF;
   break;
 
+case CPP_QUERY:
+  op = COND_EXPR;
+  /* Consume the `?'.  */
+  cp_lexer_consume_token (parser->lexer);
+  /* Look for the matching `:'.  */
+  cp_parser_require (parser, CPP_COLON, RT_COLON);
+  consumed = true;
+  break;
+
 case CPP_OPEN_PAREN:
   {
 /* Consume the `('.  */
diff --git a/gcc/testsuite/g++.dg/parse/operator9.C b/gcc/testsuite/g++.dg/parse/operator9.C
new file mode 100644
index 000..d66355afab5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/operator9.C
@@ -0,0 +1,5 @@
+// { dg-do compile }
+
+struct A {};
+struct B {};
+int operator?:(bool, A, B);  // { dg-error "prohibits overloading" }
diff --git a/gcc/testsuite/g++.old-deja/g++.jason/operator.C b/gcc/testsuite/g++.old-deja/g++.jason/operator.C
index c2fc212cef0..69a41cf2448 100644
--- a/gcc/testsuite/g++.old-deja/g++.jason/operator.C
+++ b/gcc/testsuite/g++.old-deja/g++.jason/operator.C
@@ -5,7 +5,7 @@
 typedef __SIZE_TYPE__ size_t;
 
 struct A {
-  int operator?:(int a, int b);	   // { dg-error "expected type-specifier" } 
+  int operator?:(int a, int b);	   // { dg-error "prohibits overloading" } 
   static int operator()(int a);	   // { dg-error "14:.static int A::operator\\(\\)\\(int\\). must be a nonstatic member function" }
   static int operator+(A,A);	   // { dg-error "14:.static int A::operator\\+\\(A, A\\). must be either a non-static member function or a non-member function" } 
   int operator+(int a, int b = 1); // { dg-error "7:.int A::operator\\+\\(int, int\\). must have either zero or one argument" }
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index d0b47f7c562..e60a45b869e 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,8 @@
+2019-11-06  Matthias Kretz  
+
+	* parser.c (cp_parser_operator): Parse operator?: as an
+	attempt to overload the conditional operator.
+
 2019-11-05  Jason Merrill  
 
 	Implement C++20 operator<=>.


Re: GCC wwwdocs move to git done

2019-11-06 Thread Georg-Johann Lay

Am 06.11.19 um 15:03 schrieb Georg-Johann Lay:

Am 09.10.19 um 02:27 schrieb Joseph Myers:

I've done the move of GCC wwwdocs to git (using the previously posted and
discussed scripts), including setting up the post-receive hook to do the
same things previously covered by the old CVS hooks, and minimal updates
to the web pages dealing with the CVS setup for wwwdocs.


Hi,

May it be the case that some parts are missing?  In particular, I cannot
find the source of

https://gcc.gnu.org/install/configure.html

Johann



Ok, found it in install/README. I knew it had something special about it...

Johann






Re: [4/6] Optionally pick the cheapest loop_vec_info

2019-11-06 Thread Richard Biener
On Wed, Nov 6, 2019 at 3:01 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford
> >  wrote:
> >>
> >> This patch adds a mode in which the vectoriser tries each available
> >> base vector mode and picks the one with the lowest cost.  For now
> >> the behaviour is behind a default-off --param, but a later patch
> >> enables it by default for SVE.
> >>
> >> The patch keeps the current behaviour of preferring a VF of
> >> loop->simdlen over any larger or smaller VF, regardless of costs
> >> or target preferences.
> >
> > Can you avoid using a --param for this?  Instead I'd suggest to
> > amend the vectorize_modes target hook to return some
> > flags like VECT_FIRST_MODE_WINS.  We'd eventually want
> > to make the target able to say do-not-vectorize-epiloges-of-MODE
> > (I think we may not want to vectorize SSE vectorized loop
> > epilogues with MMX-with-SSE or GPRs for example).  I guess
> > for the latter we'd use a new target hook.
>
> The reason for using a --param was that I wanted a way of turning
> this on and off on the command line, so that users can experiment
> with it if necessary.  E.g. enabling the --param could be a viable
> alternative to -mprefix-* in some cases.  Disabling it would be
> a way of working around a bad cost model decision without going
> all the way to -fno-vect-cost-model.
>
> These kinds of --params can become useful workarounds until an
> optimisation bug is fixed.

I'm arguing that the default depends on the actual ISAs so there isn't
a one-fits all and given we have OMP SIMD and target cloning for
multiple ISAs this looks like a wrong approach.  For sure the
target can use its own switches to override defaults here, or alternatively
we might want to have a #pragma GCC simdlen mimicing OMP behavior
here.

Richard.

>
> Thanks,
> Richard


Re: Ping^1 [patch][avr] PR92055: Add switches to enable 64-bit [long] double.

2019-11-06 Thread Georg-Johann Lay

Am 06.11.19 um 11:39 schrieb Georg-Johann Lay:

Ping #1

Am 31.10.19 um 22:55 schrieb Georg-Johann Lay:

Hi, this adds the possibility to enable IEEE compatible double
and long double support in avr-gcc.

It supports 2 configure options

--with-double={32|64|32,64|64,32}
--with-long-double={32|64|32,64|64,32|double}

which select the default layout of these types and also chose
which multilib variants are built and available.

These two config option map to the new compiler options
-mdouble= and -mlong-double= which are new multilib options.

The patch only deals with option handling and multilib bits,
it does not add any double functionality.  The double support
functions are supposed to be provided by avr-libc which also hosts
all the float stuff, including __addsf3 etc.

Ok for trunk?

Johann


..and here is the addendum that documents the new configure options.

Index: gcc/doc/install.texi
===
--- gcc/doc/install.texi(revision 277236)
+++ gcc/doc/install.texi(working copy)
@@ -2277,15 +2277,45 @@ omitted from @file{libgcc.a} on the assu
 @samp{newlib}.

 @item --with-avrlibc
-Specifies that @samp{AVR-Libc} is
-being used as the target C library.  This causes float support
+Only supported for the AVR target. Specifies that @samp{AVR-Libc} is
+being used as the target C@tie{} library.  This causes float support
 functions like @code{__addsf3} to be omitted from @file{libgcc.a} on
 the assumption that it will be provided by @file{libm.a}.  For more
 technical details, cf. @uref{http://gcc.gnu.org/PR54461,,PR54461}.
-This option is only supported for the AVR target.  It is not supported for
+It is not supported for
 RTEMS configurations, which currently use newlib.  The option is
 supported since version 4.7.2 and is the default in 4.8.0 and newer.

+@item --with-double=@{32|64|32,64|64,32@}
+@itemx --with-long-double=@{32|64|32,64|64,32|double@}
+Only supported for the AVR target since version@tie{}10.
+Specify the default layout available for the C/C++ @samp{double}
+and @samp{long double} type, respectively. The following rules apply:
+@itemize
+@item
+The first value after the @samp{=} specifies the default layout (in bits)
+of the type and also the default for the @option{-mdouble=} resp.
+@option{-mlong-double=} compiler option.
+@item
+If more than one value is specified, respective multilib variants are
+available, and  @option{-mdouble=} resp. @option{-mlong-double=} acts
+as a multilib option.
+@item
+If @option{--with-long-double=double} is specified, @samp{double} and
+@samp{long double} will have the same layout.
+@item
+If the configure option is not set, it defaults to @samp{32} which
+is compatible with older versions of the compiler that use non-standard
+32-bit types for @samp{double} and @samp{long double}.
+@end itemize
+Not all combinations of @option{--with-double=} and
+@option{--with-long-double=} are valid.  For example, the combination
+@option{--with-double=32,64} @option{--with-long-double=32} will be
+rejected because the first option specifies the availability of
+multilibs for @samp{double}, whereas the second option implies
+that @samp{long double} --- and hence also @samp{double} --- is always
+32@tie{}bits wide.
+
 @item --with-nds32-lib=@var{library}
 Specifies that @var{library} setting is used for building @file{libgcc.a}.
 Currently, the valid @var{library} is @samp{newlib} or @samp{mculib}.


[PATCH 0/3] libgcc/crtstuff.c tweaks to reduce code size

2019-11-06 Thread Jozef Lawrynowicz
Some functionality in crtstuff.c will never be used for some targets,
resulting in unnecessary code bloat in the crt* object files.

For example, msp430-elf uses .{init,fini}_array, does not support shared
objects, does not support transactional memory and could be configured
to remove support for exceptions.

Therefore __do_global_dtors_aux(), frame_dummy(),
{,de}register_tm_clones could be safely removed, saving code size.

The following patches implement the generic mechanisms which enable the features
associated with this functionality to be removed.

Successfully bootstrapped and regtested for x86_64-pc-linux-gnu.
Successfully regtested for msp430-elf.

Ok to apply?

P.S. An MSP430-specific series of patches to make use of the functionality added
here will be submitted separately.

Jozef Lawrynowicz (3):
  libgcc: Add --disable-eh-frame-registry configure option
  libgcc: Don't define __do_global_dtors_aux if it will be empty
  libgcc: Implement TARGET_LIBGCC_REMOVE_DSO_HANDLE

 gcc/doc/install.texi | 11 +++
 gcc/doc/tm.texi  | 11 +++
 gcc/doc/tm.texi.in   | 11 +++
 libgcc/Makefile.in   |  4 +++-
 libgcc/configure | 22 ++
 libgcc/configure.ac  | 17 +
 libgcc/crtstuff.c| 33 +++--
 7 files changed, 98 insertions(+), 11 deletions(-)

-- 
2.17.1


[PATCH 3/3] libgcc: Implement TARGET_LIBGCC_REMOVE_DSO_HANDLE

2019-11-06 Thread Jozef Lawrynowicz
For some targets __dso_handle will never be used, and its definition in
crtstuff.c can cause a domino effect resulting in the size of the final
executable increasing much more than just the size of this piece of data.

For msp430, CRT functions to initialize global data are only included if there
is global data to initialize and it is more than feasible to write functional
programs which do not use any global data. In these cases, the definition of
__dso_handle will cause code size to increase by around 100 bytes as library
code to initialize data is linked into the executable.

Removing __dso_handle can therefore save on code size.

This does require the target to NOT use __cxa_atexit, so either
TARGET_CXX_USE_ATEXIT_FOR_CXA_ATEXIT must return true or -fno-use-cxa-atexit
must be used as a target flag when building GCC.
This is because __cxa_atexit includes functionality to unload dynamic shared
objects and so cp/decl.c will create a reference to __dso_handle to facilitate
this in programs with static destructors.
From 7bc0971d2936ebe71e7b7d3d805cf1bbf9f9f5af Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Mon, 4 Nov 2019 17:38:13 +
Subject: [PATCH 3/3] libgcc: Implement TARGET_LIBGCC_REMOVE_DSO_HANDLE

gcc/ChangeLog:

2019-11-06  Jozef Lawrynowicz  

	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in: Define TARGET_LIBGCC_REMOVE_DSO_HANDLE.

libgcc/ChangeLog:

2019-11-06  Jozef Lawrynowicz  

	* crtstuff.c: Don't declare __dso_handle if
	TARGET_LIBGCC_REMOVE_DSO_HANDLE is defined.

---
 gcc/doc/tm.texi| 11 +++
 gcc/doc/tm.texi.in | 11 +++
 libgcc/crtstuff.c  |  2 ++
 3 files changed, 24 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index cd9aed9874f..89ef0a8e616 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11425,6 +11425,17 @@ from shared libraries (DLLs).
 You need not define this macro if it would always evaluate to zero.
 @end defmac
 
+@defmac TARGET_LIBGCC_REMOVE_DSO_HANDLE
+Define this macro if, for targets where dynamic shared objects cannot be used,
+the declaration of @samp{__dso_handle} in libgcc/crtstuff.c can be removed.
+
+If this macro is defined, __cxa_atexit must be disabled at GCC configure time,
+otherwise references to __dso_handle will be created when the middle-end
+creates destructors for static objects.
+
+This macro is undefined by default.
+@end defmac
+
 @deftypefn {Target Hook} {rtx_insn *} TARGET_MD_ASM_ADJUST (vec& @var{outputs}, vec& @var{inputs}, vec& @var{constraints}, vec& @var{clobbers}, HARD_REG_SET& @var{clobbered_regs})
 This target hook may add @dfn{clobbers} to @var{clobbers} and
 @var{clobbered_regs} for any hard regs the port wishes to automatically
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 2739e9ceec5..3a211a7658a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7853,6 +7853,17 @@ from shared libraries (DLLs).
 You need not define this macro if it would always evaluate to zero.
 @end defmac
 
+@defmac TARGET_LIBGCC_REMOVE_DSO_HANDLE
+Define this macro if, for targets where dynamic shared objects cannot be used,
+the declaration of @samp{__dso_handle} in libgcc/crtstuff.c can be removed.
+
+If this macro is defined, __cxa_atexit must be disabled at GCC configure time,
+otherwise references to __dso_handle will be created when the middle-end
+creates destructors for static objects.
+
+This macro is undefined by default.
+@end defmac
+
 @hook TARGET_MD_ASM_ADJUST
 
 @defmac MATH_LIBRARY
diff --git a/libgcc/crtstuff.c b/libgcc/crtstuff.c
index 0b0a0b865fe..d1d17f959d3 100644
--- a/libgcc/crtstuff.c
+++ b/libgcc/crtstuff.c
@@ -335,6 +335,7 @@ register_tm_clones (void)
in one DSO or the main program is not used in another object.  The
dynamic linker takes care of this.  */
 
+#ifndef TARGET_LIBGCC_REMOVE_DSO_HANDLE
 #ifdef TARGET_LIBGCC_SDATA_SECTION
 extern void *__dso_handle __attribute__ ((__section__ (TARGET_LIBGCC_SDATA_SECTION)));
 #endif
@@ -346,6 +347,7 @@ void *__dso_handle = &__dso_handle;
 #else
 void *__dso_handle = 0;
 #endif
+#endif /* TARGET_LIBGCC_REMOVE_DSO_HANDLE */
 
 /* The __cxa_finalize function may not be available so we use only a
weak declaration.  */
-- 
2.17.1



[PATCH 2/3] libgcc: Don't define __do_global_dtors_aux if it will be empty

2019-11-06 Thread Jozef Lawrynowicz
__do_global_dtors_aux in crtstuff.c will not do anything meaningful if:
 * crtstuff.c is not being compiled for use in a shared library
 * the target uses .{init,fini}_array sections
 * TM clone registry is disabled
 * EH frame registry is disabled

The attached patch prevents it from being defined at all if all the above
conditions are true. This saves code size in the final linked executable.
From 967262117f0c838fe8a9226484bf6e014c86f0ba Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Tue, 29 Oct 2019 13:02:08 +
Subject: [PATCH 2/3] libgcc: Don't define __do_global_dtors_aux if it will be
 empty

libgcc/ChangeLog:

2019-11-06  Jozef Lawrynowicz  

	* crtstuff.c (__do_global_dtors_aux): Wrap in #if so it's only defined
	if it will have contents.

---
 libgcc/crtstuff.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/libgcc/crtstuff.c b/libgcc/crtstuff.c
index 9a3247b7848..0b0a0b865fe 100644
--- a/libgcc/crtstuff.c
+++ b/libgcc/crtstuff.c
@@ -368,8 +368,12 @@ extern void __cxa_finalize (void *) TARGET_ATTRIBUTE_WEAK;
On some systems, this routine is run more than once from the .fini,
when exit is called recursively, so we arrange to remember where in
the list we left off processing, and we resume at that point,
-   should we be re-invoked.  */
+   should we be re-invoked.
 
+   This routine does not need to be run if none of the following clauses are
+   true, as it will not do anything, so can be removed.  */
+#if defined(CRTSTUFFS_O) || !defined(FINI_ARRAY_SECTION_ASM_OP) \
+  || USE_TM_CLONE_REGISTRY || USE_EH_FRAME_REGISTRY
 static void __attribute__((used))
 __do_global_dtors_aux (void)
 {
@@ -455,6 +459,9 @@ __do_global_dtors_aux_1 (void)
 CRT_CALL_STATIC_FUNCTION (__LIBGCC_INIT_SECTION_ASM_OP__,
 			  __do_global_dtors_aux_1)
 #endif
+#endif /* defined(CRTSTUFFS_O) || !defined(FINI_ARRAY_SECTION_ASM_OP)
+  || defined(USE_TM_CLONE_REGISTRY) || defined(USE_EH_FRAME_REGISTRY) */
+
 
 #if USE_EH_FRAME_REGISTRY || USE_TM_CLONE_REGISTRY
 /* Stick a call to __register_frame_info into the .init section.  For some
-- 
2.17.1



Ping2: [PATCH] Fix parser to recognize operator?:

2019-11-06 Thread Matthias Kretz
ping2

On Montag, 14. Oktober 2019 12:27:11 CET Matthias Kretz wrote:
> This time with testcase. Is the subdir for the test ok?
> 
> gcc/ChangeLog:
> 
> 2019-10-11  Matthias Kretz  
> 
>   * gcc/cp/parser.c (cp_parser_operator): Parse operator?: as an
>   attempt to overload the conditional operator. Then
>   grok_op_properties can print its useful "ISO C++ prohibits
>   overloading operator ?:" message instead of the cryptic error
>   message about a missing type-specifier before '?' token.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-10-14  Matthias Kretz  
>   * testsuite/g++.dg/parse/operator9.C: New test verifying the
>   correct error message is printed.
> 
> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index 3ee8da7db94..73385cb3dcb 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -15502,6 +15502,15 @@ cp_parser_operator (cp_parser* parser, location_t
> start_loc)
>op = COMPONENT_REF;
>break;
> 
> +case CPP_QUERY:
> +  op = COND_EXPR;
> +  /* Consume the `?'.  */
> +  cp_lexer_consume_token (parser->lexer);
> +  /* Look for the matching `:'.  */
> +  cp_parser_require (parser, CPP_COLON, RT_COLON);
> +  consumed = true;
> +  break;
> +
>  case CPP_OPEN_PAREN:
>{
>  /* Consume the `('.  */
> diff --git a/gcc/testsuite/g++.dg/parse/operator9.C b/gcc/testsuite/g++.dg/
> parse/operator9.C
> new file mode 100644
> index 000..d66355afab5
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/parse/operator9.C
> @@ -0,0 +1,5 @@
> +// { dg-do compile }
> +
> +struct A {};
> +struct B {};
> +int operator?:(bool, A, B);  // { dg-error "prohibits overloading" }
> 
> On Freitag, 11. Oktober 2019 16:17:09 CEST you wrote:
> > On Fri, Oct 11, 2019 at 04:06:43PM +0200, Matthias Kretz wrote:
> > > This is a minor bugfix for improved error reporting. Overloading ?: is
> > > just as disallowed as it is without this change.
> > 
> > Thanks.  Can you provide a testcase that shows why this change makes
> > sense?
> > That testcase then should be part of the patch submission.


-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtzzentrum für Schwerionenforschung https://gsi.de
 SIMD easy and portable https://github.com/VcDevel/Vc
──

Re: [2/6] Don't assign a cost to vectorizable_assignment

2019-11-06 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Nov 5, 2019 at 3:27 PM Richard Sandiford
>  wrote:
>>
>> vectorizable_assignment handles true SSA-to-SSA copies (which hopefully
>> we don't see in practice) and no-op conversions that are required
>> to maintain correct gimple, such as changes between signed and
>> unsigned types.  These cases shouldn't generate any code and so
>> shouldn't count against either the scalar or vector costs.
>>
>> Later patches test this, but it seemed worth splitting out.
>
> Hmm, but you have to adjust vect_compute_single_scalar_iteration_cost and
> possibly the SLP cost walk as well, otherwise we're artificially making
> those copies cheaper when vectorized.

Ah, yeah.  It looks complicated to reproduce the conditions exactly
there, so how about just costing 1 copy in vectorizable_assignment
to counteract it, and ignore ncopies?

Seems like vectorizable_* ought to be costing the scalar code as
well as the vector code, but that's too much for GCC 10 at this stage.

Thanks,
Richard


>
>>
>> 2019-11-04  Richard Sandiford  
>>
>> gcc/
>> * tree-vect-stmts.c (vectorizable_assignment): Don't add a cost.
>>
>> Index: gcc/tree-vect-stmts.c
>> ===
>> --- gcc/tree-vect-stmts.c   2019-11-05 14:17:43.330141911 +
>> +++ gcc/tree-vect-stmts.c   2019-11-05 14:18:39.169752725 +
>> @@ -5305,7 +5305,7 @@ vectorizable_conversion (stmt_vec_info s
>>  static bool
>>  vectorizable_assignment (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>>  stmt_vec_info *vec_stmt, slp_tree slp_node,
>> -stmt_vector_for_cost *cost_vec)
>> +stmt_vector_for_cost *)
>>  {
>>tree vec_dest;
>>tree scalar_dest;
>> @@ -5313,7 +5313,6 @@ vectorizable_assignment (stmt_vec_info s
>>loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>>tree new_temp;
>>enum vect_def_type dt[1] = {vect_unknown_def_type};
>> -  int ndts = 1;
>>int ncopies;
>>int i, j;
>>vec vec_oprnds = vNULL;
>> @@ -5409,7 +5408,8 @@ vectorizable_assignment (stmt_vec_info s
>>  {
>>STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type;
>>DUMP_VECT_SCOPE ("vectorizable_assignment");
>> -  vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node, 
>> cost_vec);
>> +  /* Don't add a cost here.  SSA copies and no-op conversions
>> +shouldn't generate any code in either scalar or vector form.  */
>>return true;
>>  }
>>


Generalise gather and scatter optabs

2019-11-06 Thread Richard Sandiford
The gather and scatter optabs required the vector offset to be
the integer equivalent of the vector mode being loaded or stored.
This patch generalises them so that the two vectors can have different
element sizes, although they still need to have the same number of
elements.

One consequence of this is that it's possible (if unlikely)
for two IFN_GATHER_LOADs to have the same arguments but different
return types.  E.g. the same scalar base and vector of 32-bit offsets
could be used to load 8-bit elements and to load 16-bit elements.
From just looking at the arguments, we could wrongly deduce that
they're equivalent.

I know we saw this happen at one point with IFN_WHILE_ULT,
and we dealt with it there by passing a zero of the return type
as an extra argument.  Doing the same here also makes the load
and store functions have the same argument assignment.

For now this patch should be a no-op, but later SVE patches take
advantage of the new flexibility.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-11-06  Richard Sandiford  

gcc/
* optabs.def (gather_load_optab, mask_gather_load_optab)
(scatter_store_optab, mask_scatter_store_optab): Turn into
conversion optabs, with the offset mode given explicitly.
* doc/md.texi: Update accordingly.
* config/aarch64/aarch64-sve-builtins-base.cc
(svld1_gather_impl::expand): Likewise.
(svst1_scatter_impl::expand): Likewise.
* internal-fn.c (gather_load_direct, scatter_store_direct): Likewise.
(expand_scatter_store_optab_fn): Likewise.
(direct_gather_load_optab_supported_p): Likewise.
(direct_scatter_store_optab_supported_p): Likewise.
(expand_gather_load_optab_fn): Likewise.  Expect the mask argument
to be argument 4.
(internal_fn_mask_index): Return 4 for IFN_MASK_GATHER_LOAD.
(internal_gather_scatter_fn_supported_p): Replace the offset sign
argument with the offset vector type.  Require the two vector
types to have the same number of elements but allow their element
sizes to be different.  Treat the optabs as conversion optabs.
* internal-fn.h (internal_gather_scatter_fn_supported_p): Update
prototype accordingly.
* optabs-query.c (supports_at_least_one_mode_p): Replace with...
(supports_vec_convert_optab_p): ...this new function.
(supports_vec_gather_load_p): Update accordingly.
(supports_vec_scatter_store_p): Likewise.
* tree-vectorizer.h (vect_gather_scatter_fn_p): Take a vec_info.
Replace the offset sign and bits parameters with a scalar type tree.
* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Likewise.
Pass back the offset vector type instead of the scalar element type.
Allow the offset to be wider than the memory elements.  Search for
an offset type that the target supports, stopping once we've
reached the maximum of the element size and pointer size.
Update call to internal_gather_scatter_fn_supported_p.
(vect_check_gather_scatter): Update calls accordingly.
When testing a new scale before knowing the final offset type,
check whether the scale is supported for any signed or unsigned
offset type.  Check whether the target supports the source and
target types of a conversion before deciding whether to look
through the conversion.  Record the chosen offset_vectype.
* tree-vect-patterns.c (vect_get_gather_scatter_offset_type): Delete.
(vect_recog_gather_scatter_pattern): Get the scalar offset type
directly from the gs_info's offset_vectype instead.  Pass a zero
of the result type to IFN_GATHER_LOAD and IFN_MASK_GATHER_LOAD.
* tree-vect-stmts.c (check_load_store_masking): Update call to
internal_gather_scatter_fn_supported_p, passing the offset vector
type recorded in the gs_info.
(vect_truncate_gather_scatter_offset): Update call to
vect_check_gather_scatter, leaving it to search for a valid
offset vector type.
(vect_use_strided_gather_scatters_p): Convert the offset to the
element type of the gs_info's offset_vectype.
(vect_get_gather_scatter_ops): Get the offset vector type directly
from the gs_info.
(vect_get_strided_load_store_ops): Likewise.
(vectorizable_load): Pass a zero of the result type to IFN_GATHER_LOAD
and IFN_MASK_GATHER_LOAD.
* config/aarch64/aarch64-sve.md (gather_load): Rename to...
(gather_load): ...this.
(mask_gather_load): Rename to...
(mask_gather_load): ...this.
(scatter_store): Rename to...
(scatter_store): ...this.
(mask_scatter_store): Rename to...
(mask_scatter_store): ...this.

Index: gcc/optabs.def
===
--- gcc/optabs.def  2019-09-30 

[Patch][OpenMP][Fortran] Support absent optional args with use_device_{ptr,addr} (+ OpenACC's use_device clause)

2019-11-06 Thread Tobias Burnus
This patch is based on Kwok's patch, posted as (4/5) at 
https://gcc.gnu.org/ml/gcc-patches/2019-07/msg00964.html – which is 
targeting OpenACC's use_device* – but it also applies to OpenMP 
use_device_{ptr,addr}.


I added an OpenMP test case. It showed that for arguments with the value 
attribute and for assumed-shape arrays, one needs to do more, as the 
decl cannot be directly used for the is-argument-present check.


(For 'value', a hidden boolean '_' + arg-name is passed in addition; for 
assumed-shape arrays, the array descriptor "x" is replaced by the local 
variable "x.0" (with "x.0 = x->data") and the original decl "x" is in 
GFC_DECL_SAVED_DESCRIPTOR. Especially for assumed-shape arrays, the new 
decl cannot be used unconditionally as it is uninitialized when the 
argument is absent.)


Bootstrapped and regtested on x86_64-gnu-linux without offloading + with 
nvptx.

OK?

Cheers,

Tobias

PS: The OpenACC test cases are in 5/5 and depend on some other changes. 
Submission of {1,missing one line of 2,3,5}/5 is planned next.
PPS: For fully absent-optional support, mapping needs to be handled for 
OpenACC (see Kwok's …/5 patches) and OpenMP (which is quite different on 
FE level) – and OpenMP also needs changes for the share clauses.


2019-11-06  Tobias Burnus  
	Kwok Cheung Yeung  

	gcc/
	* langhooks-def.h (LANG_HOOKS_OMP_CHECK_OPTIONAL_ARGUMENT):
	Renamed from LANG_HOOKS_OMP_IS_OPTIONAL_ARGUMENT; update define.
	(LANG_HOOKS_DECLS): Rename also here.
	* langhooks.h (lang_hooks_for_decls): Rename
	omp_is_optional_argument to omp_check_optional_argument; take
	additional bool argument.
	* omp-general.h (omp_check_optional_argument): Likewise.
	* omp-low.c (lower_omp_target): Update calls; handle absent
	Fortran optional arguments with USE_DEVICE_ADDR/USE_DEVICE_PTR.

	gcc/fortran/
	* trans-decl.c (create_function_arglist): Also set
	GFC_DECL_OPTIONAL_ARGUMENT for per-value arguments.
	* f95-lang.c (LANG_HOOKS_OMP_CHECK_OPTIONAL_ARGUMENT):
	Renamed from LANG_HOOKS_OMP_IS_OPTIONAL_ARGUMENT; point
	to gfc_omp_check_optional_argument.
	* trans.h (gfc_omp_check_optional_argument): Substitutes
	gfc_omp_is_optional_argument declaration.
	* trans-openmp.c (gfc_omp_is_optional_argument): Make static.
	(gfc_omp_check_optional_argument): New function.

	libgomp/
	* testsuite/libgomp.fortran/use_device_ptr-optional-2.f90: New.

 gcc/fortran/f95-lang.c  |  4 ++--
 gcc/fortran/trans-decl.c|  3 +--
 gcc/fortran/trans-openmp.c  | 62 +-
 gcc/fortran/trans.h |  2 +-
 gcc/langhooks-def.h |  4 ++--
 gcc/langhooks.h | 13 -
 gcc/omp-general.c   | 14 ++
 gcc/omp-general.h   |  2 +-
 gcc/omp-low.c   | 98 --
 libgomp/testsuite/libgomp.fortran/use_device_ptr-optional-2.f90 | 33 +
 10 files changed, 191 insertions(+), 44 deletions(-)

diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 0684c3b99cf..c7b592dbfe2 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -115,7 +115,7 @@ static const struct attribute_spec gfc_attribute_table[] =
 #undef LANG_HOOKS_INIT_TS
 #undef LANG_HOOKS_OMP_ARRAY_DATA
 #undef LANG_HOOKS_OMP_IS_ALLOCATABLE_OR_PTR
-#undef LANG_HOOKS_OMP_IS_OPTIONAL_ARGUMENT
+#undef LANG_HOOKS_OMP_CHECK_OPTIONAL_ARGUMENT
 #undef LANG_HOOKS_OMP_PRIVATIZE_BY_REFERENCE
 #undef LANG_HOOKS_OMP_PREDETERMINED_SHARING
 #undef LANG_HOOKS_OMP_REPORT_DECL
@@ -150,7 +150,7 @@ static const struct attribute_spec gfc_attribute_table[] =
 #define LANG_HOOKS_INIT_TS		gfc_init_ts
 #define LANG_HOOKS_OMP_ARRAY_DATA		gfc_omp_array_data
 #define LANG_HOOKS_OMP_IS_ALLOCATABLE_OR_PTR	gfc_omp_is_allocatable_or_ptr
-#define LANG_HOOKS_OMP_IS_OPTIONAL_ARGUMENT	gfc_omp_is_optional_argument
+#define LANG_HOOKS_OMP_CHECK_OPTIONAL_ARGUMENT	gfc_omp_check_optional_argument
 #define LANG_HOOKS_OMP_PRIVATIZE_BY_REFERENCE	gfc_omp_privatize_by_reference
 #define LANG_HOOKS_OMP_PREDETERMINED_SHARING	gfc_omp_predetermined_sharing
 #define LANG_HOOKS_OMP_REPORT_DECL		gfc_omp_report_decl
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index ffa6316..80ef45d892e 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -2691,9 +2691,8 @@ create_function_arglist (gfc_symbol * sym)
 	  && (!f->sym->attr.proc_pointer
 	  && f->sym->attr.flavor != FL_PROCEDURE))
 	DECL_BY_REFERENCE (parm) = 1;
-  if (f->sym->attr.optional && !f->sym->attr.value)

Re: introduce -fcallgraph-info option

2019-11-06 Thread Alexandre Oliva
On Nov  4, 2019, Richard Biener  wrote:

> I wonder why we shouldn't simply adjust aux_base_name to something
> else for -flto [in the driver].

About that, having tried to make sense of the current uses of
aux_base_name and of lto-wrapper, three main possibilities occur to me:

a) adjust the driver code to accept -auxbase, and have lto-wrapper
explicitly pass -auxbase ${output_dir-.}/$(lbasename ${output_name}) or
somesuch for each -fltrans command;

b) introduce -auxdir and get lto-wrapper to pass -auxdir ${output_dir-.}
for each -fltrans (and offload) command; or

c) get -fltrans to implicitly adjust aux_base_name with the directory
passed to -dumpdir, if any, or . otherwise

Any preferences?

-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);


Re: [PATCH, GCC/ARM, 2/10] Add command line support for Armv8.1-M Mainline

2019-11-06 Thread Kyrill Tkachov

Hi Mihail,

On 11/4/19 4:49 PM, Kyrill Tkachov wrote:

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:
> [PATCH, GCC/ARM, 2/10] Add command line support
>
> Hi,
>
> === Context ===
>
> This patch is part of a patch series to add support for Armv8.1-M
> Mainline Security Extensions architecture. Its purpose is to add
> command-line support for that new architecture.
>
> === Patch description ===
>
> Besides the expected enabling of the new value for the -march
> command-line option (-march=armv8.1-m.main) and its extensions (see
> below), this patch disables support of the Security Extensions for this
> newly added architecture. This is done both by not including the cmse
> bit in the architecture description and by throwing an error message
> when the user requests Armv8.1-M Mainline Security Extensions. Note that
> Armv8-M Baseline and Mainline Security Extensions are still enabled.
>
> Only extensions for already supported instructions are implemented in
> this patch. Other extensions (MVE integer and float) will be added in
> separate patches. The following configurations are allowed for Armv8.1-M
> Mainline with regards to FPU and implemented in this patch:
> + no FPU (+nofp)
> + single precision VFPv5 with FP16 (+fp)
> + double precision VFPv5 with FP16 (+fp.dp)
>
> ChangeLog entry are as follow:
>
> *** gcc/ChangeLog ***
>
> 2019-10-23  Mihail-Calin Ionescu 
> 2019-10-23  Thomas Preud'homme 
>
>     * config/arm/arm-cpus.in (armv8_1m_main): New feature.
>     (ARMv4, ARMv4t, ARMv5t, ARMv5te, ARMv5tej, ARMv6, ARMv6j, 
ARMv6k,
>     ARMv6z, ARMv6kz, ARMv6zk, ARMv6t2, ARMv6m, ARMv7, ARMv7a, 
ARMv7ve,

>     ARMv7r, ARMv7m, ARMv7em, ARMv8a, ARMv8_1a, ARMv8_2a, ARMv8_3a,
>     ARMv8_4a, ARMv8_5a, ARMv8m_base, ARMv8m_main, ARMv8r): Reindent.
>     (ARMv8_1m_main): New feature group.
>     (armv8.1-m.main): New architecture.
>     * config/arm/arm-tables.opt: Regenerate.
>     * config/arm/arm.c (arm_arch8_1m_main): Define and default
> initialize.
>     (arm_option_reconfigure_globals): Initialize arm_arch8_1m_main.
>     (arm_options_perform_arch_sanity_checks): Error out when 
targeting

>     Armv8.1-M Mainline Security Extensions.
>     * config/arm/arm.h (arm_arch8_1m_main): Declare.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2019-10-23  Mihail-Calin Ionescu 
> 2019-10-23  Thomas Preud'homme 
>
>     * lib/target-supports.exp
> (check_effective_target_arm_arch_v8_1m_main_ok): Define.
>     (add_options_for_arm_arch_v8_1m_main): Likewise.
> (check_effective_target_arm_arch_v8_1m_main_multilib): Likewise.
>
> Testing: bootstrapped on arm-linux-gnueabihf and arm-none-eabi; 
testsuite

> shows no regression.
>
> Is this ok for trunk?
>
Ok.



Something that I remembered last night upon reflection...

New command-line options (or arguments to them) need documentation in 
invoke.texi.


Please add some either as part of this patch or as a separate patch if 
you prefer.


Thanks,

Kyrill



Thanks,

Kyrill


> Best regards,
>
> Mihail
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index
> 
f8a3b3db67a537163bfe787d78c8f2edc4253ab3..652f2a4be9388fd7a74f0ec4615a292fd1cfcd36 


> 100644
> --- a/gcc/config/arm/arm-cpus.in
> +++ b/gcc/config/arm/arm-cpus.in
> @@ -126,6 +126,9 @@ define feature armv8_5
>  # M-Profile security extensions.
>  define feature cmse
>
> +# Architecture rel 8.1-M.
> +define feature armv8_1m_main
> +
>  # Floating point and Neon extensions.
>  # VFPv1 is not supported in GCC.
>
> @@ -223,21 +226,21 @@ define fgroup ALL_FPU_INTERNAL vfpv2 vfpv3 vfpv4
> fpv5 fp16conv fp_dbl ALL_SIMD_I
>  # -mfpu support.
>  define fgroup ALL_FP    fp16 ALL_FPU_INTERNAL
>
> -define fgroup ARMv4   armv4 notm
> -define fgroup ARMv4t  ARMv4 thumb
> -define fgroup ARMv5t  ARMv4t armv5t
> -define fgroup ARMv5te ARMv5t armv5te
> -define fgroup ARMv5tej    ARMv5te
> -define fgroup ARMv6   ARMv5te armv6 be8
> -define fgroup ARMv6j  ARMv6
> -define fgroup ARMv6k  ARMv6 armv6k
> -define fgroup ARMv6z  ARMv6
> -define fgroup ARMv6kz ARMv6k quirk_armv6kz
> -define fgroup ARMv6zk ARMv6k
> -define fgroup ARMv6t2 ARMv6 thumb2
> +define fgroup ARMv4 armv4 notm
> +define fgroup ARMv4t    ARMv4 thumb
> +define fgroup ARMv5t    ARMv4t armv5t
> +define fgroup ARMv5te   ARMv5t armv5te
> +define fgroup ARMv5tej  ARMv5te
> +define fgroup ARMv6 ARMv5te armv6 be8
> +define fgroup ARMv6j    ARMv6
> +define fgroup ARMv6k    ARMv6 armv6k
> +define fgroup ARMv6z    ARMv6
> +define fgroup ARMv6kz   ARMv6k quirk_armv6kz
> +define fgroup ARMv6zk   ARMv6k
> +define fgroup ARMv6t2   ARMv6 thumb2
>  # This is suspect.  ARMv6-m doesn't really pull in any useful features
>  # from ARMv5* or ARMv6.
> -define fgroup ARMv6m  armv4 thumb armv5t armv5te armv6 be8
> +define 

[PATCH 1/3] libgcc: Add --disable-eh-frame-registry configure option

2019-11-06 Thread Jozef Lawrynowicz
The attached patch enables the EH Frame Registry to be explicitly disabled
with a configure option "--disable-eh-frame-registry", thereby removing the
code that supports it from crtstuff.c.

Default behaviour is unchanged since USE_EH_FRAME_REGISTRY was previously
referenced only internally in crtstuff.c, and now is only defined to 0
when it would previously have not been defined at all.
>From 31fdea3564fd0a9a25547df0d5052133d7bdc8a6 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Tue, 29 Oct 2019 12:55:11 +
Subject: [PATCH 1/3] libgcc: Add --disable-eh-frame-registry configure option

gcc/ChangeLog:

2019-11-06  Jozef Lawrynowicz  

	* doc/install.texi: Document --disable-eh-frame-registry.

libgcc/ChangeLog:

2019-11-06  Jozef Lawrynowicz  

	* Makefile.in: Add USE_EH_FRAME_REGISTRY variable.
	Use USE_EH_FRAME_REGISTRY variable in CRTSTUFF_CFLAGS. 
	* configure: Regenerate.
	* configure.ac: Support --disable-eh-frame-registry.
	* crtstuff.c [!USE_EH_FRAME_REGISTRY]: Define USE_EH_FRAME_REGISTRY.
	s/#ifdef USE_EH_FRAME_REGISTRY/#if USE_EH_FRAME_REGISTRY/.
	s/#if defined(USE_EH_FRAME_REGISTRY)/#if USE_EH_FRAME_REGISTRY/.

---
 gcc/doc/install.texi | 11 +++
 libgcc/Makefile.in   |  4 +++-
 libgcc/configure | 22 ++
 libgcc/configure.ac  | 17 +
 libgcc/crtstuff.c| 22 +-
 5 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 563de705881..af61a34a477 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1314,6 +1314,17 @@ Disable TM clone registry in libgcc. It is enabled in libgcc by default.
 This option helps to reduce code size for embedded targets which do
 not use transactional memory.
 
+@item --disable-eh-frame-registry
+Disable the EH frame registry in libgcc.  It is enabled in libgcc by default
+for most ELF targets.
+
+This should not be used unless exceptions have been disabled for the target
+configuration.
+
+This option reduces code size by removing functionality to register the
+exception handling frame information that would normally run before
+@samp{main()}.
+
 @item --with-cpu=@var{cpu}
 @itemx --with-cpu-32=@var{cpu}
 @itemx --with-cpu-64=@var{cpu}
diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index 5608352a900..59f7f3cc381 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -261,6 +261,8 @@ CET_FLAGS = @CET_FLAGS@
 
 USE_TM_CLONE_REGISTRY = @use_tm_clone_registry@
 
+USE_EH_FRAME_REGISTRY = @use_eh_frame_registry@
+
 # Defined in libgcc2.c, included only in the static library.
 LIB2FUNCS_ST = _eprintf __gcc_bcmp
 
@@ -301,7 +303,7 @@ CRTSTUFF_CFLAGS = -O2 $(GCC_CFLAGS) $(INCLUDES) $(MULTILIB_CFLAGS) -g0 \
   $(NO_PIE_CFLAGS) -finhibit-size-directive -fno-inline -fno-exceptions \
   -fno-zero-initialized-in-bss -fno-toplevel-reorder -fno-tree-vectorize \
   -fbuilding-libgcc -fno-stack-protector $(FORCE_EXPLICIT_EH_REGISTRY) \
-  $(INHIBIT_LIBC_CFLAGS) $(USE_TM_CLONE_REGISTRY)
+  $(INHIBIT_LIBC_CFLAGS) $(USE_TM_CLONE_REGISTRY) $(USE_EH_FRAME_REGISTRY)
 
 # Extra flags to use when compiling crt{begin,end}.o.
 CRTSTUFF_T_CFLAGS =
diff --git a/libgcc/configure b/libgcc/configure
index 117e9c97e57..341c609252e 100755
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -605,6 +605,7 @@ solaris_ld_v2_maps
 real_host_noncanonical
 accel_dir_suffix
 use_tm_clone_registry
+use_eh_frame_registry
 force_explicit_eh_registry
 CET_FLAGS
 fixed_point
@@ -713,6 +714,7 @@ enable_decimal_float
 with_system_libunwind
 enable_cet
 enable_explicit_exception_frame_registration
+enable_eh_frame_registry
 enable_tm_clone_registry
 with_glibc_version
 enable_tls
@@ -1357,6 +1359,7 @@ Optional Features:
   register exception tables explicitly at module
   start, for use e.g. for compatibility with
   installations without PT_GNU_EH_FRAME support
+  --disable-eh-frame-registrydisable EH frame registry
   --disable-tm-clone-registrydisable TM clone registry
   --enable-tlsUse thread-local storage [default=yes]
 
@@ -4956,6 +4959,25 @@ fi
 
 
 
+# EH Frame Registry is implicitly enabled by default (although it is not
+# "forced"), and libgcc/crtstuff.c will setup the support for it if it is
+# supported by the target.  So we don't handle --enable-eh-frame-registry.
+# Check whether --enable-eh-frame-registry was given.
+if test "${enable_eh_frame_registry+set}" = set; then :
+  enableval=$enable_eh_frame_registry;
+use_eh_frame_registry=
+if test "$enable_eh_frame_registry" = no; then
+  if test "$enable_explicit_exception_frame_registration" = yes; then
+as_fn_error $? "Can't --disable-eh-frame-registry
+		  with --enable-explicit-exception-frame-registration" "$LINENO" 5
+  fi
+  use_eh_frame_registry=-DUSE_EH_FRAME_REGISTRY=0
+fi
+
+fi
+
+
+
 # Check whether --enable-tm-clone-registry was given.
 if test "${enable_tm_clone_registry+set}" = set; then :
   

[PATCH, rs6000] Add xxswapd support for V2DF and V2DI modes

2019-11-06 Thread Kelvin Nilsen


It was recently discovered that the existing xxswapd instruction patterns lack 
support for the V2DF and V2DI modes.  Support for these modes is required for 
certain new instruction patterns that are being implemented.

This patch adds the desired support.

The patch has been bootstrapped and tested without regressions on 
powerpc64le-unknown-linux.

Is this ok for trunk?

gcc/ChangeLog:

2019-11-06  Kelvin Nilsen  

* config/rs6000/vsx.md (xxswapd_<mode>): Add support for V2DF and
V2DI modes.

Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 277861)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -2987,6 +2987,17 @@
   "xxpermdi %x0,%x1,%x1,2"
   [(set_attr "type" "vecperm")])
 
+(define_insn "xxswapd_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+   (vec_select:VSX_D
+ (match_operand:VSX_D 1 "vsx_register_operand" "wa")
+ (parallel [(const_int 1) (const_int 0)])))]
+  "TARGET_VSX"
+;; AIX does not support extended mnemonic xxswapd.  Use the basic
+;; mnemonic xxpermdi instead.
+  "xxpermdi %x0,%x1,%x1,2"
+  [(set_attr "type" "vecperm")])
+
 ;; lxvd2x for little endian loads.  We need several of
 ;; these since the form of the PARALLEL differs by mode.
(define_insn "*vsx_lxvd2x2_le_<mode>"


Re: GCC wwwdocs move to git done

2019-11-06 Thread Georg-Johann Lay

Am 09.10.19 um 02:27 schrieb Joseph Myers:

I've done the move of GCC wwwdocs to git (using the previously posted and
discussed scripts), including setting up the post-receive hook to do the
same things previously covered by the old CVS hooks, and minimal updates
to the web pages dealing with the CVS setup for wwwdocs.


Hi,

May it be the case that some parts are missing?  In particular, I cannot
find the source of

https://gcc.gnu.org/install/configure.html

Johann


[PATCH] simplify-rtx: simplify_logical_relational_operation

2019-11-06 Thread Segher Boessenkool
This introduces simplify_logical_relational_operation.  Currently the
only simplification it implements is the IOR of two comparisons of the
same arguments.

Tested on powerpc64-linux {-m32,-m64}.

Is this okay for trunk?


Segher


2018-11-06  Segher Boessenkool  

* simplify-rtx.c (comparison_to_mask): New function.
(mask_to_comparison): New function.
(simplify_logical_relational_operation): New function.
(simplify_binary_operation_1): Call
simplify_logical_relational_operation.

---
 gcc/simplify-rtx.c | 130 +
 1 file changed, 130 insertions(+)

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 9a70720..b2ba922 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2125,6 +2125,132 @@ simplify_associative_operation (enum rtx_code code, 
machine_mode mode,
   return 0;
 }
 
+/* Return a mask describing the COMPARISON.  */
+static int
+comparison_to_mask (enum rtx_code comparison)
+{
+  switch (comparison)
+{
+case LT:
+  return 8;
+case GT:
+  return 4;
+case EQ:
+  return 2;
+case UNORDERED:
+  return 1;
+
+case LTGT:
+  return 12;
+case LE:
+  return 10;
+case GE:
+  return 6;
+case UNLT:
+  return 9;
+case UNGT:
+  return 5;
+case UNEQ:
+  return 3;
+
+case ORDERED:
+  return 14;
+case NE:
+  return 13;
+case UNLE:
+  return 11;
+case UNGE:
+  return 7;
+
+default:
+  gcc_unreachable ();
+}
+}
+
+/* Return a comparison corresponding to the MASK.  */
+static enum rtx_code
+mask_to_comparison (int mask)
+{
+  switch (mask)
+{
+case 8:
+  return LT;
+case 4:
+  return GT;
+case 2:
+  return EQ;
+case 1:
+  return UNORDERED;
+
+case 12:
+  return LTGT;
+case 10:
+  return LE;
+case 6:
+  return GE;
+case 9:
+  return UNLT;
+case 5:
+  return UNGT;
+case 3:
+  return UNEQ;
+
+case 14:
+  return ORDERED;
+case 13:
+  return NE;
+case 11:
+  return UNLE;
+case 7:
+  return UNGE;
+
+default:
+  gcc_unreachable ();
+}
+}
+
+/* Simplify a logical operation CODE with result mode MODE, operating on OP0
+   and OP1, which should be both relational operations.  Return 0 if no such
+   simplification is possible.  */
+rtx
+simplify_logical_relational_operation (enum rtx_code code, machine_mode mode,
+  rtx op0, rtx op1)
+{
+  /* We only handle IOR of two relational operations.  */
+  if (code != IOR)
+return 0;
+
+  if (!(COMPARISON_P (op0) && COMPARISON_P (op1)))
+return 0;
+
+  if (!(rtx_equal_p (XEXP (op0, 0), XEXP (op1, 0))
+   && rtx_equal_p (XEXP (op0, 1), XEXP (op1, 1))))
+return 0;
+
+  enum rtx_code code0 = GET_CODE (op0);
+  enum rtx_code code1 = GET_CODE (op1);
+
+  /* We don't handle unsigned comparisons currently.  */
+  if (code0 == LTU || code0 == GTU || code0 == LEU || code0 == GEU)
+return 0;
+  if (code1 == LTU || code1 == GTU || code1 == LEU || code1 == GEU)
+return 0;
+
+  int mask0 = comparison_to_mask (code0);
+  int mask1 = comparison_to_mask (code1);
+
+  int mask = mask0 | mask1;
+
+  if (mask == 15)
+return const_true_rtx;
+
+  code = mask_to_comparison (mask);
+
+  op0 = XEXP (op1, 0);
+  op1 = XEXP (op1, 1);
+
+  return simplify_gen_relational (code, mode, VOIDmode, op0, op1);
+}
 
 /* Simplify a binary operation CODE with result mode MODE, operating on OP0
and OP1.  Return 0 if no simplification is possible.
@@ -2888,6 +3014,10 @@ simplify_binary_operation_1 (enum rtx_code code, 
machine_mode mode,
   tem = simplify_associative_operation (code, mode, op0, op1);
   if (tem)
return tem;
+
+  tem = simplify_logical_relational_operation (code, mode, op0, op1);
+  if (tem)
+   return tem;
   break;
 
 case XOR:
-- 
1.8.3.1



Re: Add object allocators to symbol and call summaries

2019-11-06 Thread Martin Liška

On 11/5/19 6:53 PM, Jan Hubicka wrote:

Found it.  It turns out that ggc_prune_overhead_list is bogus.  It walks
all active allocation objects, checks whether they were collected (accounting
their collection), and then throws away all allocations (including those
not collected), so those are no longer accounted later.  We thus
basically misaccount everything that survives ggc_collect.


I've just read the patch and it's correct. It was indeed bogus.

Thanks for it,
Martin


Re: [PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions

2019-11-06 Thread Kyrill Tkachov

Hi Mihail,

On 10/23/19 10:26 AM, Mihail Ionescu wrote:

[PATCH, GCC/ARM, 3/10] Save/restore FPCXTNS in nsentry functions

Hi,

=== Context ===

This patch is part of a patch series to add support for Armv8.1-M
Mainline Security Extensions architecture. Its purpose is to enable
saving/restoring of nonsecure FP context in function with the
cmse_nonsecure_entry attribute.

=== Motivation ===

In Armv8-M Baseline and Mainline, the FP context is cleared on return from
nonsecure entry functions. This means the FP context might change when
calling a nonsecure entry function. This patch uses the new VLDR and
VSTR instructions available in Armv8.1-M Mainline to save/restore the FP
context when calling a nonsecure entry function from nonsecure code.

=== Patch description ===

This patch consists mainly of creating 2 new instruction patterns to
push and pop special FP registers via vldm and vstr and using them in
prologue and epilogue. The patterns are defined as push/pop with an
unspecified operation on the memory accessed, with an unspecified
constant indicating what special FP register is being saved/restored.

Other aspects of the patch include:
  * defining the set of special registers that can be saved/restored and
    their name
  * reserving space in the stack frames for these push/pop
  * preventing return via pop
  * guarding the clearing of FPSCR to target architecture not having
    Armv8.1-M Mainline instructions.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * config/arm/arm.c (fp_sysreg_names): Declare and define.
    (use_return_insn): Also return false for Armv8.1-M Mainline.
    (output_return_instruction): Skip FPSCR clearing if Armv8.1-M
    Mainline instructions are available.
    (arm_compute_frame_layout): Allocate space in frame for FPCXTNS
    when targeting Armv8.1-M Mainline Security Extensions.
    (arm_expand_prologue): Save FPCXTNS if this is an Armv8.1-M
    Mainline entry function.
    (cmse_nonsecure_entry_clear_before_return): Clear IP and r4 if
    targeting Armv8.1-M Mainline or successor.
    (arm_expand_epilogue): Fix indentation of caller-saved register
    clearing.  Restore FPCXTNS if this is an Armv8.1-M Mainline
    entry function.
    * config/arm/arm.h (TARGET_HAVE_FP_CMSE): New macro.
    (FP_SYSREGS): Likewise.
    (enum vfp_sysregs_encoding): Define enum.
    (fp_sysreg_names): Declare.
    * config/arm/unspecs.md (VUNSPEC_VSTR_VLDR): New volatile unspec.
    * config/arm/vfp.md (push_fpsysreg_insn): New define_insn.
    (pop_fpsysreg_insn): Likewise.

*** gcc/testsuite/Changelog ***

2019-10-23  Mihail-Calin Ionescu 
2019-10-23  Thomas Preud'homme 

    * gcc.target/arm/cmse/bitfield-1.c: add checks for VSTR and VLDR.
    * gcc.target/arm/cmse/bitfield-2.c: Likewise.
    * gcc.target/arm/cmse/bitfield-3.c: Likewise.
    * gcc.target/arm/cmse/cmse-1.c: Likewise.
    * gcc.target/arm/cmse/struct-1.c: Likewise.
    * gcc.target/arm/cmse/cmse.exp: Run existing Armv8-M Mainline 
tests
    from mainline/8m subdirectory and new Armv8.1-M Mainline tests 
from

    mainline/8_1m subdirectory.
    * gcc.target/arm/cmse/mainline/bitfield-4.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-4.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-5.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-5.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-6.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-6.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-7.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-7.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-8.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-8.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-9.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-9.c: This.
    * gcc.target/arm/cmse/mainline/bitfield-and-union-1.c: Move 
and rename

    into ...
    * gcc.target/arm/cmse/mainline/8m/bitfield-and-union.c: This.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-13.c: This.  
Clean up

    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-5.c: This.  
Clean up

    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-7.c: This.  
Clean up

    dg-skip-if directive for float ABI.
    * gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: Move into ...
    * gcc.target/arm/cmse/mainline/8m/hard-sp/cmse-8.c: This.  
Clean up

    dg-skip-if directive for float ABI.
    * 

[PATCH] Fix copy-paste typo syntax error by r277872

2019-11-06 Thread luoxhu
Tests pass; committed as r277904.


gcc/testsuite/ChangeLog:

2019-11-07  Xiong Hu Luo  

* gcc.target/powerpc/pr72804.c: Move inline options from
dg-require-effective-target to dg-options.
---
 gcc/testsuite/gcc.target/powerpc/pr72804.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr72804.c 
b/gcc/testsuite/gcc.target/powerpc/pr72804.c
index 0fc3df1d89b..10e37caed6b 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr72804.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr72804.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { lp64 } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
-/* { dg-require-effective-target powerpc_vsx_ok  -fno-inline-functions --param 
max-inline-insns-single-O2=200 } */
-/* { dg-options "-O2 -mvsx" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx -fno-inline-functions --param 
max-inline-insns-single-O2=200" } */
 
 __int128_t
 foo (__int128_t *src)
-- 
2.21.0.777.g83232e3864



[PATCH, rs6000 v2] Make load cost more in vectorization cost for P8/P9

2019-11-06 Thread Kewen.Lin
Hi Segher,

on 2019/11/7 上午1:38, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Nov 05, 2019 at 10:14:46AM +0800, Kewen.Lin wrote:
 + benefits were observed on Power8 and up, we can unify it if similar
 + profits are measured on Power6 and Power7.  */
 +  if (TARGET_P8_VECTOR)
 +return 2;
 +  else
 +return 1;
>>>
>>> Hrm, but you showed benchmark improvements for p9 as well?
>>>
>>
>> No significant gains but no degradation as well, so I thought it's fine to 
>> align
>> it together.  Does it make sense?
> 
> It's a bit strange at this point to do tunings for p8 that do we do not
> do for later cpus.
> 
>>> What happens if you enable this for everything as well?
>>
>> My concern was that if we enable it for everything, it's possible to 
>> introduce
>> degradation for some benchmarks on P6 or P7 where we didn't evaluate the
>> performance impact.
> 
> No one cares about p6.

OK.  :)

> 
> We reasonably expect it will work just as well on p7 as on p8 and later.
> That you haven't tested on p7 yet says something about how important that
> platform is now ;-)
> 

Yes, exactly.

>> Although it's reasonable from the point view of load latency,
>> it's possible to get worse result in the actual benchmarks based on my fine 
>> grain
>> cost adjustment experiment before.  
>>
>> Or do you suggest enabling it everywhere and solve the degradation issue if 
>> exposed?
>> I'm also fine with that.  :)
> 
> Yeah, let's just enable it everywhere.

One updated patch to enable it everywhere attached.


BR,
Kewen

---
gcc/ChangeLog

2019-11-07  Kewen Lin  

* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Make
scalar_load, vector_load, unaligned_load and vector_gather_load cost
more, to conform to hardware latency and insn cost settings.
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5876714..1094fbd 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4763,15 +4763,17 @@ rs6000_builtin_vectorization_cost (enum 
vect_cost_for_stmt type_of_cost,
   switch (type_of_cost)
 {
   case scalar_stmt:
-  case scalar_load:
   case scalar_store:
   case vector_stmt:
-  case vector_load:
   case vector_store:
   case vec_to_scalar:
   case scalar_to_vec:
   case cond_branch_not_taken:
 return 1;
+  case scalar_load:
+  case vector_load:
+   /* Like rs6000_insn_cost, make load insns cost a bit more.  */
+ return 2;
 
   case vec_perm:
/* Power7 has only one permute unit, make it a bit expensive.  */
@@ -4792,42 +4794,44 @@ rs6000_builtin_vectorization_cost (enum 
vect_cost_for_stmt type_of_cost,
 
   case unaligned_load:
   case vector_gather_load:
+   /* Like rs6000_insn_cost, make load insns cost a bit more.  */
if (TARGET_EFFICIENT_UNALIGNED_VSX)
- return 1;
-
-if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
-  {
-elements = TYPE_VECTOR_SUBPARTS (vectype);
-if (elements == 2)
-  /* Double word aligned.  */
-  return 2;
-
-if (elements == 4)
-  {
-switch (misalign)
-  {
-case 8:
-  /* Double word aligned.  */
-  return 2;
+ return 2;
 
-case -1:
-  /* Unknown misalignment.  */
-case 4:
-case 12:
-  /* Word aligned.  */
-  return 22;
+   if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
+ {
+   elements = TYPE_VECTOR_SUBPARTS (vectype);
+   if (elements == 2)
+ /* Double word aligned.  */
+ return 4;
 
-default:
-  gcc_unreachable ();
-  }
-  }
-  }
+   if (elements == 4)
+ {
+   switch (misalign)
+ {
+ case 8:
+   /* Double word aligned.  */
+   return 4;
+
+ case -1:
+   /* Unknown misalignment.  */
+ case 4:
+ case 12:
+   /* Word aligned.  */
+   return 44;
+
+ default:
+   gcc_unreachable ();
+ }
+ }
+ }
 
-if (TARGET_ALTIVEC)
-  /* Misaligned loads are not supported.  */
-  gcc_unreachable ();
+   if (TARGET_ALTIVEC)
+ /* Misaligned loads are not supported.  */
+ gcc_unreachable ();
 
-return 2;
+   /* Like rs6000_insn_cost, make load insns cost a bit more.  */
+   return 4;
 
   case unaligned_store:
   case vector_scatter_store:


Re: [PATCH] Fix hash_operand for fields of a CONSTRUCTOR.

2019-11-06 Thread Jeff Law
On 11/5/19 1:35 AM, Martin Liška wrote:
> On 11/4/19 4:24 PM, Jeff Law wrote:
>> On 11/4/19 6:36 AM, Richard Biener wrote:
>>> On Mon, Nov 4, 2019 at 2:35 PM Richard Biener
>>>  wrote:

 On Mon, Nov 4, 2019 at 10:09 AM Martin Liška  wrote:
>
> On 11/1/19 10:51 PM, Jeff Law wrote:
>> On 10/31/19 10:01 AM, Martin Liška wrote:
>>> Hi.
>>>
>>> operand_equal_p can properly handle situation where we have a 
>>> CONSTRUCTOR
>>>
>>> where indices are NULL:
>>>
>>> if (!operand_equal_p (c0->value, c1->value, flags)
>>> /* In GIMPLE the indexes can be either NULL or matching 
>>> i.
>>>
>>>    Double check this so we won't get false
>>>    positives for GENERIC.  */
>>> || (c0->index
>>> && (TREE_CODE (c0->index) != INTEGER_CST
>>> || compare_tree_int (c0->index, i)))
>>> || (c1->index
>>> && (TREE_CODE (c1->index) != INTEGER_CST
>>> || compare_tree_int (c1->index, i))))
>>>   return false;
>>>
>>> but the corresponding hash function always hashes field (which
>>> can be NULL_TREE or equal to ctor index).
>>>
>>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>>
>>>
>>> Ready to be installed?
>>> Thanks,
>>> Martin
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-10-31  Martin Liska  
>>>
>>>   PR ipa/92304
>>>   * fold-const.c (operand_compare::hash_operand): Fix field
>>>   hashing of CONSTRUCTOR.
>> OK.  One question though, do these routines need to handle
>> CONSTRUCTOR_NO_CLEARING?
>
> Good point, but I bet it's just a flag used in GENERIC, right?

 Yes.  It matters for gimplification only.  I don't think we can
 optimistically make use of it in operand_equal_p.
>>>
>>> OTOH for GENERIC and sth like ICF the flags have to match.
>> Precisely my concern.  I'm not immediately aware of any case where it
>> matters, but it'd be nice to future proof this if we can.
>>
>> jeff
>>
> 
> Sure, I've got the following tested patch.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> 0001-Add-CONSTRUCTOR_NO_CLEARING-to-operand_equal_p.patch
> 
> From 2302c15cb2568bc71b4b7bc3abbfd66aafc7c06c Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Mon, 4 Nov 2019 15:39:40 +0100
> Subject: [PATCH] Add CONSTRUCTOR_NO_CLEARING to operand_equal_p.
> 
> gcc/ChangeLog:
> 
> 2019-11-04  Martin Liska  
> 
>   * fold-const.c (operand_compare::operand_equal_p): Add comparison
>   of CONSTRUCTOR_NO_CLEARING.
>   (operand_compare::hash_operand): Likewise.
OK
jeff



[PATCH 0/2] Introduce a new GCC option, --record-gcc-command-line

2019-11-06 Thread Egeyar Bagcioglu
Hello,

I would like to propose the following patches, which introduce a compile option 
--record-gcc-command-line. When passed to gcc, it saves the command line 
into the produced object file. The option makes it trivial to trace back how a 
file was compiled and by which version of gcc. It helps with debugging, 
reproducing bugs and repeating the build process.

This option is similar to -frecord-gcc-switches. However, they have three 
fundamental differences: Firstly, -frecord-gcc-switches saves the internal 
state after the argv is processed and passed by the driver. As opposed to that, 
--record-gcc-command-line saves the command-line as received by the driver. 
Secondly, -frecord-gcc-switches saves the switches as separate entries into a 
mergeable string section. Therefore, the entries belonging to different object 
files get mixed up after being linked. The new --record-gcc-command-line, on 
the other hand, creates one entry per invocation. By doing so, it makes it 
clear which options were used together in a single gcc invocation. Lastly, 
--record-gcc-command-line also adds the version of the gcc into this single 
entry to make it clear which version of gcc was called with any given command 
line. This is useful in cases where the .comment section reports multiple versions.

While there are also similarities between the implementations of these two 
options, they are completely independent. These commands can be used separately 
or together without issues. I used the same section that -frecord-gcc-switches 
uses on purpose. I could not use the name -frecord-gcc-command-line for this 
option because of a {f*} in the specs, which forwards all options starting 
with -f to cc1/cc1plus as is. This is not what we want for this option: we 
would like to append a filename to it as well, to pass the argv of the driver 
to the child processes.

This functionality operates as follows: it saves gcc's argv into a 
temporary file, and passes --record-gcc-command-line <file> to cc1 or 
cc1plus. The functionality of the backend is implemented via a hook. This patch 
includes an example implementation of the hook for ELF targets: the 
elf_record_gcc_command_line function. This function reads the given file and 
writes gcc's version and the command line into a mergeable string section, 
.GCC.command.line.

Here is an *example usage* of the option:
[egeyar@localhost save-commandline]$ gcc main.c --record-gcc-command-line
[egeyar@localhost save-commandline]$ readelf -p .GCC.command.line a.out

String dump of section '.GCC.command.line':
  [ 0]  10.0.0 20191025 (experimental) : gcc main.c 
--record-gcc-command-line


The following is a *second example*, calling g++ with -save-temps and with 
repeated options, where -save-temps preserves the intermediate file, 
main.cmdline in this case. You can see that the options are recorded 
unprocessed:

[egeyar@localhost save-commandline]$ g++ main.c -save-temps 
--record-gcc-command-line -O0 -O2 -O3 --record-gcc-command-line
[egeyar@localhost save-commandline]$ readelf -p .GCC.command.line a.out

String dump of section '.GCC.command.line':
  [ 0]  10.0.0 20191025 (experimental) : g++ main.c -save-temps 
--record-gcc-command-line -O0 -O2 -O3 --record-gcc-command-line


Here is a *third example* calling g++ with both -frecord-gcc-switches and 
--record-gcc-command-line for comparison:
[egeyar@localhost save-commandline]$ g++ main.c --record-gcc-command-line 
-frecord-gcc-switches
[egeyar@localhost save-commandline]$ readelf -p .GCC.command.line a.out

String dump of section '.GCC.command.line':
  [ 0]  10.0.0 20191025 (experimental) : g++ main.c 
--record-gcc-command-line -frecord-gcc-switches
  [5c]  -D_GNU_SOURCE
  [6a]  main.c
  [71]  -mtune=generic
  [80]  -march=x86-64
  [8e]  --record-gcc-command-line /tmp/ccgC4ZtS.cmdline


The first patch of this two-patch series only extends the testsuite 
machinery, while the second patch implements the functionality and adds a 
test case for it. In addition to that new test case, I built binutils with 
this option added to CFLAGS. The added .GCC.command.line section of ld 
listed many compile commands as expected. Tested on x86_64-pc-linux-gnu.

Please review the patches, let me know what you think and apply if appropriate.

Regards
Egeyar
 

Egeyar Bagcioglu (2):
  Introduce dg-require-target-object-format
  Introduce the gcc option --record-gcc-command-line

 gcc/common.opt |  4 +++
 gcc/config/elfos.h |  5 +++
 gcc/doc/tm.texi| 22 
 gcc/doc/tm.texi.in |  4 +++
 gcc/gcc.c  | 41 ++
 gcc/gcc.h  |  1 +
 gcc/target.def | 30 
 gcc/target.h   |  3 



[PATCH 2/2] Introduce the gcc option --record-gcc-command-line

2019-11-06 Thread Egeyar Bagcioglu
gcc/ChangeLog:
2019-10-21  Egeyar Bagcioglu  

* common.opt (--record-gcc-command-line): New option.
* config/elfos.h (TARGET_ASM_RECORD_GCC_COMMAND_LINE): Define
as elf_record_gcc_command_line.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_ASM_RECORD_GCC_COMMAND_LINE): Introduce.
(TARGET_ASM_RECORD_GCC_COMMAND_LINE_SECTION): Likewise.
* gcc.c (_gcc_argc): New static variable.
(_gcc_argv): Likewise.
(record_gcc_command_line_spec_function): New function.
(cc1_options): Add --record-gcc-command-line.
(static_spec_functions): Add record_gcc_command_line_spec_function
with pseudo name record-gcc-command-line.
(driver::main): Call set_commandline.
(driver::set_commandline): New function.
* gcc.h (driver::set_commandline): Declare.
* target.def (record_gcc_command_line): A new hook.
(record_gcc_command_line_section): A new hookpod.
* target.h (elf_record_gcc_command_line): Declare.
* toplev.c (init_asm_output): Check gcc_command_line_file and
call record_gcc_command_line if necessary.
* varasm.c: Include "version.h".
(elf_record_gcc_command_line): Define.

gcc/testsuite/ChangeLog:
2019-10-30  Egeyar Bagcioglu  

* c-c++-common/record-gcc-command-line.c: New test case.


---
 gcc/common.opt |  4 +++
 gcc/config/elfos.h |  5 +++
 gcc/doc/tm.texi| 22 
 gcc/doc/tm.texi.in |  4 +++
 gcc/gcc.c  | 41 ++
 gcc/gcc.h  |  1 +
 gcc/target.def | 30 
 gcc/target.h   |  3 ++
 .../c-c++-common/record-gcc-command-line.c |  8 +
 gcc/toplev.c   | 13 +++
 gcc/varasm.c   | 36 +++
 11 files changed, 167 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/record-gcc-command-line.c

diff --git a/gcc/common.opt b/gcc/common.opt
index cc279f4..59d670f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -394,6 +394,10 @@ Driver Alias(print-sysroot-headers-suffix)
 -profile
 Common Alias(p)
 
+-record-gcc-command-line
+Common Driver NoDriverArg Separate Var(gcc_command_line_file)
+Record the command line of this gcc invocation in the produced object file.
+
 -save-temps
 Driver Alias(save-temps)
 
diff --git a/gcc/config/elfos.h b/gcc/config/elfos.h
index e00d437..5caa9e0 100644
--- a/gcc/config/elfos.h
+++ b/gcc/config/elfos.h
@@ -451,6 +451,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 #undef  TARGET_ASM_RECORD_GCC_SWITCHES
 #define TARGET_ASM_RECORD_GCC_SWITCHES elf_record_gcc_switches
 
+/* Allow the use of the --record-gcc-command-line switch via the
+   elf_record_gcc_command_line function defined in varasm.c.  */
+#undef  TARGET_ASM_RECORD_GCC_COMMAND_LINE
+#define TARGET_ASM_RECORD_GCC_COMMAND_LINE elf_record_gcc_command_line
+
 /* A C statement (sans semicolon) to output to the stdio stream STREAM
any text necessary for declaring the name of an external symbol
named NAME which is referenced in this compilation but not defined.
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 915e961..6da5e1b 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -8061,6 +8061,28 @@ ELF implementation of the 
@code{TARGET_ASM_RECORD_GCC_SWITCHES} target
 hook.
 @end deftypevr
 
+@deftypefn {Target Hook} int TARGET_ASM_RECORD_GCC_COMMAND_LINE ()
+Provides the target with the ability to record the command line that
+has been passed to the compiler driver. The @var{gcc_command_line_file}
+variable specifies the intermediate file that holds the command line.
+
+The return value must be zero.  Other return values may be supported
+in the future.
+
+By default this hook is set to NULL, but an example implementation,
+@var{elf_record_gcc_command_line}, is provided for ELF-based targets.
+It records the command line as ASCII text inside a new, mergeable string
+section in the assembler output file.  The name of the new section is
+provided by the @code{TARGET_ASM_RECORD_GCC_COMMAND_LINE_SECTION}
+target hook.
+@end deftypefn
+
+@deftypevr {Target Hook} {const char *} 
TARGET_ASM_RECORD_GCC_COMMAND_LINE_SECTION
+This is the name of the section that will be created by the example
+ELF implementation of the @code{TARGET_ASM_RECORD_GCC_COMMAND_LINE}
+target hook.
+@end deftypevr
+
 @need 2000
 @node Data Output
 @subsection Output of Data
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index ac0f049..73ca552 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -5192,6 +5192,10 @@ It must not be modified by command-line option 
processing.
 
 @hook TARGET_ASM_RECORD_GCC_SWITCHES_SECTION
 

[PATCH 1/2] Introduce dg-require-target-object-format

2019-11-06 Thread Egeyar Bagcioglu
gcc/testsuite/ChangeLog:
2019-11-06  Egeyar Bagcioglu  

* lib/target-supports-dg.exp: Define dg-require-target-object-format.

---
 gcc/testsuite/lib/target-supports-dg.exp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports-dg.exp 
b/gcc/testsuite/lib/target-supports-dg.exp
index e1da57a..e923754 100644
--- a/gcc/testsuite/lib/target-supports-dg.exp
+++ b/gcc/testsuite/lib/target-supports-dg.exp
@@ -164,6 +164,17 @@ proc dg-require-dll { args } {
 set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
 }
 
+# If this target does not produce the given object format, skip this test.
+
+proc dg-require-target-object-format { args } {
+if { [gcc_target_object_format] == [lindex $args 1] } {
+   return
+}
+
+upvar dg-do-what dg-do-what
+set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+}
+
 # If this host does not support an ASCII locale, skip this test.
 
 proc dg-require-ascii-locale { args } {
-- 
1.8.3.1



Re: [PATCH, rs6000] Add xxswapd support for V2DF and V2DI modes

2019-11-06 Thread Segher Boessenkool
Hi!

On Wed, Nov 06, 2019 at 10:41:42AM -0600, Kelvin Nilsen wrote:
> It was recently discovered that the existing xxswapd instruction patterns 
> lack support for the V2DF and V2DI modes.  Support for these modes is 
> required for certain new instruction patterns that are being implemented.

Okay for trunk.  Thanks!


Segher


> 2019-11-06  Kelvin Nilsen  
> 
>   * config/rs6000/vsx.md (xxswapd_): Add support for V2DF and
>   V2DI modes.


Re: Free memory used by optimization/target options

2019-11-06 Thread Jeff Law
On 11/6/19 2:22 AM, Martin Liška wrote:
> On 11/5/19 11:40 AM, Jan Hubicka wrote:
>> +    print "  if (ptr->" name")";
>> +    print "    free (const_cast (ptr->" name"));";
> 
> If I'm correct, you can call free even for a NULL pointer.
You can and I think we expunged all the NULL tests before calling free
years ago.

jeff



[PATCH] libstdc++: remove redundant equality operators

2019-11-06 Thread Jonathan Wakely

Now that operator<=> is supported, these operators can be generated by
the compiler.

* include/bits/iterator_concepts.h (unreachable_sentinel_t): Remove
redundant equality operators.
* testsuite/util/testsuite_iterators.h (test_range::sentinel):
Likewise.

Tested powerpc64le-linux, committed to trunk.


commit f11a631b97047ef97d7658ca6aebeb392d55f2b3
Author: Jonathan Wakely 
Date:   Wed Nov 6 00:53:23 2019 +

libstdc++: remove redundant equality operators

Now that operator<=> is supported, these operators can be generated by
the compiler.

* include/bits/iterator_concepts.h (unreachable_sentinel_t): Remove
redundant equality operators.
* testsuite/util/testsuite_iterators.h (test_range::sentinel):
Likewise.

diff --git a/libstdc++-v3/include/bits/iterator_concepts.h 
b/libstdc++-v3/include/bits/iterator_concepts.h
index e30645e05cf..8b398616a56 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -797,23 +797,6 @@ namespace ranges
   friend constexpr bool
   operator==(unreachable_sentinel_t, const _It&) noexcept
   { return false; }
-
-#ifndef __cpp_lib_three_way_comparison
-template
-  friend constexpr bool
-  operator!=(unreachable_sentinel_t, const _It&) noexcept
-  { return true; }
-
-template
-  friend constexpr bool
-  operator==(const _It&, unreachable_sentinel_t) noexcept
-  { return false; }
-
-template
-  friend constexpr bool
-  operator!=(const _It&, unreachable_sentinel_t) noexcept
-  { return true; }
-#endif
   };
 
   inline constexpr unreachable_sentinel_t unreachable_sentinel{};
diff --git a/libstdc++-v3/testsuite/util/testsuite_iterators.h 
b/libstdc++-v3/testsuite/util/testsuite_iterators.h
index d20257c1b31..4c5e9a3cc1d 100644
--- a/libstdc++-v3/testsuite/util/testsuite_iterators.h
+++ b/libstdc++-v3/testsuite/util/testsuite_iterators.h
@@ -677,15 +677,6 @@ namespace __gnu_test
 
  friend bool operator==(const sentinel& s, const I& i)
  { return s.end == i.ptr; }
-
- friend bool operator!=(const sentinel& s, const I& i)
- { return !(s == i); }
-
- friend bool operator==(const I& i, const sentinel& s)
- { return s == i; }
-
- friend bool operator!=(const I& i, const sentinel& s)
- { return !(s == i); }
};
 
   auto


[PATCH] libstdc++: Add compare_three_way and install header

2019-11-06 Thread Jonathan Wakely

* include/Makefile.in: Regenerate.
* libsupc++/Makefile.in: Regenerate.
* libsupc++/compare (__3way_builtin_ptr_cmp): Define helper.
(compare_three_way): Add missing implementation.

Tested powerpc64le-linux, committed to trunk.


commit b5b6317a7d969dc65d853a2d461bf7c07ff88d28
Author: Jonathan Wakely 
Date:   Wed Nov 6 08:08:19 2019 +

libstdc++: Add compare_three_way and install  header

* include/Makefile.in: Regenerate.
* libsupc++/Makefile.in: Regenerate.
* libsupc++/compare (__3way_builtin_ptr_cmp): Define helper.
(compare_three_way): Add missing implementation.

diff --git a/libstdc++-v3/libsupc++/compare b/libstdc++-v3/libsupc++/compare
index 379b2d48582..2e518ccbffd 100644
--- a/libstdc++-v3/libsupc++/compare
+++ b/libstdc++-v3/libsupc++/compare
@@ -519,7 +519,8 @@ namespace std
 
   // [cmp.common], common comparison category type
   template
-struct common_comparison_category {
+struct common_comparison_category
+{
   // using type = TODO
 };
 
@@ -527,7 +528,7 @@ namespace std
 using common_comparison_category_t
   = typename common_comparison_category<_Ts...>::type;
 
-#if __cpp_concepts
+#if __cpp_lib_concepts
   namespace __detail
   {
 template
@@ -604,20 +605,42 @@ namespace std
 using compare_three_way_result_t
   = typename compare_three_way_result<_Tp, _Up>::__type;
 
+#if __cpp_lib_concepts
+  namespace __detail
+  {
+// BUILTIN-PTR-THREE-WAY(T, U)
+template
+  concept __3way_builtin_ptr_cmp
+   = convertible_to<_Tp, const volatile void*>
+ && convertible_to<_Up, const volatile void*>
+ && ! requires(_Tp&& __t, _Up&& __u)
+{ operator<=>(static_cast<_Tp&&>(__t), static_cast<_Up&&>(__u)); }
+ && ! requires(_Tp&& __t, _Up&& __u)
+{ static_cast<_Tp&&>(__t).operator<=>(static_cast<_Up&&>(__u)); };
+  } // namespace __detail
+
   // [cmp.object], typename compare_three_way
   struct compare_three_way
   {
-// TODO
-#if 0
 template
   requires (three_way_comparable_with<_Tp, _Up>
- || BUILTIN-PTR-THREE-WAY(_Tp, _Up))
+ || __detail::__3way_builtin_ptr_cmp<_Tp, _Up>)
 constexpr auto
 operator()(_Tp&& __t, _Up&& __u) const noexcept
 {
-  // TODO
+  if constexpr (__detail::__3way_builtin_ptr_cmp<_Tp, _Up>)
+   {
+	  auto __pt = static_cast<const volatile void*>(__t);
+	  auto __pu = static_cast<const volatile void*>(__u);
+ if (__builtin_is_constant_evaluated())
+   return __pt <=> __pu;
+ auto __it = reinterpret_cast<__UINTPTR_TYPE__>(__pt);
+ auto __iu = reinterpret_cast<__UINTPTR_TYPE__>(__pu);
+ return __it <=> __iu;
+   }
+  else
+   return static_cast<_Tp&&>(__t) <=> static_cast<_Up&&>(__u);
 }
-#endif
 
 using is_transparent = void;
   };
@@ -635,7 +658,8 @@ namespace std
 inline constexpr unspecified compare_partial_order_fallback = unspecified;
 #endif
   }
-}
+#endif
+} // namespace std
 
 #pragma GCC visibility pop
 


[PATCH] include size and offset in -Wstringop-overflow

2019-11-06 Thread Martin Sebor

The -Wstringop-overflow warnings for single-byte and multi-byte
stores mention the amount of data being stored and the amount of
space remaining in the destination, such as:

warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
  123 |   *p = 0;
  |   ~~~^~~
note: destination object declared here
   45 |   char b[N];
  |^

A warning like this can take some time to analyze.  First, the size
of the destination isn't mentioned and may not be easy to tell from
the sources.  In the note above, when N's value is the result of
some non-trivial computation, chasing it down may be a small project
in and of itself.  Second, it's also not clear why the region size
is zero.  It could be because the offset is exactly N, or because
it's negative, or because it's in some range greater than N.

Mentioning both the size of the destination object and the offset
makes the existing messages clearer, and will become essential when
GCC starts diagnosing overflow into allocated buffers (as my
follow-on patch does).

The attached patch enhances -Wstringop-overflow to do this by
letting compute_objsize return the offset to its caller, doing
something similar in get_stridx, and adding a new function to
the strlen pass to issue this enhanced warning (eventually, I'd
like the function to replace the -Wstringop-overflow handler in
builtins.c).  With the change, the note above might read something
like:

note: at offset 11 to object ‘b’ with size 8 declared here
   45 |   char b[N];
  |^

Tested on x86_64-linux.

Martin
gcc/ChangeLog:

	* builtins.c (compute_objsize): Add an argument and set it to offset
	into destination.
	* builtins.h (compute_objsize): Add an argument.
	* tree-object-size.c (addr_object_size): Add an argument and set it
	to offset into destination.
	(compute_builtin_object_size): Same.
	* tree-object-size.h (compute_builtin_object_size): Add an argument.
	* tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
	to offset into destination.
	(maybe_warn_overflow): New function.
	(handle_store): Call maybe_warn_overflow to issue warnings.

gcc/testsuite/ChangeLog:

	* c-c++-common/Wstringop-overflow-2.c: Adjust text of expected messages.
	* g++.dg/warn/Wstringop-overflow-3.C: Same.
	* gcc.dg/Wstringop-overflow-17.c: Same.

Index: gcc/builtins.c
===
--- gcc/builtins.c	(revision 277886)
+++ gcc/builtins.c	(working copy)
@@ -3571,23 +3571,29 @@ check_access (tree exp, tree, tree, tree dstwrite,
a non-constant offset in some range the returned value represents
the largest size given the smallest non-negative offset in the
range.  If nonnull, set *PDECL to the decl of the referenced
-   subobject if it can be determined, or to null otherwise.
+   subobject if it can be determined, or to null otherwise.  Likewise,
+   when POFF is nonnull *POFF is set to the offset into *PDECL.
The function is intended for diagnostics and should not be used
to influence code generation or optimization.  */
 
 tree
-compute_objsize (tree dest, int ostype, tree *pdecl /* = NULL */)
+compute_objsize (tree dest, int ostype, tree *pdecl /* = NULL */,
+		 tree *poff /* = NULL */)
 {
-  tree dummy = NULL_TREE;
+  tree dummy_decl = NULL_TREE;
   if (!pdecl)
-    pdecl = &dummy;
+    pdecl = &dummy_decl;
 
+  tree dummy_off = size_zero_node;
+  if (!poff)
+    poff = &dummy_off;
+
   unsigned HOST_WIDE_INT size;
 
   /* Only the two least significant bits are meaningful.  */
   ostype &= 3;
 
-  if (compute_builtin_object_size (dest, ostype, &size, pdecl))
+  if (compute_builtin_object_size (dest, ostype, &size, pdecl, poff))
 return build_int_cst (sizetype, size);
 
   if (TREE_CODE (dest) == SSA_NAME)
@@ -3608,7 +3614,7 @@ tree
 	  tree off = gimple_assign_rhs2 (stmt);
 	  if (TREE_CODE (off) == INTEGER_CST)
 	{
-	  if (tree size = compute_objsize (dest, ostype, pdecl))
+	  if (tree size = compute_objsize (dest, ostype, pdecl, poff))
 		{
 		  wide_int wioff = wi::to_wide (off);
 		  wide_int wisiz = wi::to_wide (size);
@@ -3619,10 +3625,16 @@ tree
 		  if (wi::sign_mask (wioff))
 		;
 		  else if (wi::ltu_p (wioff, wisiz))
-		return wide_int_to_tree (TREE_TYPE (size),
-	 wi::sub (wisiz, wioff));
+		{
+		  *poff = size_binop (PLUS_EXPR, *poff, off);
+		  return wide_int_to_tree (TREE_TYPE (size),
+	   wi::sub (wisiz, wioff));
+		}
 		  else
-		return size_zero_node;
+		{
+		  *poff = size_binop (PLUS_EXPR, *poff, off);
+		  return size_zero_node;
+		}
 		}
 	}
 	  else if (TREE_CODE (off) == SSA_NAME
@@ -3644,10 +3656,18 @@ tree
 			  || wi::sign_mask (max))
 			;
 		  else if (wi::ltu_p (min, wisiz))
-			return wide_int_to_tree (TREE_TYPE (size),
-		 wi::sub (wisiz, min));
+			{
+			  *poff = size_binop (PLUS_EXPR, *poff,
+	  wide_int_to_tree (sizetype, min));
+			  return wide_int_to_tree (TREE_TYPE (size),
+		   wi::sub (wisiz, min));

Re: make value_range the base class and value_range_equiv the derived class

2019-11-06 Thread Jeff Law
On 11/5/19 6:21 AM, Aldy Hernandez wrote:
> The base class for ranges is currently value_range_base, which is rather
> long and cumbersome.  It also occurs more often than the derived class
> of value_range.  To avoid confusion, and save typing, this patch does a
> global rename from value_range to value_range_equiv, and from
> value_range_base to value_range.
> 
> This way, the base class is simply value_range, and the derived class is
> value_range_equiv which explicitly states what it does.
> 
> OK?
> 
> Aldy
> 
> p.s. There are a few minor cleanups throughout... like moving some
> random variable definitions closer to their first use.  I figured they
> were harmless while I was in the vicinity.
OK.  As would be Richi's suggestion for value_range_with_equiv.

I don't think we want to use evrp in the name though.  I don't think
this stuff is going to be limited to the EVRP space.

jeff



Re: [PATCH] Clear version_info_node in delete_function_version.

2019-11-06 Thread Jeff Law
On 11/5/19 8:12 AM, Martin Liška wrote:
> Hi.
> 
> When calling delete_function_version, we should also clear
> version_info_node once it can be seen GGC collect.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-11-05  Martin Liska  
> 
> PR c++/92354
> * cgraph.c (delete_function_version): Clear global
> variable version_info_node if equal to deleted
> function.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-11-05  Martin Liska  
> 
> PR c++/92354
> * g++.target/i386/pr92354.C: New test.
OK
jeff



Re: [PATCH] Fix attribute((section)) for templates

2019-11-06 Thread Strager Neds
Summary: Do not merge my patch. It needs more work.

I realized a problem with my patch. With -flto,
__attribute__((section)) is still broken (i.e. my patch has no effect
for LTO builds). I narrowed the problem to localize_node
(gcc/ipa-visibility.c, historically part of
function_and_variable_visibility) deliberately clearing the section
name [1]. If I delete the section-clearing code in localize_node, the
bug is fixed with -flto. However, that code probably exists for a
reason. =]

While trying to find the meaning of the code in localize_node, I
stumbled upon some discussion on copy_node and section names [2]. It
looks like I should carefully audit callers of copy_node, and I should
consider putting the section name copy logic in a caller instead.

[1] https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00692.html
[2] https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00965.html


On Tue, Nov 5, 2019 at 3:38 PM Strager Neds  wrote:
>
> Aside: This is my first time working in GCC's code base. I probably made
> some mistakes. Please point them out. =]
>
> When GCC encounters __attribute__((section("foo"))) on a function or
> variable declaration, it adds an entry in the symbol table for the
> declaration to remember its desired section. The symbol table is
> separate from the declaration's tree node.
>
> When instantiating a template, GCC copies the tree of the template
> recursively. GCC does *not* copy symbol table entries when copying
> function and variable declarations.
>
> Combined, these two details mean that section attributes on function and
> variable declarations in a template have no effect.
>
> Fix this issue by copying the section name (in the symbol table) when
> copying a tree node for template instantiation. This addresses PR
> c++/70435 and PR c++/88061.
>
> Known unknowns (these are questions I'm thinking aloud to myself):
>
> * For all targets which support the section attribute, are functions and
>   variables deduplicated (comdat) when using a custom section? It seems
>   to work with GNU ELF on Linux (i.e. I end up with only one copy), but
>   I'm unsure about other platforms. Richard Biener raised this concern
>   in PR c++/88061
> * Are there other callers of copy_node which do not want section
>   attributes to be copied? I did not audit callers of copy_node.
> * Did this patch break anything? I had trouble running GCC's full test
>   suite, so I have not compared test results with and without this
>   patch.
>
> 2019-11-05  Matthew Glazar 
>
> * gcc/tree.c (copy_node): Copy section name from source SYMTAB_NODE, not
> just init priority.
>
> diff --git 
> gcc/testsuite/g++.dg/ext/section-class-template-specialized-static-variable.C
> gcc/testsuite/g++.dg/ext/section-class-template-specialized-static-variable.C
> new file mode 100644
> index 000..20f51fe665d
> --- /dev/null
> +++ 
> gcc/testsuite/g++.dg/ext/section-class-template-specialized-static-variable.C
> @@ -0,0 +1,29 @@
> +// PR c++/70435
> +// attribute((section)) should affect specialized static
> +// variables in class templates.
> +
> +// { dg-options "-std=c++17" }
> +// { dg-require-named-sections "" }
> +// { dg-final { scan-assembler-not {\.data.*my_var} } }
> +// { dg-final { scan-assembler {\.charsection.*my_var} } }
> +// { dg-final { scan-assembler {\.floatsection.*my_var} } }
> +
> +template
> +struct s
> +{
> +  static int my_var;
> +};
> +
> +template<>
> +int s::
> +my_var __attribute__((section(".charsection"))) = 1;
> +
> +template<>
> +int s::
> +my_var __attribute__((section(".floatsection"))) = 2;
> +
> +int *
> +f(bool which)
> +{
> +  return which ? ::my_var : ::my_var;
> +}
> diff --git 
> gcc/testsuite/g++.dg/ext/section-class-template-static-inline-variable.C
> gcc/testsuite/g++.dg/ext/section-class-template-static-inline-variable.C
> new file mode 100644
> index 000..e047c90c601
> --- /dev/null
> +++ gcc/testsuite/g++.dg/ext/section-class-template-static-inline-variable.C
> @@ -0,0 +1,20 @@
> +// PR c++/70435
> +// attribute((section)) should affect static inline variables in class
> +// templates.
> +
> +// { dg-options "-std=c++17" }
> +// { dg-require-named-sections "" }
> +// { dg-final { scan-assembler-not {\.data.*my_var} } }
> +// { dg-final { scan-assembler {\.testsection.*my_var} } }
> +
> +template
> +struct s
> +{
> +  inline static int my_var __attribute__((section(".testsection"))) = 1;
> +};
> +
> +int *
> +f(bool which)
> +{
> +  return which ? ::my_var : ::my_var;
> +}
> diff --git gcc/testsuite/g++.dg/ext/section-class-template-static-variable.C
> gcc/testsuite/g++.dg/ext/section-class-template-static-variable.C
> new file mode 100644
> index 000..ccf71e7c5df
> --- /dev/null
> +++ gcc/testsuite/g++.dg/ext/section-class-template-static-variable.C
> @@ -0,0 +1,22 @@
> +// attribute((section)) should affect static variables in class templates.
> +
> +// { dg-options "-std=c++17" }
> +// { dg-require-named-sections "" }
> +// { dg-final { scan-assembler-not 

Re: [PATCH rs6000]Fix PR92132

2019-11-06 Thread Segher Boessenkool
Hi Ke Wen,

On Tue, Nov 05, 2019 at 04:35:05PM +0800, Kewen.Lin wrote:
> >>  ;; 128-bit one's complement
> >> -(define_insn_and_split "*one_cmpl3_internal"
> >> +(define_insn_and_split "one_cmpl3_internal"
> > 
> > Instead, rename it to "one_cmpl3" and delete the define_expand that
> > serves no function?
> 
> Renamed.  Sorry, what's the "define_expand" specified here.  I thought it's
> for existing one_cmpl3 but I didn't find it. 

The expander named "one_cmpl3":

Erm.  2, not 3 :-)

(define_expand "one_cmpl2"
  [(set (match_operand:BOOL_128 0 "vlogical_operand")
(not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")))]
  ""
  "")

while the define_insn is

(define_insn_and_split "*one_cmpl3_internal"
  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=")
(not:BOOL_128
  (match_operand:BOOL_128 1 "vlogical_operand" "")))]
  ""
{

etc., so you can just delete the expand and rename the insn to the proper
name (one_cmpl2).  It sometimes is useful to have an expand like
this if there are multiple insns that could implement this, but that is
not the case here.

> >> +(define_code_iterator fpcmpun [ungt unge unlt unle])
> > 
> > Why these four?  Should there be more?  Should this be added to some
> > existing iterator?
> 
> For floating point comparison operator and vector type, currently rs6000
> supports eq, gt, ge, *ltgt, *unordered, *ordered, *uneq (* for unnamed).
> We can leverage gt, ge, eq for lt, le, ne, then these four left.

There are four conditions for FP: lt/gt/eq/un.  For every comparison,
exactly one of the four is true.  If not HONOR_NANS for this mode you
never have un, so it is one of lt/gt/eq then, just like with integers.

If we have HONOR_NANS(mode) (or !flag_finite_math_only), there are 14
possible combinations to test for (testing for any of the four or none
of the four is easy ;-) )

Four test just if lt, gt, eq, or un is set.  Another four test if one of
the flags is *not* set, or said differently, if one of three flags is set:
ordered, ne, unle, unge.  The remaining six test two flags each: ltgt, le,
unlt, ge, ungt, uneq.

> I originally wanted to merge them into the existing unordered or uneq, but
> I found it's hard to share their existing patterns.  For example, the uneq
> looks like:
> 
>   [(set (match_dup 3)
>   (gt:VEC_F (match_dup 1)
> (match_dup 2)))
>(set (match_dup 4)
>   (gt:VEC_F (match_dup 2)
> (match_dup 1)))
>(set (match_dup 0)
>   (and:VEC_F (not:VEC_F (match_dup 3))
>  (not:VEC_F (match_dup 4]

Or ge/ge/eqv, etc. -- there are multiple options.

> While ungt looks like:
> 
>   [(set (match_dup 3)
>   (ge:VEC_F (match_dup 1)
> (match_dup 2)))
>(set (match_dup 4)
>   (ge:VEC_F (match_dup 2)
> (match_dup 1)))
>(set (match_dup 3)
>   (ior:VEC_F (not:VEC_F (match_dup 3))
>  (not:VEC_F (match_dup 4
>(set (match_dup 4)
>   (gt:VEC_F (match_dup 1)
> (match_dup 2)))
>(set (match_dup 3)
>   (ior:VEC_F (match_dup 3)
>  (match_dup 4)))]

(set (match_dup 3)
 (ge:VEC_F (match_dup 2)
   (match_dup 1)))
(set (match_dup 0)
 (not:VEC_F (match_dup 3)))

should be enough?


So we have only gt/ge/eq.

I think the following are optimal (not tested!):

lt(a,b) = gt(b,a)
gt(a,b) = gt(a,b)
eq(a,b) = eq(a,b)
un(a,b) = ~(ge(a,b) | ge(b,a))

ltgt(a,b) = ge(a,b) ^ ge(b,a)
le(a,b)   = ge(b,a)
unlt(a,b) = ~ge(a,b)
ge(a,b)   = ge(a,b)
ungt(a,b) = ~ge(b,a)
uneq(a,b) = ~(ge(a,b) ^ ge(b,a))

ord(a,b)  = ge(a,b) | ge(b,a)
ne(a,b)   = ~eq(a,b)
unle(a,b) = ~gt(a,b)
unge(a,b) = ~gt(b,a)

This is quite regular :-)  5 are done with one cmp; 5 are done with a cmp
and an inversion; 4 are done with two compares and one xor/eqv/or/nor.


Half are pretty simple:

lt(a,b) = gt(b,a)
gt(a,b) = gt(a,b)
eq(a,b) = eq(a,b)
le(a,b) = ge(b,a)
ge(a,b) = ge(a,b)

ltgt(a,b) = ge(a,b) ^ ge(b,a)
ord(a,b)  = ge(a,b) | ge(b,a)

The other half are the negations of those:

unge(a,b) = ~gt(b,a)
unle(a,b) = ~gt(a,b)
ne(a,b)   = ~eq(a,b)
ungt(a,b) = ~ge(b,a)
unlt(a,b) = ~ge(a,b)

uneq(a,b) = ~(ge(a,b) ^ ge(b,a))
un(a,b) = ~(ge(a,b) | ge(b,a))


And please remember to test everything with -ffast-math :-)  That is, when
flag_finite_math_only is set.  You cannot get unordered results, then,
making the optimal sequences different in some cases (and changing what
"ne" means!)

8 codes, ordered:    never      lt    gt    ltgt  eq    le    ge    ordered
8 codes, unordered:  unordered  unlt  ungt  ne    uneq  unle  unge  always
8 codes, fast-math:  never      lt    gt    ne    eq    le    ge    always
8 codes, non-fp:     never      lt    gt    ne    eq    le    ge    always


> >> +;; Same mode for condition true/false values and predicate operand.
> >> +(define_expand "vcond_mask_"
> >> +  [(match_operand:VEC_I 0 "vint_operand")
> >> +   (match_operand:VEC_I 1 "vint_operand")
> >> +   (match_operand:VEC_I 2 "vint_operand")
> 

Re: [PATCH] bring -Warray-bounds closer to -Wstringop-overflow (PR91647, 91463, 91679)

2019-11-06 Thread Maciej W. Rozycki
On Wed, 6 Nov 2019, Jeff Law wrote:

> >  It is what I believe has also broken glibc:
> > 
> > In file included from ../sysdeps/riscv/libc-tls.c:19:
> > ../csu/libc-tls.c: In function '__libc_setup_tls':
> > ../csu/libc-tls.c:209:30: error: array subscript 1 is outside the bounds of 
> > an interior zero-length array 'struct dtv_slotinfo[0]' 
> > [-Werror=zero-length-bounds]
> >   209 |   static_slotinfo.si.slotinfo[1].map = main_map;
> >   |   ~~~^~~
> > In file included from ../sysdeps/riscv/ldsodefs.h:46,
> >  from ../sysdeps/gnu/ldsodefs.h:46,
> >  from ../sysdeps/unix/sysv/linux/ldsodefs.h:25,
> >  from ../sysdeps/unix/sysv/linux/riscv/ldsodefs.h:22,
> >  from ../csu/libc-tls.c:21,
> >  from ../sysdeps/riscv/libc-tls.c:19:
> > ../sysdeps/generic/ldsodefs.h:423:7: note: while referencing 'slotinfo'
> >   423 | } slotinfo[0];
> >   |   ^~~~
> > cc1: all warnings being treated as errors
> > 
> > (here in a RISC-V build).
> > 
> >  Has anybody looked yet into how the breakage could possibly be addressed?
> Yea, Florian posted patches over the weekend to fix glibc.  They're
> still going through the review/update cycle.

 Thanks, I have found them now, now that I knew what to look for and in 
what time frame.

 Unfortunately there's no mention of the error message or at least the 
name of the `-Wzero-length-bounds' option (which is how I found the GCC 
patch) in the respective glibc change descriptions so my mailing list 
 searches returned nothing.  I think it would be good to try to include 
likely search keywords in change descriptions; verbatim error messages 
are certainly good candidates IMO.

 So I went for `-Wno-zero-length-bounds' for my glibc build for the time 
being, as my objective now is to get some outstanding GCC stuff in before 
stage 1 ends rather than being drawn into glibc build issues.

  Maciej


[C++ PATCH] Implement D1907R1 "structural type".

2019-11-06 Thread Jason Merrill
ISO C++ paper D1907R1 proposes "structural type" as an alternative to the
current notion of "strong structural equality", which has various problems.
I'm implementing it to give people a chance to try it.

The build_base_field changes are to make it easier for structural_type_p to
see whether a base is private or protected.

Tested x86_64-pc-linux-gnu, applying to trunk.

* tree.c (structural_type_p): New.
* pt.c (invalid_nontype_parm_type_p): Use it.
* class.c (build_base_field_1): Take binfo.  Copy TREE_PRIVATE.
(build_base_field): Pass binfo.
---
 gcc/cp/cp-tree.h                         |  1 +
 gcc/cp/class.c                           |  9 ++-
 gcc/cp/pt.c                              | 18 ++
 gcc/cp/tree.c                            | 57 +++
 .../g++.dg/cpp2a/udlit-class-nttp-neg2.C |  2 +-
 5 files changed, 71 insertions(+), 16 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 2b45d62ce21..adc021b2a5c 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7302,6 +7302,7 @@ extern bool trivial_type_p (const_tree);
 extern bool trivially_copyable_p   (const_tree);
 extern bool type_has_unique_obj_representations (const_tree);
 extern bool scalarish_type_p   (const_tree);
+extern bool structural_type_p  (tree, bool = false);
 extern bool type_has_nontrivial_default_init   (const_tree);
 extern bool type_has_nontrivial_copy_init  (const_tree);
 extern void maybe_warn_parm_abi(tree, location_t);
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 89ed1c040f6..a9aa5e77171 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -4353,15 +4353,18 @@ layout_empty_base_or_field (record_layout_info rli, tree binfo_or_decl,
    fields at NEXT_FIELD, and return it.  */
 
 static tree
-build_base_field_1 (tree t, tree basetype, tree *_field)
+build_base_field_1 (tree t, tree binfo, tree *_field)
 {
   /* Create the FIELD_DECL.  */
+  tree basetype = BINFO_TYPE (binfo);
   gcc_assert (CLASSTYPE_AS_BASE (basetype));
   tree decl = build_decl (input_location,
  FIELD_DECL, NULL_TREE, CLASSTYPE_AS_BASE (basetype));
   DECL_ARTIFICIAL (decl) = 1;
   DECL_IGNORED_P (decl) = 1;
   DECL_FIELD_CONTEXT (decl) = t;
+  TREE_PRIVATE (decl) = TREE_PRIVATE (binfo);
+  TREE_PROTECTED (decl) = TREE_PROTECTED (binfo);
   if (is_empty_class (basetype))
 /* CLASSTYPE_SIZE is one byte, but the field needs to have size zero.  */
 DECL_SIZE (decl) = DECL_SIZE_UNIT (decl) = size_zero_node;
@@ -4414,7 +4417,7 @@ build_base_field (record_layout_info rli, tree binfo,
   CLASSTYPE_EMPTY_P (t) = 0;
 
   /* Create the FIELD_DECL.  */
-  decl = build_base_field_1 (t, basetype, next_field);
+  decl = build_base_field_1 (t, binfo, next_field);
 
   /* Try to place the field.  It may take more than one try if we
 have a hard time placing the field without putting two
@@ -4448,7 +4451,7 @@ build_base_field (record_layout_info rli, tree binfo,
 aggregate bases.  */
   if (cxx_dialect >= cxx17 && !BINFO_VIRTUAL_P (binfo))
{
- tree decl = build_base_field_1 (t, basetype, next_field);
+ tree decl = build_base_field_1 (t, binfo, next_field);
  DECL_FIELD_OFFSET (decl) = BINFO_OFFSET (binfo);
  DECL_FIELD_BIT_OFFSET (decl) = bitsize_zero_node;
  SET_DECL_OFFSET_ALIGN (decl, BITS_PER_UNIT);
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 061a92c9db0..8bacb3952ff 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -25748,21 +25748,15 @@ invalid_nontype_parm_type_p (tree type, tsubst_flags_t complain)
return false;
   if (!complete_type_or_else (type, NULL_TREE))
return true;
-  if (!literal_type_p (type))
+  if (!structural_type_p (type))
{
- error ("%qT is not a valid type for a template non-type parameter "
-"because it is not literal", type);
- explain_non_literal_class (type);
+ auto_diagnostic_group d;
+ if (complain & tf_error)
+   error ("%qT is not a valid type for a template non-type parameter "
+  "because it is not structural", type);
+ structural_type_p (type, true);
  return true;
}
-  if (cp_has_mutable_p (type))
-   {
- error ("%qT is not a valid type for a template non-type parameter "
-"because it has a mutable member", type);
- return true;
-   }
-  /* FIXME check op<=> and strong structural equality once spaceship is
-implemented.  */
   return false;
 }
 
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 5cdeb6a07fe..ba635d4ddbd 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -4378,6 +4378,63 @@ zero_init_p (const_tree t)
   return 1;
 }
 
+/* True IFF T is a C++20 structural type (P1907R1) that can be used as a
+   non-type template parameter.  

Re: [PATCH, OpenACC, v2] Non-contiguous array support for OpenACC data clauses

2019-11-06 Thread Thomas Schwinge
Hi Chung-Lin!

On 2019-11-05T22:35:43+0800, Chung-Lin Tang  wrote:
> Hi Thomas,
> after your last round of review, I realized that the bulk of the compiler 
> omp-low work was
> simply a case of dumb over-engineering in the wrong direction :P
> (although it did painstakingly function correctly)

Hehe -- that happens.  ;-)

> However, the issue of ACC_DEVICE_TYPE=host not working (and hence 
> "!openacc_host_selected"
> in the testcases)

Actually not just for that, but also generally for any shared-memory
models that may come into existence at some point, such as CUDA Unified
Memory, for example?

> actually is a bit more sophisticated than I thought:
>
> The reason it doesn't work for the host device, is because we use the map 
> pointer (i.e.
> a hostaddrs[] entry when passed into libgomp) to point to an array descriptor 
> to pass
> the whole array information, and rely on code inside gomp_map_vars_* to setup 
> things,
> and place the final on-device address of the non-contig. array into 
> devaddrs[], therefore
> only using a single map entry (something I thought was quite clever)
>
> However, this broke down on the host and host-fallback devices, simply 
> because, there
> we do NOT do any gomp_map_vars processing; our current code in 
> GOACC_parallel_keyed
> simply skips it and passes the offload function the original hostaddrs[] 
> contents.
> Lacking the processing to transform the descriptor pointer into a proper 
> array ref,
> things of course segfault.
>
> So I think we have three options for this (which may have some interactions 
> with say,
> the "proper" host-side parallelization we eventually need to implement for 
> OpenACC 2.7)
>
> (1) The simplest solution: implement a processing which searches and reverts 
> such
> non-contiguous array map entries in GOACC_parallel_keyed.
> (note: I have implemented this in the current attached "v2" patch)
>
> (2) Make the GOACC_parallel_keyed code to not make short cuts for host-modes;
> i.e. still do the proper gomp_map_vars processing for all cases.
>
> (3) Modify the non-contiguous array map conventions: a possible solution is 
> to use
> two maps placed together: one for the array pointer, another for the array 
> descriptor (as
> opposed to the current style of using only one map) This needs more further 
> elaborate
> compiler/runtime work.
>
> The first two options will pessimize host-mode performance somewhat. The 
> third I have
> some WIP patches, but it's still buggy ATM. Seeking your opinion on what we 
> should do.

I'll have to think about it some more, but variant (1) doesn't seem so
bad actually, for a first take.  While it's not nice to pessimize in
particular directives with 'if (false)' clauses, at least it does work,
the run-time overhead should not be too bad (also compared to variant
(2), I suppose), and variant (3) can still be implemented later.


A few comments/questions:

Please reference PR76739 in your submission/ChangeLog updates.

> --- gcc/c/c-typeck.c  (revision 277827)
> +++ gcc/c/c-typeck.c  (working copy)
> @@ -12868,7 +12868,7 @@ c_finish_omp_cancellation_point (location_t loc, t
>  static tree
>  handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
> 			      bool &maybe_zero_len, unsigned int &maybe_non_one,
> -			      enum c_omp_region_type ort)
> +			      bool &non_contiguous, enum c_omp_region_type ort)
>  {
>tree ret, low_bound, length, type;
>if (TREE_CODE (t) != TREE_LIST)

> @@ -13160,14 +13161,21 @@ handle_omp_array_sections_1 (tree c, tree t, vec<tree> &types,
> 	    return error_mark_node;
> 	  }
>/* If there is a pointer type anywhere but in the very first
> -  array-section-subscript, the array section can't be contiguous.  */
> +  array-section-subscript, the array section can't be contiguous.
> +  Note that OpenACC does accept these kinds of non-contiguous pointer
> +  based arrays.  */

That comment update should instead be moved to the function comment
before the 'handle_omp_array_sections_1' function definition, and should
then also explain the new 'non_contiguous' out variable.  The latter
needs to be done anyway, and the former (no comment here) is easy enough
to tell from the code:

>if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND
> && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
>   {
> -   error_at (OMP_CLAUSE_LOCATION (c),
> - "array section is not contiguous in %qs clause",
> - omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
> -   return error_mark_node;
> +   if (ort == C_ORT_ACC)
> + non_contiguous = true;
> +   else
> + {
> +   error_at (OMP_CLAUSE_LOCATION (c),
> + "array section is not contiguous in %qs clause",
> + omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
> +   return error_mark_node;
> + }
>   }

> @@ -13238,6 +13247,7 @@ handle_omp_array_sections (tree c, enum c_omp_regi
>

Re: [patch][avr] PR92055: Add switches to enable 64-bit [long] double.

2019-11-06 Thread Jeff Law
On 10/31/19 3:55 PM, Georg-Johann Lay wrote:
> Hi, this adds the possibility to enable IEEE compatible double
> and long double support in avr-gcc.
> 
> It supports 2 configure options
> 
> --with-double={32|64|32,64|64,32}
> --with-long-double={32|64|32,64|64,32|double}
> 
> which select the default layout of these types and also choose
> which multilib variants are built and available.
> 
> These two config options map to the new compiler options
> -mdouble= and -mlong-double= which are new multilib options.
> 
> The patch only deals with option handling and multilib bits,
> it does not add any double functionality.  The double support
> functions are supposed to be provided by avr-libc which also hosts
> all the float stuff, including __addsf3 etc.
> 
> Ok for trunk?
> 
> Johann
> 
> 
> gcc/
> Support 64-bit double and 64-bit long double configurations.
> 
> PR target/92055
> * config.gcc (tm_defines) [avr]: Set from --with-double=,
> --with-long-double=.
> * config/avr/t-multilib: Remove.
> * config/avr/t-avr: Output of genmultilib.awk is now fully
> dynamically generated and no more part of the repo.
> (HAVE_DOUBLE_MULTILIB, HAVE_LONG_DOUBLE_MULTILIB): New variables.
> Pass them down to...
> * config/avr/genmultilib.awk: ...here and handle them.
> * gcc/config/avr/avr.opt (-mdouble=, avr_double). New option and var.
> (-mlong-double=, avr_long_double). New option and var.
> * common/config/avr/avr-common.c (opts.h): Include.
> (diagnostic.h): Include.
> (TARGET_OPTION_OPTIMIZATION_TABLE) -mdouble=: Set default as
> requested by --with-double=.
> -mlong-double=: Set default as requested by --with-long-double=.
> 
> (TARGET_OPTION_OPTIMIZATION_TABLE) -mdouble=, -mlong-double=:
> Set default as requested by --with-double=
> (TARGET_HANDLE_OPTION): Define to this...
> (avr_handle_option): ...new hook worker.
> * config/avr/avr.h (DOUBLE_TYPE_SIZE): Define to avr_double.
> (LONG_DOUBLE_TYPE_SIZE): Define to avr_long_double.
> (avr_double_lib): New proto for spec function.
> (EXTRA_SPEC_FUNCTIONS) double-lib: Add.
> (DRIVER_SELF_SPECS): Call %:double-lib.
> * config/avr/avr.c (avr_option_override): Assert
> sizeof(long double) = sizeof(double) for the target.
> * config/avr/avr-c.c (avr_cpu_cpp_builtins)
> [__HAVE_DOUBLE_MULTILIB__, __HAVE_LONG_DOUBLE_MULTILIB__]
> [__HAVE_DOUBLE64__, __HAVE_DOUBLE32__, __DEFAULT_DOUBLE__=]
> [__HAVE_LONG_DOUBLE64__, __HAVE_LONG_DOUBLE32__]
> [__HAVE_LONG_DOUBLE_IS_DOUBLE__, __DEFAULT_LONG_DOUBLE__=]:
> New built-in defined depending on --with-double=, --with-long-double=.
> * config/avr/driver-avr.c (avr_double_lib): New spec function.
> * doc/invoke.texi (AVR Options) -mdouble=,-mlong-double=: Doc.
> 
> libgcc/
> Support 64-bit double and 64-bit long double configurations.
> 
> PR target/92055
> * config/avr/t-avr (HOST_LIBGCC2_CFLAGS): Only add -DF=SF if
> long double is a 32-bit type.
> * config/avr/t-avrlibc: Copy double64 and long-double64
> multilib(s) from the vanilla one.
> * config/avr/t-copy-libgcc: New Makefile snip.
> 
OK
jeff



Re: [PATCH] bring -Warray-bounds closer to -Wstringop-overflow (PR91647, 91463, 91679)

2019-11-06 Thread Maciej W. Rozycki
On Fri, 1 Nov 2019, Martin Sebor wrote:

> Rebuilding the kernel with the updated patch results in the following
> breakdown of the two warnings (the numbers are total instances of each,
> unique instances, and files they come from):
> 
>    -Wzero-length-bounds   49   46   13
>    -Warray-bounds         45   14    8
> 
> The -Warray-bounds instances I checked look legitimate even though
> the code in some of them still looks benign.  I'm not sure there's
> a good way to relax the warning to sanction some of these abuses
> without also missing some bugs.  It might be worth looking into
> some more in stage 3, depending on the fallout during mass rebuild.
> 
> After bootstrapping on x86_64 and i386 and regtesting I committed
> the attached patch in r277728.

 It is what I believe has also broken glibc:

In file included from ../sysdeps/riscv/libc-tls.c:19:
../csu/libc-tls.c: In function '__libc_setup_tls':
../csu/libc-tls.c:209:30: error: array subscript 1 is outside the bounds of an 
interior zero-length array 'struct dtv_slotinfo[0]' [-Werror=zero-length-bounds]
  209 |   static_slotinfo.si.slotinfo[1].map = main_map;
  |   ~~~^~~
In file included from ../sysdeps/riscv/ldsodefs.h:46,
 from ../sysdeps/gnu/ldsodefs.h:46,
 from ../sysdeps/unix/sysv/linux/ldsodefs.h:25,
 from ../sysdeps/unix/sysv/linux/riscv/ldsodefs.h:22,
 from ../csu/libc-tls.c:21,
 from ../sysdeps/riscv/libc-tls.c:19:
../sysdeps/generic/ldsodefs.h:423:7: note: while referencing 'slotinfo'
  423 | } slotinfo[0];
  |   ^~~~
cc1: all warnings being treated as errors

(here in a RISC-V build).

 Has anybody looked yet into how the breakage could possibly be addressed?

  Maciej


C++ PATCH for nested requirements normalization

2019-11-06 Thread Jason Merrill
Andrew sent me this patch separately, for handling nested requirements 
by normalizing immediately.  It still had a regression in 
concepts-pr67148.C at that point, due to an issue with 
tsubst_pack_expansion: normalizing


requires Same >();

meant substituting the template arguments of the concept through the 
requirement, including the (Args&&)args... pack expansion.  Here we were 
deciding we needed to use PACK_EXPANSION_EXTRA_ARGS, and then getting 
confused when we later tried to substitute actual template arguments 
into the pack expansion.  I first fixed that substitution, but then it 
occurred to me that we shouldn't have needed to use 
PACK_EXPANSION_EXTRA_ARGS in the first place, we should be able to 
substitute directly into the pack expansion pattern.  Which worked.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 05ddc6fb420104088a5dccfcd31e426ffbd2dc17
Author: Andrew Sutton 
Date:   Wed Nov 6 10:12:36 2019 -0500

Use satisfaction with nested requirements.

gcc/cp/

2019-11-06  Andrew Sutton  

* constraint.cc (build_parameter_mapping): Use
current_template_parms when the declaration is not available.
(norm_info::norm_info) Make explicit.
(normalize_constraint_expression): Factor into a separate overload
that takes arguments, and use that in the original function.
(tsubst_nested_requirement): Use satisfy_constraint instead of
trying to evaluate this as a constant expression.
(finish_nested_requirement): Keep the normalized constraint and the
original normalization arguments with the requirement.
(diagnose_nested_requirement): Use satisfy_constraint. Tentatively
implement more comprehensive diagnostics, but do not enable.
* parser.c (cp_parser_requires_expression): Relax requirement that
requires-expressions can live only inside templates.
* pt.c (any_template_parm_r): Look into type of PARM_DECL.

2019-11-06  Jason Merrill  

* pt.c (use_pack_expansion_extra_args_p): Still do substitution if
all packs are simple pack expansions.
(add_extra_args): Check that the extra args aren't dependent.

gcc/testsuite/
* lib/prune.exp: Ignore "in requirements" in diagnostics.
* g++.dg/cpp2a/requires-18.C: New test.
* g++.dg/cpp2a/requires-19.C: New test.

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index db2a30ced7c..00b59a90868 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -98,6 +98,8 @@ struct subst_info
   tree in_decl;
 };
 
+static tree satisfy_constraint (tree, tree, subst_info);
+
 /* True if T is known to be some type other than bool. Note that this
is false for dependent types and errors.  */
 
@@ -564,6 +566,15 @@ build_parameter_mapping (tree expr, tree args, tree decl)
   tree parms = DECL_TEMPLATE_PARMS (decl);
   depth = TREE_INT_CST_LOW (TREE_PURPOSE (parms));
 }
+  else if (current_template_parms)
+{
+  /* TODO: This should probably be the only case, but because the
+	 point of declaration of concepts is currently set after the
+	 initializer, the template parameter lists are not available
+	 when normalizing concept definitions, hence the case above.  */
+  depth = TMPL_PARMS_DEPTH (current_template_parms);
+}
+
   tree parms = find_template_parameters (expr, depth);
   tree map = map_arguments (parms, args);
   return map;
@@ -592,7 +603,7 @@ parameter_mapping_equivalent_p (tree t1, tree t2)
 
 struct norm_info : subst_info
 {
-  norm_info(tsubst_flags_t complain)
+  explicit norm_info (tsubst_flags_t complain)
 : subst_info (tf_warning_or_error | complain, NULL_TREE),
   context()
   {}
@@ -872,6 +883,20 @@ normalize_nontemplate_requirements (tree decl, bool diag = false)
   return get_normalized_constraints_from_decl (decl, diag);
 }
 
+/* Normalize an EXPR as a constraint using ARGS.  */
+
+static tree
+normalize_constraint_expression (tree expr, tree args, bool diag = false)
+{
+  if (!expr || expr == error_mark_node)
+return expr;
+  ++processing_template_decl;
+  norm_info info (diag ? tf_norm : tf_none);
+  tree norm = get_normalized_constraints (expr, args, info);
+  --processing_template_decl;
+  return norm;
+}
+
 /* Normalize an EXPR as a constraint.  */
 
 static tree
@@ -891,11 +916,7 @@ normalize_constraint_expression (tree expr, bool diag = false)
   else
 args = NULL_TREE;
 
-  ++processing_template_decl;
-  norm_info info (diag ? tf_norm : tf_none);
-  tree norm = get_normalized_constraints (expr, args, info);
-  --processing_template_decl;
-  return norm;
+  return normalize_constraint_expression (expr, args, diag);
 }
 
 /* 17.4.1.2p2. Two constraints are identical if they are formed
@@ -1930,33 +1951,14 @@ tsubst_compound_requirement (tree t, tree args, subst_info info)
 static tree
 

Re: [PATCH V3] rs6000: Refine small loop unroll in loop_unroll_adjust hook

2019-11-06 Thread Jiufu Guo
Jiufu Guo  writes:

> Segher Boessenkool  writes:
>
>> Hi!
>>
>> On Tue, Nov 05, 2019 at 04:33:23PM +0800, Jiufu Guo wrote:
>>> --- gcc/common/config/rs6000/rs6000-common.c(revision 277765)
>>> +++ gcc/common/config/rs6000/rs6000-common.c(working copy)
>>> @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_
>>>  { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
>>>  /* Enable -fsched-pressure for first pass instruction scheduling.  */
>>>  { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
>>> -{ OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
>>> +/* Enable  -funroll-loops with -munroll-small-loops.  */
>>> +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
>>> +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
>>
>> I guess the comment should say what we enable here more than the generic
>> code does.  Something like
>>
>> /* Enable -funroll-loops at -O2 already.  Also enable
>>-munroll-small-loops.  */
>
> updated to:
> /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
> loops at -O2 and above by default.   */
> { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
> { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
> /* Disable -fweb and -frename-registers to avoid bad impacts.  */
> { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
> { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
/* Enable -munroll-only-small-loops with -funroll-loops to unroll small
loops at -O2 and above by default.   */
{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
/* -fweb and -frename-registers are useless in general, turn them off.  */
{ OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
{ OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },

A little better?
Updated patch is attached at the end of this mail, maybe it is easy for
review.  :)

Jiufu,
BR.

>
> Thanks for more comments to make it better!
>
>>
>>> +  /* Explicit funroll-loops turns -munroll-small-loops off.
>>> +Implicit funroll-loops does not turn fweb or frename-registers on.  */
>>> +  if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
>>> + || (global_options_set.x_flag_unroll_all_loops
>>> + && flag_unroll_all_loops))
>>> {
>>> + if (!global_options_set.x_unroll_small_loops)
>>> +   unroll_small_loops = 0;
>>> +   }
>>> +  else
>>> +   {
>>>   if (!global_options_set.x_flag_web)
>>> +   flag_web = 0;
>>>   if (!global_options_set.x_flag_rename_registers)
>>> +   flag_rename_registers = 0;
>>> }
>>
>> So unroll-small-loops should better be called unroll-only-small-loops?
> Thanks again.  Right, unroll-only-small-loops is better.
>>
>> Why does explicit unroll-loops turns on web and rnreg?  Why only explicit?
>> Isn't it good and/or bad in all the same cases, implicit and explicit?
> Good question!
>
> They are turned off by default because they do not help much for generic
> cases, and we did not see a performance gain on SPEC2017. Turning them
> off also keeps behavior consistent with the previous -O2/-O3, which did
> not turn them on; this avoids regressions in test cases.
> If do not turn them on with -funroll-loops, user may see performance
> difference on some cases.  For example, in SPEC peak which option
> contains -funroll-loops, it may need to add -frename-registers manually
> for some benchmarks.
>
> Any suggestions? Do you think it is a good idea to disable them by
> default, and let user to add them when they are helpful? e.g. add them
> for some benchmarks at `peak`.
>
>>
>>> +munroll-small-loops
>>> +Target Undocumented Var(unroll_small_loops) Init(0) Save
>>> +Use conservative small loop unrolling.
>>
>> Undocumented means undocumented, so you don't have a comment string in
>> here.  But you can comment it:
>>
>> ; Use conservative small loop unrolling.
> Thanks again for you kindly review!
>
> Jiufu,
>
> BR.
>>
>>
>> Segher
gcc/
2019-11-06  Jiufu Guo  

PR tree-optimization/88760
* gcc/config/rs6000/rs6000.opt (-munroll-small-loops): New option.
* gcc/common/config/rs6000/rs6000-common.c
(rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]:
Turn on -funroll-loops and -munroll-small-loops.
[OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
Turn off -munroll-small-loops, turn on -fweb and -frename-registers
for explicit funroll-loops.
(TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
(rs6000_loop_unroll_adjust): Define it.  Use -munroll-small-loops.

gcc.testsuite/
2019-11-06  Jiufu Guo  

PR tree-optimization/88760
* gcc.dg/pr59643.c: 

Re: [PATCH] include size and offset in -Wstringop-overflow

2019-11-06 Thread Martin Sebor

On 11/6/19 2:06 PM, Martin Sebor wrote:

On 11/6/19 1:39 PM, Jeff Law wrote:

On 11/6/19 1:27 PM, Martin Sebor wrote:

On 11/6/19 11:55 AM, Jeff Law wrote:

On 11/6/19 11:00 AM, Martin Sebor wrote:

The -Wstringop-overflow warnings for single-byte and multi-byte
stores mention the amount of data being stored and the amount of
space remaining in the destination, such as:

warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
  123 |   *p = 0;
      |   ~~~^~~
note: destination object declared here
   45 |   char b[N];
      |        ^

A warning like this can take some time to analyze.  First, the size
of the destination isn't mentioned and may not be easy to tell from
the sources.  In the note above, when N's value is the result of
some non-trivial computation, chasing it down may be a small project
in and of itself.  Second, it's also not clear why the region size
is zero.  It could be because the offset is exactly N, or because
it's negative, or because it's in some range greater than N.

Mentioning both the size of the destination object and the offset
makes the existing messages clearer, and will become essential when
GCC starts diagnosing overflow into allocated buffers (as my
follow-on patch does).

The attached patch enhances -Wstringop-overflow to do this by
letting compute_objsize return the offset to its caller, doing
something similar in get_stridx, and adding a new function to
the strlen pass to issue this enhanced warning (eventually, I'd
like the function to replace the -Wstringop-overflow handler in
builtins.c).  With the change, the note above might read something
like:

note: at offset 11 to object ‘b’ with size 8 declared here
   45 |   char b[N];
      |        ^

Tested on x86_64-linux.

Martin

gcc-store-offset.diff

gcc/ChangeLog:

	* builtins.c (compute_objsize): Add an argument and set it to offset
	into destination.
	* builtins.h (compute_objsize): Add an argument.
	* tree-object-size.c (addr_object_size): Add an argument and set it
	to offset into destination.
	(compute_builtin_object_size): Same.
	* tree-object-size.h (compute_builtin_object_size): Add an argument.
	* tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
	to offset into destination.
	(maybe_warn_overflow): New function.
	(handle_store): Call maybe_warn_overflow to issue warnings.

gcc/testsuite/ChangeLog:

	* c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
	messages.
	* g++.dg/warn/Wstringop-overflow-3.C: Same.
	* gcc.dg/Wstringop-overflow-17.c: Same.




Index: gcc/tree-ssa-strlen.c
===
--- gcc/tree-ssa-strlen.c    (revision 277886)
+++ gcc/tree-ssa-strlen.c    (working copy)
@@ -189,6 +189,52 @@ struct laststmt_struct
 static int get_stridx_plus_constant (strinfo *, unsigned HOST_WIDE_INT, tree);
 static void handle_builtin_stxncpy (built_in_function, gimple_stmt_iterator *);
 
+/* Sets MINMAX to either the constant value or the range VAL is in
+   and returns true on success.  */
+
+static bool
+get_range (tree val, wide_int minmax[2], const vr_values *rvals = NULL)
+{
+  if (tree_fits_uhwi_p (val))
+    {
+  minmax[0] = minmax[1] = wi::to_wide (val);
+  return true;
+    }
+
+  if (TREE_CODE (val) != SSA_NAME)
+    return false;
+
+  if (rvals)
+    {
+  gimple *def = SSA_NAME_DEF_STMT (val);
+  if (gimple_assign_single_p (def)
+  && gimple_assign_rhs_code (def) == INTEGER_CST)
+    {
+  /* get_value_range returns [0, N] for constant assignments.  */
+  val = gimple_assign_rhs1 (def);
+  minmax[0] = minmax[1] = wi::to_wide (val);
+  return true;
+    }

Umm, something seems really off with this hunk.  If the SSA_NAME is set
via a simple constant assignment, then the range ought to be a singleton,
i.e. [CONST,CONST].  Is there a particular test where this is not true?

The only way offhand I could see this happening is if originally the RHS
wasn't a constant, but due to optimizations it either simplified into a
constant or a constant was propagated into an SSA_NAME appearing on the
RHS.  This would have to happen between the last range analysis and the
point where you're making this query.


Yes, I think that's right.  Here's an example where it happens:

   void f (void)
   {
 char s[] = "1234";
 unsigned n = strlen (s);
 char vla[n];   // or malloc (n)
 vla[n] = 0;    // n = [4, 4]
 ...
   }

The strlen call is folded to 4 but that's not propagated to
the access until sometime after the strlen pass is done.

Hmm.  Are we calling set_range_info in that case?  That goes behind the
back of pass instance of vr_values.  If so, that might argue we want to
be setting it in vr_values too.


No, set_range_info is only called for ranges.  In this case,
handle_builtin_strlen replaces the strlen() call with 4:

   s = "1234";
   _1 = __builtin_strlen ();
   n_2 = (unsigned 

Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2019-11-06 Thread Julian Brown
Hi!

This is a new patch that takes a different approach to the last-posted
version in this thread. I have combined the previous incremental patches
on the og9 branch that culminated in the following patch:

https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01220.html

From that email, the following explanation was given of the previous
approaches taken as to how the partitioning level for OpenACC "private"
variables was calculated and represented in the compiler, and how this
patch differs:

 - The first (by Chung-Lin Tang) recorded which variables should be
   made private per-gang in each front end (i.e. separately in C, C++
   and Fortran) using a new attribute "oacc gangprivate". This was
   deemed too early; the final determination about which loops are
   assigned which parallelism level has not yet been made at parse time.

 - The second, last discussed here:

 https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00726.html

   moved the analysis of OpenACC contexts to determine parallelism
   levels to omp-low.c (but kept the "oacc gangprivate" attribute and
   the NVPTX backend parts). However (as mentioned in that mail), this
   is still too early: in fact the final determination of the
   parallelism level for each loop (especially for loops without
   explicit gang/worker/vector clauses) does not happen until we reach
   the device compiler, in the oaccloops pass.

This patch builds on the second approach, but delays fixing the
parallelism level of each "private" variable (those that are
addressable, and declared private using OpenACC clauses or by defining
them in a scope nested within a compute region or partitioned loop)
until the oaccdevlow pass. This is done by adding a new internal UNIQUE
function (OACC_PRIVATE) that lists (the address of) each private
variable as an argument. These new internal functions fit into the
existing scheme for demarking OpenACC loops, as described in comments
in the patch.

Use of the "oacc gangprivate" attribute is now restricted to the NVPTX
backend (and could probably be replaced with some lighter-weight
mechanism as a followup).

I realised I omitted to make some of the cosmetic changes Thomas
highlighted below on starting to write this email, but I can do that
(with suitable retesting) if desired before committing.

On Wed, 12 Jun 2019 20:42:16 +0100
Julian Brown  wrote:

> On Wed, 12 Jun 2019 13:57:22 +0200
> Thomas Schwinge  wrote:
> 
> > I understand right that this will address some aspects of PR90115
> > "OpenACC: predetermined private levels for variables declared in
> > blocks" (so please mention that one in the ChangeLog updates, and
> > commit log), but it doesn't address all of these aspects (and see
> > also Cesar's list in
> > ),
> > and also not yet PR90114 "Predetermined private levels for variables
> > declared in OpenACC accelerator routines"?  
> 
> There are two possible reasons for placing gang-private variables in
> shared memory: correct implementation of OpenACC semantics, or
> optimisation, since shared memory is faster than local memory (on
> NVidia devices). Handling of private variables is intimately tied
> with the execution model for gangs/workers/vectors implemented by a
> particular target: for PTX, that's handled in the backend using a
> broadcasting/neutering scheme.
> 
> That is sufficient for code that e.g. sets a variable in worker-single
> mode and expects to use the value in worker-partitioned mode. The
> difficulty (semantics-wise) comes when the user wants to do something
> like an atomic operation in worker-partitioned mode and expects a
> worker-single variable to be shared across each partitioned worker.
> Forcing use of shared memory for such variables makes that work
> properly.
> 
> It is *not* sufficient for the next level down, though -- expecting to
> perform atomic operations in vector-partitioned mode on a variable
> that is declared in vector-single mode, i.e. so that it is supposed to
> be shared across all vector elements. AFAIK, that's not
> straightforward, and we haven't attempted to implement it.
> 
> I think the original motivation for this patch was optimisation,
> though -- typical code won't try to use atomics in this way. Cesar's
> list of caveats that you linked to seems to support that notion.

After a little further investigation, I came to the conclusion that the
patch was always originally about correctness, not optimisation. But
that's largely academic now.

> > I guess I'm not terribly happy with the 'goacc.expand_accel_var'
> > name. Using different "memories" for specially tagged DECLs seems
> > to be a pretty generic concept (address spaces?), and...  
> 
> This is partly another NVPTX weirdness -- the target uses address
> spaces, but only within the backend, and without using the generic
> middle-end address space machinery. The other reason for using an
> attribute instead of assigning an address space is 

Re: [PATCH] bring -Warray-bounds closer to -Wstringop-overflow (PR91647, 91463, 91679)

2019-11-06 Thread Jeff Law
On 11/6/19 4:09 PM, Maciej W. Rozycki wrote:
> On Fri, 1 Nov 2019, Martin Sebor wrote:
> 
>> Rebuilding the kernel with the updated patch results in the following
>> breakdown of the two warnings (the numbers are total instances of each,
>> unique instances, and files they come from):
>>
>>-Wzero-length-bounds  49   46   13
>>-Warray-bounds        45   14    8
>>
>> The -Warray-bounds instances I checked look legitimate even though
>> the code in some of them still looks benign.  I'm not sure there's
>> a good way to relax the warning to sanction some of these abuses
>> without also missing some bugs.  It might be worth looking into
>> some more in stage 3, depending on the fallout during mass rebuild.
>>
>> After bootstrapping on x86_64 and i386 and regtesting I committed
>> the attached patch in r277728.
> 
>  It is what I believe has also broken glibc:
> 
> In file included from ../sysdeps/riscv/libc-tls.c:19:
> ../csu/libc-tls.c: In function '__libc_setup_tls':
> ../csu/libc-tls.c:209:30: error: array subscript 1 is outside the bounds of 
> an interior zero-length array 'struct dtv_slotinfo[0]' 
> [-Werror=zero-length-bounds]
>   209 |   static_slotinfo.si.slotinfo[1].map = main_map;
>   |   ~~~^~~
> In file included from ../sysdeps/riscv/ldsodefs.h:46,
>  from ../sysdeps/gnu/ldsodefs.h:46,
>  from ../sysdeps/unix/sysv/linux/ldsodefs.h:25,
>  from ../sysdeps/unix/sysv/linux/riscv/ldsodefs.h:22,
>  from ../csu/libc-tls.c:21,
>  from ../sysdeps/riscv/libc-tls.c:19:
> ../sysdeps/generic/ldsodefs.h:423:7: note: while referencing 'slotinfo'
>   423 | } slotinfo[0];
>   |   ^~~~
> cc1: all warnings being treated as errors
> 
> (here in a RISC-V build).
> 
>  Has anybody looked yet into how the breakage could possibly be addressed?
Yea, Florian posted patches over the weekend to fix glibc.  They're
still going through the review/update cycle.

jeff



Move string concatenation for C into the parser

2019-11-06 Thread Joseph Myers
This patch is another piece of preparation for C2x attributes support.

C2x attributes require unbounded lookahead in the parser, because the
token sequence '[[' that starts a C2x attribute is also valid in
Objective-C in some of the same contexts, so it is necessary to see
whether the matching ']]' are consecutive tokens or not to determine
whether those tokens start an attribute.

Unbounded lookahead means lexing an unbounded number of tokens before
they are parsed.  c_lex_one_token does various context-sensitive
processing of tokens that cannot be done at that lookahead time,
because it depends on information (such as whether particular
identifiers are typedefs) that may be different at the time it is
relevant than at the time the lookahead is needed (recall that more or
less arbitrary C code, including declarations and statements, can
appear inside expressions in GNU C).

Most of that context-sensitive processing is not a problem, simply
because it is not needed for lookahead purposes so can be deferred
until the tokens lexed during lookahead are parsed.  However, the
earliest piece of context-sensitive processing is the handling of
string literals based on flags passed to c_lex_with_flags, which
determine whether adjacent literals are concatenated and whether
translation to the execution character set occurs.

Because the choice of whether to translate to the execution character
set is context-sensitive, this means that unbounded lookahead requires
the C parser to move to the approach used by the C++ parser, where
string literals are generally not translated or concatenated from
within c_lex_with_flags, but only later in the parser once it knows
whether translation is needed.  (Translation requires the tokens in
their form before concatenation.)

Thus, this patch makes that change to the C parser.  Flags in the
parser are still used for two special cases similar to C++: the
handling of an initial #pragma pch_preprocess, and arranging for
strings inside attributes not to be translated (the latter is made
more logically correct by saving and restoring the flags, as in the
C++ parser, rather than assuming that the state outside the attribute
was always to translate string literals, which might not be the case
in corner cases involving declarations and attributes inside
attributes).

The consequent change to pragma_lex to use c_parser_string_literal
makes it disallow wide strings and disable translation in that
context, which also follows C++ and is more logically correct than the
previous state without special handling in that regard.  Translation
to the execution character set is always disabled when string
constants are handled in the GIMPLE parser.

Although the handling of strings is now a lot closer to that in C++,
there are still some differences, in particular regarding the handling
of locations.  See c-c++-common/Wformat-pr88257.c, which has different
expected multiline diagnostic output for C and C++, for example; I'm
not sure whether the C or C++ output is better there (C++ has a more
complete range than C, C mentions a macro definition location that C++
doesn't), but I tried to keep the locations the same as those
previously used by the C front end, as far as possible, to minimize
the testsuite changes needed, rather than possibly making them closer
to those used with C++.

The only changes needed for tests of user-visible diagnostics were for
the wording of one diagnostic changing to match C++ (as a consequence
of having a check for wide strings based on a flag in a general
string-handling function rather than in a function specific to asm).
However, although locations are extremely similar to what they were
before, I couldn't make them completely identical in all cases.  (My
understanding of the implementation reason for the differences is as
follows: lex_string uses src_loc from each cpp_token; the C parser is
using the virtual location from cpp_get_token_with_location as called
by c_lex_with_flags, and while passing that through
linemap_resolve_location with LRK_MACRO_DEFINITION_LOCATION, as this
patch does, produces something very close to what lex_string uses,
it's not completely identical in some cases.)

This results in changes being needed to two of the gcc.dg/plugin tests
that use a plugin to test details of how string locations are handled.
Because the tests being changed are for ICEs and the only change is to
the details of the particular non-user-visible error that code gives
in cases it can't handle (one involving __FILE__, one involving a
string literal from stringizing), I think it's OK to change that
non-user-visible error and that the new errors are no worse than the
old ones.  So these particular errors are now different for C and C++
(some other messages in those tests already had differences between C
and C++).

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc/c:
2019-11-07  Joseph Myers  

* c-parser.c (c_parser): Remove 

Re: [PATCH V3] rs6000: Refine small loop unroll in loop_unroll_adjust hook

2019-11-06 Thread Jiufu Guo
Jiufu Guo  writes:

Hi Segher,

I updated the patch for the option name at the end of this mail.  Thanks
in advance for your review.

> Jiufu Guo  writes:
>
>> Segher Boessenkool  writes:
>>
>>> Hi!
>>>
>>> On Tue, Nov 05, 2019 at 04:33:23PM +0800, Jiufu Guo wrote:
 --- gcc/common/config/rs6000/rs6000-common.c   (revision 277765)
 +++ gcc/common/config/rs6000/rs6000-common.c   (working copy)
 @@ -35,7 +35,9 @@ static const struct default_options rs6000_option_
  { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
  /* Enable -fsched-pressure for first pass instruction scheduling.  */
  { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
 -{ OPT_LEVELS_2_PLUS, OPT_funroll_loops, NULL, 1 },
 +/* Enable  -funroll-loops with -munroll-small-loops.  */
 +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
 +{ OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
>>>
>>> I guess the comment should say what we enable here more than the generic
>>> code does.  Something like
>>>
>>> /* Enable -funroll-loops at -O2 already.  Also enable
>>>-munroll-small-loops.  */
>>
>> updated to:
>> /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
>> loops at -O2 and above by default.   */
>> { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
>> { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_small_loops, NULL, 1 },
>> /* Disable -fweb and -frename-registers to avoid bad impacts.  */
>> { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
>> { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
> /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
> loops at -O2 and above by default.   */
> { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
> { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
> /* -fweb and -frename-registers are useless in general, turn them off.  */
> { OPT_LEVELS_ALL, OPT_fweb, NULL, 0 },
> { OPT_LEVELS_ALL, OPT_frename_registers, NULL, 0 },
>
> A little better?
> Updated patch is attached at the end of this mail, maybe it is easy for
> review.  :)
>
> Jiufu,
> BR.
>
>>
>> Thanks for more comments to make it better!
>>
>>>
 +  /* Explicit funroll-loops turns -munroll-small-loops off.
 +     Implicit funroll-loops does not turn fweb or frename-registers on.  */
 +  if ((global_options_set.x_flag_unroll_loops && flag_unroll_loops)
 +      || (global_options_set.x_flag_unroll_all_loops
 +          && flag_unroll_all_loops))
     {
 +      if (!global_options_set.x_unroll_small_loops)
 +        unroll_small_loops = 0;
 +  }
 +  else
 +  {
       if (!global_options_set.x_flag_web)
 +        flag_web = 0;
       if (!global_options_set.x_flag_rename_registers)
 +        flag_rename_registers = 0;
     }
>>>
>>> So unroll-small-loops should better be called unroll-only-small-loops?
>> Thanks again.  Right, unroll-only-small-loops is better.
>>>
>>> Why does explicit unroll-loops turns on web and rnreg?  Why only explicit?
>>> Isn't it good and/or bad in all the same cases, implicit and explicit?
>> Good question!
>>
>> They are turned off by default because they do not help much in
>> generic cases, and we did not see a performance gain on SPEC2017.
>> Turning them off also keeps behaviour consistent with the previous
>> -O2/-O3, which did not turn them on, and so avoids regressions in
>> test cases.
>> If they are not turned on with -funroll-loops, users may see a
>> performance difference in some cases.  For example, in SPEC peak,
>> whose options contain -funroll-loops, it may be necessary to add
>> -frename-registers manually for some benchmarks.
>>
>> Any suggestions? Do you think it is a good idea to disable them by
>> default, and let users add them when they are helpful? e.g. add them
>> for some benchmarks at `peak`.
>>
>>>
 +munroll-small-loops
 +Target Undocumented Var(unroll_small_loops) Init(0) Save
 +Use conservative small loop unrolling.
>>>
>>> Undocumented means undocumented, so you don't have a comment string in
>>> here.  But you can comment it:
>>>
>>> ; Use conservative small loop unrolling.
>> Thanks again for you kindly review!
>>
>> Jiufu,
>>
>> BR.
>>>
>>>
>>> Segher

gcc/
2019-11-07  Jiufu Guo  

PR tree-optimization/88760
* gcc/config/rs6000/rs6000.opt (-munroll-only-small-loops): New option.
* gcc/common/config/rs6000/rs6000-common.c
(rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]:
Turn on -funroll-loops and -munroll-only-small-loops.
[OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
Turn off -munroll-only-small-loops, turn on -fweb and -frename-registers
for explicit -funroll-loops.
   

[PATCH] debug-counter for GIMPLE unrolling

2019-11-06 Thread Richard Biener


Helps to narrow down some bugs.

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-11-07  Richard Biener  

* dbgcnt.def (gimple_unroll): New.
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Check
gimple_unroll debug counter before applying transform.
(try_peel_loop): Likewise.

Index: gcc/dbgcnt.def
===
--- gcc/dbgcnt.def  (revision 277873)
+++ gcc/dbgcnt.def  (working copy)
@@ -198,3 +198,4 @@ DEBUG_COUNTER (vect_slp)
 DEBUG_COUNTER (dom_unreachable_edges)
 DEBUG_COUNTER (match)
 DEBUG_COUNTER (store_merging)
+DEBUG_COUNTER (gimple_unroll)
Index: gcc/tree-ssa-loop-ivcanon.c
===
--- gcc/tree-ssa-loop-ivcanon.c (revision 277873)
+++ gcc/tree-ssa-loop-ivcanon.c (working copy)
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.
 #include "tree-cfgcleanup.h"
 #include "builtins.h"
 #include "tree-ssa-sccvn.h"
+#include "dbgcnt.h"
 
 /* Specifies types of loops that may be unrolled.  */
 
@@ -884,6 +887,9 @@ try_unroll_loop_completely (class loop *
}
}
 
+  if (!dbg_cnt (gimple_unroll))
+   return false;
+
   initialize_original_copy_tables ();
   auto_sbitmap wont_exit (n_unroll + 1);
   if (exit && niter
@@ -1074,6 +1080,9 @@ try_peel_loop (class loop *loop,
   return false;
 }
 
+  if (!dbg_cnt (gimple_unroll))
+return false;
+
   /* Duplicate possibly eliminating the exits.  */
   initialize_original_copy_tables ();
   auto_sbitmap wont_exit (npeel + 1);


Re: [PATCH target/92295] Fix inefficient vector constructor

2019-11-06 Thread Hongtao Liu
Ping!

On Sat, Nov 2, 2019 at 9:38 PM Hongtao Liu  wrote:
>
> Hi Jakub:
>   Could you help reviewing this patch.
>
> PS: This patch is related to vectors (avx512f), and Uros has mentioned
> before that he has no intention to maintain avx512f.
>
> On Fri, Nov 1, 2019 at 9:12 AM Hongtao Liu  wrote:
> >
> > Hi uros:
> >   This patch is about to fix inefficient vector constructor.
> >   Currently in ix86_expand_vector_init_concat, vector are initialized
> > per 2 elements which can miss some optimization opportunity like
> > pr92295.
> >
> >   Bootstrap and i386 regression test is ok.
> >   Ok for trunk?
> >
> > Changelog
> > gcc/
> > PR target/92295
> > * config/i386/i386-expand.c (ix86_expand_vector_init_concat)
> > Enhance ix86_expand_vector_init_concat.
> >
> > gcc/testsuite
> > * gcc.target/i386/pr92295.c: New test.
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

