Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Richard Biener
On October 15, 2019 5:09:52 PM GMT+02:00, Peter Bergner  
wrote:
>On 10/15/19 4:32 AM, Richard Biener wrote:
>> I believe this is going to bite you exactly in the case you want the
>> opposite behavior.  If you have CUs compiled with defaults and
>> a specialized one with VSX that calls into generic compiled functions
>> you _do_ want to allow inlining into the VSX enabled routines.
>
>First off, there's nothing special about VSX/non-VSX here, so when I
>talk
>about VSX below, I'm really just using it as a stand-in for any option.
>
>So what you're saying is that a VSX enabled function that calls a
>non-VSX
>enabled function should be able to inline that non-VSX function.  That
>is
>what the current code allows and I agree that should still be allowed.
>My extra code only disallows that scenario *IF* the user explicitly
>said
>DO NOT compile the callee function with VSX.  It still allows it to be
>inlined if the callee was compiled without VSX implicitly / by default,
>so I don't think my patch disallows the scenario you mention above.
>
>If the user explicitly said not to compile a function with a particular
>option, how can we justify ignoring that request just because we're
>inlining it?  We don't do that for the out of line version of that
>callee
>function.

You can probably tell whether there's an explicit -mno-vsx on the command line, 
but I wonder how you can tell apart explicit vs. implicit in the LTO context, 
where the option is represented as a target attribute on the function.

>> How can it be fatal to inline a non-VSX function into a VSX one?
>
>I can think of scenarios where it could be fatal (again, VSX is just a
>stand-in for any option), but maybe the user used -mno-vsx for
>performance
>reasons or maybe this is kernel code and the user knows this function
>will
>be run with VSX hardware disabled or ...
>
>
>Peter
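
A minimal illustration of the explicit/implicit distinction discussed
above (function names hypothetical; VSX again just stands in for any
option):

  static vec_t vadd_default (vec_t a, vec_t b)
  { return a + b; }                  /* no-vsx only by default: may still
                                        be inlined into a VSX caller */

  static vec_t __attribute__((__target__("no-vsx")))
  vadd_explicit (vec_t a, vec_t b)
  { return a + b; }                  /* user explicitly said no-vsx: the
                                        patch rejects inlining this one */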



[PATCH] Avoid writing to trees during streaming

2019-10-15 Thread Richard Biener


Honza figured that variably_modified_type_p uses TREE_VISITED
to not run into an Ada abomination.  That causes havoc during
WPA streaming which happens in multiple forked processes and
thus causes a lot of COW faulting and resident memory usage.
It also stands in the way of using threads here.
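
Roughly, the failure mode (a sketch, not the actual streaming code):

  /* After fork(), the WPA children share all tree pages copy-on-write.  */
  static bool
  mark_visited (tree t)
  {
    if (TREE_VISITED (t))     /* read: the page stays shared */
      return false;
    TREE_VISITED (t) = 1;     /* write: faults a private copy of the whole
                                 page into this child process */
    return true;
  }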

LTO bootstrapped and tested on x86_64-unknown-linux-gnu.

Honza - does this look OK?

Thanks,
Richard.

2019-10-15  Richard Biener  

* lto-streamer-out.c (lto_variably_modified_type_p): New.
(tree_is_indexable): Use it.
* tree-streamer-out.c (pack_ts_type_common_value_fields):
Stream variably_modified_type_p as TYPE_LANG_FLAG_0.
* tree-streamer-in.c (unpack_ts_type_common_value_fields): Likewise.

Index: gcc/lto-streamer-out.c
===
--- gcc/lto-streamer-out.c  (revision 276985)
+++ gcc/lto-streamer-out.c  (working copy)
@@ -120,6 +120,17 @@ output_type_ref (struct output_block *ob
   lto_output_type_ref_index (ob->decl_state, ob->main_stream, node);
 }
 
+/* Wrapper around variably_modified_type_p avoiding type modification
+   during WPA streaming.  */
+
+static bool
+lto_variably_modified_type_p (tree type)
+{
+  return (in_lto_p
+ ? TYPE_LANG_FLAG_0 (TYPE_MAIN_VARIANT (type))
+ : variably_modified_type_p (type, NULL_TREE));
+}
+
 
 /* Return true if tree node T is written to various tables.  For these
nodes, we sometimes want to write their phyiscal representation
@@ -134,7 +145,7 @@ tree_is_indexable (tree t)
  definition.  */
   if ((TREE_CODE (t) == PARM_DECL || TREE_CODE (t) == RESULT_DECL)
   && DECL_CONTEXT (t))
-return variably_modified_type_p (TREE_TYPE (DECL_CONTEXT (t)), NULL_TREE);
+return lto_variably_modified_type_p (TREE_TYPE (DECL_CONTEXT (t)));
   /* IMPORTED_DECL is put into BLOCK and thus it never can be shared.
  We should no longer need to stream it.  */
   else if (TREE_CODE (t) == IMPORTED_DECL)
@@ -154,10 +165,10 @@ tree_is_indexable (tree t)
  them we have to localize their members as well.
  ???  In theory that includes non-FIELD_DECLs as well.  */
   else if (TYPE_P (t)
-  && variably_modified_type_p (t, NULL_TREE))
+  && lto_variably_modified_type_p (t))
 return false;
   else if (TREE_CODE (t) == FIELD_DECL
-  && variably_modified_type_p (DECL_CONTEXT (t), NULL_TREE))
+  && lto_variably_modified_type_p (DECL_CONTEXT (t)))
 return false;
   else
 return (TYPE_P (t) || DECL_P (t) || TREE_CODE (t) == SSA_NAME);
Index: gcc/tree-streamer-in.c
===
--- gcc/tree-streamer-in.c  (revision 276985)
+++ gcc/tree-streamer-in.c  (working copy)
@@ -378,6 +378,7 @@ unpack_ts_type_common_value_fields (stru
   TYPE_RESTRICT (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_USER_ALIGN (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_READONLY (expr) = (unsigned) bp_unpack_value (bp, 1);
+  TYPE_LANG_FLAG_0 (expr) = (unsigned) bp_unpack_value (bp, 1);
   if (RECORD_OR_UNION_TYPE_P (expr))
 {
   TYPE_TRANSPARENT_AGGR (expr) = (unsigned) bp_unpack_value (bp, 1);
Index: gcc/tree-streamer-out.c
===
--- gcc/tree-streamer-out.c (revision 276985)
+++ gcc/tree-streamer-out.c (working copy)
@@ -326,6 +326,12 @@ pack_ts_type_common_value_fields (struct
   bp_pack_value (bp, TYPE_RESTRICT (expr), 1);
   bp_pack_value (bp, TYPE_USER_ALIGN (expr), 1);
   bp_pack_value (bp, TYPE_READONLY (expr), 1);
+  unsigned vla_p;
+  if (in_lto_p)
+vla_p = TYPE_LANG_FLAG_0 (TYPE_MAIN_VARIANT (expr));
+  else
+vla_p = variably_modified_type_p (expr, NULL_TREE);
+  bp_pack_value (bp, vla_p, 1);
   /* We used to stream TYPE_ALIAS_SET == 0 information to let frontends mark
  types that are opaque for TBAA.  This however did not work as intended,
  because TYPE_ALIAS_SET == 0 was regularly lost in type merging.  */


Re: [SLP] SLP vectorization: vectorize vector constructors

2019-10-15 Thread Richard Biener
...argument to a call or as source of a store.  So I'd simply
remove this check (and the function).

Thanks,
Richard.

> Currently SLP vectorization can build SLP trees starting from reductions or
> from group stores. This patch adds a third starting point: vector 
> constructors.
> 
> For the following test case (compiled with -O3):
> 
> char g_d[1024], g_s1[1024], g_s2[1024];
> void test_loop(void)
> {
>    char *d = g_d, *s1 = g_s1, *s2 = g_s2;
>    for ( int y = 0; y < 128; y++ )
> 
>    {
>      for ( int x = 0; x < 16; x++ )
>    d[x] = s1[x] + s2[x];
>      d += 16;
>    }
> }
> 
> before patch:
> test_loop:
> .LFB0:
>      .cfi_startproc
>      adrp    x0, g_s1
>      adrp    x2, g_s2
>      add x3, x0, :lo12:g_s1
>      add x4, x2, :lo12:g_s2
>      ldrb    w7, [x2, #:lo12:g_s2]
>      ldrb    w1, [x0, #:lo12:g_s1]
>      adrp    x0, g_d
>      ldrb    w6, [x4, 1]
>      add x0, x0, :lo12:g_d
>      ldrb    w5, [x3, 1]
>      add w1, w1, w7
>      fmov    s0, w1
>      ldrb    w7, [x4, 2]
>      add w5, w5, w6
>      ldrb    w1, [x3, 2]
>      ldrb    w6, [x4, 3]
>      add x2, x0, 2048
>      ins v0.b[1], w5
>      add w1, w1, w7
>      ldrb    w7, [x3, 3]
>      ldrb    w5, [x4, 4]
>      add w7, w7, w6
>      ldrb    w6, [x3, 4]
>      ins v0.b[2], w1
>      ldrb    w8, [x4, 5]
>      add w6, w6, w5
>      ldrb    w5, [x3, 5]
>      ldrb    w9, [x4, 6]
>      add w5, w5, w8
>      ldrb    w1, [x3, 6]
>      ins v0.b[3], w7
>      ldrb    w8, [x4, 7]
>      add w1, w1, w9
>      ldrb    w11, [x3, 7]
>      ldrb    w7, [x4, 8]
>      add w11, w11, w8
>      ldrb    w10, [x3, 8]
>      ins v0.b[4], w6
>      ldrb    w8, [x4, 9]
>      add w10, w10, w7
>      ldrb    w9, [x3, 9]
>      ldrb    w7, [x4, 10]
>      add w9, w9, w8
>      ldrb    w8, [x3, 10]
>      ins v0.b[5], w5
>      ldrb    w6, [x4, 11]
>      add w8, w8, w7
>      ldrb    w7, [x3, 11]
>      ldrb    w5, [x4, 12]
>      add w7, w7, w6
>      ldrb    w6, [x3, 12]
>      ins v0.b[6], w1
>      ldrb    w12, [x4, 13]
>      add w6, w6, w5
>      ldrb    w5, [x3, 13]
>      ldrb    w1, [x3, 14]
>      add w5, w5, w12
>      ldrb    w13, [x4, 14]
>      ins v0.b[7], w11
>      ldrb    w12, [x4, 15]
>      add w4, w1, w13
>      ldrb    w1, [x3, 15]
>      add w1, w1, w12
>      ins v0.b[8], w10
>      ins v0.b[9], w9
>      ins v0.b[10], w8
>      ins v0.b[11], w7
>      ins v0.b[12], w6
>      ins v0.b[13], w5
>      ins v0.b[14], w4
>      ins v0.b[15], w1
>      .p2align 3,,7
> .L2:
>      str q0, [x0], 16
>      cmp x2, x0
>      bne .L2
>      ret
>      .cfi_endproc
> .LFE0:
> 
> After patch:
> 
> test_loop:
> .LFB0:
>      .cfi_startproc
>      adrp    x3, g_s1
>      adrp    x2, g_s2
>      add x3, x3, :lo12:g_s1
>      add x2, x2, :lo12:g_s2
>      adrp    x0, g_d
>      add x0, x0, :lo12:g_d
>      add x1, x0, 2048
>      ldr q1, [x2]
>      ldr q0, [x3]
>      add v0.16b, v0.16b, v1.16b
>      .p2align 3,,7
> .L2:
>      str q0, [x0], 16
>      cmp x0, x1
>      bne .L2
>      ret
>      .cfi_endproc
> .LFE0:
> 
> 
> 2019-10-11  Joel Hutton  joel.hut...@arm.com
> 
>      * tree-vect-slp.c (vect_analyze_slp_instance): Add case for vector 
> constructors.
>      (vect_bb_slp_scalar_cost): Likewise.
>      (vect_ssa_use_outside_bb): New function.
>      (vect_slp_check_for_constructors): New function.
>      (vect_slp_analyze_bb_1): Add check for vector constructors.
>      (vect_schedule_slp_instance): Add case to fixup vector constructor 
> stmt.
>      * tree-vectorizer.h (SLP_INSTANCE_ROOT_STMT): New field.
> 
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-10-11  Joel Hutton  joel.hut...@arm.com
> 
>      * gcc.dg/vect/bb-slp-40.c: New test.
> 
> bootstrapped and regression tested on aarch64-none-linux-gnu
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Richard Biener
On Tue, Oct 15, 2019 at 1:33 PM Segher Boessenkool
 wrote:
>
> On Tue, Oct 15, 2019 at 01:19:51PM +0200, Richard Biener wrote:
> > On Tue, Oct 15, 2019 at 12:07 PM Segher Boessenkool
> >  wrote:
> > > On Tue, Oct 15, 2019 at 11:32:27AM +0200, Richard Biener wrote:
> > > > > I think we just need to fix the bug in the current logic when checking
> > > > > whether the caller's ISA flags supports the callee's ISA flags. ...and
> > > > > for that, I think we just need to add a test that enforces that the
> > > > > caller's ISA flags match exactly the callee's flags, for those flags
> > > > > that were explicitly set in the callee.  The patch below seems to fix
> > > > > the issue (regtesting now).  Does this look like what we want?
> > > >
> > > > I believe this is going to bite you exactly in the case you want the
> > > > opposite behavior.  If you have CUs compiled with defaults and
> > > > a specialized one with VSX that calls into generic compiled functions
> > > > you _do_ want to allow inlining into the VSX enabled routines.
> > >
> > > Yes, but *not* inlining is relatively harmless, while inlining can be
> > > fatal.  I don't see how we can handle both scenarios optimally.
> >
> > How can it be fatal to inline a non-VSX function into a VSX one?
>
> Oh I misread, I thought it was the other way around.
>
> > > > Just
> > > > think of LTO, C++ and comdats - you'll get a random comdat entity
> > > > at link time for inlining - either from the VSX CU or the non-VSX one.
> > >
> > > This would make LTO totally unusable, with or without this patch?  
> > > Something
> > > else must be going on?
> >
> > It's the same without LTO - the linker will simply choose (randomly)
> > one of the comdats from one of the CUs providing it, not caring about
> > some built with and some without VSX.
>
> Hrm, so how does that ever work?

Possibly people are "not doing this"?  Aka have a program with a runtime
capability check for VSX, dispatch to std::vector/algorithm CUs
with/without -mvsx
and then link the result?  Because any instantiated template therein will end up
as a comdat...
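
A sketch of that dispatch pattern (all names hypothetical):

  /* vec_on.cc built with -mvsx, vec_off.cc built with -mno-vsx; both
     instantiate the same templates, so their comdat copies collapse to
     one arbitrarily chosen definition at link time.  */
  if (cpu_has_vsx ())          /* runtime capability check */
    process_with_vsx (buf);    /* calls into the -mvsx CU */
  else
    process_generic (buf);     /* calls into the -mno-vsx CU, but any
                                  shared comdat may still be the VSX copy */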

Guess we still live in a mostly C world ;)

Richard.

>
> Segher


Re: [SVE] PR86753

2019-10-15 Thread Richard Biener
On Tue, Oct 15, 2019 at 8:07 AM Prathamesh Kulkarni
 wrote:
>
> On Wed, 9 Oct 2019 at 08:14, Prathamesh Kulkarni
>  wrote:
> >
> > On Tue, 8 Oct 2019 at 13:21, Richard Sandiford
> >  wrote:
> > >
> > > Leaving the main review to Richard, just some comments...
> > >
> > > Prathamesh Kulkarni  writes:
> > > > @@ -9774,6 +9777,10 @@ vect_is_simple_cond (tree cond, vec_info *vinfo,
> > > >
> > > > When STMT_INFO is vectorized as a nested cycle, for_reduction is 
> > > > true.
> > > >
> > > > +   For COND_EXPR if T comes from masked load, and is 
> > > > conditional
> > > > +   on C, we apply loop mask to result of vector comparison, if it's 
> > > > present.
> > > > +   Similarly for E, if it is conditional on !C.
> > > > +
> > > > Return true if STMT_INFO is vectorizable in this way.  */
> > > >
> > > >  bool
> > >
> > > I think this is a bit misleading.  But IMO it'd be better not to have
> > > a comment here and just rely on the one in the main function body.
> > > This optimisation isn't really changing the vectorisation strategy,
> > > and the comment could easily get forgotten if things change in future.
> > >
> > > > [...]
> > > > @@ -9999,6 +10006,35 @@ vectorizable_condition (stmt_vec_info 
> > > > stmt_info, gimple_stmt_iterator *gsi,
> > > >/* Handle cond expr.  */
> > > >for (j = 0; j < ncopies; j++)
> > > >  {
> > > > +  tree loop_mask = NULL_TREE;
> > > > +  bool swap_cond_operands = false;
> > > > +
> > > > +  /* Look up if there is a loop mask associated with the
> > > > +  scalar cond, or it's inverse.  */
> > >
> > > Maybe:
> > >
> > >See whether another part of the vectorized code applies a loop
> > >mask to the condition, or to its inverse.
> > >
> > > > +
> > > > +  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> > > > + {
> > > > +   scalar_cond_masked_key cond (cond_expr, ncopies);
> > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > + {
> > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > +   loop_mask = vect_get_loop_mask (gsi, masks, ncopies, 
> > > > vectype, j);
> > > > + }
> > > > +   else
> > > > + {
> > > > +   bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> > > > +   cond.code = invert_tree_comparison (cond.code, honor_nans);
> > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > + {
> > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > +   loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
> > > > +   vectype, j);
> > > > +   cond_code = cond.code;
> > > > +   swap_cond_operands = true;
> > > > + }
> > > > + }
> > > > + }
> > > > +
> > > >stmt_vec_info new_stmt_info = NULL;
> > > >if (j == 0)
> > > >   {
> > > > @@ -10114,6 +10153,47 @@ vectorizable_condition (stmt_vec_info 
> > > > stmt_info, gimple_stmt_iterator *gsi,
> > > >   }
> > > >   }
> > > >   }
> > > > +
> > > > +   /* If loop mask is present, then AND it with
> > >
> > > Maybe "If we decided to apply a loop mask, ..."
> > >
> > > > +  result of vec comparison, so later passes (fre4)
> > >
> > > Probably better not to name the pass -- could easily change in future.
> > >
> > > > +  will reuse the same condition used in masked load.
> > >
> > > Could be a masked store, or potentially other things too.
> > > So maybe just "will reuse the masked condition"?
> > >
> > > > +
> > > > +  For example:
> > > > +  for (int i = 0; i < 100; ++i)
> > > > +x[i] = y[i] ? z[i] : 10;
> > > > +
> > > > +  results in following optimized GIMPLE:
> > > > +
> > > > +  mask__35.8_43 = vect__4.7_41 != { 0, ... };
> > > > +  vec_mask_and_46 = loop_mask_40 & mask__35.8_43;
> > > > +  _19 = &MEM[base: z_12(D), index: ivtmp_56, step: 4, offset: 
> > > > 0B];
> > > > +  vect_iftmp.11_47 = .MASK_LOAD (_19, 4B, vec_mask_and_46);
> > > > +  vect_iftmp.12_52 = VEC_COND_EXPR <vec_mask_and_46,
> > > > +vect_iftmp.11_47, { 10, ... }>;
> > > > +
> > > > +  instead of recomputing vec != { 0, ... } in vec_cond_expr  */
> > >
> > > That's true, but gives the impression that avoiding the vec != { 0, ... }
> > > is the main goal, whereas we could do that just by forcing a three-operand
> > > COND_EXPR.  It's really more about making sure that vec != { 0, ... }
> > > and its masked form aren't both live at the same time.  So maybe:
> > >
> > >  instead of using a masked and unmasked forms of
> > >  vect__4.7_41 != { 0, ... } (masked in the MASK_LOAD,
> > >  unmasked in the VEC_COND_EXPR).  */
> > >
> > Hi Richard,
> > Thanks for the suggestions, I have updated comments in the attached 

Re: Add a constant_range_value_p function (PR 92033)

2019-10-15 Thread Richard Biener
On Tue, Oct 15, 2019 at 12:35 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On October 14, 2019 2:32:43 PM GMT+02:00, Richard Sandiford 
> >  wrote:
> >>Richard Biener  writes:
> >>> On Fri, Oct 11, 2019 at 4:42 PM Richard Sandiford
> >>>  wrote:
> >>>>
> >>>> The range-tracking code has a pretty hard-coded assumption that
> >>>> is_gimple_min_invariant is equivalent to "INTEGER_CST or invariant
> >>>> ADDR_EXPR".  It seems better to add a predicate specifically for
> >>>> that rather than contiually fight cases in which it can't handle
> >>>> other invariants.
> >>>>
> >>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> >>>
> >>> ICK.  Nobody is going to remember this new restriction and
> >>> constant_range_value_p reads like constant_value_range_p ;)
> >>>
> >>> Btw, is_gimple_invariant_address shouldn't have been exported,
> >>> it's only use could have used is_gimple_min_invariant...
> >>
> >>What do you think we should do instead?
> >
> > Just handle POLY_INT_CST in a few places to quickly enough drop to varying.
>
> OK, how about this?  Aldy's suggestion would be fine by me too,
> but I thought I'd try this first given Aldy's queasiness about
> allowing POLY_INT_CSTs further in.
>
> The main case in which this gives useful ranges is a lower bound
> of A + B * X becoming A when B >= 0.  E.g.:
>
>   (1) [32 + 16X, 100] -> [32, 100]
>   (2) [32 + 16X, 32 + 16X] -> [32, MAX]
>
> But the same thing can be useful for the upper bound with negative
> X coefficients.
>
> We can revisit this later if keeping a singleton range for (2)
> would be better.
>
> Tested as before.

Works for me.

Richard.

> Richard
>
>
> 2019-10-15  Richard Sandiford  
>
> gcc/
> PR middle-end/92033
> * poly-int.h (constant_lower_bound_with_limit): New function.
> (constant_upper_bound_with_limit): Likewise.
> * doc/poly-int.texi: Document them.
> * tree-vrp.c (value_range_base::set): Convert POLY_INT_CST bounds
> into the worst-case INTEGER_CST bounds.
>
> Index: gcc/poly-int.h
> ===
> --- gcc/poly-int.h  2019-07-10 19:41:26.395898027 +0100
> +++ gcc/poly-int.h  2019-10-15 11:30:14.099625553 +0100
> @@ -1528,6 +1528,29 @@ constant_lower_bound (const poly_int_pod<N, Ca> &a)
>return a.coeffs[0];
>  }
>
> +/* Return the constant lower bound of A, given that it is no less than B.  */
> +
> +template<unsigned int N, typename Ca, typename Cb>
> +inline POLY_CONST_COEFF (Ca, Cb)
> +constant_lower_bound_with_limit (const poly_int_pod<N, Ca> &a, const Cb &b)
> +{
> +  if (known_ge (a, b))
> +return a.coeffs[0];
> +  return b;
> +}
> +
> +/* Return the constant upper bound of A, given that it is no greater
> +   than B.  */
> +
> +template<unsigned int N, typename Ca, typename Cb>
> +inline POLY_CONST_COEFF (Ca, Cb)
> +constant_upper_bound_with_limit (const poly_int_pod<N, Ca> &a, const Cb &b)
> +{
> +  if (known_le (a, b))
> +return a.coeffs[0];
> +  return b;
> +}
> +
>  /* Return a value that is known to be no greater than A and B.  This
> will be the greatest lower bound for some indeterminate values but
> not necessarily for all.  */
> Index: gcc/doc/poly-int.texi
> ===
> --- gcc/doc/poly-int.texi   2019-03-08 18:14:25.333011645 +
> +++ gcc/doc/poly-int.texi   2019-10-15 11:30:14.099625553 +0100
> @@ -803,6 +803,18 @@ the assertion is known to hold.
>  @item constant_lower_bound (@var{a})
>  Assert that @var{a} is nonnegative and return the smallest value it can have.
>
> +@item constant_lower_bound_with_limit (@var{a}, @var{b})
> +Return the least value @var{a} can have, given that the context in
> +which @var{a} appears guarantees that the answer is no less than @var{b}.
> +In other words, the caller is asserting that @var{a} is greater than or
> +equal to @var{b} even if @samp{known_ge (@var{a}, @var{b})} doesn't hold.
> +
> +@item constant_upper_bound_with_limit (@var{a}, @var{b})
> +Return the greatest value @var{a} can have, given that the context in
> +which @var{a} appears guarantees that the answer is no greater than @var{b}.
> +In other words, the caller is asserting that @var{a} is less than or equal
> +to @var{b} even if @samp{known_le (@var{a}, @var{b})} doesn't hold.
> +
>  @item lower_bound (@var{a}, @var{b})
>  Return a value that is always less than or equal to both @var{a} and @var{b}.
>  It will be the grea
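
A quick usage sketch of the two helpers quoted above, following
examples (1) and (2); the values are hypothetical:

  /* a = 32 + 16X: known_ge (a, 32) holds, so the constant
     coefficient is returned.  */
  HOST_WIDE_INT lo = constant_lower_bound_with_limit (a, 32);   /* 32 */

  /* With a limit of 40, known_ge (a, 40) does not hold, so the
     caller-asserted limit is returned instead.  */
  HOST_WIDE_INT lo2 = constant_lower_bound_with_limit (a, 40);  /* 40 */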

Re: [PATCH] More PR92046 fixes, make --param allow-store-data-races a -f option

2019-10-15 Thread Richard Biener
On Tue, 15 Oct 2019, Kyrill Tkachov wrote:

> Hi Richard,
> 
> On 10/15/19 8:17 AM, Richard Biener wrote:
> >
> > This makes allow-store-data-races adjustable per function by making it
> > a regular option rather than a --param.
> 
> 
> Note that the kernel has --param=allow-store-data-races=0 in its build flags.
> 
> I guess that will break unless they rename it to 
> -fno-allow-store-data-races?

Yes.  Or simply drop the --param since unless they happen to use -Ofast
their setting is the default.
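
For such build systems the migration would look like (sketch; note the
old --param value 0 was already the default outside -Ofast):

  gcc -O2 --param allow-store-data-races=0 ...   # before this patch
  gcc -O2 -fno-allow-store-data-races ...        # after this patch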

Richard.

> Thanks,
> 
> Kyrill
> 
> 
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> >
> > Thanks,
> > Richard.
> >
> > 2019-10-15  Richard Biener  
> >
> >     PR middle-end/92046
> >     * common.opt (fallow-store-data-races): New.
> >     * params.def (PARAM_ALLOW_STORE_DATA_RACES): Remove.
> >     * params.h (ALLOW_STORE_DATA_RACES): Likewise.
> >     * doc/invoke.texi (fallow-store-data-races): Document.
> >     (--param allow-store-data-races): Remove docs.
> >     * opts.c (default_options_table): Enable -fallow-store-data-races
> >     at -Ofast.
> >     (default_options_optimization): Do not enable --param
> >     allow-store-data-races at -Ofast.
> >     * tree-if-conv.c (ifcvt_memrefs_wont_trap): Use
> > flag_store_data_races
> >     instead of PARAM_ALLOW_STORE_DATA_RACES.
> >     * tree-ssa-loop-im.c (execute_sm): Likewise.
> >
> >     * c-c++-common/cxxbitfields-3.c: Adjust.
> >     * c-c++-common/cxxbitfields-6.c: Likewise.
> >     * c-c++-common/simulate-thread/bitfields-1.c: Likewise.
> >     * c-c++-common/simulate-thread/bitfields-2.c: Likewise.
> >     * c-c++-common/simulate-thread/bitfields-3.c: Likewise.
> >     * c-c++-common/simulate-thread/bitfields-4.c: Likewise.
> >     * g++.dg/simulate-thread/bitfields-2.C: Likewise.
> >     * g++.dg/simulate-thread/bitfields.C: Likewise.
> >     * gcc.dg/lto/pr52097_0.c: Likewise.
> >     * gcc.dg/simulate-thread/speculative-store-2.c: Likewise.
> >     * gcc.dg/simulate-thread/speculative-store-3.c: Likewise.
> >     * gcc.dg/simulate-thread/speculative-store-4.c: Likewise.
> >     * gcc.dg/simulate-thread/speculative-store.c: Likewise.
> >     * gcc.dg/tree-ssa/20050314-1.c: Likewise.
> >
> > Index: gcc/common.opt
> > ===
> > --- gcc/common.opt  (revision 276983)
> > +++ gcc/common.opt  (working copy)
> > @@ -993,6 +993,10 @@ Align the start of loops.
> >  falign-loops=
> >  Common RejectNegative Joined Var(str_align_loops) Optimization
> >
> > +fallow-store-data-races
> > +Common Report Var(flag_store_data_races) Optimization
> > +Allow the compiler to introduce new data races on stores.
> > +
> >  fargument-alias
> >  Common Ignore
> >  Does nothing. Preserved for backward compatibility.
> > Index: gcc/doc/invoke.texi
> > ===
> > --- gcc/doc/invoke.texi (revision 276983)
> > +++ gcc/doc/invoke.texi (working copy)
> > @@ -406,6 +406,7 @@ Objective-C and Objective-C++ Dialects}.
> >  -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol
> >  -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol
> >  -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol
> > +-fallow-store-data-races @gol
> >  -fassociative-math  -fauto-profile -fauto-profile[=@var{path}] @gol
> >  -fauto-inc-dec  -fbranch-probabilities @gol
> >  -fcaller-saves @gol
> > @@ -8463,9 +8464,9 @@ designed to reduce code size.
> >  Disregard strict standards compliance.  @option{-Ofast} enables all
> >  @option{-O3} optimizations.  It also enables optimizations that are not
> >  valid for all standard-compliant programs.
> > -It turns on @option{-ffast-math} and the Fortran-specific
> > -@option{-fstack-arrays}, unless @option{-fmax-stack-var-size} is
> > -specified, and @option{-fno-protect-parens}.
> > +It turns on @option{-ffast-math}, @option{-fallow-store-data-races}
> > +and the Fortran-specific @option{-fstack-arrays}, unless
> > +@option{-fmax-stack-var-size} is specified, and
> > @option{-fno-protect-parens}.
> >
> >  @item -Og
> >  @opindex Og
> > @@ -10227,6 +10228,12 @@ The maximum allowed @var{n} option value
> >
> >  Enabled at levels @option{-O2}, @option{-O3}.
> >
> > +@item -fallow-store-data-races
> > +@opindex fallow-stor

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Richard Biener
On Tue, Oct 15, 2019 at 12:07 PM Segher Boessenkool
 wrote:
>
> On Tue, Oct 15, 2019 at 11:32:27AM +0200, Richard Biener wrote:
> > > I think we just need to fix the bug in the current logic when checking
> > > whether the caller's ISA flags supports the callee's ISA flags. ...and
> > > for that, I think we just need to add a test that enforces that the
> > > caller's ISA flags match exactly the callee's flags, for those flags
> > > that were explicitly set in the callee.  The patch below seems to fix
> > > the issue (regtesting now).  Does this look like what we want?
> >
> > I believe this is going to bite you exactly in the case you want the
> > opposite behavior.  If you have CUs compiled with defaults and
> > a specialized one with VSX that calls into generic compiled functions
> > you _do_ want to allow inlining into the VSX enabled routines.
>
> Yes, but *not* inlining is relatively harmless, while inlining can be
> fatal.  I don't see how we can handle both scenarios optimally.

How can it be fatal to inline a non-VSX function into a VSX one?

> > Just
> > think of LTO, C++ and comdats - you'll get a random comdat entity
> > at link time for inlining - either from the VSX CU or the non-VSX one.
>
> This would make LTO totally unusable, with or without this patch?  Something
> else must be going on?

It's the same without LTO - the linker will simply choose (randomly)
one of the comdats from one of the CUs providing it, not caring about
some built with and some without VSX.

Richard.

>
> Segher


[PATCH] Fix PR92048

2019-10-15 Thread Richard Biener


Committed.

Richard.

2019-10-15  Richard Biener  

PR testsuite/92048
* gcc.dg/vect/fast-math-vect-pr29925.c: Avoid unrolling of
inner loop.

Index: gcc/testsuite/gcc.dg/vect/fast-math-vect-pr29925.c
===
--- gcc/testsuite/gcc.dg/vect/fast-math-vect-pr29925.c  (revision 276983)
+++ gcc/testsuite/gcc.dg/vect/fast-math-vect-pr29925.c  (working copy)
@@ -13,6 +13,7 @@
for (i=0;i

[PATCH] Fix PR92094

2019-10-15 Thread Richard Biener


The following fixes vectorization of nested cycles when the nested
cycle consists only of a PHI node.  As in the previous fix, when a
nested cycle consists only of the PHI, it doesn't necessarily have
another stmt that participates only in that cycle (in this case it
participates in another nested cycle).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-15  Richard Biener  

PR tree-optimization/92094
* tree-vect-loop.c (vectorizable_reduction): For nested cycles
do not adjust the reduction definition def type.
* tree-vect-stmts.c (vect_transform_stmt): Verify the scalar stmt
defines the latch argument of the PHI.

* gfortran.dg/pr92094.f90: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 276983)
+++ gcc/tree-vect-loop.c(working copy)
@@ -5742,20 +5751,9 @@ vectorizable_reduction (stmt_vec_info stmt_info, s
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle)
 {
   if (is_a <gphi *> (stmt_info->stmt))
-   {
- /* Analysis for double-reduction is done on the outer
-loop PHI, nested cycles have no further restrictions.  */
- STMT_VINFO_TYPE (stmt_info) = cycle_phi_info_type;
- /* For nested cycles we want to let regular vectorizable_*
-routines handle code-generation.  */
- if (STMT_VINFO_DEF_TYPE (reduc_info) != vect_double_reduction_def)
-   {
- stmt_info = STMT_VINFO_REDUC_DEF (stmt_info);
- STMT_VINFO_DEF_TYPE (stmt_info) = vect_internal_def;
- STMT_VINFO_DEF_TYPE (vect_stmt_to_vectorize (stmt_info))
-   = vect_internal_def;
-   }
-   }
+   /* Analysis for double-reduction is done on the outer
+  loop PHI, nested cycles have no further restrictions.  */
+   STMT_VINFO_TYPE (stmt_info) = cycle_phi_info_type;
   else
STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
   return true;
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 276983)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -10906,13 +10906,16 @@ vect_transform_stmt (stmt_vec_info stmt_info, gimp
   && STMT_VINFO_REDUC_TYPE (reduc_info) != EXTRACT_LAST_REDUCTION)
 {
   gphi *phi;
+  edge e;
   if (!slp_node
  && (phi = dyn_cast <gphi *>
  (STMT_VINFO_REDUC_DEF (orig_stmt_info)->stmt))
  && dominated_by_p (CDI_DOMINATORS,
-gimple_bb (orig_stmt_info->stmt), gimple_bb (phi)))
+gimple_bb (orig_stmt_info->stmt), gimple_bb (phi))
+ && (e = loop_latch_edge (gimple_bb (phi)->loop_father))
+ && (PHI_ARG_DEF_FROM_EDGE (phi, e)
+ == gimple_get_lhs (orig_stmt_info->stmt)))
{
- edge e = loop_latch_edge (gimple_bb (phi)->loop_father);
  stmt_vec_info phi_info
= STMT_VINFO_VEC_STMT (STMT_VINFO_REDUC_DEF (orig_stmt_info));
  stmt_vec_info vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
@@ -10932,7 +10935,7 @@ vect_transform_stmt (stmt_vec_info stmt_info, gimp
{
  slp_tree phi_node = slp_node_instance->reduc_phis;
  gphi *phi = as_a <gphi *> (SLP_TREE_SCALAR_STMTS (phi_node)[0]->stmt);
- edge e = loop_latch_edge (gimple_bb (phi)->loop_father);
+ e = loop_latch_edge (gimple_bb (phi)->loop_father);
  gcc_assert (SLP_TREE_VEC_STMTS (phi_node).length ()
  == SLP_TREE_VEC_STMTS (slp_node).length ());
  for (unsigned i = 0; i < SLP_TREE_VEC_STMTS (phi_node).length (); ++i)
Index: gcc/testsuite/gfortran.dg/pr92094.f90
===
--- gcc/testsuite/gfortran.dg/pr92094.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr92094.f90   (working copy)
@@ -0,0 +1,28 @@
+! { dg-do compile }
+! { dg-options "-O3" }
+  subroutine hesfcn(n, x, h, ldh)
+  integer n,ldh
+  double precision x(n), h(ldh)
+
+  integer i,j,k,kj
+  double precision th,u1,u2,v2
+ 
+  kj = 0
+  do 770 j = 1, n
+ kj = kj - j
+ do 760 k = 1, j
+kj = kj + 1
+v2 = 2 * x(k) - 1
+u1 = 0
+u2 = 2
+do 750 i = 1, n
+   h(kj) = h(kj) + u2
+   th = 4 * v2 + u2 - u1
+   u1 = u2
+   u2 = th
+   th = v2 - 1
+  750   continue
+  760continue
+  770 continue
+
+  end


Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Richard Biener
On Tue, Oct 15, 2019 at 2:18 AM Peter Bergner  wrote:
>
> On 10/14/19 2:57 PM, Segher Boessenkool wrote:
> > On Mon, Oct 14, 2019 at 06:35:06PM +0200, Richard Biener wrote:
> >> The general case should be that if the caller ISA supports the callee one
> >> then inlining is OK. If this is not wanted in some cases then there are
> >> options like using a noinline attribute.
> >
> > I agree, and that is what we already do afaik.
>
> I agree on the making sure the caller's ISA supports the callee's ISA
> before allowing inlining...and I think that's what our code it "trying"
> to do.  It just has some bugs.
>
>
> > But in this case, the callee explicitly disables something (-mno-vsx),
> > while the caller has it enabled (all modern cpus have VSX).  If it ends
> > up being inlined, it will get VSX insns generated for it.  Which is what
> > Jiu Fu's patch aims to prevent.
> >
> > So you are saying the GCC policy is that you should use noinline on the
> > callee in such cases?
>
> I don't think sprinkling noinline's around will work given LTO can
> inline random functions from one object file into a function in another
> object file and we have no idea what the options were used for both files.
> I think that would just force us to end up putting nolines on all fuctions
> that might be compiled with LTO.

There's a function attribute for this.
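
For reference, that attribute (standard GCC syntax):

  void callee (void) __attribute__((noinline));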

> I think we just need to fix the bug in the current logic when checking
> whether the caller's ISA flags supports the callee's ISA flags. ...and
> for that, I think we just need to add a test that enforces that the
> caller's ISA flags match exactly the callee's flags, for those flags
> that were explicitly set in the callee.  The patch below seems to fix
> the issue (regtesting now).  Does this look like what we want?

I believe this is going to bite you exactly in the case you want the
opposite behavior.  If you have CUs compiled with defaults and
a specialized one with VSX that calls into generic compiled functions
you _do_ want to allow inlining into the VSX enabled routines.  Just
think of LTO, C++ and comdats - you'll get a random comdat entity
at link time for inlining - either from the VSX CU or the non-VSX one.

Richard.

> Peter
>
>
> gcc/
> * config/rs6000/rs6000.c (rs6000_can_inline_p): Handle explicit
> options.
>
> gcc.testsuite/
> * gcc.target/powerpc/pr70010.c: New test.
>
>
> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c  (revision 276975)
> +++ gcc/config/rs6000/rs6000.c  (working copy)
> @@ -23976,13 +23976,18 @@ rs6000_can_inline_p (tree caller, tree c
>else
>  {
>struct cl_target_option *caller_opts = TREE_TARGET_OPTION 
> (caller_tree);
> +  HOST_WIDE_INT caller_isa = caller_opts->x_rs6000_isa_flags;
>struct cl_target_option *callee_opts = TREE_TARGET_OPTION 
> (callee_tree);
> +  HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
> +  HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
>
> -  /* Callee's options should a subset of the caller's, i.e. a vsx 
> function
> -can inline an altivec function but a non-vsx function can't inline a
> -vsx function.  */
> -  if ((caller_opts->x_rs6000_isa_flags & callee_opts->x_rs6000_isa_flags)
> - == callee_opts->x_rs6000_isa_flags)
> +  /* The callee's options must be a subset of the caller's options, i.e.
> +a vsx function may inline an altivec function, but a non-vsx function
> +must not inline a vsx function.  However, for those options that the
> +callee has explicitly set, then we must enforce that the callee's
> +and caller's options match exactly; see PR70010.  */
> +  if (((caller_isa & callee_isa) == callee_isa)
> + && (caller_isa & explicit_isa) == (callee_isa & explicit_isa))
> ret = true;
>  }
>
> Index: gcc/testsuite/gcc.target/powerpc/pr70010.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/pr70010.c  (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr70010.c  (working copy)
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -finline" } */
> +/* { dg-final { scan-assembler "bl vadd_no_vsx" } } */
> +
> +typedef int vec_t __attribute__((vector_size(16)));
> +
> +static vec_t
> +__attribute__((__target__("no-vsx")))
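
A worked example of the proposed check (single-bit flag values are
hypothetical):

  /* caller: -mcpu=power8 defaults  -> caller_isa   = VSX | ALTIVEC
     callee: explicit no-vsx        -> callee_isa   = ALTIVEC
                                       explicit_isa = VSX
     subset test:   (caller_isa & callee_isa) == callee_isa     -> true
     explicit test: (caller_isa & explicit_isa) == VSX
                    (callee_isa & explicit_isa) == 0            -> differ
     => ret stays false and the callee is not inlined, as PR70010 wants.  */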

Re: [PATCH] RFA (gimplify.h) Fix incorrect cp/ use of get_formal_tmp_var.

2019-10-15 Thread Richard Biener
On Mon, Oct 14, 2019 at 10:25 PM Jason Merrill  wrote:
>
> The comment for get_formal_tmp_var says that it shouldn't be used for
> expressions whose value might change between initialization and use, and in
> this case we're creating a temporary precisely because the value might
> change, so we should use get_initialized_tmp_var instead.
>
> I also noticed that many callers of get_initialized_tmp_var pass NULL for
> post_p, so it seems appropriate to make it a default argument.  OK for trunk?

OK.

> Tested x86_64-pc-linux-gnu.
>
> gcc/cp/
> * cp-gimplify.c (cp_gimplify_expr): Use get_initialized_tmp_var.
> gcc/
> * gimplify.h (get_initialized_tmp_var): Add default argument to
> post_p.
> ---
>  gcc/gimplify.h   | 2 +-
>  gcc/cp/cp-gimplify.c | 2 +-
>  gcc/gimplify.c   | 5 +++--
>  3 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/gimplify.h b/gcc/gimplify.h
> index 1070006374a..6c997a769cd 100644
> --- a/gcc/gimplify.h
> +++ b/gcc/gimplify.h
> @@ -57,7 +57,7 @@ extern gbind *gimple_current_bind_expr (void);
>  extern vec<gbind *> gimple_bind_expr_stack (void);
>  extern void gimplify_and_add (tree, gimple_seq *);
>  extern tree get_formal_tmp_var (tree, gimple_seq *);
> -extern tree get_initialized_tmp_var (tree, gimple_seq *, gimple_seq *,
> +extern tree get_initialized_tmp_var (tree, gimple_seq *, gimple_seq * = NULL,
>  bool = true);
>  extern void declare_vars (tree, gimple *, bool);
>  extern void gimple_add_tmp_var (tree);
> diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
> index 154fa70ec06..80754b930b9 100644
> --- a/gcc/cp/cp-gimplify.c
> +++ b/gcc/cp/cp-gimplify.c
> @@ -767,7 +767,7 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, 
> gimple_seq *post_p)
> && (TREE_CODE (op1) == CALL_EXPR
> || (SCALAR_TYPE_P (TREE_TYPE (op1))
> && !TREE_CONSTANT (op1
> -TREE_OPERAND (*expr_p, 1) = get_formal_tmp_var (op1, pre_p);
> +TREE_OPERAND (*expr_p, 1) = get_initialized_tmp_var (op1, pre_p);
>}
>ret = GS_OK;
>break;
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 836706961f3..7f9100ba97d 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -661,8 +661,9 @@ get_formal_tmp_var (tree val, gimple_seq *pre_p)
> are as in gimplify_expr.  */
>
>  tree
> -get_initialized_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p,
> -bool allow_ssa)
> +get_initialized_tmp_var (tree val, gimple_seq *pre_p,
> +gimple_seq *post_p /* = NULL */,
> +bool allow_ssa /* = true */)
>  {
>return internal_get_tmp_var (val, pre_p, post_p, false, allow_ssa);
>  }
>
> base-commit: aa45db50a034b266c338b55dee1b412178ea84a7
> --
> 2.18.1
>
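
A rough illustration of the distinction Jason describes (schematic, not
the actual gimplifier output):

  /* get_formal_tmp_var (val, pre_p): the gimplifier may reuse the
     temporary for later occurrences of an equal expression, which is
     only valid if VAL cannot change between initialization and use.
     get_initialized_tmp_var (val, pre_p): the temporary snapshots VAL
     once, so later changes to the underlying value are harmless.  */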


Re: [PATCH] teach gengtype about 'mutable'

2019-10-15 Thread Richard Biener
Yes, it is. :)

On Mon, Oct 14, 2019 at 10:09 PM Nathan Sidwell  wrote:
>
> On 10/14/19 3:46 PM, Jeff Law wrote:
> > On 10/14/19 6:09 AM, Nathan Sidwell wrote:
> >> On 10/14/19 7:16 AM, Richard Biener wrote:
> >>> On Sun, Oct 13, 2019 at 4:45 PM Nathan Sidwell  wrote:
> >>>>
> >>>> In constifying some more of line-map I discovered gengtype didn't
> >>>> know mutable.
> >>>> Added thusly.
> >>>
> >>> mutable is bad.  Why do you want to use it?
> >>
> >> the line map info has a caching field.
> > Isn't that in fact a classic use case for mutable?
>
> Indeed it is.  I'm curious as to the 'mutable is bad' origin.  Is it
> similar to 'goto is bad'?
>
> nathan
>
>
> --
> Nathan Sidwell


[PATCH] More PR92046 fixes, make --param allow-store-data-races a -f option

2019-10-15 Thread Richard Biener


This makes allow-store-data-races adjustable per function by making it
a regular option rather than a --param.
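
Since the new option is marked Optimization, the setting can now differ
per function, e.g. (sketch; attribute spelling as usual for -f options):

  __attribute__((optimize("allow-store-data-races")))
  void hot_path (int *a, int n);   /* racy store motion allowed here */

  void cold_path (int *a);         /* keeps the command-line setting */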

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2019-10-15  Richard Biener  

PR middle-end/92046
* common.opt (fallow-store-data-races): New.
* params.def (PARAM_ALLOW_STORE_DATA_RACES): Remove.
* params.h (ALLOW_STORE_DATA_RACES): Likewise.
* doc/invoke.texi (fallow-store-data-races): Document.
(--param allow-store-data-races): Remove docs.
* opts.c (default_options_table): Enable -fallow-store-data-races
at -Ofast.
(default_options_optimization): Do not enable --param
allow-store-data-races at -Ofast.
* tree-if-conv.c (ifcvt_memrefs_wont_trap): Use flag_store_data_races
instead of PARAM_ALLOW_STORE_DATA_RACES.
* tree-ssa-loop-im.c (execute_sm): Likewise.

* c-c++-common/cxxbitfields-3.c: Adjust.
* c-c++-common/cxxbitfields-6.c: Likewise.
* c-c++-common/simulate-thread/bitfields-1.c: Likewise.
* c-c++-common/simulate-thread/bitfields-2.c: Likewise.
* c-c++-common/simulate-thread/bitfields-3.c: Likewise.
* c-c++-common/simulate-thread/bitfields-4.c: Likewise.
* g++.dg/simulate-thread/bitfields-2.C: Likewise.
* g++.dg/simulate-thread/bitfields.C: Likewise.
* gcc.dg/lto/pr52097_0.c: Likewise.
* gcc.dg/simulate-thread/speculative-store-2.c: Likewise.
* gcc.dg/simulate-thread/speculative-store-3.c: Likewise.
* gcc.dg/simulate-thread/speculative-store-4.c: Likewise.
* gcc.dg/simulate-thread/speculative-store.c: Likewise.
* gcc.dg/tree-ssa/20050314-1.c: Likewise.

Index: gcc/common.opt
===
--- gcc/common.opt  (revision 276983)
+++ gcc/common.opt  (working copy)
@@ -993,6 +993,10 @@ Align the start of loops.
 falign-loops=
 Common RejectNegative Joined Var(str_align_loops) Optimization
 
+fallow-store-data-races
+Common Report Var(flag_store_data_races) Optimization
+Allow the compiler to introduce new data races on stores.
+
 fargument-alias
 Common Ignore
 Does nothing. Preserved for backward compatibility.
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 276983)
+++ gcc/doc/invoke.texi (working copy)
@@ -406,6 +406,7 @@ Objective-C and Objective-C++ Dialects}.
 -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol
 -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol
 -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]] @gol
+-fallow-store-data-races @gol
 -fassociative-math  -fauto-profile  -fauto-profile[=@var{path}] @gol
 -fauto-inc-dec  -fbranch-probabilities @gol
 -fcaller-saves @gol
@@ -8463,9 +8464,9 @@ designed to reduce code size.
 Disregard strict standards compliance.  @option{-Ofast} enables all
 @option{-O3} optimizations.  It also enables optimizations that are not
 valid for all standard-compliant programs.
-It turns on @option{-ffast-math} and the Fortran-specific
-@option{-fstack-arrays}, unless @option{-fmax-stack-var-size} is
-specified, and @option{-fno-protect-parens}.
+It turns on @option{-ffast-math}, @option{-fallow-store-data-races}
+and the Fortran-specific @option{-fstack-arrays}, unless
+@option{-fmax-stack-var-size} is specified, and @option{-fno-protect-parens}.
 
 @item -Og
 @opindex Og
@@ -10227,6 +10228,12 @@ The maximum allowed @var{n} option value
 
 Enabled at levels @option{-O2}, @option{-O3}.
 
+@item -fallow-store-data-races
+@opindex fallow-store-data-races
+Allow the compiler to introduce new data races on stores.
+
+Enabled at level @option{-Ofast}.
+
 @item -funit-at-a-time
 @opindex funit-at-a-time
 This option is left for compatibility reasons. @option{-funit-at-a-time}
@@ -12060,10 +12067,6 @@ The maximum number of conditional store
 if either vectorization (@option{-ftree-vectorize}) or if-conversion
 (@option{-ftree-loop-if-convert}) is disabled.
 
-@item allow-store-data-races
-Allow optimizers to introduce new data races on stores.
-Set to 1 to allow, otherwise to 0.
-
 @item case-values-threshold
 The smallest number of different values for which it is best to use a
 jump-table instead of a tree of conditional branches.  If the value is
Index: gcc/opts.c
===
--- gcc/opts.c  (revision 276983)
+++ gcc/opts.c  (working copy)
@@ -564,6 +564,7 @@ static const struct default_options defa
 
 /* -Ofast adds optimizations to -O3.  */
 { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
+{ OPT_LEVELS_FAST, OPT_fallow_store_data_races, NULL, 1 },
 
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
@@ -671,13 +672,6 @@ default_options_optimization (struct gcc
  opt2 ? 100 : default_param_value (PARAM_MAX_FIELDS_FOR_FIELD_SENSITIVE),
  opts->x_param_values, opts_set->x_param_

Re: Add expr_callee_abi

2019-10-14 Thread Richard Biener
On October 14, 2019 2:53:36 PM GMT+02:00, Richard Sandiford 
 wrote:
>Richard Biener  writes:
>> On Fri, Oct 11, 2019 at 4:39 PM Richard Sandiford
>>  wrote:
>>>
>>> This turned out to be useful for the SVE PCS support, and is a
>natural
>>> tree-level analogue of insn_callee_abi.
>>>
>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>>
>>> Richard
>>>
>>>
>>> 2019-10-11  Richard Sandiford  
>>>
>>> gcc/
>>> * function-abi.h (expr_callee_abi): Declare.
>>> * function-abi.cc (expr_callee_abi): New function.
>>>
>>> Index: gcc/function-abi.h
>>> ===
>>> --- gcc/function-abi.h  2019-09-30 17:39:33.514597856 +0100
>>> +++ gcc/function-abi.h  2019-10-11 15:38:54.141605718 +0100
>>> @@ -315,5 +315,6 @@ call_clobbered_in_region_p (unsigned int
>>>  extern const predefined_function_abi &fntype_abi (const_tree);
>>>  extern function_abi fndecl_abi (const_tree);
>>>  extern function_abi insn_callee_abi (const rtx_insn *);
>>> +extern function_abi expr_callee_abi (const_tree);
>>>
>>>  #endif
>>> Index: gcc/function-abi.cc
>>> ===
>>> --- gcc/function-abi.cc 2019-09-30 17:39:33.514597856 +0100
>>> +++ gcc/function-abi.cc 2019-10-11 15:38:54.141605718 +0100
>>> @@ -229,3 +229,32 @@ insn_callee_abi (const rtx_insn *insn)
>>>
>>>return default_function_abi;
>>>  }
>>> +
>>> +/* Return the ABI of the function called by CALL_EXPR EXP.  Return
>the
>>> +   default ABI for erroneous calls.  */
>>> +
>>> +function_abi
>>> +expr_callee_abi (const_tree exp)
>>> +{
>>> +  gcc_assert (TREE_CODE (exp) == CALL_EXPR);
>>> +
>>> +  if (tree fndecl = get_callee_fndecl (exp))
>>> +return fndecl_abi (fndecl);
>>
>> Please not.  The ABI in effect on the call is that of
>> the type of CALL_EXPR_FN, what GIMPLE optimizers
>> propagated as fndecl here doesn't matter.
>
>expr_callee_abi is returning the ABI of the callee function, ignoring
>any additional effects involved in actually calling it.  It's supposed
>to take advantage of local IPA information where possible, e.g. by
>ignoring call-clobbered registers that the target doesn't actually
>clobber.
>
>So if get_callee_fndecl returns either the correct FUNCTION_DECL or
>null,
>using it gives better information than just using the type.  Failing to
>propagate the decl (or not having a decl to propagate) just means that
>we don't take advantage of the IPA information.

So we don't ever want to use this for stdcall vs. ms_abi or so, where the 
indirect call function type might be correctly attributed while a function decl 
is not? 

>>> +
>>> +  tree callee = CALL_EXPR_FN (exp);
>>> +  if (callee == error_mark_node)
>>> +return default_function_abi;
>>> +
>>> +  tree type = TREE_TYPE (callee);
>>> +  if (type == error_mark_node)
>>> +return default_function_abi;
>>> +
>>> +  if (POINTER_TYPE_P (type))
>>> +{
>>> +  type = TREE_TYPE (type);
>>> +  if (type == error_mark_node)
>>> +   return default_function_abi;
>>> +}
>>
>> so when it's not a POINTER_TYPE (it always shold be!)
>> then you're handing arbitrary types to fntype_abi.
>
>fntype_abi asserts that TYPE is an appropriate type.  There's no safe
>value we can return if it isn't.  (Well, apart from the error_mark_node
>cases above, where the assumption is that compilation is going to fail
>anyway.)
>
>I can change it to assert for POINTER_TYPE_P if that's guaranteed
>everywhere.

That's probably better. 

Thanks, 
Richard. 

>Thanks,
>Richard
>
>>
>>> +  return fntype_abi (type);
>>> +}
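
A usage sketch of the new function (the query method name is assumed
from the rest of the function-abi.h API):

  function_abi abi = expr_callee_abi (call);
  if (!abi.clobbers_full_reg_p (regno))
    /* the value in REGNO survives the call */;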



Re: Add a constant_range_value_p function (PR 92033)

2019-10-14 Thread Richard Biener
On October 14, 2019 2:32:43 PM GMT+02:00, Richard Sandiford 
 wrote:
>Richard Biener  writes:
>> On Fri, Oct 11, 2019 at 4:42 PM Richard Sandiford
>>  wrote:
>>>
>>> The range-tracking code has a pretty hard-coded assumption that
>>> is_gimple_min_invariant is equivalent to "INTEGER_CST or invariant
>>> ADDR_EXPR".  It seems better to add a predicate specifically for
>>> that rather than contiually fight cases in which it can't handle
>>> other invariants.
>>>
>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> ICK.  Nobody is going to remember this new restriction and
>> constant_range_value_p reads like constant_value_range_p ;)
>>
>> Btw, is_gimple_invariant_address shouldn't have been exported,
>> it's only use could have used is_gimple_min_invariant...
>
>What do you think we should do instead?

Just handle POLY_INT_CST in a few places to quickly enough drop to varying. 
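
Sketched, that would be something like the following in
value_range_base::set (placement and surroundings are assumptions):

  if (POLY_INT_CST_P (min) || POLY_INT_CST_P (max))
    {
      set_varying ();   /* give up rather than track polynomial bounds */
      return;
    }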

Richard. 

>Richard
>
>>
>> Richard.
>>
>>> Richard
>>>
>>>
>>> 2019-10-11  Richard Sandiford  
>>>
>>> gcc/
>>> PR tree-optimization/92033
>>> * tree-vrp.h (constant_range_value_p): Declare.
>>> * tree-vrp.c (constant_range_value_p): New function.
>>> (value_range_base::symbolic_p,
>value_range_base::singleton_p)
>>> (get_single_symbol, compare_values_warnv, intersect_ranges)
>>> (value_range_base::normalize_symbolics): Use it instead of
>>> is_gimple_min_invariant.
>>> (simplify_stmt_for_jump_threading): Likewise.
>>> * vr-values.c (symbolic_range_based_on_p, valid_value_p):
>Likewise.
>>> (vr_values::op_with_constant_singleton_value_range):
>Likewise.
>>> (vr_values::extract_range_from_binary_expr): Likewise.
>>> (vr_values::extract_range_from_unary_expr): Likewise.
>>> (vr_values::extract_range_from_cond_expr): Likewise.
>>> (vr_values::extract_range_from_comparison): Likewise.
>>> (vr_values::extract_range_from_assignment): Likewise.
>>> (vr_values::adjust_range_with_scev, vrp_valueize): Likewise.
>>> (vr_values::vrp_visit_assignment_or_call): Likewise.
>>> (vr_values::vrp_evaluate_conditional): Likewise.
>>> (vr_values::simplify_bit_ops_using_ranges): Likewise.
>>> (test_for_singularity): Likewise.
>>> (vr_values::simplify_cond_using_ranges_1): Likewise.
>>>
>>> Index: gcc/tree-vrp.h
>>> ===
>>> --- gcc/tree-vrp.h  2019-10-08 09:23:31.282533990 +0100
>>> +++ gcc/tree-vrp.h  2019-10-11 15:41:20.380576059 +0100
>>> @@ -284,6 +284,7 @@ value_range_base::supports_type_p (tree
>>>return false;
>>>  }
>>>
>>> +extern bool constant_range_value_p (const_tree);
>>>  extern void register_edge_assert_for (tree, edge, enum tree_code,
>>>   tree, tree, vec<assert_info> &);
>>>  extern bool stmt_interesting_for_vrp (gimple *);
>>> Index: gcc/tree-vrp.c
>>> ===
>>> --- gcc/tree-vrp.c  2019-10-08 09:23:31.282533990 +0100
>>> +++ gcc/tree-vrp.c  2019-10-11 15:41:20.380576059 +0100
>>> @@ -78,6 +78,18 @@ ranges_from_anti_range (const value_rang
>>> for still active basic-blocks.  */
>>>  static sbitmap *live;
>>>
>>> +/* Return true if VALUE is considered constant for range tracking.
>>> +   This is stricter than is_gimple_min_invariant and should be
>>> +   used instead of it in range-related code.  */
>>> +
>>> +bool
>>> +constant_range_value_p (const_tree value)
>>> +{
>>> +  return (TREE_CODE (value) == INTEGER_CST
>>> + || (TREE_CODE (value) == ADDR_EXPR
>>> + && is_gimple_invariant_address (value)));
>>> +}
>>> +
>>>  void
>>>  value_range::set_equiv (bitmap equiv)
>>>  {
>>> @@ -273,8 +285,8 @@ value_range_base::symbolic_p () const
>>>  {
>>>return (!varying_p ()
>>>   && !undefined_p ()
>>> - && (!is_gimple_min_invariant (m_min)
>>> - || !is_gimple_min_invariant (m_max)));
>>> + && (!constant_range_value_p (m_min)
>>> + || !constant_range_valu

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-14 Thread Richard Biener
On October 14, 2019 5:31:58 PM GMT+02:00, Peter Bergner  
wrote:
>On 10/12/19 3:46 AM, Segher Boessenkool wrote:
>> Two spaces after a period.  How about something like
>> 
>>   /* Callee's options should be a subset of the caller's.  Also,
>a function
>>   without VSX enabled should not be inlined into one with VSX
>enabled,
>>   because it may be important it is disabled there; see PR70010.  */
>> 
>> It's not clear to me why this is important, and what makes -mvsx
>different
>> from all other similar options?
>
>I agree, there is nothing special about VSX here and the other similar
>options
>like Altivec, HTM, etc., etc. should all be handled similarly.
>
>I agree with your other comment that we should be looking at explicit
>option
>usage versus default options.  However, the way we now implement
>default CPU,
>the gcc driver always passes a -mcpu= option to cc1 no matter if the
>user
>used -mcpu= or not, so -mcpu= will always looks like an explicit
>option.
>So when -mcpu=power[789] is passed to cc1 (via explicit user usage or
>default
>cpu), does that look like -mvsx was also explicitly used?  I'm guessing
>not.
>
>So if we have a caller compiled with -mcpu=power8 (VSX and Altivec are
>implicitly
>enabled) and a callee compiled with -mcpu=power6 (VSX and Altivec is
>not enabled
>...implicitly), do we allow inlining?  I would say we shouldn't, but
>the VSX
>and Altivec flags in the callee are a subset of the caller's flags.  It
>must
>be that the ISA* flags in rs6000_isa_flags that save us from not
>inlining?
>
>Therefore, I'd say that the callee's flags should be a subset of the
>caller's
>flags as the current code does now, but we should be also checking that
>the
>callee doesn't have an explicitly used option flag(s) that conflicts
>with
>the callers flags (implicit or explicit).  That means the caller's
>flags
>must match exactly the callee's flags, for those flags that were
>explicitly
>set in the callee.

The general case should be that if the caller ISA supports the callee one then 
inlining is OK. If this is not wanted in some cases then there are options like 
using a noinline attribute. 

>
>Peter



Re: [PATCH] Fix PR92046

2019-10-14 Thread Richard Biener
On October 14, 2019 4:53:02 PM GMT+02:00, Christophe Lyon 
 wrote:
>On Fri, 11 Oct 2019 at 12:43, Richard Biener  wrote:
>
>> On Fri, 11 Oct 2019, Rainer Orth wrote:
>>
>> > Hi Christophe,
>> >
>> > > On Thu, 10 Oct 2019 at 16:01, Richard Biener 
>> wrote:
>> > >
>> > >>
>> > >> The following fixes a few param adjustments that are made based
>on
>> > >> per-function adjustable flags by moving the adjustments to their
>> > >> users.  Semantics change in some minor ways but that's allowed
>> > >> for --params.
>> > >>
>> > >> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>> > >>
>> > >> Hi,
>> > >
>> > > This generates several regressions.
>> > > On aarch64:
>> > > FAIL:  gcc.target/aarch64/vect_fp16_1.c scan-assembler-times
>> > > fadd\tv[0-9]+.8h 2
>> > >
>> > > on arm-linux-gnueabihf:
>> > > FAIL: gcc.dg/vect/vect-align-1.c -flto -ffat-lto-objects
>> > >  scan-tree-dump-times vect "vectorized 1 loops" 1
>> > > FAIL: gcc.dg/vect/vect-align-1.c scan-tree-dump-times vect
>"vectorized
>> 1
>> > > loops" 1
>> > > FAIL: gcc.dg/vect/vect-align-2.c -flto -ffat-lto-objects
>> > >  scan-tree-dump-times vect "vectorized 1 loops" 1
>> > > FAIL: gcc.dg/vect/vect-align-2.c scan-tree-dump-times vect
>"vectorized
>> 1
>> > > loops" 1
>> > >
>> > > on armeb-linux-gnueabihf, many (316) like:
>> > > FAIL: gcc.dg/vect/O3-vect-pr34223.c scan-tree-dump-times vect
>> "vectorized 1
>> > > loops" 1
>> > > FAIL: gcc.dg/vect/fast-math-pr35982.c scan-tree-dump-times vect
>> "vectorized
>> > > 1 loops" 1
>> > >
>> > > still on armeb-linux-gnueabihf:
>> > > g++.dg/vect/pr33426-ivdep-2.cc  -std=c++14  (test for
>warnings,
>> line )
>> > > g++.dg/vect/pr33426-ivdep-2.cc  -std=c++17  (test for
>warnings,
>> line )
>> > > g++.dg/vect/pr33426-ivdep-2.cc  -std=c++2a  (test for
>warnings,
>> line )
>> > > g++.dg/vect/pr33426-ivdep-2.cc  -std=c++98  (test for
>warnings,
>> line )
>> > > g++.dg/vect/pr33426-ivdep-3.cc(test for warnings, line )
>> > > g++.dg/vect/pr33426-ivdep-4.cc(test for warnings, line )
>> > > g++.dg/vect/pr33426-ivdep.cc  -std=c++14  (test for warnings,
>line
>> )
>> > > g++.dg/vect/pr33426-ivdep.cc  -std=c++17  (test for warnings,
>line
>> )
>> > > g++.dg/vect/pr33426-ivdep.cc  -std=c++2a  (test for warnings,
>line
>> )
>> > > g++.dg/vect/pr33426-ivdep.cc  -std=c++98  (test for warnings,
>line
>> )
>> > >
>> > > gfortran.dg/vect/no-vfa-pr32377.f90   -O  
>scan-tree-dump-times
>> vect
>> > > "vectorized 2 loops" 1
>> > > gfortran.dg/vect/pr19049.f90   -O   scan-tree-dump-times vect
>> > > "vectorized 1 loops" 1
>> > > gfortran.dg/vect/pr32377.f90   -O   scan-tree-dump-times vect
>> > > "vectorized 2 loops" 1
>> > > gfortran.dg/vect/vect-2.f90   -O   scan-tree-dump-times vect
>> > > "vectorized 3 loops" 1
>> > > gfortran.dg/vect/vect-3.f90   -O   scan-tree-dump-times vect
>> "Alignment
>> > > of access forced using versioning" 3
>> > > gfortran.dg/vect/vect-4.f90   -O   scan-tree-dump-times vect
>> "accesses
>> > > have the same alignment." 1
>> > > gfortran.dg/vect/vect-4.f90   -O   scan-tree-dump-times vect
>> > > "vectorized 1 loops" 1
>> > > gfortran.dg/vect/vect-5.f90   -O   scan-tree-dump-times vect
>> "Alignment
>> > > of access forced using versioning." 2
>> > > gfortran.dg/vect/vect-5.f90   -O   scan-tree-dump-times vect
>> > > "vectorized 1 loops" 1
>> >
>> > that's PR tree-optimization/92066, also seen on sparc, powerpc64,
>and
>> > ia64.
>>
>> Hmm, OK.  There's one obvious bug fixed below, other than that I have
>> to investigate in more detail.
>>
>> Committed as obvious.
>>
>> Hi Richard,
>
>This patch caused another regression on armeb:
>FAIL: gcc.dg/vect/vect-multitypes-11.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vectorized 1 loops" 1
>FAIL: gcc.dg/vect/vect-multitypes-11.c scan-tree-dump-times vect
>"vectorized 1 loops" 1
>FAIL: gcc.dg/vect/vect-multitypes-12.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vectorized 1 loops" 1
>FAIL: gcc.dg/vect/vect-multitypes-12.c scan-tree-dump-times vect
>"vectorized 1 loops" 1

Are the other ones fixed though? 

Richard. 

>Christophe
>
>
> Richard.
>>
>> 2019-10-11  Richard Biener  
>>
>> PR tree-optimization/92066
>> PR tree-optimization/92046
>> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
>> Fix bogus cost model check.
>>
>> Index: gcc/tree-vect-data-refs.c
>> ===
>> --- gcc/tree-vect-data-refs.c   (revision 276858)
>> +++ gcc/tree-vect-data-refs.c   (working copy)
>> @@ -2179,7 +2179,7 @@
>>do_versioning
>>  = (optimize_loop_nest_for_speed_p (loop)
>> && !loop->inner /* FORNOW */
>> -   && flag_vect_cost_model > VECT_COST_MODEL_CHEAP);
>> +   && flag_vect_cost_model != VECT_COST_MODEL_CHEAP);
>>
>>if (do_versioning)
>>  {
>>



[PATCH] More PR92046 fixes

2019-10-14 Thread Richard Biener


Two other params do similar scaling so fix them similarly.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2019-10-14  Richard Biener  

PR middle-end/92046
* dse.c (scan_insn): Use param max_active_local_stores.
(dse_step1): Get PARAM_MAX_DSE_ACTIVE_LOCAL_STORES and adjust
based on optimization level.
* loop-invariant.c (move_loop_invariants): Adjust
LOOP_INVARIANT_MAX_BBS_IN_LOOP based on optimization level.
* opts.c (default_options_optimization): Do not adjust
PARAM_MAX_DSE_ACTIVE_LOCAL_STORES and
LOOP_INVARIANT_MAX_BBS_IN_LOOP here.

Index: gcc/dse.c
===
--- gcc/dse.c   (revision 276961)
+++ gcc/dse.c   (working copy)
@@ -2401,7 +2401,7 @@ copy_fixed_regs (const_bitmap in)
non-register target.  */
 
 static void
-scan_insn (bb_info_t bb_info, rtx_insn *insn)
+scan_insn (bb_info_t bb_info, rtx_insn *insn, int max_active_local_stores)
 {
   rtx body;
   insn_info_type *insn_info = insn_info_type_pool.allocate ();
@@ -2523,8 +2523,7 @@ scan_insn (bb_info_t bb_info, rtx_insn *
fprintf (dump_file, "handling memset as BLKmode store\n");
  if (mems_found == 1)
{
- if (active_local_stores_len++
- >= PARAM_VALUE (PARAM_MAX_DSE_ACTIVE_LOCAL_STORES))
+ if (active_local_stores_len++ >= max_active_local_stores)
{
  active_local_stores_len = 1;
  active_local_stores = NULL;
@@ -2584,8 +2583,7 @@ scan_insn (bb_info_t bb_info, rtx_insn *
  it as cannot delete.  This simplifies the processing later.  */
   if (mems_found == 1)
 {
-  if (active_local_stores_len++
- >= PARAM_VALUE (PARAM_MAX_DSE_ACTIVE_LOCAL_STORES))
+  if (active_local_stores_len++ >= max_active_local_stores)
{
  active_local_stores_len = 1;
  active_local_stores = NULL;
@@ -2657,6 +2655,12 @@ dse_step1 (void)
   bitmap_set_bit (all_blocks, ENTRY_BLOCK);
   bitmap_set_bit (all_blocks, EXIT_BLOCK);
 
+  /* For -O1 reduce the maximum number of active local stores for RTL DSE
+ since this can consume huge amounts of memory (PR89115).  */
+  int max_active_local_stores = PARAM_VALUE (PARAM_MAX_DSE_ACTIVE_LOCAL_STORES);
+  if (optimize < 2)
+max_active_local_stores /= 10;
+
   FOR_ALL_BB_FN (bb, cfun)
 {
   insn_info_t ptr;
@@ -2684,7 +2688,7 @@ dse_step1 (void)
  FOR_BB_INSNS (bb, insn)
{
  if (INSN_P (insn))
-   scan_insn (bb_info, insn);
+   scan_insn (bb_info, insn, max_active_local_stores);
  cselib_process_insn (insn);
  if (INSN_P (insn))
df_simulate_one_insn_forwards (bb, insn, regs_live);
Index: gcc/loop-invariant.c
===
--- gcc/loop-invariant.c(revision 276961)
+++ gcc/loop-invariant.c(working copy)
@@ -2276,9 +2276,13 @@ move_loop_invariants (void)
   FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
 {
   curr_loop = loop;
-  /* move_single_loop_invariants for very large loops
-is time consuming and might need a lot of memory.  */
-  if (loop->num_nodes <= (unsigned) LOOP_INVARIANT_MAX_BBS_IN_LOOP)
+  /* move_single_loop_invariants for very large loops is time consuming
+and might need a lot of memory.  For -O1 only do loop invariant
+motion for very small loops.  */
+  unsigned max_bbs = LOOP_INVARIANT_MAX_BBS_IN_LOOP;
+  if (optimize < 2)
+   max_bbs /= 10;
+  if (loop->num_nodes <= max_bbs)
move_single_loop_invariants (loop);
 }
 
Index: gcc/opts.c
===
--- gcc/opts.c  (revision 276961)
+++ gcc/opts.c  (working copy)
@@ -671,21 +671,6 @@ default_options_optimization (struct gcc
  opt2 ? 100 : default_param_value (PARAM_MAX_FIELDS_FOR_FIELD_SENSITIVE),
  opts->x_param_values, opts_set->x_param_values);
 
-  /* For -O1 only do loop invariant motion for very small loops.  */
-  maybe_set_param_value
-(PARAM_LOOP_INVARIANT_MAX_BBS_IN_LOOP,
- opt2 ? default_param_value (PARAM_LOOP_INVARIANT_MAX_BBS_IN_LOOP)
- : default_param_value (PARAM_LOOP_INVARIANT_MAX_BBS_IN_LOOP) / 10,
- opts->x_param_values, opts_set->x_param_values);
-
-  /* For -O1 reduce the maximum number of active local stores for RTL DSE
- since this can consume huge amounts of memory (PR89115).  */
-  maybe_set_param_value
-(PARAM_MAX_DSE_ACTIVE_LOCAL_STORES,
- opt2 ? default_param_value (PARAM_MAX_DSE_ACTIVE_LOCAL_STORES)
- : default_param_value (PARAM_MAX_DSE_ACTIVE_LOCAL_STORES) / 10,
- opts->x_param_values, opts_set->x_param_values);
-
   /* At -Ofast,
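
As an illustration of why the scaling now happens at the use site (a
hypothetical example, not part of the patch): "optimize" is a per-function
setting, so with the adjustment done in dse_step1 a function that
individually requests -O2 keeps the full default even in a -O1 TU:

/* dse_step1 sees optimize == 2 for this function and does not divide
   max-dse-active-local-stores by 10, even when the rest of the TU is
   compiled with -O1.  */
void __attribute__ ((optimize ("O2")))
hot_function (void)
{
}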

[PATCH] Fix PR92069

2019-10-14 Thread Richard Biener


This fixes the PR by not setting vect_nested_cycle on the latch def
for nested cycles.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-14  Richard Biener  

PR tree-optimization/92069
* tree-vect-loop.c (vect_analyze_scalar_cycles_1): For nested
cycles do not set vect_nested_cycle on the latch definition.

* gcc.dg/torture/pr92069.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 276950)
+++ gcc/tree-vect-loop.c(working copy)
@@ -584,7 +584,6 @@ vect_analyze_scalar_cycles_1 (loop_vec_i
 "Detected vectorizable nested cycle.\n");
 
   STMT_VINFO_DEF_TYPE (stmt_vinfo) = vect_nested_cycle;
- STMT_VINFO_DEF_TYPE (reduc_stmt_info) = vect_nested_cycle;
 }
   else
 {
Index: gcc/testsuite/gcc.dg/torture/pr92069.c
===
--- gcc/testsuite/gcc.dg/torture/pr92069.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr92069.c  (working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ftree-vectorize" } */
+
+unsigned int a, c, d;
+double b;
+void e()
+{
+  for (; d; d++)
+{
+  double f;
+  a = 2;
+  for (; a; a++)
+   {
+ c = b;
+ b = f;
+ f = c;
+   }
+}
+}


Re: Fix unchecked use of tree_to_uhwi in tree-ssa-strlen.c

2019-10-14 Thread Richard Biener
On Fri, Oct 11, 2019 at 4:47 PM Richard Sandiford
 wrote:
>
> r273783 introduced an unchecked use of tree_to_uhwi.  This is
> tested by the SVE ACLE patches, but could potentially trigger
> in non-SVE cases too.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Richard.

> Richard
>
>
> 2019-10-11  Richard Sandiford  
>
> gcc/
> * tree-ssa-strlen.c (count_nonzero_bytes): Check tree_fits_uhwi_p
> before using tree_to_uhwi.
>
> Index: gcc/tree-ssa-strlen.c
> ===
> --- gcc/tree-ssa-strlen.c   2019-10-11 15:43:51.127514545 +0100
> +++ gcc/tree-ssa-strlen.c   2019-10-11 15:46:11.718524445 +0100
> @@ -4026,10 +4026,10 @@ count_nonzero_bytes (tree exp, unsigned
>
>/* The size of the MEM_REF access determines the number of bytes.  */
>tree type = TREE_TYPE (exp);
> -  if (tree typesize = TYPE_SIZE_UNIT (type))
> -   nbytes = tree_to_uhwi (typesize);
> -  else
> +  tree typesize = TYPE_SIZE_UNIT (type);
> +  if (!typesize || !tree_fits_uhwi_p (typesize))
> return false;
> +  nbytes = tree_to_uhwi (typesize);
>
>/* Handle MEM_REF = SSA_NAME types of assignments.  */
>return count_nonzero_bytes (arg, offset, nbytes, lenrange, nulterm,
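
For context, a sketch of the kind of input that exercises the new check
(an assumed reproducer, not the SVE ACLE test mentioned above): scalable
vector types have a poly_int size such as 16 + 16x, so tree_fits_uhwi_p
is false on their TYPE_SIZE_UNIT and the unguarded tree_to_uhwi could
have ICEd:

#include <arm_sve.h>

void
copy (svint32_t *dst, svint32_t *src)
{
  *dst = *src;  /* MEM_REF whose type size is not a compile-time uhwi.  */
}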


Re: Add a constant_range_value_p function (PR 92033)

2019-10-14 Thread Richard Biener
On Fri, Oct 11, 2019 at 4:42 PM Richard Sandiford
 wrote:
>
> The range-tracking code has a pretty hard-coded assumption that
> is_gimple_min_invariant is equivalent to "INTEGER_CST or invariant
> ADDR_EXPR".  It seems better to add a predicate specifically for
> that rather than contiually fight cases in which it can't handle
> other invariants.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

ICK.  Nobody is going to remember this new restriction and
constant_range_value_p reads like constant_value_range_p ;)

Btw, is_gimple_invariant_address shouldn't have been exported,
its only use could have used is_gimple_min_invariant...

Richard.

> Richard
>
>
> 2019-10-11  Richard Sandiford  
>
> gcc/
> PR tree-optimization/92033
> * tree-vrp.h (constant_range_value_p): Declare.
> * tree-vrp.c (constant_range_value_p): New function.
> (value_range_base::symbolic_p, value_range_base::singleton_p)
> (get_single_symbol, compare_values_warnv, intersect_ranges)
> (value_range_base::normalize_symbolics): Use it instead of
> is_gimple_min_invariant.
> (simplify_stmt_for_jump_threading): Likewise.
> * vr-values.c (symbolic_range_based_on_p, valid_value_p): Likewise.
> (vr_values::op_with_constant_singleton_value_range): Likewise.
> (vr_values::extract_range_from_binary_expr): Likewise.
> (vr_values::extract_range_from_unary_expr): Likewise.
> (vr_values::extract_range_from_cond_expr): Likewise.
> (vr_values::extract_range_from_comparison): Likewise.
> (vr_values::extract_range_from_assignment): Likewise.
> (vr_values::adjust_range_with_scev, vrp_valueize): Likewise.
> (vr_values::vrp_visit_assignment_or_call): Likewise.
> (vr_values::vrp_evaluate_conditional): Likewise.
> (vr_values::simplify_bit_ops_using_ranges): Likewise.
> (test_for_singularity): Likewise.
> (vr_values::simplify_cond_using_ranges_1): Likewise.
>
> Index: gcc/tree-vrp.h
> ===
> --- gcc/tree-vrp.h  2019-10-08 09:23:31.282533990 +0100
> +++ gcc/tree-vrp.h  2019-10-11 15:41:20.380576059 +0100
> @@ -284,6 +284,7 @@ value_range_base::supports_type_p (tree
>return false;
>  }
>
> +extern bool constant_range_value_p (const_tree);
>  extern void register_edge_assert_for (tree, edge, enum tree_code,
>   tree, tree, vec &);
>  extern bool stmt_interesting_for_vrp (gimple *);
> Index: gcc/tree-vrp.c
> ===
> --- gcc/tree-vrp.c  2019-10-08 09:23:31.282533990 +0100
> +++ gcc/tree-vrp.c  2019-10-11 15:41:20.380576059 +0100
> @@ -78,6 +78,18 @@ ranges_from_anti_range (const value_rang
> for still active basic-blocks.  */
>  static sbitmap *live;
>
> +/* Return true if VALUE is considered constant for range tracking.
> +   This is stricter than is_gimple_min_invariant and should be
> +   used instead of it in range-related code.  */
> +
> +bool
> +constant_range_value_p (const_tree value)
> +{
> +  return (TREE_CODE (value) == INTEGER_CST
> + || (TREE_CODE (value) == ADDR_EXPR
> + && is_gimple_invariant_address (value)));
> +}
> +
>  void
>  value_range::set_equiv (bitmap equiv)
>  {
> @@ -273,8 +285,8 @@ value_range_base::symbolic_p () const
>  {
>return (!varying_p ()
>   && !undefined_p ()
> - && (!is_gimple_min_invariant (m_min)
> - || !is_gimple_min_invariant (m_max)));
> + && (!constant_range_value_p (m_min)
> + || !constant_range_value_p (m_max)));
>  }
>
>  /* NOTE: This is not the inverse of symbolic_p because the range
> @@ -388,7 +400,7 @@ value_range_base::singleton_p (tree *res
>  }
>if (m_kind == VR_RANGE
>&& vrp_operand_equal_p (min (), max ())
> -  && is_gimple_min_invariant (min ()))
> +  && constant_range_value_p (min ()))
>  {
>if (result)
>  *result = min ();
> @@ -953,13 +965,13 @@ get_single_symbol (tree t, bool *neg, tr
>|| TREE_CODE (t) == POINTER_PLUS_EXPR
>|| TREE_CODE (t) == MINUS_EXPR)
>  {
> -  if (is_gimple_min_invariant (TREE_OPERAND (t, 0)))
> +  if (constant_range_value_p (TREE_OPERAND (t, 0)))
> {
>   neg_ = (TREE_CODE (t) == MINUS_EXPR);
>   inv_ = TREE_OPERAND (t, 0);
>   t = TREE_OPERAND (t, 1);
> }
> -  else if (is_gimple_min_invariant (TREE_OPERAND (t, 1)))
> +  else if (constant_range_value_p (TREE_OPERAND (t, 1)))
> {
>   neg_ = false;
>   inv_ = TREE_OPERAND (t, 1);
> @@ -1106,8 +1118,8 @@ compare_values_warnv (tree val1, tree va
>   TYPE_SIGN (TREE_TYPE (val1)));
>  }
>
> -  const bool cst1 = is_gimple_min_invariant (val1);
> -  const bool cst2 = is_gimple_min_invariant (val2);
> +  const bool cst1 = constant_range_value_p (val1);
> +  const bool cst2 = constant_range_value_p (val2);

Re: Add expr_callee_abi

2019-10-14 Thread Richard Biener
On Fri, Oct 11, 2019 at 4:39 PM Richard Sandiford
 wrote:
>
> This turned out to be useful for the SVE PCS support, and is a natural
> tree-level analogue of insn_callee_abi.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> 2019-10-11  Richard Sandiford  
>
> gcc/
> * function-abi.h (expr_callee_abi): Declare.
> * function-abi.cc (expr_callee_abi): New function.
>
> Index: gcc/function-abi.h
> ===
> --- gcc/function-abi.h  2019-09-30 17:39:33.514597856 +0100
> +++ gcc/function-abi.h  2019-10-11 15:38:54.141605718 +0100
> @@ -315,5 +315,6 @@ call_clobbered_in_region_p (unsigned int
>  extern const predefined_function_abi &fntype_abi (const_tree);
>  extern function_abi fndecl_abi (const_tree);
>  extern function_abi insn_callee_abi (const rtx_insn *);
> +extern function_abi expr_callee_abi (const_tree);
>
>  #endif
> Index: gcc/function-abi.cc
> ===
> --- gcc/function-abi.cc 2019-09-30 17:39:33.514597856 +0100
> +++ gcc/function-abi.cc 2019-10-11 15:38:54.141605718 +0100
> @@ -229,3 +229,32 @@ insn_callee_abi (const rtx_insn *insn)
>
>return default_function_abi;
>  }
> +
> +/* Return the ABI of the function called by CALL_EXPR EXP.  Return the
> +   default ABI for erroneous calls.  */
> +
> +function_abi
> +expr_callee_abi (const_tree exp)
> +{
> +  gcc_assert (TREE_CODE (exp) == CALL_EXPR);
> +
> +  if (tree fndecl = get_callee_fndecl (exp))
> +return fndecl_abi (fndecl);

Please not.  The ABI in effect on the call is that of
the type of CALL_EXPR_FN, what GIMPLE optimizers
propagated as fndecl here doesn't matter.

> +
> +  tree callee = CALL_EXPR_FN (exp);
> +  if (callee == error_mark_node)
> +return default_function_abi;
> +
> +  tree type = TREE_TYPE (callee);
> +  if (type == error_mark_node)
> +return default_function_abi;
> +
> +  if (POINTER_TYPE_P (type))
> +{
> +  type = TREE_TYPE (type);
> +  if (type == error_mark_node)
> +   return default_function_abi;
> +}

so when it's not a POINTER_TYPE (it always should be!)
then you're handing arbitrary types to fntype_abi.

> +  return fntype_abi (type);
> +}
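
A sketch (untested) of the direction this review points at: take the ABI
from the type of CALL_EXPR_FN instead of a propagated fndecl, and assert
the pointer-ness rather than passing arbitrary types to fntype_abi:

function_abi
expr_callee_abi (const_tree exp)
{
  gcc_assert (TREE_CODE (exp) == CALL_EXPR);

  tree callee = CALL_EXPR_FN (exp);
  if (callee == error_mark_node)
    return default_function_abi;

  tree type = TREE_TYPE (callee);
  if (type == error_mark_node)
    return default_function_abi;

  /* The callee should always be a pointer to function.  */
  gcc_assert (POINTER_TYPE_P (type));
  type = TREE_TYPE (type);
  if (type == error_mark_node)
    return default_function_abi;

  return fntype_abi (type);
}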


Re: [PATCH] teach gengtype about 'mutable'

2019-10-14 Thread Richard Biener
On Sun, Oct 13, 2019 at 4:45 PM Nathan Sidwell  wrote:
>
> In constifying some more of line-map I discovered gengtype didn't know 
> mutable.
> Added thusly.

mutable is bad.  Why do you want to use it?

> nathan
>
> --
> Nathan Sidwell


[PATCH] Fix PR91929

2019-10-14 Thread Richard Biener


The following tries to improve debug info for PRE inserted expressions
by assigning them locations from any of the original expressions
involved.  For simple cases where there exists a 1:1 match this
is "correct" while for more complex cases (like code-hoisting)
this will pick the location of any of the hoisted equivalent
expressions.

This is an improvement over the current state where we pick up unrelated
locations from surrounding stmts, sometimes missing complete
inline chains in backtraces.

Bootstrapped on x86_64-unknown-linux-gnu.

Anybody think this is a bad idea?  The result should be similar
to inserting all participating exprs and then DCEing all but one,
minus the debug stmts that DCE would have generated.

Thanks,
Richard.

2019-10-14  Richard Biener  

PR tree-optimization/91929
* tree-ssa-pre.c (pre_expr_d::loc): New member.
(get_or_alloc_expr_for_name): Initialize it.
(get_or_alloc_expr_for_constant): Likewise.
(phi_translate_1): Copy it.
(create_expression_by_pieces): Use the original location
of the expression for the inserted stmt.
(compute_avail): Record the location of the stmt for the
expressions created.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 276760)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -257,6 +257,7 @@ typedef struct pre_expr_d : nofree_ptr_h
 {
   enum pre_expr_kind kind;
   unsigned int id;
+  location_t loc;
   pre_expr_union u;
 
   /* hash_table support.  */
@@ -421,6 +422,7 @@ get_or_alloc_expr_for_name (tree name)
 
   result = pre_expr_pool.allocate ();
   result->kind = NAME;
+  result->loc = UNKNOWN_LOCATION;
   PRE_EXPR_NAME (result) = name;
   alloc_expression_id (result);
   return result;
@@ -1077,6 +1079,7 @@ get_or_alloc_expr_for_constant (tree con
 
   newexpr = pre_expr_pool.allocate ();
   newexpr->kind = CONSTANT;
+  newexpr->loc = UNKNOWN_LOCATION;
   PRE_EXPR_CONSTANT (newexpr) = constant;
   alloc_expression_id (newexpr);
   value_id = get_or_alloc_constant_value_id (constant);
@@ -1334,6 +1337,7 @@ phi_translate_1 (bitmap_set_t dest,
 {
   basic_block pred = e->src;
   basic_block phiblock = e->dest;
+  location_t expr_loc = expr->loc;
   switch (expr->kind)
 {
 case NARY:
@@ -1436,6 +1440,7 @@ phi_translate_1 (bitmap_set_t dest,
expr = pre_expr_pool.allocate ();
expr->kind = NARY;
expr->id = 0;
+   expr->loc = expr_loc;
if (nary && !nary->predicated_values)
  {
PRE_EXPR_NARY (expr) = nary;
@@ -1587,6 +1592,7 @@ phi_translate_1 (bitmap_set_t dest,
expr = pre_expr_pool.allocate ();
expr->kind = REFERENCE;
expr->id = 0;
+   expr->loc = expr_loc;
 
if (newref)
  new_val_id = newref->value_id;
@@ -2789,6 +2795,7 @@ create_expression_by_pieces (basic_block
  args.quick_push (arg);
}
  gcall *call = gimple_build_call_vec (fn, args);
+ gimple_set_location (call, expr->loc);
  gimple_call_set_fntype (call, currop->type);
  if (sc)
gimple_call_set_chain (call, sc);
@@ -2822,6 +2829,7 @@ create_expression_by_pieces (basic_block
return NULL_TREE;
  name = make_temp_ssa_name (exprtype, NULL, "pretmp");
  newstmt = gimple_build_assign (name, folded);
+ gimple_set_location (newstmt, expr->loc);
  gimple_seq_add_stmt_without_update (&forced_stmts, newstmt);
  gimple_set_vuse (newstmt, BB_LIVE_VOP_ON_EXIT (block));
  folded = name;
@@ -2860,6 +2868,7 @@ create_expression_by_pieces (basic_block
folded = build_constructor (nary->type, elts);
name = make_temp_ssa_name (exprtype, NULL, "pretmp");
newstmt = gimple_build_assign (name, folded);
+   gimple_set_location (newstmt, expr->loc);
gimple_seq_add_stmt_without_update (&forced_stmts, newstmt);
folded = name;
  }
@@ -2868,16 +2877,17 @@ create_expression_by_pieces (basic_block
switch (nary->length)
  {
  case 1:
-   folded = gimple_build (&forced_stmts, nary->opcode, nary->type,
-  genop[0]);
+   folded = gimple_build (&forced_stmts, expr->loc,
+  nary->opcode, nary->type, genop[0]);
break;
  case 2:
-   folded = gimple_build (&forced_stmts, nary->opcode, nary->type,
-  genop[0], genop[1]);
+   folded = gimple_build (&forced_stmts, expr->loc, nary->opcode,
+  nary->type, genop[0], genop[1]);
break;
  case 3:
-   folded = gimple_build (&forced_stmts, nary->opcode, nary->type,
-  genop[0], genop[1], genop[2]);
+   folded = gimple_build (&forced_stmts, expr->loc, nary->opcode,
+  nary->type, genop[0], genop[1], genop[2]);
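
An illustrative source fragment (not the PR testcase): both arms compute
a + b, so PRE/code hoisting inserts one copy before the branch.  With the
patch the inserted statement reuses the location of one of the original
a + b expressions instead of an unrelated nearby one, which keeps inline
chains in backtraces intact:

int
f (int a, int b, int c)
{
  if (c)
    return a + b;
  return (a + b) * 2;
}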

Re: [PATCH] Fix up *COND_EXPR *trap* handling (PR middle-end/92063)

2019-10-12 Thread Richard Biener
On October 12, 2019 10:44:06 AM GMT+02:00, Jakub Jelinek  
wrote:
>Hi!
>
>As mentioned in the PR and on IRC, tree_could_trap_p is described
>as taking GIMPLE expressions only, but in fact we rely on it not
>crashing
>when fed GENERIC, just for GENERIC it will not handle expressions
>recursively and we have generic_expr_could_trap_p for that that calls
>tree_could_trap_p recursively.
>The addition of != COND_EXPR and != VEC_COND_EXPR assert to
>operation_could_trap_p broke this, because if GENERIC with COND_EXPR
>being first argument (condition) of another COND_EXPR is fed to
>tree_could_trap_p, it now ICEs.
>
>The following patch fixes it by:
>1) in tree_could_trap_p return false for {,VEC_}COND_EXPR rather than
>just recursing on the first argument.  For GIMPLE, we shouldn't be
>called
>with {,VEC_}COND_EXPR, because those are ternary rhs and thus 3
>arguments,
>not one.  For GENERIC, we can, but we also need to recurse and the
>recursion
>should discover that the comparison may trap.
>2) in simple_operand_p_2 which calls tree_could_trap_p on GENERIC, this
>is
>changed to generic_expr_could_trap_p so that it actually recurses and
>tests
>subexpressions too
>3) in operation_could_trap_helper_p signals that *COND_EXPR is not
>handled
>and it doesn't care about whether *COND_EXPR is floating point or not.
>This change is for sccvn, which calls operation_could_trap_helper_p
>and then, if not handled, calls tree_could_trap_p on the operands.
>Without the first hunk, *COND_EXPR is handled by the default handling,
>which says that fp_operation can trap (but *COND_EXPR with fp operands
>can't
>in itself) and makes it unhandled otherwise, at which point we call
>tree_could_trap_p on the condition, which is all that matters.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok. 

Thanks, 
Richard. 

>2019-10-12  Jakub Jelinek  
>
>   PR middle-end/92063
>   * tree-eh.c (operation_could_trap_helper_p) 
>   : Return false with *handled = false.
>   (tree_could_trap_p): For {,VEC_}COND_EXPR return false instead of
>   recursing on the first operand.
>   * fold-const.c (simple_operand_p_2): Use generic_expr_could_trap_p
>   instead of tree_could_trap_p.
>   * tree-ssa-sccvn.c (vn_nary_may_trap): Formatting fixes.
>
>   * gcc.c-torture/compile/pr92063.c: New test.
>
>--- gcc/tree-eh.c.jj   2019-10-07 17:30:31.028153702 +0200
>+++ gcc/tree-eh.c  2019-10-11 13:46:12.626811039 +0200
>@@ -2499,6 +2499,14 @@ operation_could_trap_helper_p (enum tree
>   /* Constructing an object cannot trap.  */
>   return false;
> 
>+case COND_EXPR:
>+case VEC_COND_EXPR:
>+  /* Whether *COND_EXPR can trap depends on whether the
>+   first argument can trap, so signal it as not handled.
>+   Whether lhs is floating or not doesn't matter.  */
>+  *handled = false;
>+  return false;
>+
> default:
>   /* Any floating arithmetic may trap.  */
>   if (fp_operation && flag_trapping_math)
>@@ -2614,9 +2622,12 @@ tree_could_trap_p (tree expr)
>   if (!expr)
> return false;
> 
>-  /* For COND_EXPR and VEC_COND_EXPR only the condition may trap.  */
>+  /* In COND_EXPR and VEC_COND_EXPR only the condition may trap, but
>+ they won't appear as operands in GIMPLE form, so this is just for the
>+ GENERIC uses where it needs to recurse on the operands and so
>+ *COND_EXPR itself doesn't trap.  */
>if (TREE_CODE (expr) == COND_EXPR || TREE_CODE (expr) == VEC_COND_EXPR)
>-expr = TREE_OPERAND (expr, 0);
>+return false;
> 
>   code = TREE_CODE (expr);
>   t = TREE_TYPE (expr);
>--- gcc/fold-const.c.jj2019-10-09 10:27:12.578402783 +0200
>+++ gcc/fold-const.c   2019-10-11 12:29:53.426603712 +0200
>@@ -4447,8 +4447,7 @@ simple_operand_p_2 (tree exp)
> {
>   enum tree_code code;
> 
>-  if (TREE_SIDE_EFFECTS (exp)
>-  || tree_could_trap_p (exp))
>+  if (TREE_SIDE_EFFECTS (exp) || generic_expr_could_trap_p (exp))
> return false;
> 
>   while (CONVERT_EXPR_P (exp))
>--- gcc/tree-ssa-sccvn.c.jj2019-09-24 14:39:07.479465423 +0200
>+++ gcc/tree-ssa-sccvn.c   2019-10-11 13:21:47.437326054 +0200
>@@ -5100,18 +5100,15 @@ vn_nary_may_trap (vn_nary_op_t nary)
> honor_nans = flag_trapping_math && !flag_finite_math_only;
> honor_snans = flag_signaling_nans != 0;
>   }
>-  else if (INTEGRAL_TYPE_P (type)
>- && TYPE_OVERFLOW_TRAPS (type))
>+  else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type))
>   honor_trapv = true;
> }
>   if (nary->length >= 2)
> rhs2 = nary->op[1];
>   ret = operation_could_trap_helper_p (nary->opcode, fp_operation,
>- honor_trapv,
>- honor_nans, honor_snans, rhs2,
>- &handled);
>-  if (handled
>-  && ret)
>+ honor_trapv, honor_nans, honor_snans,
>+ rhs2, &handled);
>+  if (handled && ret)
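
An illustrative example (not the committed pr92063.c testcase) of the
GENERIC shape in question: folding can produce a COND_EXPR whose
condition is itself a COND_EXPR containing a potentially trapping
division, and tree_could_trap_p must handle that without ICEing:

int
f (int a, int b, int c)
{
  return ((b != 0 ? a / b : 0) ? 1 : 2) + c;
}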

[PATCH] Re-instantiate more redundant store removal in FRE

2019-10-11 Thread Richard Biener


I've spent some time robustifying the original idea (fixing some
latent wrong-code on the way...).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2019-10-11  Richard Biener  

PR tree-optimization/90883
PR tree-optimization/91091
* tree-ssa-sccvn.c (vn_reference_lookup_3): Use correct
alias-sets both for recording VN table entries and continuing
walking after translating through copies.  Handle same-sized
reads from SSA names by returning the plain SSA name.
(eliminate_dom_walker::eliminate_stmt): Properly handle
non-size precision stores in redundant store elimination.

* gcc.dg/torture/20191011-1.c: New testcase.
* gcc.dg/tree-ssa/ssa-fre-82.c: Likewise.
* gcc.dg/tree-ssa/ssa-fre-83.c: Likewise.
* gcc.dg/tree-ssa/redundant-assign-zero-1.c: Disable FRE.
* gcc.dg/tree-ssa/redundant-assign-zero-2.c: Likewise.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 276858)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -1877,8 +1877,10 @@
   if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file,
 "Successfully combined %u partial definitions\n", ndefs);
+  /* ???  If we track partial defs alias-set we could use that if it
+ is the same for all.  Use zero for now.  */
   return vn_reference_lookup_or_insert_for_pieces
-   (first_vuse, vr->set, vr->type, vr->operands, val);
+   (first_vuse, 0, vr->type, vr->operands, val);
 }
   else
 {
@@ -2333,7 +2335,7 @@
 
   /* If we are looking for redundant stores do not create new hashtable
  entries from aliasing defs with made up alias-sets.  */
-  if (*disambiguate_only > TR_TRANSLATE || !data->tbaa_p)
+  if (*disambiguate_only > TR_TRANSLATE)
 return (void *)-1;
 
   /* If we cannot constrain the size of the reference we cannot
@@ -2449,7 +2451,7 @@
return (void *)-1;
}
  return vn_reference_lookup_or_insert_for_pieces
-  (vuse, vr->set, vr->type, vr->operands, val);
+  (vuse, 0, vr->type, vr->operands, val);
}
   /* For now handle clearing memory with partial defs.  */
   else if (known_eq (ref->size, maxsize)
@@ -2499,7 +2501,7 @@
{
  tree val = build_zero_cst (vr->type);
  return vn_reference_lookup_or_insert_for_pieces
- (vuse, vr->set, vr->type, vr->operands, val);
+ (vuse, get_alias_set (lhs), vr->type, vr->operands, val);
}
  else if (known_eq (ref->size, maxsize)
   && maxsize.is_constant ()
@@ -2614,7 +2616,7 @@
 
  if (val)
return vn_reference_lookup_or_insert_for_pieces
-   (vuse, vr->set, vr->type, vr->operands, val);
+ (vuse, get_alias_set (lhs), vr->type, vr->operands, val);
}
}
  else if (ranges_known_overlap_p (offseti, maxsizei, offset2i, size2i))
@@ -2672,23 +2674,26 @@
 according to endianness.  */
  && (! INTEGRAL_TYPE_P (vr->type)
  || known_eq (ref->size, TYPE_PRECISION (vr->type)))
- && multiple_p (ref->size, BITS_PER_UNIT)
- && (! INTEGRAL_TYPE_P (TREE_TYPE (def_rhs))
- || type_has_mode_precision_p (TREE_TYPE (def_rhs
+ && multiple_p (ref->size, BITS_PER_UNIT))
{
- gimple_match_op op (gimple_match_cond::UNCOND,
- BIT_FIELD_REF, vr->type,
- vn_valueize (def_rhs),
- bitsize_int (ref->size),
- bitsize_int (offset - offset2));
- tree val = vn_nary_build_or_lookup (&op);
- if (val
- && (TREE_CODE (val) != SSA_NAME
- || ! SSA_NAME_OCCURS_IN_ABNORMAL_PHI (val)))
+ if (known_eq (ref->size, size2))
+   return vn_reference_lookup_or_insert_for_pieces
+   (vuse, get_alias_set (lhs), vr->type, vr->operands,
+SSA_VAL (def_rhs));
+ else if (! INTEGRAL_TYPE_P (TREE_TYPE (def_rhs))
+  || type_has_mode_precision_p (TREE_TYPE (def_rhs)))
{
- vn_reference_t res = vn_reference_lookup_or_insert_for_pieces
- (vuse, vr->set, vr->type, vr->operands, val);
- return res;
+ gimple_match_op op (gimple_match_cond::UNCOND,
+ BIT_FIELD_REF, vr->type,
+ vn_valueize (def_rhs),
+ bitsize_int (ref->size),
+ bitsize_int (offset - offset2));
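
A sketch of the simplest redundancy class this re-enables FRE to remove
(an assumed shape, not the committed testcases): a store that writes back
exactly the value known to already be in memory:

void
f (int *p)
{
  int tem = *p;
  *p = tem;  /* redundant store, eliminated again by FRE */
}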

Re: [PATCH] Cleanup parameter of vectorizable_live_operation

2019-10-11 Thread Richard Biener
On Fri, 11 Oct 2019, Bernd Edlinger wrote:

> Hi Richard,
> 
> I became aware of this while looking at the -Wshadow=compatible-local 
> warnings.
> The function vectorizable_live_operation uses a parameter called "vec_stmt"
> that is shadowed by something also called "vec_stmt".  But in this case,
> the vec_stmt is actually only used as a boolean, i.e. whether the pointer is NULL or not.
> 
> This changes the parameter vec_stmt to vec_stmt_p, and propagates that
> change to can_vectorize_live_stmts.
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

OK.  Some more refactoring is in order here but it's an improvement.

Thanks,
Richard.


Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops

2019-10-11 Thread Richard Biener
ert
> stmts on loop preheader edge.
> (vect_do_peeling): Enable skip-vectors when doing loop versioning if
> we decided to vectorize epilogues.  Update epilogues NITERS and
> construct ADVANCE to update epilogues data references where needed.
> (vect_loop_versioning): Moved decision to check_profitability
> based on cost model.
> * tree-vect-stmts.c (ensure_base_align): Only update alignment
> if new alignment is lower.
> * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos member.
> (vect_loop_versioning, vect_do_peeling, vect_get_loop_niters,
> vect_update_inits_of_drs, determine_peel_for_niter,
>     vect_analyze_loop): Add or update declarations.
> * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already
> create loop_vec_info's for epilogues when available.  Otherwise analyse
> epilogue separately.
> 
> 
> 
> Cheers,
> Andre
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

Re: [PATCH 1/2][GCC][RFC][middle-end]: Add SLP pattern matcher.

2019-10-11 Thread Richard Biener
On Tue, 8 Oct 2019, Tamar Christina wrote:

> Hi Richi,
> 
> Thanks for the review, I've added some comments inline.
> 
> The 10/07/2019 12:15, Richard Biener wrote:
> > On Tue, 1 Oct 2019, Tamar Christina wrote:
> > 
> > > Hi All,
> > > 
> > > This adds a framework to allow pattern matchers to be written at based on 
> > > the
> > > SLP tree.  The difference between this one and the one in 
> > > tree-vect-patterns is
> > > that this matcher allows the matching of an arbitrary number of parallel
> > > statements and replacing of an arbitrary number of children or statements.
> > > 
> > > Any relationship created by the SLP pattern matcher will be undone if SLP 
> > > fails.
> > > 
> > > The pattern matcher can also cancel all permutes depending on what the 
> > > pattern
> > > requested it to do.  As soon as one pattern requests the permutes to be
> > > cancelled all permutes are cancelled.
> > > 
> > > Compared to the previous pattern matcher this one will work for an 
> > > arbitrary
> > > group size and will match at any arbitrary node in the SLP tree.  The only
> > > requirement is that the entire node is matched or rejected.
> > > 
> > > vect_build_slp_tree_1 is a bit more lenient in what it accepts as 
> > > "compatible
> > > operations" but the matcher cannot be because in cases where you match 
> > > the order
> > > of the operands may be changed.  So all operands must be changed or none.
> > > 
> > > Furthermore the matcher relies on canonicalization of the operations inside 
> > > the
> > > SLP tree and on the fact that floating math operations are not 
> > > commutative.
> > > This means that matching a pattern does not need to try all alternatives 
> > > or
> > > combinations and that arguments will always be in the same order if it's 
> > > the
> > > same operation.
> > > 
> > > The pattern matcher also ignores uninteresting nodes such as type casts, 
> > > loads
> > > and stores.  Doing so is essential to keep the runtime down.
> > > 
> > > Each matcher is allowed a post condition that can be run to perform any 
> > > changes
> > > to the SLP tree as needed before the patterns are created and may also 
> > > abort
> > > the creation of the patterns.
> > > 
> > > When a pattern is matched it is not immediately created but instead it is
> > > deferred until all statements in the node have been analyzed.  Only if 
> > > all post
> > > conditions are true, and all statements will be replaced will the 
> > > patterns be
> > > created in batch.  This allows us to not have to undo any work if the 
> > > pattern
> > > fails but also makes it so we traverse the tree only once.
> > > 
> > > When a new pattern is created it is a marked as a pattern to the 
> > > statement it is
> > > replacing and be marked as used in the current SLP scope.  If SLP fails 
> > > then
> > > relationship is undone and the relevancy restored.
> > > 
> > > Each pattern matcher can detect any number of pattern it wants.  The only
> > > constraint is that the optabs they produce must all have the same arity.
> > > 
> > > The pattern matcher supports instructions that have no scalar form as they
> > > are added as pattern statements to the stmt.  The BB is left untouched and
> > > so the scalar loop is untouched.
> > > 
> > > Bootstrapped on aarch64-none-linux-gnu and no issues.
> > > No regression testing done yet.
> > 
> > If you split out the introduction of SLP_TREE_REF_COUNT you can commit
> > that right now (sorry for being too lazy there...).
> > 
> 
> I'll split those off :)
> 
> > One overall comment - you do pattern matching after SLP tree
> > creation (good) but still do it before the whole SLP graph is
> > created (bad).  Would it be possible to instead do it as a separate
> > phase in vect_analyze_slp, looping over all instances (the instances
> > present entries into the single unified SLP graph now), avoiding
> > to visit "duplicates"?
> > 
> 
> It should be, the only issue I can see is that build SLP may fail because of
> an unsupported permute, or because it can use load lanes.  If I'm 
> understanding
> it correctly you wouldn't get SLP vectorization in those cases so then the 
> matching
> can't work? So it would limit it a bit more.

T
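
An illustrative example of the kind of parallel statements such an
SLP-tree matcher can fuse where the per-statement matcher in
tree-vect-patterns cannot: the two lanes below together form a single
complex multiply-accumulate:

void
cmla (float *re, float *im, float ar, float ai, float br, float bi)
{
  re[0] += ar * br - ai * bi;  /* lane 0 */
  im[0] += ar * bi + ai * br;  /* lane 1 */
}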

Re: Avoid ggc_alloc and push_cfun during LTO streaming

2019-10-11 Thread Richard Biener
On Fri, Oct 11, 2019 at 11:03 AM Jan Hubicka  wrote:
>
> Hi,
> this patch prevents tree creation during WPA stream out (to avoid
> touching pages and triggering COW).  It fixes the following
>  - gimple streamer produces MEM_REF wrappings for global decls.
>This is to preserve the type of access and is not necessary for
>WPA->LTRANS streaming when decls are no longer going to be merged.
>  - we renumber stmt uids during streaming WPA summaries
>  - loop optimizer is initialized in output_function.
> After testing the patch I noticed that output_function does one extra
> renumbering of stmts. This seems quite broken and I will fix it
> incrementally.
>
> Bootstrapped/regtested x86_64-linux, committed.

Huh.  Why do we stream function bodies at WPA time at all?
We should already have input sections we can copy/remap?

That is, why does gcc_assert (!flag_wpa) in output_function trip?

Richard.

>
> * gimple-streamer-out.c (output_gimple_stmt): Add explicit function
> parameter.
> * lto-streamer-out.c: Include tree-dfa.h.
> (output_cfg): Do not use cfun.
> (lto_prepare_function_for_streaming): New.
> (output_function): Do not push cfun; do not initialize loop optimizer.
> * lto-streamer.h (lto_prepare_function_for_streaming): Declare.
> * passes.c (ipa_write_summaries): Use it.
> (ipa_write_optimization_summaries): Do not modify bodies.
> * tree-dfa.c (renumber_gimple_stmt_uids): Add function parameter.
> * tree-dfa.h (renumber_gimple_stmt_uids): Update prototype.
> * tree-ssa-dse.c (pass_dse::execute): Update use of
> renumber_gimple_stmt_uids.
> * tree-ssa-math-opts.c (pass_optimize_widening_mul::execute): 
> Likewise.
>
> * lto.c (lto_wpa_write_files): Prepare all bodies for streaming.
> Index: gimple-streamer-out.c
> ===
> --- gimple-streamer-out.c   (revision 276850)
> +++ gimple-streamer-out.c   (working copy)
> @@ -57,7 +57,7 @@ output_phi (struct output_block *ob, gph
>  /* Emit statement STMT on the main stream of output block OB.  */
>
>  static void
> -output_gimple_stmt (struct output_block *ob, gimple *stmt)
> +output_gimple_stmt (struct output_block *ob, struct function *fn, gimple *stmt)
>  {
>unsigned i;
>enum gimple_code code;
> @@ -80,7 +80,7 @@ output_gimple_stmt (struct output_block
>  as_a <gassign *> (stmt)),
>1);
>bp_pack_value (&bp, gimple_has_volatile_ops (stmt), 1);
> -  hist = gimple_histogram_value (cfun, stmt);
> +  hist = gimple_histogram_value (fn, stmt);
>bp_pack_value (&bp, hist != NULL, 1);
>bp_pack_var_len_unsigned (&bp, stmt->subcode);
>
> @@ -139,7 +139,7 @@ output_gimple_stmt (struct output_block
>  so that we do not have to deal with type mismatches on
>  merged symbols during IL read in.  The first operand
>  of GIMPLE_DEBUG must be a decl, not MEM_REF, though.  */
> - if (op && (i || !is_gimple_debug (stmt)))
> + if (!flag_wpa && op && (i || !is_gimple_debug (stmt)))
> {
>   basep = &op;
>   if (TREE_CODE (*basep) == ADDR_EXPR)
> @@ -147,7 +147,7 @@ output_gimple_stmt (struct output_block
>   while (handled_component_p (*basep))
> basep = &TREE_OPERAND (*basep, 0);
>   if (VAR_P (*basep)
> - && !auto_var_in_fn_p (*basep, current_function_decl)
> + && !auto_var_in_fn_p (*basep, fn->decl)
>   && !DECL_REGISTER (*basep))
> {
>   bool volatilep = TREE_THIS_VOLATILE (*basep);
> @@ -228,7 +228,7 @@ output_bb (struct output_block *ob, basi
>   print_gimple_stmt (streamer_dump_file, stmt, 0, TDF_SLIM);
> }
>
> - output_gimple_stmt (ob, stmt);
> + output_gimple_stmt (ob, fn, stmt);
>
>   /* Emit the EH region holding STMT.  */
>   region = lookup_stmt_eh_lp_fn (fn, stmt);
> Index: lto/lto.c
> ===
> --- lto/lto.c   (revision 276850)
> +++ lto/lto.c   (working copy)
> @@ -304,6 +304,13 @@ lto_wpa_write_files (void)
>
>timevar_push (TV_WHOPR_WPA_IO);
>
> +  cgraph_node *node;
> +  /* Do body modifications needed for streaming before we fork out
> + worker processes.  */
> +  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
> +if (gimple_has_body_p (node->decl))
> +  lto_prepare_function_for_streaming (node);
> +
>/* Generate a prefix for the LTRANS unit files.  */
>blen = strlen (ltrans_output_list);
>temp_filename = (char *) xmalloc (blen + sizeof ("2147483648.o"));
> Index: lto-streamer-out.c
> ===
> --- lto-streamer-out.c  (revision 276850)
> +++ lto-streamer-out.c  (working copy)
> @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.
>  

Re: [PR47785] COLLECT_AS_OPTIONS

2019-10-11 Thread Richard Biener
On Fri, Oct 11, 2019 at 6:15 AM Kugan Vivekanandarajah
 wrote:
>
> Hi Richard,
> Thanks for the review.
>
> On Wed, 2 Oct 2019 at 20:41, Richard Biener  
> wrote:
> >
> > On Wed, Oct 2, 2019 at 10:39 AM Kugan Vivekanandarajah
> >  wrote:
> > >
> > > Hi,
> > >
> > > As mentioned in the PR, attached patch adds COLLECT_AS_OPTIONS for
> > > passing assembler options specified with -Wa, to the link-time driver.
> > >
> > > The proposed solution only works for uniform -Wa options across all
> > > TUs. As mentioned by Richard Biener, supporting non-uniform -Wa flags
> > > would require either adjusting partitioning according to flags or
> > > emitting multiple object files  from a single LTRANS CU. We could
> > > consider this as a follow up.
> > >
> > > Bootstrapped and regression tests on  arm-linux-gcc. Is this OK for trunk?
> >
> > While it works for your simple cases it is unlikely to work in practice 
> > since
> > your implementation needs the assembler options to be present on the link
> > command line.  I agree that this might be the way for people to go when
> > they face the issue but then it needs to be documented somewhere
> > in the manual.
> >
> > That is, with COLLECT_AS_OPTION (why singular?  I'd expected
> > COLLECT_AS_OPTIONS) available to cc1 we could stream this string
> > to lto_options and re-materialize it at link time (and diagnose mismatches
> > even if we like).
> OK. I will try to implement this. So the idea is if we provide
> -Wa,options as part of the lto compile, this should be available
> during link time. Like in:
>
> arm-linux-gnueabihf-gcc -march=armv7-a -mthumb -O2 -flto
> -Wa,-mimplicit-it=always,-mthumb -c test.c
> arm-linux-gnueabihf-gcc  -flto  test.o
>
> I am not sure where should we stream this. Currently, cl_optimization
> has all the optimization flag provided for compiler and it is
> autogenerated and all the flags are integer values. Do you have any
> preference or example where this should be done.

In lto_write_options, I'd simply append the contents of COLLECT_AS_OPTIONS
(with -Wa, prepended to each of them), then recover them in lto-wrapper
for each TU and pass them down to the LTRANS compiles (if they agree
for all TUs, otherwise I'd warn and drop them).

Richard.
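
A rough sketch (untested; the obstack name and exact placement inside
lto_write_options are assumptions) of the suggestion above:

  /* Stream assembler options gathered from the driver so lto-wrapper
     can recover them per TU at link time.  */
  if (const char *as_opts = getenv ("COLLECT_AS_OPTIONS"))
    {
      char *copy = xstrdup (as_opts);
      for (char *opt = strtok (copy, " "); opt; opt = strtok (NULL, " "))
	{
	  obstack_grow (&temporary_obstack, " -Wa,", strlen (" -Wa,"));
	  obstack_grow (&temporary_obstack, opt, strlen (opt));
	}
      free (copy);
    }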

> Thanks,
> Kugan
>
>
>
> >
> > Richard.
> >
> > > Thanks,
> > > Kugan
> > >
> > >
> > > gcc/ChangeLog:
> > >
> > > 2019-10-02  kugan.vivekanandarajah  
> > >
> > > PR lto/78353
> > > * gcc.c (putenv_COLLECT_AS_OPTION): New to set COLLECT_AS_OPTION in env.
> > > (driver::main): Call putenv_COLLECT_AS_OPTION.
> > > * lto-wrapper.c (run_gcc): use COLLECT_AS_OPTION from env.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 2019-10-02  kugan.vivekanandarajah  
> > >
> > > PR lto/78353
> > > * gcc.target/arm/pr78353-1.c: New test.
> > > * gcc.target/arm/pr78353-2.c: New test.


Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2019-10-11 Thread Richard Biener
On Thu, Oct 10, 2019 at 8:06 PM Richard Sandiford
 wrote:
>
> Wilco Dijkstra  writes:
> > ping
> >
> > Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing
> > bitfields by their declared type, which results in better code generation on 
> > practically
> > any target.
>
> The name is confusing, but the documentation looks accurate to me:
>
> Define this macro as a C expression which is nonzero if accessing less
> than a word of memory (i.e.@: a @code{char} or a @code{short}) is no
> faster than accessing a word of memory, i.e., if such accesses
> require more than one instruction or if there is no difference in cost
> between byte and (aligned) word loads.
>
> When this macro is not defined, the compiler will access a field by
> finding the smallest containing object; when it is defined, a fullword
> load will be used if alignment permits.  Unless bytes accesses are
> faster than word accesses, using word accesses is preferable since it
> may eliminate subsequent memory access if subsequent accesses occur to
> other fields in the same word of the structure, but to different bytes.
>
> > I'm thinking we should completely remove all trace of SLOW_BYTE_ACCESS
> > from GCC as it's confusing and useless.
>
> I disagree.  Some targets can optimise single-bit operations when the
> container is a byte, for example.

There's also less chance of store-to-load forwarding issues when _not_
forcing word accesses.  Because I think that we do not use the
same macro to change code generation for writes (where a larger access
requires a read-modify-write cycle).

But this also means that a decision with no information on context is
going to be flawed.  Still generally I'd do smaller reads if they are not
significantly more expensive on the target but larger writes (with the
same constraint) just for the STLF issue.

Unfortunately the macro docs don't say if it is applicable to reads or writes
or both...

> > OK for commit until we get rid of it?
> >
> > ChangeLog:
> > 2017-11-17  Wilco Dijkstra  
> >
> > gcc/
> > * config/aarch64/aarch64.h (SLOW_BYTE_ACCESS): Set to 1.
> > --
> > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > index 
> > 056110afb228fb919e837c04aa5e5552a4868ec3..d8f4d129a02fb89eb00d256aba8c4764d6026078
> >  100644
> > --- a/gcc/config/aarch64/aarch64.h
> > +++ b/gcc/config/aarch64/aarch64.h
> > @@ -769,14 +769,9 @@ typedef struct
> > if given data not on the nominal alignment.  */
> >  #define STRICT_ALIGNMENTTARGET_STRICT_ALIGN
> >
> > -/* Define this macro to be non-zero if accessing less than a word of
> > -   memory is no faster than accessing a word of memory, i.e., if such
> > -   accesses require more than one instruction or if there is no
> > -   difference in cost.
> > -   Although there's no difference in instruction count or cycles,
> > -   in AArch64 we don't want to expand to a sub-word to a 64-bit access
> > -   if we don't have to, for power-saving reasons.  */
> > -#define SLOW_BYTE_ACCESS   0
> > +/* Contrary to all documentation, this enables wide bitfield accesses,
> > +   which results in better code when accessing multiple bitfields.  */
> > +#define SLOW_BYTE_ACCESS   1
> >
> >  #define NO_FUNCTION_CSE 1
>
> I agree this makes sense from a performance point of view, and I think
> the existing comment is admitting that AArch64 has the properties that
> would normally cause us to set SLOW_BYTE_ACCESS to 1.  But the comment
> is claiming that there's a power-saving benefit to leaving it off.
>
> It seems like a weak argument though.  Bitfields are used when several
> values are packed into the same integer, so there's a high likelihood
> we'll need the whole integer anyway.  Avoiding the redundancies described
> in the documention should if anything help with power usage.
>
> Maybe the main concern was using a 64-bit access when a 32-bit one
> would do, since 32-bit bitfield containers are the most common.  But the:
>
>  && GET_MODE_ALIGNMENT (mode) <= align
>
> condition in get_best_mode should avoid that unless the 64-bit
> access is naturally aligned.  (See the big comment above for the
> pros and cons of this.)
>
> So I think we should change the macro value unless anyone can back up the
> power-saving claim.  Let's wait a week (more) to see if anyone objects.
>
> The comment change isn't OK though.  Please keep the first paragraph
> and just reword the second to say that's why we set the value to 1.
>
> Thanks,
> Richard
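
An illustrative example of what the macro changes (assumed code
generation, not taken from the patch): with SLOW_BYTE_ACCESS set to 1,
both extracts below can be served by a single 32-bit load of the
declared-type container, instead of two separate narrow loads:

struct flags
{
  unsigned int a : 4;
  unsigned int b : 12;
  unsigned int c : 8;
};

unsigned int
sum (struct flags *p)
{
  return p->a + p->c;
}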


Re: Type representation in CTF and DWARF

2019-10-11 Thread Richard Biener
On Fri, Oct 11, 2019 at 1:06 AM Indu Bhagat  wrote:
>
>
>
> On 10/09/2019 12:49 AM, Jakub Jelinek wrote:
> > On Wed, Oct 09, 2019 at 09:41:09AM +0200, Richard Biener wrote:
> >> There's a mechanism to get type (and decl - I suppose CTF also
> >> contains debug info
> >> for function declarations not only its type?) info as part of early
> >> debug generation.
> >> The attached "hack" simply mangles dwarf2out to output this early info as 
> >> the
> >> only debug info (only verified on a small .c file).  We still have things 
> >> like
> >> file, line and column numbers for entities (not sure if CTF has those).
> >>
> >> It should be possible to "hide" the hack behind a -gdwarf-like-ctf or 
> >> similar.
> >> I guess -g0.5 isn't desirable and we've taken both -g0 and -g1 already...
> >> (and -g1 doesn't include types but just decls).
> > Yeah.  And if location info isn't in CTF, you can as well add an early
> > return in add_src_coords_attributes, like it has one for UNKNOWN_LOCATION
> > already.  Or if it is there, but just file/line and not column, you can use
> > -gno-column-info.  As has been mentioned earlier, you can use dwz utility
> > post-linking instead of -fdebug-types-section.
> >
> >   Jakub
>
> Thanks for your pointers.
>
> CTF does not encode location information.  So, I used an early exit in
> add_src_coords_attributes to avoid generation of location info (file, line,
> column). To answer Richard's question, CTF does have type debug info
> for function declarations and the argument types. So I think with these
> changes, both CTF and DWARF generation will emit debug info for the same set 
> of
> types and decl.
>
> Compile with -g -gdwarf-like-ctf and use dwz -o   (using
> dwz compiled from the master branch) on the generated binaries:
>
> (coreutils-0.22)
>          .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> ls       30616           | 1136              | 21098          | 26240               | 0.62
> pwd      10734           | 788               | 10433          | 13929               | 0.83
> groups   10706           | 811               | 10249          | 13378               | 0.80
>
> (emacs-26.3)
>                .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> emacs-26.3.1   674657          | 6402              | 273963         | 273910              | 0.33
>
> I chose to account for 50% of .debug_str because at this point, it will be
> unfair to not account for them. Actually, one could even argue that up to 70%
> of the .debug_str are names of entities. CTF section sizes do include the CTF
> string tables.
>
> Across coreutils, I see a geomean of 0.73 (ratio of
> .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
> "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
> footprint than CTF (with 50% of .debug_str accounted for).

I'm not convinced this "improvement" in size is worth maintaining another
debug-info format much less since it lacks desirable features right now
and thus evaluation is tricky.

At least you can improve dwarf size considerably with a low amount of work.

I suspect another factor where dwarf is bigger compared to CTF is that dwarf
is recording typedef names as well as qualified type variants.  But maybe
CTF just has a more compact representation for the bits it actually implements.

Richard.

> Indu
>


Re: Remove splay tree from ipa-reference.c

2019-10-11 Thread Richard Biener
On Thu, Oct 10, 2019 at 9:25 PM Jan Hubicka  wrote:
>
> Hi,
> ipa-reference uses a splay tree to map DECL_UIDs to trees.  In other
> places we use hash-maps, which are more suitable.

A simple hash-table/hash-set does it as well if the UID is the same
as DECL_UID.
You can use the tree_decl_hash hash-traits for that.  decl lookup
can then be done like

tree_decl_minimal d;
d.uid = uid;
find_slot_with_hash (&d, uid)

or so.  There must be existing users that got converted from old-style
hashes to new templated ones but I can't find them right now :/

Richard.
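
A sketch (untested) of the hash_table variant described above: key the
table on the decls themselves via tree_decl_hash and probe with a dummy
tree_decl_minimal carrying just the uid:

static hash_table<tree_decl_hash> *reference_vars;

static tree
reference_var_by_uid (unsigned int uid)
{
  struct tree_decl_minimal key;
  key.uid = uid;
  tree *slot
    = reference_vars->find_slot_with_hash ((tree) &key, uid, NO_INSERT);
  return slot ? *slot : NULL_TREE;
}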

> Bootstrapped/regtested x86_64-linux, committed.
>
> Honza
> * ipa-reference.c: Do not include splay-tree.h
> (reference_vars_to_consider): Turn to hash map.
> (get_static_name, ipa_init, analyze_function, propagate,
> stream_out_bitmap, ipa_reference_write_optimization_summary,
> ipa_reference_write_optimization_summary): Update.
> Index: ipa-reference.c
> ===
> --- ipa-reference.c (revision 276849)
> +++ ipa-reference.c (working copy)
> @@ -46,7 +46,6 @@ along with GCC; see the file COPYING3.
>  #include "cgraph.h"
>  #include "data-streamer.h"
>  #include "calls.h"
> -#include "splay-tree.h"
>  #include "ipa-utils.h"
>  #include "ipa-reference.h"
>  #include "symbol-summary.h"
> @@ -92,9 +91,11 @@ struct ipa_reference_vars_info_d
>
>  typedef struct ipa_reference_vars_info_d *ipa_reference_vars_info_t;
>
> -/* This splay tree contains all of the static variables that are
> +/* This map contains all of the static variables that are
> being considered by the compilation level alias analysis.  */
> -static splay_tree reference_vars_to_consider;
> +typedef hash_map, tree>
> +reference_vars_to_consider_t;
> +static reference_vars_to_consider_t *reference_vars_to_consider;
>
>  /* Set of all interesting module statics.  A bit is set for every module
> static we are considering.  This is added to the local info when asm
> @@ -272,9 +273,7 @@ is_proper_for_analysis (tree t)
>  static const char *
>  get_static_name (int index)
>  {
> -  splay_tree_node stn =
> -splay_tree_lookup (reference_vars_to_consider, index);
> -  return fndecl_name ((tree)(stn->value));
> +  return fndecl_name (*reference_vars_to_consider->get (index));
>  }
>
>  /* Dump a set of static vars to FILE.  */
> @@ -416,7 +415,7 @@ ipa_init (void)
>ipa_init_p = true;
>
>if (dump_file)
> -reference_vars_to_consider = splay_tree_new (splay_tree_compare_ints, 0, 
> 0);
> +reference_vars_to_consider = new reference_vars_to_consider_t(251);
>
>bitmap_obstack_initialize (_info_obstack);
>bitmap_obstack_initialize (_summary_obstack);
> @@ -476,9 +475,8 @@ analyze_function (struct cgraph_node *fn
>   && bitmap_set_bit (all_module_statics, ipa_reference_var_uid (var)))
> {
>   if (dump_file)
> -   splay_tree_insert (reference_vars_to_consider,
> -  ipa_reference_var_uid (var),
> -  (splay_tree_value)var);
> +   reference_vars_to_consider->put (ipa_reference_var_uid (var),
> +   var);
> }
>switch (ref->use)
> {
> @@ -898,7 +896,7 @@ propagate (void)
>  }
>
>if (dump_file)
> -splay_tree_delete (reference_vars_to_consider);
> +delete reference_vars_to_consider;
>reference_vars_to_consider = NULL;
>return remove_p ? TODO_remove_functions : 0;
>  }
> @@ -968,8 +966,7 @@ stream_out_bitmap (struct lto_simple_out
>  return;
>EXECUTE_IF_AND_IN_BITMAP (bits, ltrans_statics, 0, index, bi)
>  {
> -  tree decl = (tree)splay_tree_lookup (reference_vars_to_consider,
> -  index)->value;
> +  tree decl = *reference_vars_to_consider->get (index);
>lto_output_var_decl_index (ob->decl_state, ob->main_stream, decl);
>  }
>  }
> @@ -987,7 +984,7 @@ ipa_reference_write_optimization_summary
>auto_bitmap ltrans_statics;
>int i;
>
> -  reference_vars_to_consider = splay_tree_new (splay_tree_compare_ints, 0, 
> 0);
> +  reference_vars_to_consider = new reference_vars_to_consider_t (251);
>
>/* See what variables we are interested in.  */
>for (i = 0; i < lto_symtab_encoder_size (encoder); i++)
> @@ -1001,9 +998,8 @@ ipa_reference_write_optimization_summary
> {
>   tree decl = vnode->decl;
>   bitmap_set_bit (ltrans_statics, ipa_reference_var_uid (decl));
> - splay_tree_insert (reference_vars_to_consider,
> -ipa_reference_var_uid (decl),
> -(splay_tree_value)decl);
> + reference_vars_to_consider->put
> +(ipa_reference_var_uid (decl), decl);
>   ltrans_statics_bitcount ++;
> }
>  }
> @@ -1045,7 +1041,7 @@ ipa_reference_write_optimization_summary
>   }
>}
>

Re: Correctly release ipa-reference summaries

2019-10-11 Thread Richard Biener
On Thu, Oct 10, 2019 at 9:21 PM Jan Hubicka  wrote:
>
> Hi,
> this patch fixes the code removing summaries in ipa-reference.  As it fixes
> a memory leak, it may make sense to backport this to release branches.

Please do so.

Richard.

> Honza
>
> * ipa-reference.c (propagate): Fix releasing of IPA summaries.
> Index: ipa-reference.c
> ===
> --- ipa-reference.c (revision 276707)
> +++ ipa-reference.c (working copy)
> @@ -891,15 +889,14 @@ propagate (void)
>
>bitmap_obstack_release (_info_obstack);
>
> -  if (ipa_ref_var_info_summaries == NULL)
> +  if (ipa_ref_var_info_summaries != NULL)
>  {
>delete ipa_ref_var_info_summaries;
>ipa_ref_var_info_summaries = NULL;
>  }
>
> -  ipa_ref_var_info_summaries = NULL;
>if (dump_file)
>  splay_tree_delete (reference_vars_to_consider);
>reference_vars_to_consider = NULL;
>return remove_p ? TODO_remove_functions : 0;
>  }


Re: [PATCH] Fix PR92046

2019-10-11 Thread Richard Biener
On Fri, 11 Oct 2019, Rainer Orth wrote:

> Hi Christophe,
> 
> > On Thu, 10 Oct 2019 at 16:01, Richard Biener  wrote:
> >
> >>
> >> The following fixes a few param adjustments that are made based on
> >> per-function adjustable flags by moving the adjustments to their
> >> users.  Semantics change in some minor ways but that's allowed
> >> for --params.
> >>
> >> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >>
> >> Hi,
> >
> > This generates several regressions.
> > On aarch64:
> > FAIL:  gcc.target/aarch64/vect_fp16_1.c scan-assembler-times
> > fadd\tv[0-9]+.8h 2
> >
> > on arm-linux-gnueabihf:
> > FAIL: gcc.dg/vect/vect-align-1.c -flto -ffat-lto-objects
> >  scan-tree-dump-times vect "vectorized 1 loops" 1
> > FAIL: gcc.dg/vect/vect-align-1.c scan-tree-dump-times vect "vectorized 1
> > loops" 1
> > FAIL: gcc.dg/vect/vect-align-2.c -flto -ffat-lto-objects
> >  scan-tree-dump-times vect "vectorized 1 loops" 1
> > FAIL: gcc.dg/vect/vect-align-2.c scan-tree-dump-times vect "vectorized 1
> > loops" 1
> >
> > on armeb-linux-gnueabihf, many (316) like:
> > FAIL: gcc.dg/vect/O3-vect-pr34223.c scan-tree-dump-times vect "vectorized 1
> > loops" 1
> > FAIL: gcc.dg/vect/fast-math-pr35982.c scan-tree-dump-times vect "vectorized
> > 1 loops" 1
> >
> > still on armeb-linux-gnueabihf:
> > g++.dg/vect/pr33426-ivdep-2.cc  -std=c++14  (test for warnings, line )
> > g++.dg/vect/pr33426-ivdep-2.cc  -std=c++17  (test for warnings, line )
> > g++.dg/vect/pr33426-ivdep-2.cc  -std=c++2a  (test for warnings, line )
> > g++.dg/vect/pr33426-ivdep-2.cc  -std=c++98  (test for warnings, line )
> > g++.dg/vect/pr33426-ivdep-3.cc(test for warnings, line )
> > g++.dg/vect/pr33426-ivdep-4.cc(test for warnings, line )
> > g++.dg/vect/pr33426-ivdep.cc  -std=c++14  (test for warnings, line )
> > g++.dg/vect/pr33426-ivdep.cc  -std=c++17  (test for warnings, line )
> > g++.dg/vect/pr33426-ivdep.cc  -std=c++2a  (test for warnings, line )
> > g++.dg/vect/pr33426-ivdep.cc  -std=c++98  (test for warnings, line )
> >
> > gfortran.dg/vect/no-vfa-pr32377.f90   -O   scan-tree-dump-times vect
> > "vectorized 2 loops" 1
> > gfortran.dg/vect/pr19049.f90   -O   scan-tree-dump-times vect
> > "vectorized 1 loops" 1
> > gfortran.dg/vect/pr32377.f90   -O   scan-tree-dump-times vect
> > "vectorized 2 loops" 1
> > gfortran.dg/vect/vect-2.f90   -O   scan-tree-dump-times vect
> > "vectorized 3 loops" 1
> > gfortran.dg/vect/vect-3.f90   -O   scan-tree-dump-times vect "Alignment
> > of access forced using versioning" 3
> > gfortran.dg/vect/vect-4.f90   -O   scan-tree-dump-times vect "accesses
> > have the same alignment." 1
> > gfortran.dg/vect/vect-4.f90   -O   scan-tree-dump-times vect
> > "vectorized 1 loops" 1
> > gfortran.dg/vect/vect-5.f90   -O   scan-tree-dump-times vect "Alignment
> > of access forced using versioning." 2
> > gfortran.dg/vect/vect-5.f90   -O   scan-tree-dump-times vect
> > "vectorized 1 loops" 1
> 
> that's PR tree-optimization/92066, also seen on sparc, powerpc64, and
> ia64.

Hmm, OK.  There's one obvious bug fixed below, other than that I have
to investigate in more detail.

Committed as obvious.

Richard.

2019-10-11  Richard Biener  

PR tree-optimization/92066
PR tree-optimization/92046
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
Fix bogus cost model check.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 276858)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -2179,7 +2179,7 @@
   do_versioning
 = (optimize_loop_nest_for_speed_p (loop)
&& !loop->inner /* FORNOW */
-   && flag_vect_cost_model > VECT_COST_MODEL_CHEAP);
+   && flag_vect_cost_model != VECT_COST_MODEL_CHEAP);
 
   if (do_versioning)
 {
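
For context, a sketch of why the `>' test misbehaved, assuming the
vect_cost_model enum layout from flag-types.h (the ordering below is
the assumption; only the relationship between UNLIMITED and CHEAP
matters):

enum vect_cost_model {
  VECT_COST_MODEL_UNLIMITED = 0,
  VECT_COST_MODEL_CHEAP = 1,
  VECT_COST_MODEL_DYNAMIC = 2,
  VECT_COST_MODEL_DEFAULT = 3
};

/* flag >  VECT_COST_MODEL_CHEAP is false for UNLIMITED, so versioning
   was wrongly disabled for the unlimited cost model as well.
   flag != VECT_COST_MODEL_CHEAP is true for UNLIMITED, so only the
   cheap cost model opts out, which is what the change intended.  */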


Re: Implement ggc_trim

2019-10-11 Thread Richard Biener
On October 11, 2019 9:03:53 AM GMT+02:00, Jan Hubicka  wrote:
>Hi,
>this patch adds ggc_trim, which releases free pages used by GGC memory
>to the system.  This is useful to reduce the memory footprint of WPA
>streaming: WPA streaming ought to not use any more GGC memory (patches
>in testing for that), and trimming the memory makes it available to the
>fork used by the stream-out machinery.
>
>I collected some stats for cc1 for both GGC and heap (using mallinfo).
>Memory footprints are as follows:
>
>After streaming in global stream: 123MB GGC;  25MB of heap.
>After streaming in callgraph: 228MB GGC;  45MB of heap.
>After streaming in summaries: 373MB GGC; 126MB of heap.
>After symbol merging   : 348MB GGC; 130MB of heap.
>After IPA-ICF  : 501MB GGC; 160MB of heap. (this is all ICF)
>After IPA-CP   : 528MB GGC; 163MB of heap.
>After IPA-SRA  : 532MB GGC; 163MB of heap.
>After Inline   : 644MB GGC; 173MB of heap
>   This is after collecting 118MB of
>   garbage and returning 740k to the
>   system by madvise_dontneed
>After ipa-reference: 644MB GGC; 370MB of heap
>   I checked this all goes into the
>   bitmaps; I have WIP patch for that
>After releasing summariess : 431MB GGC; 383MB of heap
>   Trim releases 43MB by unmap
>   and 321MB by madvise_dontneed
>
>At least I learnt a new fact: ipa-reference consumes 200MB of
>memory, which was not obvious from our detailed mem stats.
>
>I think the lowest-hanging fruit after this patch is to add
>malloc_madvise, which further reduces the footprint, and to fix
>ipa-reference.  Hopefully Martin will do a bit about ipa-icf.
>
>I will dig into what the inliner does, but it produces a lot of clones
>so I think it is mostly clone and summary duplication.  Perhaps we can
>avoid copying some of the summaries for inline clones.
>
>In TOP I see about 900MB instead of 1.4GB before WPA streaming starts
>with both ggc_trim and madvise.
>
>Note that I also tried to hack ggc_free to recognize free pages, but at
>least in a simple implementation it is a loss, since it makes ggc_alloc
>more expensive (it needs to bring pages back and add them into
>freelists), which hurts stream-in performance.
>
>I think sweeping once per WPA is no problem; it is definitely less than
>1% of WPA time.
>
>Bootstrapped/regtested x86_64-linux, OK?

Ok.

Richard. 

>   * ggc-page.c (release_pages): Output statistics when !quiet_flag.
>   (ggc_collect): Dump later to not interfere with release_page dump.
>   (ggc_trim): New function.
>   * ggc-none.c (ggc_trim): New.
>
>   * lto.c (lto_wpa_write_files): Call ggc_trim.
>Index: ggc-page.c
>===
>--- ggc-page.c (revision 276707)
>+++ ggc-page.c (working copy)
>@@ -529,7 +529,6 @@ static void clear_page_group_in_use (pag
> #endif
> static struct page_entry * alloc_page (unsigned);
> static void free_page (struct page_entry *);
>-static void release_pages (void);
> static void clear_marks (void);
> static void sweep_pages (void);
> static void ggc_recalculate_in_use_p (page_entry *);
>@@ -1016,6 +1015,8 @@ free_page (page_entry *entry)
> static void
> release_pages (void)
> {
>+  size_t n1 = 0;
>+  size_t n2 = 0;
> #ifdef USING_MADVISE
>   page_entry *p, *start_p;
>   char *start;
>@@ -1061,6 +1062,7 @@ release_pages (void)
>   else
> G.free_pages = p;
>   G.bytes_mapped -= mapped_len;
>+n1 += len;
> continue;
> }
>   prev = newprev;
>@@ -1092,6 +1094,7 @@ release_pages (void)
>/* Don't count those pages as mapped to not touch the garbage collector
>  unnecessarily. */
>   G.bytes_mapped -= len;
>+  n2 += len;
>   while (start_p != p)
> {
>   start_p->discarded = true;
>@@ -1124,6 +1127,7 @@ release_pages (void)
>   }
> 
>   munmap (start, len);
>+  n1 += len;
>   G.bytes_mapped -= len;
> }
> 
>@@ -1152,10 +1156,20 @@ release_pages (void)
>   *gp = g->next;
>   G.bytes_mapped -= g->alloc_size;
>   free (g->allocation);
>+  n1 += g->alloc_size;
>   }
> else
>   gp = &g->next;
> #endif
>+  if (!quiet_flag && (n1 || n2))
>+{
>+  fprintf (stderr, " {GC");
>+  if (n1)
>+  fprintf (stderr, " released %luk", (unsigned long)(n1 / 1024));
>+  if (n2)
>+  fprintf (stderr, " madv_dontneed %luk", (unsigned long)(n2 / 1024));
>+  fprintf (stderr, "}");
>+}
> }
> 
> /* This table provides a fast way to determine ceil(log_2(size)) for
>@@ -2178,19 +2192,22 @@ ggc_collect (void)
> return;
> 
>   timevar_push (TV_GC);
>-  if (!quiet_flag)
>-fprintf (stderr, " {GC %luk -> ", (unsigned long) G.allocated /
>1024);
>   if (GGC_DEBUG_LEVEL >= 2)
> fprintf (G.debug_file, 
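
As an aside for readers unfamiliar with the mechanism discussed above,
here is a minimal standalone sketch (not the GCC implementation) of the
two release strategies release_pages instruments: n1 accounts bytes
fully unmapped, n2 bytes kept mapped but handed back via madvise.

#include <stddef.h>
#include <sys/mman.h>

/* Release a run of free pages back to the kernel.  munmap drops the
   mapping entirely; MADV_DONTNEED keeps the mapping (so the allocator
   can cheaply reuse the range) but lets the kernel reclaim the backing
   memory, so forked WPA stream-out processes do not COW-fault it.  */
static void
release_range (void *start, size_t len, int unmap)
{
  if (unmap)
    munmap (start, len);                  /* accounted in n1 */
  else
    madvise (start, len, MADV_DONTNEED);  /* accounted in n2 */
}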

[PATCH] Fix PR92046

2019-10-10 Thread Richard Biener


The following fixes a few param adjustments that are made based on
per-function adjustable flags by moving the adjustments to their
users.  Semantics change in some minor ways but that's allowed
for --params.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-10  Richard Biener  

PR middle-end/92046
* opts.c (finish_options): Do not influence global --params
from options that are adjustable per function.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
Apply --param adjustment based on active cost-model.
* tree-ssa-phiopt.c (cond_if_else_store_replacement): Disable
further store-sinking when vectorization or if-conversion
are not enabled.

Index: gcc/opts.c
===
--- gcc/opts.c  (revision 276795)
+++ gcc/opts.c  (working copy)
@@ -1123,24 +1123,6 @@ finish_options (struct gcc_options *opts
   && !opts_set->x_flag_reorder_functions)
 opts->x_flag_reorder_functions = 1;
 
-  /* Tune vectorization related parametees according to cost model.  */
-  if (opts->x_flag_vect_cost_model == VECT_COST_MODEL_CHEAP)
-{
-  maybe_set_param_value (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS,
-6, opts->x_param_values, opts_set->x_param_values);
-  maybe_set_param_value (PARAM_VECT_MAX_VERSION_FOR_ALIGNMENT_CHECKS,
-0, opts->x_param_values, opts_set->x_param_values);
-  maybe_set_param_value (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT,
-0, opts->x_param_values, opts_set->x_param_values);
-}
-
-  /* Set PARAM_MAX_STORES_TO_SINK to 0 if either vectorization or if-conversion
- is disabled.  */
-  if ((!opts->x_flag_tree_loop_vectorize && !opts->x_flag_tree_slp_vectorize)
-   || !opts->x_flag_tree_loop_if_convert)
-maybe_set_param_value (PARAM_MAX_STORES_TO_SINK, 0,
-   opts->x_param_values, opts_set->x_param_values);
-
   /* The -gsplit-dwarf option requires -ggnu-pubnames.  */
   if (opts->x_dwarf_split_debug_info)
 opts->x_debug_generate_pub_sections = 2;
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 276795)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -2075,6 +2075,8 @@ vect_enhance_data_refs_alignment (loop_v
 {
   unsigned max_allowed_peel
 = PARAM_VALUE (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT);
+ if (flag_vect_cost_model == VECT_COST_MODEL_CHEAP)
+   max_allowed_peel = 0;
   if (max_allowed_peel != (unsigned)-1)
 {
   unsigned max_peel = npeel;
@@ -2168,15 +2170,16 @@ vect_enhance_data_refs_alignment (loop_v
   /* (2) Versioning to force alignment.  */
 
   /* Try versioning if:
- 1) optimize loop for speed
+ 1) optimize loop for speed and the cost-model is not cheap
  2) there is at least one unsupported misaligned data ref with an unknown
 misalignment, and
  3) all misaligned data refs with a known misalignment are supported, and
  4) the number of runtime alignment checks is within reason.  */
 
-  do_versioning =
-   optimize_loop_nest_for_speed_p (loop)
-   && (!loop->inner); /* FORNOW */
+  do_versioning
+= (optimize_loop_nest_for_speed_p (loop)
+   && !loop->inner /* FORNOW */
+   && flag_vect_cost_model > VECT_COST_MODEL_CHEAP);
 
   if (do_versioning)
 {
@@ -3641,13 +3644,15 @@ vect_prune_runtime_alias_test_list (loop
 dump_printf_loc (MSG_NOTE, vect_location,
 "improved number of alias checks from %d to %d\n",
 may_alias_ddrs.length (), count);
-  if ((int) count > PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS))
+  unsigned limit = PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS);
+  if (flag_simd_cost_model == VECT_COST_MODEL_CHEAP)
+limit = default_param_value
+ (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) * 6 / 10;
+  if (count > limit)
 return opt_result::failure_at
   (vect_location,
-   "number of versioning for alias "
-   "run-time tests exceeds %d "
-   "(--param vect-max-version-for-alias-checks)\n",
-   PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS));
+   "number of versioning for alias run-time tests exceeds %d "
+   "(--param vect-max-version-for-alias-checks)\n", limit);
 
   return opt_result::success ();
 }
Index: gcc/tree-ssa-phiopt.c
===
--- gcc/tree-ssa-phiopt.c   (revision 276795)
+++ gcc/tree-ssa-phiopt.c   (working copy)
@@ -2467,7 +2467,11 @@ cond_if_else_store_replacement (basic_bl
  

Re: [PR 70929] IPA call type compatibility fix/workaround

2019-10-10 Thread Richard Biener


Now also to the list...

On Thu, 10 Oct 2019, Richard Biener wrote:

> On Thu, 10 Oct 2019, Martin Jambor wrote:
> 
> > Hi,
> > 
> > On Wed, Oct 09 2019, Richard Biener wrote:
> > >>
> > 
> > ...
> > 
> > >> +  /* If we only have the fntype extracted from the call statement, 
> > >> check it
> > >> + against the type of declarations while being pessimistic about
> > >> + promotions.  */
> > >> +  tree p;
> > >> +
> > >> +  if (fndecl)
> > >> +p = TYPE_ARG_TYPES (TREE_TYPE (fndecl));
> > >> +  else
> > >> +p = TYPE_ARG_TYPES (gimple_call_fntype (stmt));
> > >
> > > This else case is bogus - you are then comparing the call arguments
> > > against the call arguments...  Oh, I see it was there before :/
> > 
> > Right, on one hand I noticed it did not make much sense; on the other
> > there were a few cases where it was necessary to make the new predicate
> > as permissive as the old one (not that any of those that I saw looked
> > interesting).
> > 
> > >
> > > So it is that the FEs are expected to promote function arguments
> > > according to the originally called function and that "ABI" is
> > > recorded in gimple_call_fntype.  That means that we can either
> > > look at the actual arguments or at TYPE_ARG_TYPES of
> > > gimple_call_fntype.  But the fndecl ABI we want to verify
> > > against is either its DECL_ARGUMENTS or TYPE_ARG_TYPEs of its type.
> > >
> > > Verifying gimple_call_arg () against gimple_call_fntype ()
> > > is pointless.  What should have been used here is
> > >
> > >else
> > >  p = TYPE_ARG_TYPES (TREE_TYPE (gimple_call_fn (stmt)));
> > >
> > > so, gimple_call_fn is the function called (if no fndecl then
> > > this is a function pointer), thus look at the pointed-to type
> > > and then its arguments.
> > 
> > OK, this is a very nice idea, I have made the change in the patch.
> > 
> > >
> > > Maybe you can test/fix that as independent commit.
> > >
> > > Your second case
> > >
> > >> +  if (fndecl
> > >> +  && TYPE_ARG_TYPES (TREE_TYPE (fndecl))
> > >> +  && TYPE_ARG_TYPES (gimple_call_fntype (stmt)))
> > >
> > > then collapses with this and is also the better fallback IMHO
> > > (so enter this case by using TYPE_ARG_TYPES (TREE_TYPE (gimple_call_fn 
> > > (...))) instead of the fndecl).
> > >
> > 
> > The fndecl here is not the decl extracted from the gimple statement.  It
> > is received as a function parameter and two callers extract it from a
> > call graph edge callee and one - speculation resolution - even from the
> > ipa reference associated with the speculation.  So I don't think this
> > should be replaced.
> 
> Hmm, OK.  But then the code cares for fndecl == NULL which as far as
> I can see should not happen.  And in that case it does something
> completely different, so...
> 
> > So, is the following OK (bootstrapped and tested on x86_64-linux,  no
> > LTO bootstrap this time because of PR 92037)?
> > 
> > Martin
> > 
> > 
> > 2019-10-09  Martin Jambor  
> > 
> > PR lto/70929
> > * cgraph.c (gimple_check_call_args): Also compare types of argument
> > types and call statement fntype types.
> > 
> > testsuite/
> > * g++.dg/lto/pr70929_[01].C: New test.
> > ---
> >  gcc/cgraph.c | 83 ++--
> >  gcc/testsuite/g++.dg/lto/pr70929_0.C | 18 ++
> >  gcc/testsuite/g++.dg/lto/pr70929_1.C | 10 
> >  3 files changed, 95 insertions(+), 16 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/lto/pr70929_0.C
> >  create mode 100644 gcc/testsuite/g++.dg/lto/pr70929_1.C
> > 
> > diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> > index 0c3c6e7cac4..4f7bfa28f37 100644
> > --- a/gcc/cgraph.c
> > +++ b/gcc/cgraph.c
> > @@ -3636,26 +3636,19 @@ cgraph_node::get_fun () const
> >  static bool
> >  gimple_check_call_args (gimple *stmt, tree fndecl, bool args_count_match)
> >  {
> > -  tree parms, p;
> > -  unsigned int i, nargs;
> > -
> >/* Calls to internal functions always match their signature.  */
> >if (gimple_call_internal_p (stmt))
> >  return true;
> >  
> > -  nargs = gimple_call_num_args (stmt);
> > +  unsigned int nargs = gimple_call_num_args (stmt);

[PATCH][LTO] Do not merge anonymous NAMESPACE_DECLs

2019-10-09 Thread Richard Biener


The following does $subject, which likely fixes some debuginfo issues
with anonymous namespaces (but likely artificial and not too
noticeable?!  we'd get bogus contexts for DIEs).
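
To illustrate the entities in question, a hypothetical two-TU sketch
(not taken from a PR):

/* tu1.C */
namespace { int counter; }  /* TU-local: must stay distinct */

/* tu2.C */
namespace { int counter; }  /* same spelling, different entity */

Merging the two NAMESPACE_DECLs at LTO time would give entities from
one TU a DW_TAG_namespace DIE context that belongs to the other.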

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I'm doing an LTO bootstrap right now (but IIRC it failed a few days ago).

Richard.

2019-10-09  Richard Biener  

lto/
* lto-common.c (unify_scc): Do not merge anonymous NAMESPACE_DECLs.

diff --git a/gcc/lto/lto-common.c b/gcc/lto/lto-common.c
index 9a17933d094..e5c15f2b844 100644
--- a/gcc/lto/lto-common.c
+++ b/gcc/lto/lto-common.c
@@ -1646,11 +1646,13 @@ unify_scc (class data_in *data_in, unsigned from,
   tree t = streamer_tree_cache_get_tree (cache, from + i);
   scc->entries[i] = t;
   /* Do not merge SCCs with local entities inside them.  Also do
-not merge TRANSLATION_UNIT_DECLs and anonymous namespace types.  */
+not merge TRANSLATION_UNIT_DECLs and anonymous namespaces
+and types therein.  */
   if (TREE_CODE (t) == TRANSLATION_UNIT_DECL
  || (VAR_OR_FUNCTION_DECL_P (t)
  && !(TREE_PUBLIC (t) || DECL_EXTERNAL (t)))
  || TREE_CODE (t) == LABEL_DECL
+ || (TREE_CODE (t) == NAMESPACE_DECL && !DECL_NAME (t))
  || (TYPE_P (t)
  && type_with_linkage_p (TYPE_MAIN_VARIANT (t))
  && type_in_anonymous_namespace_p (TYPE_MAIN_VARIANT (t


[PATCH] Relax nested cycle vectorization further

2019-10-09 Thread Richard Biener


This simplifies and refactors vect_is_simple_reduction to make sure
to not reject nested cycle vectorization just because there are calls
in the innermost loop.  This lets us vectorize the new testcase
using outer loop vectorization.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2019-10-09  Richard Biener  

* tree-vect-loop.c (vect_is_simple_reduction): Simplify and
allow stmts other than GIMPLE_ASSIGN in nested cycles.

* gcc.dg/vect/vect-outer-call-1.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-call-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-outer-call-1.c
new file mode 100644
index 000..f26d4220532
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-call-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_float } */
+/* { dg-additional-options "-fno-math-errno" } */
+
+void
+foo (float * __restrict x, float *y, int n, int m)
+{
+  if (m > 0)
+for (int i = 0; i < n; ++i)
+  {
+   float tem = x[i], tem1;
+   for (int j = 0; j < m; ++j)
+ {
+   tem += y[j];
+   tem1 = tem;
+   tem = __builtin_sqrtf (tem);
+ }
+   x[i] = tem - tem1;
+  }
+}
+
+/* { dg-final { scan-tree-dump "OUTER LOOP VECTORIZED" "vect" { target { 
vect_call_sqrtf } } } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index a9ea0caf218..14352102f54 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2756,10 +2756,8 @@ vect_is_simple_reduction (loop_vec_info loop_info, 
stmt_vec_info phi_info,
   enum tree_code orig_code, code;
   tree op1, op2, op3 = NULL_TREE, op4 = NULL_TREE;
   tree type;
-  tree name;
   imm_use_iterator imm_iter;
   use_operand_p use_p;
-  bool phi_def;
 
   *double_reduc = false;
   STMT_VINFO_REDUC_TYPE (phi_info) = TREE_CODE_REDUCTION;
@@ -2791,44 +2789,24 @@ vect_is_simple_reduction (loop_vec_info loop_info, 
stmt_vec_info phi_info,
   phi_use_stmt = use_stmt;
 }
 
-  edge latch_e = loop_latch_edge (loop);
-  tree loop_arg = PHI_ARG_DEF_FROM_EDGE (phi, latch_e);
-  if (TREE_CODE (loop_arg) != SSA_NAME)
+  tree latch_def = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
+  if (TREE_CODE (latch_def) != SSA_NAME)
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"reduction: not ssa_name: %T\n", loop_arg);
+"reduction: not ssa_name: %T\n", latch_def);
   return NULL;
 }
 
-  stmt_vec_info def_stmt_info = loop_info->lookup_def (loop_arg);
+  stmt_vec_info def_stmt_info = loop_info->lookup_def (latch_def);
   if (!def_stmt_info
   || !flow_bb_inside_loop_p (loop, gimple_bb (def_stmt_info->stmt)))
 return NULL;
 
-  if (gassign *def_stmt = dyn_cast  (def_stmt_info->stmt))
-{
-  name = gimple_assign_lhs (def_stmt);
-  phi_def = false;
-}
-  else if (gphi *def_stmt = dyn_cast  (def_stmt_info->stmt))
-{
-  name = PHI_RESULT (def_stmt);
-  phi_def = true;
-}
-  else
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"reduction: unhandled reduction operation: %G",
-def_stmt_info->stmt);
-  return NULL;
-}
-
   unsigned nlatch_def_loop_uses = 0;
   auto_vec lcphis;
   bool inner_loop_of_double_reduc = false;
-  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, name)
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, latch_def)
 {
   gimple *use_stmt = USE_STMT (use_p);
   if (is_gimple_debug (use_stmt))
@@ -2846,11 +2824,21 @@ vect_is_simple_reduction (loop_vec_info loop_info, 
stmt_vec_info phi_info,
}
 }
 
+  /* If we are vectorizing an inner reduction we are executing that
+ in the original order only in case we are not dealing with a
+ double reduction.  */
+  if (nested_in_vect_loop && !inner_loop_of_double_reduc)
+{
+  if (dump_enabled_p ())
+   report_vect_op (MSG_NOTE, def_stmt_info->stmt,
+   "detected nested cycle: ");
+  return def_stmt_info;
+}
+
   /* If this isn't a nested cycle or if the nested cycle reduction value
   is used outside of the inner loop we cannot handle uses of the reduction
  value.  */
-  if ((!nested_in_vect_loop || inner_loop_of_double_reduc)
-  && (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1))
+  if (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1)
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -2860,9 +2848,8 @@ vect_is_simple_reduction (loop_vec_info loop_info, 
stmt_vec_info phi_info,
 
   /* If DEF_STMT is a phi node itself, we expect it to have a single argument
  defined in the inner loop.  */
-  if (phi_def)
+  if (gphi *def_stmt = dy

Re: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops

2019-10-09 Thread Richard Biener
On Tue, 8 Oct 2019, Andre Vieira (lists) wrote:

> Hi Richard,
> 
> As I mentioned in the IRC channel, I managed to get "most" of the regression
> testsuite working for x86_64 (avx512) and aarch64.
> 
> On x86_64 I get a failure that I can't explain, was hoping you might be able
> to have a look with me:
> "PASS->FAIL: gcc.target/i386/vect-perm-odd-1.c execution test"
> 
> vect-perm-odd-1.exe segfaults and when I gdb it seems to be the first
> iteration of the main loop.  The tree dumps look alright, but I do notice the
> stack usage seems to change between --param vect-epilogue-nomask={0,1}.

So the issue is that we have

=> 0x00400778 <+72>:vmovdqa64 %zmm1,-0x40(%rax)

but the memory accessed is not appropriately aligned.  The vectorizer
sets DECL_USER_ALIGN on the stack local but somehow later it downs
it to 256:

Old value = 640
New value = 576
ensure_base_align (dr_info=0x526f788) at 
/tmp/trunk/gcc/tree-vect-stmts.c:6294
6294  DECL_USER_ALIGN (base_decl) = 1;
(gdb) l
6289  if (decl_in_symtab_p (base_decl))
6290symtab_node::get (base_decl)->increase_alignment 
(align_base_to);
6291  else
6292{
6293  SET_DECL_ALIGN (base_decl, align_base_to);
6294  DECL_USER_ALIGN (base_decl) = 1;
6295}

this means vectorizing the epilogue modifies the DRs, in particular
the base alignment?

> Am I missing an update to some field that may later affect the amount of
> stack being used?  I am confused; it could very well be that I am missing
> something obvious, as I am not too familiar with x86's ISA.  I will try to
> investigate further.
> 
> This patch needs further clean-up and more comments (or comment updates), but
> I thought I'd share the current state to see if you can help me unblock.
> 
> Cheers,
> Andre
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

Re: [PR 70929] IPA call type compatibility fix/workaround

2019-10-09 Thread Richard Biener
_idx++)
>   {
> +   if (!f
> +   || !s
> +   || TREE_VALUE (f) == error_mark_node
> +   || TREE_VALUE (s) == error_mark_node)
> + return false;
> +   if (TREE_CODE (TREE_VALUE (f)) == VOID_TYPE)
> + {
> +   if (TREE_CODE (TREE_VALUE (s)) != VOID_TYPE
> +   || arg_idx != nargs)
> + return false;
> +   else
> + break;
> + }
> +
> tree arg;
> +
> +   if (arg_idx >= nargs
> +   || (arg = gimple_call_arg (stmt, arg_idx)) == error_mark_node)
> + return false;
> +
> +   if (TREE_CODE (TREE_VALUE (s)) == VOID_TYPE
> +   || (!types_compatible_p (TREE_VALUE (f), TREE_VALUE (s))
> +   && !fold_convertible_p (TREE_VALUE (f), arg)))
> + return false;
> + }
> +
> +  if (args_count_match && arg_idx != nargs)
> + return false;
> +
> +  return true;
> +}
> +
> +  /* If we only have the fntype extracted from the call statement, check it
> + against the type of declarations while being pessimistic about
> + promotions.  */
> +  tree p;
> +
> +  if (fndecl)
> +p = TYPE_ARG_TYPES (TREE_TYPE (fndecl));
> +  else
> +p = TYPE_ARG_TYPES (gimple_call_fntype (stmt));

This else case is bogus - you are then comparing the call arguments
against the call arguments...  Oh, I see it was there before :/

So it is that the FEs are expected to promote function arguments
according to the originally called function and that "ABI" is
recorded in gimple_call_fntype.  That means that we can either
look at the actual arguments or at TYPE_ARG_TYPES of
gimple_call_fntype.  But the fndecl ABI we want to verify
against is either its DECL_ARGUMENTS or TYPE_ARG_TYPEs of its type.

Verifying gimple_call_arg () against gimple_call_fntype ()
is pointless.  What should have been used here is

   else
 p = TYPE_ARG_TYPES (TREE_TYPE (gimple_call_fn (stmt)));

so, gimple_call_fn is the function called (if no fndecl then
this is a function pointer), thus look at the pointed-to type
and then its arguments.
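
A hypothetical illustration (not the PR testcase) of how the call-site
"ABI" and the callee's declared types can differ textually while still
matching under promotion rules:

/* Caller TU: no prototype visible, so default argument promotions
   apply and gimple_call_fntype records an (int) argument.  */
int t ();
int call_it (void) { return t ('a'); }  /* 'a' promoted to int */

/* Callee TU: old-style definition actually taking char.  */
int t (c) char c; { return c; }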

Maybe you can test/fix that as independent commit.

Your second case

> +  if (fndecl
> +  && TYPE_ARG_TYPES (TREE_TYPE (fndecl))
> +  && TYPE_ARG_TYPES (gimple_call_fntype (stmt)))

then collapses with this and is also the better fallback IMHO
(so enter this case by using TYPE_ARG_TYPES (TREE_TYPE (gimple_call_fn 
(...))) instead of the fndecl).

Richard.

> +  if (p)
> +{
> +  for (unsigned i = 0; i < nargs; i++, p = TREE_CHAIN (p))
> + {
> /* If this is a varargs function defer inlining decision
>to callee.  */
> if (!p)
>   break;
> -   arg = gimple_call_arg (stmt, i);
> +   tree arg = gimple_call_arg (stmt, i);
> if (TREE_VALUE (p) == error_mark_node
> || arg == error_mark_node
> || TREE_CODE (TREE_VALUE (p)) == VOID_TYPE
> diff --git a/gcc/testsuite/g++.dg/lto/pr70929_0.C 
> b/gcc/testsuite/g++.dg/lto/pr70929_0.C
> new file mode 100644
> index 000..c96fb1c743a
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/lto/pr70929_0.C
> @@ -0,0 +1,18 @@
> +// { dg-lto-do run }
> +// { dg-lto-options { "-O3 -flto" } }
> +
> +struct s
> +{
> +  int a;
> +  s() {a=1;}
> +  ~s() {}
> +};
> +int t(struct s s);
> +int main()
> +{
> +  s s;
> +  int v=t(s);
> +  if (!__builtin_constant_p (v))
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/g++.dg/lto/pr70929_1.C 
> b/gcc/testsuite/g++.dg/lto/pr70929_1.C
> new file mode 100644
> index 000..b33aa8f35f0
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/lto/pr70929_1.C
> @@ -0,0 +1,10 @@
> +struct s
> +{
> +  int a;
> +  s() {a=1;}
> +  ~s() {}
> +};
> +int t(struct s s)
> +{
> +  return s.a;
> +}
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

Re: Type representation in CTF and DWARF

2019-10-09 Thread Richard Biener
On Wed, Oct 9, 2019 at 7:26 AM Indu Bhagat  wrote:
>
>
>
> On 10/08/2019 08:37 AM, Pedro Alves wrote:
> > On 10/4/19 8:23 PM, Indu Bhagat wrote:
> >> Hello,
> >>
> >> At GNU Tools Cauldron this year, some folks were curious to know more on 
> >> how
> >> the "type representation" in CTF compares vis-a-vis DWARF.
> > I was one of those, and I brought this up to Jose, after your
> > presentation.  Glad to see the follow up!  Thanks much for this.
> >
> > In your Cauldron presentation we saw CTF compared to full blown DWARF
> > as justification for CTF,
>
> Hmm. And I thought I made the effort required to clarify my position that
> comparing full-blown DWARF sizes to type-only CTF section sizes is not
> appropriate, let alone usable as a justification for CTF. My intention in
> showing those numbers was
> only to give some perspective to users curious to know the sizes of CTF debug
> info (as generated by dwarf2ctf) because these sections will ideally be not
> stripped out of shipped binaries.
>
> The justification for CTF is and will remain a compact, faster debug
> format for type information that also supports some online debugging
> use-cases (like backtraces) in the future.
>
> > but I was more interested in a comparison between
> > CTF and a DWARF subset containing exactly only what you have available in
> > CTF.  Because if DWARF with everything-you-don't-need stripped out
> > is in the same ballpark, then I am puzzled on why add/maintain a new
> > Debug format, with all the duplication of effort that entails going
> > forward.
>
> I shared some numbers on this in the previous emails in this thread. I thought
> comparing DWARF's de-duplication-amenable offering (using
> -fdebug-types-section) would be useful in this context.
>
> For binaries compiled with -fdebug-types-section -gdwarf-4, here is some data.
> The CTF sections are generated with dwarf2ctf because CTF link-time de-dup is
> being worked on currently. The end result of link-time CTF de-dup is expected
> to be on par with these .ctf section sizes.
>
> The .ctf section sizes below include the CTF string table (.debug_str is
> excluded from the calculations however):
>
> (coreutils-0.22)
>          .debug_info(D1) | .debug_abbrev(D2) | .debug_str | .debug_types(D3) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D3))
> ls       109806          | 18876             | 22042      | 12413            | 26240               | 0.18
> pwd      27902           | 7914              | 10851      | 5753             | 13929               | 0.33
> groups   26920           | 8173              | 10674      | 5070             | 13378               | 0.33
>
> (emacs-26.3)
>          .debug_info(D1) | .debug_abbrev(D2) | .debug_str | .debug_types(D3) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D3))
> emacs    3755083         | 202926            | 431926     | 143462           | 273910              | 0.06
>
>
> It is not easy to get an estimate of 'DWARF with everything-you-don't-need
> stripped out'. At this time, I don't know of an easy way to make this 
> comparison
> more meaningful.  Any suggestions?

There's a mechanism to get type (and decl - I suppose CTF also contains
debug info for function declarations, not only their types?) info as
part of early debug generation.
The attached "hack" simply mangles dwarf2out to output this early info as the
only debug info (only verified on a small .c file).  We still have things like
file, line and column numbers for entities (not sure if CTF has those).

It should be possible to "hide" the hack behind a -gdwarf-like-ctf or similar.
I guess -g0.5 isn't desirable and we've taken both -g0 and -g1 already...
(and -g1 doesn't include types but just decls).

Richard.

> > Also, it's my understanding that the current CTF format doesn't yet
> > support C++, Vector registers, etc., maybe other things, so if DWARF
> > was sufficient for your needs, then in the long run it sounds like
> > a better option to me, as then you wouldn't have to extend CTF _and_
> > DWARF whenever some feature is needed.
>
> Yes, CTF does not support C++ at this time. To cover all of C (including
> GNU C extensions), we need to add representations for things like vector
> types, non-IEEE floats, etc. (somewhat infrequently occurring constructs).
>
> The issue is not that DWARF cannot represent the required type information.
> DWARF is voluminous and secondly, the current workflow to get to CTF from
> source programs without direct toolchain support is tiresome and lengthy.
>
> For current and future users of CTF, having the support for the format in the
> toolchain is the best way to promote adoption and enhance community 
> experience.
>
> > Maybe it would make sense to work on integrating CTF into the DWARF
> > standard itself, not sure?
> >
> > I was also curious on your plans for adding unwinding support to CTF,
> > while the kernel (the main CTF user, IIUC), already has plans to
> > use its own unwinding format (ORC)?
>
> Kernel's unwinding format (ORC) helps generate backtrace with 

Re: [PATCH] Come up with ipa passes introduction in gccint documentation

2019-10-09 Thread Richard Biener
On Tue, Oct 8, 2019 at 10:06 PM Sandra Loosemore wrote:
>
> On 10/8/19 2:52 AM, luoxhu wrote:
> > Hi,
> >
> > This is the formal documentation patch for IPA passes.  Thanks.
> >
> >
> > None of the IPA passes are documented in passes.texi.  This patch adds
> > a section IPA passes just before GIMPLE passes and RTL passes in
> > Chapter 9 "Passes and Files of the Compiler".  Also, a short description
> > for each IPA pass is provided.
> > gccint.pdf can be produced without errors.
> >
> > ChangeLog:
> >   PR middle-end/26241
> >   * doc/lto.texi (IPA): Reference to the IPA passes.
> >   * doc/passes.texi (Pass manager): Add node IPA passes and
> > description for each IPA pass.
>
> Thanks for submitting this documentation patch!  The content looks
> helpful to me, but I see that it has quite a few grammar bugs (I
> understand how hard English is even for native speakers), plus some
> issues like indexing, cross-referencing, use of jargon without defining
> it, etc.  I think it would be more efficient for me to take over
> polishing the text some more than to mark it up for you to fix, but I'd
> like to give others a few days to comment on technical content first.

I think the contents are OK for a first try, so please go ahead polishing
and commit.  We can then improve the content as followup (I'm making
a note for myself to not forget).

Thanks,
Richard.

> -Sandra


[PATCH] Remove redundant code from reduction vectorization

2019-10-09 Thread Richard Biener


Some cleanup.

Bootstrapped & tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2019-10-08  Richard Biener  

* tree-vectorizer.h (_stmt_vec_info::reduc_vectype_in): New.
(_stmt_vec_info::force_single_cycle): Likewise.
(STMT_VINFO_FORCE_SINGLE_CYCLE): New.
(STMT_VINFO_REDUC_VECTYPE_IN): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Set
STMT_VINFO_REDUC_VECTYPE_IN and STMT_VINFO_FORCE_SINGLE_CYCLE.
(vect_transform_reduction): Use them to remove redundant code.
(vect_transform_cycle_phi): Likewise.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index f63bb855618..a9ea0caf218 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5983,9 +5983,13 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
slp_tree slp_node,
}
}
 }
-
   if (!vectype_in)
 vectype_in = vectype_out;
+  STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
+  /* For the SSA cycle we store on each participating stmt the operand index
+ where the cycle continues.  Store the one relevant for the actual
+ operation in the reduction meta.  */
+  STMT_VINFO_REDUC_IDX (reduc_info) = reduc_index;
 
   /* When vectorizing a reduction chain w/o SLP the reduction PHI is not
  directy used in stmt.  */
@@ -6457,7 +6461,7 @@ vectorizable_reduction (stmt_vec_info stmt_info, slp_tree 
slp_node,
   && (!STMT_VINFO_IN_PATTERN_P (use_stmt_info)
  || !STMT_VINFO_PATTERN_DEF_SEQ (use_stmt_info))
   && vect_stmt_to_vectorize (use_stmt_info) == stmt_info)
-single_defuse_cycle = true;
+STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info) = single_defuse_cycle = true;
 
   if (single_defuse_cycle
   || code == DOT_PROD_EXPR
@@ -6584,17 +6588,11 @@ vect_transform_reduction (stmt_vec_info stmt_info, 
gimple_stmt_iterator *gsi,
  stmt_vec_info *vec_stmt, slp_tree slp_node)
 {
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
-  tree vectype_in = NULL_TREE;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  enum tree_code code;
-  int op_type;
-  bool is_simple_use;
   int i;
   int ncopies;
-  bool single_defuse_cycle = false;
   int j;
-  tree ops[3];
   int vec_num;
 
   stmt_vec_info reduc_info = info_for_reduction (stmt_info);
@@ -6607,30 +6605,20 @@ vect_transform_reduction (stmt_vec_info stmt_info, 
gimple_stmt_iterator *gsi,
 }
 
   gassign *stmt = as_a  (stmt_info->stmt);
+  enum tree_code code = gimple_assign_rhs_code (stmt);
+  int op_type = TREE_CODE_LENGTH (code);
 
   /* Flatten RHS.  */
-  switch (get_gimple_rhs_class (gimple_assign_rhs_code (stmt)))
+  tree ops[3];
+  switch (get_gimple_rhs_class (code))
 {
-case GIMPLE_BINARY_RHS:
-  code = gimple_assign_rhs_code (stmt);
-  op_type = TREE_CODE_LENGTH (code);
-  gcc_assert (op_type == binary_op);
-  ops[0] = gimple_assign_rhs1 (stmt);
-  ops[1] = gimple_assign_rhs2 (stmt);
-  break;
-
 case GIMPLE_TERNARY_RHS:
-  code = gimple_assign_rhs_code (stmt);
-  op_type = TREE_CODE_LENGTH (code);
-  gcc_assert (op_type == ternary_op);
+  ops[2] = gimple_assign_rhs3 (stmt);
+  /* Fall thru.  */
+case GIMPLE_BINARY_RHS:
   ops[0] = gimple_assign_rhs1 (stmt);
   ops[1] = gimple_assign_rhs2 (stmt);
-  ops[2] = gimple_assign_rhs3 (stmt);
   break;
-
-case GIMPLE_UNARY_RHS:
-  return false;
-
 default:
   gcc_unreachable ();
 }
@@ -6641,110 +6629,19 @@ vect_transform_reduction (stmt_vec_info stmt_info, 
gimple_stmt_iterator *gsi,
  reduction variable.  */
   stmt_vec_info phi_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info));
   gphi *reduc_def_phi = as_a  (phi_info->stmt);
-  tree reduc_def = PHI_RESULT (reduc_def_phi);
-  int reduc_index = -1;
-  for (i = 0; i < op_type; i++)
-{
-  /* The condition of COND_EXPR is checked in vectorizable_condition().  */
-  if (i == 0 && code == COND_EXPR)
-continue;
-
-  stmt_vec_info def_stmt_info;
-  enum vect_def_type dt;
-  tree tem;
-  is_simple_use = vect_is_simple_use (ops[i], loop_vinfo, &dt, &tem,
-                                      &def_stmt_info);
-  gcc_assert (is_simple_use);
-  if (dt == vect_reduction_def
- && ops[i] == reduc_def)
-   {
- reduc_index = i;
- continue;
-   }
-  else if (tem)
-   {
- /* To properly compute ncopies we are interested in the widest
-input type in case we're looking at a widening accumulation.  */
- if (!vectype_in
- || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
- < GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (tem)
-   vectype_in = tem;
-   }
-
-  if (dt == vect_nested_cycle
- && ops[i] == reduc_def)
-   {
- reduc_index = i

[PATCH] Refactor vectorizer reduction more

2019-10-08 Thread Richard Biener


This builds upon the previous refactorings and does the following

 1) move the reduction meta to the outermost PHI stmt_info (from the
inner loop computation stmt), the new info_for_reduction gets
you to that.
 2) Merge STMT_VINFO_VEC_REDUCTION_TYPE and STMT_VINFO_REDUC_TYPE
into the latter.
 3) Apart from single-def-use, lane-reducting ops and fold-left
reductions code generation is no longer done by
vect_transform_reduction but by individual vectorizable_*
routines.  In particular this gets rid of calling
vectorizable_condition and vectorizable_shift from
vectorizable_reduction and vect_transform_reduction.
 4) Remove easy to remove restrictions for pure nested cycles.
(there are still some left in vect_is_simple_reduction)

While I developed and tested this in baby-steps those are too ugly
in isolation and thus here's a combined patch for all of the above.

Bootstrap & regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2019-10-08  Richard Biener  

* tree-vectorizer.h (_stmt_vec_info::v_reduc_type): Remove.
(_stmt_vec_info::is_reduc_info): Add.
(STMT_VINFO_VEC_REDUCTION_TYPE): Remove.
(vectorizable_condition): Remove.
(vectorizable_shift): Likewise.
(vectorizable_reduction): Adjust.
(info_for_reduction): New.
* tree-vect-loop.c (vect_force_simple_reduction): Fold into...
(vect_analyze_scalar_cycles_1): ... here.
(vect_analyze_loop_operations): Adjust.
(needs_fold_left_reduction_p): Simplify for single caller.
(vect_is_simple_reduction): Likewise.  Remove stmt restriction
for nested cycles not part of double reductions.
(vect_model_reduction_cost): Pass in the reduction type.
(info_for_reduction): New function.
(vect_create_epilog_for_reduction): Use it, access reduction
meta off the stmt info it returns.  Use STMT_VINFO_REDUC_TYPE
instead of STMT_VINFO_VEC_REDUCTION_TYPE.
(vectorize_fold_left_reduction): Remove pointless assert.
(vectorizable_reduction): Analyze the full reduction when
visiting the outermost PHI.  Simplify.  Use STMT_VINFO_REDUC_TYPE
instead of STMT_VINFO_VEC_REDUCTION_TYPE.  Direct reduction
stmt code-generation to vectorizable_* in most cases.  Verify
code-generation only for cases handled by
vect_transform_reduction.
(vect_transform_reduction): Use info_for_reduction to get at
reduction meta.  Simplify.
(vect_transform_cycle_phi): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-patterns.c (vect_reassociating_reduction_p): Look
at the PHI node for STMT_VINFO_REDUC_TYPE.
* tree-vect-slp.c (vect_schedule_slp_instance): Remove no
longer necessary code.
* tree-vect-stmts.c (vectorizable_shift): Make static again.
(vectorizable_condition): Likewise.  Get at reduction related
info via info_for_reduction.
(vect_analyze_stmt): Adjust.
(vect_transform_stmt): Likewise.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize
STMT_VINFO_REDUC_TYPE instead of STMT_VINFO_VEC_REDUCTION_TYPE.

* gcc.dg/vect/pr65947-1.c: Adjust.
* gcc.dg/vect/pr65947-13.c: Likewise.
* gcc.dg/vect/pr65947-14.c: Likewise.
* gcc.dg/vect/pr65947-4.c: Likewise.
* gcc.dg/vect/pr80631-1.c: Likewise.
* gcc.dg/vect/pr80631-2.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-1.c 
b/gcc/testsuite/gcc.dg/vect/pr65947-1.c
index 879819d576a..b81baed914c 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-1.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-1.c
@@ -42,4 +42,4 @@ main (void)
 
 /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
 /* { dg-final { scan-tree-dump-times "optimizing condition reduction with 
FOLD_EXTRACT_LAST" 4 "vect" { target vect_fold_extract_last } } } */
-/* { dg-final { scan-tree-dump-times "condition expression based on integer 
induction." 4 "vect" { target { ! vect_fold_extract_last } } } } */
+/* { dg-final { scan-tree-dump-times "condition expression based on integer 
induction." 2 "vect" { target { ! vect_fold_extract_last } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-13.c 
b/gcc/testsuite/gcc.dg/vect/pr65947-13.c
index e1d3ff52f5c..4ad5262019a 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-13.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-13.c
@@ -41,5 +41,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
-/* { dg-final { scan-tree-dump-times "condition expression based on integer 
induction." 4 "vect" { xfail vect_fold_extract_last } } } */
+/* { dg-final { scan-tree-dump-times "condition expression based on integer 
induction." 2 "vect" { xfail 

Re: [PATCH 1/2][GCC][RFC][middle-end]: Add SLP pattern matcher.

2019-10-07 Thread Richard Biener
 support check which could
possibly be done upfront in the pattern matcher itself to save
compile-time?  It also seems to be required that patterns match
a single IFN call?

Looking at the complex patterns I am worried and confused about
the transform phase - just quoting/commenting on random pieces:

+  FOR_EACH_VEC_ELT (scalar_stmts, i, scalar_stmt)
+{
+  if (defs.contains (scalar_stmt))
+   {

this is quadratic - vec::contains does a linear search.

 arg_map->put (scalar_stmt, vect_split_slp_tree (node, i));
+ found_p = true;

it seems that you are re-doing the match here, something that
should have been done in the first phase of pattern matching already.
May I suggest to restructure the pattern matchers in a way that you
have a

 class slp_pattern
 {
   virtual bool match () = 0;
   virtual void transform () = 0;
 };

and derive from that so you can have pattern-specific state you
can transfer from match to transform?  Iteration over patterns
then either becomes ad-hoc or you find a way to iterate over
an "array of types" with our C++04 features creating a new
instance when you match to hold this state?
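
Concretely, something along these lines - a sketch only, with
illustrative names and signatures rather than an existing GCC
interface:

/* Each pattern records its match-time state and reuses it when
   transforming, instead of re-doing the match.  */
class slp_pattern
{
public:
  virtual ~slp_pattern () {}
  virtual bool match (slp_tree node) = 0;
  virtual void transform (slp_tree node) = 0;
};

class complex_mul_pattern : public slp_pattern
{
  /* Recorded by match (), consumed by transform ().  */
  vec<slp_tree> m_split_operands;
public:
  virtual bool match (slp_tree node) { /* fill m_split_operands */ return false; }
  virtual void transform (slp_tree node) { /* use m_split_operands */ }
};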

I also wonder what you are doing here - shouldn't it be "simply"
replacing a subgraph of the SLP tree with a single new SLP
node that now has those IFNs as "scalar" stmts (they are not
really scalars anymore because of that arity issue).  This also
means that code-generation might better not go the "traditional"
way but instead we use a new vect_transform_slp_pattern function
which does the natural thing and we'll just have the pattern
IFN recorded directly in the slp_tree structure?  (I realize
the complication of vect_get_slp_defs using the scalar stmts to
identify vectorized operands defs)

That said, I still think the same result can be achieved by
post-vectorizer pattern matching.  I also think that
doing pattern matching after the SLP tree build is backwards.
My vision is that we'd do more general graph matching on
the SSA graph forming the SLP tree rather than the current
ad-hoc matching starting from special "sinks".

Thanks,
Richard.


> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 2019-10-01  Tamar Christina  
> 
>   * tree-vect-loop.c (vect_dissolve_slp_only_patterns): New.
>   (vect_dissolve_slp_only_groups): Use macro.
>   * tree-vect-patterns.c (vect_mark_pattern_stmts): Expose symbol.
>   * tree-vect-slp.c (vect_free_slp_tree): Add control of recursion and how
>   to free.
>   (ssa_name_def_to_slp_tree_map_t): New.
>   (vect_create_new_slp_node, vect_build_slp_tree): Use macro.
>   (vect_create_slp_patt_stmt): New.
>   (vect_match_slp_patterns_2): New.
>   (vect_match_slp_patterns): New.
>   (vect_analyze_slp_instance): Call vect_match_slp_patterns and undo
>   permutes.
>   (vect_detect_hybrid_slp_stmts): Dissolve relationships created for SLP.
>   * tree-vectorizer.h (SLP_TREE_REF_COUNT): New.
>   (vect_mark_pattern_stmts): New.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

Re: [SVE] PR91532

2019-10-07 Thread Richard Biener
On Fri, 4 Oct 2019, Prathamesh Kulkarni wrote:

> On Fri, 4 Oct 2019 at 12:18, Richard Biener  wrote:
> >
> > On Thu, 3 Oct 2019, Prathamesh Kulkarni wrote:
> >
> > > On Wed, 2 Oct 2019 at 12:28, Richard Biener  wrote:
> > > >
> > > > On Wed, 2 Oct 2019, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Wed, 2 Oct 2019 at 01:08, Jeff Law  wrote:
> > > > > >
> > > > > > On 10/1/19 12:40 AM, Richard Biener wrote:
> > > > > > > On Mon, 30 Sep 2019, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > >> On Wed, 25 Sep 2019 at 23:44, Richard Biener  
> > > > > > >> wrote:
> > > > > > >>>
> > > > > > >>> On Wed, 25 Sep 2019, Prathamesh Kulkarni wrote:
> > > > > > >>>
> > > > > > >>>> On Fri, 20 Sep 2019 at 15:20, Jeff Law  wrote:
> > > > > > >>>>>
> > > > > > >>>>> On 9/19/19 10:19 AM, Prathamesh Kulkarni wrote:
> > > > > > >>>>>> Hi,
> > > > > > >>>>>> For PR91532, the dead store is trivially deleted if we place 
> > > > > > >>>>>> dse pass
> > > > > > >>>>>> between ifcvt and vect. Would it be OK to add another 
> > > > > > >>>>>> instance of dse there ?
> > > > > > >>>>>> Or should we add an ad-hoc "basic-block dse" sub-pass to 
> > > > > > >>>>>> ifcvt that
> > > > > > >>>>>> will clean up the dead store ?
> > > > > > >>>>> I'd hesitate to add another DSE pass.  If there's one nearby 
> > > > > > >>>>> could we
> > > > > > >>>>> move the existing pass?
> > > > > > >>>> Well I think the nearest one is just after pass_warn_restrict. 
> > > > > > >>>> Not
> > > > > > >>>> sure if it's a good
> > > > > > >>>> idea to move it up from there ?
> > > > > > >>>
> > > > > > >>> You'll need it inbetween ifcvt and vect so it would be disabled
> > > > > > >>> w/o vectorization, so no, that doesn't work.
> > > > > > >>>
> > > > > > >>> ifcvt already invokes SEME region value-numbering so if we had
> > > > > > >>> MESE region DSE it could use that.  Not sure if you feel like
> > > > > > >>> refactoring DSE to work on regions - it currently uses a DOM
> > > > > > >>> walk which isn't suited for that.
> > > > > > >>>
> > > > > > >>> if-conversion has a little "local" dead predicate compute 
> > > > > > >>> removal
> > > > > > >>> thingy (not that I like that), eventually it can be enhanced to
> > > > > > >>> do the DSE you want?  Eventually it should be moved after the 
> > > > > > >>> local
> > > > > > >>> CSE invocation though.
> > > > > > >> Hi,
> > > > > > >> Thanks for the suggestions.
> > > > > > >> For now, would it be OK to do "dse" on loop header in
> > > > > > >> tree_if_conversion, as in the attached patch ?
> > > > > > >> The patch does local dse in a new function ifcvt_local_dse 
> > > > > > >> instead of
> > > > > > >> ifcvt_local_dce, because it needed to be done after RPO VN which
> > > > > > >> eliminates:
> > > > > > >> Removing dead stmt _ifc__62 = *_55;
> > > > > > >> and makes the following store dead:
> > > > > > >> *_55 = _ifc__61;
> > > > > > >
> > > > > > > I suggested trying to move ifcvt_local_dce after RPO VN, you could
> > > > > > > try that as independent patch (pre-approved).
> > > > > > >
> > > > > > > I don't mind the extra walk though.
> > > > > > >
> > > > > > > What I see as possible issue is that dse_classify_store walks 
> > > > > > > virtual
>

Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching

2019-10-07 Thread Richard Biener
On Tue, Oct 1, 2019 at 1:48 PM Dmitrij Pochepko wrote:
>
> Hi Richard,
>
> I updated patch according to all your comments.
> Also bootstrapped and tested again on x86_64-pc-linux-gnu and 
> aarch64-linux-gnu, which took some time.
>
> attached v3.

OK.

Thanks,
Richard.

> Thanks,
> Dmitrij
>
> On Thu, Sep 26, 2019 at 09:47:04AM +0200, Richard Biener wrote:
> > On Tue, Sep 24, 2019 at 5:29 PM Dmitrij Pochepko wrote:
> > >
> > > Hi,
> > >
> > > can anybody take a look at v2?
> >
> > +(if (tree_to_uhwi (@4) == 1
> > + && tree_to_uhwi (@10) == 2 && tree_to_uhwi (@5) == 4
> >
> > those will still ICE for large __int128_t constants.  Since you do not match
> > any conversions you should probably restrict the precision of 'type' like
> > with
> >(if (TYPE_PRECISION (type) <= 64
> > && tree_to_uhwi (@4) ...
> >
> > likewise tree_to_uhwi will fail for negative constants thus if the
> > pattern assumes
> > unsigned you should verify that as well with && TYPE_UNSIGNED  (type).
> >
> > Your 'argtype' is simply 'type' so you can elide it.
> >
> > +   (switch
> > +   (if (types_match (argtype, long_long_unsigned_type_node))
> > + (convert (BUILT_IN_POPCOUNTLL:integer_type_node @0)))
> > +   (if (types_match (argtype, long_unsigned_type_node))
> > + (convert (BUILT_IN_POPCOUNTL:integer_type_node @0)))
> > +   (if (types_match (argtype, unsigned_type_node))
> > + (convert (BUILT_IN_POPCOUNT:integer_type_node @0)))
> >
> > Please test small types first so we can avoid popcountll when long == long 
> > long
> > or long == int.  I also wonder if we really want to use the builtins and
> > check optab availability or if we nowadays should use
> > direct_internal_fn_supported_p (IFN_POPCOUNT, integer_type_node, type,
> > OPTIMIZE_FOR_BOTH) and
> >
> > (convert (IFN_POPCOUNT:type @0))
> >
> > without the switch?
> >
> > Thanks,
> > Richard.
> >
> > > Thanks,
> > > Dmitrij
> > >
> > > On Mon, Sep 09, 2019 at 10:03:40PM +0300, Dmitrij Pochepko wrote:
> > > > Hi all.
> > > >
> > > > Please take a look at v2 (attached).
> > > > I changed patch according to review comments. The same testing was 
> > > > performed again.
> > > >
> > > > Thanks,
> > > > Dmitrij
> > > >
> > > > On Thu, Sep 05, 2019 at 06:34:49PM +0300, Dmitrij Pochepko wrote:
> > > > > This patch adds matching for Hamming weight (popcount) 
> > > > > implementation. The following sources:
> > > > >
> > > > > int
> > > > > foo64 (unsigned long long a)
> > > > > {
> > > > > unsigned long long b = a;
> > > > > b -= ((b>>1) & 0x5555555555555555ULL);
> > > > > b = ((b>>2) & 0x3333333333333333ULL) + (b & 0x3333333333333333ULL);
> > > > > b = ((b>>4) + b) & 0x0F0F0F0F0F0F0F0FULL;
> > > > > b *= 0x0101010101010101ULL;
> > > > > return (int)(b >> 56);
> > > > > }
> > > > >
> > > > > and
> > > > >
> > > > > int
> > > > > foo32 (unsigned int a)
> > > > > {
> > > > > unsigned long b = a;
> > > > > b -= ((b>>1) & 0x55555555UL);
> > > > > b = ((b>>2) & 0x33333333UL) + (b & 0x33333333UL);
> > > > > b = ((b>>4) + b) & 0x0F0F0F0FUL;
> > > > > b *= 0x01010101UL;
> > > > > return (int)(b >> 24);
> > > > > }
> > > > >
> > > > > and equivalents are now recognized as popcount for platforms with hw 
> > > > > popcount support. Bootstrapped and tested on x86_64-pc-linux-gnu and 
> > > > > aarch64-linux-gnu systems with no regressions.
> > > > >
> > > > > (I have no write access to repo)
> > > > >
> > > > > Thanks,
> > > > > Dmitrij
> > > > >
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR tree-optimization/90836
> > > > >
> > > > > * gcc/match.pd (popcount): New pattern.
> > > > >
> > > 

Re: [PATCH v4 1/7] Allow COND_EXPR and VEC_COND_EXPR condtions to trap

2019-10-07 Thread Richard Biener
On Tue, Oct 1, 2019 at 3:27 PM Ilya Leoshkevich  wrote:
>
> Right now the gimplifier does not allow a VEC_COND_EXPR's condition to trap
> and introduces a temporary if this could happen, for example, generating
>
>   _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
>   _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>
> from GENERIC
>
>   VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
>   { -1, -1, -1, -1 } ,
>   { 0, 0, 0, 0 } >
>
> This is not necessary and makes the resulting GIMPLE harder to analyze.
> Change the gimplifier so as to allow COND_EXPR and VEC_COND_EXPR
> conditions to trap.
>
> This patch takes special care to avoid introducing trapping comparisons
> in GIMPLE_COND.  They are not allowed, because they would require 3
> outgoing edges (then, else and EH), which is awkward to say the least.
> Therefore, computations of such conditions should live in their own basic
> blocks.

OK.  This can go in independently of the other changes in this series.

Thanks and sorry for the delay.
Richard.

> gcc/ChangeLog:
>
> 2019-09-03  Ilya Leoshkevich  
>
> PR target/77918
> * gimple-expr.c (gimple_cond_get_ops_from_tree): Assert that the
> caller passes a non-trapping condition.
> (is_gimple_condexpr): Allow trapping conditions.
> (is_gimple_condexpr_1): New helper function.
> (is_gimple_condexpr_for_cond): New function, acts like old
> is_gimple_condexpr.
> * gimple-expr.h (is_gimple_condexpr_for_cond): New function.
> * gimple.c (gimple_could_trap_p_1): Handle COND_EXPR and
> VEC_COND_EXPR. Fix an issue with statements like i = (fp < 1.).
> * gimplify.c (gimplify_cond_expr): Use
> is_gimple_condexpr_for_cond.
> (gimplify_expr): Allow is_gimple_condexpr_for_cond.
> * tree-eh.c (operation_could_trap_p): Assert on COND_EXPR and
> VEC_COND_EXPR.
> (tree_could_trap_p): Handle COND_EXPR and VEC_COND_EXPR.
> * tree-ssa-forwprop.c (forward_propagate_into_gimple_cond): Use
> is_gimple_condexpr_for_cond, remove pointless tmp check
> (forward_propagate_into_cond): Remove pointless tmp check.
> ---
>  gcc/gimple-expr.c   | 25 +
>  gcc/gimple-expr.h   |  1 +
>  gcc/gimple.c| 14 +-
>  gcc/gimplify.c  |  5 +++--
>  gcc/tree-eh.c   |  8 
>  gcc/tree-ssa-forwprop.c |  7 ---
>  6 files changed, 50 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
> index 4082828e198..1738af186d7 100644
> --- a/gcc/gimple-expr.c
> +++ b/gcc/gimple-expr.c
> @@ -574,6 +574,7 @@ gimple_cond_get_ops_from_tree (tree cond, enum tree_code 
> *code_p,
>   || TREE_CODE (cond) == TRUTH_NOT_EXPR
>   || is_gimple_min_invariant (cond)
>   || SSA_VAR_P (cond));
> +  gcc_checking_assert (!tree_could_throw_p (cond));
>
>extract_ops_from_tree (cond, code_p, lhs_p, rhs_p);
>
> @@ -605,17 +606,33 @@ is_gimple_lvalue (tree t)
>   || TREE_CODE (t) == BIT_FIELD_REF);
>  }
>
> -/*  Return true if T is a GIMPLE condition.  */
> +/* Helper for is_gimple_condexpr and is_gimple_condexpr_for_cond.  */
>
> -bool
> -is_gimple_condexpr (tree t)
> +static bool
> +is_gimple_condexpr_1 (tree t, bool allow_traps)
>  {
>return (is_gimple_val (t) || (COMPARISON_CLASS_P (t)
> -   && !tree_could_throw_p (t)
> +   && (allow_traps || !tree_could_throw_p (t))
> && is_gimple_val (TREE_OPERAND (t, 0))
> && is_gimple_val (TREE_OPERAND (t, 1;
>  }
>
> +/* Return true if T is a GIMPLE condition.  */
> +
> +bool
> +is_gimple_condexpr (tree t)
> +{
> +  return is_gimple_condexpr_1 (t, true);
> +}
> +
> +/* Like is_gimple_condexpr, but does not allow T to trap.  */
> +
> +bool
> +is_gimple_condexpr_for_cond (tree t)
> +{
> +  return is_gimple_condexpr_1 (t, false);
> +}
> +
>  /* Return true if T is a gimple address.  */
>
>  bool
> diff --git a/gcc/gimple-expr.h b/gcc/gimple-expr.h
> index 1ad1432bd17..0925aeb0f57 100644
> --- a/gcc/gimple-expr.h
> +++ b/gcc/gimple-expr.h
> @@ -41,6 +41,7 @@ extern void gimple_cond_get_ops_from_tree (tree, enum 
> tree_code *, tree *,
>tree *);
>  extern bool is_gimple_lvalue (tree);
>  extern bool is_gimple_condexpr (tree);
> +extern bool is_gimple_condexpr_for_cond (tree);
>  extern bool is_gimple_address (const_tree);
>  extern bool is_gimple_invariant_address (const_tree);
>  extern bool is_gimple_ip_invariant_address (const_tree);
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index 8e828a5f169..a874c29454c 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -2149,10 +2149,22 @@ gimple_could_trap_p_1 (gimple *s, bool include_mem, 
> bool include_stores)
>return false;
>
>  case GIMPLE_ASSIGN:
> -  t 

[PATCH] Fix PR91975, tame PRE some more

2019-10-07 Thread Richard Biener


The following tries to address the issue that PRE is quite happy
to introduce new IVs in loops just because it can compute some
constant value on the loop entry edge.  In principle there's
already code that should work against that but it simply searches
for a optimize_edge_for_speed () edge.  That still considers
the loop entry edge to be worth optimizing because it ends
up as maybe_hot_edge_p (e) for -O2 which compares the edge count
against the entry block count.  For PRE we want something more
local (comparing to the destination block count).

Now for the simple testcases this shouldn't make a difference
but hot/cold uses PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION) which
isn't the same as profile_probabilities likely or very likely...

Still one key of the patch is that we compare the sum of the
edge counts where the value is available (and thus the redundancy
elimination happens) with the count we have to insert rather
than looking for a single optimize_edge_for_speed_p edge.

For that I've used

if (avail_count < block->count.apply_probability
(profile_probability::unlikely ()))

so we block insertion if the redundancies would overall be "unlikely".

I'm also not sure why maybe_hot_count_p uses HOT_BB_FREQUENCY_FRACTION
while there exists HOT_BB_COUNT_FRACTION (with a ten-fold larger
default value), which seems a better match for scaling a profile_count?

Honza?

Bootstrap & regtest running on x86-64-unknown-linux-gnu.

Does the above predicate look sane, or am I on the wrong track with
using the destination block count here?  (I realize even the "locally
cold" entries into the block might be quite hot globally.)

For a 1:1 translation of the existing code to something using the
original predicate but summing over edges I could use
!maybe_hot_count_p (cfun, avail_count)?  But then we're back to
PRE doing the unwanted insertions.  Changing maybe_hot_count_p
to use HOT_BB_COUNT_FRACTION doesn't make any difference there
(obviously).

Thanks,
Richard.

2019-10-06  Richard Biener  

PR tree-optimization/91975
* tree-ssa-pre.c (do_pre_regular_insertion): Adjust
profitability check to use the sum of all edge counts the
value is available on and check against unlikely execution
of the block.

* gcc.dg/tree-ssa/ldist-39.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
new file mode 100644
index 000..a63548979ea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ldist-details" } */
+
+#define T int
+
+const T a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
+T b[sizeof a / sizeof *a];
+
+void f0 (void)
+{
+  const T *s = a;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+d[i] = s[i];
+}
+
+void g0 (void)
+{
+  const T *s = a;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+*d++ = *s++;
+}
+
+extern const T c[sizeof a / sizeof *a];
+
+void f1 (void)
+{
+  const T *s = c;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+d[i] = s[i];
+}
+
+void g1 (void)
+{
+  const T *s = c;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+*d++ = *s++;
+}
+
+/* { dg-final { scan-tree-dump-times "generated memcpy" 4 "ldist" } } */
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index c618601a184..af49ba388c1 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3195,7 +3195,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
  pre_expr eprime = NULL;
  edge_iterator ei;
  pre_expr edoubleprime = NULL;
- bool do_insertion = false;
+ profile_count avail_count = profile_count::zero ();
 
  val = get_expr_value_id (expr);
  if (bitmap_set_contains_value (PHI_GEN (block), val))
@@ -3250,10 +3250,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
{
  avail[pred->dest_idx] = edoubleprime;
  by_some = true;
- /* We want to perform insertions to remove a redundancy on
-a path in the CFG we want to optimize for speed.  */
- if (optimize_edge_for_speed_p (pred))
-   do_insertion = true;
+ avail_count += pred->count ();
  if (first_s == NULL)
first_s = edoubleprime;
  else if (!pre_expr_d::equal (first_s, edoubleprime))
@@ -3266,7 +3263,11 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
 partially redundant.  */
  if (!cant_insert && !all_same && by_some)
{
- if (!do_insertion)
+ /* We want to perform insertions to remove a redundancy on
+a path in the CFG that is som

[PATCH] Improve unrolling heuristics, PR91975

2019-10-07 Thread Richard Biener


Currently there's a surprising difference in unrolling size estimation
depending on how exactly you formulate your IV expressions.  The following
patch makes it less dependent on this, behaving like the more optimistic
treatment ( + 1 being constant).  In the end it's still a heuristic
and in some sense the estimation of the original size now looks odd
(costing of a[i] vs. *(a + i * 4)).  I still think it's an improvement.
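
To make the difference concrete, here is a hypothetical sketch in the
spirit of the f0/g0 pair from ldist-39.c above: both loops copy the same
data, but the first indexes arrays directly while the second uses pointer
IVs, and before this patch the two could receive different size estimates:

  int a[8], b[8];

  void copy_idx (void)
  {
    for (int i = 0; i < 8; i++)
      b[i] = a[i];           /* access is a[i], constant after peeling */
  }

  void copy_ptr (void)
  {
    const int *s = a;
    int *d = b;
    for (int i = 0; i < 8; i++)
      *d++ = *s++;           /* accesses via incremented pointer IVs */
  }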

For testcase adjustments I generally tried to disable unrolling if
doing so would defeat the testcase's purpose (validating correctness
of vectorization for example).  I've verified that the unrolling we
now do results in no worse code for the cases (even if I ended up
disabling that unrolling).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

2019-10-07  Richard Biener  

PR tree-optimization/91975
* tree-ssa-loop-ivcanon.c (constant_after_peeling): Consistently
handle invariants.

* g++.dg/tree-ssa/ivopts-3.C: Adjust.
* gcc.dg/vect/vect-profile-1.c: Disable cunrolli.
* gcc.dg/vect/vect-double-reduc-6.c: Disable unrolling of
the innermost loop.
* gcc.dg/vect/vect-93.c: Likewise.
* gcc.dg/vect/vect-105.c: Likewise.
* gcc.dg/vect/pr79920.c: Likewise.
* gcc.dg/vect/no-vfa-vect-102.c: Likewise.
* gcc.dg/vect/no-vfa-vect-101.c: Likewise.
* gcc.dg/vect/pr83202-1.c: Operate on a larger array.
* gfortran.dg/vect/vect-8.f90: Likewise.
* gcc.dg/tree-ssa/cunroll-2.c: Scan early unrolling dump instead
of late one.

diff --git a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
index 07ff1b770f8..6760a5b1851 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
@@ -70,6 +70,8 @@ int main ( int , char** ) {
 return 0;
 }
 
-// Verify that on x86_64 and i?86 we use a single IV for the innermost loop
+// Verify that on x86_64 and i?86 we unroll the innermost loop and
+// use three IVs for the then-innermost loop
 
-// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ 
\]*:64, 3 avg niters, 1 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
+// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ 
\]*:63, 127 avg niters, 3 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
+// { dg-final { scan-tree-dump-not "Selected IV set for loop \[0-9\]* at \[^ 
\]*:64" "ivopts" { target x86_64-*-* i?86-*-* } } }
diff --git a/gcc/testsuite/gcc.c-torture/execute/loop-3.c b/gcc/testsuite/gcc.c-torture/execute/loop-3.c
index e314a01b1f1..33eb18826fd 100644
--- a/gcc/testsuite/gcc.c-torture/execute/loop-3.c
+++ b/gcc/testsuite/gcc.c-torture/execute/loop-3.c
@@ -13,7 +13,7 @@ f (m)
   i = m;
   do
 {
-  g (i * INT_MAX / 2);
+  g ((int)((unsigned)i * INT_MAX) / 2);
 }
   while (--i > 0);
 }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-2.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-2.c
index b1d1c7d3d85..ae3fec99749 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+/* { dg-options "-O3 -fdump-tree-cunrolli-details" } */
 int a[2];
 int test2 (void);
 void
@@ -14,4 +14,4 @@ test(int c)
 }
 }
 /* We are not able to get rid of the final conditional because the loop has two exits.  */
-/* { dg-final { scan-tree-dump "loop with 1 iterations completely unrolled" "cunroll"} } */
+/* { dg-final { scan-tree-dump "loop with 2 iterations completely unrolled" "cunrolli"} } */
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
index 91eb28218bd..ce934279ddf 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
@@ -22,6 +22,7 @@ int main1 (int x, int y) {
   p = (struct extraction *) malloc (sizeof (struct extraction));
 
   /* Not vectorizable: different unknown offset.  */
+#pragma GCC unroll 0
   for (i = 0; i < N; i++)
 {
   *((int *)p + x + i) = a[i];
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
index 51f62788dbf..d9e0529e73f 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
@@ -28,6 +28,7 @@ int main1 (int x, int y) {
 }
 
   /* Not vectorizable: distance 1.  */
+#pragma GCC unroll 0
   for (i = 0; i < N - 1; i++)
 {
*((int *)p + x + i + 1) = *((int *)p + x + i);
diff --git a/gcc/testsuite/gcc.dg/vect/pr79920.c b/gcc/testsuite/gcc.dg/vect/pr79920.c
index 276a2806f0c..38e0fef779a 100644
--- a/gcc/testsuite/gcc.dg/vect/pr79920.c
+++ b/gcc/testsuite/gcc.dg/vect/pr79920.c
@@ -14,6 +14,7 @@ compute_integral (double w_1[18])

Re: Type representation in CTF and DWARF

2019-10-07 Thread Richard Biener
On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat  wrote:
>
> Hello,
>
> At GNU Tools Cauldron this year, some folks were curious to know more on how
> the "type representation" in CTF compares vis-a-vis DWARF.
>
> I use the small testcase below to gather some numbers to help drive this 
> discussion.
>
> [ibhagat@ibhagatpc ctf-size]$ cat ctf_sizeme.c
> #define MAX_NUM_MSGS 5
>
> enum node_type
> {
>INIT_TYPE = 0,
>COMM_TYPE = 1,
>COMP_TYPE = 2,
>MSG_TYPE = 3,
>RELEASE_TYPE = 4,
>MAX_NODE_TYPE
> };
>
> typedef struct node_payload
> {
>unsigned short npay_offset;
>const char * npay_msg;
>unsigned int npay_nelems;
>struct node_payload * npay_next;
> } node_payload;
>
> typedef struct node_property
> {
>int timestamp;
>char category;
>long initvalue;
> } node_property_t;
>
> typedef struct node
> {
>enum node_type ntype;
>int nmask:5;
>union
>  {
>struct node_payload * npayload;
>void * nbase;
>  } nu;
>  unsigned int msgs[MAX_NUM_MSGS];
>  node_property_t node_prop;
> } Node;
>
> Node s;
>
> int main (void)
> {
>return 0;
> }
>
> Note that in this case, there is nothing that the de-duplicator has to do
> (neither for the TYPE comdat sections nor CTF types). I chose such an example
> because de-duplication of types is orthogonal to the concept of representation
> of types.
>
> So, for the small C testcase with a union, enum, array, struct, typedef etc, I
> see following sizes :
>
> Compile with -fdebug-types-section -gdwarf-4 (size -A  excerpt):
>  .debug_aranges 48 0
>  .debug_info   150 0
>  .debug_abbrev 314 0
>  .debug_line73 0
>  .debug_str455 0
>  .debug_ranges  32 0
>  .debug_types  578 0
>
> Compile with -fdebug-types-section -gdwarf-5 (size -A  excerpt):
>  .debug_aranges  48 0
>  .debug_info732 0
>  .debug_abbrev  309 0
>  .debug_line 73 0
>  .debug_str 455 0
>  .debug_rnglists 23 0
>
> Compile with -gt (size -A  excerpt):
>  .ctf  966 0
>  CTF strings sub-section size (ctf_strlen in disassembly) = 374
>  == > CTF section just for representing types = 966 - 374 = 592 bytes
>  (The 592 bytes include the CTF header and other indexes etc.)
>
> So, following points are what I would highlight. Hopefully this helps you see
> that CTF has promise for the task of representing type debug info.
>
> 1. Type Information layout in sections:
> A .ctf section is self-sufficient to represent types in a program. All
> references within the CTF section are via either indexes or offsets into 
> the
> CTF section. No relocations are necessary in CTF at this time. In 
> contrast,
> DWARF type information is organized in multiple sections - .debug_info,
> .debug_abbrev and .debug_str sections in DWARF5; plus .debug_types in 
> DWARF4.
>
> 2. Type Information encoding / compactness matters:
> Because the type information is organized across sections in DWARF (and
> contains some debug information like location etc.), it is not feasible
> to put a distinct number to the size in bytes for representing type
> information in DWARF. But the size info of sections shown above should
> be helpful to show that CTF does show promise in compactly representing
> types.
>
> Let's see some size data. CTF string table (= 374 bytes) is left out of the
> discussion at hand because it will not be fair to compare with .debug_str
> section which contains other information than just names of types.
>
> The 592 bytes of the .ctf section are needed to represent types in CTF
> format. Now, when using DWARF5, the type information needs 732 bytes in
> .debug_info and 309 bytes in .debug_abbrev.
>
> In DWARF (when using -fdebug-types-section), the base types are duplicated
> across type units. So for the above example, the DWARF DIE representing
> 'unsigned int' will appear in both the  DWARF trees for types - node and
> node_payload. In CTF, there is a single lone type 'unsigned int'.

It's not clear to me why you are using -fdebug-types-section for this
comparison?
With just -gdwarf-4 I get

.debug_info  292
.debug_abbrev 189
.debug_str   299

this contains all the info CTF provides (and more).  This sums to 780 bytes,
smaller than the CTF variant.  I skimmed over the info and there's not much
to strip to get to CTF levels, mainly locations.  The strings section also
has a quite large portion for GCC version and arguments, which is 93 bytes.
So overall the DWARF representation should clock in at less than 700 bytes,
more close to 650.

Richard.

> 3. Type Information retrieval and handling:
> CTF type information is organized as a linear array of CTF types. CTF 
> types
> have references to other CTF types. 

Re: [PATCH] Fix -Wshadow=local warnings in passes.c

2019-10-07 Thread Richard Biener
On Thu, Oct 3, 2019 at 5:18 PM Bernd Edlinger  wrote:
>
> Hi,
>
> this fixes -Wshadow=local warnings in passes.c.
> The non-trivial part is due to PUSH_INSERT_PASSES_WITHIN
> being used recursively, which shadows the local variable p
> in each invocation.
>
> Fixed by using a helper class that restores the saved content
> of p at the end of the block.
>
> The shadowing variable in ipa_write_summaries can simply be
> removed since the outer variable of the same name has the
> same type and is not live here; this is a trivial change.
>
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

The class seems to be a poor-man's way to avoid the warning
while the current use is more clear.  You do

 {
   int i;
   {
 // int i; ... oops, warning
 int prev_i_1 = i;
 i = ...

 // fixup
 i = prev_i_1;
   }

Wrapping this in a C++ class doesn't make it any better, it only hides
the pattern.
If this ends up acked then please add this to ansidecl.h or
somewhere else global as template:

template <typename T>
struct push {
  push (T &loc) : m_loc (&loc), m_val (loc) {}  /* save current value */
  ~push () { *m_loc = m_val; }                  /* restore at scope exit */
  T *m_loc;
  T m_val;
};

because it would be a general solution for _all_ shadow=local warnings?!

Richard.

>
> Thanks
> Bernd.
>


Re: [PATCH][RFC] Add new ipa-reorder pass

2019-10-07 Thread Richard Biener
On Sun, Oct 6, 2019 at 4:38 PM Jan Hubicka  wrote:
>
> > On 9/19/19 2:33 AM, Martin Liška wrote:
> > > Hi.
> > >
> > > Function reordering has been around for quite some time and a naive
> > > implementation was also part of my diploma thesis some time ago.
> > > Currently, GCC can reorder functions based on first execution, which
> > > happens with PGO and LTO of course. Known limitation is that the order
> > > is preserved only partially as various symbols go into different LTRANS 
> > > partitions.
> > >
> > > There has been some research in the area and I would point out the 
> > > Facebook paper
> > > ([1]) and Sony presentation ([2]). Based on that, I decided to make a new 
> > > implementation
> > > in the GCC that does the same (in a proper way). First part of the 
> > > enablement are patches
> > > to ld.bfd and ld.gold that come up with a new section .text.sorted, that 
> > > is always sorted.
> > > There's a snippet from the modified default linker script:
> > Funny, I was doing this via linker scripts circa '95, in fact that's why
> > we have -ffunction-sections :-)   We started with scripts which
> > post-processed profiling data to create linker scripts for ELF systems.
> >  We had something for HPUX/SOM as well, but I can't remember what
> > mechanism we used, it may have been a gross level sorting using the SOM
> > section sort key mechanism - it only had 128 or 256 keys with a notable
> > amount of them reserved.
> >
> > We had also built a linker with a basic level of interposition circa
> > 1993 and explored various approaches to reordering executables.  I'd
> > just joined the group at the time and was responsible for wiring up
> > stuff on the OS side, but eventually got the "pleasure" of owning the
> > linker server.  A lot of the C3 algorithmic stuff looks similar to what
> > we did.
>
> For reference I am attaching early LTO version from circa 2010 :)
> >
> > Anyway...
> >
> > I don't see anything objectionable in here.  It's noted as an RFC.  Are
> > you interested in pushing this forward for gcc-10?
>
> I think the plan is to get some form of code layout pass into GCC 10.  I
> will test Martin's version on Firefox and see if it has any effect
> here. It is generally quite hard to evaluate those changes (and it is the
> reason why I did not move forward with the version from 2010 - more
> precisely, it kept falling off the shelf for about a decade)
>
> If LLD has support for cgraph annotations and we support LLD, I think we
> should add support for that, too - it will be very useful to compare
> individual implementations.
> I believe there is gold support (off tree?) for that too and something
> similar is also supported by other toolchains like AIX?
>
> One problem with the sections approach is that every section gets
> aligned to the largest code alignment inside the section. Since our
> alignments are (or should be) often cache-line based, we get circa 30 bytes
> of waste for every function, which is quite a lot.
>
> This is why we currently have a way to order functions when outputting them
> and use that with FDO (using Martin's first-execution logic). This has the
> drawback of making the functions flow in that order through late
> optimizations and the RTL backend, and thus we lose IPA-RA and some
> IPA propagation (late pure/const/nothrow discovery).

But you can also fix that by the parallelization GSoC project approach,
decoupling things at RTL expansion IPA-wise and using output order
for the latter (or even do the fragments in GCC itself by refactoring the
output machinery to use function-local "strings", assembling the final
file later).  Refactoring output to handle multiple output "files" at the
same time might also help getting rid of that early-LTO-debug copying
stuff (now that it seems that linker-plugin extensions for partially claiming
files will never happen...)

> So this approach has a drawback, too. It is why I was trying to push
> myself to get GCC to use gas fragments :)
>
> Anyway, all these issues can be solved incrementally - let's see how
> Martin's patch works on Firefox and if we can get it tested elsewhere and
> start from that.
>
> I will take a look into the details.
>
> Honza
> >
> > jeff
> >
>
> Index: tree-pass.h
> ===
> *** tree-pass.h (revision 164689)
> --- tree-pass.h (working copy)
> *** extern struct ipa_opt_pass_d pass_ipa_lt
> *** 467,472 
> --- 467,473 
>   extern struct ipa_opt_pass_d pass_ipa_lto_finish_out;
>   extern struct ipa_opt_pass_d pass_ipa_profile;
>   extern struct ipa_opt_pass_d pass_ipa_cdtor_merge;
> + extern struct ipa_opt_pass_d pass_ipa_func_reorder;
>
>   extern struct gimple_opt_pass pass_all_optimizations;
>   extern struct gimple_opt_pass pass_cleanup_cfg_post_optimizing;
> Index: cgraphunit.c
> ===
> *** cgraphunit.c(revision 164689)
> --- cgraphunit.c(working copy)
> 

Re: [PATCH, OBVIOUS] Fix -Wshadow=local warnings in gcc/[a-c]*.c

2019-10-07 Thread Richard Biener
On Sun, Oct 6, 2019 at 1:24 PM Bernd Edlinger  wrote:
>
> On 10/5/19 8:24 PM, Segher Boessenkool wrote:
> >
> > I am maintainer of combine, I know all about its many problems (it has much
> > deeper problems than this unfortunately).  Thanks for your help though, this
> > is much appreciated, but I do think your current patch is not a step in the
> > right direction.
> >
>
> Hmm, thanks for your open words, these are of course important.  I will not
> commit this under obvious rules since you objected to the patch in general.
>
> What I want to achieve is to make sure that new code does not introduce more
> variable shadowing.  New shadowing variables are introduced by new code
> unless we have the warning enabled.  And the warning needs to be enabled
> together with -Werror, otherwise it will be overlooked.
>
> For instance I believe MISRA has even stronger coding rules with respect
> to shadowing.
>
> What I tried to do was to add -Wshadow=local to the -Werror warnings set
> and make a more or less mechanical change over the whole code base.
> How that mechanical change is done - if at all -, needs to be agreed upon.
>
> Currently I have the impression that we do not agree if this warning is to be
> enabled at all.

I think if the current code-base were clean then enabling it would be a
no-brainer.
But I agree that mechanically "fixing" the current code-base, while ending up
with no new introductions of local shadowing, is worse if it makes the current
code-base worse.  Consider your

  for  (int i = )
{
...
   {
  int i = ...;
  ... (*)
   }
}

when editing (*), using 'i' will usually pick up the correct one.  If you
change the inner one to 'i1' then fat-fingering 'i' is easy, bound to happen,
and will be silently accepted.  It's also not any more obvious _which_
of the i's is intended since 'i1' is not any more descriptive than 'i'.

If it only confuses developers familiar with the code, then such a change
makes things worse.
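
A minimal sketch of that failure mode (hypothetical code):

  extern int lookup (int);
  extern void use (int);

  void f (int n)
  {
    for (int i = 0; i < n; i++)
      {
        int i1 = lookup (i);  /* renamed from 'i' to appease -Wshadow=local */
        /* ... many lines later ... */
        use (i);              /* fat-fingered: meant 'i1', silently accepted */
      }
  }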

But yes, this means that the quest to enable -Werror=shadow=local becomes
much harder.  Would it be possible to enable it selectively for "clean"
files in the Makefile rules?  White-listing with -Wno-error=... doesn't work
because older host compilers treat unknown options as error there.  More
configury around this might help (like simply not enabling it during stage1
and using a configure macro in the white-listing makefile rules).

This probably means fixing the header file issues first though.

Richard.

>
> Bernd.
>


Re: [patch] canonicalize unsigned [1,MAX] ranges into ~[0,0]

2019-10-04 Thread Richard Biener
On October 4, 2019 5:38:09 PM GMT+02:00, Jeff Law  wrote:
>On 10/4/19 6:59 AM, Aldy Hernandez wrote:
>> When I did the value_range canonicalization work, I noticed that an
>> unsigned [1,MAX] and an ~[0,0] could be two different representations
>> for the same thing.  I didn't address the problem then because
>callers
>> to ranges_from_anti_range() would go into an infinite loop trying to
>> extract ~[0,0] into [1,MAX] and [].  We had a lot of callers to
>>
>ranges_from_anti_range, and it smelled like a rat's nest, so I bailed.
>> 
>> Now that we have one main caller (from the symbolic PLUS/MINUS
>> handling), it's a lot easier to contain.  Well, singleton_p also
>calls
>>
>it, but it's already handling nonzero specially, so it wouldn't be affected.
>> 
>> 
>> With some upcoming cleanups I'm about to post, the fact that [1,MAX]
>and
>> ~[0,0] are equal_p(), but not nonzero_p(), matters.  Plus, it's just
>> good form to have one representation, giving us the ability to pick
>at
>> nonzero_p ranges with ease.
>> 
>> The code in extract_range_from_plus_minus_expr() continues to be a
>mess
>> (as it has always been), but at least it's contained, and with this
>> patch, it's slightly smaller.
>> 
>> Note, I'm avoiding adding a comment header for functions with highly
>> descriptive obvious names.
>> 
>> OK?
>> 
>> Aldy
>> 
>> canonicalize-nonzero-ranges.patch
>> 
>> commit 1c333730deeb4ddadc46ad6d12d5344f92c0352c
>> Author: Aldy Hernandez 
>> Date:   Fri Oct 4 08:51:25 2019 +0200
>> 
>> Canonicalize UNSIGNED [1,MAX] into ~[0,0].
>> 
>> Adapt PLUS/MINUS symbolic handling, so it doesn't call
>> ranges_from_anti_range with a VR_ANTI_RANGE containing one
>sub-range.
>> 
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 6e4f145af46..3934b41fdf9 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,18 @@
>> +2019-10-04  Aldy Hernandez  
>> +
>> +* tree-vrp.c (value_range_base::singleton_p): Use num_pairs
>> +instead of calling vrp_val_is_*.
>> +(value_range_base::set): Canonicalize unsigned [1,MAX] into
>> +non-zero.
>> +(range_has_numeric_bounds_p): New.
>> +(range_int_cst_p): Use range_has_numeric_bounds_p.
>> +(ranges_from_anti_range): Assert that we won't recurse
>> +indefinitely.
>> +(extract_extremes_from_range): New.
>> +(extract_range_from_plus_minus_expr): Adapt so we don't call
>> +ranges_from_anti_range with an anti-range containing only one
>> +sub-range.
>So no problem with the implementation, but I do have a higher level
>question.
>
>One of the goals of the representation side of the Ranger project is to
>drop anti-ranges.  Canonicalizing [1, MAX] to ~[0,0] seems to be going
>in the opposite direction.   So do we really want to canonicalize to
>~[0,0]?

No, we don't. 

Richard. 

>jeff



Re: compatibility of structs/unions/enums in the middle end

2019-10-04 Thread Richard Biener
On Fri, Oct 4, 2019 at 1:55 PM Uecker, Martin
 wrote:
>
> Am Freitag, den 04.10.2019, 12:29 +0200 schrieb Richard Biener:
> > On Wed, Oct 2, 2019 at 8:24 PM Uecker, Martin
> >  wrote:
> > >
> > > Am Mittwoch, den 02.10.2019, 17:37 +0200 schrieb Richard Biener:
>
> > >
> > > ...
> > >
> > > > > > Oh, and LTO does _not_ merge types declared inside a function,
> > > > > > so
> > > > > >
> > > > > > void foo () { struct S { int i; }; }
> > > > > > void bar () { struct S { int i; }; }
> > > > > >
> > > > > > the two S are distinct and objects of that type do not conflict.
> > > > >
> > > > > This is surprising as these types are compatible across TUs. So
> > > > > if some pointer is passed between these functions this is
> > > > > supposed to work.
> > > >
> > > > So if they are compatible the frontend needs to mark them so in this 
> > > > case.
> > >
> > > It can't. The front end never sees the other TU.
> >
> > If the type "leaves" the CU via a call the called function has a prototype
> > through which it "sees" the CU.
>
> The prototype could be local to the function or it could be a void*
> (or other pointer type) argument.
>
>
>
> TU1---
> #include 
>
> extern void f(void *p);
>
> int main()
> {
> struct foo { int x; } b;
> b.x = 2;
> f(&b);
> printf("%d\n", b.x);
> }
>
> TU2-
> extern void f(void *p)
> {
> struct foo { int x; } *q = p;
> q->x = 3;
> }

If the frontend puts those structures at local scope
then yes, the above presents a problem to LTO
at the moment.  So, trigger some inlining,
make main() read b.x after f() and assert that
it is 3.  I think that would fail at the moment.
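
A sketch of that modified test (a hypothetical variant, not from the
original mail; the cast makes the aliasing explicit):

  /* TU1 */
  extern void f (void *p);
  int main (void)
  {
    struct foo { int x; } b;
    b.x = 2;
    f (&b);
    return b.x == 3 ? 0 : 1;  /* does the store in f() remain visible? */
  }

  /* TU2 */
  void f (void *p)
  {
    struct foo { int x; } *q = (struct foo *) p;
    q->x = 3;
  }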

Richard.

> Best,
> Martin


Re: [PATCH] Add -Wshadow=local

2019-10-04 Thread Richard Biener
On Fri, Oct 4, 2019 at 1:21 PM Bernd Edlinger  wrote:
>
> On 10/4/19 12:43 PM, Richard Biener wrote:
> > On Thu, Oct 3, 2019 at 5:17 PM Bernd Edlinger  
> > wrote:
> >>
> >> Hi,
> >>
> >> this adds -Wshadow=local to the GCC build rules.
> >>
> >> It is to be applied after all other patches in this series
> >> including the trivial ones are applied.
> >>
> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> >> Is it OK for trunk?
> >
> > The -Wshadow=local hunk is obviously OK but there's
> > a hunk in the patch adding -Wextra as well...?!
> >
>
> Yes, that replaces -W with -Wextra, I couldn't resist
> doing that when I was already there, since
>
> gcc -v --help
> prints:
> -W   This switch is deprecated; use -Wextra instead
>
> So the change log says that as well:
>
> > * configure.ac (ACX_PROG_CXX_WARNING_OPTS): Use -Wextra instead of 
> > -W.
> > Add -Wshadow=local.
>
> Would you like me to split that patch for the -Wextra?

Oh, no need - just didn't look at the ChangeLog...

Richard.

>
> Bernd.


[PATCH] Fix gcc/testsuite/gcc.c-torture/execute/loop-3.c

2019-10-04 Thread Richard Biener


The testcase invokes undefined behavior (signed integer overflow in the
i * INT_MAX multiplication).

Committed as obvious.

Richard.

2019-10-04  Richard Biener  

* gcc.c-torture/execute/loop-3.c: Fix undefined behavior.

diff --git a/gcc/testsuite/gcc.c-torture/execute/loop-3.c 
b/gcc/testsuite/gcc.c-torture/execute/loop-3.c
index e314a01b1f1..33eb18826fd 100644
--- a/gcc/testsuite/gcc.c-torture/execute/loop-3.c
+++ b/gcc/testsuite/gcc.c-torture/execute/loop-3.c
@@ -13,7 +13,7 @@ f (m)
   i = m;
   do
 {
-  g (i * INT_MAX / 2);
+  g ((int)((unsigned)i * INT_MAX) / 2);
 }
   while (--i > 0);
 }
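
For reference, the distinction the fix relies on, as a hypothetical
side-by-side:

  #include <limits.h>

  /* overflows 'int' for i > 1 (or i < -1): undefined behavior */
  int bad (int i) { return i * INT_MAX / 2; }

  /* unsigned multiply wraps modulo 2^N, which is well defined; the
     conversion of the result back to int is implementation-defined,
     but not undefined behavior */
  int ok (int i) { return (int)((unsigned)i * INT_MAX) / 2; }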


[PATCH] Fix PR91968

2019-10-04 Thread Richard Biener


Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-04  Richard Biener  

PR lto/91968
* tree.c (find_decls_types_r): Do not remove LABEL_DECLs from
BLOCK_VARS.

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 276396)
+++ gcc/tree.c  (working copy)
@@ -5936,8 +5936,9 @@ find_decls_types_r (tree *tp, int *ws, v
 {
   for (tree *tem = &BLOCK_VARS (t); *tem; )
{
- if (TREE_CODE (*tem) != VAR_DECL
- || !auto_var_in_fn_p (*tem, DECL_CONTEXT (*tem)))
+ if (TREE_CODE (*tem) != LABEL_DECL
+ && (TREE_CODE (*tem) != VAR_DECL
+ || !auto_var_in_fn_p (*tem, DECL_CONTEXT (*tem))))
{
  gcc_assert (TREE_CODE (*tem) != RESULT_DECL
  && TREE_CODE (*tem) != PARM_DECL);


Re: [EXT] Re: Modifying types during optimization

2019-10-04 Thread Richard Biener
On Wed, Oct 2, 2019 at 8:50 PM Gary Oblock  wrote:
>
> On 10/2/19 3:15 AM, Richard Biener wrote:
> > External Email
> >
> > --
> > On Wed, Oct 2, 2019 at 1:43 AM Gary Oblock  wrote:
> >> I'm working on structure reorganization optimizations and one of the
> >> things that needs to happen is that pointers to arrays of structures
> >> need to be modified into either being an integer of a structure depending
> >> on which optimization is required.
> >>
> >> I'm seeing something similar happening in omp-low.c where the code in
> >> install_var_field and fixup_child_record_type both seem to rebuild the
> >> entire type from scratch if a field is either added or modified. Wouldn't
> >> it be possible simply modify the field(s) in question and rerun 
> >> layout_type?
> >>
> >> I suspect the answer will be no but reasons as to why that wouldn't work
> >> will probably be equally valuable to me.
> > I think it's undesirable at least.  When last discussing "structure reorg"
> > I was always arguing that trying to change the "type" is the wrong angle
> > to look at (likewise computing something like "type escape").  It's
> > really individual objects you are transforming and that you need to track
> > so there may be very well instances of the original type T plus the
> > modified type T' in the program after the transform.
> >
> > Richard.
> >
> >> Thanks,
> >>
> >> Gary Oblock
> >>
> Richard,

Hope you don't mind I CC the list on my answer.

> You are right, in that T' is a whole new type that I'll be creating for
> the actual structure
> reorganization optimizations but if I have a pointer pointing to a T'
> object I also have
> to change "that."

Well, actually you don't have to(*).  On GIMPLE what matters in the
end are the types of the actual memory accesses, the type of a record
field isn't important, esp. if it is "only" a pointer.

> Unfortunately, if this pointer  is an element of a
> structure type
> then I also have to modify that structure type. Note, there are
> situations where that pointer
> could possibly point to either T or T' in which case I have to
> disqualify the optimization
> on T. There are lot's details like that but this is probably not the
> best forum to transmit
> them.

I always thought people are too obsessed with "types" when doing
structure layout optimizations.  What they really are after is
optimizing memory layout and thus the transformation is on the
memory accesses.  IIRC the original implementation GCC had tried
to mostly get away with mangling the types and the actual memory
accesses changing "auto-magically".  But that severely limits
the application since once pointers are involved this "simple"
transform does not work.

So in my view memory layout optimization has to happen like

1) identify an interesting set of memory objects (a memory allocation
point or a variable declaration)
2) make sure you can see _all_ accesses to that object (because you'll
need to modify them)
3) cost/benefit/transform decision
4) rewrite the allocation/declaration and the accesses previously identified

see how types nowhere appear;  they might in deriving a transform in 3),
but that's for simplicity only.
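
As an illustrative sketch of 1)-4) (hypothetical types and names), a
hot/cold field split only rewrites the allocation and the accesses:

  #include <stdlib.h>

  struct rec { int hot; char cold[60]; };

  /* before: 1)-2) identify this allocation and all a[i].hot accesses */
  long sum_before (size_t n)
  {
    struct rec *a = (struct rec *) calloc (n, sizeof *a);
    long s = 0;
    for (size_t i = 0; i < n; i++)
      s += a[i].hot;
    free (a);
    return s;
  }

  /* after: 4) rewrites both; struct rec no longer appears at all */
  long sum_after (size_t n)
  {
    int *hot = (int *) calloc (n, sizeof *hot);
    long s = 0;
    for (size_t i = 0; i < n; i++)
      s += hot[i];
    free (hot);
    return s;
  }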

> Back to my original question about modifying  the type, is rerunning
> layout_type reasonable?

If the layout changes then you'll have a problem unless you modify
all accesses anyway, in which case you don't need that.

> I'm inspired by relayout_decl which clears a half a dozen or so fields
> and then calls layout_decl.
>
> I appreciate your answers and comments,

(*) There's the issue of debug information but IIRC that's unsolved for
this kind of transform anyway.

> Gary


Re: Reimplementation of -Wclobbered on GIMPLE SSA

2019-10-04 Thread Richard Biener
On Thu, Oct 3, 2019 at 1:55 PM Vladislav Ivanishin  wrote:
>
> Hi!
>
> This series implements the -Wclobbered warning on GIMPLE in place of the
> old RTL implementation.
>
> This cover letter contains a high-level explanation of what is going on
> and why.  The new implementation itself is in the patch 3.  I will post
> the details in the accompanying blurb.
>
> The most obvious benefit of doing it on GIMPLE is making this warning
> independent of the specific register allocation decisions the RA makes.
> I.e. if there is a possibility of a different compiler allocating a
> variable to a register live through the setjmp call, but in the current
> compilation it got spilled, the existing implementation does not give a
> warning.
>
> So the new implementation strives to indicate as many potential errors
> in the code as it can, not limited to the problems pertinent to this
> compilation with this particular compiler.  This is in line with
> e.g. what Tom Lane (of PostgreSQL) expressed interest in seeing [1].
>
> Also, in order to suppress spurious warnings, it is sometimes useful to
> distinguish paths realizable after the second (and subsequent) returns
> from setjmp vs. realizable only after the first return (see PR 21161).
> The new implementation does that using a simple heuristic.
>
> All comments are appreciated.  Is this OK for trunk?

-ENOPATCH

> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1185673#c5
>
> Thanks,
> Vlad
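
A minimal hypothetical example of the class of bug the warning is about:
an automatic variable modified between setjmp and the longjmp has an
indeterminate value at the second return unless declared volatile.

  #include <setjmp.h>

  jmp_buf env;
  extern void g (void);    /* may call longjmp (env, 1) */

  int f (int n)
  {
    int acc = 0;           /* not volatile: may live in a register */
    if (setjmp (env))
      return acc;          /* second return: acc is indeterminate */
    for (int i = 0; i < n; i++)
      acc += i;
    g ();
    return acc;
  }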


Re: [PATCH] Fix -Wshadow=local warnings in cgraph.h

2019-10-04 Thread Richard Biener
On Thu, Oct 3, 2019 at 5:18 PM Bernd Edlinger  wrote:
>
> Hi,
>
> this fixes a -Wshadow=local warning in the FOR_EACH_ALIAS macro
> that happens when it is used in lto/lto-partition.c in a nested
> block.
>
> For now to keep the patch simple, using the fact that the ALIAS
> parameter is always a simple name, concatenate _iter_ to make
> the loop variable a reserved name.
>
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

OK.

Richard.

>
> Thanks
> Bernd.
>
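
For reference, a sketch of the renaming idea (a hypothetical
reconstruction from the description above; the exact macro in cgraph.h
may differ): pasting _iter_ onto the ALIAS argument keeps the iterator
in the reserved name space.

  #define FOR_EACH_ALIAS(NODE, ALIAS)                            \
    for (unsigned ALIAS##_iter_ = 0;                             \
         (NODE)->iterate_direct_aliases (ALIAS##_iter_, ALIAS);  \
         ALIAS##_iter_++)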


Re: [PATCH] Fix -Wshadow=local warnings in genmatch.c

2019-10-04 Thread Richard Biener
On Thu, Oct 3, 2019 at 5:18 PM Bernd Edlinger  wrote:
>
> Hi,
>
> this fixes -Wshadow=local warnings in genmatch.c itself and in generated
> files as well.
>
> The non-trivial part of this patch is renaming the generated local variables
> in the gimple-match.c, to use different names for variables in inner blocks,
> and use a depth index to access the right value.  Also this uses variables
> in the reserved name space, to avoid using the same names (e.g. res, op0, op1)
> that are used in existing .md files.
>
> So I rename:
>
> ops%d -> _o%d
> op%d -> _p%d
> o%u%u -> _q%u%u
> res -> _r%d (in gen_transform)
> res -> _r (in dt_simplify::gen_1)
> def -> _a%d (if gimple_assign)
> def -> _c%d (if gimple_call)
> def_stmt -> _d%d
>
> leaving res_ops, res_op, capture for later, since they are not likely to
> be used in .md files.
>
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

Jumping on one set of hunks:

@@ -2270,42 +2270,42 @@ capture_info::walk_result (operand *o, bool condit
  walk_result (e->ops[i], cond_p, result);
}
 }
-  else if (if_expr *e = dyn_cast <if_expr *> (o))
+  else if (if_expr *e1 = dyn_cast <if_expr *> (o))
 {
...
+  else if (with_expr *e2 = dyn_cast <with_expr *> (o))
 {
-  bool res = (e->subexpr == result);
..

this seems like a bogus -Wshadow if it really warns?  The change above makes
the code ugly IMHO.  Indeed:

> g++-8 t.C -Wshadow=local
t.C: In function ‘void foo(int)’:
t.C:5:16: warning: declaration of ‘j’ shadows a previous local [-Wshadow=local]
   else if (int j = 1)
^
t.C:3:11: note: shadowed declaration is here
   if (int j = i)
   ^

for

void foo (int i)
{
  if (int j = i)
;
  else if (int j = 1)
;
}

and that's a bug.  Ugh.

void foo (int i)
{
  if (int j = i)
;
  else
j = 1;
}

really compiles :/  I wasn't aware of that semantic and it's totally
non-obvious to me... ICK.  Btw, rather than e1, e2, e3... I'd
then have used e (expr *), ie (if_expr *) and we (with_expr *).
1, 2 and 3 are so arbitrary.

The rest of the patch looks OK to me, the above just caught my eye
so OK for trunk with e, ie, we.

Thanks,
Richard.

>
>
> Thanks
> Bernd.
>


Re: [PATCH] Add -Wshadow=local

2019-10-04 Thread Richard Biener
On Thu, Oct 3, 2019 at 5:17 PM Bernd Edlinger  wrote:
>
> Hi,
>
> this adds -Wshadow=local to the GCC build rules.
>
> It is to be applied after all other patches in this series
> including the trivial ones are applied.
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

The -Wshadow=local hunk is obviously OK but there's
a hunk in the patch adding -Wextra as well...?!

Richard.

>
>
> Thanks
> Bernd.
>


Re: [PATCH] contrib: Add KPASS support to dg-extract-results.{sh,py}

2019-10-04 Thread Richard Biener
On Thu, Oct 3, 2019 at 4:33 PM Andrew Burgess
 wrote:
>
> My motivation for the patch below comes from GDB.  The binutils-gdb
> project maintains a copy of the contrib directory that it keeps in
> sync with upstream GCC, and patches to contrib/ are ideally first
> applied to GCC then backported to binutils-gdb.
>
> This patch extends the dg-extract-results.* scripts to support KPASS
> results.  These results are a part of standard dejagnu.
>
> I haven't checked if this is going to impact GCC, but if it does, it
> should only be a good impact - adding previously missing results into
> the final .sum file.  I tested on GDB and we now pick up some
> previously missing results.
>
> Will this be OK to apply?

OK.

Richard.

> Thanks,
> Andrew
>
> ---
>
> Extend dg-extract-results.sh and dg-extract-results.py to support the
> KPASS test result status.  This is required by GDB which uses a copy
> of the dg-extract-results.{sh,py} scripts that it tries to keep in
> sync with GCC.
>
> ChangeLog:
>
> * contrib/dg-extract-results.sh: Add support for KPASS.
> * contrib/dg-extract-results.py: Likewise.
> ---
>  ChangeLog | 5 +
>  contrib/dg-extract-results.py | 2 +-
>  contrib/dg-extract-results.sh | 2 +-
>  3 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
> index 4e113a8dd6b..7100794d42a 100644
> --- a/contrib/dg-extract-results.py
> +++ b/contrib/dg-extract-results.py
> @@ -117,7 +117,7 @@ class Prog:
>  self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
>  self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
>   r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
> - r'|KFAIL):\s*(.+)')
> + r'|KFAIL|KPASS):\s*(.+)')
>  self.completed_re = re.compile (r'.* completed at (.*)')
>  # Pieces of text to write at the head of the output.
>  # start_line is a pair in which the first element is a datetime
> diff --git a/contrib/dg-extract-results.sh b/contrib/dg-extract-results.sh
> index 97ac222b54a..f948088370e 100755
> --- a/contrib/dg-extract-results.sh
> +++ b/contrib/dg-extract-results.sh
> @@ -326,7 +326,7 @@ BEGIN {
>}
>  }
>  /^\t\t=== .* ===$/ { curvar = ""; next }
> -/^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED|WARNING|ERROR|UNSUPPORTED|UNTESTED|KFAIL):/ {
> +/^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED|WARNING|ERROR|UNSUPPORTED|UNTESTED|KFAIL|KPASS):/ {
>testname=\$2
># Ugly hack for gfortran.dg/dg.exp
>if ("$TOOL" == "gfortran" && testname ~ /^gfortran.dg\/g77\//)
> --
> 2.14.5
>


Re: [SVE] PR86753

2019-10-04 Thread Richard Biener
On Thu, Oct 3, 2019 at 1:42 AM Prathamesh Kulkarni
 wrote:
>
> On Wed, 25 Sep 2019 at 09:17, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 16 Sep 2019 at 08:54, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Mon, 9 Sep 2019 at 09:36, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Mon, 9 Sep 2019 at 16:45, Richard Sandiford
> > > >  wrote:
> > > > >
> > > > > Prathamesh Kulkarni  writes:
> > > > > > With patch, the only following FAIL remains for aarch64-sve.exp:
> > > > > > FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
> > > > > > scan-assembler-times \\tmovprfx\\t 6
> > > > > > which now contains 14.
> > > > > > Should I adjust the test, assuming the change isn't a regression ?
> > > > >
> > > > > Well, it is kind-of a regression, but it really just means that the
> > > > > integer code is now consistent with the floating-point code in having
> > > > > an unnecessary MOVPRFX.  So I think adjusting the count is fine.
> > > > > Presumably any future fix for the existing redundant MOVPRFXs will
> > > > > apply to the new ones as well.
> > > > >
> > > > > The patch looks good to me, just some very minor nits:
> > > > >
> > > > > > @@ -8309,11 +8309,12 @@ vect_double_mask_nunits (tree type)
> > > > > >
> > > > > >  /* Record that a fully-masked version of LOOP_VINFO would need 
> > > > > > MASKS to
> > > > > > contain a sequence of NVECTORS masks that each control a vector 
> > > > > > of type
> > > > > > -   VECTYPE.  */
> > > > > > +   VECTYPE. SCALAR_MASK if non-null, represents the mask used for 
> > > > > > corresponding
> > > > > > +   load/store stmt.  */
> > > > >
> > > > > Should be two spaces between sentences.  Maybe:
> > > > >
> > > > >VECTYPE.  If SCALAR_MASK is nonnull, the fully-masked loop would 
> > > > > AND
> > > > >these vector masks with the vector version of SCALAR_MASK.  */
> > > > >
> > > > > since the mask isn't necessarily for a load or store statement.
> > > > >
> > > > > > [...]
> > > > > > @@ -1879,7 +1879,8 @@ static tree permute_vec_elements (tree, tree, 
> > > > > > tree, stmt_vec_info,
> > > > > > says how the load or store is going to be implemented and 
> > > > > > GROUP_SIZE
> > > > > > is the number of load or store statements in the containing 
> > > > > > group.
> > > > > > If the access is a gather load or scatter store, GS_INFO 
> > > > > > describes
> > > > > > -   its arguments.
> > > > > > +   its arguments. SCALAR_MASK is the scalar mask used for 
> > > > > > corresponding
> > > > > > +   load or store stmt.
> > > > >
> > > > > Maybe:
> > > > >
> > > > >its arguments.  If the load or store is conditional, SCALAR_MASK 
> > > > > is the
> > > > >condition under which it occurs.
> > > > >
> > > > > since SCALAR_MASK can be null here too.
> > > > >
> > > > > > [...]
> > > > > > @@ -9975,6 +9978,31 @@ vectorizable_condition (stmt_vec_info 
> > > > > > stmt_info, gimple_stmt_iterator *gsi,
> > > > > >/* Handle cond expr.  */
> > > > > >for (j = 0; j < ncopies; j++)
> > > > > >  {
> > > > > > +  tree loop_mask = NULL_TREE;
> > > > > > +  bool swap_cond_operands = false;
> > > > > > +
> > > > > > +  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> > > > > > + {
> > > > > > +   scalar_cond_masked_key cond (cond_expr, ncopies);
> > > > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > > > + {
> > > > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > > > +   loop_mask = vect_get_loop_mask (gsi, masks, ncopies, 
> > > > > > vectype, j);
> > > > > > + }
> > > > > > +   else
> > > > > > + {
> > > > > > +   cond.code = invert_tree_comparison (cond.code,
> > > > > > +   HONOR_NANS 
> > > > > > (TREE_TYPE (cond.op0)));
> > > > >
> > > > > Long line.  Maybe just split it out into a separate assignment:
> > > > >
> > > > >   bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> > > > >   cond.code = invert_tree_comparison (cond.code, 
> > > > > honor_nans);
> > > > >
> > > > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > > > + {
> > > > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > > > +   loop_mask = vect_get_loop_mask (gsi, masks, 
> > > > > > ncopies, vectype, j);
> > > > >
> > > > > Long line here too.
> > > > >
> > > > > > [...]
> > > > > > @@ -10090,6 +10121,26 @@ vectorizable_condition (stmt_vec_info 
> > > > > > stmt_info, gimple_stmt_iterator *gsi,
> > > > > >   }
> > > > > >   }
> > > > > >   }
> > > > > > +
> > > > > > +   if (loop_mask)
> > > > > > + {
> > > > > > +   if (COMPARISON_CLASS_P (vec_compare))
> > > > > > + {
> > > > > > +   tree tmp = make_ssa_name (vec_cmp_type);
> > > > > > +   gassign *g = 

Re: compatibility of structs/unions/enums in the middle end

2019-10-04 Thread Richard Biener
On Wed, Oct 2, 2019 at 8:24 PM Uecker, Martin
 wrote:
>
> Am Mittwoch, den 02.10.2019, 17:37 +0200 schrieb Richard Biener:
> > On October 2, 2019 3:55:43 PM GMT+02:00, "Uecker, Martin" 
> > 
> > wrote:
> > > Am Mittwoch, den 02.10.2019, 15:12 +0200 schrieb Richard Biener:
> > > > On Wed, Oct 2, 2019 at 3:10 PM Richard Biener
> > > >  wrote:
> > > > >
> ...
>
> > > > Oh, and LTO does _not_ merge types declared inside a function,
> > > > so
> > > >
> > > > void foo () { struct S { int i; }; }
> > > > void bar () { struct S { int i; }; }
> > > >
> > > > the two S are distinct and objects of that type do not conflict.
> > >
> > > This is surprising as these types are compatible across TUs. So
> > > if some pointer is passed between these functions this is
> > > supposed to work.
> >
> > So if they are compatible the frontend needs to mark them so in this case.
>
> It can't. The front end never sees the other TU.

If the type "leaves" the CU via a call the called function has a prototype
through which it "sees" the CU.

Richard.

> Best,
> Martin


[PATCH] Fix PR91982

2019-10-04 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-04  Richard Biener  

PR tree-optimization/91982
* tree-vect-loop.c (vectorizable_live_operation): Also guard
against EXTRACT_LAST_REDUCTION.
* tree-vect-stmts.c (vect_transform_stmt): Likewise.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 276564)
+++ gcc/tree-vect-loop.c(working copy)
@@ -7901,7 +7901,10 @@ vectorizable_live_operation (stmt_vec_in
return true;
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
{
- if (STMT_VINFO_REDUC_TYPE (stmt_info) == FOLD_LEFT_REDUCTION)
+ if (STMT_VINFO_REDUC_TYPE (stmt_info) == FOLD_LEFT_REDUCTION
+ || (STMT_VINFO_REDUC_TYPE (stmt_info) == COND_REDUCTION
+ && (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
+ == EXTRACT_LAST_REDUCTION)))
return true;
  if (slp_node)
{
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 276564)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -10897,6 +10897,9 @@ vect_transform_stmt (stmt_vec_info stmt_
   stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
   if (!slp_node && STMT_VINFO_REDUC_DEF (orig_stmt_info)
   && STMT_VINFO_REDUC_TYPE (orig_stmt_info) != FOLD_LEFT_REDUCTION
+  && (STMT_VINFO_REDUC_TYPE (orig_stmt_info) != COND_REDUCTION
+ || (STMT_VINFO_VEC_REDUCTION_TYPE (orig_stmt_info)
+ != EXTRACT_LAST_REDUCTION))
   && is_a <gphi *> (STMT_VINFO_REDUC_DEF (orig_stmt_info)->stmt))
 {
   gphi *phi = as_a <gphi *> (STMT_VINFO_REDUC_DEF (orig_stmt_info)->stmt);


Re: [SVE] PR91532

2019-10-04 Thread Richard Biener
On Thu, 3 Oct 2019, Prathamesh Kulkarni wrote:

> On Wed, 2 Oct 2019 at 12:28, Richard Biener  wrote:
> >
> > On Wed, 2 Oct 2019, Prathamesh Kulkarni wrote:
> >
> > > On Wed, 2 Oct 2019 at 01:08, Jeff Law  wrote:
> > > >
> > > > On 10/1/19 12:40 AM, Richard Biener wrote:
> > > > > On Mon, 30 Sep 2019, Prathamesh Kulkarni wrote:
> > > > >
> > > > >> On Wed, 25 Sep 2019 at 23:44, Richard Biener  
> > > > >> wrote:
> > > > >>>
> > > > >>> On Wed, 25 Sep 2019, Prathamesh Kulkarni wrote:
> > > > >>>
> > > > >>>> On Fri, 20 Sep 2019 at 15:20, Jeff Law  wrote:
> > > > >>>>>
> > > > >>>>> On 9/19/19 10:19 AM, Prathamesh Kulkarni wrote:
> > > > >>>>>> Hi,
> > > > >>>>>> For PR91532, the dead store is trivially deleted if we place dse 
> > > > >>>>>> pass
> > > > >>>>>> between ifcvt and vect. Would it be OK to add another instance 
> > > > >>>>>> of dse there ?
> > > > >>>>>> Or should we add an ad-hoc "basic-block dse" sub-pass to ifcvt 
> > > > >>>>>> that
> > > > >>>>>> will clean up the dead store ?
> > > > >>>>> I'd hesitate to add another DSE pass.  If there's one nearby 
> > > > >>>>> could we
> > > > >>>>> move the existing pass?
> > > > >>>> Well I think the nearest one is just after pass_warn_restrict. Not
> > > > >>>> sure if it's a good
> > > > >>>> idea to move it up from there ?
> > > > >>>
> > > > >>> You'll need it inbetween ifcvt and vect so it would be disabled
> > > > >>> w/o vectorization, so no, that doesn't work.
> > > > >>>
> > > > >>> ifcvt already invokes SEME region value-numbering so if we had
> > > > >>> MESE region DSE it could use that.  Not sure if you feel like
> > > > >>> refactoring DSE to work on regions - it currently uses a DOM
> > > > >>> walk which isn't suited for that.
> > > > >>>
> > > > >>> if-conversion has a little "local" dead predicate compute removal
> > > > >>> thingy (not that I like that), eventually it can be enhanced to
> > > > >>> do the DSE you want?  Eventually it should be moved after the local
> > > > >>> CSE invocation though.
> > > > >> Hi,
> > > > >> Thanks for the suggestions.
> > > > >> For now, would it be OK to do "dse" on loop header in
> > > > >> tree_if_conversion, as in the attached patch ?
> > > > >> The patch does local dse in a new function ifcvt_local_dse instead of
> > > > >> ifcvt_local_dce, because it needed to be done after RPO VN which
> > > > >> eliminates:
> > > > >> Removing dead stmt _ifc__62 = *_55;
> > > > >> and makes the following store dead:
> > > > >> *_55 = _ifc__61;
> > > > >
> > > > > I suggested trying to move ifcvt_local_dce after RPO VN, you could
> > > > > try that as independent patch (pre-approved).
> > > > >
> > > > > I don't mind the extra walk though.
> > > > >
> > > > > What I see as possible issue is that dse_classify_store walks virtual
> > > > > uses and I'm not sure if the loop exit is a natural boundary for
> > > > > such walk (eventually the loop header virtual PHI is reached but
> > > > > there may also be a loop-closed PHI for the virtual operand,
> > > > > but not necessarily).  So the question is whether to add a
> > > > > "stop at" argument to dse_classify_store specifying the virtual
> > > > > use the walk should stop at?
> > > > I think we want to stop at the block boundary -- aren't the cases we
> > > > care about here local to a block?
> > > This version restricts walking in dse_classify_store to basic-block if
> > > bb_only is true,
> > > and removes dead stores in ifcvt_local_dce instead of separate walk.
> > > Does it look OK ?
> >
> > As replied to Jeff, please make it trivially work for SESE region walks
>

Re: compatibility of structs/unions/enums in the middle end

2019-10-02 Thread Richard Biener
On October 2, 2019 3:55:43 PM GMT+02:00, "Uecker, Martin" 
 wrote:
>Am Mittwoch, den 02.10.2019, 15:12 +0200 schrieb Richard Biener:
>> On Wed, Oct 2, 2019 at 3:10 PM Richard Biener
>>  wrote:
>> > 
>> > On Wed, Oct 2, 2019 at 2:35 PM Uecker, Martin
>> >  wrote:
>> > > 
>> > > Am Mittwoch, den 02.10.2019, 14:18 +0200 schrieb Richard Biener:
>> > > > On Wed, Oct 2, 2019 at 1:57 PM Uecker, Martin
>> > > >  wrote:
>> > > > > 
>> > > 
>> > > Thank you for your answers.
>> > > 
>> > > > > Finally, how does LTO does it? It somehow also needs to unify
>> > > > > different tagged types? Could we reuse this mechanism
>somehow?
>> > > > 
>> > > > LTO structurally merges types via TYPE_CANONICAL.  But rules
>> > > > for merging depend on language semantics, too much merging
>> > > > hinders optimization.
>> > > 
>> > > LTO would need to merge types with identical tag and structure
>> > > across TUs anyway as this is needed for C programs to work.
>> > > But this implies that it also must merge such types inside a TU
>> > > (because merging enforces an equivalence relationship).
>> > > So if I am not missing anything important, LTO would already
>> > > implement the exact semantics which I propose for C2X.
>> > 
>> > Sure LTO handles C2X fine.  The issue is that it creates way
>> > larger equivalence classes than necessary for C2X (it has to
>> > work across language boundaries where compatibility is much
>> > less specified).
>
>Ok, using this also for our purposes would pessimize things
>too much.
>
>> Oh, and LTO does _not_ merge types declared inside a function,
>> so
>> 
>> void foo () { struct S { int i; }; }
>> void bar () { struct S { int i; }; }
>> 
>> the two S are distinct and objects of that type do not conflict.
>
>This is surprising as these types are compatible across TUs. So
>if some pointer is passed between these functions this is
>supposed to work.

So if they are compatible the frontend needs to mark them so in this case. 

Richard. 

>Best,
>Martin



Re: compatibility of structs/unions/enums in the middle end

2019-10-02 Thread Richard Biener
On Wed, Oct 2, 2019 at 3:10 PM Richard Biener
 wrote:
>
> On Wed, Oct 2, 2019 at 2:35 PM Uecker, Martin
>  wrote:
> >
> > Am Mittwoch, den 02.10.2019, 14:18 +0200 schrieb Richard Biener:
> > > On Wed, Oct 2, 2019 at 1:57 PM Uecker, Martin
> > >  wrote:
> > > >
> >
> > Thank you for your answers.
> >
> > > > Finally, how does LTO do it? It somehow also needs to unify
> > > > different tagged types? Could we reuse this mechanism somehow?
> > >
> > > LTO structurally merges types via TYPE_CANONICAL.  But rules
> > > for merging depend on language semantics, too much merging
> > > hinders optimization.
> >
> > LTO would need to merge types with identical tag and structure
> > across TUs anyway as this is needed for C programs to work.
> > But this implies that it also must merge such types inside a TU
> > (because merging enforces an equivalence relationship).
> > So if I am not missing anything important, LTO would already
> > implement the exact semantics which I propose for C2X.
>
> Sure LTO handles C2X fine.  The issue is that it creates way
> larger equivalence classes than necessary for C2X (it has to
> work across language boundaries where compatibility is much
> less specified).

Oh, and LTO does _not_ merge types declared inside a function,
so

void foo () { struct S { int i; }; }
void bar () { struct S { int i; }; }

the two S are distinct and objects of that type do not conflict.

Richard.

> Richard.
>
> > Best,
> > Martin


Re: compatibility of structs/unions/enums in the middle end

2019-10-02 Thread Richard Biener
On Wed, Oct 2, 2019 at 2:35 PM Uecker, Martin
 wrote:
>
> Am Mittwoch, den 02.10.2019, 14:18 +0200 schrieb Richard Biener:
> > On Wed, Oct 2, 2019 at 1:57 PM Uecker, Martin
> >  wrote:
> > >
>
> Thank you for your answers.
>
> > > Finally, how does LTO do it? It somehow also needs to unify
> > > different tagged types? Could we reuse this mechanism somehow?
> >
> > LTO structurally merges types via TYPE_CANONICAL.  But rules
> > for merging depend on language semantics, too much merging
> > hinders optimization.
>
> LTO would need to merge types with identical tag and structure
> across TUs anyway as this is needed for C programs to work.
> But this implies that it also must merge such types inside a TU
> (because merging enforces an equivalence relationship).
> So if I am not missing anything important, LTO would already
> implement the exact semantics which I propose for C2X.

Sure LTO handles C2X fine.  The issue is that it creates way
larger equivalence classes than necessary for C2X (it has to
work across language boundaries where compatibility is much
less specified).

Richard.

> Best,
> Martin


Re: compatibility of structs/unions/enums in the middle end

2019-10-02 Thread Richard Biener
On Wed, Oct 2, 2019 at 1:57 PM Uecker, Martin
 wrote:
>
> Am Mittwoch, den 02.10.2019, 12:47 +0200 schrieb Richard Biener:
> > On Wed, Oct 2, 2019 at 12:46 PM Richard Biener
> >  wrote:
> > >
> > > On Tue, Oct 1, 2019 at 7:49 PM Uecker, Martin
> > >  wrote:
>
> ...
> > > >
> > > > In particular, the idea is to make structs (+ unions, enums)
> > > > with the same tag and the same members compatible. The
> > > > current C standards says that such structs are compatible
> > > > between different TUs but not inside the same TU, which
> > > > is very strange and - as pointed out by Joseph
> > > > in DR314 - this leads to "interesting" scenarios
> > > > where types across different TU cannot be partitioned
> > > > into equivalence classes in a consistent way.
> ...
>
> > > > I would appreciate any information about how to
> > > > approach this.
> > >
> > > The frontend either needs to have the same internal
> > > type representation for both or provide the middle-end
> > > with unification of compatible types via the TYPE_CANONICAL
> > > mechanism (that's what the C++ FE does in similar circumstances).
> > >
> > > That is, the TBAA machinery relies on TYPE_CANONICAL (TYPE_MAIN_VARIANT 
> > > (st1))
> > > == TYPE_CANONICAL (TYPE_MAIN_VARIANT (st2))
> > > (or requivalent TYPE_MAIN_VARIANT if that's already the case).
>
> Yes, this is what I assumed from looking at the code. The problem
> is that the front end would need to go over all types and set
> TYPE_CANONICAL.

Yes.

> This seems easy to do on the fly whenever the front
> end needs to compare types anyway, but this would not be enough
> as also types which appear unrelated to the front end (e.g. two
> types declared in separate local scopes) could be compatible.
> To identify these types would require searching a data structure
> of all such types in the front end every time a new tagged type
> is created. This would not be too difficult to implement.
>
> On the other hand, the situation with this proposal for such types
> is then very similar to any other complex type expressions which
> need to be compared structurally in the middle end. So what I am
> wondering is whether it would be possible to do such comparisons
> in the middle end also for tagged types?

The middle-end ensures there's only one such type via hashing
types via type_hash_canon.
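
For reference, a minimal sketch of that mechanism (type_hash_canon and
type_hash_canon_hash are the real entry points in tree.c; the
surrounding lines are illustrative only):

  /* Build a type node, then ask the canonical table whether a
     structurally identical type is already registered; if so, the
     new node is discarded and the existing one is returned.  */
  tree t = make_node (INTEGER_TYPE);
  /* ... set TYPE_PRECISION, TYPE_UNSIGNED, min/max values ...  */
  hashval_t hash = type_hash_canon_hash (t);
  t = type_hash_canon (hash, t);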

> Finally, how does LTO do it? It somehow also needs to unify
> different tagged types? Could we reuse this mechanism somehow?

LTO structurally merges types via TYPE_CANONICAL.  But rules
for merging depend on language semantics, too much merging
hinders optimization.

> > Btw, for your example, how do you expect the debug information to look?
> > Would there be two type definitions that are not related?
>
> I don't know yet. This is why I am trying to implement it, to
> figure out all these practical issues. How does it work now for
> tagged types in different TUs that are compatible?

You get two copies.

Richard.

> Best,
> Martin


[PATCH] Split out stmt transform from vectorizable_reduction

2019-10-02 Thread Richard Biener


I've done only "trivial" pruning of unnecessary code to reduce the
chance of functional changes.  Cleanup will happen as followup.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-02  Richard Biener  

* tree-vectorizer.h (vect_transform_reduction): Declare.
* tree-vect-stmts.c (vect_transform_stmt): Use it.
* tree-vect-loop.c (vectorizable_reduction): Split out reduction
stmt transform to ...
(vect_transform_reduction): ... this.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index a3fd011e6c4..31e745780ba 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5784,7 +5784,6 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
gimple_stmt_iterator *gsi,
   int i;
   int ncopies;
   bool single_defuse_cycle = false;
-  int j;
   tree ops[3];
   enum vect_def_type dts[3];
   bool nested_cycle = false, found_nested_cycle_def = false;
@@ -6576,43 +6575,224 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
gimple_stmt_iterator *gsi,
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
   bool mask_by_cond_expr = use_mask_by_cond_expr_p (code, cond_fn, vectype_in);
 
-  if (!vec_stmt) /* transformation not required.  */
+  /* transformation not required.  */
+  gcc_assert (!vec_stmt);
+
+  vect_model_reduction_cost (stmt_info, reduc_fn, ncopies, cost_vec);
+  if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
 {
-  vect_model_reduction_cost (stmt_info, reduc_fn, ncopies, cost_vec);
-  if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
+  if (reduction_type != FOLD_LEFT_REDUCTION
+ && !mask_by_cond_expr
+ && (cond_fn == IFN_LAST
+ || !direct_internal_fn_supported_p (cond_fn, vectype_in,
+ OPTIMIZE_FOR_SPEED)))
{
- if (reduction_type != FOLD_LEFT_REDUCTION
- && !mask_by_cond_expr
- && (cond_fn == IFN_LAST
- || !direct_internal_fn_supported_p (cond_fn, vectype_in,
- OPTIMIZE_FOR_SPEED)))
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"can't use a fully-masked loop because no"
-" conditional operation is available.\n");
- LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
-   }
- else if (reduc_index == -1)
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"can't use a fully-masked loop for chained"
-" reductions.\n");
- LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
-   }
- else
-   vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
-  vectype_in);
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"can't use a fully-masked loop because no"
+" conditional operation is available.\n");
+ LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
}
-  if (dump_enabled_p ()
- && reduction_type == FOLD_LEFT_REDUCTION)
-   dump_printf_loc (MSG_NOTE, vect_location,
-"using an in-order (fold-left) reduction.\n");
-  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
-  return true;
+  else if (reduc_index == -1)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"can't use a fully-masked loop for chained"
+" reductions.\n");
+ LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
+   }
+  else
+   vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
+  vectype_in);
+}
+  if (dump_enabled_p ()
+  && reduction_type == FOLD_LEFT_REDUCTION)
+dump_printf_loc (MSG_NOTE, vect_location,
+"using an in-order (fold-left) reduction.\n");
+  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
+  return true;
+}
+
+/* Transform the definition stmt STMT_INFO of a reduction PHI backedge
+   value.  */
+
+bool
+vect_transform_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
+ stmt_vec_info *vec_stmt, slp_tree slp_node)
+{
+  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
+  tree vectype_in = NULL_TREE;
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  enum tree_code code;
+  int op_type;
+  bool is_sim

Re: [patch] range-ops contribution

2019-10-02 Thread Richard Biener
.  I won't
> > however, test all of Fedora :-P.
> Agreed, I don't think that's necessary.  FWIW, using a month-old branch
> for testing was amazingly helpful in other respects.  We found ~100
> packages that need updating for gcc-10 as well as a few bugs unrelated
> to Ranger.  I've actually got Sunday's snapshot spinning now and fully
> expect to be spinning Fedora builds with snapshots for the next several
> months.  So I don't expect a Fedora build just to test after ranger
> integration, but instead that it'll "just happen" on a subsequent snapshot.
>
> >
> > May I be so bold as to suggest that if there are minor suggestions that
> > arise from this review, that they be done as follow-ups?  I'd like to
> > get as much testing as possible in this stage1.
> There's a variety of small, obvious things that should be fixed.
> Comment typos and the like.  There's one question on inversion that may
> require some discussion.
>
> See inline comments...
>
>
> >
> > Thanks.
> > Aldy
> >
> >
> > range-ops.patch
> >
> > diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> > index 65f9db966d0..9aa46c087b8 100644
> > --- a/gcc/ChangeLog
> > +++ b/gcc/ChangeLog
> > @@ -1,3 +1,68 @@
> > +2019-09-25  Aldy Hernandez  
> > +
> > + * Makefile.in (OBJS): Add range.o and range-op.o.
> > + Remove wide-int-range.o.
> > + (GTFILES): Add range.h.
> > + * function-tests.c (test_ranges): New.
> > + (function_tests_c_tests): Call test_ranges.
> > + * ipa-cp.c (ipa_vr_operation_and_type_effects): Call
> > + range_fold_unary_expr instead of extract_range_from_unary_expr.
> > + * ipa-prop.c (ipa_compute_jump_functions_for_edge): Same.
> > + * range-op.cc: New file.
> > + * range-op.h: New file.
> > + * range.cc: New file.
> > + * range.h: New file.
> > + * selftest.h (range_tests): New prototype.
> > + * ssa.h: Include range.h.
> > + * tree-vrp.c (value_range_base::value_range_base): New
> > + constructors.
> > + (value_range_base::singleton_p): Do not call
> > + ranges_from_anti_range until sure we will need to.
> > + (value_range_base::type): Rename gcc_assert to
> > + gcc_checking_assert.
> > + (vrp_val_is_max): New argument.
> > + (vrp_val_is_min): Same.
> > + (wide_int_range_set_zero_nonzero_bits): Move from
> > + wide-int-range.cc.
> > + (extract_range_into_wide_ints): Remove.
> > + (extract_range_from_multiplicative_op): Remove.
> > + (extract_range_from_pointer_plus_expr): Abstract POINTER_PLUS code
> > + from extract_range_from_binary_expr.
> > + (extract_range_from_plus_minus_expr): Abstract PLUS/MINUS code
> > + from extract_range_from_binary_expr.
> > + (extract_range_from_binary_expr): Remove.
> > + (normalize_for_range_ops): New.
> > + (range_fold_binary_expr): New.
> > + (range_fold_unary_expr): New.
> > + (value_range_base::num_pairs): New.
> > + (value_range_base::lower_bound): New.
> > + (value_range_base::upper_bound): New.
> > + (value_range_base::upper_bound): New.
> > + (value_range_base::contains_p): New.
> > + (value_range_base::invert): New.
> > + (value_range_base::union_): New.
> > + (value_range_base::intersect): New.
> > + (range_compatible_p): New.
> > + (value_range_base::operator==): New.
> > + (determine_value_range_1): Call range_fold_*expr instead of
> > + extract_range_from_*expr.
> > + * tree-vrp.h (class value_range_base): Add new constructors.
> > + Add methods for union_, intersect, operator==, contains_p,
> > + num_pairs, lower_bound, upper_bound, invert.
> > + (vrp_val_is_min): Add handle_pointers argument.
> > + (vrp_val_is_max): Same.
> > + (extract_range_from_unary_expr): Remove.
> > + (extract_range_from_binary_expr): Remove.
> > + (range_fold_unary_expr): New.
> > + (range_fold_binary_expr): New.
> > + * vr-values.c (vr_values::extract_range_from_binary_expr): Call
> > + range_fold_binary_expr instead of extract_range_from_binary_expr.
> > + (vr_values::extract_range_basic): Same.
> > + (vr_values::extract_range_from_unary_expr): Call
> > + range_fold_unary_expr instead of extract_range_from_unary_expr.
> > + * wide-int-range.cc: Remove.
> > + * wide-int-range.h: Remove.
> > +
> >  2019-08-27  Richard Biener  
> >
> >   * config/i386/i386-features.h
> > diff --git a/gcc/Makefile

Re: compatibility of structs/unions/enums in the middle end

2019-10-02 Thread Richard Biener
On Wed, Oct 2, 2019 at 12:46 PM Richard Biener
 wrote:
>
> On Tue, Oct 1, 2019 at 7:49 PM Uecker, Martin
>  wrote:
> >
> >
> >
> > Hi,
> >
> > I have a proposal for making changes to the rules for
> > compatibility of tagged types in C2X  (N2366). This was
> > received with interest by WG14, so there is a chance
> > that this could get accepted into C2X.
> >
> > In particular, the idea is to make structs (+ unions, enums)
> > with the same tag and the same members compatible. The
> > current C standard says that such structs are compatible
> > between different TUs but not inside the same TU, which
> > is very strange and - as pointed out by Joseph
> > in DR314 - this leads to "interesting" scenarios
> > where types across different TUs cannot be partitioned
> > into equivalence classes in a consistent way.
> >
> > The new rules would fix these inconsistencies and also
> > make some useful programming patterns possible: E.g. one
> > could declare struct/union/enum types in a macro so
> > that another invocation produces a compatible type.
> > For example:
> >
> > #define MAYBE(T) struct foo_##T { _Bool flag; T value };
> >
> > MAYBE(int) x = { true, 0 };
> > MAYBE(int) y = x;
> >
> >
> > I am working on a patch for GCC which adds this as an
> > optional feature. So far, I have a working patch to the
> > C front end which changes the concept of type compatibility
> > to match the proposed model. It uses the existing code
> > for type compatibility, so is relatively simple.
> >
> > The question is now how this should interact with the
> > middle end. So far, I have to insert some VIEW_CONVERT_EXPR
> > to avoid "useless type conversion" errors during gimple
> > verification.
> >
> > I am also wondering how to make TBAA do the right thing
> > for the new rules. Currently, GCC assumes 's1p' and 's2p'
> > cannot alias in the following example and outputs '2'
> > in 'f', but this would not be true anymore according
> > to the proposal.
> >
> >
> > #include <stdio.h>
> >
> > typedef struct { int i; } st1;
> > typedef struct { int i; } st2;
> >
> > void f(void* s1v, void* s2v)
> > {
> >   st1 *s1p = s1v;
> >   st2 *s2p = s2v;
> >   s1p->i = 2;
> >   s2p->i = 3;
> >   printf("f: s1p->i = %i\n", s1p->i);
> > }
> >
> > int main()
> > {
> >   st1 s = { .i = 1 };
> >   f(&s, &s);
> >   printf("s.i = %i\n", s.i);
> > }
> >
> > BTW: According to current rules when 'f' is
> > moved into a different TU, there is no UB.
> > As both 'st1'
> > and 'st2' in one TU are compatible
> > to both 'st1' and 'st2' in the other TU there
> > is no UB. Still, GCC
> > incorrectly assumes that
> > 's1p' and 's2p' do not alias.
> >
> >
> > I would appreciate any information about how to
> > approach this.
>
> The frontend either needs to have the same internal
> type representation for both or provide the middle-end
> with unification of compatible types via the TYPE_CANONICAL
> mechanism (that's what the C++ FE does in similar circumstances).
>
> That is, the TBAA machinery relies on TYPE_CANONICAL (TYPE_MAIN_VARIANT (st1))
> == TYPE_CANONICAL (TYPE_MAIN_VARIANT (st2))
> (or equivalent TYPE_MAIN_VARIANT if that's already the case).

Btw, for your example, how do you expect the debug information to look?
Would there be two type definitions that are not related?

Richard.

> Richard.
>
> >
> > Best,
> > Martin
> >


Re: compatibility of structs/unions/enums in the middle end

2019-10-02 Thread Richard Biener
On Tue, Oct 1, 2019 at 7:49 PM Uecker, Martin
 wrote:
>
>
>
> Hi,
>
> I have a proposal for making changes to the rules for
> compatibility of tagged types in C2X  (N2366). This was
> received with interest by WG14, so there is a chance
> that this could get accepted into C2X.
>
> In particular, the idea is to make structs (+ unions, enums)
> with the same tag and the same members compatible. The
> current C standard says that such structs are compatible
> between different TUs but not inside the same TU, which
> is very strange and - as pointed out by Joseph
> in DR314 - this leads to "interesting" scenarios
> where types across different TUs cannot be partitioned
> into equivalence classes in a consistent way.
>
> The new rules would fix these inconsistencies and also
> make some useful programming patterns possible: E.g. one
> could declare struct/union/enum types in a macro so
> that another invocation produces a compatible type.
> For example:
>
> #define MAYBE(T) struct foo_##T { _Bool flag; T value };
>
> MAYBE(int) x = { true, 0 };
> MAYBE(int) y = x;
>
>
> I am working on a patch for GCC which adds this as an
> optional feature. So far, I have a working patch to the
> C front end which changes the concept of type compatibility
> to match the proposed model. It uses the existing code
> for type compatibility, so is relatively simple.
>
> The question is now how this should interact with the
> middle end. So far, I have to insert some VIEW_CONVERT_EXPR
> to avoid "useless type conversion" errors during gimple
> verification.
>
> I am also wondering how to make TBAA do the right thing
> for the new rules. Currently, GCC assumes 's1p' and 's2p'
> cannot alias in the following example and outputs '2'
> in 'f', but this would not be true anymore according
> to the proposal.
>
>
> #include <stdio.h>
>
> typedef struct { int i; } st1;
> typedef struct { int i; } st2;
>
> void f(void* s1v, void* s2v)
> {
>   st1 *s1p = s1v;
>   st2 *s2p = s2v;
>   s1p->i = 2;
>   s2p->i = 3;
>   printf("f: s1p->i = %i\n", s1p->i);
> }
>
> int main()
> {
>   st1 s = { .i = 1 };
>   f(&s, &s);
>   printf("s.i = %i\n", s.i);
> }
>
> BTW: According to current rules when 'f' is
> moved into a different TU, there is no UB.
> As both 'st1'
> and 'st2' in one TU are compatible
> to both 'st1' and 'st2' in the other TU there
> is no UB. Still, GCC
> incorrectly assumes that
> 's1p' and 's2p' do not alias.
>
>
> I would appreciate any information about how to
> approach this.

The frontend either needs to have the same internal
type representation for both or provide the middle-end
with unification of compatible types via the TYPE_CANONICAL
mechanism (that's what the C++ FE does in similar circumstances).

That is, the TBAA machinery relies on TYPE_CANONICAL (TYPE_MAIN_VARIANT (st1))
== TYPE_CANONICAL (TYPE_MAIN_VARIANT (st2))
(or equivalent TYPE_MAIN_VARIANT if that's already the case).
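
Spelled out as code, the invariant a front end has to establish for
two compatible struct types is roughly this (a sketch, not an
existing helper):

  /* TBAA considers accesses through st1 and st2 as conflicting only
     if their canonical main variants have been unified.  */
  static bool
  tbaa_compatible_p (tree st1, tree st2)
  {
    return (TYPE_CANONICAL (TYPE_MAIN_VARIANT (st1))
	    == TYPE_CANONICAL (TYPE_MAIN_VARIANT (st2)));
  }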

Richard.

>
> Best,
> Martin
>


Re: [PR47785] COLLECT_AS_OPTIONS

2019-10-02 Thread Richard Biener
On Wed, Oct 2, 2019 at 10:39 AM Kugan Vivekanandarajah
 wrote:
>
> Hi,
>
> As mentioned in the PR, attached patch adds COLLECT_AS_OPTIONS for
> passing assembler options specified with -Wa, to the link-time driver.
>
> The proposed solution only works for uniform -Wa options across all
> TUs. As mentioned by Richard Biener, supporting non-uniform -Wa flags
> would require either adjusting partitioning according to flags or
> emitting multiple object files  from a single LTRANS CU. We could
> consider this as a follow up.
>
> Bootstrapped and regression tests on  arm-linux-gcc. Is this OK for trunk?

While it works for your simple cases it is unlikely to work in practice since
your implementation needs the assembler options to be present on the link
command line.  I agree that this might be the way for people to go when
they face the issue but then it needs to be documented somewhere
in the manual.

That is, with COLLECT_AS_OPTION (why singular?  I'd expected
COLLECT_AS_OPTIONS) available to cc1 we could stream this string
to lto_options and re-materialize it at link time (and even diagnose
mismatches if we like).
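
As a sketch of the env-var plumbing the posted ChangeLog describes
(the body here is illustrative, not the actual lto-wrapper code):

  /* In lto-wrapper.c:run_gcc, forward assembler flags recorded by
     the driver to the LTRANS compile/assemble steps.  */
  const char *as_opts = getenv ("COLLECT_AS_OPTION");
  if (as_opts != NULL)
    obstack_ptr_grow (&argv_obstack, as_opts);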

Richard.

> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2019-10-02  kugan.vivekanandarajah  
>
> PR lto/78353
> * gcc.c (putenv_COLLECT_AS_OPTION): New to set COLLECT_AS_OPTION in env.
> (driver::main): Call putenv_COLLECT_AS_OPTION.
> * lto-wrapper.c (run_gcc): use COLLECT_AS_OPTION from env.
>
> gcc/testsuite/ChangeLog:
>
> 2019-10-02  kugan.vivekanandarajah  
>
> PR lto/78353
> * gcc.target/arm/pr78353-1.c: New test.
> * gcc.target/arm/pr78353-2.c: New test.


[PATCH] Fix build with GCC 4.3

2019-10-02 Thread Richard Biener


Which complains

libcpp/internal.h:129: error: comma at end of enumerator list

Committed as obvious.

Richard.

Index: libcpp/internal.h
===
--- libcpp/internal.h   (revision 276439)
+++ libcpp/internal.h   (working copy)
@@ -126,7 +126,7 @@ enum include_type
IT_MAIN, /* main  */
 
IT_DIRECTIVE_HWM = IT_IMPORT + 1,  /* Directives below this.  */
-   IT_HEADER_HWM = IT_DEFAULT + 1,/* Header files below this.  */
+   IT_HEADER_HWM = IT_DEFAULT + 1 /* Header files below this.  */
   };
 
 union utoken


Re: Modifying types during optimization

2019-10-02 Thread Richard Biener
On Wed, Oct 2, 2019 at 1:43 AM Gary Oblock  wrote:
>
> I'm working on structure reorganization optimizations and one of the
> things that needs to happen is that pointers to arrays of structures
> need to be modified into either being an integer or a structure depending
> on which optimization is required.
>
> I'm seeing something similar happening in omp-low.c where the code in
> install_var_field and fixup_child_record_type both seem to rebuild the
> entire type from scratch if a field is either added or modified. Wouldn't
> it be possible to simply modify the field(s) in question and rerun layout_type?
>
> I suspect the answer will be no but reasons as to why that wouldn't work
> will probably be equally valuable to me.

I think it's undesirable at least.  When last discussing "structure reorg"
I was always arguing that trying to change the "type" is the wrong angle
to look at (likewise computing something like "type escape").  It's
really individual objects you are transforming and that you need to track
so there may very well be instances of the original type T plus the
modified type T' in the program after the transform.
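
A hypothetical illustration of that object-centric view (both layouts
are invented for the example):

  struct T       { long hot; long cold; };  /* original layout */
  struct T_prime { long hot; };             /* reorganized clone, "T'" */

  /* Allocation sites the analysis proved safe produce T_prime
     objects; escaping sites keep producing T.  Both types coexist
     and each object must be tracked individually.  */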

Richard.

>
> Thanks,
>
> Gary Oblock
>


Re: [PATCH] Allow vectorization of __builtin_bswap16 (PR tree-optimization/91940)

2019-10-02 Thread Richard Biener
On Wed, 2 Oct 2019, Jakub Jelinek wrote:

> On Wed, Nov 09, 2016 at 09:14:55AM +0100, Richard Biener wrote:
> > The following implements vectorization of bswap via VEC_PERM_EXPR
> > on the corresponding QImode vector.
> > 
> > ARM already has backend handling via the builtin_vectorized_call
> > hook and thus there were already testcases available.  It doesn't
> > end up working for vect-bswap16.c because we have a promoted
> > argument to __builtin_bswap16 which confuses vectorization.
> 
> Indeed.  The following patch handles that in tree-vect-patterns.c.
> If it sees a __builtin_bswap16 with the promoted argument, it checks if we'd
> vectorize the builtin if it didn't have a promoted argument and if yes,
> it just changes it in a pattern_stmt to use an unpromoted argument or casts
> it first to the right type.  This works e.g. for the SSE4 case.
> Otherwise, it handles __builtin_bswap16 like a x r<< 8, if that is
> vectorizable, emits a pattern_stmt with x r<< 8, if it isn't, falls back to
> (x << 8) | (x >> 8) if that can be vectorized.  The last case matters for 
> SSE2.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2019-10-02  Jakub Jelinek  
> 
>   PR tree-optimization/91940
>   * tree-vect-patterns.c: Include tree-vector-builder.h and
>   vec-perm-indices.h.
>   (vect_recog_rotate_pattern): Also handle __builtin_bswap16, either by
>   unpromoting the argument back to uint16_t, or by converting into a
>   rotate, or into shifts plus ior.
> 
>   * gcc.dg/vect/vect-bswap16.c: Add -msse4 on x86, run on all targets,
>   expect vectorized 1 loops message on both vect_bswap and sse4_runtime
>   targets.
>   * gcc.dg/vect/vect-bswap16a.c: New test.
> 
> --- gcc/tree-vect-patterns.c.jj   2019-09-20 12:25:48.186387075 +0200
> +++ gcc/tree-vect-patterns.c  2019-10-01 11:29:18.229215895 +0200
> @@ -46,6 +46,8 @@ along with GCC; see the file COPYING3.
>  #include "cgraph.h"
>  #include "omp-simd-clone.h"
>  #include "predict.h"
> +#include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
>  
>  /* Return true if we have a useful VR_RANGE range for VAR, storing it
> in *MIN_VALUE and *MAX_VALUE if so.  Note the range in the dump files.  */
> @@ -2168,24 +2170,107 @@ vect_recog_rotate_pattern (stmt_vec_info
>enum vect_def_type dt;
>optab optab1, optab2;
>edge ext_def = NULL;
> +  bool bswap16_p = false;
>  
> -  if (!is_gimple_assign (last_stmt))
> -return NULL;
> -
> -  rhs_code = gimple_assign_rhs_code (last_stmt);
> -  switch (rhs_code)
> +  if (is_gimple_assign (last_stmt))
>  {
> -case LROTATE_EXPR:
> -case RROTATE_EXPR:
> -  break;
> -default:
> -  return NULL;
> +  rhs_code = gimple_assign_rhs_code (last_stmt);
> +  switch (rhs_code)
> + {
> + case LROTATE_EXPR:
> + case RROTATE_EXPR:
> +   break;
> + default:
> +   return NULL;
> + }
> +
> +  lhs = gimple_assign_lhs (last_stmt);
> +  oprnd0 = gimple_assign_rhs1 (last_stmt);
> +  type = TREE_TYPE (oprnd0);
> +  oprnd1 = gimple_assign_rhs2 (last_stmt);
> +}
> +  else if (gimple_call_builtin_p (last_stmt, BUILT_IN_BSWAP16))
> +{
> +  /* __builtin_bswap16 (x) is another form of x r>> 8.
> +  The vectorizer has bswap support, but only if the argument isn't
> +  promoted.  */
> +  lhs = gimple_call_lhs (last_stmt);
> +  oprnd0 = gimple_call_arg (last_stmt, 0);
> +  type = TREE_TYPE (oprnd0);
> +  if (TYPE_PRECISION (TREE_TYPE (lhs)) != 16
> +   || TYPE_PRECISION (type) <= 16
> +   || TREE_CODE (oprnd0) != SSA_NAME
> +   || BITS_PER_UNIT != 8
> +   || !TYPE_UNSIGNED (TREE_TYPE (lhs)))
> + return NULL;
> +
> +  stmt_vec_info def_stmt_info;
> +  if (!vect_is_simple_use (oprnd0, vinfo, &dt, &def_stmt_info, 
> &def_stmt))
> + return NULL;
> +
> +  if (dt != vect_internal_def)
> + return NULL;
> +
> +  if (gimple_assign_cast_p (def_stmt))
> + {
> +   def = gimple_assign_rhs1 (def_stmt);
> +   if (INTEGRAL_TYPE_P (TREE_TYPE (def))
> +   && TYPE_PRECISION (TREE_TYPE (def)) == 16)
> + oprnd0 = def;
> + }
> +
> +  type = TREE_TYPE (lhs);
> +  vectype = get_vectype_for_scalar_type (type);
> +  if (vectype == NULL_TREE)
> + return NULL;
> +
> +  if (tree char_vectype = get_same_sized_vectype (char_type_node, 
> vectype))
> + {
> +   /* The encoding uses one stepped pat

[PATCH] Fix PR91606

2019-10-02 Thread Richard Biener


A middle-end change exposed that the C++ FE handling of pointer-to-member
aggregates in cxx_get_alias_set isn't effective.  The following patch
makes it so by design by marking the structure members as not being
aliased (there can be no pointers to __pfn or __delta) thus
making them behave like fat pointers.
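
For orientation, the aggregate in question conceptually looks like
this (member names follow the Itanium C++ ABI; the sketch is not the
GCC-internal representation):

  struct __ptrmemfunc
  {
    void (*__pfn) ();    /* function pointer or vtable offset  */
    ptrdiff_t __delta;   /* this-pointer adjustment  */
  };

Since no conforming code can take the address of __pfn or __delta,
marking the FIELD_DECLs DECL_NONADDRESSABLE_P lets stores to them
never alias memory reached through ordinary pointers.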

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk and affected branches?

Thanks,
Richard.

2019-10-02  Richard Biener  

PR c++/91606
* decl.c (build_ptrmemfunc_type): Mark pointer-to-member
fat pointer structure members as DECL_NONADDRESSABLE_P.

* g++.dg/torture/pr91606.C: New testcase.

Index: gcc/cp/decl.c
===
--- gcc/cp/decl.c   (revision 276396)
+++ gcc/cp/decl.c   (working copy)
@@ -9554,10 +9554,12 @@ build_ptrmemfunc_type (tree type)
   TYPE_PTRMEMFUNC_FLAG (t) = 1;
 
   field = build_decl (input_location, FIELD_DECL, pfn_identifier, type);
+  DECL_NONADDRESSABLE_P (field) = 1;
   fields = field;
 
   field = build_decl (input_location, FIELD_DECL, delta_identifier, 
  delta_type_node);
+  DECL_NONADDRESSABLE_P (field) = 1;
   DECL_CHAIN (field) = fields;
   fields = field;
 
Index: gcc/testsuite/g++.dg/torture/pr91606.C
===
--- gcc/testsuite/g++.dg/torture/pr91606.C  (revision 0)
+++ gcc/testsuite/g++.dg/torture/pr91606.C  (working copy)
@@ -0,0 +1,109 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fstrict-aliasing" } */
+
+#include <array>
+#include <cstddef>
+#include <cstdlib>
+
+template <typename T1, typename T2>
+struct variant
+{
+  constexpr variant(T1 arg)
+  : f1(arg),
+  index(0)
+  {}
+
+  constexpr variant(T2 arg)
+  : f2(arg),
+  index(1)
+  {}
+
+  union
+{
+  T1 f1;
+  T2 f2;
+};
+  std::size_t index = 0;
+};
+
+template <typename T1, typename T2>
+constexpr const T1* get_if(const variant<T1, T2>* v)
+{
+  if (v->index != 0)
+{
+  return nullptr;
+}
+  return &v->f1;
+}
+
+template <typename T2, typename T1>
+constexpr const T2* get_if(const variant<T1, T2>* v)
+{
+  if (v->index != 1)
+{
+  return nullptr;
+}
+  return &v->f2;
+}
+
+template <typename T, std::size_t N>
+struct my_array
+{
+  constexpr const T* begin() const
+{
+  return data;
+}
+
+  constexpr const T* end() const
+{
+  return data + N;
+}
+
+  T data[N];
+};
+
+template <typename ...Ts>
+constexpr auto get_array_of_variants(Ts ...ptrs)
+{
+  return std::array<variant<Ts...>, sizeof...(Ts)>{ ptrs... };
+}
+
+template <typename T>
+constexpr auto get_member_functions();
+
+template <typename Class, typename Member>
+constexpr int getFuncId(Member (Class::*memFuncPtr))
+{
+  int idx = 0u;
+  for (auto &memberFunc : get_member_functions<Class>())
+{
+  if (auto *specificFunc = get_if<Member (Class::*)>(&memberFunc))
+   {
+ if (*specificFunc == memFuncPtr)
+   {
+ return idx;
+   }
+   }
+  ++idx;
+}
+  std::abort();
+}
+
+struct MyStruct
+{
+  void fun1(int /*a*/) {}
+
+  int fun2(char /*b*/, short /*c*/, bool /*d*/) { return 0; }
+
+};
+
+template <>
+constexpr auto get_member_functions<MyStruct>()
+{
+  return get_array_of_variants(&MyStruct::fun1, &MyStruct::fun2);
+}
+
+int main()
+{
+  return getFuncId(&MyStruct::fun1);
+}


[PATCH] Split out PHI transform from vectorizable_reduction

2019-10-02 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-02  Richard Biener  

* tree-vectorizer.h (stmt_vec_info_type::cycle_phi_info_type):
New.
(vect_transform_cycle_phi): Declare.
* tree-vect-stmts.c (vect_transform_stmt): Call
vect_transform_cycle_phi.
* tree-vect-loop.c (vectorizable_reduction): Split out
PHI transformation stage to ...
(vect_transform_cycle_phi): ... here.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 350cee58246..a3fd011e6c4 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5783,7 +5783,6 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
gimple_stmt_iterator *gsi,
   bool is_simple_use;
   int i;
   int ncopies;
-  stmt_vec_info prev_phi_info;
   bool single_defuse_cycle = false;
   int j;
   tree ops[3];
@@ -5811,207 +5810,15 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
gimple_stmt_iterator *gsi,
 gcc_assert (slp_node
&& REDUC_GROUP_FIRST_ELEMENT (stmt_info) == stmt_info);
 
-  if (gphi *phi = dyn_cast <gphi *> (stmt_info->stmt))
+  if (is_a <gphi *> (stmt_info->stmt))
 {
-  tree phi_result = gimple_phi_result (phi);
   /* Analysis is fully done on the reduction stmt invocation.  */
-  if (! vec_stmt)
-   {
- if (slp_node)
-   slp_node_instance->reduc_phis = slp_node;
-
- STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
- return true;
-   }
-
-  if (STMT_VINFO_REDUC_TYPE (stmt_info) == FOLD_LEFT_REDUCTION)
-   /* Leave the scalar phi in place.  Note that checking
-  STMT_VINFO_VEC_REDUCTION_TYPE (as below) only works
-  for reductions involving a single statement.  */
-   return true;
-
-  stmt_vec_info reduc_stmt_info = STMT_VINFO_REDUC_DEF (stmt_info);
-  reduc_stmt_info = vect_stmt_to_vectorize (reduc_stmt_info);
-
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (reduc_stmt_info)
- == EXTRACT_LAST_REDUCTION)
-   /* Leave the scalar phi in place.  */
-   return true;
-
-  if (gassign *reduc_stmt = dyn_cast <gassign *> (reduc_stmt_info->stmt))
-   for (unsigned k = 1; k < gimple_num_ops (reduc_stmt); ++k)
- {
-   tree op = gimple_op (reduc_stmt, k);
-   if (op == phi_result)
- continue;
-   if (k == 1 && gimple_assign_rhs_code (reduc_stmt) == COND_EXPR)
- continue;
-   bool is_simple_use = vect_is_simple_use (op, loop_vinfo, &dt);
-   gcc_assert (is_simple_use);
-   if (dt == vect_constant_def || dt == vect_external_def)
- continue;
-   if (!vectype_in
-   || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
-   < GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (op)
- vectype_in = get_vectype_for_scalar_type (TREE_TYPE (op));
-   break;
- }
-  /* For a nested cycle we might end up with an operation like
- phi_result * phi_result.  */
-  if (!vectype_in)
-   vectype_in = STMT_VINFO_VECTYPE (stmt_info);
-  gcc_assert (vectype_in);
+  gcc_assert (! vec_stmt);
 
   if (slp_node)
-   {
- /* The size vect_schedule_slp_instance computes is off for us.  */
- vec_num = vect_get_num_vectors
- (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-  * SLP_TREE_SCALAR_STMTS (slp_node).length (), vectype_in);
- ncopies = 1;
-   }
-  else
-   {
- vec_num = 1;
- ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
-   }
-
-  /* Check whether we can use a single PHI node and accumulate
- vectors to one before the backedge.  */
-  stmt_vec_info use_stmt_info;
-  if (ncopies > 1
- && STMT_VINFO_RELEVANT (reduc_stmt_info) <= vect_used_only_live
- && (use_stmt_info = loop_vinfo->lookup_single_use (phi_result))
- && (!STMT_VINFO_IN_PATTERN_P (use_stmt_info)
- || !STMT_VINFO_PATTERN_DEF_SEQ (use_stmt_info))
- && vect_stmt_to_vectorize (use_stmt_info) == reduc_stmt_info)
-   {
- single_defuse_cycle = true;
- ncopies = 1;
-   }
-
-  /* Create the destination vector  */
-  tree vec_dest = vect_create_destination_var (phi_result, vectype_out);
-
-  /* Get the loop-entry arguments.  */
-  tree vec_initial_def;
-  auto_vec<tree> vec_initial_defs;
-  if (slp_node)
-   {
- vec_initial_defs.reserve (vec_num);
- gcc_assert (slp_node == slp_node_instance->reduc_phis);
- stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info);
- tree neutral_op
- = neutral_op_for_slp_reduction (slp_node,
- STMT_VINFO_REDUC_CODE
- (first ? first : reduc_stmt_info),
-  

Re: [PATCH] Add some hash_map_safe_* functions like vec_safe_*.

2019-10-02 Thread Richard Biener
On Tue, Oct 1, 2019 at 8:50 PM Jason Merrill  wrote:
>
> On 10/1/19 3:34 AM, Richard Biener wrote:
> > On Mon, Sep 30, 2019 at 8:33 PM Jason Merrill  wrote:
> >>
> >> My comments accidentally got lost.
> >>
> >> Several places in the front-end (and elsewhere) use the same lazy
> >> allocation pattern for hash_maps, and this patch replaces them with
> >> hash_map_safe_* functions like vec_safe_*.  They don't provide a way
> >> to specify an initial size, but I don't think that's a significant
> >> issue.
> >>
> >> Tested x86_64-pc-linux-gnu.  OK for trunk?
> >
> > You are using create_ggc but the new functions do not indicate that ggc
> > allocation is done.
> > It's then also incomplete with not having a non-ggc variant
> > of them?  Maybe I'm missing something.
>
> Ah, I had been thinking that this lazy pattern would only be used with
> ggc, but I see that I was wrong.  How's this?
>
> Incidentally, now I see another C++11 feature I'd like to be able to
> use: default template arguments for function templates.

I presume

template<typename K, typename V>
inline bool
hash_map_safe_put (hash_map<K, V> *&h, const K& k, const V& v, size_t
size = default_hash_map_size)
{
  return hash_map_maybe_create (h, size)->put (k, v);
}

was deemed too ugly?  IMHO instantiating the templates for different sizes
is unwanted compile-time overhead (plus not being able to use
a default value makes non-default values creep into the code-base?).

I'd have OKed a variant like above, so - would that work for you
(change hash_map_maybe_create as well then, of course).
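
A call site under that signature could then look like this
(hypothetical example, names invented):

  /* The map starts out NULL and is allocated on first insertion,
     mirroring the vec_safe_* idiom.  */
  static hash_map<tree, tree> *seen_decls;

  void
  note_decl (tree decl, tree value)
  {
    hash_map_safe_put (seen_decls, decl, value);
  }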

Thanks,
Richard.


Re: [SVE] PR91532

2019-10-02 Thread Richard Biener
On Wed, 2 Oct 2019, Prathamesh Kulkarni wrote:

> On Wed, 2 Oct 2019 at 01:08, Jeff Law  wrote:
> >
> > On 10/1/19 12:40 AM, Richard Biener wrote:
> > > On Mon, 30 Sep 2019, Prathamesh Kulkarni wrote:
> > >
> > >> On Wed, 25 Sep 2019 at 23:44, Richard Biener  wrote:
> > >>>
> > >>> On Wed, 25 Sep 2019, Prathamesh Kulkarni wrote:
> > >>>
> > >>>> On Fri, 20 Sep 2019 at 15:20, Jeff Law  wrote:
> > >>>>>
> > >>>>> On 9/19/19 10:19 AM, Prathamesh Kulkarni wrote:
> > >>>>>> Hi,
> > >>>>>> For PR91532, the dead store is trivially deleted if we place dse pass
> > >>>>>> between ifcvt and vect. Would it be OK to add another instance of 
> > >>>>>> dse there ?
> > >>>>>> Or should we add an ad-hoc "basic-block dse" sub-pass to ifcvt that
> > >>>>>> will clean up the dead store ?
> > >>>>> I'd hesitate to add another DSE pass.  If there's one nearby could we
> > >>>>> move the existing pass?
> > >>>> Well I think the nearest one is just after pass_warn_restrict. Not
> > >>>> sure if it's a good
> > >>>> idea to move it up from there ?
> > >>>
> > >>> You'll need it inbetween ifcvt and vect so it would be disabled
> > >>> w/o vectorization, so no, that doesn't work.
> > >>>
> > >>> ifcvt already invokes SEME region value-numbering so if we had
> > >>> MESE region DSE it could use that.  Not sure if you feel like
> > >>> refactoring DSE to work on regions - it currently uses a DOM
> > >>> walk which isn't suited for that.
> > >>>
> > >>> if-conversion has a little "local" dead predicate compute removal
> > >>> thingy (not that I like that), eventually it can be enhanced to
> > >>> do the DSE you want?  Eventually it should be moved after the local
> > >>> CSE invocation though.
> > >> Hi,
> > >> Thanks for the suggestions.
> > >> For now, would it be OK to do "dse" on loop header in
> > >> tree_if_conversion, as in the attached patch ?
> > >> The patch does local dse in a new function ifcvt_local_dse instead of
> > >> ifcvt_local_dce, because it needed to be done after RPO VN which
> > >> eliminates:
> > >> Removing dead stmt _ifc__62 = *_55;
> > >> and makes the following store dead:
> > >> *_55 = _ifc__61;
> > >
> > > I suggested trying to move ifcvt_local_dce after RPO VN, you could
> > > try that as independent patch (pre-approved).
> > >
> > > I don't mind the extra walk though.
> > >
> > > What I see as possible issue is that dse_classify_store walks virtual
> > > uses and I'm not sure if the loop exit is a natural boundary for
> > > such walk (eventually the loop header virtual PHI is reached but
> > > there may also be a loop-closed PHI for the virtual operand,
> > > but not necessarily).  So the question is whether to add a
> > > "stop at" argument to dse_classify_store specifying the virtual
> > > use the walk should stop at?
> > I think we want to stop at the block boundary -- aren't the cases we
> > care about here local to a block?
> This version restricts walking in dse_classify_store to basic-block if
> bb_only is true,
> and removes dead stores in ifcvt_local_dce instead of a separate walk.
> Does it look OK ?

As replied to Jeff, please make it trivially work for SESE region walks
by specifying the exit virtual operand to stop on.
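
Sketched, the requested interface change (parameter name illustrative,
remaining parameters elided):

  /* Stop the virtual-use walk once it reaches STOP_AT_VUSE, e.g. the
     virtual operand live on the region exit, so the same code serves
     both basic-block and SESE-region DSE.  */
  static bool
  dse_classify_store (ao_ref *ref, gimple *stmt, tree stop_at_vuse);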

Richard.


Re: [SVE] PR91532

2019-10-02 Thread Richard Biener
On Tue, 1 Oct 2019, Jeff Law wrote:

> On 10/1/19 12:40 AM, Richard Biener wrote:
> > On Mon, 30 Sep 2019, Prathamesh Kulkarni wrote:
> > 
> >> On Wed, 25 Sep 2019 at 23:44, Richard Biener  wrote:
> >>>
> >>> On Wed, 25 Sep 2019, Prathamesh Kulkarni wrote:
> >>>
> >>>> On Fri, 20 Sep 2019 at 15:20, Jeff Law  wrote:
> >>>>>
> >>>>> On 9/19/19 10:19 AM, Prathamesh Kulkarni wrote:
> >>>>>> Hi,
> >>>>>> For PR91532, the dead store is trivially deleted if we place dse pass
> >>>>>> between ifcvt and vect. Would it be OK to add another instance of dse 
> >>>>>> there ?
> >>>>>> Or should we add an ad-hoc "basic-block dse" sub-pass to ifcvt that
> >>>>>> will clean up the dead store ?
> >>>>> I'd hesitate to add another DSE pass.  If there's one nearby could we
> >>>>> move the existing pass?
> >>>> Well I think the nearest one is just after pass_warn_restrict. Not
> >>>> sure if it's a good
> >>>> idea to move it up from there ?
> >>>
> >>> You'll need it inbetween ifcvt and vect so it would be disabled
> >>> w/o vectorization, so no, that doesn't work.
> >>>
> >>> ifcvt already invokes SEME region value-numbering so if we had
> >>> MESE region DSE it could use that.  Not sure if you feel like
> >>> refactoring DSE to work on regions - it currently uses a DOM
> >>> walk which isn't suited for that.
> >>>
> >>> if-conversion has a little "local" dead predicate compute removal
> >>> thingy (not that I like that), eventually it can be enhanced to
> >>> do the DSE you want?  Eventually it should be moved after the local
> >>> CSE invocation though.
> >> Hi,
> >> Thanks for the suggestions.
> >> For now, would it be OK to do "dse" on loop header in
> >> tree_if_conversion, as in the attached patch ?
> >> The patch does local dse in a new function ifcvt_local_dse instead of
> >> ifcvt_local_dce, because it needed to be done after RPO VN which
> >> eliminates:
> >> Removing dead stmt _ifc__62 = *_55;
> >> and makes the following store dead:
> >> *_55 = _ifc__61;
> > 
> > I suggested trying to move ifcvt_local_dce after RPO VN, you could
> > try that as independent patch (pre-approved).
> > 
> > I don't mind the extra walk though.
> > 
> > What I see as possible issue is that dse_classify_store walks virtual
> > uses and I'm not sure if the loop exit is a natural boundary for
> > such walk (eventually the loop header virtual PHI is reached but
> > there may also be a loop-closed PHI for the virtual operand,
> > but not necessarily).  So the question is whether to add a
> > "stop at" argument to dse_classify_store specifying the virtual
> > use the walk should stop at?
> I think we want to stop at the block boundary -- aren't the cases we
> care about here local to a block?

Sure, but I see no reason to not make it trivially work for SESE
regions as well by instead specifying the "exit" virtual-operand.

Richard.


Re: [RFC][SLP] SLP vectorization: vectorize vector constructors

2019-10-01 Thread Richard Biener
On Tue, 1 Oct 2019, Joel Hutton wrote:

> On 01/10/2019 12:07, Joel wrote:
> >
> > SLP vectorization: vectorize vector constructors
> >
> >
> > Currently SLP vectorization can build SLP trees starting from 
> > reductions or from group stores. This patch adds a third starting 
> > point: vector constructors.
> >
> >
> > For the following test case (compiled with -O3 -fno-vect-cost-model):
> >
> >
> > char g_d[1024], g_s1[1024], g_s2[1024];
> > void test_loop(void)
> > {
> >   char *d = g_d, *s1 = g_s1, *s2 = g_s2;
> >
> >
> > for ( int y = 0; y < 128; y++ )
> > {
> >for ( int x = 0; x < 16; x++ )
> >  d[x] = s1[x] + s2[x];
> >d += 16;
> > }
> >
> > }
> >
> >
> > before patch:
> > test_loop:
> > .LFB0:
> > 	.cfi_startproc
> > 	adrp	x0, g_s1
> > 	adrp	x2, g_s2
> > 	add	x3, x0, :lo12:g_s1
> > 	add	x4, x2, :lo12:g_s2
> > 	ldrb	w7, [x2, #:lo12:g_s2]
> > 	ldrb	w1, [x0, #:lo12:g_s1]
> > 	adrp	x0, g_d
> > 	ldrb	w6, [x4, 1]
> > 	add	x0, x0, :lo12:g_d
> > 	ldrb	w5, [x3, 1]
> > 	add	w1, w1, w7
> > 	fmov	s0, w1
> > 	ldrb	w7, [x4, 2]
> > 	add	w5, w5, w6
> > 	ldrb	w1, [x3, 2]
> > 	ldrb	w6, [x4, 3]
> > 	add	x2, x0, 2048
> > 	ins	v0.b[1], w5
> > 	add	w1, w1, w7
> > 	ldrb	w7, [x3, 3]
> > 	ldrb	w5, [x4, 4]
> > 	add	w7, w7, w6
> > 	ldrb	w6, [x3, 4]
> > 	ins	v0.b[2], w1
> > 	ldrb	w8, [x4, 5]
> > 	add	w6, w6, w5
> > 	ldrb	w5, [x3, 5]
> > 	ldrb	w9, [x4, 6]
> > 	add	w5, w5, w8
> > 	ldrb	w1, [x3, 6]
> > 	ins	v0.b[3], w7
> > 	ldrb	w8, [x4, 7]
> > 	add	w1, w1, w9
> > 	ldrb	w11, [x3, 7]
> > 	ldrb	w7, [x4, 8]
> > 	add	w11, w11, w8
> > 	ldrb	w10, [x3, 8]
> > 	ins	v0.b[4], w6
> > 	ldrb	w8, [x4, 9]
> > 	add	w10, w10, w7
> > 	ldrb	w9, [x3, 9]
> > 	ldrb	w7, [x4, 10]
> > 	add	w9, w9, w8
> > 	ldrb	w8, [x3, 10]
> > 	ins	v0.b[5], w5
> > 	ldrb	w6, [x4, 11]
> > 	add	w8, w8, w7
> > 	ldrb	w7, [x3, 11]
> > 	ldrb	w5, [x4, 12]
> > 	add	w7, w7, w6
> > 	ldrb	w6, [x3, 12]
> > 	ins	v0.b[6], w1
> > 	ldrb	w12, [x4, 13]
> > 	add	w6, w6, w5
> > 	ldrb	w5, [x3, 13]
> > 	ldrb	w1, [x3, 14]
> > 	add	w5, w5, w12
> > 	ldrb	w13, [x4, 14]
> > 	ins	v0.b[7], w11
> > 	ldrb	w12, [x4, 15]
> > 	add	w4, w1, w13
> > 	ldrb	w1, [x3, 15]
> > 	add	w1, w1, w12
> > 	ins	v0.b[8], w10
> > 	ins	v0.b[9], w9
> > 	ins	v0.b[10], w8
> > 	ins	v0.b[11], w7
> > 	ins	v0.b[12], w6
> > 	ins	v0.b[13], w5
> > 	ins	v0.b[14], w4
> > 	ins	v0.b[15], w1
> > 	.p2align 3,,7
> > .L2:
> > 	str	q0, [x0], 16
> > 	cmp	x2, x0
> > 	bne	.L2
> > 	ret
> > 	.cfi_endproc
> > .LFE0:
> >
> >
> > After patch:
> >
> >
> > test_loop:
> > .LFB0:
> > 	.cfi_startproc
> > 	adrp	x3, g_s1
> > 	adrp	x2, g_s2
> > 	add	x3, x3, :lo12:g_s1
> > 	add	x2, x2, :lo12:g_s2
> > 	adrp	x0, g_d
> > 	add	x0, x0, :lo12:g_d
> > 	add	x1, x0, 2048
> > 	ldr	q1, [x2]
> > 	ldr	q0, [x3]
> > 	add	v0.16b, v0.16b, v1.16b
> > 	.p2align 3,,7
> > .L2:
> > 	str	q0, [x0], 16
> > 	cmp	x0, x1
> > 	bne	.L2
> > 	ret
> > 	.cfi_endproc
> > .LFE0:
> >
> >
> >
> >
> > bootstrapped and tested on aarch64-none-linux-gnu
> >
> Patch attached:

I think a better place for the loop searching for CONSTRUCTORs is
vect_slp_analyze_bb_1 where I'd put it before the check you remove,
and I'd simply append found CONSTRUCTORs to the grouped_stores
array.  The fixup you do in vectorizable_operation doesn't
belong there either, I'd add a new field to the SLP instance
structure referring to the CONSTRUCTOR stmt and do the fixup
in vect_schedule_slp_instance instead where you can simply
replace the CONSTRUCTOR with the vectorized SSA name then.
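
The detection could look roughly like this (a sketch; the exact
placement and guards would follow vect_slp_analyze_bb_1's existing
structure):

  /* Treat a vector CONSTRUCTOR on the RHS of an assignment as an
     SLP root, just like a grouped store.  */
  if (is_gimple_assign (stmt)
      && gimple_assign_rhs_code (stmt) == CONSTRUCTOR
      && VECTOR_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
    bb_vinfo->grouped_stores.safe_push (stmt_info);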

+   /* Check that the constructor elements are unique.  */
+   FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
+ {
+   tree prev_val;
+   int j;
+   FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), j, 
prev_val)
+   {
+ if (val == prev_val && i!=j)

why's that necessary? (it looks incomplete, also doesn't catch
[duplicate] constants)

You're missing a check that CONSTRUCTOR_NELTS == TYPE_VECTOR_SUBPARTS
(we can have omitted trailing zeros).

What happens if you have a vector constructor that is twice
as large as the machine supports?  The vectorizer will happily
produce a two vector SSA name vectorized result but your
CONSTRUCTOR replacement doesn't work here.  I think this should
be made to work correctly (not give up on that case).
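
Concretely (illustrative): with 128-bit vectors the constructor below
is vectorized as two V4SI defs, so replacing it with a single
vectorized SSA name cannot work.

  typedef int v8si __attribute__ ((vector_size (32)));

  v8si
  f (int a, int b, int c, int d, int e, int g, int h, int i)
  {
    return (v8si) { a, b, c, d, e, g, h, i };
  }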

Thanks,
Richard.


[PATCH] vectorizable_reduction TLC

2019-10-01 Thread Richard Biener


Happened to test this separately.

Committed.

Richard.

2019-10-01  Richard Biener  

* tree-vect-loop.c (vectorizable_reduction): Move variables
to where they are used.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 276401)
+++ gcc/tree-vect-loop.c(working copy)
@@ -5767,7 +5767,6 @@ vectorizable_reduction (stmt_vec_info st
slp_instance slp_node_instance,
stmt_vector_for_cost *cost_vec)
 {
-  tree vec_dest;
   tree scalar_dest;
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
   tree vectype_in = NULL_TREE;
@@ -5778,29 +5777,21 @@ vectorizable_reduction (stmt_vec_info st
   machine_mode vec_mode;
   int op_type;
   optab optab;
-  tree new_temp = NULL_TREE;
   enum vect_def_type dt, cond_reduc_dt = vect_unknown_def_type;
   stmt_vec_info cond_stmt_vinfo = NULL;
   tree scalar_type;
   bool is_simple_use;
   int i;
   int ncopies;
-  stmt_vec_info prev_stmt_info, prev_phi_info;
+  stmt_vec_info prev_phi_info;
   bool single_defuse_cycle = false;
-  stmt_vec_info new_stmt_info = NULL;
   int j;
   tree ops[3];
   enum vect_def_type dts[3];
   bool nested_cycle = false, found_nested_cycle_def = false;
   bool double_reduc = false;
-  basic_block def_bb;
-  class loop * def_stmt_loop;
-  tree def_arg;
-  auto_vec<tree> vec_oprnds0;
-  auto_vec<tree> vec_oprnds1;
-  auto_vec<tree> vec_oprnds2;
   int vec_num;
-  tree def0, tem;
+  tree tem;
   tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
   tree cond_reduc_val = NULL_TREE;
 
@@ -5900,7 +5891,7 @@ vectorizable_reduction (stmt_vec_info st
}
 
   /* Create the destination vector  */
-  vec_dest = vect_create_destination_var (phi_result, vectype_out);
+  tree vec_dest = vect_create_destination_var (phi_result, vectype_out);
 
   /* Get the loop-entry arguments.  */
   tree vec_initial_def;
@@ -6348,15 +6339,16 @@ vectorizable_reduction (stmt_vec_info st
 
   if (nested_cycle)
 {
-  def_bb = gimple_bb (reduc_def_phi);
-  def_stmt_loop = def_bb->loop_father;
-  def_arg = PHI_ARG_DEF_FROM_EDGE (reduc_def_phi,
-   loop_preheader_edge (def_stmt_loop));
+  basic_block def_bb = gimple_bb (reduc_def_phi);
+  class loop *def_stmt_loop = def_bb->loop_father;
+  tree def_arg = PHI_ARG_DEF_FROM_EDGE (reduc_def_phi,
+   loop_preheader_edge 
(def_stmt_loop));
   stmt_vec_info def_arg_stmt_info = loop_vinfo->lookup_def (def_arg);
   if (def_arg_stmt_info
  && (STMT_VINFO_DEF_TYPE (def_arg_stmt_info)
  == vect_double_reduction_def))
 double_reduc = true;
+  gcc_assert (!double_reduc || STMT_VINFO_RELEVANT (stmt_info) == 
vect_used_in_outer_by_reduction);
 }
 
   vect_reduction_type reduction_type
@@ -6670,6 +6662,8 @@ vectorizable_reduction (stmt_vec_info st
   if (code == DOT_PROD_EXPR
   && !types_compatible_p (TREE_TYPE (ops[0]), TREE_TYPE (ops[1])))
 {
+  gcc_unreachable ();
+  /* No testcase for this.  PR49478.  */
   if (TREE_CODE (ops[0]) == INTEGER_CST)
 ops[0] = fold_convert (TREE_TYPE (ops[1]), ops[0]);
   else if (TREE_CODE (ops[1]) == INTEGER_CST)
@@ -6812,7 +6806,15 @@ vectorizable_reduction (stmt_vec_info st
   return true;
 }
 
+
   /* Transform.  */
+  stmt_vec_info new_stmt_info = NULL;
+  stmt_vec_info prev_stmt_info;
+  tree new_temp = NULL_TREE;
+  auto_vec<tree> vec_oprnds0;
+  auto_vec<tree> vec_oprnds1;
+  auto_vec<tree> vec_oprnds2;
+  tree def0;
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
@@ -6836,7 +6838,7 @@ vectorizable_reduction (stmt_vec_info st
 }
 
   /* Create the destination vector  */
-  vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
+  tree vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
 
   prev_stmt_info = NULL;
   prev_phi_info = NULL;


Re: [patch] Extend GIMPLE store merging to throwing stores

2019-10-01 Thread Richard Biener
On Tue, Oct 1, 2019 at 1:05 PM Eric Botcazou  wrote:
>
> [Thanks for the quick review and sorry for the longish delay]
>
> > +/* Return the index number of the landing pad for STMT, if any.  */
> > +
> > +static int
> > +lp_nr_for_store (gimple *stmt)
> > +{
> > +  if (!cfun->can_throw_non_call_exceptions || !cfun->eh)
> > +return 0;
> > +
> > +  if (!stmt_could_throw_p (cfun, stmt))
> > +return 0;
> > +
> > +  return lookup_stmt_eh_lp (stmt);
> > +}
> >
> > Did you add the wrapper as compile-time optimization?  That is,
> > I don't see why simply calling lookup_stmt_eh_lp wouldn't work?
>
> Yes, I added it for C & C++, which both trivially fail the first test.  More
> generally, every additional processing is (directly or indirectly) guarded by
> the conjunction cfun->can_throw_non_call_exceptions && cfun->eh throughout.
>
> > +  /* If the function can throw and catch non-call exceptions, we'll be
> > trying + to merge stores across different basic blocks so we need to
> > first unsplit + the EH edges in order to streamline the CFG of the
> > function.  */ +  if (cfun->can_throw_non_call_exceptions && cfun->eh)
> > +{
> > +  free_dominance_info (CDI_DOMINATORS);
> > +  maybe_remove_unreachable_handlers ();
> > +  changed = unsplit_all_eh ();
> > +  if (changed)
> > +   delete_unreachable_blocks ();
> > +}
> >
> > uh, can unsplitting really result in unreachable blocks or does it
> > merely forget to delete forwarders it made unreachable?
>
> The latter.
>
> > Removing unreachable handlers is also to make things match better?
>
> Nope, only because calculate_dominance_info aborts otherwise below.
>
> > Just wondering how much of this work we could delay to the first
> > store-merging opportunity with EH we find (but I don't care too much
> > about -fnon-call-exceptions).
>
> This block of code is a manual, stripped down ehcleanup pass.
>
> > To isolate the details above maybe move this piece into a helper
> > in tree-eh.c so you also can avoid exporting unsplit_all_eh?
>
> The main point is the unsplitting though so this would trade an explicit call
> for a less implicit one.  But what I could do is to rename unsplit_all_eh into
> unsplit_all_eh_1 and hide the technicalities in a new unsplit_all_eh.

that works for me - the patch is OK with that change.

Thanks,
Richard.

> --
> Eric Botcazou


Re: Store float for pow result test

2019-10-01 Thread Richard Biener
On Tue, Oct 1, 2019 at 10:56 AM Alexandre Oliva  wrote:
>
> Optimizing gcc.dg/torture/pr41094.c, the compiler computes the
> constant value and short-circuits the whole thing.  At -O0, however,
> on 32-bit x86, the call to pow() remains, and the program compares the
> returned value in a stack register, with excess precision, with the
> exact return value expected from pow().  If libm's pow() returns a
> slightly off result, the compare fails.  If the value in the register
> is stored in a separate variable, so it gets rounded to double
> precision, and then compared, the compare passes.
>
> It's not clear that the test was meant to detect libm's reliance on
> rounding off the excess precision, but I guess it wasn't, so I propose
> this slight change that enables it to pass regardless of the slight
> inaccuracy of the C library in use.
>
> Regstrapped on x86_64-linux-gnu, and tested on the affected target.
> Ok to install?

OK.

Richard.

>
> for  gcc/testsuite/ChangeLog
>
> * gcc.dg/torture/pr41094.c: Introduce intermediate variable.
> ---
>  gcc/testsuite/gcc.dg/torture/pr41094.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr41094.c 
> b/gcc/testsuite/gcc.dg/torture/pr41094.c
> index 2a4e9616cbfad..9219a1741a37f 100644
> --- a/gcc/testsuite/gcc.dg/torture/pr41094.c
> +++ b/gcc/testsuite/gcc.dg/torture/pr41094.c
> @@ -13,7 +13,8 @@ double foo(void)
>
>  int main()
>  {
> -  if (foo() != 2.0)
> +  double r = foo ();
> +  if (r != 2.0)
>  abort ();
>return 0;
>  }
>
> --
> Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
> Be the change, be Free!FSF VP & FSF Latin America board member
> GNU Toolchain EngineerFree Software Evangelist
> Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


Re: [PATCH] DWARF array bounds missing from C++ array definitions

2019-10-01 Thread Richard Biener
On Tue, Oct 1, 2019 at 10:51 AM Alexandre Oliva  wrote:
>
> On Sep 26, 2019, Richard Biener  wrote:
>
> > On Thu, Sep 26, 2019 at 4:05 AM Alexandre Oliva  wrote:
>
> > Heh, I don't have one - which usually makes me simply inline the
> > beast into the single caller :P
>
> > Maybe simply have_new_type_for_decl_with_old_die_p?
> > Or new_type_for_die_p?
>
> How about override_type_for_decl_p?

Also good.

OK for trunk (& branches I guess, it's a regression).

Thanks,
Richard.

>
> for  gcc/ChangeLog
>
> * dwarf2out.c (override_type_for_decl_p): New.
> (gen_variable_die): Use it.
>
> for  gcc/testsuite/ChangeLog
>
> * gcc.dg/debug/dwarf2/array-0.c: New.
> * gcc.dg/debug/dwarf2/array-1.c: New.
> * gcc.dg/debug/dwarf2/array-2.c: New.
> * gcc.dg/debug/dwarf2/array-3.c: New.
> * g++.dg/debug/dwarf2/array-0.C: New.
> * g++.dg/debug/dwarf2/array-1.C: New.
> * g++.dg/debug/dwarf2/array-2.C: New.  Based on libstdc++-v3's
> src/c++98/pool_allocator.cc:__pool_alloc_base::_S_heap_size.
> * g++.dg/debug/dwarf2/array-3.C: New.  Based on
> gcc's config/i386/i386-features.c:xlogue_layout::s_instances.
> * g++.dg/debug/dwarf2/array-4.C: New.
> ---
>  gcc/dwarf2out.c |   32 
> ++-
>  gcc/testsuite/g++.dg/debug/dwarf2/array-0.C |   13 +++
>  gcc/testsuite/g++.dg/debug/dwarf2/array-1.C |   13 +++
>  gcc/testsuite/g++.dg/debug/dwarf2/array-2.C |   15 +
>  gcc/testsuite/g++.dg/debug/dwarf2/array-3.C |   20 +
>  gcc/testsuite/g++.dg/debug/dwarf2/array-4.C |   16 ++
>  gcc/testsuite/gcc.dg/debug/dwarf2/array-0.c |   10 
>  gcc/testsuite/gcc.dg/debug/dwarf2/array-1.c |   10 
>  gcc/testsuite/gcc.dg/debug/dwarf2/array-2.c |8 +++
>  gcc/testsuite/gcc.dg/debug/dwarf2/array-3.c |8 +++
>  10 files changed, 144 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-0.C
>  create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-1.C
>  create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-2.C
>  create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-3.C
>  create mode 100644 gcc/testsuite/g++.dg/debug/dwarf2/array-4.C
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-0.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/array-3.c
>
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index cec25fa5fa2b8..a29a200f19814 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -23706,6 +23706,34 @@ local_function_static (tree decl)
>  && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL;
>  }
>
> +/* Return true iff DECL overrides (presumably completes) the type of
> +   OLD_DIE within CONTEXT_DIE.  */
> +
> +static bool
> +override_type_for_decl_p (tree decl, dw_die_ref old_die,
> + dw_die_ref context_die)
> +{
> +  tree type = TREE_TYPE (decl);
> +  int cv_quals;
> +
> +  if (decl_by_reference_p (decl))
> +{
> +  type = TREE_TYPE (type);
> +  cv_quals = TYPE_UNQUALIFIED;
> +}
> +  else
> +cv_quals = decl_quals (decl);
> +
> +  dw_die_ref type_die = modified_type_die (type,
> +  cv_quals | TYPE_QUALS (type),
> +  false,
> +  context_die);
> +
> +  dw_die_ref old_type_die = get_AT_ref (old_die, DW_AT_type);
> +
> +  return type_die != old_type_die;
> +}
> +
>  /* Generate a DIE to represent a declared data object.
> Either DECL or ORIGIN must be non-null.  */
>
> @@ -23958,7 +23986,9 @@ gen_variable_die (tree decl, tree origin, dw_die_ref 
> context_die)
>   && !DECL_ABSTRACT_P (decl_or_origin)
>   && variably_modified_type_p (TREE_TYPE (decl_or_origin),
>decl_function_context
> -   (decl_or_origin
> +  (decl_or_origin)))
> +  || (old_die && specialization_p
> + && override_type_for_decl_p (decl_or_origin, old_die, context_die)))
>  {
>tree type = TREE_TYPE (decl_or_origin);
>
> diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/array-0.C 
> b/gcc/testsuite/g++.dg/debug/dwarf2/array-0.C
> new file mode 100644
> index 0..a3458bd0d32a4
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg

Re: [C][C++] Avoid exposing internal details in aka types

2019-10-01 Thread Richard Biener
On Mon, Sep 30, 2019 at 3:21 PM Richard Sandiford
 wrote:
>
> The current aka diagnostics can sometimes leak internal details that
> seem more likely to be distracting than useful.  E.g. on aarch64:
>
>   void f (va_list *va) { *va = 1; }
>
> gives:
>
>   incompatible types when assigning to type ‘va_list’ {aka ‘__va_list’} from 
> type ‘int’
>
> where __va_list isn't something the user is expected to know about.
> A similar thing happens for C++ on the arm_neon.h-based:
>
>   float x;
>   int8x8_t y = x;
>
> which gives:
>
>   cannot convert ‘float’ to ‘int8x8_t’ {aka ‘__Int8x8_t’} in initialization
>
> This is accurate -- and __Int8x8_t is defined by the AArch64 PCS --
> but it's not going to be meaningful to most users.
>
> This patch stops the aka code looking through typedefs if all of
> the following are true:
>
> (1) the typedef is built into the compiler or comes from a system header
>
> (2) the target of the typedef is anonymous or has a name in the
> implementation namespace
>
> (3) the target type is a tag type or vector type, which have in common that:
> (a) we print their type names if they have one
> (b) what we print for anonymous types isn't all that useful
> ("struct " etc. for tag types, pseudo-C "__vector(N) T"
> for vector types)
>
> The C side does this by recursively looking for the aka type, like the
> C++ side already does.  This in turn makes "aka" work for distinct type
> copies like __Int8x8_t on aarch64, fixing the ??? in aarch64/diag_aka_1.c.
>
> On the C++ side, strip_typedefs had:
>
>   /* Explicitly get the underlying type, as TYPE_MAIN_VARIANT doesn't
>  strip typedefs with attributes.  */
>   result = TYPE_MAIN_VARIANT (DECL_ORIGINAL_TYPE (TYPE_NAME (t)));
>   result = strip_typedefs (result);
>
> Applying TYPE_MAIN_VARIANT predates the strip_typedefs call, with the
> comment originally contrasting with plain:
>
>   result = TYPE_MAIN_VARIANT (t);
>
> But the recursive call to strip_typedefs will apply TYPE_MAIN_VARIANT,
> so it doesn't seem necessary to do it here too.  I think there was also
> a missing "remove_attributes" argument, since wrapping something in a
> typedef shouldn't change which attributes get removed.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

In other contexts (debug info) we also look at DECL_ARTIFICIAL;
I'm not sure whether that is set on compiler-generated TYPE_DECLs.

> Richard
>
>
> 2019-09-30  Richard Sandiford  
>
> gcc/c-family/
> * c-common.h (user_facing_original_type_p): Declare.
> * c-common.c (user_facing_original_type_p): New function.
>
> gcc/c/
> * c-objc-common.c (useful_aka_type_p): Replace with...
> (get_aka_type): ...this new function.  Given the original type,
> decide which aka type to print (if any).  Only look through typedefs
> if user_facing_original_type_p.
> (print_type): Update accordingly.
>
> gcc/cp/
> * cp-tree.h (STF_USER_VISIBLE): New constant.
> (strip_typedefs, strip_typedefs_expr): Take a flags argument.
> * tree.c (strip_typedefs, strip_typedefs_expr): Likewise,
> updating mutual calls accordingly.  When STF_USER_VISIBLE is true,
> only look through typedefs if user_facing_original_type_p.
> * error.c (dump_template_bindings, type_to_string): Pass
> STF_USER_VISIBLE to strip_typedefs.
> (dump_type): Likewise, unless pp_c_flag_gnu_v3 is set.
>
> gcc/testsuite/
> * g++.dg/diagnostic/aka5.h: New test.
> * g++.dg/diagnostic/aka5a.C: Likewise.
> * g++.dg/diagnostic/aka5b.C: Likewise.
> * g++.target/aarch64/diag_aka_1.C: Likewise.
> * gcc.dg/diag-aka-5.h: Likewise.
> * gcc.dg/diag-aka-5a.c: Likewise.
> * gcc.dg/diag-aka-5b.c: Likewise.
> * gcc.target/aarch64/diag_aka_1.c (f): Expect an aka to be printed
> for myvec.
>
> Index: gcc/c-family/c-common.h
> ===
> --- gcc/c-family/c-common.h 2019-09-30 13:54:16.0 +0100
> +++ gcc/c-family/c-common.h 2019-09-30 14:16:45.002103890 +0100
> @@ -1063,6 +1063,7 @@ extern tree builtin_type_for_size (int,
>  extern void c_common_mark_addressable_vec (tree);
>
>  extern void set_underlying_type (tree);
> +extern bool user_facing_original_type_p (const_tree);
>  extern void record_types_used_by_current_var_decl (tree);
>  extern vec *make_tree_vector (void);
>  extern void release_tree_vector (vec *);
> Index: gcc/c-family/c-common.c
> ===
> --- gcc/c-family/c-common.c 2019-09-30 13:54:16.0 +0100
> +++ gcc/c-family/c-common.c 2019-09-30 14:16:45.002103890 +0100
> @@ -7713,6 +7713,55 @@ set_underlying_type (tree x)
>  }
>  }
>
> +/* Return true if it is worth exposing the DECL_ORIGINAL_TYPE of TYPE to
> +   the user in diagnostics, false if it would 

Re: [PATCH] Add some hash_map_safe_* functions like vec_safe_*.

2019-10-01 Thread Richard Biener
On Mon, Sep 30, 2019 at 8:33 PM Jason Merrill  wrote:
>
> My comments accidentally got lost.
>
> Several places in the front-end (and elsewhere) use the same lazy
> allocation pattern for hash_maps, and this patch replaces them with
> hash_map_safe_* functions like vec_safe_*.  They don't provide a way
> to specify an initial size, but I don't think that's a significant
> issue.
>
> Tested x86_64-pc-linux-gnu.  OK for trunk?

You are using create_ggc, but the new functions' names do not indicate
that ggc allocation is done.  It then also seems incomplete in not having
a non-ggc variant of them?  Maybe I'm missing something.

Thanks,
Richard.

> On Mon, Sep 30, 2019 at 2:30 PM Jason Merrill  wrote:
> >
> > gcc/
> > * hash-map.h (default_size): Put in member variable.
> > (create_ggc): Use it as default argument.
> > (hash_map_maybe_create, hash_map_safe_get)
> > (hash_map_safe_get_or_insert, hash_map_safe_put): New fns.
> > gcc/cp/
> > * constexpr.c (maybe_initialize_fundef_copies_table): Remove.
> > (get_fundef_copy): Use hash_map_safe_get_or_insert.
> > * cp-objcp-common.c (cp_get_debug_type): Use hash_map_safe_*.
> > * decl.c (store_decomp_type): Remove.
> > (cp_finish_decomp): Use hash_map_safe_put.
> > * init.c (get_nsdmi): Use hash_map_safe_*.
> > * pt.c (store_defaulted_ttp, lookup_defaulted_ttp): Remove.
> > (add_defaults_to_ttp): Use hash_map_safe_*.
> > ---
> >  gcc/hash-map.h   | 38 --
> >  gcc/cp/constexpr.c   | 14 ++
> >  gcc/cp/cp-objcp-common.c |  6 ++
> >  gcc/cp/decl.c|  9 +
> >  gcc/cp/init.c|  9 ++---
> >  gcc/cp/pt.c  | 21 +++--
> >  gcc/hash-table.c |  2 +-
> >  7 files changed, 47 insertions(+), 52 deletions(-)
> >
> > diff --git a/gcc/hash-map.h b/gcc/hash-map.h
> > index ba20fe79f23..e638f761465 100644
> > --- a/gcc/hash-map.h
> > +++ b/gcc/hash-map.h
> > @@ -128,8 +128,9 @@ class GTY((user)) hash_map
> > }
> >};
> >
> > +  static const size_t default_size = 13;
> >  public:
> > -  explicit hash_map (size_t n = 13, bool ggc = false,
> > +  explicit hash_map (size_t n = default_size, bool ggc = false,
> >  bool sanitize_eq_and_hash = true,
> >  bool gather_mem_stats = GATHER_STATISTICS
> >  CXX_MEM_STAT_INFO)
> > @@ -146,7 +147,7 @@ public:
> >HASH_MAP_ORIGIN PASS_MEM_STAT) {}
> >
> >/* Create a hash_map in ggc memory.  */
> > -  static hash_map *create_ggc (size_t size,
> > +  static hash_map *create_ggc (size_t size = default_size,
> >bool gather_mem_stats = GATHER_STATISTICS
> >CXX_MEM_STAT_INFO)
> >  {
> > @@ -326,4 +327,37 @@ gt_pch_nx (hash_map<K, V, H> *h, gt_pointer_operator
> > op, void *cookie)
> >    op (&h->m_table.m_entries, cookie);
> >  }
> >
> > +template<typename K, typename V, typename H>
> > +inline hash_map<K, V, H> *
> > +hash_map_maybe_create (hash_map<K, V, H> *&h)
> > +{
> > +  if (!h)
> > +    h = h->create_ggc ();
> > +  return h;
> > +}
> > +
> > +/* Like h->get, but handles null h.  */
> > +template<typename K, typename V, typename H>
> > +inline V*
> > +hash_map_safe_get (hash_map<K, V, H> *h, const K& k)
> > +{
> > +  return h ? h->get (k) : NULL;
> > +}
> > +
> > +/* Like h->get_or_insert, but handles null h.  */
> > +template<typename K, typename V, typename H>
> > +inline V&
> > +hash_map_safe_get_or_insert (hash_map<K, V, H> *&h, const K& k, bool *e = NULL)
> > +{
> > +  return hash_map_maybe_create (h)->get_or_insert (k, e);
> > +}
> > +
> > +/* Like h->put, but handles null h.  */
> > +template<typename K, typename V, typename H>
> > +inline bool
> > +hash_map_safe_put (hash_map<K, V, H> *&h, const K& k, const V& v)
> > +{
> > +  return hash_map_maybe_create (h)->put (k, v);
> > +}
> > +
> >  #endif
> > diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
> > index cb5484f4b72..904b70a9c99 100644
> > --- a/gcc/cp/constexpr.c
> > +++ b/gcc/cp/constexpr.c
> > @@ -1098,15 +1098,6 @@ maybe_initialize_constexpr_call_table (void)
> >
> >  static GTY(()) hash_map<tree, tree> *fundef_copies_table;
> >
> > -/* Initialize FUNDEF_COPIES_TABLE if it's not initialized.  */
> > -
> > -static void
> > -maybe_initialize_fundef_copies_table ()
> > -{
> > -  if (fundef_copies_table == NULL)
> > -    fundef_copies_table = hash_map<tree, tree>::create_ggc (101);
> > -}
> > -
> >  /* Reuse a copy or create a new unshared copy of the function FUN.
> > Return this copy.  We use a TREE_LIST whose PURPOSE is body, VALUE
> > is parms, TYPE is result.  */
> > @@ -1114,11 +1105,10 @@ maybe_initialize_fundef_copies_table ()
> >  static tree
> >  get_fundef_copy (constexpr_fundef *fundef)
> >  {
> > -  maybe_initialize_fundef_copies_table ();
> > -
> >tree copy;
> >bool existed;
> > -  tree *slot = &fundef_copies_table->get_or_insert (fundef->decl, &existed);
> > +  tree *slot = &hash_map_safe_get_or_insert (fundef_copies_table,
> > +					     fundef->decl, &existed);
> >
> >if (!existed)
> >  {
> > 
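
To make the intended usage concrete, a sketch (hypothetical table and
caller, not code from the patch):

  static GTY(()) hash_map<tree, tree> *my_table;

  void
  record (tree key, tree value)
  {
    /* Before: each caller guards the allocation by hand.  */
    if (!my_table)
      my_table = hash_map<tree, tree>::create_ggc (13);
    my_table->put (key, value);
  }

where the body collapses to a single

  hash_map_safe_put (my_table, key, value);

which also allocates the map in ggc memory on first use; hence the
review comment above about the name not saying so.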

Re: [PATCH] ifcvt: improve cost estimation (PR 87047)

2019-10-01 Thread Richard Biener
On Mon, Sep 30, 2019 at 7:51 PM Alexander Monakov  wrote:
>
> On Mon, 30 Sep 2019, Alexander Monakov wrote:
>
> > +static unsigned
> > +average_cost (unsigned then_cost, unsigned else_cost, edge e)
> > +{
> > +  return else_cost + e->probability.apply ((int) then_cost - else_cost);
>
> Ugh, I made a wrong last-minute edit here: we want a signed cost
> difference, so the argument to probability.apply should be
>
>   (int) (then_cost - else_cost)
>
> or
>
>   (int) then_cost - (int) else_cost.
>
> The patch I bootstrapped and passed Martin for testing correctly had
>
>   (gcov_type) then_cost - else_cost
>
> (gcov_type is int64).

OK for trunk with that fixed.  Not OK for backports.

Thanks,
Richard.

> Alexander
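
Concretely (an illustration, assuming 32-bit unsigned costs): with
then_cost = 2 and else_cost = 5,

  (int) then_cost - else_cost         /* 2u - 5u = 4294967293u */
  (int) (then_cost - else_cost)       /* wraps, converts back: -3 */
  (gcov_type) then_cost - else_cost   /* 64-bit signed: -3 */

so the mistaken form would feed a huge positive value to
probability.apply.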


Re: [SVE] PR91532

2019-10-01 Thread Richard Biener
On Mon, 30 Sep 2019, Prathamesh Kulkarni wrote:

> On Wed, 25 Sep 2019 at 23:44, Richard Biener  wrote:
> >
> > On Wed, 25 Sep 2019, Prathamesh Kulkarni wrote:
> >
> > > On Fri, 20 Sep 2019 at 15:20, Jeff Law  wrote:
> > > >
> > > > On 9/19/19 10:19 AM, Prathamesh Kulkarni wrote:
> > > > > Hi,
> > > > > For PR91532, the dead store is trivially deleted if we place a dse
> > > > > pass between ifcvt and vect.  Would it be OK to add another instance
> > > > > of dse there?
> > > > > Or should we add an ad-hoc "basic-block dse" sub-pass to ifcvt that
> > > > > will clean up the dead store?
> > > > I'd hesitate to add another DSE pass.  If there's one nearby could we
> > > > move the existing pass?
> > > Well, I think the nearest one is just after pass_warn_restrict.  Not
> > > sure if it's a good idea to move it up from there?
> >
> > You'll need it in between ifcvt and vect, so it would be disabled
> > w/o vectorization; so no, that doesn't work.
> >
> > ifcvt already invokes SEME region value-numbering so if we had
> > MESE region DSE it could use that.  Not sure if you feel like
> > refactoring DSE to work on regions - it currently uses a DOM
> > walk which isn't suited for that.
> >
> > if-conversion has a little "local" dead predicate compute removal
> > thingy (not that I like that); eventually it can be enhanced to
> > do the DSE you want?  Eventually it should be moved after the local
> > CSE invocation though.
> Hi,
> Thanks for the suggestions.
> For now, would it be OK to do "dse" on the loop header in
> tree_if_conversion, as in the attached patch?
> The patch does local dse in a new function ifcvt_local_dse instead of
> ifcvt_local_dce, because it needed to be done after RPO VN, which
> eliminates:
> Removing dead stmt _ifc__62 = *_55;
> and makes the following store dead:
> *_55 = _ifc__61;

I suggested trying to move ifcvt_local_dce after RPO VN; you could
try that as an independent patch (pre-approved).

I don't mind the extra walk though.

What I see as a possible issue is that dse_classify_store walks virtual
uses, and I'm not sure the loop exit is a natural boundary for such a
walk (eventually the loop header virtual PHI is reached, but there may
also be a loop-closed PHI for the virtual operand, though not
necessarily).  So the question is whether to add a "stop at" argument
to dse_classify_store specifying the virtual use the walk should
stop at.
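
Something like the following is what I have in mind, as a sketch only
(the exact signature and names are hypothetical):

  static dse_store_status
  dse_classify_store (ao_ref *ref, gimple *stmt,
                      bool byte_tracking_enabled, sbitmap live_bytes,
                      bool *by_clobber_p, tree stop_at_vuse)
  {
    ...
    /* In the virtual-use walk: treat reaching the caller-specified
       boundary as the store being live.  */
    if (vuse == stop_at_vuse)
      return DSE_STORE_LIVE;
    ...
  }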

Thanks,
Richard.


Re: PPC64 libmvec implementation of sincos

2019-09-30 Thread Richard Biener
On September 30, 2019 3:52:52 PM GMT+02:00, Szabolcs Nagy 
 wrote:
>On 27/09/2019 20:23, GT wrote:
>> I am attempting to create a vector version of sincos for PPC64.
>> The relevant discussion thread is on the GLIBC libc-alpha mailing
>> list.  Navigate it beginning at
>> https://sourceware.org/ml/libc-alpha/2019-09/msg00334.html
>> 
>> The intention is to reuse as much as possible from the existing GCC
>> implementation of other libmvec functions.
>> My questions are: which function(s) in GCC
>> 
>> 1. Gather scalar function input arguments, from multiple loop
>> iterations, into a single vector input argument for the vector
>> function version?
>> 2. Distribute scalar function outputs, to the appropriate loop
>> iteration results, from the single vector function output result?
>> 
>> I am referring especially to vectorization of sin and cos.
>
>I wonder if GCC can auto-vectorize scalar sincos
>calls; the vectorizer seems to want the calls to
>have no side effects, but attribute pure or const
>is not appropriate for sincos (which has no return
>value but takes writable pointer args).

We have __builtin_cexpi for that, but I'm not sure any of the mechanisms
can provide a mapping to a vectorized variant.
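
The existing scalar transform rewrites, roughly,

  sincos (x, &s, &c);

into

  _Complex double tmp = __builtin_cexpi (x);
  s = __imag tmp;
  c = __real tmp;

(a sketch of the idea, not the exact GIMPLE), so a vector variant would
need a vectorized cexpi to map to.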

>"#pragma omp simd" on a loop seems to work but i
>could not get unannotated sincos loops to vectorize.
>
>it seems it would be nice if we could add pure/const
>somehow (maybe to the simd variant only? afaik openmp
>requires no sideeffects for simd variants, but that's
>probably only for explicitly marked loops?)
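
A minimal form of the loop in question (illustrative only; whether it
vectorizes depends on the target and on a suitable libmvec variant
being declared):

  #define _GNU_SOURCE
  #include <math.h>

  void
  f (double *s, double *c, const double *x, int n)
  {
  #pragma omp simd
    for (int i = 0; i < n; i++)
      sincos (x[i], &s[i], &c[i]);
  }

compiled with e.g. -O2 -fopenmp-simd; without the pragma the writable
pointer arguments make the calls look side-effecting and the loop
stays scalar, as described above.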


