Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)

2014-11-19 Thread Jakub Jelinek
On Tue, Nov 18, 2014 at 04:34:12PM -0800, Mike Stump wrote:
 On Nov 18, 2014, at 3:42 PM, Jakub Jelinek ja...@redhat.com wrote:
  No, I'm not touching tmp array at all in that case, only look at vp
  individual bytes.  Either they are all 0, or all 0xff, or I return NULL.
 
 Oh, sorry, I misread where the break; in your code was going.  I might have 
 been misled by:
 
  -   gcc_assert (GET_MODE_PRECISION (outer_submode)
  -   = MAX_BITSIZE_MODE_ANY_INT);
 
 in your patch.  You don’t need that anymore, do you?  If not, can you remove 
 this part.

I thought the assert is unnecessary given the condition just a few lines
above it.  But can keep it, perhaps gcc_checking_assert would be enough, and
hopefully compiler optimizes it away completely.

 
 The rest looks like normal rtl/vector code, I don’t see anything wrong with 
 it.

Jakub


Re: [PATCH, testsuite] Add bind_pic_locally to certain tests

2014-11-19 Thread Andreas Schwab
* gcc.dg/pure-2.c: Update line numbers.

diff --git a/gcc/testsuite/gcc.dg/pure-2.c b/gcc/testsuite/gcc.dg/pure-2.c
index 638bd7c..fe6e2bc 100644
--- a/gcc/testsuite/gcc.dg/pure-2.c
+++ b/gcc/testsuite/gcc.dg/pure-2.c
@@ -8,14 +8,14 @@ extern int v;
 /* Trivial.  */
 int
 foo1(int a)  /* { dg-bogus normally detect pure candidate } */
-{ /* { dg-warning pure detect pure candidate { target *-*-* } 9 } */
+{ /* { dg-warning pure detect pure candidate { target *-*-* } 10 } */
   return v;
 }
 
 /* Loops known to be normally and extern const calls should be safe.  */
 int __attribute__ ((noinline))
 foo2(int n)  /* { dg-bogus normally detect pure candidate } */
-{ /* { dg-warning pure detect pure candidate { target *-*-* } 16 } */
+{ /* { dg-warning pure detect pure candidate { target *-*-* } 17 } */
   int ret = 0;
   int i;
   for (i=0; in; i++)
@@ -53,6 +53,6 @@ foo4(int n)  /* { dg-warning pure\[^\n\]* normally detect 
pure candidate } *
 
 int
 foo5(int n)  /* { dg-bogus normally detect pure candidate } */
-{  /* { dg-warning pure detect pure candidate { target *-*-* } 54 } */
+{  /* { dg-warning pure detect pure candidate { target *-*-* } 55 } */
   return foo2(n);
 }
-- 
2.1.3


-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.


Re: [PATCH, PR62167] Fix tail-merge pass for dead type-unsafe code

2014-11-19 Thread Richard Biener
On Tue, 18 Nov 2014, Tom de Vries wrote:

 Richard,
 
 this (trunk) patch fixes PR62167.
 
 The patch fixes a problem that triggers with the test-case on the 4.8 branch,
 when tail-merge makes a dead type-unsafe load alive.
 
 I'm not able to reproduce this bug on 4.9 and trunk with the same test-case.
 On those branches, the tail-merge already does not happen.
 
 The reason for the difference is as follows: With 4.8 the two phi arguments of
 the phi in the tail block are value-numbered identically:
 ...
 SCC consists of: p_14
 Value numbering p_14 stmt = p_14 = MEM[(struct head *)_13].first;
 Setting value number of p_14 to p_14 (changed)
 
 SCC consists of: p_15
 Value numbering p_15 stmt = p_15 = _13-next;
 Setting value number of p_15 to p_14 (changed)
 ...
 
 With 4.9 (and trunk), that's not the case:
 ...
 SCC consists of: p_14
 Value numbering p_14 stmt = p_14 = MEM[(struct head *)heads][k.1_9].first;
 Setting value number of p_14 to p_14 (changed)
 
 SCC consists of: p_15
 Value numbering p_15 stmt = p_15 = _13-next;
 Setting value number of p_15 to p_15 (changed)
 ...
 
 I'm not sure the bug triggers on trunk and 4.9, but I see no reason why it
 could not trigger, so I'd prefer to apply the patch to 4.9 and trunk as well.
 
 The patch introduces an xfail for pr51879-12.c. I can follow up with a patch
 to improve upon that, but I think that's better limited to trunk only.

Yeah.

 Bootstrapped and reg-tested on x86_64/trunk.
 
 OK for trunk/stage3, 4.8, 4.9?

Ok.  Note that this goes well with disabling the (re-)use of 
value-numbering in tail-merging (which causes me quite some headaches...).
On trunk it shouldn't be necessary as much as it was earlier as we now
fully propagate constants and copies during FRE/PRE elimination.

Thanks,
Richard.


Re: [PATCH] Fix -fsanitize=bool -fnon-call-exceptions (PR sanitizer/63913)

2014-11-19 Thread Richard Biener
On Tue, 18 Nov 2014, Jakub Jelinek wrote:

 Hi!
 
 This patch fixes instrumentation of bool/enum loads if they could
 throw.  We want to keep the EH on the load, and push
 further statements on the fallthru edge.
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
 
 2014-11-18  Jakub Jelinek  ja...@redhat.com
 
   PR sanitizer/63913
   * ubsan.c: Include tree-eh.h.
   (instrument_bool_enum_load): Handle loads that can throw.
 
   * g++.dg/ubsan/pr63913.C: New test.
 
 --- gcc/ubsan.c.jj2014-11-14 15:11:49.0 +0100
 +++ gcc/ubsan.c   2014-11-18 09:29:20.409209556 +0100
 @@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.
  #include dfp.h
  #include builtins.h
  #include tree-object-size.h
 +#include tree-eh.h
  
  /* Map from a tree to a VAR_DECL tree.  */
  
 @@ -1135,7 +1136,9 @@ instrument_bool_enum_load (gimple_stmt_i
|| TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
  return;
  
 +  bool can_throw = stmt_could_throw_p (stmt);
location_t loc = gimple_location (stmt);
 +  tree lhs = gimple_assign_lhs (stmt);
tree ptype = build_pointer_type (TREE_TYPE (rhs));
tree atype = reference_alias_ptr_type (rhs);
gimple g = gimple_build_assign (make_ssa_name (ptype, NULL),
 @@ -1145,9 +1148,24 @@ instrument_bool_enum_load (gimple_stmt_i
tree mem = build2 (MEM_REF, utype, gimple_assign_lhs (g),
build_int_cst (atype, 0));
tree urhs = make_ssa_name (utype, NULL);
 -  g = gimple_build_assign (urhs, mem);
 -  gimple_set_location (g, loc);
 -  gsi_insert_before (gsi, g, GSI_SAME_STMT);
 +  if (can_throw)
 +{
 +  gimple_assign_set_lhs (stmt, urhs);
 +  g = gimple_build_assign_with_ops (NOP_EXPR, lhs, urhs, NULL_TREE);
 +  gimple_set_location (g, loc);
 +  edge e = find_fallthru_edge (gimple_bb (stmt)-succs);
 +  gsi_insert_on_edge_immediate (e, g);
 +  gimple_assign_set_rhs_with_ops (gsi, MEM_REF, mem, NULL_TREE);

Please use gimple_assign_set_rhs_from_tree for single-rhs trees.

Ok with that change.

Thanks,
RIchard.

 +  update_stmt (stmt);
 +  *gsi = gsi_for_stmt (g);
 +  g = stmt;
 +}
 +  else
 +{
 +  g = gimple_build_assign (urhs, mem);
 +  gimple_set_location (g, loc);
 +  gsi_insert_before (gsi, g, GSI_SAME_STMT);
 +}
minv = fold_convert (utype, minv);
maxv = fold_convert (utype, maxv);
if (!integer_zerop (minv))
 @@ -1169,8 +1187,11 @@ instrument_bool_enum_load (gimple_stmt_i
gimple_set_location (g, loc);
gsi_insert_after (gsi, g, GSI_NEW_STMT);
  
 -  gimple_assign_set_rhs_with_ops (gsi2, NOP_EXPR, urhs, NULL_TREE);
 -  update_stmt (stmt);
 +  if (!can_throw)
 +{
 +  gimple_assign_set_rhs_with_ops (gsi2, NOP_EXPR, urhs, NULL_TREE);
 +  update_stmt (stmt);
 +}
  
gsi2 = gsi_after_labels (then_bb);
if (flag_sanitize_undefined_trap_on_error)
 --- gcc/testsuite/g++.dg/ubsan/pr63913.C.jj   2014-11-18 09:34:10.603981487 
 +0100
 +++ gcc/testsuite/g++.dg/ubsan/pr63913.C  2014-11-18 09:33:47.0 
 +0100
 @@ -0,0 +1,12 @@
 +// PR sanitizer/63913
 +// { dg-do compile }
 +// { dg-options -fsanitize=bool -fnon-call-exceptions }
 +
 +struct B { B (); ~B (); };
 +
 +double
 +foo (bool *x)
 +{
 +  B b;
 +  return *x;
 +}
 
   Jakub
 
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany


Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)

2014-11-19 Thread Richard Biener
On Tue, 18 Nov 2014, Jakub Jelinek wrote:

 Hi!
 
 OImode/XImode on i?86/x86_64 are not = MAX_BITSIZE_MODE_ANY_INT, because
 they are never used for integer arithmetics (and there is no way
 to represent all their values in RTL if not using CONST_WIDE_INT).
 As the following testcase shows, simplify_immed_subreg can be called
 with such modes though, e.g. trying to forward propagate a CONST_VECTOR
 (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can appear
 in the IL directly) into a SUBREG_REG.

So we have (subreg:OI (reg:V4DF ...) ...) for example?  What do we
end doing with that OI mode subreg?  (why can't we simply use the
V4DF one?)

 The following patch instead of ICE handles the most common cases (all 0
 and all 1 CONST_VECTORs) and returns NULL otherwise.
 
 Before wide-int got merged, the testcase worked, though the code didn't
 bother checking anything, just created 0 or constm1_rtx for the two cases
 that could happen and if something else appeared, could just return what
 matched low TImode (or DImode for -m32).
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Looks ok to me.  Not sure if the zero/all-ones case really happens that
much to be worth special-casing - I think you could use
fixed_wide_intproper-size and simply see if the result is representable
in a CONST_INT/CONST_DOUBLE?  Can you try if that works?  It looks like
the patch may be smaller for that.

Thanks,
Richard.

 2014-11-18  Jakub Jelinek  ja...@redhat.com
 
   PR target/63910
   * simplify-rtx.c (simplify_immed_subreg): For integer modes
   wider than MAX_BITSIZE_MODE_ANY_INT, handle all zeros and all ones
   and for other values return NULL_RTX.
 
   * gcc.target/i386/pr63910.c: New test.
 
 --- gcc/simplify-rtx.c.jj 2014-11-11 00:06:19.0 +0100
 +++ gcc/simplify-rtx.c2014-11-18 10:17:01.198668555 +0100
 @@ -5505,6 +5505,21 @@ simplify_immed_subreg (machine_mode oute
   HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / 
 HOST_BITS_PER_WIDE_INT];
   wide_int r;
  
 + if (GET_MODE_PRECISION (outer_submode)  MAX_BITSIZE_MODE_ANY_INT)
 +   {
 + /* Handle just all zeros and all ones CONST_VECTORs in
 +this case.  */
 + if ((vp[0]  value_mask) == 0)
 +   elems[elem] = const0_rtx;
 + else if ((vp[0]  value_mask) == value_mask)
 +   elems[elem] = constm1_rtx;
 + else
 +   return NULL_RTX;
 + for (i = value_bit; i  elem_bitsize; i += value_bit)
 +   if ((vp[i / value_bit]  value_mask) != (vp[0]  value_mask))
 + return NULL_RTX;
 + break;
 +   }
   for (u = 0; u  units; u++)
 {
   unsigned HOST_WIDE_INT buf = 0;
 @@ -5516,8 +5531,6 @@ simplify_immed_subreg (machine_mode oute
   tmp[u] = buf;
   base += HOST_BITS_PER_WIDE_INT;
 }
 - gcc_assert (GET_MODE_PRECISION (outer_submode)
 - = MAX_BITSIZE_MODE_ANY_INT);
   r = wide_int::from_array (tmp, units,
 GET_MODE_PRECISION (outer_submode));
   elems[elem] = immed_wide_int_const (r, outer_submode);
 --- gcc/testsuite/gcc.target/i386/pr63910.c.jj2014-11-18 
 10:12:24.282659318 +0100
 +++ gcc/testsuite/gcc.target/i386/pr63910.c   2014-11-18 10:12:00.0 
 +0100
 @@ -0,0 +1,12 @@
 +/* PR target/63910 */
 +/* { dg-do compile } */
 +/* { dg-options -O -mstringop-strategy=vector_loop -mavx512f } */
 +
 +extern void bar (float *c);
 +
 +void
 +foo (void)
 +{
 +  float c[1024] = { };
 +  bar (c);
 +}
 
   Jakub
 
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany


Re: [PATCH] Fix ubsan -fsanitize=signed-integer-overflow expansion (PR sanitizer/63520)

2014-11-19 Thread Richard Biener
On Tue, 18 Nov 2014, Jakub Jelinek wrote:

 Hi!
 
 Apparently, expand_expr with EXPR_WRITE can return
 a SUBREG with SUBREG_PROMOTED_VAR_P set on it.  For

Huh, that looks bogus to me.  But of course I know nothing
(read: not enough) to really tell.  Eric?

Richard.

 UBSAN_CHECK_{ADD,SUB,MUL} expansion, I've been doing just
 emit_move_insn into it, which apparently is wrong in that case,
 store_expr instead uses convert_move for it.  The
 {ADD,SUB,MUL}_OVERFLOW (i.e. __builtin_*_overflow) expansion
 shouldn't need it, as the result is complex and complex integers
 aren't promoted that way.  As store_expr* uses a tree expression
 to store, while I have rtx, I just wrote a short helper function
 for this.
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
 
 2014-11-18  Jakub Jelinek  ja...@redhat.com
 
   PR sanitizer/63520
   * internal-fn.c (expand_ubsan_result_store): New function.
   (expand_addsub_overflow, expand_neg_overflow, expand_mul_overflow):
   Use it instead of just emit_move_insn.
 
   * c-c++-common/ubsan/pr63520.c: New test.
 
 --- gcc/internal-fn.c.jj  2014-11-12 13:28:47.0 +0100
 +++ gcc/internal-fn.c 2014-11-18 15:35:46.395916823 +0100
 @@ -395,6 +395,21 @@ expand_arith_overflow_result_store (tree
write_complex_part (target, lres, false);
  }
  
 +/* Helper for expand_*_overflow.  Store RES into TARGET.  */
 +
 +static void
 +expand_ubsan_result_store (rtx target, rtx res)
 +{
 +  if (GET_CODE (target) == SUBREG  SUBREG_PROMOTED_VAR_P (target))
 +/* If this is a scalar in a register that is stored in a wider mode   
 +   than the declared mode, compute the result into its declared mode
 +   and then convert to the wider mode.  Our value is the computed
 +   expression.  */
 +convert_move (SUBREG_REG (target), res, SUBREG_PROMOTED_SIGN (target));
 +  else
 +emit_move_insn (target, res);
 +}
 +
  /* Add sub/add overflow checking to the statement STMT.
 CODE says whether the operation is +, or -.  */
  
 @@ -809,7 +824,7 @@ expand_addsub_overflow (location_t loc,
if (lhs)
  {
if (is_ubsan)
 - emit_move_insn (target, res);
 + expand_ubsan_result_store (target, res);
else
   {
 if (do_xor)
 @@ -904,7 +919,7 @@ expand_neg_overflow (location_t loc, tre
if (lhs)
  {
if (is_ubsan)
 - emit_move_insn (target, res);
 + expand_ubsan_result_store (target, res);
else
   expand_arith_overflow_result_store (lhs, target, mode, res);
  }
 @@ -1590,7 +1605,7 @@ expand_mul_overflow (location_t loc, tre
if (lhs)
  {
if (is_ubsan)
 - emit_move_insn (target, res);
 + expand_ubsan_result_store (target, res);
else
   expand_arith_overflow_result_store (lhs, target, mode, res);
  }
 --- gcc/testsuite/c-c++-common/ubsan/pr63520.c.jj 2014-11-18 
 15:40:07.271273710 +0100
 +++ gcc/testsuite/c-c++-common/ubsan/pr63520.c2014-11-18 
 15:40:40.971673904 +0100
 @@ -0,0 +1,16 @@
 +/* PR sanitizer/63520 */
 +/* { dg-do compile } */
 +/* { dg-options -fsanitize=undefined } */
 +
 +int a;
 +
 +void
 +foo (void)
 +{
 +  while (1)
 +{
 +  if (a == 1)
 + break;
 +  a -= 1;
 +}
 +}
 
 
   Jakub
 
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany


[PATCH] Fix PR63844

2014-11-19 Thread Richard Biener

The following improves code-generation for PR63844 by using
a restrict qualified reference type for the OMP receiver decl.
This improves alias analysis and points-to analysis enough to
usually allow invariant and store motion where that was possible
in the non-split-out variant.

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?

Sadly I don't have a good testcase (for example one that wasn't
vectorized before but is now).  We can add one later - ISTR there
were a few other bugs about not vectorizing with -fopenmp, I'll
try to find them again and see whether they are fixed.

Thanks,
Richard. 

2014-11-19  Richard Biener  rguent...@suse.de

PR tree-optimization/63844
* omp-low.c (fixup_child_record_type): Use a restrict qualified
referece type for the receiver parameter.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c   (revision 217692)
+++ gcc/omp-low.c   (working copy)
@@ -1517,7 +1517,8 @@ fixup_child_record_type (omp_context *ct
   layout_type (type);
 }
 
-  TREE_TYPE (ctx-receiver_decl) = build_pointer_type (type);
+  TREE_TYPE (ctx-receiver_decl)
+= build_qualified_type (build_reference_type (type), TYPE_QUAL_RESTRICT);
 }
 
 /* Instantiate decls as necessary in CTX to satisfy the data sharing


Re: [PATCH] Fix ubsan -fsanitize=signed-integer-overflow expansion (PR sanitizer/63520)

2014-11-19 Thread Jakub Jelinek
On Wed, Nov 19, 2014 at 10:02:18AM +0100, Richard Biener wrote:
 On Tue, 18 Nov 2014, Jakub Jelinek wrote:
  Apparently, expand_expr with EXPR_WRITE can return
  a SUBREG with SUBREG_PROMOTED_VAR_P set on it.  For
 
 Huh, that looks bogus to me.  But of course I know nothing
 (read: not enough) to really tell.  Eric?

I've tried to look where it comes from, and it dates back to r2xxx
or so, so forever.
And store_expr has a large:
  else if (GET_CODE (target) == SUBREG  SUBREG_PROMOTED_VAR_P (target))
/* If this is a scalar in a register that is stored in a wider mode
   than the declared mode, compute the result into its declared mode
   and then convert to the wider mode.  Our value is the computed
   expression.  */
handling block.  For targets with only word sized operations something
like that actually makes a lot of sense, I have just not been aware of that.

Jakub


Re: [PATCH] Fix PR63844

2014-11-19 Thread Jakub Jelinek
On Wed, Nov 19, 2014 at 10:07:07AM +0100, Richard Biener wrote:
 
 The following improves code-generation for PR63844 by using
 a restrict qualified reference type for the OMP receiver decl.
 This improves alias analysis and points-to analysis enough to
 usually allow invariant and store motion where that was possible
 in the non-split-out variant.
 
 Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?
 
 Sadly I don't have a good testcase (for example one that wasn't
 vectorized before but is now).  We can add one later - ISTR there
 were a few other bugs about not vectorizing with -fopenmp, I'll
 try to find them again and see whether they are fixed.

Most of those PRs should have openmp keyword.

 2014-11-19  Richard Biener  rguent...@suse.de
 
   PR tree-optimization/63844
   * omp-low.c (fixup_child_record_type): Use a restrict qualified
   referece type for the receiver parameter.

Ok, thanks.

 --- gcc/omp-low.c (revision 217692)
 +++ gcc/omp-low.c (working copy)
 @@ -1517,7 +1517,8 @@ fixup_child_record_type (omp_context *ct
layout_type (type);
  }
  
 -  TREE_TYPE (ctx-receiver_decl) = build_pointer_type (type);
 +  TREE_TYPE (ctx-receiver_decl)
 += build_qualified_type (build_reference_type (type), TYPE_QUAL_RESTRICT);
  }
  
  /* Instantiate decls as necessary in CTX to satisfy the data sharing

Jakub


Re: [PATCH 7/8] Model cache auto-prefetcher in scheduler

2014-11-19 Thread Ramana Radhakrishnan



On 14/11/14 15:12, Maxim Kuvyrkov wrote:

On Nov 14, 2014, at 8:38 AM, Jeff Law l...@redhat.com wrote:


On 10/20/14 22:06, Maxim Kuvyrkov wrote:

Hi,
Ramana, this change requires benchmarking, which I can't easily do
at

the moment. I would appreciate any benchmarking results that you can
share. In particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH
needs to be tuned/confirmed for Cortex-A15.
What were the results of that benchmarking?  IIRC I tabled reviewing this work 
waiting for those results (and I probably should have let you know that.  
Sorry,  my bad there).


I don't have the benchmarking results yet, and I was hoping for ARM to help 
with getting the numbers.  The arm maintainers still need to OK the 
arm-specific portion of the patch, which, I imagine, will happen only of 
benchmark scores improve.



I tried benchmarking 78f367cfcfdc9f0a422a362cd85ecc122834d96f from the 
trees you gave me links to against the equivalent version on trunk.


The results with SPEC2k on A15 were in the noise with the default value 
for PARAM_SCHED_AUTOPREF_QUEUE_DEPTH which is 2 in the backend. I'm 
still waiting on results for values 0, 1 and 3 and hopefully something 
will come back soon for SPEC2k.





@@ -29903,6 +29915,20 @@ arm_first_cycle_multipass_dfa_lookahead (void)
   return issue_rate  1 ? issue_rate : 0;
 }

+/* Enable modeling of Cortex-A15 L2 auto-prefetcher.  */
+static int
+arm_first_cycle_multipass_dfa_lookahead_guard (rtx insn, int ready_index)
+{
+  switch (arm_tune)
+{
+case cortexa15:
+  return autopref_multipass_dfa_lookahead_guard (insn, ready_index);
+
+default:
+  return 0;
+}
+}
+



It would be better to have this as a flag in the tuning tables rather 
than hardcoding for a core here. The backend has been moving in that 
direction for all core centric information and it is preferable that be 
continued.


So this logic here should just be

if (current_tune-multipass_lookahead)
  return autopref_multipass_lookahead_guard (insn, ready_index);
else
  return 0;




regards
Ramana






...

Can this be built on top of Bin's work for insn fusion?  There's a lot of 
commonality in the structure of the insns you care about.  He's already got a 
nice little priority function that I think you could utilize to to ensure the 
insns with smaller offsets fire first.


I would argue that macro-fusion should have been implemented the way 
autopref_model is -- via 
targetm.sched.first_cycle_multipass_dfa_lookahead_guard hook.  To implement the 
autopref model I cleaned up and generalized existing infrastructure (max_issue 
and dfa_lookahead_guard hook) instead of adding yet another decision-making 
primitive to the scheduler.




My biggest concern would be sched2 coming along and undoing that work since 
you're not going to fuse those into move-multiple types of instructions.


The autoprefetcher will be active only during sched2.  It is disabled during 
sched1 by the fact that max_issue is not used when scheduling for register 
pressure.

Thanks,

--
Maxim Kuvyrkov
www.linaro.org




Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian

2014-11-19 Thread Yangfei (Felix)
  Sorry for missing the point.  It seems to me that 't2' here will conflict 
  with
 condition of the pattern *movhi_insn_arch4:
 TARGET_ARM
   arm_arch4
   (register_operand (operands[0], HImode)
  || register_operand (operands[1], HImode))
 
  #define TARGET_ARM  (! TARGET_THUMB)
  /* 32-bit Thumb-2 code.  */
  #define TARGET_THUMB2   (TARGET_THUMB 
 arm_arch_thumb2)
 
 
 Bah, Indeed ! - I misremembered the t2 there, my mistake.
 
 Yes you are right there, but what I'd like you to do is to use that mechanism
 rather than putting all this logic in the predicate.
 
 So, I'd prefer you to add a v6t2 to the values for the arch attribute, 
 don't forget
 to update the comments above.
 
 and in arch_enabled you need to enforce this with
 
   (and (eq_attr arch v6t2)
(match_test TARGET_32BIT  arm_arch6  arm_arch_thumb2))
(const_string yes)
 
 And in the pattern use v6t2 ...
 
 arm_arch_thumb2 implies that this is at the architecture level of v6t2.
 Therefore TARGET_ARM  arm_arch_thumb2 implies ARM state.


Hi Ramana, 
Thank you for your suggestions.  I rebased the patch on the latest trunk 
and updated it accordingly. 
As this patch will not work for architectures older than armv6t2,  I also 
prefer Thomas's patch to fix for them. 
I am currently performing test for this patch.  Assuming no issues pops up, 
OK for the trunk?  
And is it necessary to backport this patch to the 4.8  4.9 branches? 


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 217717)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,11 @@
+2014-11-19  Felix Yang  felix.y...@huawei.com
+   Shanyao Chen  chenshan...@huawei.com
+
+   PR target/59593
+   * config/arm/arm.md (define_attr arch): Add v6t2.
+   (define_attr arch_enabled): Add test for the above.
+   (*movhi_insn_arch4): Add new alternative.
+
 2014-11-18  Felix Yang  felix.y...@huawei.com
 
* config/aarch64/aarch64.c (doloop_end): New pattern.
Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md   (revision 217717)
+++ gcc/config/arm/arm.md   (working copy)
@@ -125,9 +125,10 @@
 ; This can be a for ARM, t for either of the Thumbs, 32 for
 ; TARGET_32BIT, t1 or t2 to specify a specific Thumb mode.  v6
 ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without
-; arm_arch6.  This attribute is used to compute attribute enabled,
-; use type any to enable an alternative in all cases.
-(define_attr arch 
any,a,t,32,t1,t2,v6,nov6,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3
+; arm_arch6.  v6t2 for Thumb-2 with arm_arch6.  This attribute is
+; used to compute attribute enabled, use type any to enable an
+; alternative in all cases.
+(define_attr arch 
any,a,t,32,t1,t2,v6,nov6,v6t2,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3
   (const_string any))
 
 (define_attr arch_enabled no,yes
@@ -162,6 +163,10 @@
  (match_test TARGET_32BIT  !arm_arch6))
 (const_string yes)
 
+(and (eq_attr arch v6t2)
+ (match_test TARGET_32BIT  arm_arch6  arm_arch_thumb2))
+(const_string yes)
+
 (and (eq_attr arch avoid_neon_for_64bits)
  (match_test TARGET_NEON)
  (not (match_test TARGET_PREFER_NEON_64BITS)))
@@ -6288,8 +6293,8 @@
 
 ;; Pattern to recognize insn generated default case above
 (define_insn *movhi_insn_arch4
-  [(set (match_operand:HI 0 nonimmediate_operand =r,r,m,r)
-   (match_operand:HI 1 general_operand  rIk,K,r,mi))]
+  [(set (match_operand:HI 0 nonimmediate_operand =r,r,r,m,r)
+   (match_operand:HI 1 general_operand  rIk,K,n,r,mi))]
   TARGET_ARM
 arm_arch4
 (register_operand (operands[0], HImode)
@@ -6297,16 +6302,19 @@
   @
mov%?\\t%0, %1\\t%@ movhi
mvn%?\\t%0, #%B1\\t%@ movhi
+   movw%?\\t%0, %1\\t%@ movhi
str%(h%)\\t%1, %0\\t%@ movhi
ldr%(h%)\\t%0, %1\\t%@ movhi
   [(set_attr predicable yes)
-   (set_attr pool_range *,*,*,256)
-   (set_attr neg_pool_range *,*,*,244)
+   (set_attr pool_range *,*,*,*,256)
+   (set_attr neg_pool_range *,*,*,*,244)
+   (set_attr arch *,*,v6t2,*,*)
(set_attr_alternative type
  [(if_then_else (match_operand 1 const_int_operand 
)
 (const_string mov_imm )
 (const_string mov_reg))
   (const_string mvn_imm)
+  (const_string mov_imm)
   (const_string store1)
   (const_string load1)])]
 )


arm-patch-v3.diff
Description: arm-patch-v3.diff


Re: [PATCH] Fix simd clone vectorization with EH (PR tree-optimization/63915)

2014-11-19 Thread Richard Biener
On Tue, 18 Nov 2014, Jakub Jelinek wrote:

 Hi!
 
 Simd clone vectorization uses vect_finish_stmt_generation which
 handles adding the new calls properly to EH tables, but afterwards
 we replace the original call with a normal assignment, and if the original
 call can throw, we need to remove it from the EH tables.
 
 Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
 ok for trunk?

Ok.

Thanks,
Richard.

 2014-11-18  Jakub Jelinek  ja...@redhat.com
 
   PR tree-optimization/63915
   * tree-vect-stmts.c (vectorizable_simd_clone_call): Pass
   true instead of false as last argument to gsi_replace.
 
   * c-c++-common/gomp/pr60823-4.c: New test.
 
 --- gcc/tree-vect-stmts.c.jj  2014-11-14 00:10:39.0 +0100
 +++ gcc/tree-vect-stmts.c 2014-11-18 17:02:15.635257023 +0100
 @@ -3195,7 +3195,7 @@ vectorizable_simd_clone_call (gimple stm
set_vinfo_for_stmt (new_stmt, stmt_info);
set_vinfo_for_stmt (stmt, NULL);
STMT_VINFO_STMT (stmt_info) = new_stmt;
 -  gsi_replace (gsi, new_stmt, false);
 +  gsi_replace (gsi, new_stmt, true);
unlink_stmt_vdef (stmt);
  
return true;
 --- gcc/testsuite/c-c++-common/gomp/pr60823-4.c.jj2014-11-18 
 17:06:38.859319961 +0100
 +++ gcc/testsuite/c-c++-common/gomp/pr60823-4.c   2014-11-18 
 17:06:34.602383632 +0100
 @@ -0,0 +1,7 @@
 +/* PR tree-optimization/63915 */
 +/* { dg-do run } */
 +/* { dg-require-effective-target vect_simd_clones } */
 +/* { dg-options -O2 -fopenmp-simd } */
 +/* { dg-additional-options -fpic { target fpic } } */
 +
 +#include pr60823-2.c
 
   Jakub
 
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany


Re: [PATCH, PR62167] Fix tail-merge pass for dead type-unsafe code

2014-11-19 Thread Richard Biener
On Tue, 18 Nov 2014, Jeff Law wrote:

 On 11/18/14 09:57, Tom de Vries wrote:
  Richard,
  
  this (trunk) patch fixes PR62167.
  
  The patch fixes a problem that triggers with the test-case on the 4.8
  branch, when tail-merge makes a dead type-unsafe load alive.
  
  I'm not able to reproduce this bug on 4.9 and trunk with the same
  test-case. On those branches, the tail-merge already does not happen.
  
  The reason for the difference is as follows: With 4.8 the two phi
  arguments of the phi in the tail block are value-numbered identically:
  ...
  SCC consists of: p_14
  Value numbering p_14 stmt = p_14 = MEM[(struct head *)_13].first;
  Setting value number of p_14 to p_14 (changed)
  
  SCC consists of: p_15
  Value numbering p_15 stmt = p_15 = _13-next;
  Setting value number of p_15 to p_14 (changed)
  ...
  
  With 4.9 (and trunk), that's not the case:
  ...
  SCC consists of: p_14
  Value numbering p_14 stmt = p_14 = MEM[(struct head *)heads][k.1_9].first;
  Setting value number of p_14 to p_14 (changed)
  
  SCC consists of: p_15
  Value numbering p_15 stmt = p_15 = _13-next;
  Setting value number of p_15 to p_15 (changed)
  ...
  
  I'm not sure the bug triggers on trunk and 4.9, but I see no reason why
  it could not trigger, so I'd prefer to apply the patch to 4.9 and trunk
  as well.
  
  The patch introduces an xfail for pr51879-12.c. I can follow up with a
  patch to improve upon that, but I think that's better limited to trunk
  only.
  
  Bootstrapped and reg-tested on x86_64/trunk.
 So instead of simply disabling for anything with virtual operands, shouldn't
 instead we be comparing the virtual operands and alias information?  Seems to
 me that if we did that properly, this would just work.  Right?

But that would simply use operand_equal_p () 

Richard.


Re: [PATCH 7/8] Model cache auto-prefetcher in scheduler

2014-11-19 Thread Maxim Kuvyrkov
On Nov 19, 2014, at 12:27 PM, Ramana Radhakrishnan 
ramana.radhakrish...@arm.com wrote:

 
 
 On 14/11/14 15:12, Maxim Kuvyrkov wrote:
 On Nov 14, 2014, at 8:38 AM, Jeff Law l...@redhat.com wrote:
 
 On 10/20/14 22:06, Maxim Kuvyrkov wrote:
 Hi,
 Ramana, this change requires benchmarking, which I can't easily do
 at
 the moment. I would appreciate any benchmarking results that you can
 share. In particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH
 needs to be tuned/confirmed for Cortex-A15.
 What were the results of that benchmarking?  IIRC I tabled reviewing this 
 work waiting for those results (and I probably should have let you know 
 that.  Sorry,  my bad there).
 
 I don't have the benchmarking results yet, and I was hoping for ARM to help 
 with getting the numbers.  The arm maintainers still need to OK the 
 arm-specific portion of the patch, which, I imagine, will happen only of 
 benchmark scores improve.
 
 
 I tried benchmarking 78f367cfcfdc9f0a422a362cd85ecc122834d96f from the trees 
 you gave me links to against the equivalent version on trunk.
 
 The results with SPEC2k on A15 were in the noise with the default value for 
 PARAM_SCHED_AUTOPREF_QUEUE_DEPTH which is 2 in the backend. I'm still waiting 
 on results for values 0, 1 and 3 and hopefully something will come back soon 
 for SPEC2k.

I think there is a good chance that this optimization will not improve SPEC 
benchmarks, my main concern is to make sure SPECs don't regress.  If SPECs are 
neutral and the patch gives good improvement (2.5 times faster) on another 
benchmark (STREAM), is this a good enough argument for commit?

 
 
 
 @@ -29903,6 +29915,20 @@ arm_first_cycle_multipass_dfa_lookahead (void)
   return issue_rate  1 ? issue_rate : 0;
 }
 
 +/* Enable modeling of Cortex-A15 L2 auto-prefetcher.  */
 +static int
 +arm_first_cycle_multipass_dfa_lookahead_guard (rtx insn, int ready_index)
 +{
 +  switch (arm_tune)
 +{
 +case cortexa15:
 +  return autopref_multipass_dfa_lookahead_guard (insn, ready_index);
 +
 +default:
 +  return 0;
 +}
 +}
 +
 
 
 It would be better to have this as a flag in the tuning tables rather than 
 hardcoding for a core here. The backend has been moving in that direction for 
 all core centric information and it is preferable that be continued.
 
 So this logic here should just be
 
 if (current_tune-multipass_lookahead)
  return autopref_multipass_lookahead_guard (insn, ready_index);
 else
  return 0;
 

OK, I'll convert this to use tuning tables.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org


Re: [PATCH] Fix ubsan -fsanitize=signed-integer-overflow expansion (PR sanitizer/63520)

2014-11-19 Thread Richard Biener
On Wed, 19 Nov 2014, Jakub Jelinek wrote:

 On Wed, Nov 19, 2014 at 10:02:18AM +0100, Richard Biener wrote:
  On Tue, 18 Nov 2014, Jakub Jelinek wrote:
   Apparently, expand_expr with EXPR_WRITE can return
   a SUBREG with SUBREG_PROMOTED_VAR_P set on it.  For
  
  Huh, that looks bogus to me.  But of course I know nothing
  (read: not enough) to really tell.  Eric?
 
 I've tried to look where it comes from, and it dates back to r2xxx
 or so, so forever.
 And store_expr has a large:
   else if (GET_CODE (target) == SUBREG  SUBREG_PROMOTED_VAR_P (target))
 /* If this is a scalar in a register that is stored in a wider mode
than the declared mode, compute the result into its declared mode
and then convert to the wider mode.  Our value is the computed
expression.  */
 handling block.  For targets with only word sized operations something
 like that actually makes a lot of sense, I have just not been aware of that.

Ok.  Then the patch is ok.

Thanks,
Richard.


Re: [PATCH, PR63742][ARM] Fix arm *movhi_insn_arch4 pattern for big-endian

2014-11-19 Thread Ramana Radhakrishnan



On 19/11/14 09:29, Yangfei (Felix) wrote:

Sorry for missing the point.  It seems to me that 't2' here will conflict with

condition of the pattern *movhi_insn_arch4:

TARGET_ARM
  arm_arch4
  (register_operand (operands[0], HImode)
 || register_operand (operands[1], HImode))

#define TARGET_ARM  (! TARGET_THUMB)
/* 32-bit Thumb-2 code.  */
#define TARGET_THUMB2   (TARGET_THUMB 

arm_arch_thumb2)




Bah, Indeed ! - I misremembered the t2 there, my mistake.

Yes you are right there, but what I'd like you to do is to use that mechanism
rather than putting all this logic in the predicate.

So, I'd prefer you to add a v6t2 to the values for the arch attribute, don't 
forget
to update the comments above.

and in arch_enabled you need to enforce this with

   (and (eq_attr arch v6t2)
(match_test TARGET_32BIT  arm_arch6  arm_arch_thumb2))
 (const_string yes)

And in the pattern use v6t2 ...

arm_arch_thumb2 implies that this is at the architecture level of v6t2.
Therefore TARGET_ARM  arm_arch_thumb2 implies ARM state.



Hi Ramana,
 Thank you for your suggestions.  I rebased the patch on the latest trunk 
and updated it accordingly.
 As this patch will not work for architectures older than armv6t2,  I also 
prefer Thomas's patch to fix for them.
 I am currently performing test for this patch.  Assuming no issues pops 
up, OK for the trunk?
 And is it necessary to backport this patch to the 4.8  4.9 branches?




This is OK for trunk if no regressions - please let it bake for a few 
days on trunk to let auto-testers catch up. This is OK for 4.8 / 4.9 
after that and please test your backport.


regards
Ramana





Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 217717)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,11 @@
+2014-11-19  Felix Yang  felix.y...@huawei.com
+   Shanyao Chen  chenshan...@huawei.com
+
+   PR target/59593
+   * config/arm/arm.md (define_attr arch): Add v6t2.
+   (define_attr arch_enabled): Add test for the above.
+   (*movhi_insn_arch4): Add new alternative.
+
  2014-11-18  Felix Yang  felix.y...@huawei.com

* config/aarch64/aarch64.c (doloop_end): New pattern.
Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md   (revision 217717)
+++ gcc/config/arm/arm.md   (working copy)
@@ -125,9 +125,10 @@
  ; This can be a for ARM, t for either of the Thumbs, 32 for
  ; TARGET_32BIT, t1 or t2 to specify a specific Thumb mode.  v6
  ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without
-; arm_arch6.  This attribute is used to compute attribute enabled,
-; use type any to enable an alternative in all cases.
-(define_attr arch 
any,a,t,32,t1,t2,v6,nov6,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3
+; arm_arch6.  v6t2 for Thumb-2 with arm_arch6.  This attribute is
+; used to compute attribute enabled, use type any to enable an
+; alternative in all cases.
+(define_attr arch 
any,a,t,32,t1,t2,v6,nov6,v6t2,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3
(const_string any))

  (define_attr arch_enabled no,yes
@@ -162,6 +163,10 @@
  (match_test TARGET_32BIT  !arm_arch6))
 (const_string yes)

+(and (eq_attr arch v6t2)
+ (match_test TARGET_32BIT  arm_arch6  arm_arch_thumb2))
+(const_string yes)
+
 (and (eq_attr arch avoid_neon_for_64bits)
  (match_test TARGET_NEON)
  (not (match_test TARGET_PREFER_NEON_64BITS)))
@@ -6288,8 +6293,8 @@

  ;; Pattern to recognize insn generated default case above
  (define_insn *movhi_insn_arch4
-  [(set (match_operand:HI 0 nonimmediate_operand =r,r,m,r)
-   (match_operand:HI 1 general_operand  rIk,K,r,mi))]
+  [(set (match_operand:HI 0 nonimmediate_operand =r,r,r,m,r)
+   (match_operand:HI 1 general_operand  rIk,K,n,r,mi))]
TARGET_ARM
  arm_arch4
  (register_operand (operands[0], HImode)
@@ -6297,16 +6302,19 @@
@
 mov%?\\t%0, %1\\t%@ movhi
 mvn%?\\t%0, #%B1\\t%@ movhi
+   movw%?\\t%0, %1\\t%@ movhi
 str%(h%)\\t%1, %0\\t%@ movhi
 ldr%(h%)\\t%0, %1\\t%@ movhi
[(set_attr predicable yes)
-   (set_attr pool_range *,*,*,256)
-   (set_attr neg_pool_range *,*,*,244)
+   (set_attr pool_range *,*,*,*,256)
+   (set_attr neg_pool_range *,*,*,*,244)
+   (set_attr arch *,*,v6t2,*,*)
 (set_attr_alternative type
   [(if_then_else (match_operand 1 const_int_operand 
)
  (const_string mov_imm )
  (const_string mov_reg))
(const_string mvn_imm)
+  (const_string mov_imm)
(const_string store1)
(const_string load1)])]
  )



Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign

2014-11-19 Thread Richard Biener
On Tue, Nov 18, 2014 at 10:04 PM, David Malcolm dmalc...@redhat.com wrote:
 On Tue, 2014-11-18 at 10:53 +0100, Richard Biener wrote:
 On Tue, Nov 18, 2014 at 2:59 AM, David Malcolm dmalc...@redhat.com wrote:
  On Mon, 2014-11-17 at 11:06 +0100, Richard Biener wrote:
  On Sat, Nov 15, 2014 at 12:00 PM, David Malcolm dmalc...@redhat.com 
  wrote:
   On Thu, 2014-11-13 at 11:45 +0100, Richard Biener wrote:
   On Thu, Nov 13, 2014 at 2:41 AM, David Malcolm dmalc...@redhat.com 
   wrote:
On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote:
On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek ja...@redhat.com 
wrote:
 On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote:
 On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote:
  On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote:
   To be constructive here - the above case is from within a
   GIMPLE_ASSIGN case label
   and thus I'd have expected
  
   case GIMPLE_ASSIGN:
 {
   gassign *a1 = as_a gassign * (s1);
   gassign *a2 = as_a gassign * (s2);
 lhs1 = gimple_assign_lhs (a1);
 lhs2 = gimple_assign_lhs (a2);
 if (TREE_CODE (lhs1) != SSA_NAME
  TREE_CODE (lhs2) != SSA_NAME)
   return (operand_equal_p (lhs1, lhs2, 0)
gimple_operand_equal_value_p 
   (gimple_assign_rhs1 (a1),

   gimple_assign_rhs1 (a2)));
 else if (TREE_CODE (lhs1) == SSA_NAME
   TREE_CODE (lhs2) == SSA_NAME)
   return vn_valueize (lhs1) == vn_valueize (lhs2);
 return false;
 }
  
   instead.  That's the kind of changes I have expected and 
   have approved of.
 
  But even that looks like just adding extra work for all 
  developers, with no
  gain.  You only have to add extra code and extra temporaries, 
  in switches
  typically also have to add {} because of the temporaries and 
  thus extra
  indentation level, and it doesn't simplify anything in the 
  code.

 The branch attempts to use the C++ typesystem to capture 
 information
 about the kinds of gimple statement we expect, both:
   (A) so that the compiler can detect type errors, and
   (B) as a comprehension aid to the human reader of the code

 The ideal here is when function params and struct field can be
 strengthened from gimple to a subclass ptr.  This captures the
 knowledge that every use of a function or within a struct has a 
 given
 gimple code.

 I just don't like all the as_a/is_a stuff enforced everywhere,
 it means more typing, more temporaries, more indentation.
 So, as I view it, instead of the checks being done cheaply (yes, 
 I think
 the gimple checking as we have right now is very cheap) under the
 hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those 
 changes
 put the burden on the developers, who has to check that manually 
 through
 the as_a/is_a stuff everywhere, more typing and uglier syntax.
 I just don't see that as a step forward, instead a huge step 
 backwards.
 But perhaps I'm alone with this.
 Can you e.g. compare the size of - lines in your patchset 
 combined, and
 size of + lines in your patchset?  As in, if your changes lead to 
 less
 typing or more.
   
I see two ways out here.  One is to add overloads to all the 
functions
taking the special types like
   
tree
gimple_assign_rhs1 (gimple *);
   
or simply add
   
gassign *operator ()(gimple *g) { return as_a gassign * (g); }
   
into a gimple-compat.h header which you include in places that
are not converted nicely.
   
Thanks for the suggestions.
   
Am I missing something, or is the gimple-compat.h idea above not 
valid C
++?
   
Note that gimple is still a typedef to
  gimple_statement_base *
(as noted before, the gimple - gimple * change would break everyone
else's patches, so we talked about that as a followup patch for early
stage3).
   
Given that, if I try to create an operator () outside of a class, I
get this error:
   
‘gassign* operator()(gimple)’ must be a nonstatic member function
   
which is emitted from cp/decl.c's grok_op_properties:
  /* An operator function must either be a non-static member 
function
 or have at least one parameter of a class, a reference to a 
class,
 an enumeration, or a reference to an enumeration.  13.4.0.6 
*/
   
I tried making it a member function of gimple_statement_base, but 
that
doesn't work either: we want a conversion
  from a gimple_statement_base * to a gassign *, not
  from a gimple_statement_base   to a gassign *.
   
Is there some syntactic 

Re: [PATCH] Cancel bswap opt when intermediate stmts are reused

2014-11-19 Thread Richard Biener
On Tue, Nov 18, 2014 at 5:33 PM, Thomas Preud'homme
thomas.preudho...@arm.com wrote:
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 Sent: Monday, November 17, 2014 12:47 PM

 Hmm.  I am a little bit concerned about the malloc traffic generated here.
 So why not use a vecusedtree, get rid of the -next pointer and
 use a hash_map gimple, unsigned to associate the stmt with
 an index into the vector?

 Sure, it even makes things easier. However I don't understand why a vector is
 better than malloc. Is it a matter of fragmentation?

That and with a vector you get log(N) allocations instead of N.


 At this point I'd rather leave DCE to DCE ...

 I thought since all information is there why not do it. It makes it easier to
 read the dump of the pass.

Fair enough ...


 Ick - do we really have to use gotos here?  Can you refactor this
 a bit to avoid it?

 Yes I can. I needed the same kind of thing for fixing PR63761 (r217409) and
 found a way to avoid it.

Thanks.


 The whole scheme wouldn't exactly fly with the idea of eventually
 using a lattice to propagate the symbolic numbers, but well...


 I think the overall idea is sound and if you have time to adjust according
 to my comments that would be nice.

 To be honest I think it should wait for next stage1. After several ping I took
 another look at the testcase with the idea of measuring the size reduction
 the patch could give and was surprised that in all cases I could construct the
 size was actually bigger. Performance might be improved nonetheless but
 I think this needs more careful consideration.

 And as you said the approach would need to be changed if bswap was
 rewritten to do a forward analysis. At last, nobody reported a code size or
 performance regression so far due to the changes so that might be a non
 issue. If such a report happens, then it will be a good time to revisit that
 decision.

 Do you agree?

Yes.  If you can't come up with a testcase that it improves but it only
pessimizes cases then IMHO this would be a very premature optimization.
I suspect that we'd need to have a more complex cost analysis factoring
in that we _do_ remove a lot of code even if parts need to be kept alive.
So maybe only discard the transform if _all_ stmts need to be kept
live (as when we even remove one of the partial loads then this should
alread offset the single load we add ... for the bswap this may be
slightly different).



 Sorry for the very late review.

 That's alright, I know how it is. Thank you for keeping track of it. I 
 actually
 feel sorry I didn't warn about my findings. I thought the patch fell through
 the cracks and didn't want to spam gcc-patches uselessly.

Sure.

Thanks,
Richard.

 Best regards,

 Thomas





[PATCH, libgomp]: Require vect_simd_clones effective target for examples-4/e.53.5.{c,f90}

2014-11-19 Thread Uros Bizjak
Hello!

2014-11-19  Uros Bizjak  ubiz...@gmail.com

* testsuite/libgomp.c/examples-4/e.53.5.c: Require
vect_simd_clones effective target.
* testsuite/libgomp.fortran/examples-4/e.53.5.f90: Ditto.

Tested on x86_64-linux-gnu {,-m32} CentOS 5.11.

OK for mainline?

Uros.
Index: testsuite/libgomp.fortran/examples-4/e.53.5.f90
===
--- testsuite/libgomp.fortran/examples-4/e.53.5.f90 (revision 217718)
+++ testsuite/libgomp.fortran/examples-4/e.53.5.f90 (working copy)
@@ -1,4 +1,4 @@
-! { dg-do run }
+! { dg-do run { target vect_simd_clones } }
 ! { dg-options -O2 }
 ! { dg-additional-options -msse2 { target sse2_runtime } }
 ! { dg-additional-options -mavx { target avx_runtime } }
Index: testsuite/libgomp.c/examples-4/e.53.5.c
===
--- testsuite/libgomp.c/examples-4/e.53.5.c (revision 217718)
+++ testsuite/libgomp.c/examples-4/e.53.5.c (working copy)
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target vect_simd_clones } } */
 /* { dg-options -O2 } */
 /* { dg-additional-options -msse2 { target sse2_runtime } } */
 /* { dg-additional-options -mavx { target avx_runtime } } */


Re: [PATCH, libgomp]: Require vect_simd_clones effective target for examples-4/e.53.5.{c,f90}

2014-11-19 Thread Jakub Jelinek
On Wed, Nov 19, 2014 at 11:26:25AM +0100, Uros Bizjak wrote:
 2014-11-19  Uros Bizjak  ubiz...@gmail.com
 
 * testsuite/libgomp.c/examples-4/e.53.5.c: Require
 vect_simd_clones effective target.
 * testsuite/libgomp.fortran/examples-4/e.53.5.f90: Ditto.
 
 Tested on x86_64-linux-gnu {,-m32} CentOS 5.11.
 
 OK for mainline?

Yes, thanks.

 Index: testsuite/libgomp.fortran/examples-4/e.53.5.f90
 ===
 --- testsuite/libgomp.fortran/examples-4/e.53.5.f90   (revision 217718)
 +++ testsuite/libgomp.fortran/examples-4/e.53.5.f90   (working copy)
 @@ -1,4 +1,4 @@
 -! { dg-do run }
 +! { dg-do run { target vect_simd_clones } }
  ! { dg-options -O2 }
  ! { dg-additional-options -msse2 { target sse2_runtime } }
  ! { dg-additional-options -mavx { target avx_runtime } }
 Index: testsuite/libgomp.c/examples-4/e.53.5.c
 ===
 --- testsuite/libgomp.c/examples-4/e.53.5.c   (revision 217718)
 +++ testsuite/libgomp.c/examples-4/e.53.5.c   (working copy)
 @@ -1,4 +1,4 @@
 -/* { dg-do run } */
 +/* { dg-do run { target vect_simd_clones } } */
  /* { dg-options -O2 } */
  /* { dg-additional-options -msse2 { target sse2_runtime } } */
  /* { dg-additional-options -mavx { target avx_runtime } } */


Jakub


[PATCH 01/21] PR jit/63854: Fix memory leak within gcc_options

2014-11-19 Thread David Malcolm
Valgrind shows this fixes ~1 KB of leak per iteration (on x86_64) by
plugging these leaks allocated at opts.c lines 286 and 289:

==57820== 2,752 bytes in 4 blocks are definitely lost in loss record 875 of 917
==57820==at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==57820==by 0x59A6747: xmalloc (xmalloc.c:147)
==57820==by 0x595542A: init_options_struct(gcc_options*, gcc_options*) 
(opts.c:286)
==57820==by 0x4E2ED61: toplev::main(int, char**) (toplev.c:2081)
==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
(jit-playback.c:1615)
==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
(jit-recording.c:861)
==57820==by 0x401CA4: test_jit (harness.h:190)
==57820==by 0x401D88: main (harness.h:232)
==57820==
==57820== 2,752 bytes in 4 blocks are definitely lost in loss record 876 of 917
==57820==at 0x4A081D4: calloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==57820==by 0x59A6780: xcalloc (xmalloc.c:162)
==57820==by 0x595543E: init_options_struct(gcc_options*, gcc_options*) 
(opts.c:289)
==57820==by 0x4E2ED61: toplev::main(int, char**) (toplev.c:2081)
==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
(jit-playback.c:1615)
==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
(jit-recording.c:861)
==57820==by 0x401CA4: test_jit (harness.h:190)
==57820==by 0x401D88: main (harness.h:232)

gcc/ChangeLog:
PR jit/63854
* opts.c (finalize_options_struct): New.
* opts.h (finalize_options_struct): New.
* toplev.c (toplev::finalize): Call finalize_options_struct
on global_options and global_options_set.
---
 gcc/opts.c   | 8 
 gcc/opts.h   | 1 +
 gcc/toplev.c | 3 +++
 3 files changed, 12 insertions(+)

diff --git a/gcc/opts.c b/gcc/opts.c
index d22882b..dabd3c6 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -307,6 +307,14 @@ init_options_struct (struct gcc_options *opts, struct 
gcc_options *opts_set)
   targetm_common.option_init_struct (opts);
 }
 
+/* Release any allocations owned by OPTS.  */
+
+void
+finalize_options_struct (struct gcc_options *opts)
+{
+  XDELETEVEC (opts-x_param_values);
+}
+
 /* If indicated by the optimization level LEVEL (-Os if SIZE is set,
-Ofast if FAST is set, -Og if DEBUG is set), apply the option DEFAULT_OPT
to OPTS and OPTS_SET, diagnostic context DC, location LOC, with language
diff --git a/gcc/opts.h b/gcc/opts.h
index f694082..c3ec942 100644
--- a/gcc/opts.h
+++ b/gcc/opts.h
@@ -325,6 +325,7 @@ extern void decode_cmdline_options_to_array (unsigned int 
argc,
 extern void init_options_once (void);
 extern void init_options_struct (struct gcc_options *opts,
 struct gcc_options *opts_set);
+extern void finalize_options_struct (struct gcc_options *opts);
 extern void decode_cmdline_options_to_array_default_mask (unsigned int argc,
  const char **argv, 
  struct 
cl_decoded_option **decoded_options,
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 2e48047..4b4e568 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -2169,4 +2169,7 @@ toplev::finalize (void)
   ipa_cp_c_finalize ();
   ipa_reference_c_finalize ();
   params_c_finalize ();
+
+  finalize_options_struct (global_options);
+  finalize_options_struct (global_options_set);
 }
-- 
1.8.5.3



[PATCH 02/21] PR jit/63854: Fix memory leak of reginfo.c: valid_mode_changes_obstack

2014-11-19 Thread David Malcolm
Valgrind shows this fixes ~4 KB of leak per iteration (on x86_64) by
plugging this leak allocated at reginfo.c:1327:
  gcc_obstack_init (valid_mode_changes_obstack);

==57820== 16,256 bytes in 4 blocks are definitely lost in loss record 906 of 917
==57820==at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==57820==by 0x59A6747: xmalloc (xmalloc.c:147)
==57820==by 0x30958842DB: _obstack_begin (obstack.c:184)
==57820==by 0x51C1EC1: init_subregs_of_mode() (reginfo.c:1327)
==57820==by 0x50D2A38: init_costs() (ira-costs.c:2181)
==57820==by 0x50D74A8: ira_costs() (ira-costs.c:2211)
==57820==by 0x50D1326: ira_build() (ira-build.c:3459)
==57820==by 0x50C909C: (anonymous namespace)::pass_ira::execute(function*) 
(ira.c:5227)
==57820==by 0x51884F7: execute_one_pass(opt_pass*) (passes.c:2269)
==57820==by 0x5188B75: execute_pass_list_1(opt_pass*) (passes.c:2321)
==57820==by 0x5188B87: execute_pass_list_1(opt_pass*) (passes.c:2322)
==57820==by 0x5188BC8: execute_pass_list(function*, opt_pass*) 
(passes.c:2332)

gcc/ChangeLog:
PR jit/63854
* reginfo.c (finish_subregs_of_mode): Replace obstack_finish with
obstack_free when cleaning up valid_mode_changes_obstack.
---
 gcc/reginfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index efe23cd..c2daf22 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -1343,7 +1343,7 @@ void
 finish_subregs_of_mode (void)
 {
   XDELETEVEC (valid_mode_changes);
-  obstack_finish (valid_mode_changes_obstack);
+  obstack_free (valid_mode_changes_obstack, NULL);
 }
 
 /* Free all data attached to the structure.  This isn't a destructor because
-- 
1.8.5.3



[PATCH 06/21] PR jit/63854: Fix leak of opts_obstack

2014-11-19 Thread David Malcolm
This was leaking about 4KB per iteration:

16,256 bytes in 4 blocks are definitely lost in loss record 233 of 239
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75C17: xmalloc (xmalloc.c:147)
   by 0x30958842DB: _obstack_begin (obstack.c:184)
   by 0x5D1D317: init_options_struct(gcc_options*, gcc_options*) (opts.c:279)
   by 0x532DB0C: toplev::main(int, char**) (toplev.c:2081)
   by 0x4DE766F: gcc::jit::playback::context::compile() (jit-playback.c:1615)
   by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
   by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401CA4: test_jit (harness.h:190)
   by 0x401D88: main (harness.h:232)

gcc/ChangeLog:
PR jit/63854
* toplev.c (toplev::finalize): Free opts_obstack.
---
 gcc/toplev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/toplev.c b/gcc/toplev.c
index 291b84d..0d617c2 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -2178,4 +2178,6 @@ toplev::finalize (void)
   /* Clean up the context (and pass_manager etc). */
   delete g;
   g = NULL;
+
+  obstack_free (opts_obstack, NULL);
 }
-- 
1.8.5.3



[PATCH 03/21] PR jit/63854: Fix memory leaks within context/pass_manager/dump_manager

2014-11-19 Thread David Malcolm
We weren't cleaning up the context, pass_manager or dump_manager.

An earlier version of the context and pass_manager classes had them
allocated in the GC-heap, but they were moved to the system heap
before that patch was committed; they were never cleaned up, so e.g.
all passes leak on each in-process iteration.

Cleaning all of this up fixes the largest of the leaks in the PR,
about 60KB per iteration:

==57820== 80,560 (88 direct, 80,472 indirect) bytes in 1 blocks are definitely 
lost in loss record 913 of 917
==57820==at 0x4A06965: operator new(unsigned long) (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==57820==by 0x536A462: make_pass_dce(gcc::context*) (tree-ssa-dce.c:1703)
==57820==by 0x518B8DA: gcc::pass_manager::pass_manager(gcc::context*) 
(pass-instances.def:173)
==57820==by 0x4E8D6D9: gcc::context::context() (context.c:37)
==57820==by 0x4E2ED19: toplev::main(int, char**) (toplev.c:1211)
==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
(jit-playback.c:1615)
==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
(jit-recording.c:861)
==57820==by 0x401CA4: test_jit (harness.h:190)
==57820==by 0x401D88: main (harness.h:232)
==57820==
==57820== 161,488 (352 direct, 161,136 indirect) bytes in 4 blocks are 
definitely lost in loss record 917 of 917
==57820==at 0x4A06965: operator new(unsigned long) (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==57820==by 0x5553102: make_pass_insert_vzeroupper(gcc::context*) 
(i386.c:2581)
==57820==by 0x55571DF: ix86_option_override() (i386.c:4325)
==57820==by 0x4E2EEF4: toplev::main(int, char**) (toplev.c:1295)
==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
(jit-playback.c:1615)
==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
(jit-recording.c:861)
==57820==by 0x401CA4: test_jit (harness.h:190)
==57820==by 0x401D88: main (harness.h:232)

All of this additional cleanup happens within toplev::finalize, and
hence we don't call it from cc1, cc1plus, etc, only from libgccjit.so.

Note that some calls to register_pass_info were using a static:
  static struct register_pass_info
on the stack e.g.:

 opt_pass *pass_handle_trap_shadows = make_pass_handle_trap_shadows (g);
 static struct register_pass_info handle_trap_shadows_info
   = { pass_handle_trap_shadows, eh_ranges,
   1, PASS_POS_INSERT_AFTER
 };

This led to crashes on the 2nd iteration: the on-stack struct was only
being built on the first iteration, using the result of the first call
to the make_pass_ function.  Subsequent calls would thus be
registering a freed pass, giving a use-after-free error (which was
previously hidden because we weren't freeing them).

The fix is to remove the static from such structs.

This also fixes a leak of strings within dump_file_info for passes.

gcc/ChangeLog:
PR jit/63854
* config/alpha/alpha.c (alpha_option_override): Remove static from
handle_trap_shadows_info and align_insns_info.
* config/i386/i386.c (ix86_option_override): Likewise for
insert_vzeroupper_info.
* config/rl78/rl78.c (rl78_asm_file_start): Likewise for
rl78_devirt_info and rl78_move_elim_info.
* config/rs6000/rs6000.c (rs6000_option_override): Likewise for
analyze_swaps_info.
* context.c (gcc::context::~context): New.
* context.h (gcc::context::~context): New.
* dumpfile.c (dump_files): Add false initializers for new field
owns_strings.
(gcc::dump_manager::~dump_manager): New.
(gcc::dump_manager::dump_register): Add param take_ownership.
* dumpfile.h (struct dump_file_info): Add field owns_strings.
(gcc::dump_manager::~dump_manager): New.
(gcc::dump_manager::dump_register): Add param take_ownership.
* pass_manager.h (gcc::pass_manager::operator delete): New.
(gcc::pass_manager::~pass_manager): New.
* passes.c (pass_manager::register_one_dump_file): Pass true to
new owns_strings argument to dump_register.
(pass_manager::operator delete): New.
(delete_pass_tree): New function.
(pass_manager::~pass_manager): New.
* statistics.c (statistics_early_init): Pass false to
new owns_strings argument to dump_register.
* toplev.c (toplev::finalize): Clean up the context and thus the
things it owns.
---
 gcc/config/alpha/alpha.c   |  4 ++--
 gcc/config/i386/i386.c |  2 +-
 gcc/config/rl78/rl78.c |  4 ++--
 gcc/config/rs6000/rs6000.c |  2 +-
 gcc/context.c  |  6 ++
 gcc/context.h  |  1 +
 gcc/dumpfile.c | 47 ++
 gcc/dumpfile.h | 11 ++-
 gcc/pass_manager.h |  2 ++
 gcc/passes.c   | 39 +-
 gcc/statistics.c   |  3 ++-
 gcc/toplev.c   |  4 
 12 files 

[PATCH 00/21] PR 63854: Fix various memory leaks

2014-11-19 Thread David Malcolm
Running the jit testsuite under valgrind showed various memory leaks.

Some are per-invocation of the compiler, and hence only affect
libgccjit.so (although presumably it's useful to cc1 etc to get
valgrind clean).

Others appear to be per-function and hence affect the non-JIT use
cases.

The following patch kit fixes most of the leaks.

It also adds a contrib/valgrind.supp suppressions file, for sparseset
(though I see now that gcc/configure.ac can detect if
valgrind/memcheck.h is available and use it if
  --enable-valgrind-annotations
Should I instead require that option when doing such testing?).

Successfully bootstrappedregrtested the cumulative effect of the
patches on x86_64-unknown-linux-gnu (Fedora 20).

Are the non-JIT parts OK for trunk?
(Presumably I can give myself approval for the JIT parts)

David Malcolm (21):
  PR jit/63854: Fix memory leak within gcc_options
  PR jit/63854: Fix memory leak of reginfo.c: valid_mode_changes_obstack
  PR jit/63854: Fix memory leaks within
context/pass_manager/dump_manager
  PR jit/63854: Fix memory leak within bb-reorder.c
  PR jit/63854: Fix memory leak of save_decoded_options
  PR jit/63854: Fix leak of opts_obstack
  PR jit/63854: Fix leak of optimization_summary_obstack
  PR jit/63854: Add ira_costs_c_finalize
  PR jit/63854: Don't leak producer_string in dwarf2out.c
  PR jit/63854: Fix leak of worklist within jit-recording.c
  PR jit/63854: Fix leak of avail within tree-ssa-pre.c
  PR jit/63854: Add a valgrind suppresion file
  PR jit/63854: Add support for running make check-jit under valgrind
  PR jit/63854: Fix leak of paths within jump threading
  PR jit/63854: lra.c: Fix leak of point_freq_vec's buffer when calling
lra_inheritance
  PR jit/63854: Add all_late_ipa_passes to GCC_PASS_LISTS
  PR jit/63854: Fix leaking vec in jit
  PR jit/63854: Add long-term allocator to gcc::context
  PR jit/63854: Fix leak of ipa hooks
  PR jit/63854: Fix leak in ipa-icf.c
  PR jit/63854: Fix leaks in test-fuzzer.c

 contrib/valgrind.supp  | 11 ++
 gcc/bb-reorder.c   |  2 +-
 gcc/config/alpha/alpha.c   |  4 +-
 gcc/config/i386/i386.c |  2 +-
 gcc/config/rl78/rl78.c |  4 +-
 gcc/config/rs6000/rs6000.c |  2 +-
 gcc/context.c  | 24 
 gcc/context.h  | 47 +++
 gcc/dumpfile.c | 47 +--
 gcc/dumpfile.h | 11 +-
 gcc/dwarf2out.c|  2 +
 gcc/ipa-icf.c  |  1 +
 gcc/ipa-prop.c |  3 +-
 gcc/ipa-reference.c| 17 +++-
 gcc/ira-costs.c|  6 +++
 gcc/ira.h  |  3 ++
 gcc/jit/jit-playback.c | 73 +--
 gcc/jit/jit-playback.h | 27 -
 gcc/jit/jit-recording.c| 19 +
 gcc/jit/jit-recording.h| 26 ++---
 gcc/lra.c  |  1 +
 gcc/opts.c |  8 
 gcc/opts.h |  1 +
 gcc/pass_manager.h |  3 ++
 gcc/passes.c   | 39 ++-
 gcc/reginfo.c  |  2 +-
 gcc/statistics.c   |  3 +-
 gcc/testsuite/jit.dg/jit.exp   | 79 +-
 gcc/testsuite/jit.dg/test-fuzzer.c |  6 +++
 gcc/toplev.c   | 20 --
 gcc/tree-ssa-pre.c |  2 +-
 gcc/tree-ssa-threadedge.c  |  1 +
 gcc/tree-ssa-threadupdate.c|  1 +
 gcc/tree.c | 17 +---
 34 files changed, 434 insertions(+), 80 deletions(-)
 create mode 100644 contrib/valgrind.supp

-- 
1.8.5.3



[PATCH 10/21] PR jit/63854: Fix leak of worklist within jit-recording.c

2014-11-19 Thread David Malcolm
Fix this leak:

160 bytes in 5 blocks are definitely lost in loss record 154 of 228
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75D4F: xrealloc (xmalloc.c:177)
   by 0x4DE1710: void 
va_heap::reservegcc::jit::recording::block*(vecgcc::jit::recording::block*, 
va_heap, vl_embed*, unsigned int, bool) (vec.h:310)
   by 0x4DDFAB5: vecgcc::jit::recording::block*, va_heap, 
vl_ptr::reserve(unsigned int, bool) (vec.h:1428)
   by 0x4DDFBFC: vecgcc::jit::recording::block*, va_heap, 
vl_ptr::reserve_exact(unsigned int) (vec.h:1448)
   by 0x4DDE588: vecgcc::jit::recording::block*, va_heap, 
vl_ptr::create(unsigned int) (vec.h:1463)
   by 0x4DD9B9F: gcc::jit::recording::function::validate() 
(jit-recording.c:2191)
   by 0x4DD7AD3: gcc::jit::recording::context::validate() (jit-recording.c:1005)
   by 0x4DD7660: gcc::jit::recording::context::compile() (jit-recording.c:848)
   by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401CA4: test_jit (harness.h:190)
   by 0x401D88: main (harness.h:232)

gcc/jit/ChangeLog:
PR jit/63854
* jit-recording.c (recording::function::validate): Convert
worklist from vec to autovec to fix a leak.
---
 gcc/jit/jit-recording.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/jit/jit-recording.c b/gcc/jit/jit-recording.c
index 8daa8f2..8cce277 100644
--- a/gcc/jit/jit-recording.c
+++ b/gcc/jit/jit-recording.c
@@ -2187,8 +2187,7 @@ recording::function::validate ()
 {
   /* Iteratively walk the graph of blocks, marking their m_is_reachable
 flag, starting at the initial block.  */
-  vecblock * worklist;
-  worklist.create (m_blocks.length ());
+  auto_vecblock * worklist (m_blocks.length ());
   worklist.safe_push (m_blocks[0]);
   while (worklist.length ()  0)
{
-- 
1.8.5.3



[PATCH 08/21] PR jit/63854: Add ira_costs_c_finalize

2014-11-19 Thread David Malcolm
this_target_ira_int-free_ira_costs () is called by ira_init_costs,
but this isn't called after the compile, causing noise when running
under valgrind.

This patch adds a ira_costs_c_finalize function to clean this up,
and calls it from toplev::finalize (and hence this isn't called by
cc1/cc1plus/etc, just by libgccjit.so).

Doing so eliminates about 27KB of leak from the valgrind report.

gcc/ChangeLog:
PR jit/63854
* ira-costs.c (ira_costs_c_finalize): New function.
* ira.h (ira_costs_c_finalize): New prototype.
* toplev.c (toplev::finalize): Call ira_costs_c_finalize.
---
 gcc/ira-costs.c | 6 ++
 gcc/ira.h   | 3 +++
 gcc/toplev.c| 1 +
 3 files changed, 10 insertions(+)

diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
index 122815b..2dabead 100644
--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -2356,3 +2356,9 @@ ira_adjust_equiv_reg_cost (unsigned regno, int cost)
   else
 regno_equiv_gains[regno] += cost;
 }
+
+void
+ira_costs_c_finalize (void)
+{
+  this_target_ira_int-free_ira_costs ();
+}
diff --git a/gcc/ira.h b/gcc/ira.h
index b294ea1..d62656c 100644
--- a/gcc/ira.h
+++ b/gcc/ira.h
@@ -199,4 +199,7 @@ extern bool ira_bad_reload_regno (int, rtx, rtx);
 
 extern void ira_adjust_equiv_reg_cost (unsigned, int);
 
+/* ira-costs.c */
+extern void ira_costs_c_finalize (void);
+
 #endif /* GCC_IRA_H */
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 0d617c2..0181a3f 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -2168,6 +2168,7 @@ toplev::finalize (void)
   gcse_c_finalize ();
   ipa_cp_c_finalize ();
   ipa_reference_c_finalize ();
+  ira_costs_c_finalize ();
   params_c_finalize ();
 
   finalize_options_struct (global_options);
-- 
1.8.5.3



[PATCH 07/21] PR jit/63854: Fix leak of optimization_summary_obstack

2014-11-19 Thread David Malcolm
This was leaking ~4KB per iteration:

16,256 bytes in 4 blocks are definitely lost in loss record 234 of 239
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75C17: xmalloc (xmalloc.c:147)
   by 0x30958842DB: _obstack_begin (obstack.c:184)
   by 0x4DFECDE: bitmap_obstack_initialize(bitmap_obstack*) (bitmap.c:338)
   by 0x5CA3C1D: ipa_init() (ipa-reference.c:431)
   by 0x5CA3FB1: generate_summary() (ipa-reference.c:551)
   by 0x5CA4650: propagate() (ipa-reference.c:684)
   by 0x5CA5BEE: (anonymous namespace)::pass_ipa_reference::execute(function*) 
(ipa-reference.c:1178)
   by 0x522354D: execute_one_pass(opt_pass*) (passes.c:2306)
   by 0x5224427: execute_ipa_pass_list(opt_pass*) (passes.c:2700)
   by 0x4E495C1: ipa_passes() (cgraphunit.c:2088)
   by 0x4E498E0: symbol_table::compile() (cgraphunit.c:2172)

gcc/ChangeLog:
PR jit/63854
* ipa-reference.c (ipa_reference_c_finalize): Release
optimization_summary_obstack.
---
 gcc/ipa-reference.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c
index b421f63..1ce06d1 100644
--- a/gcc/ipa-reference.c
+++ b/gcc/ipa-reference.c
@@ -1193,5 +1193,9 @@ make_pass_ipa_reference (gcc::context *ctxt)
 void
 ipa_reference_c_finalize (void)
 {
-  ipa_init_p = false;
+  if (ipa_init_p)
+{
+  bitmap_obstack_release (optimization_summary_obstack);
+  ipa_init_p = false;
+}
 }
-- 
1.8.5.3



[PATCH 11/21] PR jit/63854: Fix leak of avail within tree-ssa-pre.c

2014-11-19 Thread David Malcolm
Fix this leak:

120 bytes in 5 blocks are definitely lost in loss record 141 of 227
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75D8F: xrealloc (xmalloc.c:177)
   by 0x550DEBA: void va_heap::reservepre_expr_d*(vecpre_expr_d*, va_heap, 
vl_embed*, unsigned int, bool) (vec.h:310)
   by 0x550D509: vecpre_expr_d*, va_heap, vl_ptr::reserve(unsigned int, bool) 
(vec.h:1428)
   by 0x550DA3E: vecpre_expr_d*, va_heap, vl_ptr::reserve_exact(unsigned int) 
(vec.h:1448)
   by 0x550D167: vecpre_expr_d*, va_heap, vl_ptr::safe_grow(unsigned int) 
(vec.h:1576)
   by 0x55076FE: do_regular_insertion(basic_block_def*, basic_block_def*) 
(tree-ssa-pre.c:3209)
   by 0x5508379: insert_aux(basic_block_def*) (tree-ssa-pre.c:3526)
   by 0x55083DC: insert_aux(basic_block_def*) (tree-ssa-pre.c:3536)
   by 0x55083DC: insert_aux(basic_block_def*) (tree-ssa-pre.c:3536)
   by 0x55084C0: insert() (tree-ssa-pre.c:3559)
   by 0x550C5BC: (anonymous namespace)::pass_pre::execute(function*) 
(tree-ssa-pre.c:4805)

gcc/ChangeLog:
PR jit/63854
* tree-ssa-pre.c (do_regular_insertion): Convert avail from
vec to auto_vec to fix a leak.
---
 gcc/tree-ssa-pre.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index ea99198..027dc1c 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3202,7 +3202,7 @@ do_regular_insertion (basic_block block, basic_block dom)
   bool new_stuff = false;
   vecpre_expr exprs;
   pre_expr expr;
-  vecpre_expr avail = vNULL;
+  auto_vecpre_expr avail;
   int i;
 
   exprs = sorted_array_from_bitmap_set (ANTIC_IN (block));
-- 
1.8.5.3



[PATCH 12/21] PR jit/63854: Add a valgrind suppresion file

2014-11-19 Thread David Malcolm
Valgrind complains about uninitialized data within sparseset_bit_p.
Provide a suppression file to silence these warnings.

Valgrind requires suppression files for C++ code to use the mangled
names, so we do that here.

contrib/ChangeLog:
PR jit/63854
* valgrind.supp: New.
---
 contrib/valgrind.supp | 11 +++
 1 file changed, 11 insertions(+)
 create mode 100644 contrib/valgrind.supp

diff --git a/contrib/valgrind.supp b/contrib/valgrind.supp
new file mode 100644
index 000..ec9ff02
--- /dev/null
+++ b/contrib/valgrind.supp
@@ -0,0 +1,11 @@
+{
+   suppress-uninit-cond-with-sparseset_bit_p
+   Memcheck:Cond
+   fun:_ZL15sparseset_bit_pP13sparseset_defm
+}
+
+{
+   suppress-uninit-use-with-sparseset_bit_p
+   Memcheck:Value8
+   fun:_ZL15sparseset_bit_pP13sparseset_defm
+}
-- 
1.8.5.3



[PATCH 13/21] PR jit/63854: Add support for running make check-jit under valgrind

2014-11-19 Thread David Malcolm
This commit updates jit.exp so that if RUN_UNDER_VALGRIND is present
in the environment, all of the built client code using libgccjit.so is
run under valgrind, with --leak-check=full.

Hence:
  RUN_UNDER_VALGRIND= make check-jit
will run all jit testcases under valgrind (taking 27 mins on my
machine).

Results are written to testsuite/jit/test-FOO.exe.valgrind.txt

jit.exp automatically parses these result file, looking for lines of
the form
  definitely lost: 11,316 bytes in 235 blocks
  indirectly lost: 352 bytes in 4 blocks
in the valgrind log's summary footer, adding PASSes if they are zero
bytes, and, for now generating XFAILs for non-zero bytes.

Sadly this diverges jit.exp's fixed_host_execute further from DejaGnu's
host_execute, but I don't see a clean way to fix that.

This currently adds 63 PASSes and 49 XFAILs to jit.sum, giving:
  # of expected passes   2481
  # of expected failures 49

gcc/testsuite/ChangeLog:
PR jit/63854
* jit.dg/jit.exp (report_leak): New.
(parse_valgrind_logfile): New.
(fixed_host_execute): Detect if RUN_UNDER_VALGRIND is present
in the environment, and if so, run the executable under
valgrind, capturing valgrind's output to a logfile.  Parse the
log file, generating PASSes and XFAILs for the summary of leaks.
---
 gcc/testsuite/jit.dg/jit.exp | 79 +++-
 1 file changed, 78 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/jit.dg/jit.exp b/gcc/testsuite/jit.dg/jit.exp
index 531e929..26ab127 100644
--- a/gcc/testsuite/jit.dg/jit.exp
+++ b/gcc/testsuite/jit.dg/jit.exp
@@ -23,6 +23,48 @@ load_lib target-libpath.exp
 load_lib gcc.exp
 load_lib dejagnu.exp
 
+# Look for lines of the form:
+#   definitely lost: 11,316 bytes in 235 blocks
+#   indirectly lost: 352 bytes in 4 blocks
+# Ideally these would report zero bytes lost (which is a PASS);
+# for now, report non-zero leaks as XFAILs.
+proc report_leak {kind name logfile line} {
+set match [regexp $kind lost: .* $line result]
+if $match {
+   verbose Saw \$result\ within \$line\ 3
+   # Extract bytes and blocks.
+   # These can contain commas as well as numerals,
+   # but we only care about whether we have zero.
+   regexp $kind lost: (.+) bytes in (.+) blocks \
+   $result - bytes blocks
+   verbose bytes: '$bytes' 2
+   verbose blocks: '$blocks' 2
+   if { $bytes == 0 } {
+   pass $name: $logfile: $result
+   } else {
+   xfail $name: $logfile: $result
+   }
+}
+}
+
+proc parse_valgrind_logfile {name logfile} {
+verbose parse_valgrind_logfile: $logfile 2
+if [catch {set f [open $logfile]}] {
+   fail $name: unable to read $logfile
+   return
+}
+
+while { [gets $f line] = 0 } {
+   # Strip off the PID prefix e.g. ==7675==
+   set line [regsub ==\[0-9\]*==  $line ]
+   verbose $line 2
+
+   report_leak definitely $name $logfile $line
+   report_leak indirectly $name $logfile $line
+}
+close $f
+}
+
 # This is host_execute from dejagnu.exp commit
 #   126a089777158a7891ff975473939f08c0e31a1c
 # with the following patch applied, and renaming to fixed_host_execute.
@@ -49,8 +91,11 @@ load_lib dejagnu.exp
 #  if there was a problem.
 #
 proc fixed_host_execute {args} {
+global env
 global text
 global spawn_id
+global srcdir
+verbose srcdir: $srcdir 2
 
 set timeoutmsg Timed out: Never got started, 
 set timeout 100
@@ -75,7 +120,28 @@ proc fixed_host_execute {args} {
 # spawn the executable and look for the DejaGnu output messages from the
 # test case.
 # spawn -noecho -open [open |./${executable} r]
-spawn -noecho ./${executable} ${params}
+
+set run_under_valgrind [info exists env(RUN_UNDER_VALGRIND)]
+
+if $run_under_valgrind {
+   set valgrind_logfile ${executable}.valgrind.txt
+   set valgrind_params {valgrind}
+   lappend valgrind_params --leak-check=full
+   # srcdir is within gcc/testsuite; locate contrib relative to it:
+   lappend valgrind_params 
--suppressions=${srcdir}/../../contrib/valgrind.supp
+   lappend valgrind_params --log-file=${valgrind_logfile}
+} else {
+   set valgrind_params {}
+}
+verbose valgrind_params: $valgrind_params 2
+
+set args ${valgrind_params}
+lappend args ./${executable}
+set args [concat $args ${params}]
+verbose args: $args 2
+
+eval spawn -noecho $args
+
 expect_after full_buffer { error got full_buffer }
 
 set prefix \[^\r\n\]*
@@ -147,6 +213,17 @@ proc fixed_host_execute {args} {
 # force a close of the executable to be safe.
 catch close
 
+# valgrind might not have finished writing the log out
+# before we parse it; need to wait for spawnee to
+# finish.
+catch wait reason
+verbose reason: $reason 2
+
+if $run_under_valgrind {
+   upvar 2 name name
+   parse_valgrind_logfile $name 

[PATCH 15/21] PR jit/63854: lra.c: Fix leak of point_freq_vec's buffer when calling lra_inheritance

2014-11-19 Thread David Malcolm
Valgrind showed a leak of 1640 bytes per iteration of one of the jit
testcases (I think this is per-function in a compile):

8,200 bytes in 5 blocks are definitely lost in loss record 214 of 223
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75D1F: xrealloc (xmalloc.c:177)
   by 0x4E0C94E: void va_heap::reserveint(vecint, va_heap, vl_embed*, 
unsigned int, bool) (vec.h:310)
   by 0x4E0C7A3: vecint, va_heap, vl_ptr::reserve(unsigned int, bool) 
(vec.h:1428)
   by 0x4E0C55A: vecint, va_heap, vl_ptr::reserve_exact(unsigned int) 
(vec.h:1448)
   by 0x4E0C2C6: vecint, va_heap, vl_ptr::create(unsigned int) (vec.h:1463)
   by 0x519B316: lra_create_live_ranges(bool) (lra-lives.c:1248)
   by 0x5179BD7: lra(_IO_FILE*) (lra.c:2297)
   by 0x5126C4E: do_reload() (ira.c:5380)
   by 0x5127098: (anonymous namespace)::pass_reload::execute(function*) 
(ira.c:5550)
   by 0x52235BD: execute_one_pass(opt_pass*) (passes.c:2306)
   by 0x5223834: execute_pass_list_1(opt_pass*) (passes.c:2358)

which is this line:
  point_freq_vec.create (get_max_uid () * 2);

point_freq_vec's buffer is released in lra_clear_live_ranges.

Am unfamiliar with LRA, but my reading of the lra code is that the
live_p boolean tracks whether or not we need to call
lra_clear_live_ranges.

Debugging calls to the above line and calls to lra_clear_live_ranges
showed a mismatch; the call to lra_create_live_ranges before
lra_inheritance wasn't setting live_p.

Setting it after that callsite fixes the leak (though perhaps the code
could be cleaned up by having live_p be set/cleared by the
allocation/clear functions themselves, rather than by their callers?
Vladimir?)

gcc/ChangeLog:
PR jit/63854
* lra.c (lra): After creating live ranges in preparation for call
to lra_inheritance, set live_p to true.
---
 gcc/lra.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/lra.c b/gcc/lra.c
index 9309d5e..ec122c7 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -2297,6 +2297,7 @@ lra (FILE *f)
 actual_call_used_reg_set,  which is needed during
 lra_inheritance.  */
  lra_create_live_ranges (true, true);
+ live_p = true;
}
  lra_inheritance ();
}
-- 
1.8.5.3



[PATCH 04/21] PR jit/63854: Fix memory leak within bb-reorder.c

2014-11-19 Thread David Malcolm
Valgrind showed this leaking 200 bytes per iteration in one of
my testcases:

1,000 bytes in 5 blocks are definitely lost in loss record 200 of 241
   at 0x4A083AA: realloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75C5C: xrealloc (xmalloc.c:179)
   by 0x4E10734: void va_heap::reservebasic_block_def*(vecbasic_block_def*, 
va_heap, vl_embed*, unsigned int, bool) (vec.h:310
)
   by 0x4E105BB: vecbasic_block_def*, va_heap, vl_ptr::reserve(unsigned int, 
bool) (vec.h:1428)
   by 0x4E15B37: vecbasic_block_def*, va_heap, 
vl_ptr::safe_push(basic_block_def* const) (vec.h:1537)
   by 0x5B61F44: find_rarely_executed_basic_blocks_and_crossing_edges() 
(bb-reorder.c:1614)
   by 0x5B63E90: (anonymous 
namespace)::pass_partition_blocks::execute(function*) (bb-reorder.c:2711)
   by 0x522354D: execute_one_pass(opt_pass*) (passes.c:2306)
   by 0x52237C4: execute_pass_list_1(opt_pass*) (passes.c:2358)
   by 0x52237F5: execute_pass_list_1(opt_pass*) (passes.c:2359)
   by 0x5223832: execute_pass_list(function*, opt_pass*) (passes.c:2369)
   by 0x4E4884F: cgraph_node::expand() (cgraphunit.c:1773)

Fix is trivial.

gcc/ChangeLog:
PR jit/63854
* bb-reorder.c
(find_rarely_executed_basic_blocks_and_crossing_edges): Convert
local bbs_in_hot_partition from vec to auto_vec.
---
 gcc/bb-reorder.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 1f7c3ee..4ca3bb2 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -1582,7 +1582,7 @@ find_rarely_executed_basic_blocks_and_crossing_edges 
(void)
   edge e;
   edge_iterator ei;
   unsigned int cold_bb_count = 0;
-  vecbasic_block bbs_in_hot_partition = vNULL;
+  auto_vecbasic_block bbs_in_hot_partition;
 
   /* Mark which partition (hot/cold) each basic block belongs in.  */
   FOR_EACH_BB_FN (bb, cfun)
-- 
1.8.5.3



[PATCH 17/21] PR jit/63854: Fix leaking vec in jit

2014-11-19 Thread David Malcolm
This fixes various leaks of vec buffers seen via valgrind within jit
(both recording and playback).

Various vec within jit::recording are converted to auto_vec.

Various playback::wrapper subclasses containing vec gain a finalizer
so they can release the vec when they are collected.

gcc/jit/ChangeLog:
PR jit/63854
* jit-playback.c (gcc::jit::playback::compound_type::set_fields):
Convert param from const vecplayback::field *  to
const auto_vecplayback::field * *.
(gcc::jit::playback::context::new_function_type): Convert param
param_types from vectype * * to const auto_vectype * *.
(gcc::jit::playback::context::new_function): Convert param
params from vecparam * * to const auto_vecparam * *.
(gcc::jit::playback::context::build_call): Convert param args
from vecrvalue * to const auto_vecrvalue * *.
(gcc::jit::playback::context::new_call): Likewise.
(gcc::jit::playback::context::new_call_through_ptr): Likewise.
(wrapper_finalizer): New function.
(gcc::jit::playback::wrapper::operator new): Call the finalizer
variant of ggc_internal_cleared_alloc, supplying
wrapper_finalizer.
(gcc::jit::playback::function::finalizer): New.
(gcc::jit::playback::block::finalizer): New.
(gcc::jit::playback::source_file::finalizer): New.
(gcc::jit::playback::source_line::finalizer): New.

* jit-playback.h
(gcc::jit::playback::context::new_function_type): Convert param
param_types from vectype * * to const auto_vectype * *.
(gcc::jit::playback::context::new_function): Convert param
params from vecparam * * to const auto_vecparam * *.
(gcc::jit::playback::context::new_call): Convert param
args from vecrvalue * to const auto_vecrvalue * *.
(gcc::jit::playback::context::new_call_through_ptr): Likewise.
(gcc::jit::playback::context::build_call): Likewise.
(gcc::jit::playback::context): Convert fields m_functions,
m_source_files, m_cached_locations from vec to auto_vec.
(gcc::jit::playback::wrapper::finalizer): New virtual function.
(gcc::jit::playback::compound_type::set_fields): Convert param fro
const vecplayback::field *  to
const auto_vecplayback::field * *.
(gcc::jit::playback::function::finalizer): New.
(gcc::jit::playback::block::finalizer): New.
(gcc::jit::playback::source_file::finalizer): New.
(gcc::jit::playback::source_line::finalizer): New.

* jit-recording.c
(gcc::jit::recording::function_type::replay_into): Convert local
from a vec into an auto_vec.
(gcc::jit::recording::fields::replay_into): Likewise.
(gcc::jit::recording::function::replay_into): Likewise.
(gcc::jit::recording::call::replay_into): Likewise.
(gcc::jit::recording::call_through_ptr::replay_into): Likewise.

* jit-recording.h (gcc::jit::recording::context): Convert fields
m_mementos, m_compound_types, m_functions from vec to
auto_vec .
(gcc::jit::recording::function_type::get_param_types): Convert
return type from vectype * to const vectype * .
(gcc::jit::recording::function_type): Convert field
m_param_types from a vec to an auto_vec.
(gcc::jit::recording::fields): Likewise for field m_fields.
(gcc::jit::recording::function::get_params): Convert return type
from vec param * to const vecparam * .
(gcc::jit::recording::function): Convert fields m_params,
m_locals, m_blocks from vec to auto_vec.
(gcc::jit::recording::block): Likewise for field m_statements.
vec to auto_vec.
(gcc::jit::recording::call): Likewise for field m_args.
(gcc::jit::recording::call_through_ptr): Likewise.
---
 gcc/jit/jit-playback.c  | 73 +
 gcc/jit/jit-playback.h  | 27 --
 gcc/jit/jit-recording.c | 16 +--
 gcc/jit/jit-recording.h | 26 +-
 4 files changed, 100 insertions(+), 42 deletions(-)

diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
index 285a3ef..8fdfa29 100644
--- a/gcc/jit/jit-playback.c
+++ b/gcc/jit/jit-playback.c
@@ -285,15 +285,15 @@ new_compound_type (location *loc,
 }
 
 void
-playback::compound_type::set_fields (const vecplayback::field * fields)
+playback::compound_type::set_fields (const auto_vecplayback::field * *fields)
 {
   /* Compare with c/c-decl.c: finish_struct. */
   tree t = as_tree ();
 
   tree fieldlist = NULL;
-  for (unsigned i = 0; i  fields.length (); i++)
+  for (unsigned i = 0; i  fields-length (); i++)
 {
-  field *f = fields[i];
+  field *f = (*fields)[i];
   DECL_CONTEXT (f-as_tree ()) = t;
   fieldlist = chainon (f-as_tree (), fieldlist);
 }
@@ -309,7 +309,7 @@ playback::compound_type::set_fields (const 
vecplayback::field * fields)
 

[PATCH 16/21] PR jit/63854: Add all_late_ipa_passes to GCC_PASS_LISTS

2014-11-19 Thread David Malcolm
Valgrind showed a per-iteration leak of pass_ipa_pta (and presumably
pass_omp_simd_clone):

704 (352 direct, 352 indirect) bytes in 4 blocks are definitely lost in loss 
record 198 of 241
   at 0x4A06965: operator new(unsigned long) (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x555A55D: make_pass_ipa_pta(gcc::context*) (tree-ssa-structalias.c:7425)
   by 0x5219FDA: gcc::pass_manager::pass_manager(gcc::context*) 
(pass-instances.def:141)
   by 0x4E4F5D1: gcc::context::context() (context.c:37)
   by 0x532C07B: general_init(char const*) (toplev.c:1212)
   by 0x532DB5E: toplev::main(int, char**) (toplev.c:2074)
   by 0x4DE76AF: gcc::jit::playback::context::compile() (jit-playback.c:1615)
   by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
   by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401DC4: test_jit (harness.h:190)
   by 0x401EA8: main (harness.h:232)

Investigation showed that ~pass_manager wasn't deleting these
passes since for some reason GCC_PASS_LISTS didn't contain
all_late_ipa_passes and so wasn't calling delete_pass_tree on it.

Add it to GCC_PASS_LISTS, fixing the leak.

gcc/ChangeLog:
PR jit/63854
* pass_manager.h (GCC_PASS_LISTS): Add all_late_ipa_passes.
---
 gcc/pass_manager.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
index 82857a9..e6658ad 100644
--- a/gcc/pass_manager.h
+++ b/gcc/pass_manager.h
@@ -29,6 +29,7 @@ struct register_pass_info;
   DEF_PASS_LIST (all_lowering_passes) \
   DEF_PASS_LIST (all_small_ipa_passes) \
   DEF_PASS_LIST (all_regular_ipa_passes) \
+  DEF_PASS_LIST (all_late_ipa_passes) \
   DEF_PASS_LIST (all_passes)
 
 #define DEF_PASS_LIST(LIST) PASS_LIST_NO_##LIST,
-- 
1.8.5.3



[PATCH 20/21] PR jit/63854: Fix leak in ipa-icf.c

2014-11-19 Thread David Malcolm
ipa-icf.c was leaking 16 bytes per iteration in the jit testcases here:

80 bytes in 5 blocks are definitely lost in loss record 94 of 199
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D764B7: xmalloc (xmalloc.c:147)
   by 0x5C962FF: ipa_icf::sem_item_optimizer::get_group_by_hash(unsigned int, 
ipa_icf::sem_item_type) (ipa-icf.c:1503)
   by 0x5C96C8C: ipa_icf::sem_item_optimizer::build_hash_based_classes() 
(ipa-icf.c:1723)
   by 0x5C96765: ipa_icf::sem_item_optimizer::execute() (ipa-icf.c:1616)
   by 0x5C98E81: ipa_icf::ipa_icf_driver() (ipa-icf.c:2373)
   by 0x5C993C8: ipa_icf::pass_ipa_icf::execute(function*) (ipa-icf.c:2421)
   by 0x5223CCD: execute_one_pass(opt_pass*) (passes.c:2306)
   by 0x5224BA7: execute_ipa_pass_list(opt_pass*) (passes.c:2700)
   by 0x4E49B0D: ipa_passes() (cgraphunit.c:2088)
   by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172)
   by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325)

by not freeing the allocation here:

  congruence_class_group *item = XNEW (congruence_class_group);

in sem_item_optimizer::get_group_by_hash.

Fix it.

gcc/ChangeLog:
PR jit/63854
* ipa-icf.c (sem_item_optimizer::~sem_item_optimizer): Free each
congruence_class_group *.
---
 gcc/ipa-icf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 2d5fcf5..b0ba82e 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -1322,6 +1322,7 @@ sem_item_optimizer::~sem_item_optimizer ()
delete (*it)-classes[i];
 
   (*it)-classes.release ();
+  free (*it);
 }
 
   m_items.release ();
-- 
1.8.5.3



[PATCH 19/21] PR jit/63854: Fix leak of ipa hooks

2014-11-19 Thread David Malcolm
This fixes three leaks in IPA seen in jit testcases with valgrind:

This one:
96 bytes in 4 blocks are definitely lost in loss record 102 of 205
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D76447: xmalloc (xmalloc.c:147)
   by 0x4E35C23: symbol_table::add_cgraph_insertion_hook(void (*)(cgraph_node*, 
void*), void*) (cgraph.c:383)
   by 0x51070C6: ipa_register_cgraph_hooks() (ipa-prop.c:3864)
   by 0x5C753D8: ipcp_generate_summary() (ipa-cp.c:3786)
   by 0x5223540: execute_ipa_summary_passes(ipa_opt_pass_d*) (passes.c:2102)
   by 0x4E49A60: ipa_passes() (cgraphunit.c:2074)
   by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172)
   by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325)
   by 0x4DC999C: jit_langhook_write_globals() (dummy-frontend.c:216)
   by 0x532B3A6: compile_file() (toplev.c:583)
   by 0x532E15F: do_compile() (toplev.c:2020)

appears to be caused by
  ipa-prop.c (ipa_register_cgraph_hooks)
unconditionally inserting ipa_add_new_function as
function_insertion_hook_holder, rather than if the latter is
non-NULL, like the other hooks.

These two in ipa-reference.c:
96 bytes in 4 blocks are definitely lost in loss record 103 of 205
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D76447: xmalloc (xmalloc.c:147)
   by 0x4E35AA9: symbol_table::add_cgraph_removal_hook(void (*)(cgraph_node*, 
void*), void*) (cgraph.c:329)
   by 0x5CA446E: ipa_init() (ipa-reference.c:435)
   by 0x5CA47D1: generate_summary() (ipa-reference.c:551)
   by 0x5CA4E70: propagate() (ipa-reference.c:684)
   by 0x5CA640E: (anonymous namespace)::pass_ipa_reference::execute(function*) 
(ipa-reference.c:1178)
   by 0x5223CC1: execute_one_pass(opt_pass*) (passes.c:2306)
   by 0x5224B9B: execute_ipa_pass_list(opt_pass*) (passes.c:2700)
   by 0x4E49B0D: ipa_passes() (cgraphunit.c:2088)
   by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172)
   by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325)

96 bytes in 4 blocks are definitely lost in loss record 104 of 205
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D76447: xmalloc (xmalloc.c:147)
   by 0x4E35E29: symbol_table::add_cgraph_duplication_hook(void 
(*)(cgraph_node*, cgraph_node*, void*), void*) (cgraph.c:453)
   by 0x5CA4493: ipa_init() (ipa-reference.c:437)
   by 0x5CA47D1: generate_summary() (ipa-reference.c:551)
   by 0x5CA4E70: propagate() (ipa-reference.c:684)
   by 0x5CA640E: (anonymous namespace)::pass_ipa_reference::execute(function*) 
(ipa-reference.c:1178)
   by 0x5223CC1: execute_one_pass(opt_pass*) (passes.c:2306)
   by 0x5224B9B: execute_ipa_pass_list(opt_pass*) (passes.c:2700)
   by 0x4E49B0D: ipa_passes() (cgraphunit.c:2088)
   by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172)
   by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325)

appear to be due to th hooks never being removed.

My patch hacks in a removal of them into ipa_reference_c_finalize, but
I suspect there's a cleaner place to put this.

gcc/ChangeLog:
PR jit/63854
* ipa-prop.c (ipa_register_cgraph_hooks): Guard insertion of
ipa_add_new_function on function_insertion_hook_holder being
non-NULL.
* ipa-reference.c (ipa_reference_c_finalize): Remove
node_removal_hook_holder and node_duplication_hook_holder if
they've been added to symtab.
* toplev.c (toplev::finalize): Call ipa_reference_c_finalize
before cgraph_c_finalize so that the former can access symtab.
---
 gcc/ipa-prop.c  |  3 ++-
 gcc/ipa-reference.c | 11 +++
 gcc/toplev.c|  4 +++-
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index f87243c..a8238ac 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -3552,7 +3552,8 @@ ipa_register_cgraph_hooks (void)
   if (!node_duplication_hook_holder)
 node_duplication_hook_holder =
   symtab-add_cgraph_duplication_hook (ipa_node_duplication_hook, NULL);
-  function_insertion_hook_holder =
+  if (!function_insertion_hook_holder)
+function_insertion_hook_holder =
   symtab-add_cgraph_insertion_hook (ipa_add_new_function, NULL);
 }
 
diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c
index 1ce06d1..b046f9e 100644
--- a/gcc/ipa-reference.c
+++ b/gcc/ipa-reference.c
@@ -1198,4 +1198,15 @@ ipa_reference_c_finalize (void)
   bitmap_obstack_release (optimization_summary_obstack);
   ipa_init_p = false;
 }
+
+  if (node_removal_hook_holder)
+{
+  symtab-remove_cgraph_removal_hook (node_removal_hook_holder);
+  node_removal_hook_holder = NULL;
+}
+  if (node_duplication_hook_holder)
+{
+  symtab-remove_cgraph_duplication_hook (node_duplication_hook_holder);
+  node_duplication_hook_holder = NULL;
+}
 }
diff --git a/gcc/toplev.c b/gcc/toplev.c
index c0c418c..96e7af9 100644
--- a/gcc/toplev.c

[PATCH 09/21] PR jit/63854: Don't leak producer_string in dwarf2out.c

2014-11-19 Thread David Malcolm
Fix this small per-iteration leak with debuginfo:

424 bytes in 4 blocks are definitely lost in loss record 185 of 230
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75CA7: xmalloc (xmalloc.c:147)
   by 0x4ECE9E4: gen_producer_string() (dwarf2out.c:19489)
   by 0x4EDB2C8: dwarf2out_finish(char const*) (dwarf2out.c:24257)
   by 0x532AD3B: compile_file() (toplev.c:623)
   by 0x532D9E1: do_compile() (toplev.c:2020)
   by 0x532DC54: toplev::main(int, char**) (toplev.c:2117)
   by 0x4DE766F: gcc::jit::playback::context::compile() (jit-playback.c:1615)
   by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
   by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401CA4: test_jit (harness.h:190)
   by 0x401D88: main (harness.h:232)

gcc/ChangeLog:
PR jit/63854
* dwarf2out.c (dwarf2out_c_finalize): Free producer_string.
---
 gcc/dwarf2out.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index b16883f..9069f9a 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -24741,6 +24741,8 @@ dwarf2out_c_finalize (void)
   frame_pointer_fb_offset = 0;
   frame_pointer_fb_offset_valid = false;
   base_types.release ();
+  XDELETEVEC (producer_string);
+  producer_string = NULL;
 }
 
 #include gt-dwarf2out.h
-- 
1.8.5.3



[PATCH 05/21] PR jit/63854: Fix memory leak of save_decoded_options

2014-11-19 Thread David Malcolm
This commit fixes this leak from opts-common.c, about 4KB per iteration.

==57820== 18,816 (2,560 direct, 16,256 indirect) bytes in 4 blocks are 
definitely lost in loss record 907 of 917
==57820==at 0x4A083AA: realloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==57820==by 0x59A67CC: xrealloc (xmalloc.c:179)
==57820==by 0x59587C9: decode_cmdline_options_to_array(unsigned int, char 
const**, unsigned int, cl_decoded_option**, unsigned int*) (opts-common.c:885)
==57820==by 0x4E2ED90: toplev::main(int, char**) (toplev.c:2089)
==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
(jit-playback.c:1615)
==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
(jit-recording.c:861)
==57820==by 0x401CA4: test_jit (harness.h:190)
==57820==by 0x401D88: main (harness.h:232)

gcc/ChangeLog:
PR jit/63854
* toplev.c (toplev::finalize): Clean up save_decoded_options.
---
 gcc/toplev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/toplev.c b/gcc/toplev.c
index 876279f..291b84d 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -2173,6 +2173,8 @@ toplev::finalize (void)
   finalize_options_struct (global_options);
   finalize_options_struct (global_options_set);
 
+  XDELETEVEC (save_decoded_options);
+
   /* Clean up the context (and pass_manager etc). */
   delete g;
   g = NULL;
-- 
1.8.5.3



[PATCH 18/21] PR jit/63854: Add long-term allocator to gcc::context

2014-11-19 Thread David Malcolm
Some places in the startup code use char * values that can sometimes be
string literals, and can sometimes be dynamically built using xstrdup or
concat.

This isn't a problem for cc1 etc since this is only called once, but
for libgccjit they are small per-invocation leaks.

There's no clean way to release them: retrofitting logic to decide if
we're dealing with a string literal vs a dynamically-allocated buffer
(and if something else is pointing to said buffer) is messy and
error-prone; they are also unconnected to the GC.

Hence the cleanest way to fix the leak is to add a new long-term
allocator, to gcc::context.  This gives back buffers that will persist
until the gcc::context is cleaned away.

Using this fixes these leaks seen via valgrind:

28 bytes in 4 blocks are definitely lost in loss record 13 of 209
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D76417: xmalloc (xmalloc.c:147)
   by 0x5D76509: xstrdup (xstrdup.c:34)
   by 0x532CC38: process_options() (toplev.c:1316)
   by 0x532DFF4: do_compile() (toplev.c:1976)
   by 0x532E3B0: toplev::main(int, char**) (toplev.c:2117)
   by 0x4DE7983: gcc::jit::playback::context::compile() (jit-playback.c:1646)
   by 0x4DD791A: gcc::jit::recording::context::compile() (jit-recording.c:861)
   by 0x4DD5E12: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401CA4: test_jit (harness.h:190)
   by 0x401D88: main (harness.h:232)

108 bytes in 12 blocks are definitely lost in loss record 135 of 241
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75E07: xmalloc (xmalloc.c:147)
   by 0x5D7098B: concat (concat.c:147)
   by 0x563A342: build_common_builtin_nodes() (tree.c:10145)
   by 0x4DC91FC: jit_langhook_init() (dummy-frontend.c:128)
   by 0x532D3F4: lang_dependent_init(char const*) (toplev.c:1790)
   by 0x532D9C8: do_compile() (toplev.c:2007)
   by 0x532DCAC: toplev::main(int, char**) (toplev.c:2118)
   by 0x4DE76AF: gcc::jit::playback::context::compile() (jit-playback.c:1615)
   by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
   by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401DC4: test_jit (harness.h:190)

108 bytes in 12 blocks are definitely lost in loss record 136 of 241
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D75E07: xmalloc (xmalloc.c:147)
   by 0x5D7098B: concat (concat.c:147)
   by 0x563A3B6: build_common_builtin_nodes() (tree.c:10151)
   by 0x4DC91FC: jit_langhook_init() (dummy-frontend.c:128)
   by 0x532D3F4: lang_dependent_init(char const*) (toplev.c:1790)
   by 0x532D9C8: do_compile() (toplev.c:2007)
   by 0x532DCAC: toplev::main(int, char**) (toplev.c:2118)
   by 0x4DE76AF: gcc::jit::playback::context::compile() (jit-playback.c:1615)
   by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
   by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401DC4: test_jit (harness.h:190)

136 bytes in 4 blocks are definitely lost in loss record 129 of 209
   at 0x4A0645D: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x5D76417: xmalloc (xmalloc.c:147)
   by 0x532BE28: init_asm_output(char const*) (toplev.c:936)
   by 0x532DB34: lang_dependent_init(char const*) (toplev.c:1795)
   by 0x532E0CC: do_compile() (toplev.c:2006)
   by 0x532E3B0: toplev::main(int, char**) (toplev.c:2117)
   by 0x4DE7983: gcc::jit::playback::context::compile() (jit-playback.c:1646)
   by 0x4DD791A: gcc::jit::recording::context::compile() (jit-recording.c:861)
   by 0x4DD5E12: gcc_jit_context_compile (libgccjit.c:2014)
   by 0x401CA4: test_jit (harness.h:190)
   by 0x401D88: main (harness.h:232)

gcc/ChangeLog:
PR jit/63854
* context.c (gcc::context::context): Initialize m_longterm_obstack.
(gcc::context::~context): Free m_longterm_obstack.
(gcc::context::longterm_xalloc): New method.
(gcc::context::longterm_xstrdup): New method.
* context.h: Include obstack.h
(gcc::context::longterm_xalloc): New method.
(gcc::context::longterm_xallocvecT): New method
(gcc::context::longterm_xstrdup): New method.
(gcc::context::m_longterm_obstack): New field.
* toplev.c (init_asm_output): Use g-longterm_xallocvec char
rather than XNEWVEC when allocating dumpname to avoid a leak.
(process_options): Likewise, use g-longterm_xstrdup rather than
xstrdup when allocating name (and hence aux_base_name).
* tree.c: Include context.h.
(build_common_builtin_nodes): When writing dynamically-allocated
entries into built_in_names, use g-longterm_xstrdup to take a
copy of the buffer acquired by concat to avoid a leak.
---
 gcc/context.c | 18 ++
 gcc/context.h | 46 ++
 gcc/toplev.c  |  4 ++--
 gcc/tree.c| 17 +++--
 4 files changed, 77 insertions(+), 8 

[PATCH 14/21] PR jit/63854: Fix leak of paths within jump threading

2014-11-19 Thread David Malcolm
Paths are allocated as:
   vecjump_thread_edge * *path = new vecjump_thread_edge * ();
i.e. the vec itself is on the heap, so a mere:
  path-release ();
is merely releasing the buffer the vec holds, not the vec itself.

This patch updates the two sites that release paths to also delete them,
fixing leaks like this seen by valgrind:
160 bytes in 20 blocks are definitely lost in loss record 165 of 241
   at 0x4A06965: operator new(unsigned long) (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
   by 0x556E5AF: thread_across_edge(gimple_statement_base*, edge_def*, bool, 
vectree_node*, va_heap, vl_ptr*, tree_node
* (*)(gimple_statement_base*, gimple_statement_base*)) 
(tree-ssa-threadedge.c:1124)
   by 0x547B73B: dom_opt_dom_walker::thread_across_edge(edge_def*) 
(tree-ssa-dom.c:1184)
   by 0x5480183: dom_opt_dom_walker::after_dom_children(basic_block_def*) 
(tree-ssa-dom.c:2015)
   by 0x5C1C5F7: dom_walker::walk(basic_block_def*) (domwalk.c:218)
   by 0x547AE83: (anonymous namespace)::pass_dominator::execute(function*) 
(tree-ssa-dom.c:919)
   by 0x52235BD: execute_one_pass(opt_pass*) (passes.c:2306)
   by 0x5223834: execute_pass_list_1(opt_pass*) (passes.c:2358)
   by 0x5223865: execute_pass_list_1(opt_pass*) (passes.c:2359)
   by 0x52238A2: execute_pass_list(function*, opt_pass*) (passes.c:2369)
   by 0x4E4888F: cgraph_node::expand() (cgraphunit.c:1773)
   by 0x4E48F29: expand_all_functions() (cgraphunit.c:1909)

[Space-wise, a vec T, A, vl_ptr is just a pointer, hence the 8 bytes
per block seen in the valgrind report]

gcc/ChangeLog:
PR jit/63854
* tree-ssa-threadedge.c (thread_across_edge): Don't just release
path, delete it.
* tree-ssa-threadupdate.c (delete_jump_thread_path): Likewise.
---
 gcc/tree-ssa-threadedge.c   | 1 +
 gcc/tree-ssa-threadupdate.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index d5b9696..9ee1cae 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -1149,6 +1149,7 @@ thread_across_edge (gimple dummy_cond,
 through the vector entries.  */
   gcc_assert (path-length () == 0);
   path-release ();
+  delete path;
 
   /* A negative status indicates the target block was deemed too big to
 duplicate.  Just quit now rather than trying to use the block as
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 151ed83..5cb9f86 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2481,6 +2481,7 @@ delete_jump_thread_path (vecjump_thread_edge * *path)
   for (unsigned int i = 0; i  path-length (); i++)
 delete (*path)[i];
   path-release();
+  delete path;
 }
 
 /* Register a jump threading opportunity.  We queue up all the jump
-- 
1.8.5.3



[PATCH 21/21] PR jit/63854: Fix leaks in test-fuzzer.c

2014-11-19 Thread David Malcolm
gcc/testsuite/ChangeLog:
PR jit/63854
* jit.dg/test-fuzzer.c (fuzzer_init): Free malloced buffers.
(make_random_function): Free ff-locals.
---
 gcc/testsuite/jit.dg/test-fuzzer.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/testsuite/jit.dg/test-fuzzer.c 
b/gcc/testsuite/jit.dg/test-fuzzer.c
index f363f8f..b501792 100644
--- a/gcc/testsuite/jit.dg/test-fuzzer.c
+++ b/gcc/testsuite/jit.dg/test-fuzzer.c
@@ -105,6 +105,11 @@ fuzzer_init (fuzzer *f, gcc_jit_context *ctxt, unsigned 
int seed)
 
   for (i = 0; i  num_funcs; i++)
 f-funcs[f-num_funcs++] = make_random_function (f);
+
+  /* Now clean out f.  */
+  free (f-types);
+  free (f-funcs);
+  free (f-globals);
 }
 
 /* Get random int in inclusive range [min, max].  */
@@ -309,6 +314,7 @@ make_random_function (fuzzer *f)
 
   gcc_jit_function *result = ff-fn;
 
+  free (ff-locals);
   free (ff-params);
   free (ff);
 
-- 
1.8.5.3



Re: [PING][PATCH][AARCH64]Fix PR63424 by adding sumaxminv2di3 pattern

2014-11-19 Thread Renlin Li

On 10/11/14 16:55, Renlin Li wrote:

On 06/11/14 15:00, Renlin Li wrote:

Hi all,

Dose anybody have time to review this?

Kind regards,
Renlin Li

On 31/10/14 14:51, Renlin Li wrote:

Hi all,

This is a patch which will fix PR63424.

It implements signed/unsigned max/min pattern for V2DI mode in terms 
of vcondv2div2di pattern.


In this particular case, VEC_COND_EXPR (V2DImode) is generated as 
aarch64 target supports it (vcondmodemode for VALL). The 
VEC_COND_EXPR will further folded into MIN_EXPR/MAX_EXPR in dom pass 
unconditionally. Later in expand pass, the compiler tries to expand 
min_expr using standard RTL operation. It fails, because aarch64 
target don't have minv2di3 pattern implemented. It then tries to 
generate conditional move and comparebranch sequence, all fails. At 
last it falls into libfunc call, no luck either. An ICE to complain 
about this.


aarch64-none-elf toolchain has been tested on the model, no 
regressions.


Is it Okay for trunk?

gcc/ChangeLog:

2014-10-31  Renlin Li  renlin...@arm.com
PR target/63424
* config/aarch64/aarch64-simd.md (sumaxminv2di3): New.

gcc/testsuite/ChangeLog:

2014-10-31  Renlin Li  renlin...@arm.com
PR target/63424
* gcc.target/aarch64/pr63424.c: New.



Hi,

Dose anybody have time to review this?

Thank you so much!

Regards,
Renlin Li


Ping again.

Regards,
Renlin Li



[PATCH] Fix for fold_negate_expr (PR sanitizer/63879)

2014-11-19 Thread Marek Polacek
The problem here is that negate_expr_p returns true (as it should)
for UINT_MAX(OVF), but fold_negate_expr didn't actually fold it,
and that is a no-no; if negate_expr_p is true, fold_negate_expr must
not return NULL_TREE.  I added the following hunk for bootstrap and
regtest for better coverage:

@@ -752,6 +755,12 @@ fold_negate_expr (location_t loc, tree t)
   break;
 }
 
+#ifdef ENABLE_CHECKING
+  if (negate_expr_p (t))
+internal_error (fold_negate_expr should never return NULL_TREE 
+   if negate_expr_p is true);
+#endif
+
   return NULL_TREE;
 }

Fixed by adjusting fold_negate_expr so that it folds non-trapping
wrapping constants.  The SANITIZE_SI_OVERFLOW check is important
to not regress -Woverflow warnings (ug).

negate_expr_p hunk is to match the NEGATE_EXPR case in fold_negate_expr.
 
Bootstrapped/regtested on power8-linux, ok for trunk?

2014-11-19  Marek Polacek  pola...@redhat.com

PR sanitizer/63879
* fold-const.c (negate_expr_p) case NEGATE_EXPR: Return
!TYPE_OVERFLOW_SANITIZED.
(fold_negate_expr) case INTEGER_CST: Fold when overflow
does not trap and when overflow wraps, or when SANITIZE_SI_OVERFLOW
is 0.

* c-c++-common/ubsan/pr63879-1.c: New test.
* c-c++-common/ubsan/pr63879-2.c: New test.

diff --git gcc/fold-const.c gcc/fold-const.c
index f6fb8af..719e06e 100644
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -408,9 +408,11 @@ negate_expr_p (tree t)
   TYPE_OVERFLOW_WRAPS (type));
 
 case FIXED_CST:
-case NEGATE_EXPR:
   return true;
 
+case NEGATE_EXPR:
+  return !TYPE_OVERFLOW_SANITIZED (type);
+
 case REAL_CST:
   /* We want to canonicalize to positive real constants.  Pretend
  that only negative ones can be easily negated.  */
@@ -555,7 +557,8 @@ fold_negate_expr (location_t loc, tree t)
   tem = fold_negate_const (t, type);
   if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t)
  || (!TYPE_OVERFLOW_TRAPS (type)
-  (flag_sanitize  SANITIZE_SI_OVERFLOW) == 0))
+  TYPE_OVERFLOW_WRAPS (type))
+ || (flag_sanitize  SANITIZE_SI_OVERFLOW) == 0)
return tem;
   break;
 
diff --git gcc/testsuite/c-c++-common/ubsan/pr63879-1.c 
gcc/testsuite/c-c++-common/ubsan/pr63879-1.c
index e69de29..2035849 100644
--- gcc/testsuite/c-c++-common/ubsan/pr63879-1.c
+++ gcc/testsuite/c-c++-common/ubsan/pr63879-1.c
@@ -0,0 +1,23 @@
+/* PR sanitizer/63879 */
+/* { dg-do compile } */
+/* { dg-options -fsanitize=undefined } */
+
+struct A
+{
+  int inode;
+} * a;
+int b, c;
+void
+fn1 ()
+{
+  int d = 0;
+  while (b)
+{
+  if (a-inode)
+d++;
+  a = 0;
+}
+  c = d - 1;
+  for (; c = 0; c--)
+;
+}
diff --git gcc/testsuite/c-c++-common/ubsan/pr63879-2.c 
gcc/testsuite/c-c++-common/ubsan/pr63879-2.c
index e69de29..34eb8e7 100644
--- gcc/testsuite/c-c++-common/ubsan/pr63879-2.c
+++ gcc/testsuite/c-c++-common/ubsan/pr63879-2.c
@@ -0,0 +1,13 @@
+/* PR sanitizer/63879 */
+/* { dg-do compile } */
+/* { dg-options -fsanitize=undefined } */
+
+int a;
+void
+fn1 ()
+{
+  int b = 2;
+  for (; a;)
+while (b = 0)
+  b--;
+}

Marek


Re: [PATCH 01/21] PR jit/63854: Fix memory leak within gcc_options

2014-11-19 Thread Richard Biener
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm dmalc...@redhat.com wrote:
 Valgrind shows this fixes ~1 KB of leak per iteration (on x86_64) by
 plugging these leaks allocated at opts.c lines 286 and 289:

 ==57820== 2,752 bytes in 4 blocks are definitely lost in loss record 875 of 
 917
 ==57820==at 0x4A0645D: malloc (in 
 /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
 ==57820==by 0x59A6747: xmalloc (xmalloc.c:147)
 ==57820==by 0x595542A: init_options_struct(gcc_options*, gcc_options*) 
 (opts.c:286)
 ==57820==by 0x4E2ED61: toplev::main(int, char**) (toplev.c:2081)
 ==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
 (jit-playback.c:1615)
 ==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
 (jit-recording.c:861)
 ==57820==by 0x401CA4: test_jit (harness.h:190)
 ==57820==by 0x401D88: main (harness.h:232)
 ==57820==
 ==57820== 2,752 bytes in 4 blocks are definitely lost in loss record 876 of 
 917
 ==57820==at 0x4A081D4: calloc (in 
 /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
 ==57820==by 0x59A6780: xcalloc (xmalloc.c:162)
 ==57820==by 0x595543E: init_options_struct(gcc_options*, gcc_options*) 
 (opts.c:289)
 ==57820==by 0x4E2ED61: toplev::main(int, char**) (toplev.c:2081)
 ==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
 (jit-playback.c:1615)
 ==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
 (jit-recording.c:861)
 ==57820==by 0x401CA4: test_jit (harness.h:190)
 ==57820==by 0x401D88: main (harness.h:232)

Ok.

Thanks,
Richard.

 gcc/ChangeLog:
 PR jit/63854
 * opts.c (finalize_options_struct): New.
 * opts.h (finalize_options_struct): New.
 * toplev.c (toplev::finalize): Call finalize_options_struct
 on global_options and global_options_set.
 ---
  gcc/opts.c   | 8 
  gcc/opts.h   | 1 +
  gcc/toplev.c | 3 +++
  3 files changed, 12 insertions(+)

 diff --git a/gcc/opts.c b/gcc/opts.c
 index d22882b..dabd3c6 100644
 --- a/gcc/opts.c
 +++ b/gcc/opts.c
 @@ -307,6 +307,14 @@ init_options_struct (struct gcc_options *opts, struct 
 gcc_options *opts_set)
targetm_common.option_init_struct (opts);
  }

 +/* Release any allocations owned by OPTS.  */
 +
 +void
 +finalize_options_struct (struct gcc_options *opts)
 +{
 +  XDELETEVEC (opts-x_param_values);
 +}
 +
  /* If indicated by the optimization level LEVEL (-Os if SIZE is set,
 -Ofast if FAST is set, -Og if DEBUG is set), apply the option DEFAULT_OPT
 to OPTS and OPTS_SET, diagnostic context DC, location LOC, with language
 diff --git a/gcc/opts.h b/gcc/opts.h
 index f694082..c3ec942 100644
 --- a/gcc/opts.h
 +++ b/gcc/opts.h
 @@ -325,6 +325,7 @@ extern void decode_cmdline_options_to_array (unsigned int 
 argc,
  extern void init_options_once (void);
  extern void init_options_struct (struct gcc_options *opts,
  struct gcc_options *opts_set);
 +extern void finalize_options_struct (struct gcc_options *opts);
  extern void decode_cmdline_options_to_array_default_mask (unsigned int argc,
   const char **argv,
   struct 
 cl_decoded_option **decoded_options,
 diff --git a/gcc/toplev.c b/gcc/toplev.c
 index 2e48047..4b4e568 100644
 --- a/gcc/toplev.c
 +++ b/gcc/toplev.c
 @@ -2169,4 +2169,7 @@ toplev::finalize (void)
ipa_cp_c_finalize ();
ipa_reference_c_finalize ();
params_c_finalize ();
 +
 +  finalize_options_struct (global_options);
 +  finalize_options_struct (global_options_set);
  }
 --
 1.8.5.3



Re: [PATCH 02/21] PR jit/63854: Fix memory leak of reginfo.c: valid_mode_changes_obstack

2014-11-19 Thread Richard Biener
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm dmalc...@redhat.com wrote:
 Valgrind shows this fixes ~4 KB of leak per iteration (on x86_64) by
 plugging this leak allocated at reginfo.c:1327:
   gcc_obstack_init (valid_mode_changes_obstack);

Ok.

Thanks,
Richard.

 ==57820== 16,256 bytes in 4 blocks are definitely lost in loss record 906 of 
 917
 ==57820==at 0x4A0645D: malloc (in 
 /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
 ==57820==by 0x59A6747: xmalloc (xmalloc.c:147)
 ==57820==by 0x30958842DB: _obstack_begin (obstack.c:184)
 ==57820==by 0x51C1EC1: init_subregs_of_mode() (reginfo.c:1327)
 ==57820==by 0x50D2A38: init_costs() (ira-costs.c:2181)
 ==57820==by 0x50D74A8: ira_costs() (ira-costs.c:2211)
 ==57820==by 0x50D1326: ira_build() (ira-build.c:3459)
 ==57820==by 0x50C909C: (anonymous 
 namespace)::pass_ira::execute(function*) (ira.c:5227)
 ==57820==by 0x51884F7: execute_one_pass(opt_pass*) (passes.c:2269)
 ==57820==by 0x5188B75: execute_pass_list_1(opt_pass*) (passes.c:2321)
 ==57820==by 0x5188B87: execute_pass_list_1(opt_pass*) (passes.c:2322)
 ==57820==by 0x5188BC8: execute_pass_list(function*, opt_pass*) 
 (passes.c:2332)

 gcc/ChangeLog:
 PR jit/63854
 * reginfo.c (finish_subregs_of_mode): Replace obstack_finish with
 obstack_free when cleaning up valid_mode_changes_obstack.
 ---
  gcc/reginfo.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/gcc/reginfo.c b/gcc/reginfo.c
 index efe23cd..c2daf22 100644
 --- a/gcc/reginfo.c
 +++ b/gcc/reginfo.c
 @@ -1343,7 +1343,7 @@ void
  finish_subregs_of_mode (void)
  {
XDELETEVEC (valid_mode_changes);
 -  obstack_finish (valid_mode_changes_obstack);
 +  obstack_free (valid_mode_changes_obstack, NULL);
  }

  /* Free all data attached to the structure.  This isn't a destructor because
 --
 1.8.5.3



Re: [PATCH][AARCH64]Fix PR63424 by adding sumaxminv2di3 pattern

2014-11-19 Thread Richard Henderson
On 10/31/2014 03:51 PM, Renlin Li wrote:
 +(define_expand sumaxminv2di3
 +  [(parallel [
 +(set (match_operand:V2DI 0 register_operand )
 +  (MAXMIN:V2DI (match_operand:V2DI 1 register_operand )
 +   (match_operand:V2DI 2 register_operand )))
 +(clobber (reg:CC CC_REGNUM))])]
 +  TARGET_SIMD

There's no clobber of CC_REGNUM, so you can take that out.

Otherwise it looks good.


r~


Re: [PATCH 06/21] PR jit/63854: Fix leak of opts_obstack

2014-11-19 Thread Richard Biener
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm dmalc...@redhat.com wrote:
 This was leaking about 4KB per iteration:

Ok.

Thanks,
Richard.

 16,256 bytes in 4 blocks are definitely lost in loss record 233 of 239
at 0x4A0645D: malloc (in 
 /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x5D75C17: xmalloc (xmalloc.c:147)
by 0x30958842DB: _obstack_begin (obstack.c:184)
by 0x5D1D317: init_options_struct(gcc_options*, gcc_options*) (opts.c:279)
by 0x532DB0C: toplev::main(int, char**) (toplev.c:2081)
by 0x4DE766F: gcc::jit::playback::context::compile() (jit-playback.c:1615)
by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
by 0x401CA4: test_jit (harness.h:190)
by 0x401D88: main (harness.h:232)

 gcc/ChangeLog:
 PR jit/63854
 * toplev.c (toplev::finalize): Free opts_obstack.
 ---
  gcc/toplev.c | 2 ++
  1 file changed, 2 insertions(+)

 diff --git a/gcc/toplev.c b/gcc/toplev.c
 index 291b84d..0d617c2 100644
 --- a/gcc/toplev.c
 +++ b/gcc/toplev.c
 @@ -2178,4 +2178,6 @@ toplev::finalize (void)
/* Clean up the context (and pass_manager etc). */
delete g;
g = NULL;
 +
 +  obstack_free (opts_obstack, NULL);
  }
 --
 1.8.5.3



Re: [PATCH 07/21] PR jit/63854: Fix leak of optimization_summary_obstack

2014-11-19 Thread Richard Biener
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm dmalc...@redhat.com wrote:
 This was leaking ~4KB per iteration:

Ok.

Thanks,
Richard.

 16,256 bytes in 4 blocks are definitely lost in loss record 234 of 239
at 0x4A0645D: malloc (in 
 /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x5D75C17: xmalloc (xmalloc.c:147)
by 0x30958842DB: _obstack_begin (obstack.c:184)
by 0x4DFECDE: bitmap_obstack_initialize(bitmap_obstack*) (bitmap.c:338)
by 0x5CA3C1D: ipa_init() (ipa-reference.c:431)
by 0x5CA3FB1: generate_summary() (ipa-reference.c:551)
by 0x5CA4650: propagate() (ipa-reference.c:684)
by 0x5CA5BEE: (anonymous 
 namespace)::pass_ipa_reference::execute(function*) (ipa-reference.c:1178)
by 0x522354D: execute_one_pass(opt_pass*) (passes.c:2306)
by 0x5224427: execute_ipa_pass_list(opt_pass*) (passes.c:2700)
by 0x4E495C1: ipa_passes() (cgraphunit.c:2088)
by 0x4E498E0: symbol_table::compile() (cgraphunit.c:2172)

 gcc/ChangeLog:
 PR jit/63854
 * ipa-reference.c (ipa_reference_c_finalize): Release
 optimization_summary_obstack.
 ---
  gcc/ipa-reference.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

 diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c
 index b421f63..1ce06d1 100644
 --- a/gcc/ipa-reference.c
 +++ b/gcc/ipa-reference.c
 @@ -1193,5 +1193,9 @@ make_pass_ipa_reference (gcc::context *ctxt)
  void
  ipa_reference_c_finalize (void)
  {
 -  ipa_init_p = false;
 +  if (ipa_init_p)
 +{
 +  bitmap_obstack_release (optimization_summary_obstack);
 +  ipa_init_p = false;
 +}
  }
 --
 1.8.5.3



Re: [PATCH 08/21] PR jit/63854: Add ira_costs_c_finalize

2014-11-19 Thread Richard Biener
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm dmalc...@redhat.com wrote:
 this_target_ira_int-free_ira_costs () is called by ira_init_costs,
 but this isn't called after the compile, causing noise when running
 under valgrind.

 This patch adds a ira_costs_c_finalize function to clean this up,
 and calls it from toplev::finalize (and hence this isn't called by
 cc1/cc1plus/etc, just by libgccjit.so).

 Doing so eliminates about 27KB of leak from the valgrind report.

Ok.

Thanks,
Richard.

 gcc/ChangeLog:
 PR jit/63854
 * ira-costs.c (ira_costs_c_finalize): New function.
 * ira.h (ira_costs_c_finalize): New prototype.
 * toplev.c (toplev::finalize): Call ira_costs_c_finalize.
 ---
  gcc/ira-costs.c | 6 ++
  gcc/ira.h   | 3 +++
  gcc/toplev.c| 1 +
  3 files changed, 10 insertions(+)

 diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
 index 122815b..2dabead 100644
 --- a/gcc/ira-costs.c
 +++ b/gcc/ira-costs.c
 @@ -2356,3 +2356,9 @@ ira_adjust_equiv_reg_cost (unsigned regno, int cost)
else
  regno_equiv_gains[regno] += cost;
  }
 +
 +void
 +ira_costs_c_finalize (void)
 +{
 +  this_target_ira_int-free_ira_costs ();
 +}
 diff --git a/gcc/ira.h b/gcc/ira.h
 index b294ea1..d62656c 100644
 --- a/gcc/ira.h
 +++ b/gcc/ira.h
 @@ -199,4 +199,7 @@ extern bool ira_bad_reload_regno (int, rtx, rtx);

  extern void ira_adjust_equiv_reg_cost (unsigned, int);

 +/* ira-costs.c */
 +extern void ira_costs_c_finalize (void);
 +
  #endif /* GCC_IRA_H */
 diff --git a/gcc/toplev.c b/gcc/toplev.c
 index 0d617c2..0181a3f 100644
 --- a/gcc/toplev.c
 +++ b/gcc/toplev.c
 @@ -2168,6 +2168,7 @@ toplev::finalize (void)
gcse_c_finalize ();
ipa_cp_c_finalize ();
ipa_reference_c_finalize ();
 +  ira_costs_c_finalize ();
params_c_finalize ();

finalize_options_struct (global_options);
 --
 1.8.5.3



Re: [PATCH 11/21] PR jit/63854: Fix leak of avail within tree-ssa-pre.c

2014-11-19 Thread Richard Biener
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm dmalc...@redhat.com wrote:
 Fix this leak:

Ok.

Thanks,
Richard.

 120 bytes in 5 blocks are definitely lost in loss record 141 of 227
at 0x4A0645D: malloc (in 
 /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x5D75D8F: xrealloc (xmalloc.c:177)
by 0x550DEBA: void va_heap::reservepre_expr_d*(vecpre_expr_d*, va_heap, 
 vl_embed*, unsigned int, bool) (vec.h:310)
by 0x550D509: vecpre_expr_d*, va_heap, vl_ptr::reserve(unsigned int, 
 bool) (vec.h:1428)
by 0x550DA3E: vecpre_expr_d*, va_heap, vl_ptr::reserve_exact(unsigned 
 int) (vec.h:1448)
by 0x550D167: vecpre_expr_d*, va_heap, vl_ptr::safe_grow(unsigned int) 
 (vec.h:1576)
by 0x55076FE: do_regular_insertion(basic_block_def*, basic_block_def*) 
 (tree-ssa-pre.c:3209)
by 0x5508379: insert_aux(basic_block_def*) (tree-ssa-pre.c:3526)
by 0x55083DC: insert_aux(basic_block_def*) (tree-ssa-pre.c:3536)
by 0x55083DC: insert_aux(basic_block_def*) (tree-ssa-pre.c:3536)
by 0x55084C0: insert() (tree-ssa-pre.c:3559)
by 0x550C5BC: (anonymous namespace)::pass_pre::execute(function*) 
 (tree-ssa-pre.c:4805)

 gcc/ChangeLog:
 PR jit/63854
 * tree-ssa-pre.c (do_regular_insertion): Convert avail from
 vec to auto_vec to fix a leak.
 ---
  gcc/tree-ssa-pre.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
 index ea99198..027dc1c 100644
 --- a/gcc/tree-ssa-pre.c
 +++ b/gcc/tree-ssa-pre.c
 @@ -3202,7 +3202,7 @@ do_regular_insertion (basic_block block, basic_block 
 dom)
bool new_stuff = false;
vecpre_expr exprs;
pre_expr expr;
 -  vecpre_expr avail = vNULL;
 +  auto_vecpre_expr avail;
int i;

exprs = sorted_array_from_bitmap_set (ANTIC_IN (block));
 --
 1.8.5.3



Re: [PATCH 12/21] PR jit/63854: Add a valgrind suppresion file

2014-11-19 Thread Richard Biener
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm dmalc...@redhat.com wrote:
 Valgrind complains about uninitialized data within sparseset_bit_p.
 Provide a suppression file to silence these warnings.

 Valgrind requires suppression files for C++ code to use the mangled
 names, so we do that here.

There is --enable-valgrind-annotations to get the same effect by GCC
telling valgrind about this (and more).

Richard.

 contrib/ChangeLog
 PR jit/63854
 * valgrind.supp: New.
 ---
  contrib/valgrind.supp | 11 +++
  1 file changed, 11 insertions(+)
  create mode 100644 contrib/valgrind.supp

 diff --git a/contrib/valgrind.supp b/contrib/valgrind.supp
 new file mode 100644
 index 000..ec9ff02
 --- /dev/null
 +++ b/contrib/valgrind.supp
 @@ -0,0 +1,11 @@
 +{
 +   suppress-uninit-cond-with-sparseset_bit_p
 +   Memcheck:Cond
 +   fun:_ZL15sparseset_bit_pP13sparseset_defm
 +}
 +
 +{
 +   suppress-uninit-use-with-sparseset_bit_p
 +   Memcheck:Value8
 +   fun:_ZL15sparseset_bit_pP13sparseset_defm
 +}
 --
 1.8.5.3



Re: [PATCH 16/21] PR jit/63854: Add all_late_ipa_passes to GCC_PASS_LISTS

2014-11-19 Thread Richard Biener
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm dmalc...@redhat.com wrote:
 Valgrind showed a per-iteration leak of pass_ipa_pta (and presumably
 pass_omp_simd_clone):

Ok.

Thanks,
Richard.

 704 (352 direct, 352 indirect) bytes in 4 blocks are definitely lost in loss 
 record 198 of 241
at 0x4A06965: operator new(unsigned long) (in 
 /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x555A55D: make_pass_ipa_pta(gcc::context*) 
 (tree-ssa-structalias.c:7425)
by 0x5219FDA: gcc::pass_manager::pass_manager(gcc::context*) 
 (pass-instances.def:141)
by 0x4E4F5D1: gcc::context::context() (context.c:37)
by 0x532C07B: general_init(char const*) (toplev.c:1212)
by 0x532DB5E: toplev::main(int, char**) (toplev.c:2074)
by 0x4DE76AF: gcc::jit::playback::context::compile() (jit-playback.c:1615)
by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
by 0x401DC4: test_jit (harness.h:190)
by 0x401EA8: main (harness.h:232)

 Investigation showed that ~pass_manager wasn't deleting these
 passes since for some reason GCC_PASS_LISTS didn't contain
 all_late_ipa_passes and so wasn't calling delete_pass_tree on it.

 Add it to GCC_PASS_LISTS, fixing the leak.

 gcc/ChangeLog:
 PR jit/63854
 * pass_manager.h (GCC_PASS_LISTS): Add all_late_ipa_passes.
 ---
  gcc/pass_manager.h | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/gcc/pass_manager.h b/gcc/pass_manager.h
 index 82857a9..e6658ad 100644
 --- a/gcc/pass_manager.h
 +++ b/gcc/pass_manager.h
 @@ -29,6 +29,7 @@ struct register_pass_info;
DEF_PASS_LIST (all_lowering_passes) \
DEF_PASS_LIST (all_small_ipa_passes) \
DEF_PASS_LIST (all_regular_ipa_passes) \
 +  DEF_PASS_LIST (all_late_ipa_passes) \
DEF_PASS_LIST (all_passes)

  #define DEF_PASS_LIST(LIST) PASS_LIST_NO_##LIST,
 --
 1.8.5.3



Re: [PATCH 04/21] PR jit/63854: Fix memory leak within bb-reorder.c

2014-11-19 Thread Richard Biener
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm dmalc...@redhat.com wrote:
 Valgrind showed this leaking 200 bytes per iteration in one of
 my testcases:

Ok.

Thanks,
Richard.

 1,000 bytes in 5 blocks are definitely lost in loss record 200 of 241
at 0x4A083AA: realloc (in 
 /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x5D75C5C: xrealloc (xmalloc.c:179)
by 0x4E10734: void 
 va_heap::reservebasic_block_def*(vecbasic_block_def*, va_heap, 
 vl_embed*, unsigned int, bool) (vec.h:310
 )
by 0x4E105BB: vecbasic_block_def*, va_heap, vl_ptr::reserve(unsigned 
 int, bool) (vec.h:1428)
by 0x4E15B37: vecbasic_block_def*, va_heap, 
 vl_ptr::safe_push(basic_block_def* const) (vec.h:1537)
by 0x5B61F44: find_rarely_executed_basic_blocks_and_crossing_edges() 
 (bb-reorder.c:1614)
by 0x5B63E90: (anonymous 
 namespace)::pass_partition_blocks::execute(function*) (bb-reorder.c:2711)
by 0x522354D: execute_one_pass(opt_pass*) (passes.c:2306)
by 0x52237C4: execute_pass_list_1(opt_pass*) (passes.c:2358)
by 0x52237F5: execute_pass_list_1(opt_pass*) (passes.c:2359)
by 0x5223832: execute_pass_list(function*, opt_pass*) (passes.c:2369)
by 0x4E4884F: cgraph_node::expand() (cgraphunit.c:1773)

 Fix is trivial.

 gcc/ChangeLog:
 PR jit/63854
 * bb-reorder.c
 (find_rarely_executed_basic_blocks_and_crossing_edges): Convert
 local bbs_in_hot_partition from vec to auto_vec.
 ---
  gcc/bb-reorder.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
 index 1f7c3ee..4ca3bb2 100644
 --- a/gcc/bb-reorder.c
 +++ b/gcc/bb-reorder.c
 @@ -1582,7 +1582,7 @@ find_rarely_executed_basic_blocks_and_crossing_edges 
 (void)
edge e;
edge_iterator ei;
unsigned int cold_bb_count = 0;
 -  vecbasic_block bbs_in_hot_partition = vNULL;
 +  auto_vecbasic_block bbs_in_hot_partition;

/* Mark which partition (hot/cold) each basic block belongs in.  */
FOR_EACH_BB_FN (bb, cfun)
 --
 1.8.5.3



Re: [PATCH 05/21] PR jit/63854: Fix memory leak of save_decoded_options

2014-11-19 Thread Richard Biener
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm dmalc...@redhat.com wrote:
 This commit fixes this leak from opts-common.c, about 4KB per iteration.

Ok.

Thanks,
Richard.

 ==57820== 18,816 (2,560 direct, 16,256 indirect) bytes in 4 blocks are 
 definitely lost in loss record 907 of 917
 ==57820==at 0x4A083AA: realloc (in 
 /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
 ==57820==by 0x59A67CC: xrealloc (xmalloc.c:179)
 ==57820==by 0x59587C9: decode_cmdline_options_to_array(unsigned int, char 
 const**, unsigned int, cl_decoded_option**, unsigned int*) (opts-common.c:885)
 ==57820==by 0x4E2ED90: toplev::main(int, char**) (toplev.c:2089)
 ==57820==by 0x4E43186: gcc::jit::playback::context::compile() 
 (jit-playback.c:1615)
 ==57820==by 0x4E4018D: gcc::jit::recording::context::compile() 
 (jit-recording.c:861)
 ==57820==by 0x401CA4: test_jit (harness.h:190)
 ==57820==by 0x401D88: main (harness.h:232)

 gcc/ChangeLog:
 PR jit/63854
 * toplev.c (toplev::finalize): Clean up save_decoded_options.
 ---
  gcc/toplev.c | 2 ++
  1 file changed, 2 insertions(+)

 diff --git a/gcc/toplev.c b/gcc/toplev.c
 index 876279f..291b84d 100644
 --- a/gcc/toplev.c
 +++ b/gcc/toplev.c
 @@ -2173,6 +2173,8 @@ toplev::finalize (void)
finalize_options_struct (global_options);
finalize_options_struct (global_options_set);

 +  XDELETEVEC (save_decoded_options);
 +
/* Clean up the context (and pass_manager etc). */
delete g;
g = NULL;
 --
 1.8.5.3



Re: [PATCH 20/21] PR jit/63854: Fix leak in ipa-icf.c

2014-11-19 Thread Richard Biener
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm dmalc...@redhat.com wrote:
 ipa-icf.c was leaking 16 bytes per iteration in the jit testcases here:

Ok.

Tahnks,
Richard.

 80 bytes in 5 blocks are definitely lost in loss record 94 of 199
at 0x4A0645D: malloc (in 
 /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x5D764B7: xmalloc (xmalloc.c:147)
by 0x5C962FF: ipa_icf::sem_item_optimizer::get_group_by_hash(unsigned int, 
 ipa_icf::sem_item_type) (ipa-icf.c:1503)
by 0x5C96C8C: ipa_icf::sem_item_optimizer::build_hash_based_classes() 
 (ipa-icf.c:1723)
by 0x5C96765: ipa_icf::sem_item_optimizer::execute() (ipa-icf.c:1616)
by 0x5C98E81: ipa_icf::ipa_icf_driver() (ipa-icf.c:2373)
by 0x5C993C8: ipa_icf::pass_ipa_icf::execute(function*) (ipa-icf.c:2421)
by 0x5223CCD: execute_one_pass(opt_pass*) (passes.c:2306)
by 0x5224BA7: execute_ipa_pass_list(opt_pass*) (passes.c:2700)
by 0x4E49B0D: ipa_passes() (cgraphunit.c:2088)
by 0x4E49E2C: symbol_table::compile() (cgraphunit.c:2172)
by 0x4E4A1C2: symbol_table::finalize_compilation_unit() (cgraphunit.c:2325)

 by not freeing the allocation here:

   congruence_class_group *item = XNEW (congruence_class_group);

 in sem_item_optimizer::get_group_by_hash.

 Fix it.

 gcc/ChangeLog:
 PR jit/63854
 * ipa-icf.c (sem_item_optimizer::~sem_item_optimizer): Free each
 congruence_class_group *.
 ---
  gcc/ipa-icf.c | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
 index 2d5fcf5..b0ba82e 100644
 --- a/gcc/ipa-icf.c
 +++ b/gcc/ipa-icf.c
 @@ -1322,6 +1322,7 @@ sem_item_optimizer::~sem_item_optimizer ()
 delete (*it)-classes[i];

(*it)-classes.release ();
 +  free (*it);
  }

m_items.release ();
 --
 1.8.5.3



Re: [PATCH 09/21] PR jit/63854: Don't leak producer_string in dwarf2out.c

2014-11-19 Thread Richard Biener
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm dmalc...@redhat.com wrote:
 Fix this small per-iteration leak with debuginfo:

Ok.

Thanks,
Richard.

 424 bytes in 4 blocks are definitely lost in loss record 185 of 230
at 0x4A0645D: malloc (in 
 /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x5D75CA7: xmalloc (xmalloc.c:147)
by 0x4ECE9E4: gen_producer_string() (dwarf2out.c:19489)
by 0x4EDB2C8: dwarf2out_finish(char const*) (dwarf2out.c:24257)
by 0x532AD3B: compile_file() (toplev.c:623)
by 0x532D9E1: do_compile() (toplev.c:2020)
by 0x532DC54: toplev::main(int, char**) (toplev.c:2117)
by 0x4DE766F: gcc::jit::playback::context::compile() (jit-playback.c:1615)
by 0x4DD76DA: gcc::jit::recording::context::compile() (jit-recording.c:861)
by 0x4DD5BD2: gcc_jit_context_compile (libgccjit.c:2014)
by 0x401CA4: test_jit (harness.h:190)
by 0x401D88: main (harness.h:232)

 gcc/ChangeLog:
 PR jit/63854
 * dwarf2out.c (dwarf2out_c_finalize): Free producer_string.
 ---
  gcc/dwarf2out.c | 2 ++
  1 file changed, 2 insertions(+)

 diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
 index b16883f..9069f9a 100644
 --- a/gcc/dwarf2out.c
 +++ b/gcc/dwarf2out.c
 @@ -24741,6 +24741,8 @@ dwarf2out_c_finalize (void)
frame_pointer_fb_offset = 0;
frame_pointer_fb_offset_valid = false;
base_types.release ();
 +  XDELETEVEC (producer_string);
 +  producer_string = NULL;
  }

  #include gt-dwarf2out.h
 --
 1.8.5.3



Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)

2014-11-19 Thread Jakub Jelinek
On Wed, Nov 19, 2014 at 09:59:45AM +0100, Richard Biener wrote:
  OImode/XImode on i?86/x86_64 are not = MAX_BITSIZE_MODE_ANY_INT, because
  they are never used for integer arithmetics (and there is no way
  to represent all their values in RTL if not using CONST_WIDE_INT).
  As the following testcase shows, simplify_immed_subreg can be called
  with such modes though, e.g. trying to forward propagate a CONST_VECTOR
  (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can appear
  in the IL directly) into a SUBREG_REG.
 
 So we have (subreg:OI (reg:V4DF ...) ...) for example?  What do we
 end doing with that OI mode subreg?  (why can't we simply use the
 V4DF one?)

propagate_rtx_1 is called on
(subreg:OI (reg:V8DI 89) 0)
with old_rtx
(reg:V8DI 89)
and new_rtx
(const_vector:V8DI [
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
])

Seems we throw away the result in that case though, because
gen_lowpart_common doesn't like to return low parts of VOIDmode constants
larger if the mode is larger than double int:
1353  innermode = GET_MODE (x);
1354  if (CONST_INT_P (x)
1355   msize * BITS_PER_UNIT = HOST_BITS_PER_WIDE_INT)
1356innermode = mode_for_size (HOST_BITS_PER_WIDE_INT, MODE_INT, 0);
1357  else if (innermode == VOIDmode)
1358innermode = mode_for_size (HOST_BITS_PER_DOUBLE_INT, MODE_INT, 0);
we hit the last if and as mode is wider than innermode, we return NULL later
on.

  The following patch instead of ICE handles the most common cases (all 0
  and all 1 CONST_VECTORs) and returns NULL otherwise.
  
  Before wide-int got merged, the testcase worked, though the code didn't
  bother checking anything, just created 0 or constm1_rtx for the two cases
  that could happen and if something else appeared, could just return what
  matched low TImode (or DImode for -m32).
  
  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
 
 Looks ok to me.  Not sure if the zero/all-ones case really happens that
 much to be worth special-casing - I think you could use
 fixed_wide_intproper-size and simply see if the result is representable
 in a CONST_INT/CONST_DOUBLE?  Can you try if that works?  It looks like
 the patch may be smaller for that.

So perhaps something like this?  Don't know how much more inefficient it is
compared to what it used to do before.

Or just keep the existing code and just remove the assert and instead return
NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT?  At
least during propagation that will make zero change.
Though, in that case I have still doubts about the current code handling right
modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than
MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64).  If TARGET_SUPPORTS_WIDE_INT
== 0, we still silently throw away the upper bits, don't we?

BTW, to Mike, the assert has been misplaced, there was first buffer overflow
and only after that the assert.

2014-11-19  Jakub Jelinek  ja...@redhat.com

PR target/63910
* simplify-rtx.c (simplify_immed_subreg): Handle integer modes wider
than MAX_BITSIZE_MODE_ANY_INT.

* gcc.target/i386/pr63910.c: New test.

--- gcc/simplify-rtx.c.jj   2014-11-19 09:17:15.491327992 +0100
+++ gcc/simplify-rtx.c  2014-11-19 12:28:16.223808178 +0100
@@ -5501,8 +5501,12 @@ simplify_immed_subreg (machine_mode oute
int units
  = (GET_MODE_BITSIZE (outer_submode) + HOST_BITS_PER_WIDE_INT - 1)
  / HOST_BITS_PER_WIDE_INT;
-   HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / 
HOST_BITS_PER_WIDE_INT];
-   wide_int r;
+   const int tmpsize = (MAX_BITSIZE_MODE_ANY_MODE 
++ HOST_BITS_PER_WIDE_INT - 1)
+   / HOST_BITS_PER_WIDE_INT;
+   HOST_WIDE_INT tmp[tmpsize];
+   typedef FIXED_WIDE_INT (tmpsize * HOST_BITS_PER_WIDE_INT) imm_int;
+   imm_int r;
 
for (u = 0; u  units; u++)
  {
@@ -5515,10 +5519,12 @@ simplify_immed_subreg (machine_mode oute
tmp[u] = buf;
base += HOST_BITS_PER_WIDE_INT;
  }
-   gcc_assert (GET_MODE_PRECISION (outer_submode)
-   = MAX_BITSIZE_MODE_ANY_INT);
-   r = wide_int::from_array (tmp, units,
- GET_MODE_PRECISION (outer_submode));
+   r = imm_int::from_array (tmp, units,
+GET_MODE_PRECISION (outer_submode));
+#if TARGET_SUPPORTS_WIDE_INT == 0
+   if (wi::min_precision (r, SIGNED)  HOST_BITS_PER_DOUBLE_INT)
+ return NULL_RTX;
+#endif
elems[elem] = immed_wide_int_const (r, outer_submode);
  }
  break;
--- gcc/testsuite/gcc.target/i386/pr63910.c.jj  2014-11-19 

Re: [PATCH 14/21] PR jit/63854: Fix leak of paths within jump threading

2014-11-19 Thread Richard Biener
On Wed, Nov 19, 2014 at 11:46 AM, David Malcolm dmalc...@redhat.com wrote:
 Paths are allocated as:
vecjump_thread_edge * *path = new vecjump_thread_edge * ();
 i.e. the vec itself is on the heap, so a mere:
   path-release ();
 is merely releasing the buffer the vec holds, not the vec itself.

Ok.

Thanks,
RIchard.

 This patch updates the two sites that release paths to also delete them,
 fixing leaks like this seen by valgrind:
 160 bytes in 20 blocks are definitely lost in loss record 165 of 241
at 0x4A06965: operator new(unsigned long) (in 
 /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x556E5AF: thread_across_edge(gimple_statement_base*, edge_def*, bool, 
 vectree_node*, va_heap, vl_ptr*, tree_node
 * (*)(gimple_statement_base*, gimple_statement_base*)) 
 (tree-ssa-threadedge.c:1124)
by 0x547B73B: dom_opt_dom_walker::thread_across_edge(edge_def*) 
 (tree-ssa-dom.c:1184)
by 0x5480183: dom_opt_dom_walker::after_dom_children(basic_block_def*) 
 (tree-ssa-dom.c:2015)
by 0x5C1C5F7: dom_walker::walk(basic_block_def*) (domwalk.c:218)
by 0x547AE83: (anonymous namespace)::pass_dominator::execute(function*) 
 (tree-ssa-dom.c:919)
by 0x52235BD: execute_one_pass(opt_pass*) (passes.c:2306)
by 0x5223834: execute_pass_list_1(opt_pass*) (passes.c:2358)
by 0x5223865: execute_pass_list_1(opt_pass*) (passes.c:2359)
by 0x52238A2: execute_pass_list(function*, opt_pass*) (passes.c:2369)
by 0x4E4888F: cgraph_node::expand() (cgraphunit.c:1773)
by 0x4E48F29: expand_all_functions() (cgraphunit.c:1909)

 [Space-wise, a vec T, A, vl_ptr is just a pointer, hence the 8 bytes
 per block seen in the valgrind report]

 gcc/ChangeLog:
 PR jit/63854
 * tree-ssa-threadedge.c (thread_across_edge): Don't just release
 path, delete it.
 * tree-ssa-threadupdate.c (delete_jump_thread_path): Likewise.
 ---
  gcc/tree-ssa-threadedge.c   | 1 +
  gcc/tree-ssa-threadupdate.c | 1 +
  2 files changed, 2 insertions(+)

 diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
 index d5b9696..9ee1cae 100644
 --- a/gcc/tree-ssa-threadedge.c
 +++ b/gcc/tree-ssa-threadedge.c
 @@ -1149,6 +1149,7 @@ thread_across_edge (gimple dummy_cond,
  through the vector entries.  */
gcc_assert (path-length () == 0);
path-release ();
 +  delete path;

/* A negative status indicates the target block was deemed too big to
  duplicate.  Just quit now rather than trying to use the block as
 diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
 index 151ed83..5cb9f86 100644
 --- a/gcc/tree-ssa-threadupdate.c
 +++ b/gcc/tree-ssa-threadupdate.c
 @@ -2481,6 +2481,7 @@ delete_jump_thread_path (vecjump_thread_edge * *path)
for (unsigned int i = 0; i  path-length (); i++)
  delete (*path)[i];
path-release();
 +  delete path;
  }

  /* Register a jump threading opportunity.  We queue up all the jump
 --
 1.8.5.3



Re: [PATCH] Fix for fold_negate_expr (PR sanitizer/63879)

2014-11-19 Thread Richard Biener
On Wed, 19 Nov 2014, Marek Polacek wrote:

 The problem here is that negate_expr_p returns true (as it should)
 for UINT_MAX(OVF), but fold_negate_expr didn't actually fold it,
 and that is a no-no; if negate_expr_p is true, fold_negate_expr must
 not return NULL_TREE.  I added the following hunk for bootstrap and
 regtest for better coverage:
 
 @@ -752,6 +755,12 @@ fold_negate_expr (location_t loc, tree t)
break;
  }
  
 +#ifdef ENABLE_CHECKING
 +  if (negate_expr_p (t))
 +internal_error (fold_negate_expr should never return NULL_TREE 
 + if negate_expr_p is true);
 +#endif
 +
return NULL_TREE;
  }
 
 Fixed by adjusting fold_negate_expr so that it folds non-trapping
 wrapping constants.  The SANITIZE_SI_OVERFLOW check is important
 to not regress -Woverflow warnings (ug).
 
 negate_expr_p hunk is to match the NEGATE_EXPR case in fold_negate_expr.
  
 Bootstrapped/regtested on power8-linux, ok for trunk?

Ok.

Thanks,
Richard.

 2014-11-19  Marek Polacek  pola...@redhat.com
 
   PR sanitizer/63879
   * fold-const.c (negate_expr_p) case NEGATE_EXPR: Return
   !TYPE_OVERFLOW_SANITIZED.
   (fold_negate_expr) case INTEGER_CST: Fold when overflow
   does not trap and when overflow wraps, or when SANITIZE_SI_OVERFLOW
   is 0.
 
   * c-c++-common/ubsan/pr63879-1.c: New test.
   * c-c++-common/ubsan/pr63879-2.c: New test.
 
 diff --git gcc/fold-const.c gcc/fold-const.c
 index f6fb8af..719e06e 100644
 --- gcc/fold-const.c
 +++ gcc/fold-const.c
 @@ -408,9 +408,11 @@ negate_expr_p (tree t)
  TYPE_OVERFLOW_WRAPS (type));
  
  case FIXED_CST:
 -case NEGATE_EXPR:
return true;
  
 +case NEGATE_EXPR:
 +  return !TYPE_OVERFLOW_SANITIZED (type);
 +
  case REAL_CST:
/* We want to canonicalize to positive real constants.  Pretend
   that only negative ones can be easily negated.  */
 @@ -555,7 +557,8 @@ fold_negate_expr (location_t loc, tree t)
tem = fold_negate_const (t, type);
if (TREE_OVERFLOW (tem) == TREE_OVERFLOW (t)
 || (!TYPE_OVERFLOW_TRAPS (type)
 -(flag_sanitize  SANITIZE_SI_OVERFLOW) == 0))
 +TYPE_OVERFLOW_WRAPS (type))
 +   || (flag_sanitize  SANITIZE_SI_OVERFLOW) == 0)
   return tem;
break;
  
 diff --git gcc/testsuite/c-c++-common/ubsan/pr63879-1.c 
 gcc/testsuite/c-c++-common/ubsan/pr63879-1.c
 index e69de29..2035849 100644
 --- gcc/testsuite/c-c++-common/ubsan/pr63879-1.c
 +++ gcc/testsuite/c-c++-common/ubsan/pr63879-1.c
 @@ -0,0 +1,23 @@
 +/* PR sanitizer/63879 */
 +/* { dg-do compile } */
 +/* { dg-options -fsanitize=undefined } */
 +
 +struct A
 +{
 +  int inode;
 +} * a;
 +int b, c;
 +void
 +fn1 ()
 +{
 +  int d = 0;
 +  while (b)
 +{
 +  if (a-inode)
 +d++;
 +  a = 0;
 +}
 +  c = d - 1;
 +  for (; c = 0; c--)
 +;
 +}
 diff --git gcc/testsuite/c-c++-common/ubsan/pr63879-2.c 
 gcc/testsuite/c-c++-common/ubsan/pr63879-2.c
 index e69de29..34eb8e7 100644
 --- gcc/testsuite/c-c++-common/ubsan/pr63879-2.c
 +++ gcc/testsuite/c-c++-common/ubsan/pr63879-2.c
 @@ -0,0 +1,13 @@
 +/* PR sanitizer/63879 */
 +/* { dg-do compile } */
 +/* { dg-options -fsanitize=undefined } */
 +
 +int a;
 +void
 +fn1 ()
 +{
 +  int b = 2;
 +  for (; a;)
 +while (b = 0)
 +  b--;
 +}
 
   Marek
 
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany


Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)

2014-11-19 Thread Richard Biener
On Wed, 19 Nov 2014, Jakub Jelinek wrote:

 On Wed, Nov 19, 2014 at 09:59:45AM +0100, Richard Biener wrote:
   OImode/XImode on i?86/x86_64 are not = MAX_BITSIZE_MODE_ANY_INT, because
   they are never used for integer arithmetics (and there is no way
   to represent all their values in RTL if not using CONST_WIDE_INT).
   As the following testcase shows, simplify_immed_subreg can be called
   with such modes though, e.g. trying to forward propagate a CONST_VECTOR
   (i?86/x86_64 handles all zeros and all ones as CONST_VECTORs that can 
   appear
   in the IL directly) into a SUBREG_REG.
  
  So we have (subreg:OI (reg:V4DF ...) ...) for example?  What do we
  end doing with that OI mode subreg?  (why can't we simply use the
  V4DF one?)
 
 propagate_rtx_1 is called on
 (subreg:OI (reg:V8DI 89) 0)
 with old_rtx
 (reg:V8DI 89)
 and new_rtx
 (const_vector:V8DI [
 (const_int 0 [0])
 (const_int 0 [0])
 (const_int 0 [0])
 (const_int 0 [0])
 (const_int 0 [0])
 (const_int 0 [0])
 (const_int 0 [0])
 (const_int 0 [0])
 ])
 
 Seems we throw away the result in that case though, because
 gen_lowpart_common doesn't like to return low parts of VOIDmode constants
 larger if the mode is larger than double int:
 1353innermode = GET_MODE (x);
 1354if (CONST_INT_P (x)
 1355 msize * BITS_PER_UNIT = HOST_BITS_PER_WIDE_INT)
 1356  innermode = mode_for_size (HOST_BITS_PER_WIDE_INT, MODE_INT, 0);
 1357else if (innermode == VOIDmode)
 1358  innermode = mode_for_size (HOST_BITS_PER_DOUBLE_INT, MODE_INT, 0);
 we hit the last if and as mode is wider than innermode, we return NULL later
 on.
 
   The following patch instead of ICE handles the most common cases (all 0
   and all 1 CONST_VECTORs) and returns NULL otherwise.
   
   Before wide-int got merged, the testcase worked, though the code didn't
   bother checking anything, just created 0 or constm1_rtx for the two cases
   that could happen and if something else appeared, could just return what
   matched low TImode (or DImode for -m32).
   
   Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
  
  Looks ok to me.  Not sure if the zero/all-ones case really happens that
  much to be worth special-casing - I think you could use
  fixed_wide_intproper-size and simply see if the result is representable
  in a CONST_INT/CONST_DOUBLE?  Can you try if that works?  It looks like
  the patch may be smaller for that.
 
 So perhaps something like this?  Don't know how much more inefficient it is
 compared to what it used to do before.

Yes, that looks good.

 Or just keep the existing code and just remove the assert and instead return
 NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT?  At
 least during propagation that will make zero change.
 Though, in that case I have still doubts about the current code handling right
 modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than
 MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64).  If TARGET_SUPPORTS_WIDE_INT
 == 0, we still silently throw away the upper bits, don't we?

Well - not with your added check, no?

I'd say the patch is ok.

Thanks,
Richard.

 BTW, to Mike, the assert has been misplaced, there was first buffer overflow
 and only after that the assert.
 
 2014-11-19  Jakub Jelinek  ja...@redhat.com
 
   PR target/63910
   * simplify-rtx.c (simplify_immed_subreg): Handle integer modes wider
   than MAX_BITSIZE_MODE_ANY_INT.
 
   * gcc.target/i386/pr63910.c: New test.
 
 --- gcc/simplify-rtx.c.jj 2014-11-19 09:17:15.491327992 +0100
 +++ gcc/simplify-rtx.c2014-11-19 12:28:16.223808178 +0100
 @@ -5501,8 +5501,12 @@ simplify_immed_subreg (machine_mode oute
   int units
 = (GET_MODE_BITSIZE (outer_submode) + HOST_BITS_PER_WIDE_INT - 1)
 / HOST_BITS_PER_WIDE_INT;
 - HOST_WIDE_INT tmp[MAX_BITSIZE_MODE_ANY_INT / 
 HOST_BITS_PER_WIDE_INT];
 - wide_int r;
 + const int tmpsize = (MAX_BITSIZE_MODE_ANY_MODE 
 +  + HOST_BITS_PER_WIDE_INT - 1)
 + / HOST_BITS_PER_WIDE_INT;
 + HOST_WIDE_INT tmp[tmpsize];
 + typedef FIXED_WIDE_INT (tmpsize * HOST_BITS_PER_WIDE_INT) imm_int;
 + imm_int r;
  
   for (u = 0; u  units; u++)
 {
 @@ -5515,10 +5519,12 @@ simplify_immed_subreg (machine_mode oute
   tmp[u] = buf;
   base += HOST_BITS_PER_WIDE_INT;
 }
 - gcc_assert (GET_MODE_PRECISION (outer_submode)
 - = MAX_BITSIZE_MODE_ANY_INT);
 - r = wide_int::from_array (tmp, units,
 -   GET_MODE_PRECISION (outer_submode));
 + r = imm_int::from_array (tmp, units,
 +  GET_MODE_PRECISION (outer_submode));
 +#if TARGET_SUPPORTS_WIDE_INT == 0
 + if (wi::min_precision (r, SIGNED)  HOST_BITS_PER_DOUBLE_INT)
 +   

Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)

2014-11-19 Thread Jakub Jelinek
On Wed, Nov 19, 2014 at 12:59:06PM +0100, Richard Biener wrote:
  So perhaps something like this?  Don't know how much more inefficient it is
  compared to what it used to do before.
 
 Yes, that looks good.
 
  Or just keep the existing code and just remove the assert and instead return
  NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT?  At
  least during propagation that will make zero change.
  Though, in that case I have still doubts about the current code handling 
  right
  modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than
  MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64).  If TARGET_SUPPORTS_WIDE_INT
  == 0, we still silently throw away the upper bits, don't we?
 
 Well - not with your added check, no?

For TARGET_SUPPORTS_WIDE_INT == 0 should be hopefully ok.  Not sure
about TARGET_SUPPORTS_WIDE_INT != 0, can it express any generic_wide_int, or
is it still bound to wide_int (i.e. MAX_BITSIZE_MODE_ANY_INT rounded up)
precision?  Mike?

Jakub


[C++ Patch] PR 55425

2014-11-19 Thread Paolo Carlini

Hi,

today I wanted to simply close this Bug as fixed, but then I noticed 
that Richard Smith in the audit trail argued, correctly in my opinion, 
that we should accept things like:


constexpr const char* x() { return __func__; }

in C++11 mode too, because the as if local variable specification of 
__func__ in the Standard doesn't matter for the purpose of constexpr, 
because otherwise in C++11 no constexpr function would be valid: the 
Standard does *not* say that __func__ is only predefined if it is used.


That said, I think a quick way to accept __func__, __FUNCTION__, and 
__PRETTY_FUNCTION__ is checking whether DECL_ARTIFICIAL (decl) is true 
on the decl. Tested x86_64-linux.


Thanks,
Paolo.

///
/cp
2014-11-19  Paolo Carlini  paolo.carl...@oracle.com

PR c++/55425
* constexpr.c (constexpr_fn_retval): Accept __func__, __FUNCTION__,
and __PRETTY_FUNCTION__.

/testsuite
2014-11-19  Paolo Carlini  paolo.carl...@oracle.com

PR c++/55425
* g++.dg/cpp0x/constexpr-__func__.C
Index: cp/constexpr.c
===
--- cp/constexpr.c  (revision 217755)
+++ cp/constexpr.c  (working copy)
@@ -616,9 +616,14 @@ constexpr_fn_retval (tree body)
   return break_out_target_exprs (TREE_OPERAND (body, 0));
 
 case DECL_EXPR:
-  if (TREE_CODE (DECL_EXPR_DECL (body)) == USING_DECL)
-   return NULL_TREE;
-  return error_mark_node;
+  {
+   tree decl = DECL_EXPR_DECL (body);
+   if (TREE_CODE (decl) == USING_DECL
+   /* Accept __func__, __FUNCTION__, and __PRETTY_FUNCTION__.  */
+   || DECL_ARTIFICIAL (decl))
+ return NULL_TREE;
+   return error_mark_node;
+  }
 
 case CLEANUP_POINT_EXPR:
   return constexpr_fn_retval (TREE_OPERAND (body, 0));
Index: testsuite/g++.dg/cpp0x/constexpr-__func__.C
===
--- testsuite/g++.dg/cpp0x/constexpr-__func__.C (revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-__func__.C (working copy)
@@ -0,0 +1,6 @@
+// PR c++/55425
+// { dg-do compile { target c++11 } }
+
+constexpr const char* x() { return __func__; }
+constexpr const char* y() { return __FUNCTION__; }
+constexpr const char* z() { return __PRETTY_FUNCTION__; }


Re: [PATCH] Fix ICEs in simplify_immed_subreg on OImode/XImode subregs (PR target/63910)

2014-11-19 Thread Richard Biener
On Wed, 19 Nov 2014, Jakub Jelinek wrote:

 On Wed, Nov 19, 2014 at 12:59:06PM +0100, Richard Biener wrote:
   So perhaps something like this?  Don't know how much more inefficient it 
   is
   compared to what it used to do before.
  
  Yes, that looks good.
  
   Or just keep the existing code and just remove the assert and instead 
   return
   NULL whenever outer_submode is wider than MAX_BITSIZE_MODE_ANY_INT?  At
   least during propagation that will make zero change.
   Though, in that case I have still doubts about the current code handling 
   right
   modes wider than HOST_BITS_PER_DOUBLE_INT but smaller than
   MAX_BITSIZE_MODE_ANY_INT (none on i?86/x86_64).  If 
   TARGET_SUPPORTS_WIDE_INT
   == 0, we still silently throw away the upper bits, don't we?
  
  Well - not with your added check, no?
 
 For TARGET_SUPPORTS_WIDE_INT == 0 should be hopefully ok.  Not sure
 about TARGET_SUPPORTS_WIDE_INT != 0, can it express any generic_wide_int, or
 is it still bound to wide_int (i.e. MAX_BITSIZE_MODE_ANY_INT rounded up)
 precision?  Mike?

It can represent any - well, the RTX at least.  Code then using
simple wide_int may wreck then though.

Richard.


Re: [PATCH 4/4] Data structure is used for inline_summary struct.

2014-11-19 Thread Martin Liška

On 11/18/2014 11:25 PM, Martin Jambor wrote:

On Tue, Nov 18, 2014 at 07:59:26PM +0100, Jan Hubicka wrote:

Hi,

On Tue, Nov 18, 2014 at 04:39:00PM +0100, Jan Hubicka wrote:

On Fri, Nov 14, 2014 at 08:59:10PM +0100, Jan Hubicka wrote:



b) with GTY, we cannot call destructor


Everything in symbol table is expecitely memory managed (i.e. enver left
to be freed by garbage collector). It resists in GTY only to allow linking
garbage collected object from them and to get PCH working.



Well, if I understand the intent correctly, summaries are for stuff
that is not in the symbol table.  For example jump functions are a

Correct.

vector of structures possibly containing trees, so everything has to
be in garbage collected memory.

When an edge is removed, it is necessary to be notified about it
immediately, for example to decrement rdesc_refcount (you might argue
that that should be done in a separate hook and not from within a
summary class but then you start to rely on hook invocation ordering
so I think it is better to eventually use the summaries for it too).


I do not see why ctors/dtors can not do the reference counting. In fact
this is how refcounting is done usually anyway?



Well, when there is no garbage collection involved then yes, that is
how you normally do it but in the GC case, there is the question of
what is the appropriate time to call destructor on garbage collected
data (like jump functions)?


I still fail to see problem here.  Summaries are explicitly managed- they are
constructed at summary construction time or when new callgarph node is
introduced/duplicated.  They are destroyed when callgarph node is destroyed or
whole summary is ddestroyed.  It is job of the summary datastructure to call
proper ctors/dtors, not job of garbage collector that provides the underlying
memory management.


I do not think that all summaries (in the meaning of a description of
one particular symbol table node or call graph edge) are explicitely
managed.  For example ipa_edge_args or ipa_agg_replacement_value
(which my alignment patch changes to ipcp_transformation_summary) are
allocated in GC memory because they contain trees.



If you have datastructure that points to something that is not
explicitly managed (i.e. tree expression), you just can not have
non-trivial constructor on that datastructure, because that is freed
transparently by gty that don't do destruction...


I admit to not being particularly bright today but that seems to be
exactly my point.


Well, in your case you have datastructure jump_function that contain a pointer
to tree (EXPR).  What I am trying to explain is that I see no reson why
jump_function needs to be POD.


I never said that the summary object needs to be a POD, I only said I
liked the possibility of storing very simple objects (without wrapping
them in classes with constructors and destructors).  That is of course
nothing more than my personal preference.


The tree pointed to by EXPR pointer can not
have a dtor by itself because GGC will not call it upon freeing.

It is true that jump_function lives in GGC memory (to make pointer to expr
work) but it never gets removed by ggc_collect because it is always pointed to
by the summary datastructure.  There are two ways to free the jump_function
datastructure.
   1) removing the symbol node it is attached to.
  Here the symtab code will call removal hook that was registered by 
container
  template. The container will call destructor of jump_function and the 
ggc_free
  its memory
   2) removing the summary.  In this case I would again expect the container
  template to walk all summaries and free them.

So even if your structure lives in GGC memory it is not really garbage
collected and thus the lack of machinery to call dtors at a time ggc decides to
free something is not a problem?

In fact looking at struct default_hashmap_traits, I see:

   /* Called to dispose of the key and value before marking the entry as
  deleted.  */

   templatetypename T static void remove (T v) { v.~T (); }


Now I see, I should have read your previous email more carefully, by
explicitely managed you mean that destructors will be called
explicitely by the summary infrastructure.  I was wondering how you
wanted to rip the summaries out of GGC memory.

Well, I suppose that would work, and since explicit calls to
destructors are basically the counterpart of placement new that we
already plan to use, it might be actually be the proper C++ thing to
do.

(I am not sure I like it though, for all other purposes the summary
objects will look like managed by the garbage collector and only we
who read this thread will know that the lifetime of the object would
be decoupled from the allocation-span of its memory).

Thanks for the clarification,

Martin


Hello.

I tried to come up with ctor/dtor solution for types passes to symbol_summary
template class.

Example:
struct inline_summary
{
  inline_summary (cgraph_node *node);
  

[PATCH] MIPS16/GCC: Optimise `__call_stub_fp_' call/return stubs

2014-11-19 Thread Maciej W. Rozycki
Hi,

 It has come to my attention that we create suboptimal code for the 
`__call_stub_fp_' variant of the MIPS16 call stubs.  These stubs are 
used for outgoing calls made from MIPS16 code to standard MIPS code that 
return floating-point results and may also pass floating-point 
arguments.

 This can be illustrated with this small program:

$ cat foo.c
double bar (double d);

int main (int argc, char **argv)
{
  return bar (argc);
}
$ mips-linux-gnu-gcc -O2 -mips16 -c foo.c
$ mips-linux-gnu-objdump -dr foo.o

foo.o: file format elf32-tradbigmips

Disassembly of section .mips16.call.fp.bar:

 __call_stub_fp_bar:
   0:   44856000mtc1a1,$f12
   4:   44846800mtc1a0,$f13
   8:   03e09021moves2,ra
   c:   0c00jal 0 __call_stub_fp_bar
c: R_MIPS_26bar
  10:   nop
  14:   4403mfc1v1,$f0
  18:   0248jr  s2
  1c:   44020800mfc1v0,$f1

Disassembly of section .text.startup:

 main:
   0:   f100 64c4   save32,ra,s2
   4:   1800    jal 0 main
4: R_MIPS16_26  __mips16_floatsidf
   8:   6500nop
   a:   67a3movea1,v1
   c:   1800    jal 0 main
c: R_MIPS16_26  bar
  10:   6782movea0,v0
  12:   67a3movea1,v1
  14:   1800    jal 0 main
14: R_MIPS16_26 __mips16_fix_truncdfsi
  18:   6782movea0,v0
  1a:   f100 6444   restore 32,ra,s2
  1e:   e8a0jrc ra

-- as you can see in `__call_stub_fp_bar' above the jump delay slot 
remains empty because the move to $s2 instruction cannot be scheduled 
there due to a data dependency on $ra.

 However there is no need to save $ra last and since the MIPS IV ISA 
there are no coprocessor move delay slots so the last MTC1 instruction 
could be scheduled there instead. 

 So here is a change to avoid this empty delay slot.  With this change 
in place we get this code instead:

$ mips-linux-gnu-objdump -dr foo.o

foo.o: file format elf32-tradbigmips

Disassembly of section .mips16.call.fp.bar:

 __call_stub_fp_bar:
   0:   03e09021moves2,ra
   4:   44856000mtc1a1,$f12
   8:   0c00jal 0 __call_stub_fp_bar
8: R_MIPS_26bar
   c:   44846800mtc1a0,$f13
  10:   4403mfc1v1,$f0
  14:   0248jr  s2
  18:   44020800mfc1v0,$f1

Disassembly of section .text.startup:

 main:
   0:   f100 64c4   save32,ra,s2
   4:   1800    jal 0 main
4: R_MIPS16_26  __mips16_floatsidf
   8:   6500nop
   a:   67a3movea1,v1
   c:   1800    jal 0 main
c: R_MIPS16_26  bar
  10:   6782movea0,v0
  12:   67a3movea1,v1
  14:   1800    jal 0 main
14: R_MIPS16_26 __mips16_fix_truncdfsi
  18:   6782movea0,v0
  1a:   f100 6444   restore 32,ra,s2
  1e:   e8a0jrc ra

-- as you can see the last MTC1 instruction has now been placed in the 
delay slot.

 For ISAs that do have a coprocessor move delay slot there is no gain, 
but no loss either, both delay slots remain empty due to data 
dependencies (coprocessor move delay slots):

$ mips-linux-gnu-gcc -O2 -mips1 -mips16 -c foo.c
$ mips-linux-gnu-objdump -dr foo.o

foo.o: file format elf32-tradbigmips

Disassembly of section .mips16.call.fp.bar:

 __call_stub_fp_bar:
   0:   03e09021moves2,ra
   4:   44856000mtc1a1,$f12
   8:   44846800mtc1a0,$f13
   c:   0c00jal 0 __call_stub_fp_bar
c: R_MIPS_26bar
  10:   nop
  14:   4403mfc1v1,$f0
  18:   44020800mfc1v0,$f1
  1c:   0248jr  s2
  20:   nop

Disassembly of section .text.startup:

 main:
   0:   63fcaddiu   sp,-32
   2:   6772movev1,s2
   4:   6207sw  ra,28(sp)
   6:   1800    jal 0 main
6: R_MIPS16_26  __mips16_floatsidf
   a:   d306sw  v1,24(sp)
   c:   67a3movea1,v1
   e:   1800    jal 0 main
e: R_MIPS16_26  bar
  12:   6782movea0,v0
  14:   67a3movea1,v1
  16:   1800    jal 0 main
16: R_MIPS16_26 __mips16_fix_truncdfsi
  1a:   6782movea0,v0
  1c:   9606lw  a2,24(sp)
  1e:   9707lw  a3,28(sp)
  20:   6556moves2,a2
  22:   ef00jr  a3
  24:   6304addiu   sp,32
  26:   6500nop

-- though I think this consideration is actually academic as no 

[PATCH] Fix PR63677 and a regression from alias-improvements

2014-11-19 Thread Richard Biener

PR63677 shows that late complete unrolling can expose quite some memory
CSE opportunities (and followup simplifications) which we fail to
exploit because currently DOMs ability to do memory CSE is quite
limited (it requires the redundant load to optimize to directly follow the
load or store).  This is actually a regression from before
alias-improvements, thus to the time where we had multiple virtual
operand symbols which made us handle some cases better than now
with a single memory operand symbol.

The following patch makes DOM use the alias walker to fix that.

The easiest way to do that is to no longer hash virtual operands
and use the expression hash lookup result as candidate if
the expression involves memory.  We can then verify if the candidate
can be used by using the alias walker to walk to the candidate.
If there is an intermediate possibly aliasing stmt we simply
replace the hashtable entry and set up enough info to be able
to rewind to the previous state during unwinding.

This means the DOM implementation of memory CSE is a tad bit more
efficient than that of SCCVN.  But it is also very simplistic as
it requires 1:1 requivalence for the memory access structure
(as compared to SCCVN which can look through various kinds of
aggregate copies or initializations as well as handling accessing
the same memory location with different memory access structures).
This is also the reason that the patch doesn't fix PR63864 which
requires us to look through aggregate copies.

To fix PR63677 I also had to teach DOM the trick of propagating
a + CST as MEM[a, CST] which is used by all other propagators.

Given how easy this was to restore pre-alias-improvements state
I wonder why I didn't think of this before...

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Ok for trunk?

Thanks,
Richard.

2014-11-19   Richard Biener  rguent...@suse.de

PR tree-optimization/63677
* tree-ssa-dom.c: Include gimplify.h for unshare_expr.
(avail_exprs_stack): Make a vector of pairs.
(struct hash_expr_elt): Replace stmt member with vop member.
(expr_elt_hasher::equal): Simplify.
(initialize_hash_element): Adjust.
(initialize_hash_element_from_expr): Likewise.
(dom_opt_dom_walker::thread_across_edge): Likewise.
(record_cond): Likewise.
(dom_opt_dom_walker::before_dom_children): Likewise.
(print_expr_hash_elt): Likewise.
(remove_local_expressions_from_table): Restore previous state
if requested.
(record_equivalences_from_stmt): Record x + CST as constant
MEM[x, CST] for further propagation.
(vuse_eq): New function.
(lookup_avail_expr): For loads use the alias oracle to see
whether a candidate from the expr hash is usable.
(avail_expr_hash): Do not hash VUSEs.

* gcc.dg/tree-ssa/ssa-dom-cse-2.c: New testcase.
* gcc.dg/tree-ssa/ssa-dom-cse-3.c: Likewise.

Index: gcc/tree-ssa-dom.c
===
*** gcc/tree-ssa-dom.c.orig 2014-11-19 13:28:46.123015743 +0100
--- gcc/tree-ssa-dom.c  2014-11-19 13:58:33.122937543 +0100
*** along with GCC; see the file COPYING3.
*** 66,71 
--- 66,72 
  #include tree-ssa-threadedge.h
  #include tree-ssa-dom.h
  #include inchash.h
+ #include gimplify.h
  
  /* This file implements optimizations on the dominator tree.  */
  
*** struct edge_info
*** 139,145 
 marker.  */
  typedef struct expr_hash_elt * expr_hash_elt_t;
  
! static vecexpr_hash_elt_t avail_exprs_stack;
  
  /* Structure for entries in the expression hash table.  */
  
--- 140,146 
 marker.  */
  typedef struct expr_hash_elt * expr_hash_elt_t;
  
! static vecstd::pairexpr_hash_elt_t, expr_hash_elt_t  avail_exprs_stack;
  
  /* Structure for entries in the expression hash table.  */
  
*** struct expr_hash_elt
*** 151,158 
/* The expression (rhs) we want to record.  */
struct hashable_expr expr;
  
!   /* The stmt pointer if this element corresponds to a statement.  */
!   gimple stmt;
  
/* The hash value for RHS.  */
hashval_t hash;
--- 152,160 
/* The expression (rhs) we want to record.  */
struct hashable_expr expr;
  
!   /* The virtual operand associated with the nearest dominating stmt
!  loading from or storing to expr.  */
!   tree vop;
  
/* The hash value for RHS.  */
hashval_t hash;
*** expr_elt_hasher::hash (const value_type
*** 187,196 
  inline bool
  expr_elt_hasher::equal (const value_type p1, const compare_type p2)
  {
-   gimple stmt1 = p1-stmt;
const struct hashable_expr *expr1 = p1-expr;
const struct expr_hash_elt *stamp1 = p1-stamp;
-   gimple stmt2 = p2-stmt;
const struct hashable_expr *expr2 = p2-expr;
const struct expr_hash_elt *stamp2 = p2-stamp;
  
--- 189,196 
*** expr_elt_hasher::equal (const value_type
*** 198,222 

[PATCH, ARM] attribute target (thumb,arm) [0/6]

2014-11-19 Thread Christian Bruel

Hello Ramana,

Here is the attribute revisited after your comments, so

 - thumb1 is now supported
 - -mflip-thump option added for testing.
 - inlining is allowed between modes.

This set of patches was tested on rev#217709 as:

no regressions with:
arm-sim/
arm-sim/-march=armv7-a
arm-sim/-mthumb
arm-sim/-mthumb/-march=armv7-a

a few artifacts, all of them analyzed, with
arm-sim/-mflip-thumb/
arm-sim/-mflip-thumb//-march=armv7-a
arm-sim/-mflip-thumb//-mthumb
arm-sim/-mflip-thumb//-mthumb/-march=armv7-a


Artifacts are analyzed, they are mostly the fault of the testsuite being 
unable to mix modes in the expected results (e.g thumb[1,2] or arm tests 
predicates, mix setjmp/longjump between modes...)


The support is split as followed:

[1/6] Reorganized arm_option_override to dynamically redefine the flags 
depending on the attribute mode.


[2/6] Reorganized macro settings to be set/unset for each #pragma targets

[3/6] Change ARM_DECLARE_FUNCTION_NAME into a function

[4/6] Implement hooks to support attribute target

[5/6] Implement #pragma target

[6/6] Add -mflip-thumb option for testing

I think I missed the stage3, Anyway would it be OK for stage1 when it 
reopens ?


Christian



RE: [PATCH] MIPS/GCC: Unconditional jump generation bug fix

2014-11-19 Thread Maciej W. Rozycki
On Tue, 18 Nov 2014, Matthew Fortune wrote:

   I admit to being a bit more nervous about 4.9 but the test coverage
   seems thorough enough. I guess I would have been less concerned if the
   optimisation was still just tied to TARGET_MICROMIPS for the 4.9
  branch.
  
   Catherine, what do you think?
  
  This is okay for 4.9 IMO.
 
 OK

 FWIW we've been using this change since Oct 2012 with no issues (as I 
noted it was meant to be included with the original microMIPS support 
submission, but was lost in transit) and also GAS has code to relax 
out-of-range branches to jumps in non-PIC standard MIPS code under the 
same condition this RTL insn uses, so even if a wrong branch slipped 
through here (which it doesn't), then GAS would fix it up.

 See gas/config/tc-mips.c (md_apply_fix) BFD_RELOC_16_PCREL_S2 for the 
relaxation piece if interested.

  Maciej


RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

2014-11-19 Thread Moore, Catherine


 -Original Message-
 From: Matthew Fortune [mailto:matthew.fort...@imgtec.com]
 Sent: Tuesday, November 18, 2014 9:42 AM
 To: Andrew Bennett; gcc-patches@gcc.gnu.org
 Cc: Moore, Catherine; Rozycki, Maciej
 Subject: RE: [PATCH] If using branch likelies in MIPS sync code fill the delay
 slot with a nop
 
  The atomic-compare-exchange-3.c and atomic-op-3.c tests are failing
  when using the -mfix-r1 option.  This is due to the fact that the
  delay slot of the branch instruction that checks if the atomic
  operation was not successful can be filled with an operation that
  returns the output result.  For the branch likely case this operation
  can not go in the delay slot because it wont execute when the atomic
  operation was successful and therefore the wrong result will be
  returned.  This patch forces a nop to be placed in the delay slot if a 
  branch
 likely is used.
  The fix is as simple as possible; it does cause a second nop to be
  introduced after the branch instruction in the branch likely case.  As
  this option is rarely used, it makes more sense to keep the code
  readable than trying to optimise it.
 
  The atomic tests now pass successfully.  The ChangeLog and patch are
  below.
 
  Ok to commit?
 
 I'm OK with just making the fix-r1 case safe rather than also complicating
 the normal code path to remove the normal delay slot NOP in this special
 case.  I'd like to see what Catherine thinks though.  To remove the second
 NOP I believe we would have to add a !TARGET_FIX_R1 to the condition
 of the normal delay slot NOP. This seems a bit convoluted when the real
 reason is because branch likely is in use.  To make use of the
 mips_branch_likely flag it would have to be set earlier in:
 mips_output_sync_loop and also get set in mips_sync_loop_insns.
 
 If Catherine thinks getting rid of the second NOP is important enough then
 please base it on mips_branch_likely and fix the callers.
 
  Yes, removing the second NOP is worth the additional effort.


[PATCH, ARM] attribute target (thumb,arm) [1/6]

2014-11-19 Thread Christian Bruel

In preparation of the target attribute,

reorganize ´arm_option_override´ into 3 entities:
arm_option_override_internal_p
arm_option_check_internal
arm_option_param_internal

Also define and use TREE_TARGET macros instead of file-scope variables 
in the machine description.


Thanks,

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm.h (arm_option_override): Reoganized and split.
	(arm_option_params_internal); New function.
	(arm_option_check_internal): New function.
	(arm_option_override_internal): New function.
	(restrict_default): New boolean.
	(thumb_code, thumb1_code): Remove.
	* config/arm/arm.h (TREE_TARGET_THUMB, TREE_TARGET_THUMB1): New macros.
	(TREE_TARGET_THUM2, TREE_TARGET_ARM): Likewise.
	(thumb_code, thumb1_code): Remove.
	* config/arm/arm.md (is_thumb, is_thumb1): Check TARGET flag.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN a/gcc/gcc/config/arm/arm.c b/gcc/gcc/config/arm/arm.c
--- a/gcc/gcc/config/arm/arm.c	2014-11-18 08:35:38.0 +0100
+++ b/gcc/gcc/config/arm/arm.c	2014-11-18 09:11:19.0 +0100
@@ -744,6 +744,9 @@ const struct arm_fpu_desc *arm_fpu_desc;
 rtx thumb_call_via_label[14];
 static int thumb_call_reg_needed;
 
+/* Remember default option is used.  */
+static bool restrict_default;
+
 /* Bit values used to identify processor capabilities.  */
 #define FL_CO_PROC(1  0)/* Has external co-processor bus */
 #define FL_ARCH3M (1  1)/* Extended multiply */
@@ -886,12 +889,6 @@ int arm_tune_wbuf = 0;
 /* Nonzero if tuning for Cortex-A9.  */
 int arm_tune_cortex_a9 = 0;
 
-/* Nonzero if generating Thumb instructions.  */
-int thumb_code = 0;
-
-/* Nonzero if generating Thumb-1 instructions.  */
-int thumb1_code = 0;
-
 /* Nonzero if we should define __THUMB_INTERWORK__ in the
preprocessor.
XXX This is a bit of a hack, it's intended to help work around
@@ -2601,6 +2598,136 @@ arm_gimplify_va_arg_expr (tree valist, t
   return std_gimplify_va_arg_expr (valist, type, pre_p, post_p);
 }
 
+/* Check any incompatible options that the user has specified.  */
+static void
+arm_option_check_internal (struct gcc_options *opts)
+{
+  /* Make sure that the processor choice does not conflict with any of the
+ other command line choices.  */
+  if (TREE_TARGET_ARM (opts)  !(insn_flags  FL_NOTM))
+error (target CPU does not support ARM mode);
+
+  /* TARGET_BACKTRACE calls leaf_function_p, which causes a crash if done
+ from here where no function is being compiled currently.  */
+  if ((TARGET_TPCS_FRAME || TARGET_TPCS_LEAF_FRAME)  TREE_TARGET_ARM (opts))
+warning (0, enabling backtrace support is only meaningful when compiling for the Thumb);
+
+  if (TREE_TARGET_ARM (opts)  TARGET_CALLEE_INTERWORKING)
+warning (0, enabling callee interworking support is only meaningful when compiling for the Thumb);
+
+  /* If this target is normally configured to use APCS frames, warn if they
+ are turned off and debugging is turned on.  */
+  if (TREE_TARGET_ARM (opts)
+   write_symbols != NO_DEBUG
+   !TARGET_APCS_FRAME
+   (TARGET_DEFAULT  MASK_APCS_FRAME))
+warning (0, -g with -mno-apcs-frame may not give sensible debugging);
+
+  /* iWMMXt unsupported under Thumb mode.  */
+  if (TREE_TARGET_THUMB (opts)  TARGET_IWMMXT)
+error (iWMMXt unsupported under Thumb mode);
+
+  if (TARGET_HARD_TP  TREE_TARGET_THUMB1 (opts))
+error (can not use -mtp=cp15 with 16-bit Thumb);
+
+  if (TREE_TARGET_THUMB (opts)  TARGET_VXWORKS_RTP  flag_pic)
+{
+  error (RTP PIC is incompatible with Thumb);
+  flag_pic = 0;
+}
+
+  /* We only support -mslow-flash-data on armv7-m targets.  */
+  if (target_slow_flash_data
+   ((!(arm_arch7  !arm_arch_notm)  !arm_arch7em)
+	  || (TREE_TARGET_THUMB1 (opts) || flag_pic || TARGET_NEON)))
+error (-mslow-flash-data only supports non-pic code on armv7-m targets);
+}
+
+/* Check any params depending on attributes that the user has specified.  */
+static void
+arm_option_params_internal (struct gcc_options *opts)
+{
+ /* If we are not using the default (ARM mode) section anchor offset
+ ranges, then set the correct ranges now.  */
+  if (TREE_TARGET_THUMB1 (opts))
+{
+  /* Thumb-1 LDR instructions cannot have negative offsets.
+ Permissible positive offset ranges are 5-bit (for byte loads),
+ 6-bit (for halfword loads), or 7-bit (for word loads).
+ Empirical results suggest a 7-bit anchor range gives the best
+ overall code size.  */
+  targetm.min_anchor_offset = 0;
+  targetm.max_anchor_offset = 127;
+}
+  else if (TREE_TARGET_THUMB2 (opts))
+{
+  /* The minimum is set such that the total size of the block
+ for a particular anchor is 248 + 1 + 4095 bytes, which is
+ divisible by eight, ensuring natural spacing of anchors.  */
+  targetm.min_anchor_offset = -248;
+  targetm.max_anchor_offset = 4095;
+}
+  else
+{
+  

Re: [PING^4][PATCH] Warn about unclosed pragma omp declare target.

2014-11-19 Thread Ilya Tocar
As omp target and offloading support is committed to trunk,
I think it's reasonable to add some new warnings.

On 06 Nov 15:27, Ilya Tocar wrote:
 Ping.
 On 30 Oct 18:31, Ilya Tocar wrote:
  Ping.
  On 20 Oct 19:26, Ilya Tocar wrote:
   Ping.
   
   On 02 Oct 17:38, Ilya Tocar wrote:
Ping.
On 15 Aug 16:26, Ilya Tocar wrote:
 Ping.
 
 On 29 Jul 18:45, Ilya Tocar wrote:
  Hi,
  
  As discussed here in 
  https://gcc.gnu.org/ml/gcc/2014-01/msg00189.html
  Gcc should complain about pragma omp declare target without
  corresponding pragma omp end declare target. This patch adds a 
  warning
  for those cases.
  Bootstraps/passes make-check.
  Ok for trunk?
  
  ChangeLog:
  
  2014-07-29  Ilya Tocar  ilya.to...@intel.com
  
  * c-decl.c (omp_declare_target_location_stack): New.
  * c-lang.h (omp_declare_target_location_stack): Declare.
  * c-parser.c (warn_unclosed_pragma_omp_target): New.
  (c_parser_translation_unit): Call it.
  (c_parser_omp_declare_target): Remeber location.
  (c_parser_omp_end_declare_target): Forget location.
  
  And ChangeLog for testsuite:
  
  2014-07-29  Ilya Tocar  ilya.to...@intel.com
  
  * gcc.dg/gomp//target-3.c: New testcase.
  
  ---
   gcc/c/c-decl.c   |  3 +++
   gcc/c/c-lang.h   |  3 +++
   gcc/c/c-parser.c | 22 +-
   gcc/testsuite/gcc.dg/gomp/target-3.c | 33 
  +
   4 files changed, 60 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/gcc.dg/gomp/target-3.c
  
  diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
  index 2a4b439..2dd5b2c 100644
  --- a/gcc/c/c-decl.c
  +++ b/gcc/c/c-decl.c
  @@ -158,6 +158,9 @@ enum machine_mode c_default_pointer_mode = 
  VOIDmode;
   /* If non-zero, implicit omp declare target attribute is added 
  into the
  attribute lists.  */
   int current_omp_declare_target_attribute;
  +
  +/* Holds locations of currently open omp declare target pragmas. 
   */
  +veclocation_t omp_declare_target_location_stack;
   
   /* Each c_binding structure describes one binding of an identifier 
  to
  a decl.  All the decls in a scope - irrespective of namespace - 
  are
  diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h
  index e974906..cef995c 100644
  --- a/gcc/c/c-lang.h
  +++ b/gcc/c/c-lang.h
  @@ -59,4 +59,7 @@ struct GTY(()) language_function {
  attribute lists.  */
   extern GTY(()) int current_omp_declare_target_attribute;
   
  +/* Holds locations of currently open omp declare target pragmas. 
   */
  +extern veclocation_t omp_declare_target_location_stack;
  +
   #endif /* ! GCC_C_LANG_H */
  diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
  index e32bf04..0b96fe9 100644
  --- a/gcc/c/c-parser.c
  +++ b/gcc/c/c-parser.c
  @@ -1255,6 +1255,8 @@ static bool c_parser_cilk_verify_simd 
  (c_parser *, enum pragma_context);
   static tree c_parser_array_notation (location_t, c_parser *, tree, 
  tree);
   static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, 
  bool);
   
  +static void warn_unclosed_pragma_omp_target ();
  +
   /* Parse a translation unit (C90 6.7, C99 6.9).
   
  translation-unit:
  @@ -1290,6 +1292,8 @@ c_parser_translation_unit (c_parser *parser)
  }
 while (c_parser_next_token_is_not (parser, CPP_EOF));
   }
  +
  +  warn_unclosed_pragma_omp_target ();
   }
   
   /* Parse an external declaration (C90 6.7, C99 6.9).
  @@ -13068,8 +13072,10 @@ c_finish_omp_declare_simd (c_parser 
  *parser, tree fndecl, tree parms,
   static void
   c_parser_omp_declare_target (c_parser *parser)
   {
  +  location_t loc = c_parser_peek_token (parser)-location;
 c_parser_skip_to_pragma_eol (parser);
 current_omp_declare_target_attribute++;
  +  omp_declare_target_location_stack.safe_push (loc);
   }
   
   static void
  @@ -13104,7 +13110,10 @@ c_parser_omp_end_declare_target (c_parser 
  *parser)
   error_at (loc, %#pragma omp end declare target% without 
  corresponding 
 %#pragma omp declare target%);
 else
  -current_omp_declare_target_attribute--;
  +{
  +  current_omp_declare_target_attribute--;
  +  omp_declare_target_location_stack.pop ();
  +}
   }
   
   
  @@ -14267,4 +14276,15 @@ c_parser_array_notation (location_t loc, 
  c_parser *parser, tree initial_index,
 return value_tree;
   }
   
  +static void
  +warn_unclosed_pragma_omp_target ()
  +{
  +  int i;
  +  

[PATCH, ARM] attribute target (thumb,arm) [2/6]

2014-11-19 Thread Christian Bruel

In preparation of the pragma target

reorganize ´TARGET_CPU_CPP_BUILTINS´ to redefine mode dependent macros 
based on current thumb_p.


Thanks,

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm-c.c (cpp_def_or_undef): New functions.
	(arm_cpp_builtins): Likewise.
	* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Move mode dependant 
	macros to arm_cpp_builtins.
	* config/arm/arm-protos.h (arm_cpp_builtins): Declare.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN b/gcc/gcc/config/arm/arm-c.c c/gcc/gcc/config/arm/arm-c.c
--- b/gcc/gcc/config/arm/arm-c.c	2014-11-05 14:34:49.0 +0100
+++ c/gcc/gcc/config/arm/arm-c.c	2014-11-13 11:16:59.0 +0100
@@ -42,3 +42,73 @@ arm_lang_object_attributes_init (void)
 {
   arm_lang_output_object_attributes_hook = arm_output_c_attributes;
 }
+
+/* Define or undefine macro.  */
+
+static void
+cpp_def_or_undef (struct cpp_reader *in, const char *str, bool def_p)
+{
+  if (def_p)
+cpp_define (in, str);
+  else
+cpp_undef (in, str);
+}
+
+/* Define or undefine macros based on the current target.  If the user does
+   #pragma GCC target, we need to adjust the macros dynamically.  */
+
+void
+arm_cpp_builtins (struct cpp_reader *in, bool thumb_p)
+{
+  bool target_32bit_p = !thumb_p || arm_arch_thumb2;
+  bool thumb2_p = thumb_p  arm_arch_thumb2;
+  bool have_ldrex_p = (arm_arch6  !thumb_p) || arm_arch7;
+  bool have_ldrexbh_p = (arm_arch6k  !thumb_p) || arm_arch7;
+  bool have_ldrexd_p = ((arm_arch6k  !thumb_p) || arm_arch7)
+ arm_arch_notm;
+
+  int arm_feature_ldrex = (have_ldrex_p ? 4 : 0)
+| (have_ldrexbh_p ? 3 : 0) | (have_ldrexd_p ? 8 : 0);
+
+  cpp_def_or_undef (in, __thumb__, thumb_p);
+  if (arm_arch_thumb2)
+cpp_def_or_undef (in, __thumb2__, thumb_p);
+  if (TARGET_BIG_END)
+cpp_def_or_undef (in, __THUMBEB__, thumb_p);
+  else
+cpp_def_or_undef (in, __THUMBEL__, thumb_p);
+
+  cpp_def_or_undef (in, __ARM_32BIT_STATE, target_32bit_p); /* TARGET_32BIT  */
+
+  if (arm_arch5e  (arm_arch_notm || arm_arch7))   /* TARGET_ARM_QBIT  */
+cpp_def_or_undef (in, __ARM_FEATURE_QBIT, target_32bit_p);
+
+  if (arm_arch6  (arm_arch_notm || arm_arch7))/* TARGET_ARM_SAT  */
+cpp_def_or_undef (in, __ARM_FEATURE_SAT, target_32bit_p);
+
+  if (arm_arch5e  (arm_arch_notm || arm_arch7em)) /* TARGET_DSP_MULTIPLY  */
+cpp_def_or_undef (in, __ARM_FEATURE_DSP, target_32bit_p);
+
+  if (arm_arch6  (arm_arch_notm || arm_arch7em))  /* TARGET_INT_SIMD  */
+cpp_def_or_undef (in, __ARM_FEATURE_SIMD32, target_32bit_p);
+
+ /* TARGET_IDIV  */
+  cpp_def_or_undef (in, __ARM_ARCH_EXT_IDIV__,
+		((!thumb_p  arm_arch_arm_hwdiv)
+		 || (thumb2_p  arm_arch_thumb_hwdiv)));
+
+  cpp_def_or_undef (in, __ARM_FEATURE_IDIV,
+		((!thumb_p  arm_arch_arm_hwdiv)
+		 || (thumb2_p  arm_arch_thumb_hwdiv)));
+
+ if (arm_feature_ldrex)
+   cpp_define_formatted (in, __ARM_FEATURE_LDREX=%d, arm_feature_ldrex);
+ else
+   cpp_undef (in, __ARM_FEATURE_LDREX);
+
+ cpp_def_or_undef (in, __ARM_FEATURE_CLZ,
+		   ((TARGET_ARM_ARCH = 5  !thumb_p) || TARGET_ARM_ARCH_ISA_THUMB =2));
+
+ cpp_def_or_undef (in, __ARM_ASM_SYNTAX_UNIFIED__, inline_asm_unified);
+}
+
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN b/gcc/gcc/config/arm/arm.h c/gcc/gcc/config/arm/arm.h
--- b/gcc/gcc/config/arm/arm.h	2014-11-13 12:16:50.0 +0100
+++ c/gcc/gcc/config/arm/arm.h	2014-11-13 12:16:37.0 +0100
@@ -48,29 +48,12 @@ extern char arm_arch_name[];
 #define TARGET_CPU_CPP_BUILTINS()			\
   do			\
 {			\
-	if (TARGET_DSP_MULTIPLY)			\
-	   builtin_define (__ARM_FEATURE_DSP);	\
-if (TARGET_ARM_QBIT)\
-   builtin_define (__ARM_FEATURE_QBIT);	\
-if (TARGET_ARM_SAT)\
-   builtin_define (__ARM_FEATURE_SAT);	\
 if (TARGET_CRYPTO)\
 	   builtin_define (__ARM_FEATURE_CRYPTO);	\
 	if (unaligned_access)\
 	  builtin_define (__ARM_FEATURE_UNALIGNED);	\
 	if (TARGET_CRC32)\
 	  builtin_define (__ARM_FEATURE_CRC32);	\
-	if (TARGET_32BIT)\
-	  builtin_define (__ARM_32BIT_STATE);		\
-	if (TARGET_ARM_FEATURE_LDREX)\
-	  builtin_define_with_int_value (			\
-	__ARM_FEATURE_LDREX, TARGET_ARM_FEATURE_LDREX);	\
-	if ((TARGET_ARM_ARCH = 5  !TARGET_THUMB)		\
-	 || TARGET_ARM_ARCH_ISA_THUMB =2)			\
-	  builtin_define (__ARM_FEATURE_CLZ);			\
-	if (TARGET_INT_SIMD)	\
-	  builtin_define (__ARM_FEATURE_SIMD32);		\
-\
 	builtin_define_with_int_value (\
 	  __ARM_SIZEOF_MINIMAL_ENUM,\
 	  flag_short_enums ? 1 : 4);\
@@ -89,10 +72,6 @@ extern char arm_arch_name[];
 	if (arm_arch_notm)\
 	  builtin_define (__ARM_ARCH_ISA_ARM);	\
 	builtin_define (__APCS_32__);			\
-	if (TARGET_THUMB)\
-	  builtin_define (__thumb__);			\
-	if (TARGET_THUMB2)\
-	  builtin_define (__thumb2__);		\
 	if (TARGET_ARM_ARCH_ISA_THUMB)			\
 	  builtin_define_with_int_value (		\
 	

[PATCH, ARM] attribute target (thumb,arm) [3/6]

2014-11-19 Thread Christian Bruel
Re-implement ARM_DECLARE_FUNCTION_NAME as a function. That will make 
changed related to unified/divided and mode directives easier to insert.


Thanks

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm-protos.h (arm_declare_function_name): Declare.
	(is_called_in_ARM_mode): Remove.
	* config/arm/arm.c (is_called_in_ARM_mode): Declare static bool.
	(arm_declare_function_name): Moved from from ARM_DECLARE_FUNCTION_NAME.
	* config/arm/arm.h (ARM_DECLARE_FUNCTION_NAME): Call
	 arm_declare_function_name.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN c/gcc/gcc/config/arm/arm.c d/gcc/gcc/config/arm/arm.c
--- c/gcc/gcc/config/arm/arm.c	2014-11-18 08:52:48.0 +0100
+++ d/gcc/gcc/config/arm/arm.c	2014-11-18 08:51:50.0 +0100
@@ -26422,6 +26422,23 @@ thumb_pop (FILE *f, unsigned long mask)
   fprintf (f, }\n);
 }
 
+/* Return nonzero if FUNC must be entered in ARM mode.  */
+static bool
+is_called_in_ARM_mode (tree func)
+{
+  gcc_assert (TREE_CODE (func) == FUNCTION_DECL);
+
+  /* Ignore the problem about functions whose address is taken.  */
+  if (TARGET_CALLEE_INTERWORKING  TREE_PUBLIC (func))
+return true;
+
+#ifdef ARM_PE
+  return lookup_attribute (interfacearm, DECL_ATTRIBUTES (func)) != NULL_TREE;
+#else
+  return false;
+#endif
+}
+
 /* Generate code to return from a thumb function.
If 'reg_containing_return_addr' is -1, then the return address is
actually on the stack, at the stack pointer.  */
@@ -26857,22 +26874,6 @@ thumb_far_jump_used_p (void)
   return 0;
 }
 
-/* Return nonzero if FUNC must be entered in ARM mode.  */
-int
-is_called_in_ARM_mode (tree func)
-{
-  gcc_assert (TREE_CODE (func) == FUNCTION_DECL);
-
-  /* Ignore the problem about functions whose address is taken.  */
-  if (TARGET_CALLEE_INTERWORKING  TREE_PUBLIC (func))
-return TRUE;
-
-#ifdef ARM_PE
-  return lookup_attribute (interfacearm, DECL_ATTRIBUTES (func)) != NULL_TREE;
-#else
-  return FALSE;
-#endif
-}
 
 /* Given the stack offsets and register mask in OFFSETS, decide how
many additional registers to push instead of subtracting a constant
@@ -32386,6 +32387,25 @@ arm_is_constant_pool_ref (rtx x)
 	   CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)));
 }
 
+void
+arm_declare_function_name (FILE *stream, const char *name, tree decl)
+{
+  if (TARGET_THUMB)
+{
+  if (is_called_in_ARM_mode (decl)
+	  || (TARGET_THUMB1  !TARGET_THUMB1_ONLY
+	   cfun-is_thunk))
+	fprintf (stream, \t.code 32\n);
+  else if (TARGET_THUMB1)
+	fprintf (stream, \t.code\t16\n\t.thumb_func\n);
+  else
+	fprintf (stream, \t.thumb\n\t.thumb_func\n);
+}
+
+  if (TARGET_POKE_FUNCTION_NAME)
+arm_poke_function_name (stream, (const char *) name);
+}
+
 /* If MEM is in the form of [base+offset], extract the two parts
of address and set to BASE and OFFSET, otherwise return false
after clearing BASE and OFFSET.  */
@@ -32506,4 +32526,5 @@ arm_sched_fusion_priority (rtx_insn *ins
   *pri = tmp;
   return;
 }
+
 #include gt-arm.h
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN c/gcc/gcc/config/arm/arm.h d/gcc/gcc/config/arm/arm.h
--- c/gcc/gcc/config/arm/arm.h	2014-11-13 12:16:37.0 +0100
+++ d/gcc/gcc/config/arm/arm.h	2014-11-13 12:19:45.0 +0100
@@ -2184,23 +2184,7 @@ extern int making_const_table;
? 1 : 0)
 
 #define ARM_DECLARE_FUNCTION_NAME(STREAM, NAME, DECL) 	\
-  do			\
-{			\
-  if (TARGET_THUMB) \
-{		\
-  if (is_called_in_ARM_mode (DECL)		\
-	  || (TARGET_THUMB1  !TARGET_THUMB1_ONLY	\
-		   cfun-is_thunk))	\
-fprintf (STREAM, \t.code 32\n) ;		\
-  else if (TARGET_THUMB1)			\
-   fprintf (STREAM, \t.code\t16\n\t.thumb_func\n) ;	\
-  else		\
-   fprintf (STREAM, \t.thumb\n\t.thumb_func\n) ;	\
-}		\
-  if (TARGET_POKE_FUNCTION_NAME)			\
-arm_poke_function_name (STREAM, (const char *) NAME);	\
-}			\
-  while (0)
+  arm_declare_function_name ((STREAM), (NAME), (DECL));
 
 /* For aliases of functions we use .thumb_set instead.  */
 #define ASM_OUTPUT_DEF_FROM_DECLS(FILE, DECL1, DECL2)		\
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN c/gcc/gcc/config/arm/arm-protos.h d/gcc/gcc/config/arm/arm-protos.h
--- c/gcc/gcc/config/arm/arm-protos.h	2014-11-13 11:16:17.0 +0100
+++ d/gcc/gcc/config/arm/arm-protos.h	2014-11-13 12:21:36.0 +0100
@@ -30,6 +30,7 @@ extern void arm_load_pic_register (unsig
 extern int arm_volatile_func (void);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
+extern void arm_declare_function_name (FILE *, const char *, tree);
 extern void thumb2_expand_return (bool);
 extern const char *arm_strip_name_encoding (const char *);
 extern void arm_asm_output_labelref (FILE *, const char *);
@@ -178,9 +179,6 @@ extern const char *thumb1_unexpanded_epi
 extern void 

Re: Stream out default optimization nodes

2014-11-19 Thread H.J. Lu
On Tue, Nov 18, 2014 at 9:29 AM, Jan Hubicka hubi...@ucw.cz wrote:
 On Tue, Nov 18, 2014 at 9:27 AM, Jan Hubicka hubi...@ucw.cz wrote:
  https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00473.html
 
  /export/gnu/import/git/gcc-test-profiled/bld/./prev-gcc/xg++
  -B/export/gnu/import/git/gcc-test-profiled/bld/./prev-gcc/
  -B/usr/5.0.0/x86_64-unknown-linux-gnu/bin/ -nostdinc++
  -B/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
  -B/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
   
  -I/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu
   
  -I/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include
   
  -I/export/gnu/import/git/gcc-test-profiled/src-trunk/libstdc++-v3/libsupc++
  -L/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs
  -L/export/gnu/import/git/gcc-test-profiled/bld/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
-g -O2 -flto=jobserver -frandom-seed=1 -fprofile-use -DIN_GCC
  -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
  -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute
  -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
  -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -DGENERATOR_FILE
  -static-libstdc++ -static-libgcc  -o build/genmatch \
  build/genmatch.o ../libcpp/libcpp.a ../libiberty/libiberty.a
  build/errors.o build/vec.o build/hash-table.o
  .././libiberty/libiberty.a
  ../../src-trunk/libcpp/lex.c: In function âend_directiveâ:
  ../../src-trunk/libcpp/lex.c:442:43: error:
  â__builtin_ia32_pcmpestri128â needs isa option -m32 -msse4.2
 index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
 ^
  make[7]: *** [/tmp/ccTC6Hk9.ltrans9.ltrans.o] Error 1
 
  Indeed, it looks like the bug is that search_line_sse42 gets inlined int
  end_directive that is compiled w/o SSE support.  Probably something that
  happened previously, too, just led to compiling the function with
  SSE4.2
 
  I will need to setup -m32 LTO bootstrap enviornment...
 

 This is -m64 LTO, not -m32.
 OK then the message seems bogus, too.  I will try to reproduce it.


I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63963

-- 
H.J.


[PATCH, ARM] attribute target (thumb,arm) [4/6]

2014-11-19 Thread Christian Bruel

Implements and document the hooks to support target_attributes.

The emission of blx is handled directly for armv5 to overcome a bug with 
the current binutils that fails with calls to a static symbol in a 
different section. (e.g .text - .text.startup) in different modes.

(ref https://sourceware.org/bugzilla/show_bug.cgi?id=17505)

Regtests included

Thanks

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm.opt (mthumb): Save.
	* config/arm/arm.h (arm_valid_target_attribute_tree): Declare.
	(arm_reset_previous_fndecl, arm_change_mode_p): Likewise.
	(SWITCHABLE_TARGET): Define.
	* config/arm/arm.c (arm_reset_previous_fndecl): New functions.
	(arm_valid_target_attribute_tree, arm_change_mode_p): Likewise.
	(arm_valid_target_attribute_p): Likewise.
	(arm_set_current_function, arm_can_inline_p): Likewise.
	(arm_valid_target_attribute_rec): Likewise.
	(arm_previous_fndecl): New variable.
	(TARGET_SET_CURRENT_FUNCTION, TARGET_OPTION_VALID_ATTRIBUTE_P): Define.
	(TARGET_CAN_INLINE_P): Define.
	(arm_asm_trampoline_template): Emit mode.
	(arm_file_start): Don't set unified syntax.
	(arm_declare_function_name): Set unified syntax and mode.
	(arm_option_override): Init target_option_default_node.
	and target_option_current_node.
	* config/arm/arm.md (*call_value_symbol): Set mode when possible.
	(*call_symbol): Likewise.
	* doc/extend.texi: Document ARM target and pragma attribute.
	* doc/invoke.texi: Likewise.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN d/gcc/gcc/config/arm/arm.c e/gcc/gcc/config/arm/arm.c
--- d/gcc/gcc/config/arm/arm.c	2014-11-18 08:51:50.0 +0100
+++ e/gcc/gcc/config/arm/arm.c	2014-11-18 09:05:57.0 +0100
@@ -80,6 +80,7 @@
 #include opts.h
 #include dumpfile.h
 #include gimple-expr.h
+#include target-globals.h
 #include builtins.h
 #include tm-constrs.h
 #include rtl-iter.h
@@ -258,6 +259,9 @@ static tree arm_build_builtin_va_list (v
 static void arm_expand_builtin_va_start (tree, rtx);
 static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *);
 static void arm_option_override (void);
+static void arm_set_current_function (tree);
+static bool arm_can_inline_p (tree, tree);
+static bool arm_valid_target_attribute_p (tree, tree, tree, int);
 static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode);
 static bool arm_cannot_copy_insn_p (rtx_insn *);
 static int arm_issue_rate (void);
@@ -400,6 +404,9 @@ static const struct attribute_spec arm_a
 #undef  TARGET_ASM_FUNCTION_EPILOGUE
 #define TARGET_ASM_FUNCTION_EPILOGUE arm_output_function_epilogue
 
+#undef TARGET_CAN_INLINE_P
+#define TARGET_CAN_INLINE_P arm_can_inline_p
+
 #undef  TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE arm_option_override
 
@@ -412,6 +419,12 @@ static const struct attribute_spec arm_a
 #undef  TARGET_SCHED_ADJUST_COST
 #define TARGET_SCHED_ADJUST_COST arm_adjust_cost
 
+#undef TARGET_SET_CURRENT_FUNCTION
+#define TARGET_SET_CURRENT_FUNCTION arm_set_current_function
+
+#undef TARGET_OPTION_VALID_ATTRIBUTE_P
+#define TARGET_OPTION_VALID_ATTRIBUTE_P arm_valid_target_attribute_p
+
 #undef TARGET_SCHED_REORDER
 #define TARGET_SCHED_REORDER arm_sched_reorder
 
@@ -3205,6 +3218,11 @@ arm_option_override (void)
 
   /* Register global variables with the garbage collector.  */
   arm_add_gc_roots ();
+
+  /* Save the initial options in case the user does function specific
+ options.  */
+  target_option_default_node = target_option_current_node
+= build_target_option_node (global_options);
 }
 
 static void
@@ -3358,13 +3376,20 @@ arm_warn_func_return (tree decl)
 static void
 arm_asm_trampoline_template (FILE *f)
 {
+  if (TARGET_UNIFIED_ASM)
+fprintf (f, \t.syntax unified\n);
+  else
+fprintf (f, \t.syntax divided\n);
+
   if (TARGET_ARM)
 {
+  fprintf (f, \t.arm\n);
   asm_fprintf (f, \tldr\t%r, [%r, #0]\n, STATIC_CHAIN_REGNUM, PC_REGNUM);
   asm_fprintf (f, \tldr\t%r, [%r, #0]\n, PC_REGNUM, PC_REGNUM);
 }
   else if (TARGET_THUMB2)
 {
+  fprintf (f, \t.thumb\n);
   /* The Thumb-2 trampoline is similar to the arm implementation.
 	 Unlike 16-bit Thumb, we enter the stub in thumb mode.  */
   asm_fprintf (f, \tldr.w\t%r, [%r, #4]\n,
@@ -26874,6 +26899,23 @@ thumb_far_jump_used_p (void)
   return 0;
 }
 
+/* Check that FUNC is called with a different mode.  */
+
+bool
+arm_change_mode_p (tree func)
+{
+  if (TREE_CODE (func) != FUNCTION_DECL)
+return false;
+
+  tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (func);
+
+  if (!callee_tree)
+callee_tree = target_option_default_node;
+
+  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+
+  return (TREE_TARGET_THUMB (callee_opts) != TARGET_THUMB);
+}
 
 /* Given the stack offsets and register mask in OFFSETS, decide how
many additional registers to push instead of subtracting a constant
@@ -28448,9 +28490,6 @@ arm_file_start (void)
 {
   int val;
 
-  if 

[PATCH] Fix sanitizer/63690

2014-11-19 Thread Marek Polacek
As the testcase shows, we should really check first that we
have a MEM_REF, otherwise subsequent TREE_OPERAND might ICE.
On an invalid testcase we might get e.g. STRING_CST.

Bootstrapped/regtested on power8-linux, ok for trunk?

2014-11-19  Marek Polacek  pola...@redhat.com

PR sanitizer/63690
* ubsan.c (instrument_object_size): Check for MEM_REF.

* gcc.dg/ubsan/pr63690.c: New test.

diff --git gcc/testsuite/gcc.dg/ubsan/pr63690.c 
gcc/testsuite/gcc.dg/ubsan/pr63690.c
index e69de29..bcf520a 100644
--- gcc/testsuite/gcc.dg/ubsan/pr63690.c
+++ gcc/testsuite/gcc.dg/ubsan/pr63690.c
@@ -0,0 +1,9 @@
+/* PR sanitizer/63690 */
+/* { dg-do compile } */
+/* { dg-options -fsanitize=undefined } */
+
+void
+foo (void)
+{
+  (*c)++;
+}
diff --git gcc/ubsan.c gcc/ubsan.c
index 7d1e341..ad5665f 100644
--- gcc/ubsan.c
+++ gcc/ubsan.c
@@ -1539,7 +1539,13 @@ instrument_object_size (gimple_stmt_iterator *gsi, bool 
is_lhs)
 return;
 
   bool decl_p = DECL_P (inner);
-  tree base = decl_p ? inner : TREE_OPERAND (inner, 0);
+  tree base;
+  if (decl_p)
+base = inner;
+  else if (TREE_CODE (inner) == MEM_REF)
+base = TREE_OPERAND (inner, 0);
+  else
+return;
   tree ptr = build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (t)), t);
 
   while (TREE_CODE (base) == SSA_NAME)

Marek


[patch, asan, testsuite] Adjust asan expected backtrace

2014-11-19 Thread FX

The attached patch fixes 23 asan testsuite failures on x86_64-apple-darwin14, 
where the backtrace features e.g. wrap_malloc instead of interceptor_malloc.
So I adjusted the dg-output patterns to match.

OK to commit to trunk?



2014-11-19  Francois-Xavier Coudert  fxcoud...@gcc.gnu.org

* c-c++-common/asan/heap-overflow-1.c: Ajust dg-output.
* c-c++-common/asan/memcmp-1.c: Likewise.
* c-c++-common/asan/strncpy-overflow-1.c: Likewise.
* c-c++-common/asan/use-after-free-1.c: Likewise.



asan.diff
Description: Binary data


Re: [PATCH 17/21] PR jit/63854: Fix leaking vec in jit

2014-11-19 Thread Trevor Saunders
On Wed, Nov 19, 2014 at 05:46:17AM -0500, David Malcolm wrote:
 This fixes various leaks of vec buffers seen via valgrind within jit
 (both recording and playback).
 
 Various vec within jit::recording are converted to auto_vec.
 
 Various playback::wrapper subclasses containing vec gain a finalizer
 so they can release the vec when they are collected.

It seems kind of tempting to just use a virtual dtor as the finalizer,
but I'm not sure how to make that work nicely with your overriding of
operator new.

Trev

 
 gcc/jit/ChangeLog:
   PR jit/63854
   * jit-playback.c (gcc::jit::playback::compound_type::set_fields):
   Convert param from const vecplayback::field *  to
   const auto_vecplayback::field * *.
   (gcc::jit::playback::context::new_function_type): Convert param
   param_types from vectype * * to const auto_vectype * *.
   (gcc::jit::playback::context::new_function): Convert param
   params from vecparam * * to const auto_vecparam * *.
   (gcc::jit::playback::context::build_call): Convert param args
   from vecrvalue * to const auto_vecrvalue * *.
   (gcc::jit::playback::context::new_call): Likewise.
   (gcc::jit::playback::context::new_call_through_ptr): Likewise.
   (wrapper_finalizer): New function.
   (gcc::jit::playback::wrapper::operator new): Call the finalizer
   variant of ggc_internal_cleared_alloc, supplying
   wrapper_finalizer.
   (gcc::jit::playback::function::finalizer): New.
   (gcc::jit::playback::block::finalizer): New.
   (gcc::jit::playback::source_file::finalizer): New.
   (gcc::jit::playback::source_line::finalizer): New.
 
   * jit-playback.h
   (gcc::jit::playback::context::new_function_type): Convert param
   param_types from vectype * * to const auto_vectype * *.
   (gcc::jit::playback::context::new_function): Convert param
   params from vecparam * * to const auto_vecparam * *.
   (gcc::jit::playback::context::new_call): Convert param
   args from vecrvalue * to const auto_vecrvalue * *.
   (gcc::jit::playback::context::new_call_through_ptr): Likewise.
   (gcc::jit::playback::context::build_call): Likewise.
   (gcc::jit::playback::context): Convert fields m_functions,
   m_source_files, m_cached_locations from vec to auto_vec.
   (gcc::jit::playback::wrapper::finalizer): New virtual function.
   (gcc::jit::playback::compound_type::set_fields): Convert param fro
   const vecplayback::field *  to
   const auto_vecplayback::field * *.
   (gcc::jit::playback::function::finalizer): New.
   (gcc::jit::playback::block::finalizer): New.
   (gcc::jit::playback::source_file::finalizer): New.
   (gcc::jit::playback::source_line::finalizer): New.
 
   * jit-recording.c
   (gcc::jit::recording::function_type::replay_into): Convert local
   from a vec into an auto_vec.
   (gcc::jit::recording::fields::replay_into): Likewise.
   (gcc::jit::recording::function::replay_into): Likewise.
   (gcc::jit::recording::call::replay_into): Likewise.
   (gcc::jit::recording::call_through_ptr::replay_into): Likewise.
 
   * jit-recording.h (gcc::jit::recording::context): Convert fields
   m_mementos, m_compound_types, m_functions from vec to
   auto_vec .
   (gcc::jit::recording::function_type::get_param_types): Convert
   return type from vectype * to const vectype * .
   (gcc::jit::recording::function_type): Convert field
   m_param_types from a vec to an auto_vec.
   (gcc::jit::recording::fields): Likewise for field m_fields.
   (gcc::jit::recording::function::get_params): Convert return type
   from vec param * to const vecparam * .
   (gcc::jit::recording::function): Convert fields m_params,
   m_locals, m_blocks from vec to auto_vec.
   (gcc::jit::recording::block): Likewise for field m_statements.
   vec to auto_vec.
   (gcc::jit::recording::call): Likewise for field m_args.
   (gcc::jit::recording::call_through_ptr): Likewise.
 ---
  gcc/jit/jit-playback.c  | 73 
 +
  gcc/jit/jit-playback.h  | 27 --
  gcc/jit/jit-recording.c | 16 +--
  gcc/jit/jit-recording.h | 26 +-
  4 files changed, 100 insertions(+), 42 deletions(-)
 
 diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
 index 285a3ef..8fdfa29 100644
 --- a/gcc/jit/jit-playback.c
 +++ b/gcc/jit/jit-playback.c
 @@ -285,15 +285,15 @@ new_compound_type (location *loc,
  }
  
  void
 -playback::compound_type::set_fields (const vecplayback::field * fields)
 +playback::compound_type::set_fields (const auto_vecplayback::field * 
 *fields)
  {
/* Compare with c/c-decl.c: finish_struct. */
tree t = as_tree ();
  
tree fieldlist = NULL;
 -  for (unsigned i = 0; i  fields.length (); i++)
 +  for (unsigned i = 0; i  fields-length (); i++)
  {
 -  field *f = fields[i];
 +  field *f = 

[PATCH, ARM] attribute target (thumb,arm) [5/6]

2014-11-19 Thread Christian Bruel

Implements the hooks for #pragma GCC target

A test included to check that macros were correctly defined/undefined on 
pragma regions.


Thanks

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm.h (REGISTER_TARGET_PRAGMAS):
	 Call arm_register_target_pragmas.
	* config/arm/arm-protos.h (arm_register_target_pragmas): Declare.
	* config/arm/arm-c.c (arm_register_target_pragmas): New function.
	(arm_pragma_target_parse): Likewise.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN e/gcc/gcc/config/arm/arm-c.c f/gcc/gcc/config/arm/arm-c.c
--- e/gcc/gcc/config/arm/arm-c.c	2014-11-13 13:03:27.0 +0100
+++ f/gcc/gcc/config/arm/arm-c.c	2014-11-13 13:53:45.0 +0100
@@ -20,9 +20,12 @@
 #include system.h
 #include coretypes.h
 #include tm.h
-#include tm_p.h
 #include tree.h
+#include tm_p.h
 #include c-family/c-common.h
+#include target.h
+#include target-def.h
+#include c-family/c-pragma.h
 
 /* Output C specific EABI object attributes.  These can not be done in
arm.c because they require information from the C frontend.  */
@@ -112,3 +115,78 @@ arm_cpp_builtins (struct cpp_reader *in,
  cpp_def_or_undef (in, __ARM_ASM_SYNTAX_UNIFIED__, inline_asm_unified);
 }
 
+/* Hook to validate the current #pragma GCC target and set the arch custom
+   mode state.  If ARGS is NULL, then POP_TARGET is used to reset
+   the options.  */
+static bool
+arm_pragma_target_parse (tree args, tree pop_target)
+{
+  tree prev_tree = build_target_option_node (global_options);
+  tree cur_tree;
+  struct cl_target_option *prev_opt;
+  struct cl_target_option *cur_opt;
+  bool cur_mode, prev_mode;
+
+  if (! args)
+{
+  cur_tree = ((pop_target) ? pop_target : target_option_default_node);
+  cl_target_option_restore (global_options,
+TREE_TARGET_OPTION (cur_tree));
+}
+  else
+{
+  cur_tree = arm_valid_target_attribute_tree (args,  global_options);
+  if (cur_tree == NULL_TREE)
+	{
+	  cl_target_option_restore (global_options,
+TREE_TARGET_OPTION (prev_tree));
+	  return false;
+	}
+}
+
+  target_option_current_node = cur_tree;
+  arm_reset_previous_fndecl ();
+
+  /* Figure out the previous mode.  */
+  prev_opt  = TREE_TARGET_OPTION (prev_tree);
+  cur_opt   = TREE_TARGET_OPTION (cur_tree);
+
+  gcc_assert (prev_opt);
+  gcc_assert (cur_opt);
+
+  cur_mode = TARGET_THUMB_P (cur_opt-x_target_flags);
+  prev_mode = TARGET_THUMB_P (prev_opt-x_target_flags);
+
+  if (prev_mode != cur_mode)
+{
+  /* For the definitions, ensure all newly defined macros are considered
+	 as used for -Wunused-macros.  There is no point warning about the
+	 compiler predefined macros.  */
+  cpp_options *cpp_opts = cpp_get_options (parse_in);
+  unsigned char saved_warn_unused_macros = cpp_opts-warn_unused_macros;
+  cpp_opts-warn_unused_macros = 0;
+
+  /* Update macros.  */
+  arm_cpp_builtins (parse_in, cur_mode);
+
+  cpp_opts-warn_unused_macros = saved_warn_unused_macros;
+}
+
+  return true;
+}
+
+/* Register target pragmas.  We need to add the hook for parsing #pragma GCC
+   option here rather than in arm.c since it will pull in various preprocessor
+   functions, and those are not present in languages like fortran without a
+   preprocessor.  */
+
+void
+arm_register_target_pragmas (void)
+{
+  /* Update pragma hook to allow parsing #pragma GCC target.  */
+  targetm.target_option.pragma_parse = arm_pragma_target_parse;
+
+#ifdef REGISTER_SUBTARGET_PRAGMAS
+  REGISTER_SUBTARGET_PRAGMAS ();
+#endif
+}
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN e/gcc/gcc/config/arm/arm.h f/gcc/gcc/config/arm/arm.h
--- e/gcc/gcc/config/arm/arm.h	2014-11-13 13:48:59.0 +0100
+++ f/gcc/gcc/config/arm/arm.h	2014-11-13 13:48:01.0 +0100
@@ -2097,7 +2097,8 @@ extern int making_const_table;
   c_register_pragma (0, long_calls, arm_pr_long_calls);		\
   c_register_pragma (0, no_long_calls, arm_pr_no_long_calls);		\
   c_register_pragma (0, long_calls_off, arm_pr_long_calls_off);	\
-  arm_lang_object_attributes_init(); \
+  arm_lang_object_attributes_init();	\
+  arm_register_target_pragmas();   \
 } while (0)
 
 /* Condition code information.  */
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN e/gcc/gcc/config/arm/arm-protos.h f/gcc/gcc/config/arm/arm-protos.h
--- e/gcc/gcc/config/arm/arm-protos.h	2014-11-13 13:07:47.0 +0100
+++ f/gcc/gcc/config/arm/arm-protos.h	2014-11-13 13:49:50.0 +0100
@@ -306,6 +306,7 @@ extern const char *arm_rewrite_selected_
 
 /* Defined in gcc/common/config/arm-c.c.  */
 extern void arm_lang_object_attributes_init(void);
+extern void arm_register_target_pragmas (void);
 extern void arm_cpp_builtins (struct cpp_reader *, bool);
 
 extern bool arm_is_constant_pool_ref (rtx);
2014-09-23  Christian Bruel  christian.br...@st.com

	* 

[PATCH, ARM] attribute target (thumb,arm) [6/6]

2014-11-19 Thread Christian Bruel
Implement the -mflip-thump option. Undocumented for internal testing 
only. This option artificially inserts alternative attribute thumb/modes 
on functions.


This close the patch set. Thanks for your review,

Christian
2014-09-23  Christian Bruel  christian.br...@st.com

	* config/arm/arm.c (add_attribute, arm_insert_attributes): New functions
	(TARGET_INSERT_ATTRIBUTES): Define.
	(thumb_flipper): New var.
	* config/arm/arm.opt (-mflip-thumb): New switch.

diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN f/gcc/gcc/config/arm/arm.c g/gcc/gcc/config/arm/arm.c
--- f/gcc/gcc/config/arm/arm.c	2014-11-18 13:16:44.0 +0100
+++ g/gcc/gcc/config/arm/arm.c	2014-11-19 14:04:22.0 +0100
@@ -218,6 +218,7 @@ static void arm_encode_section_info (tre
 
 static void arm_file_end (void);
 static void arm_file_start (void);
+static void arm_insert_attributes (tree, tree *);
 
 static void arm_setup_incoming_varargs (cumulative_args_t, machine_mode,
 	tree, int *, int);
@@ -370,6 +371,9 @@ static const struct attribute_spec arm_a
 #undef  TARGET_ATTRIBUTE_TABLE
 #define TARGET_ATTRIBUTE_TABLE arm_attribute_table
 
+#undef  TARGET_INSERT_ATTRIBUTES
+#define TARGET_INSERT_ATTRIBUTES arm_insert_attributes
+
 #undef TARGET_ASM_FILE_START
 #define TARGET_ASM_FILE_START arm_file_start
 #undef TARGET_ASM_FILE_END
@@ -29403,6 +29407,56 @@ arm_valid_target_attribute_tree (tree ar
   return t;
 }
 
+/* True if -mflip-thumb should next add an attribute for the default
+   mode, false if it should next add an attribute for the opposite mode.  */
+static GTY(()) bool thumb_flipper = TARGET_THUMB;
+
+static void 
+add_attribute  (const char * mode, tree *attributes)
+{
+  size_t len = strlen (mode);
+  tree value = build_string (len, mode);
+
+  TREE_TYPE (value) = build_array_type (char_type_node,
+	build_index_type (size_int (len)));
+
+  *attributes = tree_cons (get_identifier (target),
+			   build_tree_list (NULL_TREE, value),
+			   *attributes);
+}
+
+/* For testing. Insert thumb or arm modes alternatively on functions.  */
+
+static void
+arm_insert_attributes (tree fndecl, tree * attributes)
+{
+  const char *mode;
+
+  if (TREE_CODE (fndecl) != FUNCTION_DECL || DECL_EXTERNAL(fndecl)
+  || DECL_BUILT_IN (fndecl) || DECL_ARTIFICIAL (fndecl))
+   return;
+
+  /* Nested definitions must inherit mode.  */
+  if (current_function_decl)
+   {
+ mode = TARGET_THUMB ? thumb : arm;  
+ add_attribute (mode, attributes);
+ return;
+   }
+
+  if (! TARGET_FLIP_THUMB)
+return;
+
+  /* If there is already a setting don't change it.  */
+  if (lookup_attribute (target, *attributes) != NULL)
+return;
+
+  mode = thumb_flipper ? thumb : arm;
+  add_attribute (mode, attributes);
+
+  thumb_flipper = !thumb_flipper;
+}
+
 /* Hook to validate attribute((target(string))).  */
 
 static bool
diff '--exclude=ChangeLog*' '--exclude=.svn' '--exclude=*~' '--exclude=#*#' -rupN f/gcc/gcc/config/arm/arm.opt g/gcc/gcc/config/arm/arm.opt
--- f/gcc/gcc/config/arm/arm.opt	2014-11-13 14:05:34.0 +0100
+++ g/gcc/gcc/config/arm/arm.opt	2014-11-19 13:59:46.0 +0100
@@ -122,6 +122,10 @@ Enum(float_abi_type) String(softfp) Valu
 EnumValue
 Enum(float_abi_type) String(hard) Value(ARM_FLOAT_ABI_HARD)
 
+mflip-thumb
+Target Report Var(TARGET_FLIP_THUMB)
+Switch ARM/Thumb modes on alternating functions for compiler testing
+
 mfp16-format=
 Target RejectNegative Joined Enum(arm_fp16_format_type) Var(arm_fp16_format) Init(ARM_FP16_FORMAT_NONE)
 Specify the __fp16 floating-point format


Re: [PATCH] Fix sanitizer/63690

2014-11-19 Thread Jakub Jelinek
On Wed, Nov 19, 2014 at 02:51:11PM +0100, Marek Polacek wrote:
 As the testcase shows, we should really check first that we
 have a MEM_REF, otherwise subsequent TREE_OPERAND might ICE.
 On an invalid testcase we might get e.g. STRING_CST.

Well, addr_object_size should handle STRING_CSTs just fine, so
why not handle STRING_CSTs in there too (similarly to decl_p = true
case).
Supposedly you want decl_p = true for them and need to tweak
  if (!POINTER_TYPE_P (TREE_TYPE (base))  !DECL_P (base))
but otherwise it should work.
On the other side,
int
foo (int x)
{
  return foobar[x];
}
is already handled by -fsanitize=bounds and for some reason
gimple_assign_load_p doesn't consider handled components with
STRING_CST as base as loads so for -fsanitize=object-size
we aren't called for that.

Jakub


Re: [PATCH, MPX runtime 1/2] Integrate MPX runtime library

2014-11-19 Thread Ilya Enkovich
On 13 Nov 20:56, Joseph Myers wrote:
 On Thu, 13 Nov 2014, Ilya Enkovich wrote:
 
   You can leave it as a single library - it's just that imposes libgcc-like
   constraints on what the library does and how it does things, so as to be
   usable for arbitrary programs built with MPX (e.g. using
   reserved-namespace names such as __write when available - direct syscalls
   may also be a possibility in some cases).
  
  I really don't want to make it libgcc-like and complicate its further
  development.  What is the reason for that?  Is it because library is
  linked by -mmpx?  Can it be avoided if library is linked only at
  '-fcheck-pointer-bounds -mmpx' combination?  If single -mmpx is used
 
 What's -fcheck-pointer-bounds?  I don't see any documentation of it in 
 invoke.texi.  Any options intended for users to use need documenting there 
 - did a documentation patch get missed out from the series of patches 
 posted / committed?
 
 Documentation might help understand the intent of the feature.  I think 
 there are two natural possibilities:
 
 * Use as a lightweight hardening feature enabled by default across a whole 
 distribution (or for a subset of the distribution considered 
 security-critical), by people who don't know the detailed requirements of 
 individual programs built with the option.  In that case there are 
 libgcc-like requirements (as another issue there, it might be problematic 
 to cause arbitrary previously unthreaded programs to link with libpthread; 
 at least, performance issues seem possible).  Outputting reports to files 
 determined by environment variables is probably not a sensible approach in 
 this case.  You might want reports to syslog, or might want MPX-detected 
 issues to be intercepted by crash reporters such as apport so they can 
 collect relevant information and offer to report it to the distributor.
 
 * Use as a debugging tool, by people who understand the requirements of 
 their application / library and can tell whether the things the MPX 
 library is documented as doing might be a problem in their case.  For 
 this, libgcc-like requirements don't apply and output to files seems more 
 sensible.

I think if we compare runtime libraries for these two usage models then we 
don't find much in common.  Thus it should be two independent runtime 
libraries.  This posted library corresponds to the second scenario and is to be 
used mainly for debugging. 

 
 If it's intended only as the latter - and this is documented - then you 
 don't have the libgcc-like requirements, and there's no point in having 
 multiple libraries.  If it's intended for both, that points the way to 
 separate libraries (where the debugging-tool library would be used in that 
 use case, but the crash-reporter-interception case would expect programs 
 to have been linked with only the minimal library).

Would it be enough to mention implicit link and resulting pthread dependency in 
-fcheck-pointer-bounds description in invoke.texi?

 
 -- 
 Joseph S. Myers
 jos...@codesourcery.com

Here is an updated version.  I removed printf usage in a signal handler, added 
symbol versioning, added thread safety comments for static vars, added static 
link support and -static-libmpx option (will update corresponding documentation 
patch).  Does it look OK?

Thanks,
Ilya
--
2014-11-19  Ilya Enkovich  ilya.enkov...@intel.com

* Makefile.def: Add libmpx.
* configure.ac: Add libmpx.
* Makefile.in: Regenerate.
* configure: Regenerate.

gcc/

2014-11-19  Ilya Enkovich  ilya.enkov...@intel.com

* gcc.c (LIBMPX_LIBS): New.
(LIBMPX_SPEC): New.
(MPX_SPEC): New.
(LINK_COMMAND_SPEC): Add MPX_SPEC.
* c-family/c.opt (static-libmpx): New.

libmpx/

2014-11-19  Ilya Enkovich  ilya.enkov...@intel.com

Initial commit.


diff --git a/Makefile.def b/Makefile.def
index 40bbca9..4a535d2 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -128,6 +128,9 @@ target_modules = { module= libsanitizer;
   bootstrap=true;
   lib_path=.libs;
   raw_cxx=true; };
+target_modules = { module= libmpx;
+  bootstrap=true;
+  lib_path=.libs; };
 target_modules = { module= libvtv;
   bootstrap=true;
   lib_path=.libs;
diff --git a/configure.ac b/configure.ac
index b27fb1d..ccb119b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -162,6 +162,7 @@ target_libraries=target-libgcc \
target-libstdc++-v3 \
target-libsanitizer \
target-libvtv \
+   target-libmpx \
target-libssp \
target-libquadmath \
target-libgfortran \
@@ -642,6 +643,25 @@ if test -d ${srcdir}/libvtv; then
 fi
 fi
 
+
+# Disable libmpx on unsupported systems.
+if test -d ${srcdir}/libmpx; then
+if test x$enable_libmpx = x; then
+   AC_MSG_CHECKING([for libmpx support])
+   if 

Re: [PATCH] Fix sanitizer/63690

2014-11-19 Thread Jakub Jelinek
On Wed, Nov 19, 2014 at 03:12:05PM +0100, Jakub Jelinek wrote:
 On Wed, Nov 19, 2014 at 02:51:11PM +0100, Marek Polacek wrote:
  As the testcase shows, we should really check first that we
  have a MEM_REF, otherwise subsequent TREE_OPERAND might ICE.
  On an invalid testcase we might get e.g. STRING_CST.
 
 Well, addr_object_size should handle STRING_CSTs just fine, so
 why not handle STRING_CSTs in there too (similarly to decl_p = true
 case).
 Supposedly you want decl_p = true for them and need to tweak
   if (!POINTER_TYPE_P (TREE_TYPE (base))  !DECL_P (base))
 but otherwise it should work.
 On the other side,
 int
 foo (int x)
 {
   return foobar[x];
 }
 is already handled by -fsanitize=bounds and for some reason
 gimple_assign_load_p doesn't consider handled components with
 STRING_CST as base as loads so for -fsanitize=object-size
 we aren't called for that.

The patch is ok as is.

Jakub


RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

2014-11-19 Thread Andrew Bennett
   Yes, removing the second NOP is worth the additional effort.

The updated patch is below.

Ok to commit?

Regards,


Andrew


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 02268f3..368c6f0 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands)
  This will sometimes be a delayed branch; see the write code below
  for details.  */
   mips_multi_add_insn (is_64bit_p ? scd\t%0,%1 : sc\t%0,%1, at, mem, NULL);
-  mips_multi_add_insn (beq%?\t%0,%.,1b, at, NULL);
+
+  /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the delay slot
+ of the branch if it is a branch likely because they will not execute when
+ the atomic operation is successful, so place a NOP in there instead.  */
+
+  mips_multi_add_insn (beq%?\t%0,%.,1b%~, at, NULL);
 
   /* if (INSN1 != MOVE  INSN1 != LI) NEWVAL = $TMP3 [delay slot].  */
   if (insn1 != SYNC_INSN1_MOVE  insn1 != SYNC_INSN1_LI  tmp3 != newval)
@@ -13005,7 +13010,7 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands)
   mips_multi_copy_insn (tmp3_insn);
   mips_multi_set_operand (mips_multi_last_index (), 0, newval);
 }
-  else if (!(required_oldval  cmp))
+  else if (!(required_oldval  cmp)  !mips_branch_likely)
 mips_multi_add_insn (nop, NULL);
 
   /* CMP = 1 -- either standalone or in a delay slot.  */
@@ -13029,12 +13034,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx *operands)
 const char *
 mips_output_sync_loop (rtx_insn *insn, rtx *operands)
 {
-  mips_process_sync_loop (insn, operands);
-
   /* Use branch-likely instructions to work around the LL/SC R1
  errata.  */
   mips_branch_likely = TARGET_FIX_R1;
 
+  mips_process_sync_loop (insn, operands);
+
   mips_push_asm_switch (mips_noreorder);
   mips_push_asm_switch (mips_nomacro);
   mips_push_asm_switch (mips_noat);
@@ -13056,6 +13061,9 @@ mips_output_sync_loop (rtx_insn *insn, rtx *operands)
 unsigned int
 mips_sync_loop_insns (rtx_insn *insn, rtx *operands)
 {
+  /* Use branch-likely instructions to work around the LL/SC R1
+ errata.  */
+  mips_branch_likely = TARGET_FIX_R1;
   mips_process_sync_loop (insn, operands);
   return mips_multi_num_insns;
 }


Re: [PATCH, ARM] attribute target (thumb,arm) [0/6]

2014-11-19 Thread Ramana Radhakrishnan
On Wed, Nov 19, 2014 at 1:24 PM, Christian Bruel christian.br...@st.com wrote:

 I think I missed the stage3, Anyway would it be OK for stage1 when it
 reopens ?

Since you submitted this well during stage1 and given that these
patches address comments from earlier in the review process we should
aim to get these in for 5.0. If at some point during the review
process it looks risky and there needs to be significant rework we can
always stop. It's on the list of patches to be reviewed and I will
find some dedicated time later this week to set down and review / play
with the patches in an attempt to move this forward as it is a
reasonably large chunk of work.

Thanks for continuing to work on these patches and addressing the
earlier review comments.

regards
Ramana


 Christian



[PATCH] gimple_{build_assign,assign_set_rhs}_with_ops* cleanup

2014-11-19 Thread Jakub Jelinek
Hi!

This patch:
1) adds unary op overload to gimple_build_assign_with_ops
   so that many callers don't need to pass NULL_TREE as the last
   argument explicitly
2) adds unary op overload to gimple_assign_set_rhs_with_ops for similar
   reasons
3) renames gimple_assign_set_rhs_with_ops_1 to
   gimple_assign_set_rhs_with_ops so it becomes ternary op overload
   to the existing binary op one and 2) added unary op
and adjusts all users that I found.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-11-19  Jakub Jelinek  ja...@redhat.com

* gimple.h (gimple_build_assign_with_ops): Add unary arg overload.
(gimple_assign_set_rhs_with_ops_1): Renamed to ...
(gimple_assign_set_rhs_with_ops): ... this.  Adjust binary arg
inline overload to use it.  Add unary arg overload.
* gimple.c (gimple_build_assign_with_ops): New unary arg overload.
(gimple_assign_set_rhs_from_tree): Use
gimple_assign_set_rhs_with_ops instead of
gimple_assign_set_rhs_with_ops_1.
(gimple_assign_set_rhs_with_ops_1): Renamed to ...
(gimple_assign_set_rhs_with_ops): ... this.
* ipa-split.c (split_function): Remove last NULL argument
from gimple_build_assign_with_ops call.
* tree-ssa-loop-im.c
(move_computations_dom_walker::before_dom_children): Likewise.
* tsan.c (instrument_builtin_call): Likewise.
* tree-vect-stmts.c (vect_init_vector, vectorizable_mask_load_store,
vectorizable_conversion, vectorizable_load): Likewise.
* tree-vect-loop.c (vect_is_simple_reduction_1,
get_initial_def_for_induction): Likewise.
* tree-loop-distribution.c (generate_memset_builtin): Likewise.
* tree-vect-patterns.c (vect_handle_widen_op_by_const,
vect_recog_widen_mult_pattern, vect_operation_fits_smaller_type,
vect_recog_over_widening_pattern, vect_recog_rotate_pattern,
vect_recog_vector_vector_shift_pattern, vect_recog_divmod_pattern,
vect_recog_mixed_size_cond_pattern, adjust_bool_pattern_cast,
adjust_bool_pattern, vect_recog_bool_pattern): Likewise.
* tree-ssa-phiopt.c (conditional_replacement, abs_replacement,
neg_replacement): Likewise.
* asan.c (build_shadow_mem_access, maybe_create_ssa_name,
maybe_cast_to_ptrmode, asan_expand_check_ifn): Likewise.
* tree-vect-slp.c (vect_get_constant_vectors): Likewise.
* omp-low.c (lower_rec_input_clauses, expand_omp_for_generic,
expand_omp_for_static_nochunk, expand_omp_for_static_chunk,
simd_clone_adjust): Likewise.
* tree-vect-loop-manip.c (vect_create_cond_for_align_checks): Likewise.
* gimple-ssa-strength-reduction.c (introduce_cast_before_cand,
replace_one_candidate): Likewise.
* gimple-builder.c (build_type_cast): Likewise.
* tree-ssa-forwprop.c (simplify_rotate): Likewise.
(forward_propagate_addr_expr_1): Remove last NULL argument
from gimple_assign_set_rhs_with_ops call.
(simplify_vector_constructor): Use gimple_assign_set_rhs_with_ops
instead of gimple_assign_set_rhs_with_ops_1.
* tree-ssa-reassoc.c (maybe_optimize_range_tests): Remove last NULL
argument from gimple_build_assign_with_ops call.
(repropagate_negates): Remove last NULL argument from
gimple_assign_set_rhs_with_ops call.
* ubsan.c (ubsan_expand_null_ifn, ubsan_expand_objsize_ifn): Remove
last NULL argument from gimple_build_assign_with_ops call.
(instrument_bool_enum_load): Likewise.  Remove last NULL argument
from gimple_assign_set_rhs_with_ops call.
* tree-ssa-math-opts.c (build_and_insert_cast, convert_mult_to_fma):
Remove last NULL argument from gimple_build_assign_with_ops call.
(bswap_replace): Likewise.  Use gimple_assign_set_rhs_with_ops instead
of gimple_assign_set_rhs_with_ops_1.
(convert_plusminus_to_widen): Use gimple_assign_set_rhs_with_ops
instead of gimple_assign_set_rhs_with_ops_1.
* gimple-fold.c (replace_stmt_with_simplification): Likewise.
(rewrite_to_defined_overflow, gimple_build): Remove last NULL argument
from gimple_build_assign_with_ops call.
* tree-ssa-strlen.c (handle_pointer_plus): Remove last NULL argument
from gimple_assign_set_rhs_with_ops call.
* tree-vrp.c (simplify_truth_ops_using_ranges,
simplify_bit_ops_using_ranges): Remove last NULL argument from
gimple_assign_set_rhs_with_ops call.
(simplify_float_conversion_using_ranges,
simplify_internal_call_using_ranges): Remove last NULL argument from
gimple_build_assign_with_ops call.

--- gcc/gimple.h.jj 2014-11-19 10:45:17.780769013 +0100
+++ gcc/gimple.h2014-11-19 10:52:52.118574835 +0100
@@ -1180,6 +1180,8 @@ gimple gimple_build_assign_with_ops (enu
 tree, tree, 

Re: [PATCH] Fix sanitizer/63690

2014-11-19 Thread Marek Polacek
On Wed, Nov 19, 2014 at 03:12:05PM +0100, Jakub Jelinek wrote:
 On Wed, Nov 19, 2014 at 02:51:11PM +0100, Marek Polacek wrote:
  As the testcase shows, we should really check first that we
  have a MEM_REF, otherwise subsequent TREE_OPERAND might ICE.
  On an invalid testcase we might get e.g. STRING_CST.
 
 Well, addr_object_size should handle STRING_CSTs just fine, so
 why not handle STRING_CSTs in there too (similarly to decl_p = true
 case).
 Supposedly you want decl_p = true for them and need to tweak
   if (!POINTER_TYPE_P (TREE_TYPE (base))  !DECL_P (base))
 but otherwise it should work.
 On the other side,
 int
 foo (int x)
 {
   return foobar[x];
 }
 is already handled by -fsanitize=bounds and for some reason

Right - I've checked that this is the case.  I could mention
that in the original submission.

 gimple_assign_load_p doesn't consider handled components with
 STRING_CST as base as loads so for -fsanitize=object-size
 we aren't called for that.

Thanks for review,

Marek


RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

2014-11-19 Thread Matthew Fortune
 diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index
 02268f3..368c6f0 100644
 --- a/gcc/config/mips/mips.c
 +++ b/gcc/config/mips/mips.c
 @@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx
 *operands)
   This will sometimes be a delayed branch; see the write code below
   for details.  */
mips_multi_add_insn (is_64bit_p ? scd\t%0,%1 : sc\t%0,%1, at,
 mem, NULL);
 -  mips_multi_add_insn (beq%?\t%0,%.,1b, at, NULL);
 +
 +  /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the
 delay slot
 + of the branch if it is a branch likely because they will not
 execute when
 + the atomic operation is successful, so place a NOP in there
 + instead.  */

Please rephrase the comment along the lines of my previous suggestion.
This wording is too complex IMO.

Matthew


RE: [PATCH][AArch64] Adjust generic move costs

2014-11-19 Thread Wilco Dijkstra
Hi Jiong,

Can you commit this please?

2014-11-19  Wilco Dijkstra  wdijk...@arm.com

* gcc/config/aarch64/aarch64.c (generic_regmove_cost):
Increase FP move cost (PR61915).


---
 gcc/config/aarch64/aarch64.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index de53c94..cd30724 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -218,8 +218,10 @@ __extension__
 static const struct cpu_regmove_cost generic_regmove_cost =
 {
   NAMED_PARAM (GP2GP, 1),
-  NAMED_PARAM (GP2FP, 2),
-  NAMED_PARAM (FP2GP, 2),
+  /* Avoid the use of slow int-fp moves for spilling by setting
+ their cost higher than memmov_cost.  */
+  NAMED_PARAM (GP2FP, 5),
+  NAMED_PARAM (FP2GP, 5),
   NAMED_PARAM (FP2FP, 2)
 };
 
-- 
1.9.1



[patch, committed]

2014-11-19 Thread FX
Committed as pre-approved by Jakub in PR62132.
It took me three commits (revisions ) to get it right. I apologize for that, 
and I’ll stop coding for the day :(



2014-11-19  Francois-Xavier Coudert  fxcoud...@gcc.gnu.org

PR sanitizer/62132
* c-c++-common/asan/misalign-1.c: Pass -fno-omit-frame-pointer on
darwin, adjust dg-output.
* c-c++-common/asan/misalign-2.c: Likewise.


Index: c-c++-common/asan/misalign-1.c
===
--- c-c++-common/asan/misalign-1.c  (revision 217694)
+++ c-c++-common/asan/misalign-1.c  (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run { target { ilp32 || lp64 } } } */
 /* { dg-options -O2 } */
+/* { dg-additional-options -fno-omit-frame-pointer { target *-*-darwin* } } 
*/
 /* { dg-shouldfail asan } */
 
 struct S { int i; } __attribute__ ((packed));
@@ -38,5 +39,5 @@ main ()
 /* { dg-output ERROR: AddressSanitizer:\[^\n\r]*on address\[^\n\r]* } */
 /* { dg-output 0x\[0-9a-f\]+ at pc 0x\[0-9a-f\]+ bp 0x\[0-9a-f\]+ sp 
0x\[0-9a-f\]+\[^\n\r]*(\n|\r\n|\r) } */
 /* { dg-output \[^\n\r]*READ of size 4 at 0x\[0-9a-f\]+ thread 
T0\[^\n\r]*(\n|\r\n|\r) } */
-/* { dg-output #0 0x\[0-9a-f\]+ (in 
_*foo(\[^\n\r]*misalign-1.c:10|\[^\n\r]*:0)|\[(\])\[^\n\r]*(\n|\r\n|\r) } */
-/* { dg-output #1 0x\[0-9a-f\]+ (in _*main 
(\[^\n\r]*misalign-1.c:34|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r) } */
+/* { dg-output #0 0x\[0-9a-f\]+ (in 
_*foo(\[^\n\r]*misalign-1.c:1\[01]|\[^\n\r]*:0)|\[(\])\[^\n\r]*(\n|\r\n|\r) } 
*/
+/* { dg-output #1 0x\[0-9a-f\]+ (in _*main 
(\[^\n\r]*misalign-1.c:3\[45]|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r) } */
Index: c-c++-common/asan/misalign-2.c
===
--- c-c++-common/asan/misalign-2.c  (revision 217694)
+++ c-c++-common/asan/misalign-2.c  (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run { target { ilp32 || lp64 } } } */
 /* { dg-options -O2 } */
+/* { dg-additional-options -fno-omit-frame-pointer { target *-*-darwin* } } 
*/
 /* { dg-shouldfail asan } */
 
 struct S { int i; } __attribute__ ((packed));
@@ -38,5 +39,5 @@ main ()
 /* { dg-output ERROR: AddressSanitizer:\[^\n\r]*on address\[^\n\r]* } */
 /* { dg-output 0x\[0-9a-f\]+ at pc 0x\[0-9a-f\]+ bp 0x\[0-9a-f\]+ sp 
0x\[0-9a-f\]+\[^\n\r]*(\n|\r\n|\r) } */
 /* { dg-output \[^\n\r]*READ of size 4 at 0x\[0-9a-f\]+ thread 
T0\[^\n\r]*(\n|\r\n|\r) } */
-/* { dg-output #0 0x\[0-9a-f\]+ (in 
_*baz(\[^\n\r]*misalign-2.c:22|\[^\n\r]*:0)|\[(\])\[^\n\r]*(\n|\r\n|\r) } */
-/* { dg-output #1 0x\[0-9a-f\]+ (in _*main 
(\[^\n\r]*misalign-2.c:34|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r) } */
+/* { dg-output #0 0x\[0-9a-f\]+ (in 
_*baz(\[^\n\r]*misalign-2.c:2\[23]|\[^\n\r]*:0)|\[(\])\[^\n\r]*(\n|\r\n|\r) } 
*/
+/* { dg-output #1 0x\[0-9a-f\]+ (in _*main 
(\[^\n\r]*misalign-2.c:3\[45]|\[^\n\r]*:0)|\[(\]).*(\n|\r\n|\r) } */



Re: [PATCH] gimple_{build_assign,assign_set_rhs}_with_ops* cleanup

2014-11-19 Thread Richard Biener
On Wed, 19 Nov 2014, Jakub Jelinek wrote:

 Hi!
 
 This patch:
 1) adds unary op overload to gimple_build_assign_with_ops
so that many callers don't need to pass NULL_TREE as the last
argument explicitly
 2) adds unary op overload to gimple_assign_set_rhs_with_ops for similar
reasons
 3) renames gimple_assign_set_rhs_with_ops_1 to
gimple_assign_set_rhs_with_ops so it becomes ternary op overload
to the existing binary op one and 2) added unary op
 and adjusts all users that I found.
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

 2014-11-19  Jakub Jelinek  ja...@redhat.com
 
   * gimple.h (gimple_build_assign_with_ops): Add unary arg overload.
   (gimple_assign_set_rhs_with_ops_1): Renamed to ...
   (gimple_assign_set_rhs_with_ops): ... this.  Adjust binary arg
   inline overload to use it.  Add unary arg overload.
   * gimple.c (gimple_build_assign_with_ops): New unary arg overload.
   (gimple_assign_set_rhs_from_tree): Use
   gimple_assign_set_rhs_with_ops instead of
   gimple_assign_set_rhs_with_ops_1.
   (gimple_assign_set_rhs_with_ops_1): Renamed to ...
   (gimple_assign_set_rhs_with_ops): ... this.
   * ipa-split.c (split_function): Remove last NULL argument
   from gimple_build_assign_with_ops call.
   * tree-ssa-loop-im.c
   (move_computations_dom_walker::before_dom_children): Likewise.
   * tsan.c (instrument_builtin_call): Likewise.
   * tree-vect-stmts.c (vect_init_vector, vectorizable_mask_load_store,
   vectorizable_conversion, vectorizable_load): Likewise.
   * tree-vect-loop.c (vect_is_simple_reduction_1,
   get_initial_def_for_induction): Likewise.
   * tree-loop-distribution.c (generate_memset_builtin): Likewise.
   * tree-vect-patterns.c (vect_handle_widen_op_by_const,
   vect_recog_widen_mult_pattern, vect_operation_fits_smaller_type,
   vect_recog_over_widening_pattern, vect_recog_rotate_pattern,
   vect_recog_vector_vector_shift_pattern, vect_recog_divmod_pattern,
   vect_recog_mixed_size_cond_pattern, adjust_bool_pattern_cast,
   adjust_bool_pattern, vect_recog_bool_pattern): Likewise.
   * tree-ssa-phiopt.c (conditional_replacement, abs_replacement,
   neg_replacement): Likewise.
   * asan.c (build_shadow_mem_access, maybe_create_ssa_name,
   maybe_cast_to_ptrmode, asan_expand_check_ifn): Likewise.
   * tree-vect-slp.c (vect_get_constant_vectors): Likewise.
   * omp-low.c (lower_rec_input_clauses, expand_omp_for_generic,
   expand_omp_for_static_nochunk, expand_omp_for_static_chunk,
   simd_clone_adjust): Likewise.
   * tree-vect-loop-manip.c (vect_create_cond_for_align_checks): Likewise.
   * gimple-ssa-strength-reduction.c (introduce_cast_before_cand,
   replace_one_candidate): Likewise.
   * gimple-builder.c (build_type_cast): Likewise.
   * tree-ssa-forwprop.c (simplify_rotate): Likewise.
   (forward_propagate_addr_expr_1): Remove last NULL argument
   from gimple_assign_set_rhs_with_ops call.
   (simplify_vector_constructor): Use gimple_assign_set_rhs_with_ops
   instead of gimple_assign_set_rhs_with_ops_1.
   * tree-ssa-reassoc.c (maybe_optimize_range_tests): Remove last NULL
   argument from gimple_build_assign_with_ops call.
   (repropagate_negates): Remove last NULL argument from
   gimple_assign_set_rhs_with_ops call.
   * ubsan.c (ubsan_expand_null_ifn, ubsan_expand_objsize_ifn): Remove
   last NULL argument from gimple_build_assign_with_ops call.
   (instrument_bool_enum_load): Likewise.  Remove last NULL argument
   from gimple_assign_set_rhs_with_ops call.
   * tree-ssa-math-opts.c (build_and_insert_cast, convert_mult_to_fma):
   Remove last NULL argument from gimple_build_assign_with_ops call.
   (bswap_replace): Likewise.  Use gimple_assign_set_rhs_with_ops instead
   of gimple_assign_set_rhs_with_ops_1.
   (convert_plusminus_to_widen): Use gimple_assign_set_rhs_with_ops
   instead of gimple_assign_set_rhs_with_ops_1.
   * gimple-fold.c (replace_stmt_with_simplification): Likewise.
   (rewrite_to_defined_overflow, gimple_build): Remove last NULL argument
   from gimple_build_assign_with_ops call.
   * tree-ssa-strlen.c (handle_pointer_plus): Remove last NULL argument
   from gimple_assign_set_rhs_with_ops call.
   * tree-vrp.c (simplify_truth_ops_using_ranges,
   simplify_bit_ops_using_ranges): Remove last NULL argument from
   gimple_assign_set_rhs_with_ops call.
   (simplify_float_conversion_using_ranges,
   simplify_internal_call_using_ranges): Remove last NULL argument from
   gimple_build_assign_with_ops call.
 
 --- gcc/gimple.h.jj   2014-11-19 10:45:17.780769013 +0100
 +++ gcc/gimple.h  2014-11-19 10:52:52.118574835 +0100
 @@ -1180,6 +1180,8 @@ gimple gimple_build_assign_with_ops (enu

RE: [PATCH] If using branch likelies in MIPS sync code fill the delay slot with a nop

2014-11-19 Thread Maciej W. Rozycki
On Wed, 19 Nov 2014, Matthew Fortune wrote:

  @@ -12997,7 +12997,12 @@ mips_process_sync_loop (rtx_insn *insn, rtx
  *operands)
This will sometimes be a delayed branch; see the write code below
for details.  */
 mips_multi_add_insn (is_64bit_p ? scd\t%0,%1 : sc\t%0,%1, at,
  mem, NULL);
  -  mips_multi_add_insn (beq%?\t%0,%.,1b, at, NULL);
  +
  +  /* We can not put the NEWVAL = $TMP3 or CMP = 1 operations in the
  delay slot
  + of the branch if it is a branch likely because they will not
  execute when
  + the atomic operation is successful, so place a NOP in there
  + instead.  */
 
 Please rephrase the comment along the lines of my previous suggestion.
 This wording is too complex IMO.

 Also I'm not sure if it's an enforced rule for GCC sources (it is for 
GDB for example; someone please chime in if I'm missing something here), 
but can you take the opportunity and limit the span of these comment 
lines a bit, like to 74 or maybe even 72 columns, similarly to the rule 
for ChangeLog entries, to make them more readable.  Thanks.

  Maciej


Re: [PATCH, ifcvt] Allow CC mode if HAVE_cbranchcc4 (fix s390 build)

2014-11-19 Thread Richard Biener
On Thu, Nov 13, 2014 at 7:57 PM, Ulrich Weigand uweig...@de.ibm.com wrote:
 Richard Henderson wrote:
 On 11/12/2014 09:41 PM, Ulrich Weigand wrote:
  * optabs.c (prepare_operand): Gracefully fail if the mode of X
  does not match the operand mode expected by the insn pattern.

 This is ok.

 I've checked this in now, thanks.

 I wondered whether s390 benefit from following i386 in allowing
 movcc to accept immediate operands pre-reload.  This could then
 be re-used by cstorecc4 to allow any CC mode to expand to LOCR
 when available.

 Well, we already compile

 int test (long x, long y)
 {
   return x  y;
 }

 to

 cgr %r2,%r3 # 44*cmpdi_ccs/1[length = 4]
 lhi %r1,0   # 43*movsi_zarch/1  [length = 4]
 lhi %r2,1   # 7 *movsi_zarch/1  [length = 4]
 locrnl  %r2,%r1 # 45*movsicc/2  [length = 4]
 lgfr%r2,%r2 # 17*extendsidi2/1  [length = 4]
 br  %r14# 50return  [length = 2]

 with -march=z196 or higher.  The LOCR is not introduced via
 cstorecc4, but instead directly by noce_emit_cmove via
 emit_conditional_move.  It doesn't seem to matter that the
 patterns don't accept immediate operands; those are loaded
 into registers by maybe_legitimize_operand, which already
 tries a copy_to_mode_reg.

This probably caused bootstrap on s390x-linux to fail as in PR63952
(last checked with rev. 217714).

Richard.

 Bye,
 Ulrich

 --
   Dr. Ulrich Weigand
   GNU/Linux compilers and toolchain
   ulrich.weig...@de.ibm.com



Re: [PATCH, ARM] attribute target (thumb,arm) [0/6]

2014-11-19 Thread Christian Bruel



On 11/19/2014 03:18 PM, Ramana Radhakrishnan wrote:

On Wed, Nov 19, 2014 at 1:24 PM, Christian Bruel christian.br...@st.com wrote:


I think I missed the stage3, Anyway would it be OK for stage1 when it
reopens ?


Since you submitted this well during stage1 and given that these
patches address comments from earlier in the review process we should
aim to get these in for 5.0. If at some point during the review
process it looks risky and there needs to be significant rework we can
always stop. It's on the list of patches to be reviewed and I will
find some dedicated time later this week to set down and review / play
with the patches in an attempt to move this forward as it is a
reasonably large chunk of work.


Thanks, also I forgot to mention that you need
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01231.html
to play with the attribute. Will be part of the same shot.

Christian



Thanks for continuing to work on these patches and addressing the
earlier review comments.

regards
Ramana



Christian



[COMMITTED 1/3] Make TARGET_STATIC_CHAIN allow a function type

2014-11-19 Thread Richard Henderson
As opposed to always being a decl.  This is a prerequisite
to allowing the static chain to be loaded for indirect calls.

* targhooks.c (default_static_chain): Remove check for
DECL_STATIC_CHAIN.
* config/moxie/moxie.c (moxie_static_chain): Likewise.
* config/i386/i386.c (ix86_static_chain): Allow decl or type
as the first argument.
* config/xtensa/xtensa.c (xtensa_static_chain): Change the name
of the unused first parameter.
* doc/tm.texi (TARGET_STATIC_CHAIN): Document the first parameter
may be a type.
* target.def (static_chain): Likewise.
---
 gcc/ChangeLog  | 13 +
 gcc/config/i386/i386.c | 19 +--
 gcc/config/moxie/moxie.c   |  5 +
 gcc/config/xtensa/xtensa.c |  2 +-
 gcc/doc/tm.texi|  2 +-
 gcc/target.def |  6 +++---
 gcc/targhooks.c|  5 +
 7 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3166e03..3b41de2 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -27356,13 +27356,10 @@ ix86_minimum_alignment (tree exp, machine_mode mode,
This is a register, unless all free registers are used by arguments.  */
 
 static rtx
-ix86_static_chain (const_tree fndecl, bool incoming_p)
+ix86_static_chain (const_tree fndecl_or_type, bool incoming_p)
 {
   unsigned regno;
 
-  if (!DECL_STATIC_CHAIN (fndecl))
-return NULL;
-
   if (TARGET_64BIT)
 {
   /* We always use R10 in 64-bit mode.  */
@@ -27370,13 +27367,23 @@ ix86_static_chain (const_tree fndecl, bool incoming_p)
 }
   else
 {
-  tree fntype;
+  const_tree fntype, fndecl;
   unsigned int ccvt;
 
   /* By default in 32-bit mode we use ECX to pass the static chain.  */
   regno = CX_REG;
 
-  fntype = TREE_TYPE (fndecl);
+  if (TREE_CODE (fndecl_or_type) == FUNCTION_DECL)
+   {
+  fntype = TREE_TYPE (fndecl_or_type);
+ fndecl = fndecl_or_type;
+   }
+  else
+   {
+ fntype = fndecl_or_type;
+ fndecl = NULL;
+   }
+
   ccvt = ix86_get_callcvt (fntype);
   if ((ccvt  IX86_CALLCVT_FASTCALL) != 0)
{
diff --git a/gcc/config/moxie/moxie.c b/gcc/config/moxie/moxie.c
index d4688d9..148d26b 100644
--- a/gcc/config/moxie/moxie.c
+++ b/gcc/config/moxie/moxie.c
@@ -528,13 +528,10 @@ moxie_arg_partial_bytes (cumulative_args_t cum_v,
 /* Worker function for TARGET_STATIC_CHAIN.  */
 
 static rtx
-moxie_static_chain (const_tree fndecl, bool incoming_p)
+moxie_static_chain (const_tree ARG_UNUSED (fndecl_or_type), bool incoming_p)
 {
   rtx addr, mem;
 
-  if (!DECL_STATIC_CHAIN (fndecl))
-return NULL;
-
   if (incoming_p)
 addr = plus_constant (Pmode, arg_pointer_rtx, 2 * UNITS_PER_WORD);
   else
diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c
index 159a1a7..a0025a5 100644
--- a/gcc/config/xtensa/xtensa.c
+++ b/gcc/config/xtensa/xtensa.c
@@ -3626,7 +3626,7 @@ xtensa_function_value_regno_p (const unsigned int regno)
expressions that denote where they are stored.  */
 
 static rtx
-xtensa_static_chain (const_tree ARG_UNUSED (fndecl), bool incoming_p)
+xtensa_static_chain (const_tree ARG_UNUSED (fndecl_or_type), bool incoming_p)
 {
   rtx base = incoming_p ? arg_pointer_rtx : stack_pointer_rtx;
   return gen_frame_mem (Pmode, plus_constant (Pmode, base,
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 0d3a9fd..5b9da47 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3462,7 +3462,7 @@ If the static chain is passed in memory, these macros 
should not be
 defined; instead, the @code{TARGET_STATIC_CHAIN} hook should be used.
 @end defmac
 
-@deftypefn {Target Hook} rtx TARGET_STATIC_CHAIN (const_tree @var{fndecl}, 
bool @var{incoming_p})
+@deftypefn {Target Hook} rtx TARGET_STATIC_CHAIN (const_tree 
@var{fndecl_or_type}, bool @var{incoming_p})
 This hook replaces the use of @code{STATIC_CHAIN_REGNUM} et al for
 targets that may use different static chain locations for different
 nested functions.  This may be required if the target has function
diff --git a/gcc/target.def b/gcc/target.def
index bc5160d..dc48ae6 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4516,8 +4516,8 @@ false for naked functions.  The default implementation 
always returns true.,
  bool, (void),
  hook_bool_void_true)
 
-/* Return an rtx for the static chain for FNDECL.  If INCOMING_P is true,
-   then it should be for the callee; otherwise for the caller.  */
+/* Return an rtx for the static chain for FNDECL_OR_TYPE.  If INCOMING_P
+   is true, then it should be for the callee; otherwise for the caller.  */
 DEFHOOK
 (static_chain,
  This hook replaces the use of @code{STATIC_CHAIN_REGNUM} et al for\n\
@@ -4539,7 +4539,7 @@ will be at an offset from the frame pointer.\n\
 The variables @code{stack_pointer_rtx}, @code{frame_pointer_rtx}, and\n\
 @code{arg_pointer_rtx} will have been 

[COMMITTED 0/3] Allow front end use of the static chain

2014-11-19 Thread Richard Henderson
This is the middle-end and C front end parts of the gccgo related
patch set first posted here.

  https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01009.html

I've re-tested these patches in isolation and have committed them
to mainline.  I'm currently submitting the Go portions for the
upstream review, and hope to have to libffi merge complete soon.

I think it's important that the change to Go using the static chain
happen before gcc5, as it represents an ABI change.  Our next opportunity
to fix the bulk of the non-x86 targets would be gcc6 in a year's time.


r~


Richard Henderson (3):
  Make TARGET_STATIC_CHAIN allow a function type
  Allow the front-end to create calls with a static chain
  Allow the static chain to be set from C

 gcc/ChangeLog| 27 ++
 gcc/c-family/c-common.c  |  2 ++
 gcc/c-family/c-common.h  |  2 +-
 gcc/c/c-parser.c | 40 ++
 gcc/calls.c  | 14 --
 gcc/config/i386/i386.c   | 19 --
 gcc/config/moxie/moxie.c |  5 +
 gcc/config/xtensa/xtensa.c   |  2 +-
 gcc/doc/extend.texi  | 13 +
 gcc/doc/tm.texi  |  2 +-
 gcc/gimple-fold.c| 21 
 gcc/gimplify.c   | 17 +++-
 gcc/target.def   |  6 +++---
 gcc/targhooks.c  |  5 +
 gcc/testsuite/ChangeLog  |  5 +
 gcc/testsuite/gcc.dg/cwsc0.c | 16 +++
 gcc/testsuite/gcc.dg/cwsc1.c | 46 
 gcc/tree-cfg.c   | 22 +++--
 18 files changed, 222 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/cwsc0.c
 create mode 100644 gcc/testsuite/gcc.dg/cwsc1.c

-- 
1.9.3



[COMMITTED 3/3] Allow the static chain to be set from C

2014-11-19 Thread Richard Henderson
We need to be able to set the static chain on a few calls within the
Go runtime, so expose this with __builtin_call_with_static_chain.

* c-family/c-common.c (c_common_reswords): Add
__builtin_call_with_static_chain.
* c-family/c-common.h (RID_BUILTIN_CALL_WITH_STATIC_CHAIN): New.
* c/c-parser.c (c_parser_postfix_expression): Handle it.
* doc/extend.texi (__builtin_call_with_static_chain): Document it.

* gcc.dg/cwsc0.c: New test.
* gcc.dg/cwsc1.c: New test.
---
 gcc/ChangeLog|  6 ++
 gcc/c-family/c-common.c  |  2 ++
 gcc/c-family/c-common.h  |  2 +-
 gcc/c/c-parser.c | 40 ++
 gcc/doc/extend.texi  | 13 +
 gcc/testsuite/ChangeLog  |  5 +
 gcc/testsuite/gcc.dg/cwsc0.c | 16 +++
 gcc/testsuite/gcc.dg/cwsc1.c | 46 
 8 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/cwsc0.c
 create mode 100644 gcc/testsuite/gcc.dg/cwsc1.c

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 839111a..95b6b1b 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -453,6 +453,8 @@ const struct c_common_resword c_common_reswords[] =
   { __attribute__,   RID_ATTRIBUTE,  0 },
   { __auto_type, RID_AUTO_TYPE,  D_CONLY },
   { __bases,  RID_BASES, D_CXXONLY },
+  { __builtin_call_with_static_chain,
+RID_BUILTIN_CALL_WITH_STATIC_CHAIN, D_CONLY },
   { __builtin_choose_expr, RID_CHOOSE_EXPR, D_CONLY },
   { __builtin_complex, RID_BUILTIN_COMPLEX, D_CONLY },
   { __builtin_shuffle, RID_BUILTIN_SHUFFLE, 0 },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index ca6fc8b..7e53923 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -101,7 +101,7 @@ enum rid
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,  RID_CHOOSE_EXPR,
   RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX, 
RID_BUILTIN_SHUFFLE,
   RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
-  RID_FRACT, RID_ACCUM, RID_AUTO_TYPE,
+  RID_FRACT, RID_ACCUM, RID_AUTO_TYPE, RID_BUILTIN_CALL_WITH_STATIC_CHAIN,
 
   /* C11 */
   RID_ALIGNAS, RID_GENERIC,
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index f90f6af..8a4fd39 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -7414,6 +7414,46 @@ c_parser_postfix_expression (c_parser *parser)
  = comptypes (e1, e2) ? integer_one_node : integer_zero_node;
  }
  break;
+   case RID_BUILTIN_CALL_WITH_STATIC_CHAIN:
+ {
+   vecc_expr_t, va_gc *cexpr_list;
+   c_expr_t *e2_p;
+   tree chain_value;
+
+   c_parser_consume_token (parser);
+   if (!c_parser_get_builtin_args (parser,
+   __builtin_call_with_static_chain,
+   cexpr_list, false))
+ {
+   expr.value = error_mark_node;
+   break;
+ }
+   if (vec_safe_length (cexpr_list) != 2)
+ {
+   error_at (loc, wrong number of arguments to 
+  %__builtin_call_with_static_chain%);
+   expr.value = error_mark_node;
+   break;
+ }
+
+   expr = (*cexpr_list)[0];
+   e2_p = (*cexpr_list)[1];
+   *e2_p = convert_lvalue_to_rvalue (loc, *e2_p, true, true);
+   chain_value = e2_p-value;
+   mark_exp_read (chain_value);
+
+   if (TREE_CODE (expr.value) != CALL_EXPR)
+ error_at (loc, first argument to 
+   %__builtin_call_with_static_chain% 
+   must be a call expression);
+   else if (TREE_CODE (TREE_TYPE (chain_value)) != POINTER_TYPE)
+ error_at (loc, second argument to 
+   %__builtin_call_with_static_chain% 
+   must be a pointer type);
+   else
+ CALL_EXPR_STATIC_CHAIN (expr.value) = chain_value;
+   break;
+ }
case RID_BUILTIN_COMPLEX:
  {
vecc_expr_t, va_gc *cexpr_list;
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d10a815..7178c9a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8913,6 +8913,7 @@ in the Cilk Plus language manual which can be found at
 @node Other Builtins
 @section Other Built-in Functions Provided by GCC
 @cindex built-in functions
+@findex __builtin_call_with_static_chain
 @findex __builtin_fpclassify
 @findex __builtin_isfinite
 @findex __builtin_isnormal
@@ -9501,6 +9502,18 @@ depending on the arguments' types.  For example:
 
 @end deftypefn
 
+@deftypefn {Built-in Function} @var{type} __builtin_call_with_static_chain 
(@var{call_exp}, @var{pointer_exp})
+
+The @var{call_exp} expression must be a function call, and the
+@var{pointer_exp} expression must be a pointer.  The 

[COMMITTED 2/3] Allow the front-end to create calls with a static chain

2014-11-19 Thread Richard Henderson
And, at the same time, allow indirect calls to have a static chain.
We'll always eliminate the static chain if we can prove it's unused.

* calls.c (prepare_call_address): Allow decl or type for first arg.
(expand_call): Pass type to prepare_call_address if no decl.
* gimple-fold.c (gimple_fold_call): Eliminate the static chain if
the function doesn't use it; fold it otherwise.
* gimplify.c (gimplify_call_expr): Gimplify the static chain.
* tree-cfg.c (verify_gimple_call): Allow a static chain on indirect
function calls.
---
 gcc/ChangeLog |  8 
 gcc/calls.c   | 14 --
 gcc/gimple-fold.c | 21 +
 gcc/gimplify.c| 17 -
 gcc/tree-cfg.c| 22 +++---
 5 files changed, 60 insertions(+), 22 deletions(-)

diff --git a/gcc/calls.c b/gcc/calls.c
index 7f55aaf..c64c0eb 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -197,7 +197,7 @@ static void restore_fixed_argument_area (rtx, rtx, int, 
int);
CALL_INSN_FUNCTION_USAGE information.  */
 
 rtx
-prepare_call_address (tree fndecl, rtx funexp, rtx static_chain_value,
+prepare_call_address (tree fndecl_or_type, rtx funexp, rtx static_chain_value,
  rtx *call_fusage, int reg_parm_seen, int sibcallp)
 {
   /* Make a valid memory address and copy constants through pseudo-regs,
@@ -217,12 +217,13 @@ prepare_call_address (tree fndecl, rtx funexp, rtx 
static_chain_value,
 #endif
 }
 
-  if (static_chain_value != 0)
+  if (static_chain_value != 0
+   (TREE_CODE (fndecl_or_type) != FUNCTION_DECL
+ || DECL_STATIC_CHAIN (fndecl_or_type)))
 {
   rtx chain;
 
-  gcc_assert (fndecl);
-  chain = targetm.calls.static_chain (fndecl, false);
+  chain = targetm.calls.static_chain (fndecl_or_type, false);
   static_chain_value = convert_memory_address (Pmode, static_chain_value);
 
   emit_move_insn (chain, static_chain_value);
@@ -3278,8 +3279,9 @@ expand_call (tree exp, rtx target, int ignore)
}
 
   after_args = get_last_insn ();
-  funexp = prepare_call_address (fndecl, funexp, static_chain_value,
-call_fusage, reg_parm_seen, pass == 0);
+  funexp = prepare_call_address (fndecl ? fndecl : fntype, funexp,
+static_chain_value, call_fusage,
+reg_parm_seen, pass == 0);
 
   load_register_parameters (args, num_actuals, call_fusage, flags,
pass == 0, sibcall_failure);
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index acdadcd..4f716b2 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -2723,6 +2723,27 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool 
inplace)
}
 }
 
+  /* Check for indirect calls that became direct calls, and then
+ no longer require a static chain.  */
+  if (gimple_call_chain (stmt))
+{
+  tree fn = gimple_call_fndecl (stmt);
+  if (fn  !DECL_STATIC_CHAIN (fn))
+   {
+ gimple_call_set_chain (stmt, NULL);
+ changed = true;
+   }
+  else
+   {
+ tree tmp = maybe_fold_reference (gimple_call_chain (stmt), false);
+ if (tmp)
+   {
+ gimple_call_set_chain (stmt, tmp);
+ changed = true;
+   }
+   }
+}
+
   if (inplace)
 return changed;
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 93c06de..c46fb66 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2432,7 +2432,7 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool 
want_value)
}
 }
 
-  /* Finally, gimplify the function arguments.  */
+  /* Gimplify the function arguments.  */
   if (nargs  0)
 {
   for (i = (PUSH_ARGS_REVERSED ? nargs - 1 : 0);
@@ -2454,6 +2454,21 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, 
bool want_value)
 }
 }
 
+  /* Gimplify the static chain.  */
+  if (CALL_EXPR_STATIC_CHAIN (*expr_p))
+{
+  if (fndecl  !DECL_STATIC_CHAIN (fndecl))
+   CALL_EXPR_STATIC_CHAIN (*expr_p) = NULL;
+  else
+   {
+ enum gimplify_status t;
+ t = gimplify_arg (CALL_EXPR_STATIC_CHAIN (*expr_p), pre_p,
+   EXPR_LOCATION (*expr_p));
+ if (t == GS_ERROR)
+   ret = GS_ERROR;
+   }
+}
+
   /* Verify the function result.  */
   if (want_value  fndecl
VOID_TYPE_P (TREE_TYPE (TREE_TYPE (fnptrtype
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 9dd8961..71a828c 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -,22 +,14 @@ verify_gimple_call (gimple stmt)
   return true;
 }
 
-  /* If there is a static chain argument, this should not be an indirect
- call, and the decl should have DECL_STATIC_CHAIN set.  */
-  if (gimple_call_chain (stmt))
+  /* If there is a static chain argument, the call should either be
+ indirect, or the decl should have 

[PATCH] Factor and export a const_unop, export const_binop

2014-11-19 Thread Richard Biener

The following patch is to avoid needlessly recursing to generic_simplify
from fold_unary when we ask to perform constant folding.  Similar
fold_binary is currently entered to constant fold from 
gimple/generic_simplify and will do extra work in case it isn't able
to fold the constant operation.

To mitigate that (and still missing the fold_ternary case - to be done
as followup in case this gets accepted at this point) - I factor and
export a const_unop and provide an exported const_binop which also
knows to dispatch to fold_relational_const.  These API are intended
to provide pure constant folding (that is, all-constant operands,
not only a constant result).

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok at this point?
(it should fix a tiny compile-time regression)

I'll followup with a const_ternop (better name?) if ok.

Thanks,
Richard.

2014-11-19  Richard Biener  rguent...@suse.de

* fold-const.h (const_unop): Declare.
(const_binop): Likewise.
* fold-const.c (const_binop): Export overload that expects
a type parameter and dispatches to fold_relational_const as well.
Check both operand kinds for guarding the transforms.
(const_unop): New function, with constant folding from fold_unary_loc.
(fold_unary_loc): Dispatch to const_unop for tcc_constant operand.
Remove constant folding done there from the simplifications.
(fold_binary_loc): Check for constants using CONSTANT_CLASS_P.
(fold_negate_expr): Remove dead code from the REAL_CST case.
Avoid building garbage in the COMPLEX_CST case.
* gimple-match-head.c (gimple_resimplify1): Dispatch to
const_unop.
(gimple_resimplify2): Dispatch to const_binop.
(gimple_simplify): Likewise.

Index: gcc/fold-const.c
===
*** gcc/fold-const.c.orig   2014-11-18 15:16:04.263514898 +0100
--- gcc/fold-const.c2014-11-19 13:22:24.866032427 +0100
*** static bool negate_expr_p (tree);
*** 115,121 
  static tree negate_expr (tree);
  static tree split_tree (tree, enum tree_code, tree *, tree *, tree *, int);
  static tree associate_trees (location_t, tree, tree, enum tree_code, tree);
- static tree const_binop (enum tree_code, tree, tree);
  static enum comparison_code comparison_to_compcode (enum tree_code);
  static enum tree_code compcode_to_comparison (enum comparison_code);
  static int operand_equal_for_comparison_p (tree, tree, tree);
--- 115,120 
*** static tree fold_negate_const (tree, tre
*** 156,161 
--- 155,163 
  static tree fold_not_const (const_tree, tree);
  static tree fold_relational_const (enum tree_code, tree, tree, tree);
  static tree fold_convert_const (enum tree_code, tree, tree);
+ static tree fold_view_convert_expr (tree, tree);
+ static bool vec_cst_ctor_to_array (tree, tree *);
+ 
  
  /* Return EXPR_LOCATION of T if it is not UNKNOWN_LOCATION.
 Otherwise, return LOC.  */
*** fold_negate_expr (location_t loc, tree t
*** 561,570 
  
  case REAL_CST:
tem = fold_negate_const (t, type);
!   /* Two's complement FP formats, such as c4x, may overflow.  */
!   if (!TREE_OVERFLOW (tem) || !flag_trapping_math)
!   return tem;
!   break;
  
  case FIXED_CST:
tem = fold_negate_const (t, type);
--- 563,569 
  
  case REAL_CST:
tem = fold_negate_const (t, type);
!   return tem;
  
  case FIXED_CST:
tem = fold_negate_const (t, type);
*** fold_negate_expr (location_t loc, tree t
*** 572,584 
  
  case COMPLEX_CST:
{
!   tree rpart = negate_expr (TREE_REALPART (t));
!   tree ipart = negate_expr (TREE_IMAGPART (t));
! 
!   if ((TREE_CODE (rpart) == REAL_CST
! TREE_CODE (ipart) == REAL_CST)
!   || (TREE_CODE (rpart) == INTEGER_CST
!TREE_CODE (ipart) == INTEGER_CST))
  return build_complex (type, rpart, ipart);
}
break;
--- 571,579 
  
  case COMPLEX_CST:
{
!   tree rpart = fold_negate_expr (loc, TREE_REALPART (t));
!   tree ipart = fold_negate_expr (loc, TREE_IMAGPART (t));
!   if (rpart  ipart)
  return build_complex (type, rpart, ipart);
}
break;
*** const_binop (enum tree_code code, tree a
*** 1135,1144 
STRIP_NOPS (arg1);
STRIP_NOPS (arg2);
  
!   if (TREE_CODE (arg1) == INTEGER_CST)
  return int_const_binop (code, arg1, arg2);
  
!   if (TREE_CODE (arg1) == REAL_CST)
  {
machine_mode mode;
REAL_VALUE_TYPE d1;
--- 1130,1139 
STRIP_NOPS (arg1);
STRIP_NOPS (arg2);
  
!   if (TREE_CODE (arg1) == INTEGER_CST  TREE_CODE (arg2) == INTEGER_CST)
  return int_const_binop (code, arg1, arg2);
  
!   if (TREE_CODE (arg1) == REAL_CST  TREE_CODE (arg2) == REAL_CST)
  {
machine_mode mode;
REAL_VALUE_TYPE d1;

  1   2   3   >