[PATCH] Fix loop pattern distribution ICE (PR tree-optimization/54321)

2012-08-20 Thread Jakub Jelinek
Hi!

The middle-end argument of memset is signed (int), so simplify_builtin_call
correctly checks host_integerp (val2, 0), but later on used tree_low_cst
(val2, 1), so for negative values it would ICE.  Fixed thusly, the memset
is supposed to cast the int to unsigned char internally anyway.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-08-20  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/54321
* tree-ssa-forwprop.c (simplify_builtin_call): Pass 0 instead of 1
as second argument to tree_low_cst call on val2.

* gcc.c-torture/compile/pr54321.c: New test.

--- gcc/tree-ssa-forwprop.c.jj  2012-08-14 08:45:00.0 +0200
+++ gcc/tree-ssa-forwprop.c 2012-08-20 08:11:06.247936035 +0200
@@ -1554,7 +1554,7 @@ simplify_builtin_call (gimple_stmt_itera
  else
src_buf[0] = tree_low_cst (src1, 0);
  memset (src_buf + tree_low_cst (diff, 1),
- tree_low_cst (val2, 1), tree_low_cst (len2, 1));
+ tree_low_cst (val2, 0), tree_low_cst (len2, 1));
  src_buf[src_len] = '\0';
  /* Neither builtin_strncpy_read_str nor builtin_memcpy_read_str
 handle embedded '\0's.  */
--- gcc/testsuite/gcc.c-torture/compile/pr54321.c.jj2012-08-20 
08:12:10.955630873 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr54321.c   2012-08-20 
08:13:27.963398948 +0200
@@ -0,0 +1,12 @@
+/* PR tree-optimization/54321 */
+struct S { char s[0]; } *a;
+
+void
+foo (void)
+{
+  char *b = a-s;
+  int c = 0;
+  b[0] = 0;
+  while (++c  9)
+b[c] = 255;
+}

Jakub


Re: [PATCH][C++] Save memory and reallocations in name-lookup

2012-08-20 Thread Richard Guenther
On Sat, 18 Aug 2012, Dimitrios Apostolou wrote:

 Hi,
 
 On Fri, 17 Aug 2012, Jakub Jelinek wrote:
  On Fri, Aug 17, 2012 at 06:41:37AM -0500, Gabriel Dos Reis wrote:
   I am however concerned with:
   
  static void
  store_bindings (tree names, VEC(cxx_saved_binding,gc) **old_bindings)
  {
!   static VEC(tree,heap) *bindings_need_stored = NULL;
   
   I would be more comfortable to see the cache be on per-scope
   (e.g. namespace scope) basis as opposed
   a blanket global cache stored in a global variable.
  
  It is not any kind of cache.  It could be in theory an automatic variable
  vector pointer, it is only used during that function.  The reason why it is
  static variable instead is just to avoid constant allocation/deallocation
  of the vector, this way after the first call it will be already allocated
  (but, upon entry to store_bindings will always be empty).
 
 Why not use stack vector of sufficient size for most cases then? I usually do
 something like:
 
   VEC (tree, stack) *bindings_need_stored = VEC_alloc (tree, stack, 32);
   ...
   VEC_free (tree, stack, bindings_need_stored);
 
 if I've measured that 32 (or whatever) is enough for most cases. I don't have
 a clue about this case though.

I considered that, but I'm not sure it would be an improvement.  We'd
have a re-allocation and free in some cases then.

Richard.


[PATCH] Fix PR54326

2012-08-20 Thread Richard Guenther

The following should fix PR54326 (3.4.6 at least compiles genoutput.c
without warnings after this).

Bootstrap running on x86_64-unknown-linux-gnu.

Richard.

2012-08-20  Richard Guenther  rguent...@suse.de

PR bootstrap/54326
* genoutput.c (note_constraint): Properly use CONST_CAST.

Index: gcc/genoutput.c
===
--- gcc/genoutput.c (revision 190523)
+++ gcc/genoutput.c (working copy)
@@ -1175,7 +1175,7 @@ note_constraint (rtx exp, int lineno)
}
 }
   new_cdata = XNEWVAR (struct constraint_data, sizeof (struct constraint_data) 
+ namelen);
-  strcpy ((char *)new_cdata + offsetof(struct constraint_data, name), name);
+  strcpy (CONST_CAST(char *, new_cdata-name), name);
   new_cdata-namelen = namelen;
   new_cdata-lineno = lineno;
   new_cdata-next_this_letter = *slot;


Re: alloc_pool for tree-ssa-pre.c:phi_translate_table

2012-08-20 Thread Richard Guenther
On Sun, Aug 19, 2012 at 8:30 PM, Dimitrios Apostolou ji...@gmx.net wrote:


 2012-08-19  Dimitrios Apostolou  ji...@gmx.net

 * gcc/tree-ssa-pre.c (phi_translate_pool): New static global
 alloc_pool, used for allocating struct expr_pred_trans_d for
 phi_translate_table.
 (phi_trans_add, init_pre, fini_pre): Use it, avoids thousand of
 malloc() and free() calls.


 This avoids lots of malloc/free calls and slow iterations during numerous
 htab_delete() in fini_pre(). Tested on pre C++-snapshot, will update info as
 soon as a post C++ one is available.

Ok, if bootstrap / testing still succeeds.

Thanks,
Richard.


 Thanks,
 Dimitris


Re: CXX conversion: min g++ version pre-requisite?

2012-08-20 Thread Richard Guenther
On Sun, Aug 19, 2012 at 8:44 PM, Gary Funck g...@intrepid.com wrote:
 I filed two bug reports:

 GCC install document does not list minimum required g++ version
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54324

 GCC does not build with G++ version 3.4.0
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54326

 Re: the latter bug report (54326), it might be further
 divided into two bug reports: one documenting the failure
 to build with g++ 3.4.0 and the other having to do with
 the use of casts in genoutput.c.  Let me know if you
 recommend that I separate the bug reports.

Does it still fail, even after fixing genoutput.c?  If so, yes, please open
a bug for the compile-with-3.4.6(!) issue.  No, I think 3.4._0_ should not
be what we should be after.

Richard.

Richard.

 - Gary


Re: [PATCH] Fix PR 52631 (VN does not use simplified expression for lookup)

2012-08-20 Thread Richard Guenther
On Mon, Aug 20, 2012 at 6:49 AM, Andrew Pinski
andrew.pin...@caviumnetworks.com wrote:
 On Wed, Jul 25, 2012 at 4:39 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Tue, Jul 24, 2012 at 5:50 PM, Andrew Pinski
 andrew.pin...@caviumnetworks.com wrote:
 Hi,
   Before tuples was introduced, VN used to lookup the simplified
 expression to see if it was available already and use that instead of
 the non simplified one.  This patch adds the support back to VN to do
 exactly that.

 OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

 I think this should be done for all RHS and SSA name LHS, not only
 for UNARY/BINARY/TERNARY - because even for SINGLE rhs we
 can end up simplifying (for REALPART_EXPR for example which we
 handle as nary, too).  I think we constrain try_to_simplify enough
 so that

 + /* First try to lookup the simplified expression. */
 + if (simplified  valid_gimple_rhs_p (simplified))
 +   {
 + tree result = vn_nary_op_lookup (simplified, NULL);
 + if (result)
 +   {
 + changed = set_ssa_val_to (lhs, result);
 + goto done;
 +   }
 + changed = set_ssa_val_to (lhs, lhs);
 + vn_nary_op_insert (simplified, lhs);
 +   }
   switch (get_gimple_rhs_class (code))
 {
 case GIMPLE_UNARY_RHS:
 case GIMPLE_BINARY_RHS:
 ...

 should work.  As you also insert the simplified variant I think we really
 (finally) want to have a valid_nary_op routine rather than relying on
 valid_gimple_rhs_p which is way too generic.

 I don't see valid_gimple_rhs_p being that generic as it checks to make
 sure the operands of the gimple are valid.
 Maybe I am missing something here though.

valid_gimple_rhs_p checks what it says.  But what we want to know is whether
the rhs is valid for a SCCVN NARY.  Those are not the same.

Richard.

 Thanks,
 Andrew Pinski



 Thanks,
 Richard.

 Thanks,
 Andrew Pinski

 ChangeLog:

 * tree-ssa-sccvn.c (visit_use): Look up the simplified
 expression before visting the original one.

 * gcc.dg/tree-ssa/ssa-fre-9.c: Update the number of
 eliminatations that happen.


Re: [PATCH] Fix loop pattern distribution ICE (PR tree-optimization/54321)

2012-08-20 Thread Richard Guenther
On Mon, Aug 20, 2012 at 8:27 AM, Jakub Jelinek ja...@redhat.com wrote:
 Hi!

 The middle-end argument of memset is signed (int), so simplify_builtin_call
 correctly checks host_integerp (val2, 0), but later on used tree_low_cst
 (val2, 1), so for negative values it would ICE.  Fixed thusly, the memset
 is supposed to cast the int to unsigned char internally anyway.

 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Yes, sorry for the mixup.

Thanks,
Richard.

 2012-08-20  Jakub Jelinek  ja...@redhat.com

 PR tree-optimization/54321
 * tree-ssa-forwprop.c (simplify_builtin_call): Pass 0 instead of 1
 as second argument to tree_low_cst call on val2.

 * gcc.c-torture/compile/pr54321.c: New test.

 --- gcc/tree-ssa-forwprop.c.jj  2012-08-14 08:45:00.0 +0200
 +++ gcc/tree-ssa-forwprop.c 2012-08-20 08:11:06.247936035 +0200
 @@ -1554,7 +1554,7 @@ simplify_builtin_call (gimple_stmt_itera
   else
 src_buf[0] = tree_low_cst (src1, 0);
   memset (src_buf + tree_low_cst (diff, 1),
 - tree_low_cst (val2, 1), tree_low_cst (len2, 1));
 + tree_low_cst (val2, 0), tree_low_cst (len2, 1));
   src_buf[src_len] = '\0';
   /* Neither builtin_strncpy_read_str nor builtin_memcpy_read_str
  handle embedded '\0's.  */
 --- gcc/testsuite/gcc.c-torture/compile/pr54321.c.jj2012-08-20 
 08:12:10.955630873 +0200
 +++ gcc/testsuite/gcc.c-torture/compile/pr54321.c   2012-08-20 
 08:13:27.963398948 +0200
 @@ -0,0 +1,12 @@
 +/* PR tree-optimization/54321 */
 +struct S { char s[0]; } *a;
 +
 +void
 +foo (void)
 +{
 +  char *b = a-s;
 +  int c = 0;
 +  b[0] = 0;
 +  while (++c  9)
 +b[c] = 255;
 +}

 Jakub


[PATCH] GTY chain_next annotate gimple_statement_base

2012-08-20 Thread Richard Guenther

This creates better marking code (even though we probably tail-recurse
for the exising one).

Bootstrapped and tested on x86-64-unknown-linux-gnu, applied.

Richard.

2012-08-20  Richard Guenther  rguent...@suse.de

* gimple.h (gimple_statement_base): Annotate with GTY chain_next.

Index: gcc/gimple.h
===
--- gcc/gimple.h(revision 190523)
+++ gcc/gimple.h(working copy)
@@ -151,7 +151,7 @@ typedef struct
 /* Data structure definitions for GIMPLE tuples.  NOTE: word markers
are for 64 bit hosts.  */
 
-struct GTY(()) gimple_statement_base {
+struct GTY((chain_next (%h.next))) gimple_statement_base {
   /* [ WORD 1 ]
  Main identifying code for a tuple.  */
   ENUM_BITFIELD(gimple_code) code : 8;


Re: Speedups/Cleanups: End of GSOC patch collection

2012-08-20 Thread Dodji Seketeli
Dimitrios Apostolou ji...@gmx.net a écrit:

[...]

   * include/libiberty.h (XOBDELETE, XOBGROW, XOBGROWVEC, XOBSHRINK)
   (XOBSHRINKVEC, XOBFINISH): New type-safe macros for obstack
   operations.
   (XOBFINISH): Changed to return (T *) instead of T. All callers
   updated.
   * libcpp/include/symtab.h (obstack_chunk_alloc)
   (obstack_chunk_free): Define, so that obstack_init() can be used.
   * libcpp/internal.h (struct cset_converter): Same.
   * libcpp/files.c (_cpp_init_files): Changed _obstack_begin() to
   obstack_init().
   * libcpp/identifiers.c (_cpp_init_hashtable): Same.
   * libcpp/symtab.c (ht_create): Same.
   * libcpp/init.c (cpp_create_reader): Same.


[...]

 +++ libcpp/include/symtab.h   2012-08-18 15:07:01 +

[...]

 +#ifndef obstack_chunk_alloc

Please add a comment here, as you did bellow in hunk for
libcpp/internal.h:

 +#ifndef obstack_chunk_alloc
 +  /* Needed for calling obstack_init().  */
 +  #define obstack_chunk_alloc(void *(*) (long)) xmalloc
 +  #define obstack_chunk_free (void (*) (void *)) free
 +#endif

 +  #define obstack_chunk_alloc(void *(*) (long)) xmalloc
 +  #define obstack_chunk_free (void (*) (void *)) free
 +#endif

[...]

With these changes, the libcpp parts look OK to me if they still
boostrap post c++ conversion.  I am not a maintainer so I a deferring to
Tom and the other maintainers.

Thanks.

-- 
Dodji


Re: alloc_pool for tree-ssa-pre.c:phi_translate_table

2012-08-20 Thread Jakub Jelinek
On Mon, Aug 20, 2012 at 09:37:39AM +0200, Richard Guenther wrote:
 On Sun, Aug 19, 2012 at 8:30 PM, Dimitrios Apostolou ji...@gmx.net wrote:
 
 
  2012-08-19  Dimitrios Apostolou  ji...@gmx.net
 
  * gcc/tree-ssa-pre.c (phi_translate_pool): New static global
  alloc_pool, used for allocating struct expr_pred_trans_d for
  phi_translate_table.
  (phi_trans_add, init_pre, fini_pre): Use it, avoids thousand of
  malloc() and free() calls.
 
 
  This avoids lots of malloc/free calls and slow iterations during numerous
  htab_delete() in fini_pre(). Tested on pre C++-snapshot, will update info as
  soon as a post C++ one is available.
 
 Ok, if bootstrap / testing still succeeds.

I'd note for all the recently posted patches from Dimitrios, the gcc/
prefix doesn't belong to the ChangeLog entry pathnames, the filenames are
relative to the corresponding ChangeLog location.

Jakub


Re: alloc_pool for tree-ssa-pre.c:phi_translate_table

2012-08-20 Thread Dimitrios Apostolou

On Mon, 20 Aug 2012, Jakub Jelinek wrote:


I'd note for all the recently posted patches from Dimitrios, the gcc/
prefix doesn't belong to the ChangeLog entry pathnames, the filenames are
relative to the corresponding ChangeLog location.



Ah sorry, it's what the mklog utility generates, it seems I got carried 
away over-using it. :-)



Dimitris



Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Jan Hubicka
 Well, it should store the largest working set in BBs, or one that came
 from a much longer run. But I see the issue you are pointing out. The
 num_counters (the number of hot bbs) should be reasonable, as the
 total number of BBs is the same between all runs, and I want to show
 the largest or more dominant working set in terms of the number of hot
 bbs. But the min_bb_counter will become more and more inaccurate as
 the number of runs increases, since the counter values are
 accumulated.

Yes and that is the one that we plan to use to determine hot/cold decisions on,
right?

Note that there is no really 1-1 corespondence in betwen BBs and counters.
For each function the there should be num_edges-num_bbs+1 counters.
What do you plan to use BB counts for?
 
 I typed up a detailed email below on why getting this right would be
 difficult, but then I realized there might be a fairly simple accurate
 solution, which I'll describe first:
 
 The only way I see to do this completely accurately is to take two
 passes through the existing gcda files that we are merging into, one
 to read in all the counter values and merge them into all the counter
 values in the arrays from the current run (after which we can do the
 histogramming/working set computation accurately from the merged
 counters), and the second to rewrite them. In libgcov this doesn't
 seem like it would be too difficult to do, although it is a little bit
 of restructuring of the main merging loop and needs some special
 handling for buffered functions (which could increase the memory
 footprint a bit if there are many of these since they will all need to
 be buffered across the iteration over all the gcda files).
 
 The summary merging in coverage.c confuses me a bit as it seems to be
 handling the case when there are multiple program summaries in a
 single gcda file. When would this happen? It looks like the merge
 handling in libgcov should always produce a single program summary per
 gcda file.


This is something Nathan introduced years back. The idea was IMO to handle
more acurately objects linked into multiple binaries. I am not sure
if the code really works or worked on some point.

The idea, as I recall it, was to produce overall checksum of all objects and
have different profile records for each combination.

This is not really useful for profile feedback as you generate single object
file for all uses, but it might become useful for LTO where you know into which
binary you are linking to. I am not really sure it is worth all the 
infrastructure
needed to support this though.
 
 
 
  Why you don't simply write the histogram into gcov file and don't merge the 
  values
  here (i.e. doing the cummulation loop in GCC instead of within libgcov)?
 
 That doesn't completely solve the problem, unfortunately. The reason
 is that you don't know which histogram entry contains a particular
 block each run, and therefore you don't know how to compute the new
 combined histogram value and index for that bb counter. For example, a
 particular histogram index may have 5 counters (bbs) in it for one run
 and 2 counters (bbs) in it for a second run, so the question is how to
 compute the new entry for all of those bb counters, as the 5 bbs from
 the first run may or may not be a superset of the 2 from the second
 run. You could assume that the bbs have the same relative order of
 hotness between runs, and combine the bb counters accordingly, but
 there is still some trickiness because you have to apportion the
 cumulative counter sum stored in the histogram entry between new
 entries. For example, assume the highest indexed non-zero entries (the
 histogram buckets containing the largest counter values) in the two
 incoming histograms are:
 
 histogram 1:
 
 index 100: 4 counters, cumulative value X, min counter value minx
 ...
 
 histogram 2:
 
 index 100: 2 counters, cumulative value Y, min counter value miny
 index 99: 3 counters, cumulative value W, min counter value minw
 ...
 
 To merge, you could assume something like 2 counters with a new
 cumulative value (Y + X*2/4), and new min counter value minx+miny,
 that go into the merged histogram with the index corresponding to
 counter value minx+miny. And then 2 counters have a new cumulative
 value (W*2/3 + X*2/4) and new min counter value minx+minw, that go
 into the merged histogram with index corresponding to counter value
 minw+minx. Etc... Not entirely accurate, although it might be a
 reasonable approximation, but it also requires a number of division
 operations during the merge in libgcov.
 
 Another possibility, that might also provide a reasonable
 approximation, would be to scale the min_bb_counter values in the
 working sets by the sum_all_merged/sum_all_orig, where sum_all_merged
 is the new sum_all, and sum_all_orig is the sum_all from the summary
 whose working_set was chosen to be propagated to the new merged
 summary. This also requires some divisions at libgcov merge time,
 unless we 

Re: [PATCH, RFC] Re-work find_reloads_subreg_address (Re: [PATCH][RFC, Reload]. Reload bug?)

2012-08-20 Thread Tejas Belagod

Tejas Belagod wrote:

Ulrich Weigand wrote:

The following patch implements this idea; it passes a basic regression
test on arm-linux-gnueabi.  (Obviously this would need a lot more
testing on various platforms before getting into mainline ...)

Can you have a look whether this fixes the problem you're seeing?



Sorry for the delay in replying. Thanks for the patch. I tried this patch -  it
doesn't seem to reach as far as cleanup_subreg_operands (), but fails an
assertion in push_reload () in reload.c:1307. I'm currently investigating this
and will let you know the reason soon.



Hi,

Sorry for the delay in replying. These new issues that I was seeing were bugs 
specific to my code that I've fixed. Your patch seems to work fine. Thanks a lot 
for the patch.


Thanks,
Tejas Belagod.
ARM.



[PATCH] Fix PR54327

2012-08-20 Thread Richard Guenther

This is an issue of folding looking at SSA def stmts when SSA form
is not up-to-date.  That's not safe unless the SSA name we are
looking at is not marked for update (see GIMPLE_COND folding in
cfgcleaup for another example).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-08-20  Richard Guenther  rguent...@suse.de

PR tree-optimization/54327
* gimple-fold.c (get_maxval_strlen): Do not walk use-def chains
if the use is registered for SSA update.

* gcc.dg/torture/pr54327.c: New testcase.

Index: gcc/gimple-fold.c
===
*** gcc/gimple-fold.c   (revision 190523)
--- gcc/gimple-fold.c   (working copy)
*** get_maxval_strlen (tree arg, tree *lengt
*** 736,741 
--- 736,746 
return true;
  }
  
+   /* If ARG is registered for SSA update we cannot look at its defining
+  statement.  */
+   if (name_registered_for_update_p (arg))
+ return false;
+ 
/* If we were already here, break the infinite cycle.  */
if (!bitmap_set_bit (visited, SSA_NAME_VERSION (arg)))
  return true;
Index: gcc/testsuite/gcc.dg/torture/pr54327.c
===
*** gcc/testsuite/gcc.dg/torture/pr54327.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr54327.c  (working copy)
***
*** 0 
--- 1,15 
+ /* { dg-do compile } */
+ 
+ #include string.h
+ #include stdlib.h
+ void treathead ()
+ {
+   char *a = ';' == '\0' ? : 0;
+   if (*a == '=')
+ {
+   while (*a == (*a == 0) || *a == '\'')
+   a++;
+   if (strlen (a)  2)
+   abort ();
+ }
+ }


Re: [PATCH][C++] Get rid of TREE_CHAIN use for TREE_VEC (NON_DEFAULT_TEMPLATE_ARGS_COUNT)

2012-08-20 Thread Richard Guenther
On Fri, 17 Aug 2012, Richard Guenther wrote:

 
 This gets rid of this field, pushing it into a short int in tree_base
 (hopefully 2^16 non-defaulted template args are enough ...).  The
 existing code is a bit confusing because of the differences with
 respect to ENABLE_CHECKING, it has very little testcase coverage as
 well (a single testcase, g++.dg/template/error39.C), so I am not sure
 I got the idea correct (especially as ENABLE_CHECKING vs. 
 non-ENABLE_CHECKING looks to introduce subtle differences?).
 
 Anyway, a previous iteration that failed g++.dg/template/error39.C
 bootstrapped and tested all languages ok, the following version now
 passes g++.dg/template/error39.C and hopefully bootstrap and regtest
 as well.
 
 We build a _lot_ of TREE_VECs from the C++ frontend, so this 8 byte
 saving is quite visible.  TREE_VEC is the most used TREE_CODE by far
 on PR54146:
 
 ...
 ssa_name  363597
 mem_ref   403966
 addr_expr1203839
 tree_list1205721
 tree_vec 2654415
 
 which translates to 109615896 bytes allocated in TREE_VECs (yes,
 that includes the actual storage, the average size is 41.3 bytes,
 with this patch TREE_VEC itself shrinks from 40 bytes to 32 bytes
 which means the average TREE_VEC size is one!).
 
 Let's hope removing TREE_TYPE from TREE_VEC is as easy ;)

It turns out the C++ frontend is again the only user.  It gets
used for DECL_PRIMARY_TEMPLATE.  Now, the C++ fronend indeed seems
to want to associate information with the template argument vector
itself, not the template.  As the patch removing TREE_CAHIN is in
some sense not an improvement (C++ specific bits remain in the
generic tree structure, blocking from the VEC size being moved
to spare bits in tree_base), the DECL_PRIMARY_TEMPLATE usage shows
that using a TREE_VEC to represent the template argument is the
issue.  We cannot simply derive from TREE_VEC because it relies
on storage allocated at its tail, so instead a C++ specific tree
node mimicking a TREE_VEC but sporting extra fields, including
ones for the number of non-defaulted args and for DECL_PRIMARY_TEMPLATE
would be the way to go.  That node would derive from tree_base.

This should be pretty straight-forward to do, but I'll push it
back for now due to lack of time.  So consider this patch
withdrawn.  You'll instead see me trying to move the length
field into tree_base.

Richard.

 Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.
 
 Ok for the C++ parts?
 
 Thanks,
 Richard.
 
 2012-08-17  Richard Guenther  rguent...@suse.de
 
   * tree.h (struct tree_base): Put address_space into a union,
   alongside a new field non_default_template_args_count used by
   C++ TREE_VECs.
   (struct tree_vec): Derive from tree_typed instead of tree_common.
   (TYPE_ADDR_SPACE): Adjust.
   * tree.c (initialize_tree_contains_struct): Adjust tree type
   hierarchy.
 
   cp/
   * cp-tree.h (NON_DEFAULT_TEMPLATE_ARGS_COUNT): Remove.
   (SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT,
   GET_NON_DEFAULT_TEMPLATE_ARGS_COUNT): Use
   non_default_template_args_count in tree_base instead of TREE_CHAIN.
   * pt.c (expand_template_argument_pack): Adjust.
   (template_parm_to_arg): Likewise.
   (current_template_args): Likewise.
   (coerce_template_parameter_pack): Likewise.
   (coerce_template_parms): Likewise.
   (tsubst_template_args): Likewise.
   (type_unification_real): Likewise.
   * parser.c (cp_parser_template_argument_list): Likewise.
 
 Index: gcc/tree.h
 ===
 *** gcc/tree.h(revision 190469)
 --- gcc/tree.h(working copy)
 *** struct GTY(()) tree_base {
 *** 453,465 
 unsigned packed_flag : 1;
 unsigned user_align : 1;
 unsigned nameless_flag : 1;
   
 !   unsigned spare : 12;
 ! 
 !   /* This field is only used with type nodes; the only reason it is present
 !  in tree_base instead of tree_type is to save space.  The size of the
 !  field must be large enough to hold addr_space_t values.  */
 !   unsigned address_space : 8;
   };
   
   struct GTY(()) tree_typed {
 --- 453,473 
 unsigned packed_flag : 1;
 unsigned user_align : 1;
 unsigned nameless_flag : 1;
 +   unsigned spare0 : 4;
   
 !   /* The following has to be at a 16-bit alignment boundary to be properly
 !  packed and not make tree_base larger than 64 bits.  */
 !   union tree_base_spare_bits {
 ! /* This field is only used with type nodes; the only reason it is 
 present
 !in tree_base instead of tree_type is to save space.  The size of the
 !field must be large enough to hold addr_space_t values.  */
 ! unsigned char address_space;
 ! 
 ! /* This field is only used with tree_vec nodes by the C++ frontend; the
 !only reason it is present in tree_base instead of tree_vec is to
 !save space.  */
 ! unsigned 

Re: [wwwdocs] Update Fortran secrion in 4.8/changes.html

2012-08-20 Thread Gerald Pfeifer
On Tue, 14 Aug 2012, Tobias Burnus wrote:
 I have committed the patch as obvious, however, I am happy for any 
 comments.

I went ahead and made some smaller changes, patch below.

I noticed you are using q.../q, as in qcodee/code/q,
which we usually don't.  Why that?

Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.15
diff -u -3 -p -r1.15 changes.html
--- changes.html19 Aug 2012 19:48:02 -  1.15
+++ changes.html20 Aug 2012 10:32:20 -
@@ -92,18 +92,20 @@ by this change./p
 (re)allocation in hot loops. (For arrays, replacing 
qcodevar=/code/q
 by qcodevar(:)=/code/q disables the automatic reallocation.)/li
 
-liReading floating point numbers which use qcodeq/code/q for the
-exponential (such as code4.0q0/code) is now supported as vendor
+lipReading floating point numbers which use qcodeq/code/q for
+the exponential (such as code4.0q0/code) is now supported as vendor
 extension for better compatibility with old data files. It is strongly
 recommended to use for I/O the equivalent but standard conforming
-qcodee/code/q (such as code4.0e0/code). [For the Fortran
+qcodee/code/q (such as code4.0e0/code)./p
+
+p(For Fortran
 source code, consider replacing the qcodeq/code/q in
 floating-point literals by a kind parameter (e.g. code4.0e0_qp/code
-with a suitable codeqp/code). Note that ndash; in the Fortran
+with a suitable codeqp/code). Note that ndash; in Fortran
 source code ndash; replacing qcodeq/code/q by a simple
-qcodee/code/q is emnot/em equivalent.]/li
+qcodee/code/q is emnot/em equivalent.)/p/li
 
-liThe codeGFORTRAN_TMPDIR/code environment variable, for specifying
+liThe codeGFORTRAN_TMPDIR/code environment variable for specifying
 a non-default directory for files opened with 
codeSTATUS=SCRATCH/code,
 is not used anymore. Instead gfortran checks the POSIX/GNU standard
 codeTMPDIR/code environment variable. If codeTMPDIR/code is not


Re: [SH] PR 54089 - Add support for rotcr insn

2012-08-20 Thread Kaz Kojima
Oleg Endo oleg.e...@t-online.de wrote:
 This adds support for SH's rotcr insn.
 Tested on rev 190459 with
 make -k check RUNTESTFLAGS=--target_board=sh-sim
 \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}
 
 and no new failures.
 OK?

OK.

Regards,
kaz


Re: [SH] PR 51244 - Use more zero displacement branches

2012-08-20 Thread Kaz Kojima
Oleg Endo oleg.e...@t-online.de wrote:
 This adds two new patterns to undo an optimization that is done by ifcvt
 and is not beneficial if zero displacement branches are available on SH.
 Tested on rev 190459 with
 make -k check RUNTESTFLAGS=--target_board=sh-sim
 \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}
 
 and no new failures.
 OK?

OK.

Regards,
kaz


Check for ffs declaration in gcc/configure.ac

2012-08-20 Thread Joseph S. Myers
Building a cross compiler from i686-mingw32 to arm-none-eabi fails
after the move to build as C++ because config/arm/neon.md uses ffs but
no MinGW header declares ffs (and unlike in C, an implicit declaration
of this built-in function can't be used).  This patch fixes this issue
by making the gcc directory configure check for a declaration, so
causing libiberty.h to provide one if needed.

Tested building a cross compiler as described above (fails before the
patch, passes afterwards).  OK to commit?

2012-08-20  Joseph Myers  jos...@codesourcery.com

* configure.ac (ffs): Check for declaration.
* configure, config.in: Regenerate.

Index: configure
===
--- configure   (revision 190529)
+++ configure   (working copy)
@@ -10288,7 +10288,7 @@
 for ac_func in getenv atol asprintf sbrk abort atof getcwd getwd \
strsignal strstr stpcpy strverscmp \
errno snprintf vsnprintf vasprintf malloc realloc calloc \
-   free basename getopt clock getpagesize clearerr_unlocked feof_unlocked  
 ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked   
fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked   fread_unlocked 
fwrite_unlocked getchar_unlocked getc_unlocked   putchar_unlocked putc_unlocked
+   free basename getopt clock getpagesize ffs clearerr_unlocked 
feof_unlocked   ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked   
fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked   fread_unlocked 
fwrite_unlocked getchar_unlocked getc_unlocked   putchar_unlocked putc_unlocked
 do
   ac_tr_decl=`$as_echo HAVE_DECL_$ac_func | $as_tr_cpp`
 { $as_echo $as_me:${as_lineno-$LINENO}: checking whether $ac_func is 
declared 5
Index: config.in
===
--- config.in   (revision 190529)
+++ config.in   (working copy)
@@ -621,6 +621,12 @@
 #endif
 
 
+/* Define to 1 if we found a declaration for 'ffs', otherwise define to 0. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_DECL_FFS
+#endif
+
+
 /* Define to 1 if we found a declaration for 'fgetc_unlocked', otherwise
define to 0. */
 #ifndef USED_FOR_TARGET
Index: configure.ac
===
--- configure.ac(revision 190529)
+++ configure.ac(working copy)
@@ -1075,7 +1075,7 @@
 gcc_AC_CHECK_DECLS(getenv atol asprintf sbrk abort atof getcwd getwd \
strsignal strstr stpcpy strverscmp \
errno snprintf vsnprintf vasprintf malloc realloc calloc \
-   free basename getopt clock getpagesize gcc_UNLOCKED_FUNCS, , ,[
+   free basename getopt clock getpagesize ffs gcc_UNLOCKED_FUNCS, , ,[
 #include ansidecl.h
 #include system.h])
 

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Check for ffs declaration in gcc/configure.ac

2012-08-20 Thread Richard Guenther
On Mon, Aug 20, 2012 at 1:28 PM, Joseph S. Myers
jos...@codesourcery.com wrote:
 Building a cross compiler from i686-mingw32 to arm-none-eabi fails
 after the move to build as C++ because config/arm/neon.md uses ffs but
 no MinGW header declares ffs (and unlike in C, an implicit declaration
 of this built-in function can't be used).  This patch fixes this issue
 by making the gcc directory configure check for a declaration, so
 causing libiberty.h to provide one if needed.

 Tested building a cross compiler as described above (fails before the
 patch, passes afterwards).  OK to commit?

Ok.

Thanks,
Richard.

 2012-08-20  Joseph Myers  jos...@codesourcery.com

 * configure.ac (ffs): Check for declaration.
 * configure, config.in: Regenerate.

 Index: configure
 ===
 --- configure   (revision 190529)
 +++ configure   (working copy)
 @@ -10288,7 +10288,7 @@
  for ac_func in getenv atol asprintf sbrk abort atof getcwd getwd \
 strsignal strstr stpcpy strverscmp \
 errno snprintf vsnprintf vasprintf malloc realloc calloc \
 -   free basename getopt clock getpagesize clearerr_unlocked 
 feof_unlocked   ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked 
   fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked   
 fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked   
 putchar_unlocked putc_unlocked
 +   free basename getopt clock getpagesize ffs clearerr_unlocked 
 feof_unlocked   ferror_unlocked fflush_unlocked fgetc_unlocked fgets_unlocked 
   fileno_unlocked fprintf_unlocked fputc_unlocked fputs_unlocked   
 fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked   
 putchar_unlocked putc_unlocked
  do
ac_tr_decl=`$as_echo HAVE_DECL_$ac_func | $as_tr_cpp`
  { $as_echo $as_me:${as_lineno-$LINENO}: checking whether $ac_func is 
 declared 5
 Index: config.in
 ===
 --- config.in   (revision 190529)
 +++ config.in   (working copy)
 @@ -621,6 +621,12 @@
  #endif


 +/* Define to 1 if we found a declaration for 'ffs', otherwise define to 0. */
 +#ifndef USED_FOR_TARGET
 +#undef HAVE_DECL_FFS
 +#endif
 +
 +
  /* Define to 1 if we found a declaration for 'fgetc_unlocked', otherwise
 define to 0. */
  #ifndef USED_FOR_TARGET
 Index: configure.ac
 ===
 --- configure.ac(revision 190529)
 +++ configure.ac(working copy)
 @@ -1075,7 +1075,7 @@
  gcc_AC_CHECK_DECLS(getenv atol asprintf sbrk abort atof getcwd getwd \
 strsignal strstr stpcpy strverscmp \
 errno snprintf vsnprintf vasprintf malloc realloc calloc \
 -   free basename getopt clock getpagesize gcc_UNLOCKED_FUNCS, , ,[
 +   free basename getopt clock getpagesize ffs gcc_UNLOCKED_FUNCS, , ,[
  #include ansidecl.h
  #include system.h])


 --
 Joseph S. Myers
 jos...@codesourcery.com


Re: combine permutations in gimple

2012-08-20 Thread Richard Guenther
On Sat, Aug 18, 2012 at 11:59 AM, Marc Glisse marc.gli...@inria.fr wrote:
 Hello,

 here is a new patch (passes bootstrap+regtest), which only combines
 permutations if the result is the identity. I'll file a PR about more
 general
 combinations to link to this conversation and the cost hook proposition.

 Ok?

Ok.

Thanks,
Richard.


 2012-08-18  Marc Glisse  marc.gli...@inria.fr


 gcc/
 * fold-const.c (fold_ternary_loc): Detect identity permutations.
 Canonicalize permutations more.
 * tree-ssa-forwprop.c (is_combined_permutation_identity): New
 function.

 (simplify_permutation): Likewise.
 (ssa_forward_propagate_and_combine): Call it.

 gcc/testsuite/
 * gcc.dg/tree-ssa/forwprop-19.c: New testcase.
 * gcc.dg/fold-perm.c: Likewise.

 --
 Marc Glisse
 Index: gcc/fold-const.c
 ===
 --- gcc/fold-const.c(revision 190500)
 +++ gcc/fold-const.c(working copy)
 @@ -14148,58 +14148,96 @@ fold_ternary_loc (location_t loc, enum t
 return fold_build2_loc (loc, PLUS_EXPR, type,
 const_binop (MULT_EXPR, arg0, arg1), arg2);
if (integer_zerop (arg2))
 return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);

return fold_fma (loc, type, arg0, arg1, arg2);

  case VEC_PERM_EXPR:
if (TREE_CODE (arg2) == VECTOR_CST)
 {
 - unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i;
 + unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i, mask;
   unsigned char *sel = XALLOCAVEC (unsigned char, nelts);
   tree t;
   bool need_mask_canon = false;
 + bool all_in_vec0 = true;
 + bool all_in_vec1 = true;
 + bool maybe_identity = true;
 + bool single_arg = (op0 == op1);
 + bool changed = false;

 + mask = single_arg ? (nelts - 1) : (2 * nelts - 1);
   gcc_assert (nelts == VECTOR_CST_NELTS (arg2));
   for (i = 0; i  nelts; i++)
 {
   tree val = VECTOR_CST_ELT (arg2, i);
   if (TREE_CODE (val) != INTEGER_CST)
 return NULL_TREE;

 - sel[i] = TREE_INT_CST_LOW (val)  (2 * nelts - 1);
 + sel[i] = TREE_INT_CST_LOW (val)  mask;
   if (TREE_INT_CST_HIGH (val)
   || ((unsigned HOST_WIDE_INT)
   TREE_INT_CST_LOW (val) != sel[i]))
 need_mask_canon = true;
 +
 + if (sel[i]  nelts)
 +   all_in_vec1 = false;
 + else
 +   all_in_vec0 = false;
 +
 + if ((sel[i]  (nelts-1)) != i)
 +   maybe_identity = false;
 +   }
 +
 + if (maybe_identity)
 +   {
 + if (all_in_vec0)
 +   return op0;
 + if (all_in_vec1)
 +   return op1;
 }

   if ((TREE_CODE (arg0) == VECTOR_CST
|| TREE_CODE (arg0) == CONSTRUCTOR)
(TREE_CODE (arg1) == VECTOR_CST
   || TREE_CODE (arg1) == CONSTRUCTOR))
 {
   t = fold_vec_perm (type, arg0, arg1, sel);
   if (t != NULL_TREE)
 return t;
 }

 + if (all_in_vec0)
 +   op1 = op0;
 + else if (all_in_vec1)
 +   {
 + op0 = op1;
 + for (i = 0; i  nelts; i++)
 +   sel[i] -= nelts;
 + need_mask_canon = true;
 +   }
 +
 + if (op0 == op1  !single_arg)
 +   changed = true;
 +
   if (need_mask_canon  arg2 == op2)
 {
   tree *tsel = XALLOCAVEC (tree, nelts);
   tree eltype = TREE_TYPE (TREE_TYPE (arg2));
   for (i = 0; i  nelts; i++)
 tsel[i] = build_int_cst (eltype, sel[i]);
 - t = build_vector (TREE_TYPE (arg2), tsel);
 - return build3_loc (loc, VEC_PERM_EXPR, type, op0, op1, t);
 + op2 = build_vector (TREE_TYPE (arg2), tsel);
 + changed = true;
 }
 +
 + if (changed)
 +   return build3_loc (loc, VEC_PERM_EXPR, type, op0, op1, op2);
 }
return NULL_TREE;

  default:
return NULL_TREE;
  } /* switch (code) */
  }

  /* Perform constant folding and related simplification of EXPR.
 The related simplifications include x*1 = x, x*0 = 0, etc.,
 Index: gcc/tree-ssa-forwprop.c
 ===
 --- gcc/tree-ssa-forwprop.c (revision 190500)
 +++ gcc/tree-ssa-forwprop.c (working copy)
 @@ -2570,20 +2570,109 @@ combine_conversions (gimple_stmt_iterato
   gimple_assign_set_rhs_code (stmt, CONVERT_EXPR);
   update_stmt (stmt);
   return remove_prop_source_from_use (op0) ? 2 : 1;
 }
 }
  }

return 0;
  }

 +/* Determine whether applying the 2 

Re: [patch, tree-ssa] PR54295 Incorrect value extension in widening multiply-accumulate

2012-08-20 Thread Richard Guenther
On Fri, Aug 17, 2012 at 7:05 PM, Richard Earnshaw rearn...@arm.com wrote:
 On 17/08/12 16:20, Richard Earnshaw wrote:
 Ok, in which case we have to give is_widening_mult_rhs_p enough smarts
 to not strip

   (s32)u32

 and return u32.

 I'll have another think about it.

 Take two.

 This version should address your concerns about handling

 (u32)u16 * (u32)u16 - u64

 We now look at each operand directly, but when doing so we check whether
 the operand is the same size as the result or not.  When it is, we can
 strip any conversion; when it isn't the conversion must preserve
 signedness of the inner operand and mustn't be a narrowing conversion.

 * tree-ssa-math-opts.c (widening_mult_conversion_strippable_p):
 New function.
 (is_widening_mult_rhs_p): Use it.

 Testing underway (again)

 OK?

Ok.

Thanks,
Richard.

 R.



[PATCH] Remove register_new_name_mapping, do less timevar push/pop

2012-08-20 Thread Richard Guenther

This performs timevar push/pop at the single remaining entry to 
incremental SSA rewrite setup.  The other entry, 
register_new_name_mapping, is removed by making its only other
user use create_new_def_for.  Which needs to handle a NULL DEF
for this case, as we have no access to a DEF op for an SSA name LHS
(but we do for a PHI node).  Something Micha may change soon.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-08-20  Richard Guenther  rguent...@suse.de

* tree-flow.h (register_new_name_mapping): Remove.
* tree-into-ssa.c (register_new_name_mapping): Likewise.
(add_new_name_mapping): Do not push/pop timevar here.
(create_new_def_for): Instead do it here.  Initialize
update-ssa here, handle a NULL def.
* tree-vrp.c (build_assert_expr_for): Use create_new_def_for.

Index: gcc/tree-flow.h
===
*** gcc/tree-flow.h (revision 190525)
--- gcc/tree-flow.h (working copy)
*** void release_defs_bitset (bitmap toremov
*** 516,522 
  /* In tree-into-ssa.c  */
  void update_ssa (unsigned);
  void delete_update_ssa (void);
- void register_new_name_mapping (tree, tree);
  tree create_new_def_for (tree, gimple, def_operand_p);
  bool need_ssa_update_p (struct function *);
  bool name_registered_for_update_p (tree);
--- 516,521 
Index: gcc/tree-into-ssa.c
===
*** gcc/tree-into-ssa.c (revision 190525)
--- gcc/tree-into-ssa.c (working copy)
*** static VEC(gimple_vec, heap) *phis_to_re
*** 111,125 
  static bitmap blocks_with_phis_to_rewrite;
  
  /* Growth factor for NEW_SSA_NAMES and OLD_SSA_NAMES.  These sets need
!to grow as the callers to register_new_name_mapping will typically
!create new names on the fly.  FIXME.  Currently set to 1/3 to avoid
!frequent reallocations but still need to find a reasonable growth
!strategy.  */
  #define NAME_SETS_GROWTH_FACTOR   (MAX (3, num_ssa_names / 3))
  
  
  /* The function the SSA updating data structures have been initialized for.
!NULL if they need to be initialized by register_new_name_mapping.  */
  static struct function *update_ssa_initialized_fn = NULL;
  
  /* Global data to attach to the main dominator walk structure.  */
--- 111,125 
  static bitmap blocks_with_phis_to_rewrite;
  
  /* Growth factor for NEW_SSA_NAMES and OLD_SSA_NAMES.  These sets need
!to grow as the callers to create_new_def_for will create new names on
!the fly.
!FIXME.  Currently set to 1/3 to avoid frequent reallocations but still
!need to find a reasonable growth strategy.  */
  #define NAME_SETS_GROWTH_FACTOR   (MAX (3, num_ssa_names / 3))
  
  
  /* The function the SSA updating data structures have been initialized for.
!NULL if they need to be initialized by create_new_def_for.  */
  static struct function *update_ssa_initialized_fn = NULL;
  
  /* Global data to attach to the main dominator walk structure.  */
*** add_to_repl_tbl (tree new_tree, tree old
*** 587,594 
  static void
  add_new_name_mapping (tree new_tree, tree old)
  {
-   timevar_push (TV_TREE_SSA_INCREMENTAL);
- 
/* OLD and NEW_TREE must be different SSA names for the same symbol.  */
gcc_assert (new_tree != old  SSA_NAME_VAR (new_tree) == SSA_NAME_VAR 
(old));
  
--- 587,592 
*** add_new_name_mapping (tree new_tree, tre
*** 613,620 
   respectively.  */
SET_BIT (new_ssa_names, SSA_NAME_VERSION (new_tree));
SET_BIT (old_ssa_names, SSA_NAME_VERSION (old));
- 
-   timevar_pop (TV_TREE_SSA_INCREMENTAL);
  }
  
  
--- 611,616 
*** delete_update_ssa (void)
*** 2842,2857 
  
  
  /* Create a new name for OLD_NAME in statement STMT and replace the
!operand pointed to by DEF_P with the newly created name.  Return
!the new name and register the replacement mapping NEW, OLD in
 update_ssa's tables.  */
  
  tree
  create_new_def_for (tree old_name, gimple stmt, def_operand_p def)
  {
!   tree new_name = duplicate_ssa_name (old_name, stmt);
  
!   SET_DEF (def, new_name);
  
if (gimple_code (stmt) == GIMPLE_PHI)
  {
--- 2838,2865 
  
  
  /* Create a new name for OLD_NAME in statement STMT and replace the
!operand pointed to by DEF_P with the newly created name.  If DEF_P
!is NULL then STMT should be a GIMPLE assignment.
!Return the new name and register the replacement mapping NEW, OLD in
 update_ssa's tables.  */
  
  tree
  create_new_def_for (tree old_name, gimple stmt, def_operand_p def)
  {
!   tree new_name;
  
!   timevar_push (TV_TREE_SSA_INCREMENTAL);
! 
!   if (!update_ssa_initialized_fn)
! init_update_ssa (cfun);
! 
!   gcc_assert (update_ssa_initialized_fn == cfun);
! 
!   new_name = duplicate_ssa_name (old_name, stmt);
!   if (def)
! SET_DEF (def, new_name);
!   else
! gimple_assign_set_lhs (stmt, 

Re: [graphds.h] Allocate graph from obstack

2012-08-20 Thread Paolo Bonzini
Il 19/08/2012 18:55, Richard Guenther ha scritto:
  Initially I had one obstack per struct graph, which was better than using
  XNEW for every edge, but still obstack_init() called from new_graph() was
  too frequent.
 
  So in this iteration of the patch the obstack is static global, initialised
  once and never freed. Please advise on whether this is acceptable, and also
  where I should initialise the obstack once, and avoid checking if it's NULL
  in every use.
 
  Minor speed gains (couple of ms), tested with pre-C++ conversion snapshot,
  I'll retest soon and post update.
 Either an obstack per graph or the ability to specify an obstack for 
 allocation.
 A global obstack with global lifetime is bad IMHO.

Dimitrios's patch has a per-file obstack with per-pass lifetime, which I
think is the right solution---but putting graph_obstack as a static
variable in graphds.h is gly.  You can just move the declaration to
each file separately, and give it a better name perhaps (e.g.
loop_graph_obstack).

Paolo


Re: [graphds.h] Allocate graph from obstack

2012-08-20 Thread Richard Guenther
On Mon, Aug 20, 2012 at 2:03 PM, Paolo Bonzini bonz...@gnu.org wrote:
 Il 19/08/2012 18:55, Richard Guenther ha scritto:
  Initially I had one obstack per struct graph, which was better than using
  XNEW for every edge, but still obstack_init() called from new_graph() was
  too frequent.
 
  So in this iteration of the patch the obstack is static global, 
  initialised
  once and never freed. Please advise on whether this is acceptable, and 
  also
  where I should initialise the obstack once, and avoid checking if it's 
  NULL
  in every use.
 
  Minor speed gains (couple of ms), tested with pre-C++ conversion snapshot,
  I'll retest soon and post update.
 Either an obstack per graph or the ability to specify an obstack for 
 allocation.
 A global obstack with global lifetime is bad IMHO.

 Dimitrios's patch has a per-file obstack with per-pass lifetime, which I
 think is the right solution---but putting graph_obstack as a static
 variable in graphds.h is gly.  You can just move the declaration to
 each file separately, and give it a better name perhaps (e.g.
 loop_graph_obstack).

Well, I see that the way it is constructed is that you can at most have a
single live graph at the same time (using the same obstack).  That's a
big limitation IMHO.  Now, if the users would manage the obstack completely
and instead would obstack_init()/free() them that would be different.  Of course
the outcome is exactly as with the initial patch having the obstack per
struct graph.

The present patch has too much low-level stuff exposed and does not easily
allow putting other data on the same obstack.

So I'd vote for managing the obstack completely in the caller for flexibility.
Some may want to allocate auxiliar data on the same obstack for example.

Richard.


 Paolo


[PATCH][RFC] Move TREE_VEC length and SSA_NAME version into tree_base

2012-08-20 Thread Richard Guenther

This shrinks TREE_VEC from 40 bytes to 32 bytes and SSA_NAME from
80 bytes to 72 bytes on a 64bit host.  Both structures suffer
from the fact they need storage for an integer (length and version)
which leaves unused padding.  Both data structures do not require
as many flag bits as we keep in tree_base though, so they can
conveniently use the upper 4-bytes of the 8-bytes tree_base to
store length / version.

I added a union to tree_base to divide up the space between flags
(possibly) used for all tree kinds and flags that are not used
for those who chose to re-use the upper 4-bytes of tree_base for
something else.

This superseeds the patch that moved the C++ specific usage of
TREE_CHAIN on TREE_VECs to tree_base (same savings, but TREE_VEC
isn't any closer to be based on tree_base only).

Due to re-use of flags from frontends definitive checking for
flag accesses is not always possible (TREE_NOTRHOW for example).
Where appropriate I added TREE_NOT_CHECK (NODE, TREE_VEC) instead,
to catch mis-uses of the C++ frontend.  Changed ARGUMENT_PACK_INCOMPLETE_P
to use TREE_ADDRESSABLE instead of TREE_LANG_FLAG_0 then which
it used on TREE_VECs.

We are very lazy adjusting flag usage documentation :/

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

2012-08-20  Richard Guenther  rguent...@suse.de

cp/
* cp-tree.h (TREE_INDIRECT_USING): Use TREE_LANG_FLAG_0 accessor.
(ATTR_IS_DEPENDENT): Likewise.
(ARGUMENT_PACK_INCOMPLETE_P): Use TREE_ADDRESSABLE instead of
TREE_LANG_FLAG_0 on TREE_VECs.

* tree.h (struct tree_base): Add union to make it possible to
re-use the upper 4 bytes for tree codes that do not need as
many flags as others.  Move visited and default_def_flag to
common bits section in exchange for saturating_flag and
unsigned_flag.  Add SSA name version and tree vec length
fields here.
(struct tree_vec): Remove length field here.
(struct tree_ssa_name): Remove version field here.

Index: trunk/gcc/cp/cp-tree.h
===
*** trunk.orig/gcc/cp/cp-tree.h 2012-08-20 12:47:47.0 +0200
--- trunk/gcc/cp/cp-tree.h  2012-08-20 13:53:05.212969994 +0200
*** struct GTY((variable_size)) lang_decl {
*** 2520,2530 
  
  /* In a TREE_LIST concatenating using directives, indicate indirect
 directives  */
! #define TREE_INDIRECT_USING(NODE) (TREE_LIST_CHECK (NODE)-base.lang_flag_0)
  
  /* In a TREE_LIST in an attribute list, indicates that the attribute
 must be applied at instantiation time.  */
! #define ATTR_IS_DEPENDENT(NODE) (TREE_LIST_CHECK (NODE)-base.lang_flag_0)
  
  extern tree decl_shadowed_for_var_lookup (tree);
  extern void decl_shadowed_for_var_insert (tree, tree);
--- 2520,2530 
  
  /* In a TREE_LIST concatenating using directives, indicate indirect
 directives  */
! #define TREE_INDIRECT_USING(NODE) TREE_LANG_FLAG_0 (TREE_LIST_CHECK (NODE))
  
  /* In a TREE_LIST in an attribute list, indicates that the attribute
 must be applied at instantiation time.  */
! #define ATTR_IS_DEPENDENT(NODE) TREE_LANG_FLAG_0 (TREE_LIST_CHECK (NODE))
  
  extern tree decl_shadowed_for_var_lookup (tree);
  extern void decl_shadowed_for_var_insert (tree, tree);
*** extern void decl_shadowed_for_var_insert
*** 2881,2887 
 arguments will be placed into the beginning of the argument pack,
 but additional arguments might still be deduced.  */
  #define ARGUMENT_PACK_INCOMPLETE_P(NODE)\
!   TREE_LANG_FLAG_0 (ARGUMENT_PACK_ARGS (NODE))
  
  /* When ARGUMENT_PACK_INCOMPLETE_P, stores the explicit template
 arguments used to fill this pack.  */
--- 2881,2887 
 arguments will be placed into the beginning of the argument pack,
 but additional arguments might still be deduced.  */
  #define ARGUMENT_PACK_INCOMPLETE_P(NODE)\
!   TREE_ADDRESSABLE (ARGUMENT_PACK_ARGS (NODE))
  
  /* When ARGUMENT_PACK_INCOMPLETE_P, stores the explicit template
 arguments used to fill this pack.  */
Index: trunk/gcc/tree.h
===
*** trunk.orig/gcc/tree.h   2012-08-20 12:47:47.0 +0200
--- trunk/gcc/tree.h2012-08-20 13:56:12.109963537 +0200
*** struct GTY(()) tree_base {
*** 427,435 
unsigned addressable_flag : 1;
unsigned volatile_flag : 1;
unsigned readonly_flag : 1;
-   unsigned unsigned_flag : 1;
unsigned asm_written_flag: 1;
unsigned nowarning_flag : 1;
  
unsigned used_flag : 1;
unsigned nothrow_flag : 1;
--- 427,435 
unsigned addressable_flag : 1;
unsigned volatile_flag : 1;
unsigned readonly_flag : 1;
unsigned asm_written_flag: 1;
unsigned nowarning_flag : 1;
+   unsigned visited : 1;
  
unsigned used_flag : 1;
unsigned nothrow_flag : 1;
*** struct GTY(()) tree_base {
*** 438,465 

Re: [wwwdocs] SH 4.8 changes update

2012-08-20 Thread Gerald Pfeifer
On Mon, 20 Aug 2012, Oleg Endo wrote:
 You mean, since SH is neither a primary nor a secondary platform,
 there are no particular release criteria for it?  What does that
 actually mean?

From what I recall, you can make stage 1 type changes even after
stage 1.  However, if your port is broken, the release managers
will shrug their shoulders and release even if your port is quite
broken due to that.


Related to your web page update, I just comitted the patch below
which allows for direct linking to the SH entry.

Gerald


Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.16
diff -u -3 -p -r1.16 changes.html
--- changes.html20 Aug 2012 11:02:28 -  1.16
+++ changes.html20 Aug 2012 12:22:52 -
@@ -159,7 +159,7 @@ by this change./p
 options to the assembler./li
   /ul
 
-h3SH/h3
+h3 id=shSH/h3
   ul
 liThe default alignment settings have been reduced to be less aggressive.
 This results in more compact code for optimization levels other than


Re: [wwwdocs] SH 4.8 changes update

2012-08-20 Thread Richard Guenther
On Mon, Aug 20, 2012 at 1:37 AM, Oleg Endo oleg.e...@t-online.de wrote:
 On Sun, 2012-08-19 at 21:43 +0200, Gerald Pfeifer wrote:
 On Sun, 19 Aug 2012, Oleg Endo wrote:
  Thanks.  Let's hope that I can squeeze in some more stuff while
  stage 1 lasts. :T

 You know that for backend-specific changes (especially for smaller
 ports) you actually have some more leeway?

 You mean, since SH is neither a primary nor a secondary platform, there
 are no particular release criteria for it?  What does that actually
 mean?

It means that release managers do not care about the state of the SH port.
It means SH target maintainers have to watch how the release goes and
decide when to put effort into testing and fixing bugs themselves, otherwise
they may run into the situation that the SH port does not even build for
the release.  So kind of target maintainers for ports that are not primary
or secondary are their own release managers without any control over
release timing.

Richard.


Re: [graphds.h] Allocate graph from obstack

2012-08-20 Thread Dimitrios Apostolou

Hi Paolo,

On Mon, 20 Aug 2012, Paolo Bonzini wrote:

Il 19/08/2012 18:55, Richard Guenther ha scritto:

Initially I had one obstack per struct graph, which was better than using
XNEW for every edge, but still obstack_init() called from new_graph() was
too frequent.

So in this iteration of the patch the obstack is static global, initialised
once and never freed. Please advise on whether this is acceptable, and also
where I should initialise the obstack once, and avoid checking if it's NULL
in every use.

Minor speed gains (couple of ms), tested with pre-C++ conversion snapshot,
I'll retest soon and post update.

Either an obstack per graph or the ability to specify an obstack for allocation.
A global obstack with global lifetime is bad IMHO.


Dimitrios's patch has a per-file obstack with per-pass lifetime


Notice that I never call XOBDELETE with NULL argument, I only free the 
first object, which means that the 4KB per obstack overhead is retained 
until program exit, I did that to save on malloc calls.



Thanks,
Dimitris



Re: [bootstrap] Tentative fix for PR 54281

2012-08-20 Thread Diego Novillo

On 2012-08-19 07:18 , Arnaud Charlet wrote:



The conditionals cannot be removed for the time being because the
foreign language interface of the Ada part of the compiler is
hardcoded for C.

Barring a massive switch to the Ada language for the GNU project,
we would need to switch the foreign language interface to C++,
which might introduce various maintenance issues in the short term
(Arno CCed).


Yes, that would be a large hearthquake for both the compiler and the
run-time, which is unwanted at this stage. I guess the solution for
now is to more selectively export the various symbols as C symbols
and include header files as is.


Thanks.

One potential issue I see here is that as we start using more C++ 
headers (and more C++ in existing headers), we will reach a point in 
which including something like system.h inside an extern C {} block 
will fail.


None of the GCC headers should be included as C code anymore.


Diego.



Re: new sign/zero extension elimination pass

2012-08-20 Thread Tom de Vries
On 11/07/12 12:30, Tom de Vries wrote:
 On 13/11/10 10:50, Eric Botcazou wrote:
 I profiled the pass on spec2000:

 -mabi=32 -mabi=64
 ee-pass (usr time): 0.70 1.16
 total   (usr time):   919.30   879.26
 ee-pass(%): 0.08 0.13

 The pass takes 0.13% or less of the total usr runtime.

 For how many hits?  What are the numbers with --param ee-max-propagate=0?

 Is it necessary to improve the runtime of this pass?

 I've already given my opinion about the implementation.  The other passes in 
 the compiler try hard not to rescan everything when a single bit changes; as 
 currently written, yours doesn't.

 
 Eric,
 
 I've done the following:
 - refactored the pass such that it now scans at most twice over all
   instructions.
 - updated the patch to be applicable to current trunk
 - updated the motivating example to a more applicable one (as discussed in
   this thread), and added that one as test-case.
 - added a part in the header comment illustrating the working of the pass
   on the motivating example.
 
 bootstrapped and reg-tested on x86_64 and i686.
 
 build and reg-tested on mips, mips64, and arm.
 
 OK for trunk?
 

Eric,

does the new patch meet your concerns related to rescanning?

If so, OK for trunk?

Thanks,
- Tom


 Thanks,
 - Tom
 
 2012-07-10  Tom de Vries  t...@codesourcery.com
 
   * ee.c: New file.
   * tree-pass.h (pass_ee): Declare.
   * opts.c ( default_options_table): Set flag_ee at -O2.
   * timevar.def (TV_EE): New timevar.
   * common.opt (fextension-elimination): New option.
   * Makefile.in (ee.o): New rule.
   * passes.c (pass_ee): Add it.
 
   * gcc.dg/extend-1.c: New test.
   * gcc.dg/extend-2.c: Same.
   * gcc.dg/extend-2-64.c: Same.
   * gcc.dg/extend-3.c: Same.
   * gcc.dg/extend-4.c: Same.
   * gcc.dg/extend-5.c: Same.
   * gcc.target/mips/octeon-bbit-2.c: Make test more robust.
 



patch for machine dependent rtl section to hide case statements for different types of constants.

2012-08-20 Thread Kenneth Zadeck
This patch started out to be a purely mechanical change to the switch 
statements so that the ones that are used to take apart constants can be 
logically grouped.This is important for the next patch that I will 
submit this week that frees the rtl level from only being able to 
represent large integer constants with two HWIs.


I sent the patch to Richard Sandiford and when the comments came back 
from him, this patch turned into something that actually has real 
semantic changes.   (His comments are enclosed below.)   I did almost 
all of Richard's changes because he is generally right about such 
things, but it does mean that the patch has to be more carefully 
reviewed.   Richard does not count his comments as a review.


The patch has, of course, been properly tested on x86-64.

Any comments?  Ok for commit?

Kenny

Richard's comments:
=
The omission of CONST_FIXED from the cselib_expand_value_rtx_1,
attr_copy_rtx, clear_struct_flag and combine switches looks
unintentional (though only as a missed compiler-speed optimisation).
Same goes for the omission of CONST_VECTOR from check_maybe_invariant.

The omission of CONST_FIXED from dse.c:const_or_frame_p looks like
a missed target-code optimisation.  The function ought to be using
CONSTANT_P instead.

 I did not do what is suggested in the last sentence because
 it changes the behavior of the rtx HIGH.

I don't see any reason to prefer the hashing of CONST_VECTORs in
hash_invariant_expr_1 and invariant_expr_equal_p over the default
cse implementation, so here too I think we can add CONST_VECTOR
to the switch.

cse.c:exp_equiv_p, rtx_equal_for_memref_p, operands_match_p,
rtx_equal_p_cb and rtx_equal_p  are all testing the property
is pointer equality sufficient for this kind of constant.
rtx_renumbered_equal_p is too, so I think the omission of
CONST_FIXED there is again unintentional.

That leaves mark_jump_label_1.  This block is really testing
does this rtx have no operands that might be labels,
which is again true for CONST_FIXED.  TBH, this smacks
of premature optimisation to me.  How do we know that
checking each code here is cheaper than falling through?
I doubt the switch is populated enough to merit a jump table,
and the number of executed branches might well be higher
like this when you take other codes into account.

So in terms of setting the abstraction level, I think there
are two options:

1) Define:

/* Match CONST_*s that can represent compile-time constant integers.  */
#define CASE_CONST_SCALAR_INT \
  case CONST_INT: \
  case CONST_DOUBLE

/* Match CONST_*s for which pointer equality corresponds to value 
equality.  */

#define CASE_CONST_UNIQUE \
  case CONST_INT: \
  case CONST_DOUBLE: \
  case CONST_FIXED

/* Match all CONST_* rtxes.  */
#define CASE_CONST_ANY \
  case CONST_INT: \
  case CONST_DOUBLE: \
  case CONST_FIXED: \
  case CONST_VECTOR

and remove the mark_jump_label_1 cases.

The name CASE_CONST_SCALAR_INT matches SCALAR_INT_MODE_P,
CASE_CONST_UNIQUE is solely for the equality functions above.

2) As for (1), but keep the mark_jump_label_1 cases and
rename CASE_CONST_UNIQUE to CASE_CONST_LEAF so that mark_jump_label_1
can use it.  This implies that all leaf CONST_*s (i.e. those without
rtx operands) will always be hashed to give pointer equality.

I don't like that assumption, and CASE_CONST_UNIQUE feels like a better
abstraction, so I prefer (1) over (2).

For avoidance of doubt, this isn't an approval or anything.
It definitely needs to be submitted upstream.  (Yeah, yeah,
sorry for nagging. :-))

Richard
==

diff -uprN '--exclude=.svn' gccBaseline/gcc/alias.c gccWCase/gcc/alias.c
--- gccBaseline/gcc/alias.c	2012-08-17 09:35:24.794195890 -0400
+++ gccWCase/gcc/alias.c	2012-08-19 09:48:33.666509880 -0400
@@ -1486,9 +1486,7 @@ rtx_equal_for_memref_p (const_rtx x, con
   return XSTR (x, 0) == XSTR (y, 0);
 
 case VALUE:
-case CONST_INT:
-case CONST_DOUBLE:
-case CONST_FIXED:
+CASE_CONST_UNIQUE:
   /* There's no need to compare the contents of CONST_DOUBLEs or
 	 CONST_INTs because pointer equality is a good enough
 	 comparison for these nodes.  */
diff -uprN '--exclude=.svn' gccBaseline/gcc/combine.c gccWCase/gcc/combine.c
--- gccBaseline/gcc/combine.c	2012-08-17 09:35:24.802195795 -0400
+++ gccWCase/gcc/combine.c	2012-08-17 09:36:49.921199621 -0400
@@ -531,12 +531,10 @@ find_single_use_1 (rtx dest, rtx *loc)
 
   switch (code)
 {
-case CONST_INT:
 case CONST:
 case LABEL_REF:
 case SYMBOL_REF:
-case CONST_DOUBLE:
-case CONST_VECTOR:
+CASE_CONST_INTEGER_OR_FLOAT:
 case CLOBBER:
   return 0;
 
@@ -12788,10 +12786,8 @@ mark_used_regs_combine (rtx x)
 {
 case LABEL_REF:
 case SYMBOL_REF:
-case CONST_INT:
 case CONST:
-case CONST_DOUBLE:
-case CONST_VECTOR:
+CASE_CONST_INTEGER_OR_FLOAT:
 case PC:
 case ADDR_VEC:
 case ADDR_DIFF_VEC:
diff -uprN '--exclude=.svn' 

Re: patch for machine dependent rtl section to hide case statements for different types of constants.

2012-08-20 Thread Kenneth Zadeck

I of course meant the machine independent not dependent
On 08/20/2012 09:50 AM, Kenneth Zadeck wrote:
This patch started out to be a purely mechanical change to the switch 
statements so that the ones that are used to take apart constants can 
be logically grouped.This is important for the next patch that I 
will submit this week that frees the rtl level from only being able to 
represent large integer constants with two HWIs.


I sent the patch to Richard Sandiford and when the comments came back 
from him, this patch turned into something that actually has real 
semantic changes.   (His comments are enclosed below.)   I did almost 
all of Richard's changes because he is generally right about such 
things, but it does mean that the patch has to be more carefully 
reviewed.   Richard does not count his comments as a review.


The patch has, of course, been properly tested on x86-64.

Any comments?  Ok for commit?

Kenny




Re: [patch, tree-ssa] PR54295 Incorrect value extension in widening multiply-accumulate

2012-08-20 Thread Tobias Burnus

Hi Richard,

your patch fails here; I get the build failure:

/projects/tob/gcc-git/gcc/gcc/tree-ssa-math-opts.c: In function ‘bool 
is_widening_mult_rhs_p(tree, tree, tree_node**, tree_node**)’:
/projects/tob/gcc-git/gcc/gcc/tree-ssa-math-opts.c:2014:18: error: 
variable ‘rhs_code’ set but not used [-Werror=unused-but-set-variable]

   enum tree_code rhs_code;
  ^

Tobias

On 08/17/2012 07:05 PM, Richard Earnshaw wrote:

--- tree-ssa-math-opts.c(revision 190502)
+++ tree-ssa-math-opts.c(local)



@@ -1982,9 +2019,7 @@ is_widening_mult_rhs_p (tree type, tree
if (is_gimple_assign (stmt))
{
  rhs_code = gimple_assign_rhs_code (stmt);
- if (TREE_CODE (type) == INTEGER_TYPE
- ? !CONVERT_EXPR_CODE_P (rhs_code)
- : rhs_code != FIXED_CONVERT_EXPR)
+ if (! widening_mult_conversion_strippable_p (type, stmt))
rhs1 = rhs;
  else
{





Re: PATCH: PR bootstrap/54209: [4.8 Regression] Failed to build gcc for Android/x86

2012-08-20 Thread H.J. Lu
On Fri, Aug 17, 2012 at 2:42 AM, Chupin, Pavel V
pavel.v.chu...@intel.com wrote:
 Submitted patch here: https://android-review.googlesource.com/#/c/41705

 -- Pavel


link.h has been added to AOSP.  I am closing PR 54209.

Thanks.

-- 
H.J.


Re: [patch, tree-ssa] PR54295 Incorrect value extension in widening multiply-accumulate

2012-08-20 Thread Richard Earnshaw
On 20/08/12 15:01, Tobias Burnus wrote:
 Hi Richard,
 
 your patch fails here; I get the build failure:
 
 /projects/tob/gcc-git/gcc/gcc/tree-ssa-math-opts.c: In function ‘bool 
 is_widening_mult_rhs_p(tree, tree, tree_node**, tree_node**)’:
 /projects/tob/gcc-git/gcc/gcc/tree-ssa-math-opts.c:2014:18: error: 
 variable ‘rhs_code’ set but not used [-Werror=unused-but-set-variable]
 enum tree_code rhs_code;
^
 
 Tobias
 
 On 08/17/2012 07:05 PM, Richard Earnshaw wrote:
 --- tree-ssa-math-opts.c (revision 190502)
 +++ tree-ssa-math-opts.c (local)
 
 @@ -1982,9 +2019,7 @@ is_widening_mult_rhs_p (tree type, tree
 if (is_gimple_assign (stmt))
  {
rhs_code = gimple_assign_rhs_code (stmt);
 -  if (TREE_CODE (type) == INTEGER_TYPE
 -  ? !CONVERT_EXPR_CODE_P (rhs_code)
 -  : rhs_code != FIXED_CONVERT_EXPR)
 +  if (! widening_mult_conversion_strippable_p (type, stmt))
  rhs1 = rhs;
else
  {

 
 


Whoops!  Sorry about that.

Fixed thusly.  Committed as obvious.

PR tree-ssa/54295
* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Delete rhs_code
declaration and setter.

R.

Index: tree-ssa-math-opts.c
===
--- tree-ssa-math-opts.c(revision 190533)
+++ tree-ssa-math-opts.c(working copy)
@@ -2011,14 +2011,12 @@ is_widening_mult_rhs_p (tree type, tree 
 {
   gimple stmt;
   tree type1, rhs1;
-  enum tree_code rhs_code;
 
   if (TREE_CODE (rhs) == SSA_NAME)
 {
   stmt = SSA_NAME_DEF_STMT (rhs);
   if (is_gimple_assign (stmt))
{
- rhs_code = gimple_assign_rhs_code (stmt);
  if (! widening_mult_conversion_strippable_p (type, stmt))
rhs1 = rhs;
  else


[C++ Patch] PR 10416

2012-08-20 Thread Paolo Carlini

Hi,

in this old issue submitter points out that we emit too easily 
Wunused_variable warnings even when the destructor has side effects 
(otoh, the constructor is handled Ok). This is particularly annoying 
together with eg, RAII.


Turns out that lately we are already careful when we handle the new 
Wunused_but_set_variable, thus I'm simply proposing to commonize the 
additional check.


Tested x86_64-linux.

Thanks,
Paolo.

/
/cp
2012-08-20  Paolo Carlini  paolo.carl...@oracle.com

PR c++/10416
* decl.c (poplevel): Check TYPE_HAS_NONTRIVIAL_DESTRUCTOR for
Wunused_variable too.

/testsuite
2012-08-20  Paolo Carlini  paolo.carl...@oracle.com

PR c++/10416
* g++.dg/warn/Wunused-var-17.C: New.
Index: testsuite/g++.dg/warn/Wunused-var-17.C
===
--- testsuite/g++.dg/warn/Wunused-var-17.C  (revision 0)
+++ testsuite/g++.dg/warn/Wunused-var-17.C  (revision 0)
@@ -0,0 +1,4 @@
+// PR c++/10416
+// { dg-options -Wunused }
+
+void f () { struct atend { ~atend () { __builtin_printf(leaving f\n); } } a; 
}
Index: cp/decl.c
===
--- cp/decl.c   (revision 190526)
+++ cp/decl.c   (working copy)
@@ -621,16 +621,16 @@ poplevel (int keep, int reverse, int functionbody)
   if (TREE_CODE (decl) == VAR_DECL
   (! TREE_USED (decl) || !DECL_READ_P (decl))
   ! DECL_IN_SYSTEM_HEADER (decl)
-  DECL_NAME (decl)  ! DECL_ARTIFICIAL (decl))
+  DECL_NAME (decl)  ! DECL_ARTIFICIAL (decl)
+  TREE_TYPE (decl) != error_mark_node
+  (!CLASS_TYPE_P (TREE_TYPE (decl))
+ || !TYPE_HAS_NONTRIVIAL_DESTRUCTOR (TREE_TYPE (decl
{
  if (! TREE_USED (decl))
warning (OPT_Wunused_variable, unused variable %q+D, decl);
  else if (DECL_CONTEXT (decl) == current_function_decl
-   TREE_TYPE (decl) != error_mark_node
TREE_CODE (TREE_TYPE (decl)) != REFERENCE_TYPE
-   errorcount == unused_but_set_errorcount
-   (!CLASS_TYPE_P (TREE_TYPE (decl))
-  || !TYPE_HAS_NONTRIVIAL_DESTRUCTOR (TREE_TYPE (decl
+   errorcount == unused_but_set_errorcount)
{
  warning (OPT_Wunused_but_set_variable,
   variable %q+D set but not used, decl); 


[PATCH] PR53992 - openmp lower transaction code

2012-08-20 Thread Patrick Marlier

In this PR, OMP lowering is not going into the transaction code.
So if GIMPLE_TRANSACTION is found, we lower its body.
(Patch also fixes a format issue.)

Note that PR53992 component in Bugzilla must be change from c to libgomp 
(I don't have bugzilla account with admin rights, who should I ask for 
that?).


Tested on trunk / i686.

Ok for trunk? Ok to backport to 4.7 branch if no regression?
Thanks.

gcc/
2012-08-17  Patrick Marlier  patrick.marl...@gmail.com

PR libgomp/53992
* omp-low.c (lower_omp_1): Handle GIMPLE_TRANSACTION.
Index: omp-low.c
===
--- omp-low.c	(revision 190488)
+++ omp-low.c	(working copy)
@@ -6827,6 +6827,9 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_cont
   lower_omp (gimple_try_eval_ptr (stmt), ctx);
   lower_omp (gimple_try_cleanup_ptr (stmt), ctx);
   break;
+case GIMPLE_TRANSACTION:
+  lower_omp (gimple_transaction_body_ptr (stmt), ctx);
+  break;
 case GIMPLE_BIND:
   lower_omp (gimple_bind_body_ptr (stmt), ctx);
   break;
@@ -7108,24 +7111,24 @@ diagnose_sb_2 (gimple_stmt_iterator *gsi_p, bool *
   break;
 
 case GIMPLE_COND:
-	{
-	  tree lab = gimple_cond_true_label (stmt);
-	  if (lab)
-	{
-	  n = splay_tree_lookup (all_labels,
- (splay_tree_key) lab);
-	  diagnose_sb_0 (gsi_p, context,
-			 n ? (gimple) n-value : NULL);
-	}
-	  lab = gimple_cond_false_label (stmt);
-	  if (lab)
-	{
-	  n = splay_tree_lookup (all_labels,
- (splay_tree_key) lab);
-	  diagnose_sb_0 (gsi_p, context,
-			 n ? (gimple) n-value : NULL);
-	}
-	}
+  {
+	tree lab = gimple_cond_true_label (stmt);
+	if (lab)
+	  {
+	n = splay_tree_lookup (all_labels,
+   (splay_tree_key) lab);
+	diagnose_sb_0 (gsi_p, context,
+			   n ? (gimple) n-value : NULL);
+	  }
+	lab = gimple_cond_false_label (stmt);
+	if (lab)
+	  {
+	n = splay_tree_lookup (all_labels,
+   (splay_tree_key) lab);
+	diagnose_sb_0 (gsi_p, context,
+			   n ? (gimple) n-value : NULL);
+	  }
+  }
   break;
 
 case GIMPLE_GOTO:
Index: testsuite/gcc.dg/gomp/pr53992.c
===
--- testsuite/gcc.dg/gomp/pr53992.c	(revision 0)
+++ testsuite/gcc.dg/gomp/pr53992.c	(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options -fgnu-tm -fopenmp } */
+/* { dg-require-effective-target fgnu_tm } */
+
+int main() {
+long data[1];
+long i, min=1;
+for (i=0; i1; i++) data[i] = -i;
+
+#pragma omp parallel for
+for (i=0; i1; i++) {
+__transaction_atomic
+{
+if (data[i]  min)
+min = data[i];
+}
+}
+
+return !(min == -);
+}


Fix -ftime-report for C++ lookup

2012-08-20 Thread Diego Novillo

Found this while running -ftime-report on a largish C++ source file.
We need to start TV_NAME_LOOKUP conditionally inside poplevel()
because it may be called from another lookup routine that already has
TV_NAME_LOOKUP going.

Tested on x86_64.  Committed to trunk.

2012-08-20  Diego Novillo  dnovi...@google.com

* decl.c (poplevel): Start TV_NAME_LOOKUP conditionally.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 5908996..0dad597 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -552,7 +552,7 @@ poplevel (int keep, int reverse, int functionbody)
   unsigned ix;
   cp_label_binding *label_bind;

-  timevar_start (TV_NAME_LOOKUP);
+  bool subtime = timevar_cond_start (TV_NAME_LOOKUP);
  restart:

   block = NULL_TREE;
@@ -818,7 +818,7 @@ poplevel (int keep, int reverse, int functionbody)
   if (kind == sk_cleanup)
 goto restart;

-  timevar_stop (TV_NAME_LOOKUP);
+  timevar_cond_stop (TV_NAME_LOOKUP, subtime);
   return block;
 }


Re: Reproducible gcc builds, gfortran, and -grecord-gcc-switches

2012-08-20 Thread Simon Baldwin
On 17 August 2012 16:55, Joseph S. Myers jos...@codesourcery.com wrote:

 On Fri, 17 Aug 2012, Simon Baldwin wrote:

   You could have just added
 case OPT_cpp_:
   to the switch in gen_producer_string, instead of all this.
 
  Thanks.  I was under the impression, apparently mistaken, that
  OPT_cpp_ exists only if fortran is enabled.

 OPT_* for Fortran options only exist when the Fortran front-end is in the
 source tree (whether or not enabled).  I think we try to avoid knowingly
 breaking use cases where people remove some front ends from the source
 tree, although we don't actively test them and no longer provide split-up
 source tarballs.

Thanks for the update.  Which fix should move forwards?

--
Google UK Limited | Registered Office: Belgrave House, 76 Buckingham
Palace Road, London SW1W 9TQ | Registered in England Number: 3977902


Re: [PATCH] PR53992 - openmp lower transaction code

2012-08-20 Thread Richard Henderson
On 08/20/2012 07:20 AM, Patrick Marlier wrote:
 2012-08-17  Patrick Marlier  patrick.marl...@gmail.com
 
 PR libgomp/53992
 * omp-low.c (lower_omp_1): Handle GIMPLE_TRANSACTION.

Ok everywhere.

The Bugzilla must not change to libgomp, as that is
reserved for the runtime library.


r~


Re: [PATCH] PR53992 - openmp lower transaction code

2012-08-20 Thread Jakub Jelinek
On Mon, Aug 20, 2012 at 10:20:33AM -0400, Patrick Marlier wrote:
 Ok for trunk? Ok to backport to 4.7 branch if no regression?

Ok for both, with the following nits resolved:

 gcc/
 2012-08-17  Patrick Marlier  patrick.marl...@gmail.com
 
   PR libgomp/53992

Use PR middle-end/53992 instead, libgomp is for library issues only.

   * omp-low.c (lower_omp_1): Handle GIMPLE_TRANSACTION.

 @@ -7108,24 +7111,24 @@ diagnose_sb_2 (gimple_stmt_iterator *gsi_p, bool *
break;
  
  case GIMPLE_COND:
 - {
 -   tree lab = gimple_cond_true_label (stmt);
 -   if (lab)
 - {
 -   n = splay_tree_lookup (all_labels,
 -  (splay_tree_key) lab);
 -   diagnose_sb_0 (gsi_p, context,
 -  n ? (gimple) n-value : NULL);
 - }
 -   lab = gimple_cond_false_label (stmt);
 -   if (lab)
 - {
 -   n = splay_tree_lookup (all_labels,
 -  (splay_tree_key) lab);
 -   diagnose_sb_0 (gsi_p, context,
 -  n ? (gimple) n-value : NULL);
 - }
 - }
 +  {
 + tree lab = gimple_cond_true_label (stmt);
 + if (lab)
 +   {
 + n = splay_tree_lookup (all_labels,
 +(splay_tree_key) lab);
 + diagnose_sb_0 (gsi_p, context,
 +n ? (gimple) n-value : NULL);
 +   }
 + lab = gimple_cond_false_label (stmt);
 + if (lab)
 +   {
 + n = splay_tree_lookup (all_labels,
 +(splay_tree_key) lab);
 + diagnose_sb_0 (gsi_p, context,
 +n ? (gimple) n-value : NULL);
 +   }
 +  }
break;
  
  case GIMPLE_GOTO:

Please leave this hunk out.  Formatting can be normally fixed only
if you touch the code in question or at least lines around it, not
in an unrelated patch that touches completely different function.

 --- testsuite/gcc.dg/gomp/pr53992.c   (revision 0)
 +++ testsuite/gcc.dg/gomp/pr53992.c   (working copy)
 @@ -0,0 +1,20 @@

Please add
/* PR middle-end/53992 */
line to the beginning of the file.

 +/* { dg-do compile } */
 +/* { dg-options -fgnu-tm -fopenmp } */
 +/* { dg-require-effective-target fgnu_tm } */
 +
 +int main() {
 +long data[1];
 +long i, min=1;
 +for (i=0; i1; i++) data[i] = -i;
 +
 +#pragma omp parallel for
 +for (i=0; i1; i++) {
 +__transaction_atomic
 +{
 +if (data[i]  min)
 +min = data[i];
 +}
 +}
 +
 +return !(min == -);
 +}

Jakub


Re: Reproducible gcc builds, gfortran, and -grecord-gcc-switches

2012-08-20 Thread Joseph S. Myers
On Mon, 20 Aug 2012, Simon Baldwin wrote:

  OPT_* for Fortran options only exist when the Fortran front-end is in the
  source tree (whether or not enabled).  I think we try to avoid knowingly
  breaking use cases where people remove some front ends from the source
  tree, although we don't actively test them and no longer provide split-up
  source tarballs.
 
 Thanks for the update.  Which fix should move forwards?

I think the approach using a new option flag is the way to go, though the 
patch needs (at least) documentation for the new flag in options.texi.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Teresa Johnson
On Mon, Aug 20, 2012 at 2:48 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Well, it should store the largest working set in BBs, or one that came
 from a much longer run. But I see the issue you are pointing out. The
 num_counters (the number of hot bbs) should be reasonable, as the
 total number of BBs is the same between all runs, and I want to show
 the largest or more dominant working set in terms of the number of hot
 bbs. But the min_bb_counter will become more and more inaccurate as
 the number of runs increases, since the counter values are
 accumulated.

 Yes and that is the one that we plan to use to determine hot/cold decisions 
 on,
 right?

Yes, so I agree it is important to do something to update this as
profiles are merged.


 Note that there is no really 1-1 corespondence in betwen BBs and counters.
 For each function the there should be num_edges-num_bbs+1 counters.
 What do you plan to use BB counts for?

True, although what I am using this for is to get an idea of the size
of the working set to help identify large icache footprints in order
to control code size increasing optimizations such as unrolling. The
idea is that a huge number of hot counters indicates a huge number of
hot bbs which in turn indicates a huge number of hot instructions and
therefore high icache pressure. We're using it in the google branch to
control unrolling successfully.


 I typed up a detailed email below on why getting this right would be
 difficult, but then I realized there might be a fairly simple accurate
 solution, which I'll describe first:

 The only way I see to do this completely accurately is to take two
 passes through the existing gcda files that we are merging into, one
 to read in all the counter values and merge them into all the counter
 values in the arrays from the current run (after which we can do the
 histogramming/working set computation accurately from the merged
 counters), and the second to rewrite them. In libgcov this doesn't
 seem like it would be too difficult to do, although it is a little bit
 of restructuring of the main merging loop and needs some special
 handling for buffered functions (which could increase the memory
 footprint a bit if there are many of these since they will all need to
 be buffered across the iteration over all the gcda files).

 The summary merging in coverage.c confuses me a bit as it seems to be
 handling the case when there are multiple program summaries in a
 single gcda file. When would this happen? It looks like the merge
 handling in libgcov should always produce a single program summary per
 gcda file.


 This is something Nathan introduced years back. The idea was IMO to handle
 more acurately objects linked into multiple binaries. I am not sure
 if the code really works or worked on some point.

 The idea, as I recall it, was to produce overall checksum of all objects and
 have different profile records for each combination.

 This is not really useful for profile feedback as you generate single object
 file for all uses, but it might become useful for LTO where you know into 
 which
 binary you are linking to. I am not really sure it is worth all the 
 infrastructure
 needed to support this though.

Ok, so perhaps for the merging in coverage.c we can do something less
accurate, and worry more about the libgcov merging.


 
 
  Why you don't simply write the histogram into gcov file and don't merge 
  the values
  here (i.e. doing the cummulation loop in GCC instead of within libgcov)?

 That doesn't completely solve the problem, unfortunately. The reason
 is that you don't know which histogram entry contains a particular
 block each run, and therefore you don't know how to compute the new
 combined histogram value and index for that bb counter. For example, a
 particular histogram index may have 5 counters (bbs) in it for one run
 and 2 counters (bbs) in it for a second run, so the question is how to
 compute the new entry for all of those bb counters, as the 5 bbs from
 the first run may or may not be a superset of the 2 from the second
 run. You could assume that the bbs have the same relative order of
 hotness between runs, and combine the bb counters accordingly, but
 there is still some trickiness because you have to apportion the
 cumulative counter sum stored in the histogram entry between new
 entries. For example, assume the highest indexed non-zero entries (the
 histogram buckets containing the largest counter values) in the two
 incoming histograms are:

 histogram 1:

 index 100: 4 counters, cumulative value X, min counter value minx
 ...

 histogram 2:

 index 100: 2 counters, cumulative value Y, min counter value miny
 index 99: 3 counters, cumulative value W, min counter value minw
 ...

 To merge, you could assume something like 2 counters with a new
 cumulative value (Y + X*2/4), and new min counter value minx+miny,
 that go into the merged histogram with the index corresponding to
 counter value minx+miny. And then 2 counters have 

[DF] RFC: obstacks in DF

2012-08-20 Thread Dimitrios Apostolou

Hi,

while I was happy using obstacks in other parts of the compiler I thought 
they would provide a handy solution for the XNEWVECs/XRESIZEVECs in 
df-scan.c, especially df_install_refs() which is the heaviest malloc() 
user after the rest of my patches.


In the process I realised that obstacks weren't exactly suitable (thanks 
matz for helping on IRC), nevertheless I have a patch which is stable, a 
bit faster and more memory friendly (don't know why, but RSS memory for 
common non-pathological compilations, was smaller). However after trying 
various incorrect changes I'm aware that there are leaks in there that 
can't be closed without dirty hacks, so I gave up.


I'm posting the simplest but stable version of my changes and would really 
like to hear ideas. There are holes in the obstack that should have been 
free'd, but in the end I didn't see memory increase. I don't know what 
would happen for huge functions that keep the obstack alive for long, I 
didn't have such testcase. Finally, I think my next try will involve 
converting the chains to pool_alloc'ed linked list, do you think that 
makes sense?



Thanks,
Dimitris
=== modified file 'gcc/df-scan.c'
--- gcc/df-scan.c   2012-07-16 11:32:42 +
+++ gcc/df-scan.c   2012-08-20 14:01:59 +
@@ -184,6 +184,17 @@ struct df_scan_problem_data
   bitmap_obstack insn_bitmaps;
 };
 
+/* Obstack for storing all of all of
+   insn_info-{defs,uses,eq_uses,mw_hardregs} and
+   bb_info-artificial_{defs,uses}.  */
+static struct obstack df_refs_obstack;
+
+/* Keep the obstack initialised (only 4k overhead) and use this pointer to
+   actually empty it fast.  */
+static void *df_first_refs_obj;
+static int df_refs_obstack_is_init;
+
+
 typedef struct df_scan_bb_info *df_scan_bb_info_t;
 
 
@@ -193,36 +204,13 @@ df_scan_free_internal (void)
 {
   struct df_scan_problem_data *problem_data
 = (struct df_scan_problem_data *) df_scan-problem_data;
-  unsigned int i;
-  basic_block bb;
 
-  /* The vectors that hold the refs are not pool allocated because
- they come in many sizes.  This makes them impossible to delete
- all at once.  */
-  for (i = 0; i  DF_INSN_SIZE(); i++)
-{
-  struct df_insn_info *insn_info = DF_INSN_UID_GET(i);
-  /* Skip the insns that have no insn_info or have been
-deleted.  */
-  if (insn_info)
-   {
- df_scan_free_ref_vec (insn_info-defs);
- df_scan_free_ref_vec (insn_info-uses);
- df_scan_free_ref_vec (insn_info-eq_uses);
- df_scan_free_mws_vec (insn_info-mw_hardregs);
-   }
-}
-
-  FOR_ALL_BB (bb)
-{
-  unsigned int bb_index = bb-index;
-  struct df_scan_bb_info *bb_info = df_scan_get_bb_info (bb_index);
-  if (bb_info)
-   {
- df_scan_free_ref_vec (bb_info-artificial_defs);
- df_scan_free_ref_vec (bb_info-artificial_uses);
-   }
-}
+  /* Delete at once all vectors that hold the refs (all of
+ insn_info-{defs,uses,eq_uses,mw_hardregs} and
+ bb_info-artificial_{defs,uses}) but keep the obstack initialised, so
+ that it's ready for more BBs.  */
+  obstack_free (df_refs_obstack, df_first_refs_obj);
+  df_first_refs_obj = NULL;
 
   free (df-def_info.refs);
   free (df-def_info.begin);
@@ -364,6 +352,14 @@ df_scan_alloc (bitmap all_blocks ATTRIBU
   bb_info-artificial_uses = NULL;
 }
 
+  if (! df_refs_obstack_is_init)
+{
+  obstack_init (df_refs_obstack);
+  df_refs_obstack_is_init = 1;
+}
+  gcc_assert (df_first_refs_obj == NULL);
+  df_first_refs_obj = XOBNEW (df_refs_obstack, void *);
+
   bitmap_initialize (df-hardware_regs_used, problem_data-reg_bitmaps);
   bitmap_initialize (df-regular_block_artificial_uses, 
problem_data-reg_bitmaps);
   bitmap_initialize (df-eh_block_artificial_uses, 
problem_data-reg_bitmaps);
@@ -791,9 +787,15 @@ df_install_ref_incremental (df_ref ref)
 }
 
   ref_rec = *ref_rec_ptr;
+  /* fprintf (stderr, count:%d ref_rec:%p\n, count, ref_rec); */
   if (count)
 {
-  ref_rec = XRESIZEVEC (df_ref, ref_rec, count+2);
+  /* Impossible to actually know if ref_rec was malloc'ed or obstack'ed
+thus we might have a leak here!  */
+  df_ref *ref_rec2 = XOBNEWVEC (df_refs_obstack, df_ref, count+2);
+  memcpy (ref_rec2, ref_rec, count*sizeof(df_ref));
+  /* free (ref_rec); */
+  ref_rec = ref_rec2;
   *ref_rec_ptr = ref_rec;
   ref_rec[count] = ref;
   ref_rec[count+1] = NULL;
@@ -801,7 +803,7 @@ df_install_ref_incremental (df_ref ref)
 }
   else
 {
-  df_ref *ref_rec = XNEWVEC (df_ref, 2);
+  ref_rec = XOBNEWVEC (df_refs_obstack, df_ref, 2);
   ref_rec[0] = ref;
   ref_rec[1] = NULL;
   *ref_rec_ptr = ref_rec;
@@ -1070,8 +1072,9 @@ df_ref_chain_delete (df_ref *ref_rec)
 
   /* If the list is empty, it has a special shared element that is not
  to be deleted.  */
-  if (*start)
-free (start);
+  /* if (*start) */
+  /*   free (start); */
+  /* obstack_free'd 

Re: C++ PR 54197: lifetime of reference not properly extended

2012-08-20 Thread Ollie Wild
On Thu, Aug 16, 2012 at 2:13 PM, Gabriel Dos Reis
g...@integrable-solutions.net wrote:

 On Wed, Aug 15, 2012 at 9:52 AM, Ollie Wild a...@google.com wrote:
  (Adding other C++ maintainers in case someone else wants to have a
  stab.)
 
  Ping?

 I consider Jason to be the expert on this; so let see what he says.

 -- Gaby

Jason, any idea when you can look at this?

The patch is about as short as they come, so it shouldn't take long to review.

Thanks,
Ollie


Re: [DF] RFC: obstacks in DF

2012-08-20 Thread Steven Bosscher
On Mon, Aug 20, 2012 at 4:54 PM, Dimitrios Apostolou ji...@gmx.net wrote:
 In the process I realised that obstacks weren't exactly suitable (thanks
 matz for helping on IRC),

Right, alloc-pool would be a better choice here.

Ciao!
Steven


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Steven Bosscher
On Mon, Aug 20, 2012 at 11:48 AM, Jan Hubicka hubi...@ucw.cz wrote:
 The summary merging in coverage.c confuses me a bit as it seems to be
 handling the case when there are multiple program summaries in a
 single gcda file. When would this happen? It looks like the merge
 handling in libgcov should always produce a single program summary per
 gcda file.

AFAIU it merges existing profile data with new profile data from a
second (or third, or ...) trial run.

Ciao!
Steven


Re: patch for machine dependent rtl section to hide case statements for different types of constants.

2012-08-20 Thread Richard Sandiford
Kenneth Zadeck zad...@naturalbridge.com writes:
 The omission of CONST_FIXED from the cselib_expand_value_rtx_1,
 attr_copy_rtx, clear_struct_flag and combine switches looks
 unintentional (though only as a missed compiler-speed optimisation).
 Same goes for the omission of CONST_VECTOR from check_maybe_invariant.

 The omission of CONST_FIXED from dse.c:const_or_frame_p looks like
 a missed target-code optimisation.  The function ought to be using
 CONSTANT_P instead.

  I did not do what is suggested in the last sentence because
  it changes the behavior of the rtx HIGH.

As mentioned privately, that's what we want.

 1) Define:

 /* Match CONST_*s that can represent compile-time constant integers.  */
 #define CASE_CONST_SCALAR_INT \
case CONST_INT: \
case CONST_DOUBLE

 /* Match CONST_*s for which pointer equality corresponds to value 
 equality.  */
 #define CASE_CONST_UNIQUE \
case CONST_INT: \
case CONST_DOUBLE: \
case CONST_FIXED

 /* Match all CONST_* rtxes.  */
 #define CASE_CONST_ANY \
case CONST_INT: \
case CONST_DOUBLE: \
case CONST_FIXED: \
case CONST_VECTOR

 and remove the mark_jump_label_1 cases.

I meant that these three should be the _only_ cases we need.
The reason I listed all those missed cases was that, with the
exception of mark_jump_label_1, the switches really seemed to be
testing one of the three conditions above.

Richard


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Xinliang David Li

 So I definitely preffer 2 or 3 over 1. David has experience with 3. How well 
 does
 it work for LIPO?


This (lack of locking, races) is not a new problem. There is no
synchronization in libgcov for profile update/merge at both thread and
process level. Thread level data races leads to inconsistent counters,
but can be mostly corrected under -fprofile-correction and smoothing.
There is also problem with false indirect call targets --- gcc has
mechanism to filter those out.

Process level synchronization problems can happen when two processes
(running the instrumented binary) exit at the same time. The
updated/merged counters from one process may be overwritten by another
process -- this is true for both counter data and summary data.
Solution 3) does not introduce any new problems.

thanks,

David



 Honza

 Thanks,
 Teresa

 
  Honza




 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


failed attempt: retain identifier length from frontend to backend

2012-08-20 Thread Dimitrios Apostolou

Hello,

my last attempt on improving something serious was about three weeks ago, 
trying to keep all lengths of all strings parsed in the frontend for the 
whole compilation phase until the assembly output. I was hoping that would 
help on using faster hashes (knowing the length allows us to hash 4 or 8 
bytes per iteration), quicker strcmp in various places, and using less 
strlen() calls, which show especially on -g3 compilations that store huge 
macro strings.


I'll post no patch here, since what I currently have is a mess in 3 
different branches and most don't even compile. I tried various 
approaches. First I tried adding an extra length parameter in all relevant 
functions, starting from the assembly generation and working my way 
upwards. This got too complex, and I'd really like to ask if you find any 
meaning in changing target specific hooks and macros to actually accept 
length as argument (e.g. ASM_OUTPUT_*) or return it (e.g. ASM_GENERATE_*). 
Changes seemed too intrusive for me to continue.


But seeing that identifier length is there inside struct ht_identifier (or 
cpp_hashnode) and not lost, I tried the approach of having the length at 
str[-4] for all identifiers. To achieve this I changed ht_identifier to 
store str with the flexible array hack. Unfortunately I hadn't noticed 
that ht_identifier was a part of tree_node and also part of too many other 
structs, so changing all those structs to have variable size was not 
without side effects. In the end it compiled but I got crashes all over, 
and I'm sure I didn't do things right since I broke things like the static 
assert in libcpp/identifiers.c, that I don't even understand:


 /* We don't need a proxy since the hash table's identifier comes first
in cpp_hashnode.  However, in case this is ever changed, we have a
static assertion for it.  */
-extern char proxy_assertion_broken[offsetof (struct cpp_hashnode, ident) == 0 
? 1 : -1];

Anyway last attempt was to decouple ht_identifier completely from trees 
and other structs by storing pointer to it, but I was pretty worn out and 
quickly gave up after getting errors on gengtype-generated files that I 
didn't even know how to handle.


Was all this project too ambitious? I'd appreciate all input.


Thanks,
Dimitris



Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Xinliang David Li
 If this approach seems like it is feasible, then we could stick with
 the current approach of emitting the working set array in the summary,
 mitigating it somewhat by doing the sum_all based scaling of the
 counter values, then in a follow on patch restructure the merging code
 to delay the working set computation as described above.

+1

David

 Teresa


 Honza

 Thanks,
 Teresa

 
  Honza




 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



 --
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[PATCH][LIBTOOL] AIX PIC compiler options

2012-08-20 Thread David Edelsohn
The GCC -fpic/-fPIC option has evolved to mean code generation for a
shared library and changes the optimization behavior of the compiler.
Code for AIX PowerPC always is PIC, but the optimization behavior is
affecting AIX.

libtool exports all global symbols on AIX while GCC binds_local_p()
determines that some function calls are local, causing GCC to emit the
local (non-external) form of function calls, which generates AIX
linker warnings.

The IBM XL compiler defaults to the equivalent of -fpic.  GCC could
try to match that, but it requires working around other GCC bootstrap
problems and continually chasing compatibility.  The other option is
to assume that developers using GCC to build shared libraries either
are using libtool or copying FOSS Makefiles designed for GNU/Linux or
expecting GNU/Linux compatibility.

The following patch starts to implement the latter by adding -fPIC
to the compiler command line options when building objects for a
shared library.  Because this places control in the hands of the user
with a command line option and matches GNU/Linux, this seems better
than playing with defaults and fighting GCC semantics for building
shared libraries.

I also will post this patch upstream to Libtool.

Comments?

Thanks, David

* libtool.m4 (_LT_COMPILER_PIC): Add -fPIC to GCC and GXX for AIX.
* configure: Regenerate.

Index: libtool.m4
===
--- libtool.m4  (revision 190521)
+++ libtool.m4  (working copy)
@@ -3580,6 +3580,7 @@
# AIX 5 now supports IA64 processor
_LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   fi
+  _LT_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC'
   ;;

 amigaos*)
@@ -3891,6 +3892,7 @@
# AIX 5 now supports IA64 processor
_LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   fi
+  _LT_TAGVAR(lt_prog_compiler_pic, $1)='-fPIC'
   ;;

 amigaos*)


Re: [C++ Patch] PR 10416

2012-08-20 Thread Jason Merrill

OK.

Jason


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Teresa Johnson
On Mon, Aug 20, 2012 at 8:35 AM, Steven Bosscher stevenb@gmail.com wrote:
 On Mon, Aug 20, 2012 at 11:48 AM, Jan Hubicka hubi...@ucw.cz wrote:
 The summary merging in coverage.c confuses me a bit as it seems to be
 handling the case when there are multiple program summaries in a
 single gcda file. When would this happen? It looks like the merge
 handling in libgcov should always produce a single program summary per
 gcda file.

 AFAIU it merges existing profile data with new profile data from a
 second (or third, or ...) trial run.

It looks like libgcov will always merge program summaries between
different runs though. As far as I can tell, it will always either
rewrite the whole gcda file with a single merged program summary, or
abort the write if the merge was not successful. However, the comment
above gcov_exit does talk about keeping program summaries separate for
object files contained in different programs, which is what Honza was
describing as the motivation for the coverage.c merging. But I don't
see where multiple program summaries ever get written to the same gcda
file - if the checksums are different it seems to abort the write. But
the code in coverage.c is dealing with a single gcda file containing
multiple program summaries. Is there code somewhere that will cause
this to happen?

Thanks,
Teresa


 Ciao!
 Steven



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Steven Bosscher
On Mon, Aug 20, 2012 at 7:44 PM, Teresa Johnson tejohn...@google.com wrote:
 But
 the code in coverage.c is dealing with a single gcda file containing
 multiple program summaries. Is there code somewhere that will cause
 this to happen?

Not that I know of, no.
(There are enhancement requests for this, see e.g. PR47618).

Ciao!
Steven


Re: [Patch, Fortran] PR54301 - add warning for pointer might outlive its target

2012-08-20 Thread Thomas Koenig

Hi Tobias,

do you think that -Wtarget-lifetime should be included with -Wall?
I think so, because the code flagged is certainly invalid, and likely
to cause random errors.

Thomas


Re: [wwwdocs] Document Runtime CPU detection builtins

2012-08-20 Thread Sriraman Tallam
Ping.



On Tue, Aug 14, 2012 at 11:02 AM, Sriraman Tallam tmsri...@google.com wrote:
 +ger...@pfiefer.com

 On Tue, Aug 14, 2012 at 10:51 AM, Sriraman Tallam tmsri...@google.com wrote:
 Hi Gerald,

Is this release note alright?

 Thanks,
 -Sri.

 On Fri, Aug 10, 2012 at 7:20 PM, Sriraman Tallam tmsri...@google.com wrote:
 Hi,

I have added a release note for x86 builtins __builtin_cpu_is and
 __builtin_cpu_supports. They were checked in to trunk in rev. 186789.
 Is this ok to submit?

 Thanks,
 -Sri.


Re: [wwwdocs] Document Runtime CPU detection builtins

2012-08-20 Thread Diego Novillo
On Fri, Aug 10, 2012 at 10:20 PM, Sriraman Tallam tmsri...@google.com wrote:
 Hi,

I have added a release note for x86 builtins __builtin_cpu_is and
 __builtin_cpu_supports. They were checked in to trunk in rev. 186789.
 Is this ok to submit?

I would not include such detailed documentation in changes.html.  I
assume that all this documentation, including caveats and limitations
is documented in the manual itself?  If that's the case, then simply
mention the builtins, an overview description of about a paragraph and
pointers to the user documentation for limitations and caveats.


Diego.


Re: [Google 4.7] Generate pubnames compatible with gdb-index version 7. (issue6459099)

2012-08-20 Thread Sterling Augustine
On Thu, Aug 16, 2012 at 5:32 PM, Cary Coutant ccout...@google.com wrote:
 +/* Output a single entry in the pubnames table.  */
 +
 +static void
 +output_pubname (dw_offset die_offset, pubname_entry *entry)

 For this function, I'd suggest a comment to the effect that the logic
 is lifted from GDB.

 @@ -2424,6 +2424,10 @@ gpubnames
  Common RejectNegative Var(debug_generate_pub_sections, 1)
  Generate DWARF pubnames and pubtypes sections.

 +ggnu-pubnames
 +Common RejectNegative Var(debug_generate_pub_sections, 2)
 +Generate DWARF pubnames and pubtypes sections.

 Instead of RejectNegative, I think these three options should now
 use Negative(...) flags (each one naming the next, circularly). Not
 sure about that, though. (See the treatment of -gdwarf-, -gstabs,
 etc.)

 OK for google/gcc-4_7 branch.

Committed with your recommended changes, as attached.

Sterling


patch.diff
Description: Binary data


Re: [PATCH] Fix -fcompare-debug failure in fwprop (PR rtl-optimization/54294)

2012-08-20 Thread Richard Sandiford
Jakub Jelinek ja...@redhat.com writes:
 2012-08-17  Jakub Jelinek  ja...@redhat.com

   PR rtl-optimization/54294
   * fwprop.c (all_uses_available_at): Ignore debug insns in between
   def_insn and target_insn when checking whether the shortcut is
   possible.

OK, thanks.

Richard


Re: [Patch, Fortran] PR54301 - add warning for pointer might outlive its target

2012-08-20 Thread Tobias Burnus

Hi Thomas,

Thomas Koenig wrote:

do you think that -Wtarget-lifetime should be included with -Wall?
I think so, because the code flagged is certainly invalid, and likely
to cause random errors.


I concur. However, that's what the current version in the trunk does: 
-Wall implies -Wtarget-lifetime.


On the other hand, I just realized that attr.result is not set if the 
function symbols is also its result symbol – hence, there is no warning for:


function f()
  integer, pointer :: f
  integer, target :: t
  f = t
end

I think the following patch will fix that. I will package, regtest and 
commit it later.


Tobias

--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -3673,6 +3673,7 @@ gfc_check_pointer_assign (gfc_expr *lvalue, 
gfc_expr *rvalue)


   warn = lvalue-symtree-n.sym-attr.dummy
 || lvalue-symtree-n.sym-attr.result
+|| lvalue-symtree-n.sym-attr.function
 || lvalue-symtree-n.sym-attr.host_assoc
 || lvalue-symtree-n.sym-attr.use_assoc
 || lvalue-symtree-n.sym-attr.in_common;


[SPARC] Define MAX_FIXED_MODE_SIZE

2012-08-20 Thread Eric Botcazou
SPARC is now one of the last mainstream 64-bit platforms that do not define 
MAX_FIXED_MODE_SIZE to TImode.  Doing so helps the Ada compiler in particular 
because it is a heavy user of structures made up of a pair of pointers.

Bootstrapped/regtested/compat-regtested on SPARC64/Solaris, applied on the 
mainline.


2012-08-20  Eric Botcazou  ebotca...@adacore.com

* config/sparc/sparc.h (MAX_FIXED_MODE_SIZE): Define.


-- 
Eric Botcazou
Index: config/sparc/sparc.h
===
--- config/sparc/sparc.h	(revision 190512)
+++ config/sparc/sparc.h	(working copy)
@@ -475,7 +475,6 @@ extern enum cmodel sparc_cmodel;
 #endif
 
 /* Now define the sizes of the C data types.  */
-
 #define SHORT_TYPE_SIZE		16
 #define INT_TYPE_SIZE		32
 #define LONG_TYPE_SIZE		(TARGET_ARCH64 ? 64 : 32)
@@ -512,7 +511,6 @@ extern enum cmodel sparc_cmodel;
 #define SPARC_STACK_BOUNDARY_HACK (TARGET_ARCH64  TARGET_STACK_BIAS)
 
 /* ALIGN FRAMES on double word boundaries */
-
 #define SPARC_STACK_ALIGN(LOC) \
   (TARGET_ARCH64 ? (((LOC)+15)  ~15) : (((LOC)+7)  ~7))
 
@@ -551,6 +549,10 @@ extern enum cmodel sparc_cmodel;
  : MAX ((COMPUTED), (SPECIFIED)))			\
:  MAX ((COMPUTED), (SPECIFIED)))
 
+/* An integer expression for the size in bits of the largest integer machine
+   mode that should actually be used.  We allow pairs of registers.  */
+#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TARGET_ARCH64 ? TImode : DImode)
+
 /* We need 2 words, so we can save the stack pointer and the return register
of the function containing a non-local goto target.  */
 #define STACK_SAVEAREA_MODE(LEVEL) \


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Andi Kleen
Xinliang David Li davi...@google.com writes:

 Process level synchronization problems can happen when two processes
 (running the instrumented binary) exit at the same time. The
 updated/merged counters from one process may be overwritten by another
 process -- this is true for both counter data and summary data.
 Solution 3) does not introduce any new problems.

You could just use lockf() ?

-Andi


[PATCH] Fix valtrack ICE (PR debug/53923)

2012-08-20 Thread Jakub Jelinek
Hi!

On the testcase from this PR on AVR (from libgcc, thus not including
it into testsuite/) we ICE, because dead_debug_insert_temp is called
several times on the same insn, for multi-register hard register for each
regno in it (except the first which doesn't seem to be dead).
In the first call dead_debug_insert_temp changes *DF_REF_REAL_LOC
to DEBUG_EXPR, and in the next call for the next consecutive hard register
we set reg variable to *DF_REF_REAL_LOC and rely on it to be a REG,
when it is a DEBUG_EXPR already instead.  No change is needed in that case.

Bootstrapped/regtested on x86_64-linux and i686-linux (where this never
kicks in though), tested on the testcase in cross to avr.  Ok for trunk?

2012-08-20  Jakub Jelinek  ja...@redhat.com

PR debug/53923
* valtrack.c (dead_debug_insert_temp): Drop non-reg uses
from the chain.

--- gcc/valtrack.c.jj   2012-08-10 12:57:21.0 +0200
+++ gcc/valtrack.c  2012-08-20 14:59:07.609586159 +0200
@@ -336,6 +336,14 @@ dead_debug_insert_temp (struct dead_debu
 {
   if (DF_REF_REGNO (cur-use) == uregno)
{
+ /* If this loc has been changed e.g. to debug_expr already
+as part of a multi-register use, just drop it.  */
+ if (!REG_P (*DF_REF_REAL_LOC (cur-use)))
+   {
+ *tailp = cur-next;
+ XDELETE (cur);
+ continue;
+   }
  *usesp = cur;
  usesp = cur-next;
  *tailp = cur-next;

Jakub


[Patch, Fortran, commit] -Wtarget-lifetime: Fix omission

2012-08-20 Thread Tobias Burnus

Committed as Rev. 190542

Tobias
Index: gcc/testsuite/gfortran.dg/warn_target_lifetime_2.f90
===
--- gcc/testsuite/gfortran.dg/warn_target_lifetime_2.f90	(Revision 0)
+++ gcc/testsuite/gfortran.dg/warn_target_lifetime_2.f90	(Arbeitskopie)
@@ -0,0 +1,10 @@
+! { dg-do compile }
+! { dg-options -Wtarget-lifetime }
+!
+! PR fortran/54301
+!
+function f()
+  integer, pointer :: f
+  integer, target :: t
+  f = t ! { dg-warning Pointer at .1. in pointer assignment might outlive the pointer target }
+end
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(Revision 190541)
+++ gcc/testsuite/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,8 @@
+2012-08-20  Tobias Burnus  bur...@net-b.de
+
+	PR fortran/54301
+	* gfortran.dg/warn_target_lifetime_2.f90: New.
+
 2012-08-20  Paolo Carlini  paolo.carl...@oracle.com
 
 	PR c++/10416
Index: gcc/fortran/expr.c
===
--- gcc/fortran/expr.c	(Revision 190541)
+++ gcc/fortran/expr.c	(Arbeitskopie)
@@ -3673,6 +3673,7 @@ gfc_check_pointer_assign (gfc_expr *lvalue, gfc_ex
 
   warn = lvalue-symtree-n.sym-attr.dummy
 	 || lvalue-symtree-n.sym-attr.result
+	 || lvalue-symtree-n.sym-attr.function
 	 || lvalue-symtree-n.sym-attr.host_assoc
 	 || lvalue-symtree-n.sym-attr.use_assoc
 	 || lvalue-symtree-n.sym-attr.in_common;
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 190541)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,6 +1,12 @@
 2012-08-20  Tobias Burnus  bur...@net-b.de
 
 	PR fortran/54301
+	* expr.c (gfc_check_pointer_assign): Warn when a pointer,
+	which is a function result, might outlive its target.
+
+2012-08-20  Tobias Burnus  bur...@net-b.de
+
+	PR fortran/54301
 	* expr.c (gfc_check_pointer_assign): Warn when the pointer
 	might outlive its target.
 	* gfortran.h (struct gfc_option_t): Add warn_target_lifetime.


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Xinliang David Li
I was mistaken here -- gcov_open actually uses locking via fcntl
interface -- so we do need to find a way to solve the summary update
synchronization problem when it is separate out of the per-file update
loop.

David

On Mon, Aug 20, 2012 at 12:03 PM, Andi Kleen a...@firstfloor.org wrote:
 Xinliang David Li davi...@google.com writes:

 Process level synchronization problems can happen when two processes
 (running the instrumented binary) exit at the same time. The
 updated/merged counters from one process may be overwritten by another
 process -- this is true for both counter data and summary data.
 Solution 3) does not introduce any new problems.

 You could just use lockf() ?

 -Andi


Re: dwarf2out.c: For DWARF 4+, output DW_AT_high_pc as constant offset.

2012-08-20 Thread Mark Wielaard
On Fri, Apr 27, 2012 at 08:16:04PM +0200, Mark Wielaard wrote:
 On Fri, 2012-04-27 at 15:43 +0200, Jakub Jelinek wrote:
  On Fri, Apr 27, 2012 at 03:36:56PM +0200, Mark Wielaard wrote:
   But even without this, I think the patch is worth it just to get rid of
   all the relocations necessary otherwise.
  
  IMHO we should defer applying this by a few months, given that GDB support
  is only being added these days and -gdwarf-4 is now the default.
  Applying it in August/September when there is already GDB version with
  the support would be better.
 
 OK, I'll try to remember and ping as soon as a new GDB release is out
 with the patch in.

Ping. There are stable releases of GDB 7.5, valgrind 3.8.0 and
elfutils 0.154 out now that support it.

I rebased the patch and tested against GDB 7.5.

2012-08-20  Mark Wielaard  m...@redhat.com

* dwarf2out.h (enum dw_val_class): Add dw_val_class_high_pc.
* dwarf2out.c (dw_val_equal_p): Handle dw_val_class_high_pc.
(add_AT_low_high_pc): New function.
(AT_lbl): Handle dw_val_class_high_pc.
(print_die): Likewise.
(attr_checksum): Likewise.
(attr_checksum_ordered): Likewise.
(same_dw_val_p): Likewise.
(size_of_die): Likewise.
(value_format): Likewise.
(output_die): Likewise.
(gen_subprogram_die): Use add_AT_low_high_pc.
(add_high_low_attributes): Likewise.
(dwarf2out_finish): Likewise.

OK to commit now?

Thanks,

Mark
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 4bc4cc3..11d925b 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -1312,6 +1312,7 @@ dw_val_equal_p (dw_val_node *a, dw_val_node *b)
 case dw_val_class_fde_ref:
   return a-v.val_fde_index == b-v.val_fde_index;
 case dw_val_class_lbl_id:
+case dw_val_class_high_pc:
   return strcmp (a-v.val_lbl_id, b-v.val_lbl_id) == 0;
 case dw_val_class_str:
   return a-v.val_str == b-v.val_str;
@@ -3598,6 +3599,26 @@ add_AT_data8 (dw_die_ref die, enum dwarf_attribute 
attr_kind,
   add_dwarf_attr (die, attr);
 }
 
+/* Add DW_AT_low_pc and DW_AT_high_pc to a DIE.  */
+static inline void
+add_AT_low_high_pc (dw_die_ref die, const char *lbl_low, const char *lbl_high)
+{
+  dw_attr_node attr;
+
+  attr.dw_attr = DW_AT_low_pc;
+  attr.dw_attr_val.val_class = dw_val_class_lbl_id;
+  attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_low);
+  add_dwarf_attr (die, attr);
+
+  attr.dw_attr = DW_AT_high_pc;
+  if (dwarf_version  4)
+attr.dw_attr_val.val_class = dw_val_class_lbl_id;
+  else
+attr.dw_attr_val.val_class = dw_val_class_high_pc;
+  attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_high);
+  add_dwarf_attr (die, attr);
+}
+
 /* Hash and equality functions for debug_str_hash.  */
 
 static hashval_t
@@ -3981,7 +4002,8 @@ AT_lbl (dw_attr_ref a)
 {
   gcc_assert (a  (AT_class (a) == dw_val_class_lbl_id
|| AT_class (a) == dw_val_class_lineptr
-   || AT_class (a) == dw_val_class_macptr));
+   || AT_class (a) == dw_val_class_macptr
+   || AT_class (a) == dw_val_class_high_pc));
   return a-dw_attr_val.v.val_lbl_id;
 }
 
@@ -4877,6 +4899,7 @@ print_die (dw_die_ref die, FILE *outfile)
case dw_val_class_lbl_id:
case dw_val_class_lineptr:
case dw_val_class_macptr:
+   case dw_val_class_high_pc:
  fprintf (outfile, label: %s, AT_lbl (a));
  break;
case dw_val_class_str:
@@ -5033,6 +5056,7 @@ attr_checksum (dw_attr_ref at, struct md5_ctx *ctx, int 
*mark)
 case dw_val_class_lbl_id:
 case dw_val_class_lineptr:
 case dw_val_class_macptr:
+case dw_val_class_high_pc:
   break;
 
 case dw_val_class_file:
@@ -5305,6 +5329,7 @@ attr_checksum_ordered (enum dwarf_tag tag, dw_attr_ref at,
 case dw_val_class_lbl_id:
 case dw_val_class_lineptr:
 case dw_val_class_macptr:
+case dw_val_class_high_pc:
   break;
 
 case dw_val_class_file:
@@ -5770,6 +5795,7 @@ same_dw_val_p (const dw_val_node *v1, const dw_val_node 
*v2, int *mark)
 case dw_val_class_lbl_id:
 case dw_val_class_lineptr:
 case dw_val_class_macptr:
+case dw_val_class_high_pc:
   return 1;
 
 case dw_val_class_file:
@@ -7241,6 +7267,9 @@ size_of_die (dw_die_ref die)
case dw_val_class_vms_delta:
  size += DWARF_OFFSET_SIZE;
  break;
+   case dw_val_class_high_pc:
+ size += DWARF2_ADDR_SIZE;
+ break;
default:
  gcc_unreachable ();
}
@@ -7558,6 +7587,17 @@ value_format (dw_attr_ref a)
 case dw_val_class_data8:
   return DW_FORM_data8;
 
+case dw_val_class_high_pc:
+  switch (DWARF2_ADDR_SIZE)
+   {
+ case 4:
+   return DW_FORM_data4;
+ case 8:
+   return DW_FORM_data8;
+ default:
+   gcc_unreachable ();
+   }
+
 default:
   gcc_unreachable ();
 }
@@ -7984,6 +8024,11 @@ output_die (dw_die_ref die)
break;
  }
 
+   case 

Re: dwarf2out.c: For DWARF 4+, output DW_AT_high_pc as constant offset.

2012-08-20 Thread Jakub Jelinek
On Mon, Aug 20, 2012 at 09:59:26PM +0200, Mark Wielaard wrote:
 Ping. There are stable releases of GDB 7.5, valgrind 3.8.0 and
 elfutils 0.154 out now that support it.
 
 I rebased the patch and tested against GDB 7.5.
 
 2012-08-20  Mark Wielaard  m...@redhat.com
 
 * dwarf2out.h (enum dw_val_class): Add dw_val_class_high_pc.
 * dwarf2out.c (dw_val_equal_p): Handle dw_val_class_high_pc.
 (add_AT_low_high_pc): New function.
 (AT_lbl): Handle dw_val_class_high_pc.
 (print_die): Likewise.
 (attr_checksum): Likewise.
 (attr_checksum_ordered): Likewise.
 (same_dw_val_p): Likewise.
 (size_of_die): Likewise.
 (value_format): Likewise.
 (output_die): Likewise.
 (gen_subprogram_die): Use add_AT_low_high_pc.
 (add_high_low_attributes): Likewise.
 (dwarf2out_finish): Likewise.
 
 OK to commit now?

Okay, thanks.

Jakub


Re: Fix PR c++/19351 (operator new[] overflow)

2012-08-20 Thread Jason Merrill

OK.  Sorry for the delay.

Jason


Re: [wwwdocs] Document Runtime CPU detection builtins

2012-08-20 Thread Gerald Pfeifer
Hi Sriraman,

On Fri, 10 Aug 2012, Sriraman Tallam wrote:
 I have added a release note for x86 builtins __builtin_cpu_is and
 __builtin_cpu_supports. They were checked in to trunk in rev. 186789.

I had hoped one of the x86 maintainers would review this from his
perspective given that they have more background.  For the lack of
that, let me give it a try.

Index: changes.html
===
+li New builtin functions to detect run-time CPU type and ISA:br

built-in, cf. http://gcc.gnu.org/codingconventions.html; here and
in the following.

No br here; ul should just do that.

+ul
+  liBuiltin code__builtin_cpu_is/code has been added to detect if
+  the run-time CPU is of a particular type. The builtin returns a postive
+  integer on a match and zero otherwise. The builtin accepts one string
+  literal argument, the CPU name. For example,

A built-in function...

positive

It accepts one string (to make this shorter)

+  code__builtin_cpu_is(westmere)/code returns a postive integer if

positive

+  the run-time CPU is an Intel Corei7 Westmere processor.  The following

I don't work for Intel, but should there be a space before i7?

+  are the CPU names recognized by code__builtin_cpu_is:/code

How about making this The following are the CPU names recognized for
now, which avoids another reference to the name of the built-in and
makes it clear that this is subject to change.

+  liBuiltin code__builtin_cpu_supports/code has been added to detect

A built-in function...

+  returns a postive integer on a match and zero otherwise. The builtin

positive

+  following are the ISA features recognized by
+  code__builtin_cpu_supports:/code

Same is above?

+pCaveat: If the above builtins are called before any constructors are
+invoked, like during IFUNC initialization, then the CPU detection
+initialization must be explicity run using this newly provided
+builtin,  code__builtin_cpu_init/code.

...using the new built-in function code__builtin_cpu_init/code.

What is a constructor in this context, by the way?  Will this be clear
to all the users?

+code
+static void (*some_ifunc_resolver(void))(void)br
+{br
+nbspnbsp __builtin_cpu_init();br
+nbspnbsp if (__builtin_cpu_is(amdfam10h) ...br
+nbspnbsp if (__builtin_cpu_supports(popcnt) ...br
+}
+/code

How about using pre here? That avoids the br/s which will cause
problems with the web page validator, by the way.


Nice job for documenting this so well.  Thanks for taking the time
and your patience!

The patch is fine modulo the changes I pointed out (though some of
them are more suggestions and you do not need to slavishly follow
those).

Gerald


Re: [patch] Fix problems with -fdebug-types-section and local types

2012-08-20 Thread Cary Coutant
Ping.

http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00398.html

-cary

On Mon, Aug 13, 2012 at 1:13 PM, Cary Coutant ccout...@google.com wrote:
 2012-08-07   Cary Coutant  ccout...@google.com

 gcc/
 * dwarf2out.c (clone_as_declaration): Copy DW_AT_abstract_origin
 attribute.
 (generate_skeleton_bottom_up): Remove DW_AT_object_pointer attribute
 from original DIE.
 (clone_tree_hash): Rename to ...
 (clone_tree_partial): ... this; change callers.  Copy
 DW_TAG_subprogram DIEs as declarations.

 gcc/testsuite/
 * testsuite/g++.dg/debug/dwarf2/dwarf4-nested.C: New test case.
 * testsuite/g++.dg/debug/dwarf2/dwarf4-typedef.C: Add
 -fdebug-types-section flag.

 Ping?

 -cary


Re: Merge C++ conversion into trunk (0/6 - Overview)

2012-08-20 Thread H.J. Lu
On Tue, Aug 14, 2012 at 11:59 AM, Diego Novillo dnovi...@google.com wrote:
 On 12-08-14 09:48 , Diego Novillo wrote:

 This merge touches several files, so I'm thinking that the best time is
 going to be on Thu 16/Aug around 2:00 GMT.


 So, the fixes I needed from Lawrence are already in so we can proceed with
 the merge.  I'll commit the merge tonight at ~2:00 GMT.

 After the merge is in, I will send an announcement and request major branch
 merges to wait for another 24 hrs to allow testers the chance to pick up
 this merge.


The C++ merge caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332

GCC memory usage is more than doubled from = 3GB
to = 10GB.  Is this a known issue?

-- 
H.J.


Re: [SH] PR 54089 - Add support for rotcr insn

2012-08-20 Thread Gary Funck
On 08/20/12 01:02:39, Oleg Endo wrote:
 Hello,
 
 This adds support for SH's rotcr insn.
 Tested on rev 190459 with
 make -k check RUNTESTFLAGS=--target_board=sh-sim
 \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}
 
 and no new failures.
 OK?
 
 Cheers,
 Oleg
 
 ChangeLog:
 
   PR target/50489

Above: that should be: PR target/54089.

   * config/sh/sh.md (rotcr, *rotcr, shar, shlr): New insns and 
   splits.
   (ashrdi3_k, lshrdi3_k): Rewrite as insn_and_split.
   * config/sh/sh.c (sh_lshrsi_clobbers_t_reg_p): New function.
   * config/sh/sh-protos.h (sh_lshrsi_clobbers_t_reg_p): Declare 
   it.
   
 testsuite/ChangeLog:
   
   PR target/50489

Here also.

   * gcc.target/sh/pr54089-1.c: New.

This is OK.

For a sec. I thought my long-standing bug report
got some attention. :)

- Gary


Re: [SH] PR 54089 - Add support for rotcr insn

2012-08-20 Thread Oleg Endo
On Mon, 2012-08-20 at 14:09 -0700, Gary Funck wrote:
  
  ChangeLog:
  
  PR target/50489
 
 Above: that should be: PR target/54089.
 
  * config/sh/sh.md (rotcr, *rotcr, shar, shlr): New insns and 
  splits.
  (ashrdi3_k, lshrdi3_k): Rewrite as insn_and_split.
  * config/sh/sh.c (sh_lshrsi_clobbers_t_reg_p): New function.
  * config/sh/sh-protos.h (sh_lshrsi_clobbers_t_reg_p): Declare 
  it.
  
  testsuite/ChangeLog:
  
  PR target/50489
 
 Here also.
 
  * gcc.target/sh/pr54089-1.c: New.
 
 This is OK.

Thanks and sorry for the digit twister.

I've corrected it in the ChangeLog files.  Is there any way to delete
the PR commit comment in bugzilla?

 For a sec. I thought my long-standing bug report
 got some attention. :)

It did, but probably not in a constructive way ;)

Cheers,
Oleg



Re: [PATCH v2, rtl-optimization]: Fix PR46829, ICE in spill_failure with -fschedule-insns [was: Fixing instability of -fschedule-insns for x86]

2012-08-20 Thread Oleg Endo
On Sat, 2012-08-18 at 10:23 +0200, Uros Bizjak wrote:
 On Sat, Aug 18, 2012 at 10:14 AM, Uros Bizjak ubiz...@gmail.com wrote:
 
  After discussion with Oleg, it looks that it is enough to prevent
  wrong registers in the output of the (multi-output) insn pattern. As
  far as inputs are concerned, combine already handles limited reload
  classes in the right way. The problem with x86 is, that reload tried
  to fix output operand of the multi-output ins, where scheduler already
  moved load of ax register before this insn.
 
  Version 2 of the patch now handles only output operands. Also,
  handling of empty constraints was fixed.
 
 ... but here we fail testcase from PR46843... so we HAVE to check
 input operands.

Hm, I'm curious ... how would that work for patterns such as

(define_insn *addc
  [(set (match_operand:SI 0 arith_reg_dest =r)
(plus:SI (plus:SI
(match_operand:SI 1 arith_reg_operand %0)
(match_operand:SI 2 arith_reg_or_0_operand r))
 (match_operand:SI 3 t_reg_operand )))
   (clobber (reg:SI T_REG))]
  TARGET_SH1
  addc %2,%0
  [(set_attr type arith)])

... where the predicate arith_reg_or_0_operand allows either
const_int 0 or an arith_reg_operand, but the constraint r tells
reload to load the constant into a register for this insn.
Probably those kind of patterns that rely on this behavior would need to
be rewritten as insn_and_split to do the constant loading 'manually'?

Cheers,
Oleg



[google/gcc-4_7] Fix ICEs with -gfission.

2012-08-20 Thread Cary Coutant
This patch is for the google/gcc-4_7 branch.

When a location list or location expression is removed from a DIE, we
need to remove entries in the .debug_addr table that were referenced
by those location expressions.  Except for one case, the existing code
checked only the first descriptor in each location expression instead
of looping through all the descriptors.  In cases where we don't
remove the .debug_addr table entries, an ICE occurs during assembly
output.

This patch also fixes an ICE in output_pubname with -ggnu-pubnames,
where we are asserting on TAGs that GDB doesn't care about.  Instead
of asserting, we should just be setting the flags to 0.


2012-08-20   Cary Coutant  ccout...@google.com

gcc/
* dwarf2out.c (remove_loc_list_addr_table_entries): Change
parameter; update all calls.
(output_pubname): Don't assert on unknown TAGs.
(resolve_addr): Call remove_loc_list_addr_table_entries for all
location expressions.


Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 190548)
+++ gcc/dwarf2out.c (working copy)
@@ -4858,13 +4858,9 @@ remove_addr_table_entry (unsigned int i)
address_table.  */
 
 static void
-remove_loc_list_addr_table_entries (dw_loc_list_ref loc)
+remove_loc_list_addr_table_entries (dw_loc_descr_ref descr)
 {
-  dw_loc_descr_ref descr;
-
-  gcc_assert (loc-replaced);
-
-  for (descr = loc-expr; descr; descr = descr-dw_loc_next)
+  for (; descr; descr = descr-dw_loc_next)
 if (descr-dw_loc_oprnd1.val_index != -1U)
   remove_addr_table_entry (descr-dw_loc_oprnd1.val_index);
 }
@@ -9468,7 +9464,8 @@ output_pubname (dw_offset die_offset, pu
   GDB_INDEX_SYMBOL_STATIC_SET_VALUE(flags, 1);
   break;
 default:
-  gcc_unreachable ();
+ /* For unrecognized TAGs, don't set the flags.  */
+  break;
   }
   dw2_asm_output_data (1, flags  GDB_INDEX_CU_BITSIZE,
GDB-index flags);
@@ -22875,8 +22872,8 @@ resolve_addr (dw_die_ref die)
gcc_assert (!next-ll_symbol);
next-ll_symbol = (*curr)-ll_symbol;
  }
-   if (l-dw_loc_oprnd1.val_index != -1U)
- remove_addr_table_entry (l-dw_loc_oprnd1.val_index);
+   if (dwarf_split_debug_info)
+ remove_loc_list_addr_table_entries (l);
*curr = next;
  }
else
@@ -22891,7 +22888,7 @@ resolve_addr (dw_die_ref die)
  {
loc-replaced = 1;
 if (dwarf_split_debug_info)
-  remove_loc_list_addr_table_entries (loc);
+  remove_loc_list_addr_table_entries (loc-expr);
loc-dw_loc_next = *start;
  }
  }
@@ -22916,8 +22913,8 @@ resolve_addr (dw_die_ref die)
   || l-dw_loc_next != NULL)
   !resolve_addr_in_expr (l))
{
- if (l-dw_loc_oprnd1.val_index != -1U)
-   remove_addr_table_entry (l-dw_loc_oprnd1.val_index);
+ if (dwarf_split_debug_info)
+   remove_loc_list_addr_table_entries (l);
  remove_AT (die, a-dw_attr);
  ix--;
}


Re: [google/gcc-4_7] Fix ICEs with -gfission.

2012-08-20 Thread Sterling Augustine
On Mon, Aug 20, 2012 at 5:30 PM, Cary Coutant ccout...@google.com wrote:
 This patch is for the google/gcc-4_7 branch.

 When a location list or location expression is removed from a DIE, we
 need to remove entries in the .debug_addr table that were referenced
 by those location expressions.  Except for one case, the existing code
 checked only the first descriptor in each location expression instead
 of looping through all the descriptors.  In cases where we don't
 remove the .debug_addr table entries, an ICE occurs during assembly
 output.

 This patch also fixes an ICE in output_pubname with -ggnu-pubnames,
 where we are asserting on TAGs that GDB doesn't care about.  Instead
 of asserting, we should just be setting the flags to 0.


 2012-08-20   Cary Coutant  ccout...@google.com

 gcc/
 * dwarf2out.c (remove_loc_list_addr_table_entries): Change
 parameter; update all calls.
 (output_pubname): Don't assert on unknown TAGs.
 (resolve_addr): Call remove_loc_list_addr_table_entries for all
 location expressions.

OK for google 4.7.

Sterling


[Google 4.7 Obvious] Fix trailing enumerator comma

2012-08-20 Thread Sterling Augustine
My last change to google 4.7 that included gdb/gdb-index.h as it
exists in binutils, but it included a trailing comma in an enumerator.
I have checked in as obvious the patch below which eliminates the
trailing comma.

Sterling


 2012-08-20  Sterling Augustine  saugust...@google.com

* gdb/gdb-index.h: Remove comma from last enum.


--- branches/google/gcc-4_7/include/gdb/gdb-index.h 2012/08/20 18:23:19 
190539
+++ branches/google/gcc-4_7/include/gdb/gdb-index.h 2012/08/20 23:05:44 
190549
@@ -68,7 +68,7 @@
  Give the unused bits a value so gdb will print them sensibly.  */
   GDB_INDEX_SYMBOL_KIND_UNUSED5 = 5,
   GDB_INDEX_SYMBOL_KIND_UNUSED6 = 6,
-  GDB_INDEX_SYMBOL_KIND_UNUSED7 = 7,
+  GDB_INDEX_SYMBOL_KIND_UNUSED7 = 7
 } gdb_index_symbol_kind;

 #define GDB_INDEX_SYMBOL_KIND_SHIFT 28


Re: [google/gcc-4_7] Fix ICEs with -gfission.

2012-08-20 Thread Cary Coutant
 2012-08-20   Cary Coutant  ccout...@google.com

 gcc/
 * dwarf2out.c (remove_loc_list_addr_table_entries): Change
 parameter; update all calls.
 (output_pubname): Don't assert on unknown TAGs.
 (resolve_addr): Call remove_loc_list_addr_table_entries for all
 location expressions.

 OK for google 4.7.

Thanks, committed at r190553.

-cary


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Jan Hubicka
 Xinliang David Li davi...@google.com writes:
 
  Process level synchronization problems can happen when two processes
  (running the instrumented binary) exit at the same time. The
  updated/merged counters from one process may be overwritten by another
  process -- this is true for both counter data and summary data.
  Solution 3) does not introduce any new problems.
 
 You could just use lockf() ?

The issue here is holding lock for all the files (that can be many) versus
number of locks limits  possibilities for deadlocking (mind that updating
may happen in different orders on the same files for different programs built
from same objects)

For David: there is no thread safety code in mainline for the counters.
Long time ago Zdenek implmented poor-mans TLS for counters (before TLS was 
invented)
http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html but it was voted down
as too memory expensive per thread. We could optionaly do atomic updates like 
ICC
or combination of both as discussed in the thread.
So far no one implemented it since the coverage fixups seems to work well 
enough in
pracitce for multithreaded programs where reproducibility do not seem to be 
_that_
important.

For GCC profiled bootstrap however I would like to see the output binary to be
reproducible. We realy ought to update profiles safe for multple processes.
Trashing whole process run is worse than doing race in increment. There is good
chance that one of runs is more important than others and it will get trashed.

I do not think we do have serious update problems in the summaries at the 
moment.
We lock individual files as we update them. The summary is simple enough to be 
safe.
sum_all is summed, max_all is maximum over the individual runs. Even when you 
combine
mutiple programs the summary will end up same. Everything except for max_all is 
ignored
anyway.

Solution 2 (i.e. histogram streaming) will also have the property that it is 
safe
WRT multiple programs, just like sum_all.

Honza
 
 -Andi


Re: Merge C++ conversion into trunk (0/6 - Overview)

2012-08-20 Thread Lawrence Crowl
On 8/20/12, H.J. Lu hjl.to...@gmail.com wrote:
 The C++ merge caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332

 GCC memory usage is more than doubled from = 3GB to = 10GB.
 Is this a known issue?

The two memory stat reports show no differences.  Are you sure you
didn't splice in the wrong report?

-- 
Lawrence Crowl


[PATCH, MIPS] fix MIPS16 jump table overflow

2012-08-20 Thread Sandra Loosemore

In config/mips/mips.h, there is presently this comment:

/* ??? 16-bit offsets can overflow in large functions.  */
#define TARGET_MIPS16_SHORT_JUMP_TABLES TARGET_MIPS16_TEXT_LOADS

A while ago we had a bug report where a big switch statement did, in 
fact, overflow the range of 16-bit offsets, causing a runtime error.


GCC already has generic machinery to shorten offset tables for switch 
statements that does the necessary range checking, but it only works 
with casesi, not the lower-level tablejump expansion.  So, this 
patch provides a casesi expander to handle this situation.


This patch has been in use on our local 4.5 and 4.6 branches for about a 
year now.  When testing it on mainline, I found it tripped over the 
recent change to add MIPS16 branch overflow checking in other 
situations, causing it to get into an infinite loop.  I think telling it 
to ignore these new jump insns it doesn't know how to process is the 
right thing to do, but I'm not sure if there's a better way to restrict 
the condition or make mips16_split_long_branches more robust.  Richard,
since that's your code I assume you'll suggest an alternative if this 
doesn't meet with your approval.  Is the rest of the patch OK to check in?


-Sandra


2012-08-20  Julian Brown  jul...@codesourcery.com
Sandra Loosemore  san...@codesourcery.com

gcc/
* config/mips/mips.md (MIPS16_T_REGNUM): New constant.
(tablejump): Don't use for MIPS16_SHORT_JUMP_TABLES.
(casesi): New.
(casesi_internal_mips16): New.
* config/mips/mips.c (mips16_split_long_branches): Ignore
casesi_internal_mips16 insns.
* config/mips/mips.h (TARGET_MIPS16_SHORT_JUMP_TABLES): Update
comment.
(CASE_VECTOR_MODE): Use ptr_mode unconditionally.
(CASE_VECTOR_SHORTEN_MODE): Define.
(ASM_OUTPUT_ADDR_DIFF_ELT): Output word-sized addr_diff_elts
when necessary for MIPS16_SHORT_JUMP_TABLES.

gcc/testsuite/
* gcc.target/mips/code-readable-1.c: Generalize assembler
pattern for jump table.
Index: gcc/config/mips/mips.md
===
--- gcc/config/mips/mips.md	(revision 190463)
+++ gcc/config/mips/mips.md	(working copy)
@@ -138,6 +138,7 @@
 
 (define_constants
   [(TLS_GET_TP_REGNUM		3)
+   (MIPS16_T_REGNUM		24)
(PIC_FUNCTION_ADDR_REGNUM	25)
(RETURN_ADDR_REGNUM		31)
(CPRESTORE_SLOT_REGNUM	76)
@@ -5904,14 +5905,9 @@
   [(set (pc)
 	(match_operand 0 register_operand))
(use (label_ref (match_operand 1 )))]
-  
+  !TARGET_MIPS16_SHORT_JUMP_TABLES
 {
-  if (TARGET_MIPS16_SHORT_JUMP_TABLES)
-operands[0] = expand_binop (Pmode, add_optab,
-convert_to_mode (Pmode, operands[0], false),
-gen_rtx_LABEL_REF (Pmode, operands[1]),
-0, 0, OPTAB_WIDEN);
-  else if (TARGET_GPWORD)
+  if (TARGET_GPWORD)
 operands[0] = expand_binop (Pmode, add_optab, operands[0],
 pic_offset_table_rtx, 0, 0, OPTAB_WIDEN);
   else if (TARGET_RTP_PIC)
@@ -5937,6 +5933,91 @@
   [(set_attr type jump)
(set_attr mode none)])
 
+;; For MIPS16, we don't know whether a given jump table will use short or
+;; word-sized offsets until late in compilation, when we are able to determine
+;; the sizes of the insns which comprise the containing function.  This
+;; necessitates the use of the casesi rather than the tablejump pattern, since
+;; the latter tries to calculate the index of the offset to jump through early
+;; in compilation, i.e. at expand time, when nothing is known about the
+;; eventual function layout.
+
+(define_expand casesi
+  [(match_operand:SI 0 register_operand )	; index to jump on
+   (match_operand:SI 1 const_int_operand )	; lower bound
+   (match_operand:SI 2 const_int_operand )	; total range
+   (match_operand:SI 3  )			; table label
+   (match_operand:SI 4  )]			; out of range label
+  TARGET_MIPS16_SHORT_JUMP_TABLES
+{
+  if (operands[1] != const0_rtx)
+{
+  rtx reg = gen_reg_rtx (SImode);
+  rtx offset = gen_int_mode (-INTVAL (operands[1]), SImode);
+  
+  if (!arith_operand (offset, SImode))
+offset = force_reg (SImode, offset);
+  
+  emit_insn (gen_addsi3 (reg, operands[0], offset));
+  operands[0] = reg;
+}
+
+  if (!arith_operand (operands[0], SImode))
+operands[0] = force_reg (SImode, operands[0]);
+
+  operands[2] = GEN_INT (INTVAL (operands[2]) + 1);
+
+  emit_jump_insn (gen_casesi_internal_mips16 (operands[0], operands[2],
+	  operands[3], operands[4]));
+
+  DONE;
+})
+
+(define_insn casesi_internal_mips16
+  [(set (pc)
+ (if_then_else
+   (leu (match_operand:SI 0 register_operand d)
+	(match_operand:SI 1 arith_operand dI))
+   (mem:SI (plus:SI (mult:SI (match_dup 0) (const_int 4))
+			(label_ref (match_operand 2  
+   (label_ref (match_operand 3  
+   (clobber (match_scratch:SI 4 =d))
+   (clobber (match_scratch:SI 5 =d))
+   (clobber (reg:SI MIPS16_T_REGNUM))
+   (use 

Re: [PATCH] convert m32c to constraints.md

2012-08-20 Thread DJ Delorie

I ran the testsuite for you; no regressions.  Looks OK to me, please
apply.  Thanks!


Re: [wwwdocs] Document Runtime CPU detection builtins

2012-08-20 Thread Sriraman Tallam
Hi Gerald / Diego,

I have made all the mentioned changes.  I also shortened the
description like Diego mentioned by removing all the strings but kept
the caveats. I have not added a reference to the documentation because
i do not know what link to reference. The builtins are completely
documented in extend.texi.

   I have attached the patch. If there are no further comments I will
submit this tomorrow.

Thanks,
-Sri.


On Mon, Aug 20, 2012 at 1:31 PM, Gerald Pfeifer ger...@pfeifer.com wrote:
 Hi Sriraman,

 On Fri, 10 Aug 2012, Sriraman Tallam wrote:
 I have added a release note for x86 builtins __builtin_cpu_is and
 __builtin_cpu_supports. They were checked in to trunk in rev. 186789.

 I had hoped one of the x86 maintainers would review this from his
 perspective given that they have more background.  For the lack of
 that, let me give it a try.

 Index: changes.html
 ===
 +li New builtin functions to detect run-time CPU type and ISA:br

 built-in, cf. http://gcc.gnu.org/codingconventions.html; here and
 in the following.

 No br here; ul should just do that.

 +ul
 +  liBuiltin code__builtin_cpu_is/code has been added to detect if
 +  the run-time CPU is of a particular type. The builtin returns a postive
 +  integer on a match and zero otherwise. The builtin accepts one string
 +  literal argument, the CPU name. For example,

 A built-in function...

 positive

 It accepts one string (to make this shorter)

 +  code__builtin_cpu_is(westmere)/code returns a postive integer if

 positive

 +  the run-time CPU is an Intel Corei7 Westmere processor.  The following

 I don't work for Intel, but should there be a space before i7?

 +  are the CPU names recognized by code__builtin_cpu_is:/code

 How about making this The following are the CPU names recognized for
 now, which avoids another reference to the name of the built-in and
 makes it clear that this is subject to change.

 +  liBuiltin code__builtin_cpu_supports/code has been added to 
 detect

 A built-in function...

 +  returns a postive integer on a match and zero otherwise. The builtin

 positive

 +  following are the ISA features recognized by
 +  code__builtin_cpu_supports:/code

 Same is above?

 +pCaveat: If the above builtins are called before any constructors are
 +invoked, like during IFUNC initialization, then the CPU detection
 +initialization must be explicity run using this newly provided
 +builtin,  code__builtin_cpu_init/code.

 ...using the new built-in function code__builtin_cpu_init/code.

 What is a constructor in this context, by the way?  Will this be clear
 to all the users?

 +code
 +static void (*some_ifunc_resolver(void))(void)br
 +{br
 +nbspnbsp __builtin_cpu_init();br
 +nbspnbsp if (__builtin_cpu_is(amdfam10h) ...br
 +nbspnbsp if (__builtin_cpu_supports(popcnt) ...br
 +}
 +/code

 How about using pre here? That avoids the br/s which will cause
 problems with the web page validator, by the way.


 Nice job for documenting this so well.  Thanks for taking the time
 and your patience!

 The patch is fine modulo the changes I pointed out (though some of
 them are more suggestions and you do not need to slavishly follow
 those).

 Gerald
Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.10
diff -u -r1.10 changes.html
--- changes.html10 Aug 2012 16:25:46 -  1.10
+++ changes.html21 Aug 2012 02:38:40 -
@@ -92,6 +92,38 @@
 wrong results.  You must build all
 modules with code-mpreferred-stack-boundary=3/code, including any
 libraries.  This includes the system libraries and startup modules./li
+li New built-in functions to detect run-time CPU type and ISA:
+ul
+  liA built-in function code__builtin_cpu_is/code has been added to
+  detect if the run-time CPU is of a particular type.  It returns a
+  positive integer on a match and zero otherwise.  It accepts one string
+  literal argument, the CPU name.  For example,
+  code__builtin_cpu_is(westmere)/code returns a positive integer if
+  the run-time CPU is an Intel Core i7 Westmere processor.  Please refer
+  to the documentation for the list of valid CPU names recognized./li
+  liA built-in function code__builtin_cpu_supports/code has been
+  added to detect if the run-time CPU supports a particular ISA feature.
+  It returns a positive integer on a match and zero otherwise.  It accepts
+  one string literal argument, the ISA feature.  For example,
+  code__builtin_cpu_supports(ssse3)/code returns a positive integer
+  if the run-time CPU supports SSSE3 instructions.  Please refer to the
+  documentation for the list of valid ISA names recognized./li
+/ul
+pCaveat: If these built-in 

Re: Merge C++ conversion into trunk (0/6 - Overview)

2012-08-20 Thread H.J. Lu
On Mon, Aug 20, 2012 at 6:31 PM, Lawrence Crowl cr...@google.com wrote:
 On 8/20/12, H.J. Lu hjl.to...@gmail.com wrote:
 The C++ merge caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332

 GCC memory usage is more than doubled from = 3GB to = 10GB.
 Is this a known issue?

 The two memory stat reports show no differences.  Are you sure you
 didn't splice in the wrong report?


Those are outputs from -fmem-report.  I am not sure how useful
they are for this bug.

-- 
H.J.


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Teresa Johnson
On Mon, Aug 20, 2012 at 6:27 PM, Jan Hubicka hubi...@ucw.cz wrote:
 Xinliang David Li davi...@google.com writes:
 
  Process level synchronization problems can happen when two processes
  (running the instrumented binary) exit at the same time. The
  updated/merged counters from one process may be overwritten by another
  process -- this is true for both counter data and summary data.
  Solution 3) does not introduce any new problems.

 You could just use lockf() ?

 The issue here is holding lock for all the files (that can be many) versus
 number of locks limits  possibilities for deadlocking (mind that updating
 may happen in different orders on the same files for different programs built
 from same objects)

 For David: there is no thread safety code in mainline for the counters.
 Long time ago Zdenek implmented poor-mans TLS for counters (before TLS was 
 invented)
 http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html but it was voted down
 as too memory expensive per thread. We could optionaly do atomic updates like 
 ICC
 or combination of both as discussed in the thread.
 So far no one implemented it since the coverage fixups seems to work well 
 enough in
 pracitce for multithreaded programs where reproducibility do not seem to be 
 _that_
 important.

 For GCC profiled bootstrap however I would like to see the output binary to be
 reproducible. We realy ought to update profiles safe for multple processes.
 Trashing whole process run is worse than doing race in increment. There is 
 good
 chance that one of runs is more important than others and it will get trashed.

 I do not think we do have serious update problems in the summaries at the 
 moment.
 We lock individual files as we update them. The summary is simple enough to 
 be safe.
 sum_all is summed, max_all is maximum over the individual runs. Even when you 
 combine
 mutiple programs the summary will end up same. Everything except for max_all 
 is ignored
 anyway.

 Solution 2 (i.e. histogram streaming) will also have the property that it is 
 safe
 WRT multiple programs, just like sum_all.

I think the sum_all based scaling of the working set entries also has
this property. What is your opinion on saving the histogram in the
summary and merging histograms together as best as possible compared
to the alternative of saving the working set information as now and
scaling it up by the ratio between the new and old sum_all when
merging?

Thanks,
Teresa


 Honza

 -Andi



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-20 Thread Jan Hubicka
 On Mon, Aug 20, 2012 at 6:27 PM, Jan Hubicka hubi...@ucw.cz wrote:
  Xinliang David Li davi...@google.com writes:
  
   Process level synchronization problems can happen when two processes
   (running the instrumented binary) exit at the same time. The
   updated/merged counters from one process may be overwritten by another
   process -- this is true for both counter data and summary data.
   Solution 3) does not introduce any new problems.
 
  You could just use lockf() ?
 
  The issue here is holding lock for all the files (that can be many) versus
  number of locks limits  possibilities for deadlocking (mind that updating
  may happen in different orders on the same files for different programs 
  built
  from same objects)
 
  For David: there is no thread safety code in mainline for the counters.
  Long time ago Zdenek implmented poor-mans TLS for counters (before TLS was 
  invented)
  http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html but it was voted 
  down
  as too memory expensive per thread. We could optionaly do atomic updates 
  like ICC
  or combination of both as discussed in the thread.
  So far no one implemented it since the coverage fixups seems to work well 
  enough in
  pracitce for multithreaded programs where reproducibility do not seem to be 
  _that_
  important.
 
  For GCC profiled bootstrap however I would like to see the output binary to 
  be
  reproducible. We realy ought to update profiles safe for multple processes.
  Trashing whole process run is worse than doing race in increment. There is 
  good
  chance that one of runs is more important than others and it will get 
  trashed.
 
  I do not think we do have serious update problems in the summaries at the 
  moment.
  We lock individual files as we update them. The summary is simple enough to 
  be safe.
  sum_all is summed, max_all is maximum over the individual runs. Even when 
  you combine
  mutiple programs the summary will end up same. Everything except for 
  max_all is ignored
  anyway.
 
  Solution 2 (i.e. histogram streaming) will also have the property that it 
  is safe
  WRT multiple programs, just like sum_all.
 
 I think the sum_all based scaling of the working set entries also has
 this property. What is your opinion on saving the histogram in the

I think the scaling will have at lest roundoff issues WRT different merging 
orders.

 summary and merging histograms together as best as possible compared
 to the alternative of saving the working set information as now and
 scaling it up by the ratio between the new and old sum_all when
 merging?

So far I like this option best. But David seems to lean towards thirtd option 
with
whole file locking.  I see it may show to be more extensible in the future.
At the moment I do not understand two things
 1) why do we need info on the number of counter above given threshold, sinc 
ethe hot/cold
decisions usually depends purely on the count cutoff.
Removing those will solve merging issues with variant 2 and then it would 
be probably
good solution.
 2) Do we plan to add some features in near future that will anyway require 
global locking?
I guess LIPO itself does not count since it streams its data into 
independent file as you
mentioned earlier and locking LIPO file is not that hard.
Does LIPO stream everything into that common file, or does it use 
combination of gcda files
and common summary?

What other stuff Google plans to merge?
(In general I would be curious about merging plans WRT profile stuff, so we 
get more
synchronized and effective on getting patches in. We have about two months 
to get it done
in stage1 and it would be nice to get as much as possible. Obviously some 
of the patches will
need bit fo dicsussion like this one. Hope you do not find it frustrating, 
I actually think
this is an important feature).

I also realized today that the common value counters (used by switch, indirect
call and div/mod value profiling) are non-stanble WRT different merging orders
(i.e.  parallel make in train run).  I do not think there is actual solution to
that except for not merging the counter section of this type in libgcov and
merge them in some canonical order at profile feedback time.  Perhaps we just
want to live with this, since the disprepancy here is small. (i.e. these
counters are quite rare and their outcome has just local effect on the final
binary, unlike the global summaries/edge counters).

Honza
 
 Thanks,
 Teresa
 
 
  Honza
 
  -Andi
 
 
 
 -- 
 Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


libgo patch committed: Define F_GETLK and friends on i386 GNU/Linux

2012-08-20 Thread Ian Lance Taylor
On i386 GNU/Linux, when compiling with -D_FILE_OFFSET_BITS=64, fcntl.h
winds up doing this:

#  define F_GETLK   F_GETLK64  /* Get record locking info.  */
...
# define F_GETLK64  12  /* Get record locking info.  */

Because of the ordering, -fdump-go-spec does not write F_GETLK into the
file that it creates.  The effect is that syscall.F_GETLK is not defined
for i386 GNU/Linux.  This patch fixes the problem.  Bootstrapped and ran
Go testsuite on x86_64-unknown-linux-gnu, both 32-bit and 64-bit mode.
Committed to mainline and 4.7 branch.

Ian

diff -r a602dc132c2d libgo/mksysinfo.sh
--- a/libgo/mksysinfo.sh	Tue Aug 14 20:46:42 2012 -0700
+++ b/libgo/mksysinfo.sh	Mon Aug 20 22:10:34 2012 -0700
@@ -211,6 +211,16 @@
   echo const O_CLOEXEC = 0  ${OUT}
 fi
 
+# These flags can be lost on i386 GNU/Linux when using
+# -D_FILE_OFFSET_BITS=64, because we see #define F_SETLK F_SETLK64
+# before we see the definition of F_SETLK64.
+for flag in F_GETLK F_SETLK F_SETLKW; do
+  if ! grep ^const ${flag}  ${OUT} /dev/null 21 \
+   grep ^const ${flag}64  ${OUT} /dev/null 21; then
+echo const ${flag} = ${flag}64  ${OUT}
+  fi
+done
+
 # The signal numbers.
 grep '^const _SIG[^_]' gen-sysinfo.go | \
   grep -v '^const _SIGEV_' | \