Re: failed attempt: retain identifier length from frontend to backend

2012-08-21 Thread Richard Guenther
On Mon, Aug 20, 2012 at 7:03 PM, Dimitrios Apostolou ji...@gmx.net wrote:
 Hello,

 my last attempt on improving something serious was about three weeks ago,
 trying to keep all lengths of all strings parsed in the frontend for the
 whole compilation phase until the assembly output. I was hoping that would
 help in using faster hashes (knowing the length allows us to hash 4 or 8
 bytes per iteration), quicker strcmp in various places, and fewer strlen()
 calls, which show up especially in -g3 compilations that store huge macro
 strings.
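For illustration, hashing "4 or 8 bytes per iteration" can look like the sketch below: with the length known up front, the loop loads whole 32-bit words instead of scanning for the terminating NUL. This is a hedged example built on the 32-bit FNV-1a constants, not libcpp's actual hash function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hash 4 bytes per iteration when the length is known up front.  The
   constants are from 32-bit FNV-1a; the word-at-a-time grouping is an
   illustration of the idea, not libcpp's real identifier hash.  */
static uint32_t
hash_with_len (const char *s, size_t len)
{
  uint32_t h = 2166136261u;
  size_t i = 0;

  for (; i + 4 <= len; i += 4)
    {
      uint32_t w;
      memcpy (&w, s + i, 4);	/* Safe unaligned 4-byte load.  */
      h = (h ^ w) * 16777619u;
    }
  for (; i < len; i++)		/* Remaining tail bytes.  */
    h = (h ^ (unsigned char) s[i]) * 16777619u;
  return h;
}
```

Note the result depends on byte order, which is fine for an in-process hash table but not for anything serialized.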

 I'll post no patch here, since what I currently have is a mess in 3
 different branches and most don't even compile. I tried various approaches.
 First I tried adding an extra length parameter to all relevant functions,
 starting from the assembly generation and working my way upwards. This got
 too complex, and I'd really like to ask whether you see any value in
 changing target-specific hooks and macros to actually accept a length
 argument (e.g. ASM_OUTPUT_*) or return one (e.g. ASM_GENERATE_*). The
 changes seemed too intrusive for me to continue.

 But seeing that the identifier length is there inside struct ht_identifier
 (or cpp_hashnode) and not lost, I tried the approach of having the length at
 str[-4] for all identifiers. To achieve this I changed ht_identifier to
 store str with the flexible array hack. Unfortunately I hadn't noticed that
 ht_identifier was part of tree_node and also part of too many other structs,
 so changing all those structs to have variable size was not without side
 effects. In the end it compiled but I got crashes all over, and I'm sure I
 didn't do things right, since I broke things like the static assert in
 libcpp/identifiers.c, which I don't even understand:

  /* We don't need a proxy since the hash table's identifier comes first
 in cpp_hashnode.  However, in case this is ever changed, we have a
 static assertion for it.  */
 -extern char proxy_assertion_broken[offsetof (struct cpp_hashnode, ident) ==
 0 ? 1 : -1];
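The assertion above is the classic pre-C11 negative-array-size trick: if the condition is false, the array is declared with size -1, which is a compile error; if true, it is a harmless extern declaration that is never defined. A minimal sketch of the pattern, with hypothetical stand-in structs for ht_identifier/cpp_hashnode:

```c
#include <assert.h>
#include <stddef.h>

/* Pre-C11 compile-time assertion: a false COND yields an array of
   size -1, which the compiler rejects; a true COND yields a harmless
   extern declaration of a 1-element array that is never defined.  */
#define STATIC_ASSERT(name, cond) extern char name[(cond) ? 1 : -1]

/* Hypothetical stand-ins: the hash table hands out pointers to the
   embedded 'ident', and callers cast them back to 'node', which is
   only valid while 'ident' stays at offset 0.  */
struct ident { const unsigned char *str; unsigned int len; };
struct node  { struct ident ident; unsigned int flags; };

STATIC_ASSERT (proxy_assertion_broken, offsetof (struct node, ident) == 0);
```

Moving the str[-4] length ahead of the embedded ht_identifier would push `ident` to a nonzero offset and trip exactly this assertion.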

 Anyway, the last attempt was to decouple ht_identifier completely from trees
 and other structs by storing a pointer to it, but I was pretty worn out and
 quickly gave up after getting errors in gengtype-generated files that I
 didn't even know how to handle.

 Was this whole project too ambitious? I'd appreciate any input.

I think the proper thing would indeed have been to pass down the length of
the string to relevant functions.

Richard.


 Thanks,
 Dimitris



Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Xinliang David Li
On Mon, Aug 20, 2012 at 10:29 PM, Jan Hubicka hubi...@ucw.cz wrote:
 On Mon, Aug 20, 2012 at 6:27 PM, Jan Hubicka hubi...@ucw.cz wrote:
  Xinliang David Li davi...@google.com writes:
  
   Process level synchronization problems can happen when two processes
   (running the instrumented binary) exit at the same time. The
   updated/merged counters from one process may be overwritten by another
   process -- this is true for both counter data and summary data.
   Solution 3) does not introduce any new problems.
 
  You could just use lockf() ?
 
  The issue here is holding locks for all the files (there can be many)
  versus number-of-locks limits & possibilities for deadlocking (mind that
  updating may happen in different orders on the same files for different
  programs built from the same objects).
 
  For David: there is no thread safety code in mainline for the counters.
  Long time ago Zdenek implemented a poor man's TLS for counters (before TLS
  was invented), http://gcc.gnu.org/ml/gcc-patches/2001-11/msg01546.html, but
  it was voted down as too memory expensive per thread. We could optionally
  do atomic updates like ICC, or a combination of both as discussed in the
  thread. So far no one has implemented it, since the coverage fixups seem
  to work well enough in practice for multithreaded programs, where
  reproducibility does not seem to be _that_ important.
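The "atomic updates like ICC" alternative mentioned above can be sketched with C11 atomics: a relaxed fetch-and-add per counter bump, with no per-thread copies. This is an illustration of the idea only, not libgcov code; the names (gcov_type_sim, profile_arc_hit) are made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>

typedef long long gcov_type_sim;	/* Stand-in for gcov_type.  */

/* One way to make arc-counter bumps thread-safe without per-thread
   counter copies: a relaxed atomic add.  Relaxed ordering suffices
   because only the final sums matter, not their interleaving.  */
static _Atomic gcov_type_sim counter;

static void
profile_arc_hit (void)
{
  atomic_fetch_add_explicit (&counter, 1, memory_order_relaxed);
}
```

The trade-off discussed in the thread is exactly the cost of this: an atomic add on every arc execution is considerably slower than a plain increment, which is why the fixup approach won out.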
 
  For GCC profiled bootstrap, however, I would like to see the output binary
  be reproducible. We really ought to make profile updates safe for multiple
  processes. Trashing a whole process run is worse than racing on an
  increment: there is a good chance that one of the runs is more important
  than the others, and it will get trashed.
 
  I do not think we have serious update problems in the summaries at the
  moment. We lock individual files as we update them. The summary is simple
  enough to be safe: sum_all is summed, max_all is the maximum over the
  individual runs. Even when you combine multiple programs the summary will
  end up the same. Everything except max_all is ignored anyway.
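The merge-order safety claimed above follows from both operations being commutative and associative; a minimal sketch of the summary shape (simplified, not the gcov_summary layout):

```c
#include <assert.h>

/* Simplified shape of the per-program profile summary: sum_all adds,
   max_all takes the maximum.  Both operations are commutative and
   associative, so the merged summary comes out the same regardless of
   the order or grouping in which runs arrive.  */
struct summary { long long sum_all; long long max_all; };

static struct summary
merge (struct summary a, struct summary b)
{
  struct summary r;
  r.sum_all = a.sum_all + b.sum_all;
  r.max_all = a.max_all > b.max_all ? a.max_all : b.max_all;
  return r;
}
```

Any merge rule with these two algebraic properties inherits the same safety with respect to parallel runs; the histogram proposals are being judged against exactly this bar.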
 
  Solution 2 (i.e. histogram streaming) will also have the property that it
  is safe WRT multiple programs, just like sum_all.

 I think the sum_all based scaling of the working set entries also has
 this property. What is your opinion on saving the histogram in the

 I think the scaling will have at least roundoff issues WRT different
 merging orders.

 summary and merging histograms together as best as possible compared
 to the alternative of saving the working set information as now and
 scaling it up by the ratio between the new and old sum_all when
 merging?
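The roundoff concern about merge order can be seen with a small integer-arithmetic sketch of the scaling scheme under discussion (this is an illustration of the proposal, not gcov code): scaling a count through an intermediate sum_all truncates twice and can disagree with scaling directly.

```c
#include <assert.h>

/* Scale a working-set count by the ratio of the merged sum_all to the
   old one, in integer arithmetic as a counter merge would do it.  */
static long long
scale (long long count, long long old_sum, long long new_sum)
{
  return count * new_sum / old_sum;
}
```

With count 6 and sum_all going 10 -> 13 -> 17, staged scaling gives 6*13/10 = 7, then 7*17/13 = 9, while direct scaling gives 6*17/10 = 10 — different merge orders, different result.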

 So far I like this option best. But David seems to lean towards the third
 option with whole-file locking.  I see it may show to be more extensible in
 the future. At the moment I do not understand two things
  1) why do we need info on the number of counters above a given threshold,
 since the hot/cold decisions usually depend purely on the count cutoff.
 Removing those would solve the merging issues with variant 2, and then it
 would probably be a good solution.

This is useful for large applications with a long tail. The instruction
working set for those applications is very large, and the inliner and
unroller need to be aware of that; good heuristics can be developed to
throttle aggressive code-bloating transformations. For the inliner, it is
kind of like a global budget, but more application dependent. In the long
run, we will collect more advanced FDO summaries regarding the working set
-- it will be the working-set size for each code region (locality region).


  2) Do we plan to add some features in the near future that will anyway
 require global locking?
 I guess LIPO itself does not count since it streams its data into an
 independent file, as you mentioned earlier, and locking the LIPO file is
 not that hard.
 Does LIPO stream everything into that common file, or does it use a
 combination of gcda files and a common summary?

Actually, LIPO module grouping information is stored in gcda files.
It is also stored in a separate .imports file (one per object) ---
this is primarily used by our build system for dependence information.



 What other stuff does Google plan to merge?
 (In general I would be curious about merging plans WRT profile stuff, so we
 get more synchronized and effective on getting patches in. We have about
 two months to get it done in stage1, and it would be nice to get as much as
 possible. Obviously some of the patches will need a bit of discussion, like
 this one. Hope you do not find it frustrating; I actually think this is an
 important feature).

We plan to merge in the new LIPO implementation based on LTO
streaming. Rong Xu finished this in 4.6 based compiler, and he needs
to port it to 4.8.


thanks,

David


 I also realized today that the common value counters (used by switch, indirect
 call and div/mod value profiling) are 

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Jan Hubicka
 
 This is useful for large applications with a long tail. The instruction
 working set for those applications is very large, and the inliner and
 unroller need to be aware of that; good heuristics can be developed to
 throttle aggressive code-bloating transformations. For the inliner, it is
 kind of like a global budget, but more application dependent. In the long
 run, we will collect more advanced FDO summaries regarding the working set
 -- it will be the working-set size for each code region (locality region).

I see, so you use it to estimate the size of the working set and the effect
of bloating optimizations on cache size. This sounds interesting. What are
your experiences with this?

What concerns me is that it is greatly inaccurate - you have no idea how many
instructions a given counter is guarding, and it can differ quite a lot. Also
inlining/optimization makes working sets significantly different (by a factor
of 100 for tramp3d). But on the other hand, any solution at this level will
be greatly inaccurate. So I am curious how reliable the data you can get
from this is, and how you take it into account in the heuristics?

It seems to me that for this use perhaps the simple logic in histogram
merging of maximizing the number of BBs for a given bucket will work well?
It is inaccurate, but we are working with greatly inaccurate data anyway.
Except for degenerate cases, the small and unimportant runs will have small
BB counts, while large runs will have larger counts, and those are the ones
we optimize for anyway.
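The per-bucket-maximum merge proposed here is tiny to implement; a sketch of the idea (not the gcda histogram format, bucket count chosen arbitrarily):

```c
#include <assert.h>

#define NBUCKETS 8

/* Merge two working-set histograms by taking, per bucket, the maximum
   of the counter counts.  Small unimportant runs with few counters
   then never dilute the histogram of the large runs we actually
   optimize for.  */
static void
merge_hist_max (unsigned dst[NBUCKETS], const unsigned src[NBUCKETS])
{
  for (int i = 0; i < NBUCKETS; i++)
    if (src[i] > dst[i])
      dst[i] = src[i];
}
```

Like sum/max on the summary fields, per-bucket max is commutative and associative, so this merge is also insensitive to the order in which parallel runs land.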
 
 
   2) Do we plan to add some features in the near future that will anyway
  require global locking?
  I guess LIPO itself does not count since it streams its data into an
  independent file, as you mentioned earlier, and locking the LIPO file is
  not that hard.
  Does LIPO stream everything into that common file, or does it use a
  combination of gcda files and a common summary?
 
 Actually, LIPO module grouping information is stored in gcda files.
 It is also stored in a separate .imports file (one per object) ---
 this is primarily used by our build system for dependence information.

I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO
behave on GCC bootstrap? (i.e. it does a lot more work in the libgcov module
per invocation, so I am curious if it is practically useful at all).

With an LTO based solution a lot can probably be pushed to link time? Before
the actual GCC starts from the linker plugin, the LIPO module can read gcov
CFGs from gcda files and do all the merging/updating/CFG construction that is
currently performed at runtime, right?
 
 
 
  What other stuff does Google plan to merge?
  (In general I would be curious about merging plans WRT profile stuff, so
  we get more synchronized and effective on getting patches in. We have
  about two months to get it done in stage1, and it would be nice to get as
  much as possible. Obviously some of the patches will need a bit of
  discussion, like this one. Hope you do not find it frustrating; I actually
  think this is an important feature).
 
 We plan to merge in the new LIPO implementation based on LTO
 streaming. Rong Xu finished this in 4.6 based compiler, and he needs
 to port it to 4.8.

Good.  Looks like a lot of work ahead. It would be nice if we could perhaps
start by merging the libgcov infrastructure updates prior to the LIPO
changes.  From what I saw on the LIPO branch some time ago, it has a lot of
stuff that is not exactly LIPO specific.

Honza
 
 
 thanks,
 
 David
 
 
  I also realized today that the common value counters (used by switch,
  indirect call and div/mod value profiling) are non-stable WRT different
  merging orders (i.e. parallel make in the train run).  I do not think
  there is an actual solution to that except for not merging counter
  sections of this type in libgcov and merging them in some canonical order
  at profile feedback time.  Perhaps we just want to live with this, since
  the discrepancy here is small (i.e. these counters are quite rare and
  their outcome has only a local effect on the final binary, unlike the
  global summaries/edge counters).
 
  Honza
 
  Thanks,
  Teresa
 
  
   Honza
  
   -Andi
 
 
 
  --
  Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: Merge C++ conversion into trunk (0/6 - Overview)

2012-08-21 Thread Richard Guenther
On Tue, Aug 21, 2012 at 3:31 AM, Lawrence Crowl cr...@google.com wrote:
 On 8/20/12, H.J. Lu hjl.to...@gmail.com wrote:
 The C++ merge caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332

 GCC memory usage is more than doubled, from <= 3GB to >= 10GB.
 Is this a known issue?

 The two memory stat reports show no differences.  Are you sure you
 didn't splice in the wrong report?

Well, easy things such as a messed-up hashtab conversion (no freeing)
or vec conversion (no freeing) can cause this, or even the gengtype change
causing GC issues (which is why those should have been different revisions).

Richard.

 --
 Lawrence Crowl


Loop iterations inline hint

2012-08-21 Thread Jan Hubicka

Hi,
this patch adds a hint that if inlining makes the bounds on loop iterations
known, it is probably a good idea.  This is primarily targeting Fortran's
array descriptors, but should be generally useful.

Fortran will still need a bit more work. Often we disregard inlining because
we think the call is cold (because it comes from MAIN), so the inlining
heuristic will need more updating, and apparently we will also need to
update for PHI conditionals as done in Martin's patch 3/3.

At the moment the hint is interpreted same way as the indirect_call hint from
previous patch.

Martin: I think ipa-cp should also make use of this hint. Resolving the
number of loop iterations is an important enough reason to specialize in many
cases. I think it already has logic for devirtualization, but perhaps it
should be made more aggressive? I was sort of surprised that for Mozilla the
inlining hint makes us catch 20 times more cases than before. Most of the
cases sound like good ipa-cp candidates.

Also, can you please try to finally make param notes usable by the virtual
clones machinery and thus make it possible for ipa-cp to specialize for known
aggregate parameters? This should make a lot of difference for Fortran, I
think.

Bootstrapped/regtested x86_64-linux, will commit it after a bit more testing.
Honza

* gcc.dg/ipa/inlinehint-1.c: New.

PR fortran/48636
* ipa-inline.c (want_inline_small_function_p): Take loop_iterations hint.
(edge_badness): Likewise.
* ipa-inline.h (inline_hints_vals): Add INLINE_HINT_loop_iterations.
(inline_summary): Add loop_iterations.
* ipa-inline-analysis.c: Include tree-scalar-evolution.h.
(dump_inline_hints): Dump loop_iterations.
(reset_inline_summary): Free loop_iterations.
(inline_node_duplication_hook): Update loop_iterations.
(dump_inline_summary): Dump loop_iterations.
(will_be_nonconstant_expr_predicate): New function.
(estimate_function_body_sizes): Analyze loops.
(estimate_node_size_and_time): Set hint loop_iterations.
(inline_merge_summary): Merge loop iterations.
(inline_read_section): Stream in loop_iterations.
(inline_write_summary): Stream out loop_iterations.
Index: testsuite/gcc.dg/ipa/inlinehint-1.c
===
*** testsuite/gcc.dg/ipa/inlinehint-1.c (revision 0)
--- testsuite/gcc.dg/ipa/inlinehint-1.c (revision 0)
***
*** 0 
--- 1,16 
+ /* { dg-options "-O3 -c -fdump-ipa-inline-details -fno-early-inlining -fno-ipa-cp" } */
+ test (int a)
+ {
+    int i;
+    for (i = 0; i < a; i++)
+ {
+   test2 (a);
+   test2 (a);
+ }
+ }
+ m ()
+ {
+    test (10);
+ }
+ /* { dg-final { scan-ipa-dump "loop_iterations" "inline" } } */
+ /* { dg-final { cleanup-ipa-dump "inline" } } */
Index: ipa-inline.c
===
*** ipa-inline.c(revision 190510)
--- ipa-inline.c(working copy)
*** want_inline_small_function_p (struct cgr
*** 480,486 ****
 	 hints suggests that inlining given function is very profitable.  */
       else if (DECL_DECLARED_INLINE_P (callee->symbol.decl)
 	       && growth >= MAX_INLINE_INSNS_SINGLE
!	       && !(hints & INLINE_HINT_indirect_call))
 	{
 	  e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT;
 	  want_inline = false;
--- 480,487 ----
 	 hints suggests that inlining given function is very profitable.  */
       else if (DECL_DECLARED_INLINE_P (callee->symbol.decl)
 	       && growth >= MAX_INLINE_INSNS_SINGLE
!	       && !(hints & (INLINE_HINT_indirect_call
!			     | INLINE_HINT_loop_iterations)))
 	{
 	  e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT;
 	  want_inline = false;
*** edge_badness (struct cgraph_edge *edge, 
*** 863,869 ****
       if (dump)
 	fprintf (dump_file, "Badness overflow\n");
     }
!   if (hints & INLINE_HINT_indirect_call)
     badness /= 8;
   if (dump)
     {
--- 864,871 ----
       if (dump)
 	fprintf (dump_file, "Badness overflow\n");
     }
!   if (hints & (INLINE_HINT_indirect_call
!	        | INLINE_HINT_loop_iterations))
     badness /= 8;
   if (dump)
     {
Index: ipa-inline.h
===
*** ipa-inline.h(revision 190510)
--- ipa-inline.h(working copy)
*** typedef struct GTY(()) condition
*** 45,51 ****
  /* Inline hints are reasons why inline heuristics should preffer inlining
     given function.
     They are represtented as bitmap of the following values.  */
  enum inline_hints_vals {
!   INLINE_HINT_indirect_call = 1
  };
  typedef int inline_hints;
  
--- 45,52 ----
  /* Inline hints are reasons why inline heuristics should preffer inlining
     given function.
     They are represtented as bitmap of the 

Re: [wwwdocs] Update Fortran section in 4.8/changes.html

2012-08-21 Thread Tobias Burnus

Gerald Pfeifer wrote:

I went ahead and made some smaller changes, patch below.


Thanks.


I noticed you are using <q>...</q>, as in <q><code>e</code></q>,
which we usually don't.  Why that?


My impression was that a one-letter code didn't stand out enough and 
looked rather odd; if you think it improves consistency or readability, 
feel free to change it.


 * * *

I intend to commit the attached patch to document two new warning flags,
which were recently added. (Suggested in ISO/IEC Technical Report 24772,
"Guidance for Avoiding Vulnerabilities through Language Selection and Use".)


Tobias
Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.17
diff -u -r1.17 changes.html
--- changes.html	20 Aug 2012 12:23:39 -	1.17
+++ changes.html	21 Aug 2012 06:56:55 -
@@ -92,6 +92,21 @@
 (re)allocation in hot loops. (For arrays, replacing <q><code>var=</code></q>
 by <q><code>var(:)=</code></q> disables the automatic reallocation.)</li>
 
+<li>The <a
+href="http://gcc.gnu.org/onlinedocs/gfortran/Error-and-Warning-Options.html">
+<code>-Wcompare-reals</code></a> flag has been added. When this flag is set,
+warnings are issued when comparing <code>REAL</code> or
+<code>COMPLEX</code> types for equality and inequality; consider replacing
+<code>a == b</code> by <code>abs(a&minus;b) &lt; eps</code> with a suitable
+<code>eps</code>. The -Wcompare-reals flag is enabled by
+<code>-Wall</code>.</li>
+
+<li>The <a
+href="http://gcc.gnu.org/onlinedocs/gfortran/Error-and-Warning-Options.html">
+<code>-Wtarget-lifetime</code></a> flag has been added (enabled with
+<code>-Wall</code>), which warns if the pointer in a pointer assignment
+might outlive its target.</li>
+
 <li><p>Reading floating point numbers which use <q><code>q</code></q> for
 the exponential (such as <code>4.0q0</code>) is now supported as vendor
 extension for better compatibility with old data files. It is strongly


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Xinliang David Li
On Mon, Aug 20, 2012 at 11:33 PM, Jan Hubicka hubi...@ucw.cz wrote:

 This is useful for large applications with a long tail. The instruction
 working set for those applications is very large, and the inliner and
 unroller need to be aware of that; good heuristics can be developed to
 throttle aggressive code-bloating transformations. For the inliner, it is
 kind of like a global budget, but more application dependent. In the long
 run, we will collect more advanced FDO summaries regarding the working set
 -- it will be the working-set size for each code region (locality region).

 I see, so you use it to estimate the size of the working set and the effect
 of bloating optimizations on cache size. This sounds interesting. What are
 your experiences with this?

Teresa has done some tunings for the unroller so far. The inliner
tuning is the next step.


 What concerns me is that it is greatly inaccurate - you have no idea how
 many instructions a given counter is guarding, and it can differ quite a
 lot. Also inlining/optimization makes working sets significantly different
 (by a factor of 100 for tramp3d).

The pre ipa-inline working set is the one that is needed for ipa
inliner tuning. For post-ipa inline code increase transformations,
some update is probably needed.

 But on the other hand, any solution at this level will be
 greatly inaccurate. So I am curious how reliable the data you can get from
 this is, and how you take it into account in the heuristics?

This effort is just the first step to allow good heuristics to develop.


 It seems to me that for this use perhaps the simple logic in histogram
 merging of maximizing the number of BBs for a given bucket will work well?
 It is inaccurate, but we are working with greatly inaccurate data anyway.
 Except for degenerate cases, the small and unimportant runs will have small
 BB counts, while large runs will have larger counts, and those are the ones
 we optimize for anyway.

The working set curve for each type of application contains lots of
information that can be mined. The inaccuracy can also be mitigated by
more data 'calibration'.



    2) Do we plan to add some features in the near future that will anyway
   require global locking?
   I guess LIPO itself does not count since it streams its data into an
   independent file, as you mentioned earlier, and locking the LIPO file is
   not that hard.
   Does LIPO stream everything into that common file, or does it use a
   combination of gcda files and a common summary?

 Actually, LIPO module grouping information is stored in gcda files.
 It is also stored in a separate .imports file (one per object) ---
 this is primarily used by our build system for dependence information.

 I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO 
 behave
 on GCC bootstrap?

We have not tried gcc bootstrap with LIPO. Gcc compile time is not the
main problem for application build -- the link time (for debug build)
is.

 (i.e. it does a lot more work in the libgcov module per each
 invocation, so I am curious if it is practically useful at all).

 With an LTO based solution a lot can probably be pushed to link time? Before
 the actual GCC starts from the linker plugin, the LIPO module can read gcov
 CFGs from gcda files and do all the merging/updating/CFG construction that
 is currently performed at runtime, right?

The dynamic cgraph build and analysis is still done at runtime.
However, with the new implementation, the FE is no longer involved. The gcc
driver is modified to understand module grouping, and LTO is used to
merge the streamed output from aux modules.


David



 
  What other stuff does Google plan to merge?
  (In general I would be curious about merging plans WRT profile stuff, so
  we get more synchronized and effective on getting patches in. We have
  about two months to get it done in stage1, and it would be nice to get as
  much as possible. Obviously some of the patches will need a bit of
  discussion, like this one. Hope you do not find it frustrating; I actually
  think this is an important feature).

 We plan to merge in the new LIPO implementation based on LTO
 streaming. Rong Xu finished this in 4.6 based compiler, and he needs
 to port it to 4.8.

 Good.  Looks like a lot of work ahead. It would be nice if we could perhaps
 start by merging the libgcov infrastructure updates prior to the LIPO
 changes.  From what I saw on the LIPO branch some time ago, it has a lot of
 stuff that is not exactly LIPO specific.

 Honza


 thanks,

 David

 
  I also realized today that the common value counters (used by switch,
  indirect call and div/mod value profiling) are non-stable WRT different
  merging orders (i.e. parallel make in the train run).  I do not think
  there is an actual solution to that except for not merging counter
  sections of this type in libgcov and merging them in some canonical order
  at profile feedback time.  Perhaps we just want to live with this, since
  the discrepancy here is small (i.e. 

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Jan Hubicka
 Teresa has done some tunings for the unroller so far. The inliner
 tuning is the next step.
 
 
  What concerns me is that it is greatly inaccurate - you have no idea how
  many instructions a given counter is guarding, and it can differ quite a
  lot. Also inlining/optimization makes working sets significantly different
  (by a factor of 100 for tramp3d).
 
 The pre ipa-inline working set is the one that is needed for ipa
 inliner tuning. For post-ipa inline code increase transformations,
 some update is probably needed.
 
 But on the other hand, any solution at this level will be
  greatly inaccurate. So I am curious how reliable the data you can get from
  this is, and how you take it into account in the heuristics?
 
 This effort is just the first step to allow good heuristics to develop.
 
 
  It seems to me that for this use perhaps the simple logic in histogram
  merging of maximizing the number of BBs for a given bucket will work well?
  It is inaccurate, but we are working with greatly inaccurate data anyway.
  Except for degenerate cases, the small and unimportant runs will have
  small BB counts, while large runs will have larger counts, and those are
  the ones we optimize for anyway.
 
 The working set curve for each type of application contains lots of
 information that can be mined. The inaccuracy can also be mitigated by
 more data 'calibration'.

Sure, I think I am leaning towards trying solution 2) with maximizing
counter-count merging (it would probably make sense to rename it from BB
count, since it is not really a BB count and is thus misleading), and we will
see how well it works in practice.

We get the benefit of far fewer issues with profile locking/unlocking, and we
lose a bit of precision on BB counts. I tend to believe that the error will
not be that important in practice. Another loss is more histogram streaming
into each gcda file, but with skipping zero entries it should not be a major
overhead problem, I hope.
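The zero-skipping idea amounts to streaming the histogram sparsely as (index, value) pairs; a sketch of the scheme, with an encoding made up for illustration rather than libgcov's record format:

```c
#include <assert.h>
#include <string.h>

#define NBUCKETS 16

/* Stream a histogram as (index, value) pairs, skipping zero buckets,
   so mostly-empty histograms add little to each gcda file.  */
static unsigned
write_sparse (const unsigned hist[NBUCKETS], unsigned out[][2])
{
  unsigned n = 0;
  for (unsigned i = 0; i < NBUCKETS; i++)
    if (hist[i])
      {
	out[n][0] = i;
	out[n][1] = hist[i];
	n++;
      }
  return n;			/* Number of pairs actually written.  */
}

static void
read_sparse (unsigned hist[NBUCKETS], unsigned in[][2], unsigned n)
{
  memset (hist, 0, NBUCKETS * sizeof hist[0]);
  for (unsigned i = 0; i < n; i++)
    hist[in[i][0]] = in[i][1];
}
```

A histogram with only a couple of occupied buckets then costs a couple of pairs plus a length word, instead of the full bucket array per function section.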

What do you think?
 
 
 
    2) Do we plan to add some features in the near future that will anyway
   require global locking?
   I guess LIPO itself does not count since it streams its data into an
   independent file, as you mentioned earlier, and locking the LIPO file is
   not that hard.
   Does LIPO stream everything into that common file, or does it use a
   combination of gcda files and a common summary?
 
  Actually, LIPO module grouping information is stored in gcda files.
  It is also stored in a separate .imports file (one per object) ---
  this is primarily used by our build system for dependence information.
 
  I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO 
  behave
  on GCC bootstrap?
 
 We have not tried gcc bootstrap with LIPO. Gcc compile time is not the
 main problem for application build -- the link time (for debug build)
 is.

I was primarily curious how LIPO's runtime analysis fares in the situation
where you do very many small train runs on a rather large app (sure, GCC is
small compared to Google's use cases ;).
 
  (i.e. it does a lot more work in the libgcov module per each
  invocation, so I am curious if it is practically useful at all).
 
  With an LTO based solution a lot can probably be pushed to link time?
  Before the actual GCC starts from the linker plugin, the LIPO module can
  read gcov CFGs from gcda files and do all the merging/updating/CFG
  construction that is currently performed at runtime, right?
 
 The dynamic cgraph build and analysis is still done at runtime.
 However, with the new implementation, the FE is no longer involved. The gcc
 driver is modified to understand module grouping, and LTO is used to
 merge the streamed output from aux modules.

I see. Are there any fundamental reasons why it cannot be done at link time,
when all gcda files are available? Why is the grouping not done inside the
linker plugin?

Honza
 
 
 David


Re: [PATCH, ARM] Don't pull in unwinder for 64-bit division routines

2012-08-21 Thread Ye Joey
On Fri, Aug 17, 2012 at 9:13 AM, Ian Lance Taylor i...@google.com wrote:

 Looks fine to me.

 Ian
Will backport to arm/embedded-4_7-branch. Not sure if it is appropriate for
the 4.7 branch, since it is not a stability problem.

- Joey


Fix Solaris 9/x86 bootstrap

2012-08-21 Thread Rainer Orth
Solaris 9/x86 bootstrap was broken after the cxx-conversion merge:

In file included from /vol/gcc/src/hg/trunk/local/gcc/gengtype.c:957:
/vol/gcc/src/hg/trunk/local/gcc/rtl.def:347: error: expected identifier before numeric constant
/vol/gcc/src/hg/trunk/local/gcc/rtl.def:347: error: expected '}' before numeric constant

This happens since g++, unlike gcc, defines __EXTENSIONS__, which
exposes the equivalent of

#define PC 14

Initially I tried to avoid this by having gengtype.c include rtl.h,
which already has the #undef, but this produced so much fallout that I
decided it's better to just replicate it here.
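The failure mode described above is easy to reproduce in miniature: a system macro that expands to a number breaks any later use of that name as an identifier, producing exactly the diagnostic quoted. A sketch (the macro here stands in for what the Solaris headers expose under __EXTENSIONS__):

```c
#include <assert.h>

/* Simulate a system header that predefines PC (Solaris <sys/regset.h>
   via __EXTENSIONS__, which g++ defines but gcc does not).  */
#define PC 14

/* Without this #undef, 'PC' in the enum below would macro-expand to
   '14', giving the "expected identifier before numeric constant"
   error quoted above.  */
#undef PC

enum rtx_code_demo { UNKNOWN, PC, LABEL_REF };
```

After the #undef, PC is an ordinary enumerator again, which is what the gengtype.c patch restores.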

The patch allowed an i386-pc-solaris2.9 bootstrap to finish.  I think
this counts as obvious unless someone prefers the rtl.h route
nonetheless.

Ok for mainline?

Rainer


2012-08-20  Rainer Orth  r...@cebitec.uni-bielefeld.de

* gengtype.c (PC): Undef.

# HG changeset patch
# Parent cf74f0e72cab4965ba20bf236eac2fac2b87064e
Fix Solaris 9 bootstrap

diff --git a/gcc/gengtype.c b/gcc/gengtype.c
--- a/gcc/gengtype.c
+++ b/gcc/gengtype.c
@@ -35,6 +35,8 @@
 #include "gengtype.h"
 #include "filenames.h"
 
+#undef PC /* Some systems predefine this symbol; don't let it interfere.  */
+
 /* Data types, macros, etc. used only in this file.  */
 
 

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Fix Solaris 9/x86 bootstrap

2012-08-21 Thread Richard Guenther
On Tue, Aug 21, 2012 at 10:53 AM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 Solaris 9/x86 bootstrap was broken after the cxx-conversion merge:

 In file included from /vol/gcc/src/hg/trunk/local/gcc/gengtype.c:957:
 /vol/gcc/src/hg/trunk/local/gcc/rtl.def:347: error: expected identifier before numeric constant
 /vol/gcc/src/hg/trunk/local/gcc/rtl.def:347: error: expected '}' before numeric constant

 This happens since g++, unlike gcc, defines __EXTENSIONS__, which
 exposes the equivalent of

 #define PC 14

 Initially I tried to avoid this by having gengtype.c include rtl.h,
 which already has the #undef, but this produced so much fallout that I
 decided it's better to just replicate it here.

 The patch allowed an i386-pc-solaris2.9 bootstrap to finish.  I think
 this counts as obvious unless someone prefers the rtl.h route
 nonetheless.

 Ok for mainline?

Doesn't that belong in system.h instead?  And removed from rtl.h?

Thanks,
Richard.

 Rainer


 2012-08-20  Rainer Orth  r...@cebitec.uni-bielefeld.de

 * gengtype.c (PC): Undef.



 --
 -
 Rainer Orth, Center for Biotechnology, Bielefeld University



[SH] PR 39423 - Add support for SH2A movu.w insn

2012-08-21 Thread Oleg Endo
Hello,

This adds support for SH2A's movu.w insn for memory addressing cases as
described in the PR.
Tested on rev 190546 with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

ChangeLog:

PR target/39423
* config/sh/sh.md (*movhi_index_disp): Add support for SH2A 
movu.w insn.

testsuite/ChangeLog:

PR target/39423
* gcc.target/sh/pr39423-2.c: New.
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 190459)
+++ gcc/config/sh/sh.md	(working copy)
@@ -5667,12 +5667,35 @@
(clobber (reg:SI T_REG))]
   TARGET_SH1
   #
-   1
-  [(parallel [(set (match_dup 0) (sign_extend:SI (match_dup 1)))
-	  (clobber (reg:SI T_REG))])
-   (set (match_dup 0) (zero_extend:SI (match_dup 2)))]
+   can_create_pseudo_p ()
+  [(const_int 0)]
 {
-  operands[2] = gen_lowpart (HImode, operands[0]);
+  rtx mem = operands[1];
+  rtx plus0_rtx = XEXP (mem, 0);
+  rtx plus1_rtx = XEXP (plus0_rtx, 0);
+  rtx mult_rtx = XEXP (plus1_rtx, 0);
+
+  rtx op_1 = XEXP (mult_rtx, 0);
+  rtx op_2 = GEN_INT (exact_log2 (INTVAL (XEXP (mult_rtx, 1))));
+  rtx op_3 = XEXP (plus1_rtx, 1);
+  rtx op_4 = XEXP (plus0_rtx, 1);
+  rtx op_5 = gen_reg_rtx (SImode);
+  rtx op_6 = gen_reg_rtx (SImode);
+  rtx op_7 = replace_equiv_address (mem, gen_rtx_PLUS (SImode, op_6, op_4));
+
+  emit_insn (gen_ashlsi3 (op_5, op_1, op_2));
+  emit_insn (gen_addsi3 (op_6, op_5, op_3));
+
+  /* On SH2A the movu.w insn can be used for zero extending loads.  */
+  if (TARGET_SH2A)
+emit_insn (gen_zero_extendhisi2 (operands[0], op_7));
+  else
+{
+  emit_insn (gen_extendhisi2 (operands[0], op_7));
+  emit_insn (gen_zero_extendhisi2 (operands[0],
+   gen_lowpart (HImode, operands[0])));
+}
+  DONE;
 })
 
 (define_insn_and_split "*movsi_index_disp"
Index: gcc/testsuite/gcc.target/sh/pr39423-2.c
===
--- gcc/testsuite/gcc.target/sh/pr39423-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr39423-2.c	(revision 0)
@@ -0,0 +1,14 @@
+/* Check that displacement addressing is used for indexed addresses with a
+   small offset, instead of re-calculating the index and that the movu.w
+   instruction is used on SH2A.  */
+/* { dg-do compile { target sh*-*-* } } */
+/* { dg-options "-O2" } */
+/* { dg-skip-if "" { sh*-*-* } { "*" } { "-m2a*" } } */
+/* { dg-final { scan-assembler-not "add\t#1" } } */
+/* { dg-final { scan-assembler "movu.w" } } */
+
+int
+test_00 (unsigned short tab[], int index)
+{
+  return tab[index + 1];
+}


Re: [PATCH][RFC] Move TREE_VEC length and SSA_NAME version into tree_base

2012-08-21 Thread Richard Guenther
On Mon, 20 Aug 2012, Richard Guenther wrote:

 
 This shrinks TREE_VEC from 40 bytes to 32 bytes and SSA_NAME from
 80 bytes to 72 bytes on a 64bit host.  Both structures suffer
 from the fact they need storage for an integer (length and version)
 which leaves unused padding.  Both data structures do not require
 as many flag bits as we keep in tree_base though, so they can
 conveniently use the upper 4-bytes of the 8-bytes tree_base to
 store length / version.
 
 I added a union to tree_base to divide up the space between flags
 (possibly) used for all tree kinds and flags that are not used
 for those who chose to re-use the upper 4-bytes of tree_base for
 something else.
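A minimal sketch of the layout described above (field names are abbreviated and hypothetical; the real tree_base carries many more flag bits):

```cpp
#include <cstddef>

// Lower 4 bytes: code plus flags valid for every tree code.  Upper 4
// bytes: a union, so codes that don't need the per-frontend flag bits
// (TREE_VEC, SSA_NAME) can keep their length/version there instead of
// in their own structures.
struct tree_base_sketch
{
  unsigned code : 16;
  unsigned side_effects_flag : 1;
  unsigned addressable_flag : 1;
  // ... further flags shared by all tree codes ...
  union
  {
    struct
    {
      unsigned lang_flag_0 : 1;
      unsigned lang_flag_1 : 1;
      // ... flags only some tree codes need ...
    } bits;
    unsigned int length;   // re-used by TREE_VEC
    unsigned int version;  // re-used by SSA_NAME
  } u;
};
```

The whole base still fits in 8 bytes, which is where the 40-to-32 and 80-to-72 byte savings come from.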
 
 This supersedes the patch that moved the C++ specific usage of
 TREE_CHAIN on TREE_VECs to tree_base (same savings, but TREE_VEC
 isn't any closer to being based on tree_base only).
 
 Due to re-use of flags by frontends, definitive checking for
 flag accesses is not always possible (TREE_NOTHROW for example).
 Where appropriate I added TREE_NOT_CHECK (NODE, TREE_VEC) instead,
 to catch mis-uses by the C++ frontend.  Changed ARGUMENT_PACK_INCOMPLETE_P
 to use TREE_ADDRESSABLE instead of TREE_LANG_FLAG_0, which it
 used on TREE_VECs.
 
 We are very lazy adjusting flag usage documentation :/
 
 Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

After discussion on IRC I added !SSA_NAME checking to the lang flag
accessors.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-08-21  Richard Guenther  rguent...@suse.de

cp/
* cp-tree.h (TREE_INDIRECT_USING): Use TREE_LANG_FLAG_0 accessor.
(ATTR_IS_DEPENDENT): Likewise.
(ARGUMENT_PACK_INCOMPLETE_P): Use TREE_ADDRESSABLE instead of
TREE_LANG_FLAG_0 on TREE_VECs.

* tree.h (struct tree_base): Add union to make it possible to
re-use the upper 4 bytes for tree codes that do not need as
many flags as others.  Move visited and default_def_flag to
common bits section in exchange for saturating_flag and
unsigned_flag.  Add SSA name version and tree vec length
fields here.
(struct tree_vec): Remove length field here.
(struct tree_ssa_name): Remove version field here.

Index: trunk/gcc/cp/cp-tree.h
===
*** trunk.orig/gcc/cp/cp-tree.h 2012-08-20 12:47:47.0 +0200
--- trunk/gcc/cp/cp-tree.h  2012-08-20 13:53:05.212969994 +0200
*** struct GTY((variable_size)) lang_decl {
*** 2520,2530 
  
  /* In a TREE_LIST concatenating using directives, indicate indirect
 directives  */
! #define TREE_INDIRECT_USING(NODE) (TREE_LIST_CHECK (NODE)->base.lang_flag_0)
  
  /* In a TREE_LIST in an attribute list, indicates that the attribute
 must be applied at instantiation time.  */
! #define ATTR_IS_DEPENDENT(NODE) (TREE_LIST_CHECK (NODE)->base.lang_flag_0)
  
  extern tree decl_shadowed_for_var_lookup (tree);
  extern void decl_shadowed_for_var_insert (tree, tree);
--- 2520,2530 
  
  /* In a TREE_LIST concatenating using directives, indicate indirect
 directives  */
! #define TREE_INDIRECT_USING(NODE) TREE_LANG_FLAG_0 (TREE_LIST_CHECK (NODE))
  
  /* In a TREE_LIST in an attribute list, indicates that the attribute
 must be applied at instantiation time.  */
! #define ATTR_IS_DEPENDENT(NODE) TREE_LANG_FLAG_0 (TREE_LIST_CHECK (NODE))
  
  extern tree decl_shadowed_for_var_lookup (tree);
  extern void decl_shadowed_for_var_insert (tree, tree);
*** extern void decl_shadowed_for_var_insert
*** 2881,2887 
 arguments will be placed into the beginning of the argument pack,
 but additional arguments might still be deduced.  */
  #define ARGUMENT_PACK_INCOMPLETE_P(NODE)\
!   TREE_LANG_FLAG_0 (ARGUMENT_PACK_ARGS (NODE))
  
  /* When ARGUMENT_PACK_INCOMPLETE_P, stores the explicit template
 arguments used to fill this pack.  */
--- 2881,2887 
 arguments will be placed into the beginning of the argument pack,
 but additional arguments might still be deduced.  */
  #define ARGUMENT_PACK_INCOMPLETE_P(NODE)\
!   TREE_ADDRESSABLE (ARGUMENT_PACK_ARGS (NODE))
  
  /* When ARGUMENT_PACK_INCOMPLETE_P, stores the explicit template
 arguments used to fill this pack.  */
Index: trunk/gcc/tree.h
===
*** trunk.orig/gcc/tree.h   2012-08-20 12:47:47.0 +0200
--- trunk/gcc/tree.h2012-08-21 10:32:47.717394657 +0200
*** enum omp_clause_code
*** 417,423 
 so all nodes have these fields.
  
 See the accessor macros, defined below, for documentation of the
!fields.  */
  
  struct GTY(()) tree_base {
ENUM_BITFIELD(tree_code) code : 16;
--- 417,424 
 so all nodes have these fields.
  
 See the accessor macros, defined below, for documentation of the
!fields, and the table below which connects the fileds and the
!accessor macros.  */

Re: [PATCH][RFC] Move TREE_VEC length and SSA_NAME version into tree_base

2012-08-21 Thread Jay Foad
On 21 August 2012 10:58, Richard Guenther rguent...@suse.de wrote:
 Index: trunk/gcc/tree.h
 ===
 *** trunk.orig/gcc/tree.h   2012-08-20 12:47:47.0 +0200
 --- trunk/gcc/tree.h2012-08-21 10:32:47.717394657 +0200
 *** enum omp_clause_code
 *** 417,423 
  so all nodes have these fields.

  See the accessor macros, defined below, for documentation of the
 !fields.  */

   struct GTY(()) tree_base {
 ENUM_BITFIELD(tree_code) code : 16;
 --- 417,424 
  so all nodes have these fields.

  See the accessor macros, defined below, for documentation of the
 !fields, and the table below which connects the fileds and the
 !accessor macros.  */

Typo fileds.

Jay.


Re: [PATCH] Add valgrind support to alloc-pool.c

2012-08-21 Thread Richard Guenther
On Sat, Aug 18, 2012 at 9:56 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Sat, Aug 18, 2012 at 6:17 AM, Andrew Pinski pins...@gmail.com wrote:
 Hi,
   I implemented this patch almost 6 years ago when the df branch was
 being worked on.  It adds valgrind support to alloc-pool.c to catch
 cases of using memory after it has been freed.

 OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

 Ok.

It doesn't work.  Did you check with valgrind checking?

/space/rguenther/tramp3d/trunk/gcc/alloc-pool.c: In function 'void*
pool_alloc(alloc_pool)':
/space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:250:3: error: expected
primary-expression before 'int'
/space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:250:3: error: expected
')' before 'int'
/space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:250:3: error: expected
')' before ';' token
/space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:263:3: error: 'size'
was not declared in this scope
/space/rguenther/tramp3d/trunk/gcc/alloc-pool.c:303:7: error: 'size'
was not declared in this scope

that's because VALGRIND_DISCARD is not what you think it is.

Testing a fix ...

Richard.
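The failure mode can be reproduced with a simplified stand-in for the macro (this is not valgrind's real definition): VALGRIND_DISCARD wraps an *expression* whose result is thrown away, so when checking is enabled it cannot contain a declaration like `int size`.

```cpp
#include <cassert>

// Simplified stand-ins, not valgrind's real macros.  With checking
// enabled, VALGRIND_DISCARD (int size); would expand to
// ((void) (int size)) -- exactly the "expected primary-expression
// before 'int'" errors quoted above.
#ifdef ENABLE_VALGRIND_CHECKING
# define VALGRIND_DISCARD(expr) ((void) (expr))
#else
# define VALGRIND_DISCARD(expr) /* empty: a declaration would vanish */
#endif

// The shape of the fix: guard the declaration itself with the
// configure-time macro instead of hiding it inside VALGRIND_DISCARD.
inline int annotated_size (int elt_size, int header)
{
#ifdef ENABLE_VALGRIND_CHECKING
  int size = elt_size - header;   // only exists when checking is on
  VALGRIND_DISCARD (size);
  return size;
#else
  return elt_size - header;
#endif
}
```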

 Thanks,
 Richard.

 Thanks,
 Andrew Pinski

 ChangeLog:
 * alloc-pool.c (pool_alloc): Add valgrind markers.
 (pool_free): Likewise.


[C++ PATCH] Add overflow checking to __cxa_vec_new[23]

2012-08-21 Thread Florian Weimer
I don't think there are any callers out there, but let's fix this for 
completeness.


A compiler emitting code to call this function would still have to 
perform overflow checks for the new T[n][m] case, so this interface is 
not as helpful as it looks at first glance.
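The two checks the patch adds can be sketched stand-alone (the names mirror the compute_size function in the patch below; this sketch is not the installed code):

```cpp
#include <cstddef>
#include <new>

// Reject element_count * element_size + padding_size whenever the
// multiplication or the addition would wrap around std::size_t.
inline std::size_t checked_array_size (std::size_t element_count,
                                       std::size_t element_size,
                                       std::size_t padding_size)
{
  // Multiplication overflow: count exceeds SIZE_MAX / size.
  if (element_size != 0
      && element_count > std::size_t (-1) / element_size)
    throw std::bad_alloc ();
  std::size_t size = element_count * element_size;
  // Addition overflow: adding the cookie padding wrapped.
  if (size + padding_size < size)
    throw std::bad_alloc ();
  return size + padding_size;
}
```

Throwing `std::bad_alloc` lets existing callers that already handle allocation failure pick up the overflow case for free.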


Tested on x86_64-redhat-linux-gnu.

--
Florian Weimer / Red Hat Product Security Team
2012-08-21  Florian Weimer  fwei...@redhat.com

	* libsupc++/vec.cc (compute_size): New function.
	(__cxa_vec_new2, __cxa_vec_new3): Use it.

2012-08-21  Florian Weimer  fwei...@redhat.com

	* g++.old-deja/g++.abi/cxa_vec.C (test5, test6): New.

diff --git a/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C b/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C
index f3d602f..e2b82e7 100644
--- a/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C
+++ b/gcc/testsuite/g++.old-deja/g++.abi/cxa_vec.C
@@ -8,7 +8,7 @@
 // Avoid use of none-overridable new/delete operators in shared
 // { dg-options "-static" { target *-*-mingw* } }
 // Test __cxa_vec routines
-// Copyright (C) 2000, 2005 Free Software Foundation, Inc.
+// Copyright (C) 2000-2012 Free Software Foundation, Inc.
 // Contributed by Nathan Sidwell 7 Apr 2000 nathan@nat...@codesourcery.com
 
 #if defined (__GXX_ABI_VERSION) && __GXX_ABI_VERSION >= 100
@@ -255,6 +255,80 @@ void test4 ()
   return;
 }
 
+static const std::size_t large_size = std::size_t(1) << (sizeof(std::size_t) * 8 - 2);
+
+// allocate an array whose size causes an overflow during multiplication
+void test5 ()
+{
+  static bool started = false;
+
+  if (!started)
+{
+  started = true;
+  std::set_terminate (test0);
+
+  ctor_count = dtor_count = 1;
+  dtor_repeat = false;
+  blocks = 0;
+
+  try
+{
+  void *ary = abi::__cxa_vec_new (large_size, 4, padding, ctor, dtor);
+	  longjmp (jump, 1);
+}
+  catch (std::bad_alloc)
+	{
+	  if (ctor_count != 1)
+	longjmp (jump, 4);
+	}
+  catch (...)
+{
+  longjmp (jump, 2);
+}
+}
+  else
+{
+  longjmp (jump, 3);
+}
+  return;
+}
+
+// allocate an array whose size causes an overflow during addition
+void test6 ()
+{
+  static bool started = false;
+
+  if (!started)
+{
+  started = true;
+  std::set_terminate (test0);
+
+  ctor_count = dtor_count = 1;
+  dtor_repeat = false;
+  blocks = 0;
+
+  try
+{
+  void *ary = abi::__cxa_vec_new (std::size_t(-1) / 4, 4, padding, ctor, dtor);
+	  longjmp (jump, 1);
+}
+  catch (std::bad_alloc)
+	{
+	  if (ctor_count != 1)
+	longjmp (jump, 4);
+	}
+  catch (...)
+{
+  longjmp (jump, 2);
+}
+}
+  else
+{
+  longjmp (jump, 3);
+}
+  return;
+}
+
 static void (*tests[])() =
 {
   test0,
@@ -262,6 +336,8 @@ static void (*tests[])() =
   test2,
   test3,
   test4,
+  test5,
+  test6,
   NULL
 };
 
diff --git a/libstdc++-v3/libsupc++/vec.cc b/libstdc++-v3/libsupc++/vec.cc
index 700c5ef..bfce117 100644
--- a/libstdc++-v3/libsupc++/vec.cc
+++ b/libstdc++-v3/libsupc++/vec.cc
@@ -1,6 +1,6 @@
 // New abi Support -*- C++ -*-
 
-// Copyright (C) 2000, 2001, 2003, 2004, 2009, 2011
+// Copyright (C) 2000-2012
 // Free Software Foundation, Inc.
 //  
 // This file is part of GCC.
@@ -59,6 +59,19 @@ namespace __cxxabiv1
   globals->caughtExceptions = p->nextException;
   globals->uncaughtExceptions += 1;
 }
+
+// Compute the total size with overflow checking.
+std::size_t compute_size(std::size_t element_count,
+			 std::size_t element_size,
+			 std::size_t padding_size)
+{
+  if (element_size && element_count > std::size_t(-1) / element_size)
+	throw std::bad_alloc();
+  std::size_t size = element_count * element_size;
+  if (size + padding_size < size)
+	throw std::bad_alloc();
+  return size + padding_size;
+}
   }
 
   // Allocate and construct array.
@@ -83,7 +96,8 @@ namespace __cxxabiv1
 		 void *(*alloc) (std::size_t),
 		 void (*dealloc) (void *))
   {
-std::size_t size = element_count * element_size + padding_size;
+std::size_t size
+  = compute_size(element_count, element_size, padding_size);
 char *base = static_cast<char *> (alloc (size));
 if (!base)
   return base;
@@ -124,7 +138,8 @@ namespace __cxxabiv1
 		 void *(*alloc) (std::size_t),
 		 void (*dealloc) (void *, std::size_t))
   {
-std::size_t size = element_count * element_size + padding_size;
+std::size_t size
+  = compute_size(element_count, element_size, padding_size);
 char *base = static_cast<char *> (alloc (size));
 if (!base)
   return base;


[SH] Use more multi-line asm outputs

2012-08-21 Thread Oleg Endo
Hello,

This mainly converts the asm outputs to multi-line strings and uses tab
chars instead of '\\t' in the asm strings, in the hope to make stuff
easier to read and a bit more consistent.
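The conversion relies on plain C string-literal concatenation; a minimal sketch of the before/after shapes (operands abbreviated, not taken verbatim from sh.md):

```cpp
#include <cstring>

// Old style: one long string, instructions separated by "\;" (the md
// instruction separator, written "\\;" in a C literal) and tabs spelled
// "\t" or "\\t".
const char* old_style =
  "tst\t%S0,%S0\\;bf\t0f\\;tst\t%R0,%R0\\n0:";

// New style: adjacent string literals concatenate at compile time, so
// the template can be written one instruction per line with real tabs.
const char* new_style =
  "tst\t%S0,%S0\n"
  "\tbf\t0f\n"
  "\ttst\t%R0,%R0\n"
  "0:";
```

Both forms produce a single template string; only the source-level readability changes.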
Tested on rev 190546 with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

and no new failures.
OK?

Cheers,
Oleg

ChangeLog:

* config/sh/sh.md (cmpeqdi_t, cmpgtdi_t, cmpgedi_t, cmpgeudi_t,
cmpgtudi_t, *movsicc_t_false, *movsicc_t_true, divsi_inv20, 
negsi_cond, truncdihi2, ic_invalidate_line_i, 
ic_invalidate_line_sh4a, ic_invalidate_line_media, movdf_i4,
calli_pcrel, call_valuei, call_valuei_pcrel, sibcalli_pcrel,
sibcall_compact, sibcall_valuei_pcrel, sibcall_value_compact,
casesi_worker_1, casesi_worker_2, bandreg_m2a, borreg_m2a, 
bxorreg_m2a, sp_switch_1, sp_switch_2, stack_protect_set_si,
stack_protect_set_si_media, stack_protect_set_di_media,
stack_protect_test_si, stack_protect_test_si_media,
stack_protect_test_di_media): Convert to multi-line asm output
strings.
(divsi_inv_qitable, divsi_inv_hitable): Use single-alternative 
asm output.
(*andsi3_bclr, rotldi3_mextr, rotrdi3_mextr, calli, 
call_valuei_tbr_rel, movml_push_banked, movml_pop_banked, 
bclr_m2a, bclrmem_m2a, bset_m2a, bsetmem_m2a, bst_m2a, bld_m2a, 
bldsign_m2a, bld_reg, *bld_regqi, band_m2a, bor_m2a, bxor_m2a, 
mextr_rl, *mextr_lr, ): Use tab char instead of '\\t'.
(iordi3): Use braced string.
(*movsi_pop): Use tab chars instead of spaces.
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 190546)
+++ gcc/config/sh/sh.md	(working copy)
@@ -541,12 +541,10 @@
 
 ;; On the SH and SH2, the rte instruction reads the return pc from the stack,
 ;; and thus we can't put a pop instruction in its delay slot.
-;; ??? On the SH3, the rte instruction does not use the stack, so a pop
+;; On the SH3 and SH4, the rte instruction does not use the stack, so a pop
 ;; instruction can go in the delay slot.
-
 ;; Since a normal return (rts) implicitly uses the PR register,
 ;; we can't allow PR register loads in an rts delay slot.
-
 (define_delay
   (eq_attr "type" "return")
   [(and (eq_attr "in_delay_slot" "yes")
@@ -1154,9 +1152,21 @@
 	(eq:SI (match_operand:DI 0 "arith_reg_operand" "r,r")
 	   (match_operand:DI 1 "arith_reg_or_0_operand" "N,r")))]
   "TARGET_SH1"
-  "@
-	tst	%S0,%S0\;bf	%,Ldi%=\;tst	%R0,%R0\\n%,Ldi%=:
-	cmp/eq	%S1,%S0\;bf	%,Ldi%=\;cmp/eq	%R1,%R0\\n%,Ldi%=:"
+{
+  static const char* alt[] =
+  {
+    "tst	%S0,%S0	\n"
+    "	bf	0f		\n"
+    "	tst	%R0,%R0	\n"
+    "0:",
+
+    "cmp/eq	%S1,%S0	\n"
+    "	bf	0f		\n"
+    "	cmp/eq	%R1,%R0	\n"
+    "0:"
+  };
+  return alt[which_alternative];
+}
   [(set_attr "length" "6")
    (set_attr "type" "arith3b")])
 
@@ -1189,9 +1199,23 @@
 	(gt:SI (match_operand:DI 0 "arith_reg_operand" "r,r")
 	   (match_operand:DI 1 "arith_reg_or_0_operand" "r,N")))]
   "TARGET_SH2"
-  "@
-	cmp/eq\\t%S1,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/gt\\t%S1,%S0\;cmp/hi\\t%R1,%R0\\n%,Ldi%=:
-	tst\\t%S0,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/pl\\t%S0\;cmp/hi\\t%S0,%R0\\n%,Ldi%=:"
+{
+  static const char* alt[] =
+  {
+    "cmp/eq	%S1,%S0	\n"
+    "	bf{.|/}s	0f	\n"
+    "	cmp/gt	%S1,%S0	\n"
+    "	cmp/hi	%R1,%R0	\n"
+    "0:",
+
+    "tst	%S0,%S0	\n"
+    "	bf{.|/}s	0f	\n"
+    "	cmp/pl	%S0		\n"
+    "	cmp/hi	%S0,%R0	\n"
+    "0:"
+  };
+  return alt[which_alternative];
+}
   [(set_attr "length" "8")
    (set_attr "type" "arith3")])
 
@@ -1200,9 +1224,19 @@
 	(ge:SI (match_operand:DI 0 "arith_reg_operand" "r,r")
 	   (match_operand:DI 1 "arith_reg_or_0_operand" "r,N")))]
   "TARGET_SH2"
-  "@
-	cmp/eq\\t%S1,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/ge\\t%S1,%S0\;cmp/hs\\t%R1,%R0\\n%,Ldi%=:
-	cmp/pz\\t%S0"
+{
+  static const char* alt[] =
+  {
+    "cmp/eq	%S1,%S0	\n"
+    "	bf{.|/}s	0f	\n"
+    "	cmp/ge	%S1,%S0	\n"
+    "	cmp/hs	%R1,%R0	\n"
+    "0:",
+
+    "cmp/pz	%S0"
+  };
+  return alt[which_alternative];
+}
   [(set_attr "length" "8,2")
    (set_attr "type" "arith3,mt_group")])
 
@@ -1215,7 +1249,13 @@
 	(geu:SI (match_operand:DI 0 "arith_reg_operand" "r")
 		(match_operand:DI 1 "arith_reg_operand" "r")))]
   "TARGET_SH2"
-  "cmp/eq\\t%S1,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/hs\\t%S1,%S0\;cmp/hs\\t%R1,%R0\\n%,Ldi%=:"
+{
+  return "cmp/eq	%S1,%S0	\n"
+	 "	bf{.|/}s	0f	\n"
+	 "	cmp/hs	%S1,%S0	\n"
+	 "	cmp/hs	%R1,%R0	\n"
+	 "0:";
+}
   [(set_attr "length" "8")
    (set_attr "type" "arith3")])
 
@@ -1224,7 +1264,13 @@
 	(gtu:SI (match_operand:DI 0 "arith_reg_operand" "r")
 		(match_operand:DI 1 "arith_reg_operand" "r")))]
   "TARGET_SH2"
-  "cmp/eq\\t%S1,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/hi\\t%S1,%S0\;cmp/hi\\t%R1,%R0\\n%,Ldi%=:"
+{
+  return "cmp/eq	%S1,%S0	\n"
+	 "	bf{.|/}s	0f	\n"
+	 "	cmp/hi	%S1,%S0	\n"
+	 "	cmp/hi	%R1,%R0	\n"
+	 "0:";
+}
   [(set_attr "length" "8")
    (set_attr "type" "arith3")])
 
@@ -1276,7 +1322,7 @@
   "cmpgtu	%N1, %N2, %0"
   [(set_attr "type" "cmp_media")])
 
-; These two patterns 

[PATCH] Fix more leaks

2012-08-21 Thread Richard Guenther

This fixes a few more heap leaks.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2012-08-21  Richard Guenther  rguent...@suse.de

* tree-ssa-loop-im.c (tree_ssa_lim_finalize): Properly free
the affine expansion cache.
* tree-ssa-dom.c (free_expr_hash_elt_contents): New function,
split out from ...
(free_expr_hash_elt): ... this one.
(record_cond): Properly free a not needed hashtable element.
(lookup_avail_expr): Likewise.
* tree-into-ssa.c (init_ssa_renamer): Specify a free function
for the var_infos hashtable.
(update_ssa): Likewise.

Index: gcc/tree-ssa-loop-im.c
===
*** gcc/tree-ssa-loop-im.c  (revision 190533)
--- gcc/tree-ssa-loop-im.c  (working copy)
*** tree_ssa_lim_finalize (void)
*** 2634,2640 
VEC_free (bitmap, heap, memory_accesses.all_refs_stored_in_loop);
  
if (memory_accesses.ttae_cache)
! pointer_map_destroy (memory_accesses.ttae_cache);
  }
  
  /* Moves invariants from loops.  Only expensive invariants are moved out --
--- 2634,2640 
VEC_free (bitmap, heap, memory_accesses.all_refs_stored_in_loop);
  
if (memory_accesses.ttae_cache)
 ! free_affine_expand_cache (&memory_accesses.ttae_cache);
  }
  
  /* Moves invariants from loops.  Only expensive invariants are moved out --
Index: gcc/tree-ssa-dom.c
===
*** gcc/tree-ssa-dom.c  (revision 190533)
--- gcc/tree-ssa-dom.c  (working copy)
*** print_expr_hash_elt (FILE * stream, cons
*** 649,667 
  }
  }
  
! /* Delete an expr_hash_elt and reclaim its storage.  */
  
  static void
! free_expr_hash_elt (void *elt)
  {
-   struct expr_hash_elt *element = ((struct expr_hash_elt *)elt);
- 
    if (element->expr.kind == EXPR_CALL)
  free (element->expr.ops.call.args);
! 
!   if (element->expr.kind == EXPR_PHI)
  free (element->expr.ops.phi.args);
  
free (element);
  }
  
--- 649,672 
  }
  }
  
! /* Delete variable sized pieces of the expr_hash_elt ELEMENT.  */
  
  static void
! free_expr_hash_elt_contents (struct expr_hash_elt *element)
  {
    if (element->expr.kind == EXPR_CALL)
  free (element->expr.ops.call.args);
!   else if (element->expr.kind == EXPR_PHI)
  free (element->expr.ops.phi.args);
+ }
+ 
+ /* Delete an expr_hash_elt and reclaim its storage.  */
  
+ static void
+ free_expr_hash_elt (void *elt)
+ {
+   struct expr_hash_elt *element = ((struct expr_hash_elt *)elt);
+   free_expr_hash_elt_contents (element);
free (element);
  }
  
*** lookup_avail_expr (gimple stmt, bool ins
*** 2404,2412 
 slot = htab_find_slot_with_hash (avail_exprs, &element, element.hash,
   (insert ? INSERT : NO_INSERT));
if (slot == NULL)
! return NULL_TREE;
! 
!   if (*slot == NULL)
  {
struct expr_hash_elt *element2 = XNEW (struct expr_hash_elt);
*element2 = element;
--- 2409,2419 
 slot = htab_find_slot_with_hash (avail_exprs, &element, element.hash,
   (insert ? INSERT : NO_INSERT));
if (slot == NULL)
! {
 !   free_expr_hash_elt_contents (&element);
!   return NULL_TREE;
! }
!   else if (*slot == NULL)
  {
struct expr_hash_elt *element2 = XNEW (struct expr_hash_elt);
*element2 = element;
*** lookup_avail_expr (gimple stmt, bool ins
*** 2422,2427 
--- 2429,2436 
VEC_safe_push (expr_hash_elt_t, heap, avail_exprs_stack, element2);
return NULL_TREE;
  }
+   else
 + free_expr_hash_elt_contents (&element);
  
/* Extract the LHS of the assignment so that it can be used as the current
   definition of another variable.  */
Index: gcc/tree-into-ssa.c
===
*** gcc/tree-into-ssa.c (revision 190533)
--- gcc/tree-into-ssa.c (working copy)
*** init_ssa_renamer (void)
*** 2291,2297 
/* Allocate memory for the DEF_BLOCKS hash table.  */
gcc_assert (var_infos == NULL);
 var_infos = htab_create (VEC_length (tree, cfun->local_decls),
!  var_info_hash, var_info_eq, NULL);
  
bitmap_obstack_initialize (update_ssa_obstack);
  }
--- 2291,2297 
/* Allocate memory for the DEF_BLOCKS hash table.  */
gcc_assert (var_infos == NULL);
 var_infos = htab_create (VEC_length (tree, cfun->local_decls),
!  var_info_hash, var_info_eq, free);
  
bitmap_obstack_initialize (update_ssa_obstack);
  }
*** update_ssa (unsigned update_flags)
*** 3170,3176 
  {
/* If we rename bare symbols initialize the mapping to
   auxiliar info we need to keep track of.  */
!   var_infos = htab_create (47, var_info_hash, var_info_eq, NULL);
  
/* If we have to rename some symbols from 

[PATCH] Document tree.h flags more, fixup valgrind alloc-pool.c

2012-08-21 Thread Richard Guenther

Testing in progress.

Richard.

2012-08-21  Richard Guenther  rguent...@suse.de

* alloc-pool.c (pool_alloc): Fix valgrind annotation.
* tree.h: Complete flags documentation.
(CLEANUP_EH_ONLY): Check documented allowed tree codes.

Index: gcc/alloc-pool.c
===
--- gcc/alloc-pool.c(revision 190558)
+++ gcc/alloc-pool.c(working copy)
@@ -247,7 +247,9 @@ void *
 pool_alloc (alloc_pool pool)
 {
   alloc_pool_list header;
-  VALGRIND_DISCARD (int size);
+#ifdef ENABLE_VALGRIND_CHECKING
+  int size;
+#endif
 
   if (GATHER_STATISTICS)
 {
@@ -260,7 +262,9 @@ pool_alloc (alloc_pool pool)
 }
 
   gcc_checking_assert (pool);
-  VALGRIND_DISCARD (size = pool->elt_size - offsetof (allocation_object, u.data));
+#ifdef ENABLE_VALGRIND_CHECKING
+  size = pool->elt_size - offsetof (allocation_object, u.data);
+#endif
 
   /* If there are no more free elements, make some more!.  */
   if (!pool-returned_free_list)
Index: gcc/tree.h
===
--- gcc/tree.h  (revision 190558)
+++ gcc/tree.h  (working copy)
@@ -417,7 +417,7 @@ enum omp_clause_code
so all nodes have these fields.
 
See the accessor macros, defined below, for documentation of the
-   fields, and the table below which connects the fileds and the
+   fields, and the table below which connects the fields and the
accessor macros.  */
 
 struct GTY(()) tree_base {
@@ -494,6 +494,9 @@ struct GTY(()) tree_base {
CASE_LOW_SEEN in
CASE_LABEL_EXPR
 
+   PREDICT_EXPR_OUTCOME in
+  PREDICT_EXPR
+
static_flag:
 
TREE_STATIC in
@@ -576,12 +579,16 @@ struct GTY(()) tree_base {
 
OMP_PARALLEL_COMBINED in
OMP_PARALLEL
+
OMP_CLAUSE_PRIVATE_OUTER_REF in
   OMP_CLAUSE_PRIVATE
 
TYPE_REF_IS_RVALUE in
   REFERENCE_TYPE
 
+   ENUM_IS_OPAQUE in
+  ENUMERAL_TYPE
+
protected_flag:
 
TREE_PROTECTED in
@@ -1117,7 +1124,8 @@ extern void omp_clause_range_check_faile
 /* In a TARGET_EXPR or WITH_CLEANUP_EXPR, means that the pertinent cleanup
should only be executed if an exception is thrown, not on normal exit
of its scope.  */
-#define CLEANUP_EH_ONLY(NODE) ((NODE)->base.static_flag)
+#define CLEANUP_EH_ONLY(NODE) \
+  (TREE_CHECK2 (NODE, TARGET_EXPR, WITH_CLEANUP_EXPR)->base.static_flag)
 
 /* In a TRY_CATCH_EXPR, means that the handler should be considered a
separate cleanup in honor_protect_cleanup_actions.  */


Re: [PATCH] Document tree.h flags more, fixup valgrind alloc-pool.c

2012-08-21 Thread Richard Guenther
On Tue, 21 Aug 2012, Richard Guenther wrote:

 
 Testing in progress.
 
 Richard.
 
 2012-08-21  Richard Guenther  rguent...@suse.de
 
   * alloc-pool.c (pool_alloc): Fix valgrind annotation.
   * tree.h: Complete flags documentation.
   (CLEANUP_EH_ONLY): Check documented allowed tree codes.

I have instead applied the following - the C++ frontend uses
CLEANUP_EH_ONLY on C++ specific trees.

Bootstrapped on x86_64-unknown-linux-gnu.

Richard.

2012-08-21  Richard Guenther  rguent...@suse.de

* alloc-pool.c (pool_alloc): Fix valgrind annotation.
* tree.h: Fix typo and complete flags documentation.

Index: gcc/alloc-pool.c
===
--- gcc/alloc-pool.c(revision 190558)
+++ gcc/alloc-pool.c(working copy)
@@ -247,7 +247,9 @@ void *
 pool_alloc (alloc_pool pool)
 {
   alloc_pool_list header;
-  VALGRIND_DISCARD (int size);
+#ifdef ENABLE_VALGRIND_CHECKING
+  int size;
+#endif
 
   if (GATHER_STATISTICS)
 {
@@ -260,7 +262,9 @@ pool_alloc (alloc_pool pool)
 }
 
   gcc_checking_assert (pool);
-  VALGRIND_DISCARD (size = pool->elt_size - offsetof (allocation_object, u.data));
+#ifdef ENABLE_VALGRIND_CHECKING
+  size = pool->elt_size - offsetof (allocation_object, u.data);
+#endif
 
   /* If there are no more free elements, make some more!.  */
   if (!pool-returned_free_list)
Index: gcc/tree.h
===
--- gcc/tree.h  (revision 190558)
+++ gcc/tree.h  (working copy)
@@ -417,7 +417,7 @@ enum omp_clause_code
so all nodes have these fields.
 
See the accessor macros, defined below, for documentation of the
-   fields, and the table below which connects the fileds and the
+   fields, and the table below which connects the fields and the
accessor macros.  */
 
 struct GTY(()) tree_base {
@@ -494,6 +494,9 @@ struct GTY(()) tree_base {
CASE_LOW_SEEN in
CASE_LABEL_EXPR
 
+   PREDICT_EXPR_OUTCOME in
+  PREDICT_EXPR
+
static_flag:
 
TREE_STATIC in
@@ -576,12 +579,16 @@ struct GTY(()) tree_base {
 
OMP_PARALLEL_COMBINED in
OMP_PARALLEL
+
OMP_CLAUSE_PRIVATE_OUTER_REF in
   OMP_CLAUSE_PRIVATE
 
TYPE_REF_IS_RVALUE in
   REFERENCE_TYPE
 
+   ENUM_IS_OPAQUE in
+  ENUMERAL_TYPE
+
protected_flag:
 
TREE_PROTECTED in



Re: [PATCH] Set current_function_decl in {push,pop}_cfun and push_struct_function

2012-08-21 Thread Martin Jambor
On Wed, Aug 15, 2012 at 05:21:04PM +0200, Martin Jambor wrote:
 Hi,
 
 On Fri, Aug 10, 2012 at 04:57:41PM +0200, Eric Botcazou wrote:
   - ada/gcc-interface/utils.c:rest_of_subprog_body_compilation calls
 dump_function which in turns calls dump_function_to_file which calls
 push_cfun.  But Ada front end has its idea of the
 current_function_decl and there is no cfun which is an inconsistency
 which makes push_cfun assert fail.  I solved it by temporarily
 setting current_function_decl to NULL_TREE.  It's just dumping and I
 thought that dump_function should be considered middle-end and thus
 middle-end invariants should apply.
  
  If you think that calling dump_function from 
  rest_of_subprog_body_compilation 
  is a layering violation, I don't have a problem with replacing it with a 
  more 
  manual scheme like the one in c-family/c-gimplify.c:c_genericize, 
  provided 
  that this yields roughly the same output.
 
 Richi suggested on IRC that I remove the push/pop_cfun calls from
 dump_function_to_file.  The only problem seems to be
 dump_histograms_for_stmt

Yesterday I actually tried and it is not the only problem.  Another
one is dump_function_to_file->dump_bb->maybe_hot_bb_p which uses cfun
to read profile_status.  There may be others, this one just blew up
first when I set cfun to NULL.  And in future someone is quite likely
to need cfun to dump something new too.

At the same time, re-implementing dumping as in
c-family/c-gimplify.c:c_genericize when dump_function suffices seems
ugly to me.

So I am going to declare dump_function a front-end interface and use
set_cfun in my original patch in dump_function_to_file like we do in
other such functions.

I hope that will be OK.  Thanks,

Martin

PS: Each of various alternatives proposed in this thread had someone
who opposed it.  If there is a consensus that some of them should be
implemented anyway (like global value profiling hash), I am willing to
do that, I just do not want to end up bickering about the result.


Re: [PATCH] Set current_function_decl in {push,pop}_cfun and push_struct_function

2012-08-21 Thread Richard Guenther
On Tue, Aug 21, 2012 at 1:27 PM, Martin Jambor mjam...@suse.cz wrote:
 On Wed, Aug 15, 2012 at 05:21:04PM +0200, Martin Jambor wrote:
 Hi,

 On Fri, Aug 10, 2012 at 04:57:41PM +0200, Eric Botcazou wrote:
   - ada/gcc-interface/utils.c:rest_of_subprog_body_compilation calls
 dump_function which in turns calls dump_function_to_file which calls
 push_cfun.  But Ada front end has its idea of the
 current_function_decl and there is no cfun which is an inconsistency
 which makes push_cfun assert fail.  I solved it by temporarily
 setting current_function_decl to NULL_TREE.  It's just dumping and I
 thought that dump_function should be considered middle-end and thus
 middle-end invariants should apply.
 
  If you think that calling dump_function from 
  rest_of_subprog_body_compilation
  is a layering violation, I don't have a problem with replacing it with a 
  more
  manual scheme like the one in c-family/c-gimplify.c:c_genericize, 
  provided
  that this yields roughly the same output.

 Richi suggested on IRC that I remove the push/pop_cfun calls from
 dump_function_to_file.  The only problem seems to be
 dump_histograms_for_stmt

 Yesterday I actually tried and it is not the only problem.  Another
 one is dump_function_to_file->dump_bb->maybe_hot_bb_p which uses cfun
 to read profile_status.  There may be others, this one just blew up
 first when I set cfun to NULL.  And in future someone is quite likely
 to need cfun to dump something new too.

 At the same time, re-implementing dumping
 c-family/c-gimplify.c:c_genericize when dump_function suffices seems
 ugly to me.

 So I am going to declare dump_function a front-end interface and use
 set_cfun in my original patch in dump_function_to_file like we do in
 other such functions.

 I hope that will be OK.  Thanks,

Setting cfun has side-effects of switching target stuff which might have
code-generation side-effects because of implementation issues we have
with target/optimize attributes.  So I don't think cfun should be changed
just for dumping.

Can you instead just set current_function_decl and access
struct function via DECL_STRUCT_FUNCTION in the dumpers then?
After all, if it is a front-end interface, the frontend way of saying
this is the current function is to set current_function_decl, not the
middle-end cfun.

Richard.
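Richard's suggestion can be sketched with mocked-up types (these stand-ins are not GCC's real declarations): the dumper sets only current_function_decl and reaches per-function data through DECL_STRUCT_FUNCTION, instead of going through push_cfun/set_cfun, which also switch target state.

```cpp
#include <cstddef>

// Mocked-up stand-ins for GCC's tree / struct function types.
struct function { int profile_status; };
struct tree_decl { function *f; };
typedef tree_decl *tree;

#define DECL_STRUCT_FUNCTION(NODE) ((NODE)->f)

// The frontend's notion of "the current function".
static tree current_function_decl;

// A dumper helper that never touches cfun: it reads per-function data
// via the decl, so no target/optimize state is switched as a side effect.
inline int profile_status_of_current_fn ()
{
  function *fn = DECL_STRUCT_FUNCTION (current_function_decl);
  return fn->profile_status;
}
```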

 Martin

 PS: Each of various alternatives proposed in this thread had someone
 who opposed it.  If there is a consensus that some of them should be
 implemented anyway (like global value profiling hash), I am willing to
 do that, I just do not want to end up bickering about the result.


[Patch,testsuite] Break gcc.dg/fixed-point/convert.c into manageable parts

2012-08-21 Thread Georg-Johann Lay
Just as the title says: gcc.dg/fixed-point/convert.c is much too big to run on
embedded targets like AVR.

Note that embedded systems are a main audience of ISO/IEC TR 18037,
and that these systems might have limited resources.

The original convert.c inflates to thousands of functions and is built at -O0.
Some targets need to emulate *everything*, even integer multiplication,
and the executable is much too fat.

The patch breaks up convert.c in parts so that an AVR ATmega103 device
with 128KiB for executable code (.text + .data + .rodata) can run them.

Ok for trunk?

Johann

* gcc.dg/fixed-point/convert.c: Split into more manageable parts:
* gcc.dg/fixed-point/convert-1.c: New.
* gcc.dg/fixed-point/convert-2.c: New.
* gcc.dg/fixed-point/convert-3.c: New.
* gcc.dg/fixed-point/convert-4.c: New.
* gcc.dg/fixed-point/convert-float-1.c: New.
* gcc.dg/fixed-point/convert-float-2.c: New.
* gcc.dg/fixed-point/convert-float-3.c: New.
* gcc.dg/fixed-point/convert-float-4.c: New.
* gcc.dg/fixed-point/convert-accum-neg.c: New.
* gcc.dg/fixed-point/convert-sat.c: New.
* gcc.dg/fixed-point/convert.h: New.


Index: gcc/testsuite/gcc.dg/fixed-point/convert-sat.c
===
--- gcc/testsuite/gcc.dg/fixed-point/convert-sat.c	(revision 0)
+++ gcc/testsuite/gcc.dg/fixed-point/convert-sat.c	(revision 0)
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-options "-std=gnu99 -O0" } */
+
+/* C99 6.3 Conversions.
+
+   Check conversions involving fixed-point.  */
+
+extern void abort (void);
+
+#include "convert.h"
+
+int main ()
+{
+  SAT_CONV1 (short _Accum, hk);
+  SAT_CONV1 (_Accum, k);
+  SAT_CONV1 (long _Accum, lk);
+  SAT_CONV1 (long long _Accum, llk);
+
+  SAT_CONV2 (unsigned short _Accum, uhk);
+  SAT_CONV2 (unsigned _Accum, uk);
+  SAT_CONV2 (unsigned long _Accum, ulk);
+  SAT_CONV2 (unsigned long long _Accum, ullk);
+
+  SAT_CONV3 (short _Fract, hr);
+  SAT_CONV3 (_Fract, r);
+  SAT_CONV3 (long _Fract, lr);
+  SAT_CONV3 (long long _Fract, llr);
+
+  SAT_CONV4 (signed char);
+  SAT_CONV4 (short);
+  SAT_CONV4 (int);
+  SAT_CONV4 (long);
+  SAT_CONV4 (long long);
+
+  SAT_CONV5 (unsigned char);
+  SAT_CONV5 (unsigned short);
+  SAT_CONV5 (unsigned int);
+  SAT_CONV5 (unsigned long);
+  SAT_CONV5 (unsigned long long);
+
+  SAT_CONV6 (float);
+  SAT_CONV6 (double);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/fixed-point/convert-accum-neg.c
===
--- gcc/testsuite/gcc.dg/fixed-point/convert-accum-neg.c	(revision 0)
+++ gcc/testsuite/gcc.dg/fixed-point/convert-accum-neg.c	(revision 0)
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-std=gnu99 -O0" } */
+
+/* C99 6.3 Conversions.
+
+   Check conversions involving fixed-point.  */
+
+extern void abort (void);
+
+#include "convert.h"
+
+int main ()
+{
+  ALL_ACCUM_CONV (short _Accum, hk);
+  ALL_ACCUM_CONV (_Accum, k);
+  ALL_ACCUM_CONV (long _Accum, lk);
+  ALL_ACCUM_CONV (long long _Accum, llk);
+  ALL_ACCUM_CONV (unsigned short _Accum, uhk);
+  ALL_ACCUM_CONV (unsigned _Accum, uk);
+  ALL_ACCUM_CONV (unsigned long _Accum, ulk);
+  ALL_ACCUM_CONV (unsigned long long _Accum, ullk);
+
+  NEG_CONV (short _Fract, hr);
+  NEG_CONV (_Fract, r);
+  NEG_CONV (long _Fract, lr);
+  NEG_CONV (long long _Fract, llr);
+  NEG_CONV (short _Accum, hk);
+  NEG_CONV (_Accum, k);
+  NEG_CONV (long _Accum, lk);
+  NEG_CONV (long long _Accum, llk);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/fixed-point/convert-1.c
===
--- gcc/testsuite/gcc.dg/fixed-point/convert-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/fixed-point/convert-1.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-std=gnu99 -O0" } */
+
+/* C99 6.3 Conversions.
+
+   Check conversions involving fixed-point.  */
+
+extern void abort (void);
+
+#include "convert.h"
+
+int main ()
+{
+  ALL_CONV (short _Fract, hr);
+  ALL_CONV (_Fract, r);
+  ALL_CONV (long _Fract, lr);
+  ALL_CONV (long long _Fract, llr);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/fixed-point/convert-2.c
===
--- gcc/testsuite/gcc.dg/fixed-point/convert-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/fixed-point/convert-2.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-std=gnu99 -O0" } */
+
+/* C99 6.3 Conversions.
+
+   Check conversions involving fixed-point.  */
+
+extern void abort (void);
+
+#include "convert.h"
+
+int main ()
+{
+  ALL_CONV (unsigned short _Fract, uhr);
+  ALL_CONV (unsigned _Fract, ur);
+  ALL_CONV (unsigned long _Fract, ulr);
+  ALL_CONV (unsigned long long _Fract, ullr);
+
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/fixed-point/convert.c
===
--- gcc/testsuite/gcc.dg/fixed-point/convert.c	(revision 190558)
+++ 

Re: Reproducible gcc builds, gfortran, and -grecord-gcc-switches

2012-08-21 Thread Simon Baldwin
On 20 August 2012 16:45, Joseph S. Myers jos...@codesourcery.com wrote:

 On Mon, 20 Aug 2012, Simon Baldwin wrote:

   OPT_* for Fortran options only exist when the Fortran front-end is in the
   source tree (whether or not enabled).  I think we try to avoid knowingly
   breaking use cases where people remove some front ends from the source
   tree, although we don't actively test them and no longer provide split-up
   source tarballs.
 
  Thanks for the update.  Which fix should move forwards?

 I think the approach using a new option flag is the way to go, though the
 patch needs (at least) documentation for the new flag in options.texi.


Updated version appended below.  Okay for 4.8 trunk?

--

Omit OPT_cpp_ from the DWARF producer string in gfortran.

Gfortran uses -cpp=<temporary file> internally, and with -grecord-gcc-switches
this command line switch is stored by default in object files.  This causes
problems with build and packaging systems that care about gcc binary
reproducibility and file checksums; the temporary file is different on each
compiler invocation.

Fixed by adding a new opt marker NoDWARFRecord and associated flag, filtering
out options with this setting when writing the producer string, and setting
this flag for the fortran -cpp=<temporary file> option.

Tested for fortran (suppresses -cpp=...) and c (no effect).

gcc/ChangeLog
2012-08-21  Simon Baldwin  sim...@google.com

* dwarf2out.c (gen_producer_string): Omit command line switch if
CL_NO_DWARF_RECORD flag set.
* opts.c (print_specific_help): Add CL_NO_DWARF_RECORD handling.
* opts.h (CL_NO_DWARF_RECORD): New.
* opt-functions.awk (switch_flags): Add NoDWARFRecord.
* doc/options.texi: Document NoDWARFRecord option flag.
* doc/invoke.texi: Document --help=nodwarfrecord.

gcc/fortran/ChangeLog
2012-08-21  Simon Baldwin  sim...@google.com

* lang.opt (-cpp=): Mark flag NoDWARFRecord.


Index: gcc/doc/options.texi
===
--- gcc/doc/options.texi(revision 190535)
+++ gcc/doc/options.texi(working copy)
@@ -468,4 +468,8 @@ of @option{-@var{opt}}, if not explicitl
 specify several different languages.  Each @var{language} must have
 been declared by an earlier @code{Language} record.  @xref{Option file
 format}.
+
+@item NoDWARFRecord
+The option is added to the list of those omitted from the producer string
+written by @option{-grecord-gcc-switches}.
 @end table
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 190535)
+++ gcc/doc/invoke.texi (working copy)
@@ -1330,6 +1330,10 @@ sign in the same continuous piece of tex
 @item @samp{separate}
 Display options taking an argument that appears as a separate word
 following the original option, such as: @samp{-o output-file}.
+
+@item @samp{nodwarfrecord}
+Display only those options that are marked for addition to the list of
+options omitted from @option{-grecord-gcc-switches}.
 @end table

 Thus for example to display all the undocumented target-specific
Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 190535)
+++ gcc/dwarf2out.c (working copy)
@@ -18101,6 +18101,9 @@ gen_producer_string (void)
/* Ignore these.  */
continue;
   default:
+if (cl_options[save_decoded_options[j].opt_index].flags
+    & CL_NO_DWARF_RECORD)
+ continue;
 gcc_checking_assert (save_decoded_options[j].canonical_option[0][0]
 == '-');
 switch (save_decoded_options[j].canonical_option[0][1])
Index: gcc/opts.c
===
--- gcc/opts.c  (revision 190535)
+++ gcc/opts.c  (working copy)
@@ -1186,7 +1186,9 @@ print_specific_help (unsigned int includ
 {
   if (any_flags == 0)
{
- if (include_flags & CL_UNDOCUMENTED)
+ if (include_flags & CL_NO_DWARF_RECORD)
+   description = _("The following options are not recorded by DWARF");
+  else if (include_flags & CL_UNDOCUMENTED)
description = _("The following options are not documented");
  else if (include_flags & CL_SEPARATE)
description = _("The following options take separate arguments");
@@ -1292,7 +1294,7 @@ common_handle_option (struct gcc_options
/* Walk along the argument string, parsing each word in turn.
   The format is:
   arg = [^]{word}[,{arg}]
-  word = {optimizers|target|warnings|undocumented|
+  word = {optimizers|target|warnings|undocumented|nodwarfrecord|
   params|common|language}  */
while (* a != 0)
  {
@@ -1307,6 +1309,7 @@ common_handle_option (struct gcc_options
  { "target", CL_TARGET },
  { "warnings", CL_WARNING },
  { "undocumented", CL_UNDOCUMENTED 

Re: patch for machine independent rtl section to hide case statements for different types of constants.

2012-08-21 Thread Kenneth Zadeck
Now that I have had a chance to talk to Richard, I have now done 
everything that he requested in his email.


Here is the new patch and changelog.   Everything was tested on x86-64.

2012-08-21  Kenneth Zadeck zad...@naturalbridge.com

* alias.c (rtx_equal_for_memref_p): Convert constant cases.
* combine.c (find_single_use_1, mark_used_regs_combine): Ditto.
 * cse.c (exp_equiv_p, canon_reg, fold_rtx, cse_process_notes_1,
count_reg_usage): Ditto.
* cselib.c (cselib_expand_value_rtx_1): Convert to
CASE_CONST_ANY.
(cselib_subst_to_values): Convert constant cases.
* df-scan.c (df_uses_record): Ditto.
* dse.c (const_or_frame_p): Convert case statements to explicit
if-then-else using mode classes.
* emit-rtl.c (verify_rtx_sharing, copy_insn_1): Convert constant cases.
* explow.c (convert_memory_address_addr_space): Ditto.
* gcse.c (want_to_gcse_p, oprs_unchanged_p, compute_transp): Ditto.
* genattrtab.c (attr_copy_rtx, clear_struct_flag): Ditto.
* ira.c (equiv_init_varies_p, contains_replace_regs,
memref_referenced_p, rtx_moveable_p): Ditto.
* jump.c (mark_jump_label_1): Remove constant cases.
(rtx_renumbered_equal_p): Convert to CASE_CONST_UNIQUE.
* loop-invariant.c (check_maybe_invariant): Convert constant cases.
(hash_invariant_expr_1,invariant_expr_equal_p): Convert to
CASE_CONST_ALL.
* postreload-gcse.c (oprs_unchanged_p): Convert constant cases.
* reginfo.c (reg_scan_mark_refs): Ditto.
* regrename.c (scan_rtx): Ditto.
* reload1.c (eliminate_regs_1, elimination_effects,
scan_paradoxical_subregs): Ditto.
* reload.c (operands_match_p, subst_reg_equivs):  Ditto.
* resource.c (mark_referenced_resources, mark_set_resources): Ditto.
* rtlanal.c (rtx_unstable_p, rtx_varies_p, count_occurrences)
(reg_mentioned_p, modified_between_p, modified_in_p)
(volatile_insn_p, volatile_refs_p, side_effects_p, may_trap_p_1,
inequality_comparisons_p, computed_jump_p_1): Ditto.
* rtl.c (copy_rtx, rtx_equal_p_cb, rtx_equal_p): Ditto.
* sched-deps.c (sched_analyze_2): Ditto.
* valtrack.c (cleanup_auto_inc_dec): Ditto.
* rtl.h: (CASE_CONST_SCALAR_INTEGER, CASE_CONST_UNIQUE,
CASE_CONST_ANY): New macros.


I plan to commit this in a few days unless someone has some comments.   
This is a mostly trivial patch and the changes from that are Richard 
Sandiford's and he is an rtl maintainer.


kenny

On 08/20/2012 09:58 AM, Kenneth Zadeck wrote:

I of course meant the machine independent not dependent
On 08/20/2012 09:50 AM, Kenneth Zadeck wrote:
This patch started out to be a purely mechanical change to the switch 
statements so that the ones that are used to take apart constants can 
be logically grouped. This is important for the next patch that I 
will submit this week that frees the rtl level from only being able 
to represent large integer constants with two HWIs.


I sent the patch to Richard Sandiford and when the comments came back 
from him, this patch turned into something that actually has real 
semantic changes.   (His comments are enclosed below.)   I did almost 
all of Richard's changes because he is generally right about such 
things, but it does mean that the patch has to be more carefully 
reviewed.   Richard does not count his comments as a review.


The patch has, of course, been properly tested on x86-64.

Any comments?  Ok for commit?

Kenny




diff -upNr '--exclude=.svn' gccBaseline/gcc/alias.c gccWCase/gcc/alias.c
--- gccBaseline/gcc/alias.c	2012-08-17 09:35:24.794195890 -0400
+++ gccWCase/gcc/alias.c	2012-08-19 09:48:33.666509880 -0400
@@ -1486,9 +1486,7 @@ rtx_equal_for_memref_p (const_rtx x, con
   return XSTR (x, 0) == XSTR (y, 0);
 
 case VALUE:
-case CONST_INT:
-case CONST_DOUBLE:
-case CONST_FIXED:
+CASE_CONST_UNIQUE:
   /* There's no need to compare the contents of CONST_DOUBLEs or
 	 CONST_INTs because pointer equality is a good enough
 	 comparison for these nodes.  */
diff -upNr '--exclude=.svn' gccBaseline/gcc/combine.c gccWCase/gcc/combine.c
--- gccBaseline/gcc/combine.c	2012-08-17 09:35:24.802195795 -0400
+++ gccWCase/gcc/combine.c	2012-08-20 15:43:34.659362244 -0400
@@ -531,12 +531,10 @@ find_single_use_1 (rtx dest, rtx *loc)
 
   switch (code)
 {
-case CONST_INT:
 case CONST:
 case LABEL_REF:
 case SYMBOL_REF:
-case CONST_DOUBLE:
-case CONST_VECTOR:
+CASE_CONST_UNIQUE:
 case CLOBBER:
   return 0;
 
@@ -12788,10 +12786,8 @@ mark_used_regs_combine (rtx x)
 {
 case LABEL_REF:
 case SYMBOL_REF:
-case CONST_INT:
 case CONST:
-case CONST_DOUBLE:
-case CONST_VECTOR:
+CASE_CONST_UNIQUE:
 case PC:
 case ADDR_VEC:
 case ADDR_DIFF_VEC:
diff -upNr '--exclude=.svn' gccBaseline/gcc/cse.c gccWCase/gcc/cse.c
--- gccBaseline/gcc/cse.c	2012-07-27 16:58:24.829691705 -0400
+++ gccWCase/gcc/cse.c	2012-08-20 15:47:26.924501205 -0400
@@ -2623,9 +2623,7 @@ 

Re: [wwwdocs] Document Runtime CPU detection builtins

2012-08-21 Thread Diego Novillo

On 2012-08-20 22:41 , Sriraman Tallam wrote:

Hi Gerald / Diego,

 I have made all the mentioned changes.  I also shortened the
description like Diego mentioned by removing all the strings but keeping
the caveats. I have not added a reference to the documentation because
I do not know what link to reference. The builtins are completely
documented in extend.texi.


Referring to the user's manual is OK, I think.


+<p>Caveat: If these built-in functions are called before any static
+constructors are invoked, like during IFUNC initialization, then the CPU
+detection initialization must be explicity run using this newly provided


s/explicity/explicitly/

Other than that, it looks fine to me.


Diego.


Re: [PATCH] Combine location with block using block_locations

2012-08-21 Thread Richard Guenther
On Mon, Aug 20, 2012 at 3:18 AM, Dehao Chen de...@google.com wrote:
 ping

Conceptually I like the change.  Can a libcpp maintainer please have a 2nd
look?

Dehao, did you do any compile-time and memory-usage benchmarks?

Thanks,
Richard.

 Thanks,
 Dehao

 On Tue, Aug 14, 2012 at 10:13 AM, Dehao Chen de...@google.com wrote:
 Hi, Dodji,

 Thanks for the review. I've fixed all the addressed issues. I'm
 attaching the related changes:

 Thanks,
 Dehao

 libcpp/ChangeLog:
 2012-08-01  Dehao Chen  de...@google.com

 * include/line-map.h (MAX_SOURCE_LOCATION): New value.
 (location_adhoc_data_init): New.
 (location_adhoc_data_fini): New.
 (get_combined_adhoc_loc): New.
 (get_data_from_adhoc_loc): New.
 (get_location_from_adhoc_loc): New.
 (COMBINE_LOCATION_DATA): New.
 (IS_ADHOC_LOC): New.
 (expanded_location): New field.
 * line-map.c (location_adhoc_data): New.
 (location_adhoc_data_htab): New.
 (curr_adhoc_loc): New.
 (location_adhoc_data): New.
 (allocated_location_adhoc_data): New.
 (location_adhoc_data_hash): New.
 (location_adhoc_data_eq): New.
 (location_adhoc_data_update): New.
 (get_combined_adhoc_loc): New.
 (get_data_from_adhoc_loc): New.
 (get_location_from_adhoc_loc): New.
 (location_adhoc_data_init): New.
 (location_adhoc_data_fini): New.
 (linemap_lookup): Change to use new location.
 (linemap_ordinary_map_lookup): Likewise.
 (linemap_macro_map_lookup): Likewise.
 (linemap_macro_map_loc_to_def_point): Likewise.
 (linemap_macro_map_loc_unwind_toward_spel): Likewise.
 (linemap_get_expansion_line): Likewise.
 (linemap_get_expansion_filename): Likewise.
 (linemap_location_in_system_header_p): Likewise.
 (linemap_location_from_macro_expansion_p): Likewise.
 (linemap_macro_loc_to_spelling_point): Likewise.
 (linemap_macro_loc_to_def_point): Likewise.
 (linemap_macro_loc_to_exp_point): Likewise.
 (linemap_resolve_location): Likewise.
 (linemap_unwind_toward_expansion): Likewise.
 (linemap_unwind_to_first_non_reserved_loc): Likewise.
 (linemap_expand_location): Likewise.
 (linemap_dump_location): Likewise.

 Index: libcpp/line-map.c
 ===
 --- libcpp/line-map.c   (revision 190209)
 +++ libcpp/line-map.c   (working copy)
 @@ -25,6 +25,7 @@
  #include "line-map.h"
  #include "cpplib.h"
  #include "internal.h"
 +#include "hashtab.h"

  static void trace_include (const struct line_maps *, const struct line_map 
 *);
  static const struct line_map * linemap_ordinary_map_lookup (struct 
 line_maps *,
 @@ -50,6 +51,135 @@
  extern unsigned num_expanded_macros_counter;
  extern unsigned num_macro_tokens_counter;

 +/* Data structure to associate an arbitrary data to a source location.  */
 +struct location_adhoc_data {
 +  source_location locus;
 +  void *data;
 +};
 +
 +/* The following data structure encodes a location with some adhoc data
 +   and maps it to a new unsigned integer (called an adhoc location)
 +   that replaces the original location to represent the mapping.
 +
 +   The new adhoc_loc uses the highest bit as the enabling bit, i.e. if the
 +   highest bit is 1, then the number is adhoc_loc. Otherwise, it serves as
 +   the original location. Once identified as the adhoc_loc, the lower 31
 +   bits of the integer is used to index the location_adhoc_data array,
 +   in which the locus and associated data is stored.  */
 +
 +static htab_t location_adhoc_data_htab;
 +static source_location curr_adhoc_loc;
 +static struct location_adhoc_data *location_adhoc_data;
 +static unsigned int allocated_location_adhoc_data;
 +
 +/* Hash function for location_adhoc_data hashtable.  */
 +
 +static hashval_t
 +location_adhoc_data_hash (const void *l)
 +{
 +  const struct location_adhoc_data *lb =
 +  (const struct location_adhoc_data *) l;
 +  return (hashval_t) lb->locus + (size_t) lb->data;
 +}
 +
 +/* Compare function for location_adhoc_data hashtable.  */
 +
 +static int
 +location_adhoc_data_eq (const void *l1, const void *l2)
 +{
 +  const struct location_adhoc_data *lb1 =
 +  (const struct location_adhoc_data *) l1;
 +  const struct location_adhoc_data *lb2 =
 +  (const struct location_adhoc_data *) l2;
 +  return lb1->locus == lb2->locus && lb1->data == lb2->data;
 +}
 +
 +/* Update the hashtable when location_adhoc_data is reallocated.  */
 +
 +static int
 +location_adhoc_data_update (void **slot, void *data)
 +{
 +  *((char **) slot) += ((char *) location_adhoc_data - (char *) data);
 +  return 1;
 +}
 +
 +/* Combine LOCUS and DATA to a combined adhoc loc.  */
 +
 +source_location
 +get_combined_adhoc_loc (source_location locus, void *data)
 +{
 +  struct location_adhoc_data lb;
 +  struct location_adhoc_data **slot;
 +
 +  

[PATCH][4.7] Backport recent heap leak fixes

2012-08-21 Thread Richard Guenther

This backports the obvious heap leak fixes that have accumulated sofar.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-08-21  Richard Guenther  rguent...@suse.de

Backport from mainline
2012-08-16  Richard Guenther  rguent...@suse.de

PR middle-end/54146
* tree-ssa-loop-niter.c (find_loop_niter_by_eval): Free the
exit vector.
* ipa-pure-const.c (analyze_function): Use FOR_EACH_LOOP_BREAK.
* cfgloop.h (FOR_EACH_LOOP_BREAK): Fix.
* tree-ssa-structalias.c (handle_lhs_call): Properly free rhsc.
* tree-ssa-loop-im.c (analyze_memory_references): Adjust.
(tree_ssa_lim_finalize): Free all mem_refs.
* tree-ssa-sccvn.c (extract_and_process_scc_for_name): Free
scc when bailing out.
* modulo-sched.c (sms_schedule): Use FOR_EACH_LOOP_BREAK.
* ira-build.c (loop_with_complex_edge_p): Free loop exit vector.
* graphite-sese-to-poly.c (scop_ivs_can_be_represented): Use
FOR_EACH_LOOP_BREAK.

2012-08-17  Richard Guenther  rguent...@suse.de

* tree-sra.c (modify_function): Free redirect_callers vector.
* ipa-split.c (split_function): Free args_to_pass vector.
* tree-vect-stmts.c (vectorizable_operation): Do not pre-allocate
vec_oprnds.
(new_stmt_vec_info): Do not pre-allocate STMT_VINFO_SAME_ALIGN_REFS.
* tree-vect-slp.c (vect_free_slp_instance): Free the instance.
(vect_analyze_slp_instance): Free everything.
(destroy_bb_vec_info): Free the SLP instances.

2012-08-17  Richard Guenther  rguent...@suse.de
 
* params.def (integer-share-limit): Decrease from 256 to 251,
add rationale.

2012-08-21  Richard Guenther  rguent...@suse.de
 
* tree-ssa-loop-im.c (tree_ssa_lim_finalize): Properly free
the affine expansion cache.

Index: gcc/tree-ssa-loop-niter.c
===
--- gcc/tree-ssa-loop-niter.c   (revision 190560)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -2290,7 +2290,10 @@ find_loop_niter_by_eval (struct loop *lo
   /* Loops with multiple exits are expensive to handle and less important.  */
   if (!flag_expensive_optimizations
      && VEC_length (edge, exits) > 1)
-return chrec_dont_know;
+{
+  VEC_free (edge, heap, exits);
+  return chrec_dont_know;
+}
 
   FOR_EACH_VEC_ELT (edge, exits, i, ex)
 {
Index: gcc/ipa-pure-const.c
===
--- gcc/ipa-pure-const.c(revision 190560)
+++ gcc/ipa-pure-const.c(working copy)
@@ -803,7 +803,7 @@ end:
if (dump_file)
   fprintf (dump_file, "can not prove finiteness of loop "
"%i\n", loop->num);
l->looping = true;
-   break;
+   FOR_EACH_LOOP_BREAK (li);
  }
  scev_finalize ();
}
Index: gcc/ipa-split.c
===
--- gcc/ipa-split.c (revision 190560)
+++ gcc/ipa-split.c (working copy)
@@ -1239,6 +1239,7 @@ split_function (struct split_point *spli
   }
   call = gimple_build_call_vec (node->decl, args_to_pass);
   gimple_set_block (call, DECL_INITIAL (current_function_decl));
+  VEC_free (tree, heap, args_to_pass);
 
   /* We avoid address being taken on any variable used by split part,
  so return slot optimization is always possible.  Moreover this is
Index: gcc/graphite-sese-to-poly.c
===
--- gcc/graphite-sese-to-poly.c (revision 190560)
+++ gcc/graphite-sese-to-poly.c (working copy)
@@ -3229,6 +3229,7 @@ scop_ivs_can_be_represented (scop_p scop
   loop_iterator li;
   loop_p loop;
   gimple_stmt_iterator psi;
+  bool result = true;
 
   FOR_EACH_LOOP (li, loop, 0)
 {
@@ -3244,11 +3245,16 @@ scop_ivs_can_be_represented (scop_p scop
 
  if (TYPE_UNSIGNED (type)
       && TYPE_PRECISION (type) >= TYPE_PRECISION 
(long_long_integer_type_node))
-   return false;
+   {
+ result = false;
+ break;
+   }
}
+  if (!result)
+   FOR_EACH_LOOP_BREAK (li);
 }
 
-  return true;
+  return result;
 }
 
 /* Builds the polyhedral representation for a SESE region.  */
Index: gcc/cfgloop.h
===
--- gcc/cfgloop.h   (revision 190560)
+++ gcc/cfgloop.h   (working copy)
@@ -629,7 +629,7 @@ fel_init (loop_iterator *li, loop_p *loo
 
 #define FOR_EACH_LOOP_BREAK(LI) \
   { \
-VEC_free (int, heap, (LI)->to_visit); \
+VEC_free (int, heap, (LI).to_visit); \
 break; \
   }
 
Index: gcc/tree-ssa-structalias.c
===
--- gcc/tree-ssa-structalias.c  (revision 190560)
+++ gcc/tree-ssa-structalias.c  

Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Andi Kleen
 The issue here is holding lock for all the files (that can be many) versus
 number of locks limits & possibilities for deadlocking (mind that updating
 may happen in different orders on the same files for different programs built
 from same objects)

lockf typically has a deadlock detector, and will error out.

-Andi


RE: [AARCH64] [PATCH 1/3] AArch64 Port

2012-08-21 Thread Sofiane Naci
Hi,

Thanks for the feedback. I respond here to the remaining issues:

  Index: gcc/doc/extend.texi
  ===
  --- gcc/doc/extend.texi (revision 187870)
  +++ gcc/doc/extend.texi (working copy)
  @@ -935,7 +935,8 @@
 
   Not all targets support additional floating point types.
 @code{__float80}
   and @code{__float128} types are supported on i386, x86_64 and ia64
 targets.
  -The @code{__float128} type is supported on hppa HP-UX targets.
  +The @code{__float128} type is supported on hppa HP-UX targets and
 ARM AArch64
  +targets.
 
 I don't see any good reason to support it on AArch64, since it's the
 same as long double there.  (It's on PA HP-UX as a workaround for
 libquadmath requiring the type rather than being able to work with a
 type called either long double or __float128 - libquadmath being
 used on PA HP-UX as a workaround for the system libm lacking much long
 double support.  But that shouldn't be an issue for new targets such
 as AArch64 GNU/Linux.  And my understanding from N1582 is that the C
 bindings for IEEE 754-2008, being worked on for a five-part ISO/IEC
 TS, are expected to use names such as _Float128, not __float128, as
 standard names for supported IEEE floating-point types.)

Support for __float128 has been removed.

Fixed in:
r189655 | sofiane | 2012-07-19 13:24:57 +0100 (Thu, 19 Jul 2012) | 19 lines
[AArch64] Remove __float128 support.

 
  +@opindex mbig-endian
  +Generate big-endian code. This is the default when GCC is configured
 for an
  +@samp{aarch64*be-*-*} target.
 
 In general, throughout Texinfo changes, two spaces after . at the
 end of a sentence.
 
  +@item -march=@var{name}
  +@opindex march
  +Specify the name of the target architecture, optionally suffixed by
 one or
  +more feature modifiers. This option has the form
  +@samp{-march=arch[+[no]feature]}, where the only value for
 @samp{arch}
  +is @samp{armv8}, and the possible values for @samp{feature} are
  +@samp{crypto}, @samp{fp}, @samp{simd}.
 
 It's unfortunate that you've chosen this complicated syntax that means
 the generic support for enumerated option arguments cannot be used
 (and so --help information cannot list supported CPUs and features).
 A simpler syntax where -march takes just an architecture name and
 features have separate options would seem better, and more in line
 with most other architectures supported by GCC.
 
 There are several Texinfo problems above.  Instead of feature you
 should use @var{feature}, and since the '[' and ']' are not literal
 text they should be inside @r{} - the proper way of writing
 @samp{-march=arch[+[no]feature]} would be
 @option{-march=@var{arch}@r{[}+@r{[}no@r{]}@var{feature}@r{]}}.
 
 Also, could you document what the feature names mean?

Documentation formatting has been fixed to conform to the required styling.
Also the documentation has been updated to clarify ambiguous parts or add
missing ones.

Fixed in:
r188895 | belagod | 2012-06-22 18:23:05 +0100 (Fri, 22 Jun 2012) | 11 lines
[AArch64] Fix documentation layout.

 
  +@item -mcpu=@var{name}
  +@opindex mcpu
  +Specify the name of the target processor, optionally suffixed by one
 or more
  +feature modifiers. This option has the form @samp{-cpu=cpu[+[no]feature]},
  +where the possible values for @samp{cpu} are @samp{generic},
 @samp{large},
  +and the possible values for @samp{feature} are @samp{crypto},
 @samp{fp},
  +@samp{simd}.
 
 Same comments apply.

Same as above.

Fixed in:
r188895 | belagod | 2012-06-22 18:23:05 +0100 (Fri, 22 Jun 2012) | 11 lines
[AArch64] Fix documentation layout.

 
  +This option is very similar to the -mcpu= option, except that
 instead of
 
 @option{-mcpu=}.  And does -mtune= take feature names or just plain CPU
 names?

Same as above.

Fixed in:
r188895 | belagod | 2012-06-22 18:23:05 +0100 (Fri, 22 Jun 2012) | 11 lines
[AArch64] Fix documentation layout.

 
  +   if (mvn == 0)
  + {
  +   if (widthc != 'd')
  + sprintf (templ, "movi\t%%0.%d%c, %%1, lsl %d", (64/width),
  +   widthc, shift);
  +   else
  + sprintf (templ, "movi\t%%d0, %%1");
  + }
  +   else
  + sprintf (templ, "mvni\t%%0.%d%c, %%1, lsl %d", (64/width),
  +   widthc, shift);
 
 Presumably you have some logic for why the 40-byte buffer size is
 enough, but could you use snprintf with sizeof (templ) specified in
 the call to protect against any mistakes in that logic?  Also, spaces
 after commas and around the / in the division, and the second line
 in the function call should be lined up immediately after the opening
 '(', not further right.  (Check for and fix all these issues elsewhere
 in the port as well; I've just pointed out a representative instance
 of them.)

sprintf has been replaced with snprintf and sizeof (templ) as appropriate.

Fixed in:
r188896 | belagod | 2012-06-22 18:32:35 +0100 (Fri, 22 Jun 

PATCH: PR target/54347: REAL_VALUE_TO_TARGET_LONG_DOUBLE shouldn't be used in i386

2012-08-21 Thread H.J. Lu
Hi,

long double may not be 80-bit on i386.  We can't use
REAL_VALUE_TO_TARGET_LONG_DOUBLE for XFmode.  This patch replaces
REAL_VALUE_TO_TARGET_LONG_DOUBLE with real_to_target.  OK to install?

Thanks.

H.J.
---
2012-08-21  H.J. Lu  hongjiu...@intel.com

PR target/54347
* config/i386/i386.c (ix86_split_to_parts): Replace
REAL_VALUE_TO_TARGET_LONG_DOUBLE with real_to_target.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5da4da2..a6fc45b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20743,7 +20743,9 @@ ix86_split_to_parts (rtx operand, rtx *parts, enum 
machine_mode mode)
  parts[2] = gen_int_mode (l[2], SImode);
  break;
case XFmode:
- REAL_VALUE_TO_TARGET_LONG_DOUBLE (r, l);
+ /* We can't use REAL_VALUE_TO_TARGET_LONG_DOUBLE since
+long double may not be 80-bit.  */
+ real_to_target (l, r, mode);
  parts[2] = gen_int_mode (l[2], SImode);
  break;
case DFmode:


RE: [AARCH64] [PATCH 2/3] AArch64 Port

2012-08-21 Thread Sofiane Naci
 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Joseph S. Myers
 Sent: 25 May 2012 15:24
 To: Marcus Shawcroft
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [AARCH64] [PATCH 2/3] AArch64 Port
 
 On Fri, 25 May 2012, Marcus Shawcroft wrote:
 
  Index: gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.x
  ===
  --- gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.x   (revision
 0)
  +++ gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.x   (revision
 0)
  @@ -0,0 +1,5 @@
  +if { [istarget aarch64_be-*-*] } then {
  +   return 1
  +}
  +
  +return 0
 
 This isn't a suitable way of enabling a test only for one endianness,
 since a test may be run with -mbig-endian or -mlittle-endian with a
 compiler defaulting to the other endianness.  You need to test an
 effective-target keyword instead.
 
  Index: gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.x
  ===
  --- gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.x   (revision
 0)
  +++ gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.x   (revision
 0)
  @@ -0,0 +1,5 @@
  +if { [istarget aarch64_be-*-*] } then {
  +   return 1
  +}
  +
  +return 0
 
 Likewise.

Thanks. This is now fixed in:

r190482 | sofiane | 2012-08-17 16:02:20 +0100 (Fri, 17 Aug 2012) | 9 lines
[AArch64] Use effective-target to check for big endian

Sofiane





Re: Reproducible gcc builds, gfortran, and -grecord-gcc-switches

2012-08-21 Thread Joseph S. Myers
On Tue, 21 Aug 2012, Simon Baldwin wrote:

 Index: gcc/doc/options.texi
 ===
 --- gcc/doc/options.texi  (revision 190535)
 +++ gcc/doc/options.texi  (working copy)
 @@ -468,4 +468,8 @@ of @option{-@var{opt}}, if not explicitl
  specify several different languages.  Each @var{language} must have
  been declared by an earlier @code{Language} record.  @xref{Option file
  format}.
 +
 +@item NoDWARFRecord
 +The option is added to the list of those omitted from the producer string
 +written by @option{-grecord-gcc-switches}.

Remove "added to the list of those" (which seems unnecessarily verbose).

 +@item @samp{nodwarfrecord}
 +Display only those options that are marked for addition to the list of
 +options omitted from @option{-grecord-gcc-switches}.

I don't think there's any need for special --help support for options with 
this flag; this flag is really an implementation detail.  (Thus, I think 
all the opts.c changes are unnecessary.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [google/gcc-4_7] Fix regression - SUBTARGET_EXTRA_SPECS overridden by LINUX_GRTE_EXTRA_SPECS

2012-08-21 Thread 沈涵
Hi Jing, the crosstool test passed. You can start the review, thanks! -Han

On Wed, Aug 15, 2012 at 3:11 PM, Han Shen(沈涵) shen...@google.com wrote:
 Hi Jing, ping?

 On Mon, Aug 13, 2012 at 10:58 AM, Han Shen(沈涵) shen...@google.com wrote:
 Hi, the google/gcc-4_7 branch fails to link anything (on x86-generic); by
 looking into the specs file, it seems that the 'link_emulation' section is
 missing from specs.

 The problem is in config/i386/linux.h, SUBTARGET_EXTRA_SPECS (which is
 not empty for chrome x86-generic) is overridden by
 LINUX_GRTE_EXTRA_SPECS.

 My fix is to prepend LINUX_GRTE_EXTRA_SPECS to SUBTARGET_EXTRA_SPECS in 
 linux.h

 Jing, could you take a look at this?

 --
 Han Shen

 2012-08-13 Han Shen  shen...@google.com
 * gcc/config/i386/linux.h (SUBTARGET_EXTRA_SPECS): Compute
 new value by prepending LINUX_GRTE_EXTRA_SPECS to its original value.
 * gcc/config/i386/gnu-user.h (SUBTARGET_EXTRA_SPECS_STR): New
 macro to hold the value of SUBTARGET_EXTRA_SPECS so that
 SUBTARGET_EXTRA_SPECS can be redefined later in linux.h.

 --- a/gcc/config/i386/gnu-user.h
 +++ b/gcc/config/i386/gnu-user.h
 @@ -92,11 +92,14 @@ along with GCC; see the file COPYING3.  If not see
  #define ASM_SPEC \
  "--32 %{!mno-sse2avx:%{mavx:-msse2avx}} %{msse2avx:%{!mavx:-msse2avx}}"

 -#undef  SUBTARGET_EXTRA_SPECS
 -#define SUBTARGET_EXTRA_SPECS \
 +#undef  SUBTARGET_EXTRA_SPECS_STR
 +#define SUBTARGET_EXTRA_SPECS_STR \
   { "link_emulation", GNU_USER_LINK_EMULATION },\
   { "dynamic_linker", GNU_USER_DYNAMIC_LINKER }

 +#undef  SUBTARGET_EXTRA_SPECS
 +#define SUBTARGET_EXTRA_SPECS SUBTARGET_EXTRA_SPECS_STR
 +
  #undef LINK_SPEC
  #define LINK_SPEC "-m %(link_emulation) %{shared:-shared} \
    %{!shared: \
 --- a/gcc/config/i386/linux.h
 +++ b/gcc/config/i386/linux.h
 @@ -32,5 +32,11 @@ along with GCC; see the file COPYING3.  If not see
  #endif

  #undef  SUBTARGET_EXTRA_SPECS
 +#ifndef SUBTARGET_EXTRA_SPECS_STR
  #define SUBTARGET_EXTRA_SPECS \
LINUX_GRTE_EXTRA_SPECS
 +#else
 +#define SUBTARGET_EXTRA_SPECS \
 +  LINUX_GRTE_EXTRA_SPECS \
 +  SUBTARGET_EXTRA_SPECS_STR
 +#endif



 --
 Han Shen |  Software Engineer |  shen...@google.com |  +1-650-440-3330



-- 
Han Shen |  Software Engineer |  shen...@google.com |  +1-650-440-3330


Merge from gcc 4.7 branch to gccgo branch

2012-08-21 Thread Ian Lance Taylor
I've merged gcc 4.7 branch revision 190560 to the gccgo branch.

Ian


Re: [Patch,AVR] PR54222: Add fixed point support

2012-08-21 Thread Denis Chertykov
2012/8/13 Georg-Johann Lay a...@gjlay.de:
 Denis Chertykov wrote:
 2012/8/11 Georg-Johann Lay a...@gjlay.de:
 Weddington, Eric schrieb:
 From: Georg-Johann Lay


 The first step would be to bisect and find the patch that lead to
 PR53923.  It was not a change in the avr BE, so the question goes
 to the authors of the respective patch.

 Up to now I didn't even try to bisect; that would take years on the
 host that I have available...

 My only real concern is that this is a major feature addition and
 the AVR port is currently broken.

 I don't know if it's the avr port or some parts of the middle end that
 don't cooperate with avr.

 I would really, really love to see fixed point support added in,
 especially since I know that Sean has worked on it for quite a while,
 and you've also done a lot of work in getting the patches in shape to
 get them committed.

 But, if the AVR port is currently broken (by whomever, and whatever
 patch) and a major feature like this can't be tested to make sure it
 doesn't break anything else in the AVR backend, then I'm hesitant to
 approve (even though I really want to approve).

 I don't understand enough of DF to fix PR53923.  The insn that leads
 to the ICE is (in df-problems.c:dead_debug_insert_temp):


 Today I have updated GCC svn tree and successfully compiled avr-gcc.
 The libgcc2-mulsc3.c from   also compiled without bugs.

 Denis.

 PS: Maybe I'm doing something wrong? (I had too long vacations)

 I am configuring with --target=avr --disable-nls --with-dwarf2
 --enable-languages=c,c++ --enable-target-optspace=yes 
 --enable-checking=yes,rtl

 Build GCC is gcc version 4.3.2.
 Build and host are i686-pc-linux-gnu.

 Maybe it's different on a 64-bit computer, but I only have 32-bit host.


I have been debugging PR53923 and in my opinion it's not an AVR port bug.
Please commit fixed point support.

Denis.
PS: sorry for delay


Re: patch for machine independent rtl section to hide case statements for different types of constants.

2012-08-21 Thread Richard Sandiford
Kenneth Zadeck zad...@naturalbridge.com writes:
 I plan to commit this in a few days unless someone has some comments.   
 This is a mostly trivial patch and the changes from that are Richard 
 Sandiford's and he is an rtl maintainer.

Please don't do this.  Patches need to be sent for review in their
final form.  Obviously, having got this far with the patch, you're free
to beat me up if I don't review it. :-)

Anyway, please do call it CASE_CONST_SCALAR_INT rather than
CASE_CONST_SCALAR_INTEGER.  Like I said in my original mail,
CASE_CONST_SCALAR_INT chimes nicely with SCALAR_INT_MODE_P, etc.,
and (as I didn't say) it'd be better not to have two spellings
of the same thing.

 diff -upNr '--exclude=.svn' gccBaseline/gcc/combine.c gccWCase/gcc/combine.c
 --- gccBaseline/gcc/combine.c 2012-08-17 09:35:24.802195795 -0400
 +++ gccWCase/gcc/combine.c2012-08-20 15:43:34.659362244 -0400
 @@ -531,12 +531,10 @@ find_single_use_1 (rtx dest, rtx *loc)
 
switch (code)
  {
 -case CONST_INT:
  case CONST:
  case LABEL_REF:
  case SYMBOL_REF:
 -case CONST_DOUBLE:
 -case CONST_VECTOR:
 +CASE_CONST_UNIQUE:
  case CLOBBER:
return 0;
 
 @@ -12788,10 +12786,8 @@ mark_used_regs_combine (rtx x)
  {
  case LABEL_REF:
  case SYMBOL_REF:
 -case CONST_INT:
  case CONST:
 -case CONST_DOUBLE:
 -case CONST_VECTOR:
 +CASE_CONST_UNIQUE:
  case PC:
  case ADDR_VEC:
  case ADDR_DIFF_VEC:

These were supposed to be CASE_CONST_ANY.  The omission of CONST_FIXED
looks like an oversight.

switch (code)
  {
 -case CONST_INT:
 -case CONST_DOUBLE:
 -case CONST_FIXED:
 +CASE_CONST_UNIQUE:
  case SYMBOL_REF:
  case CONST:
  case LABEL_REF:

This was supposed to be CASE_CONST_ANY too.  The omission of CONST_VECTOR
looks like an oversight.

 +/* Match CONST_*s for which pointer equality corresponds to value 
 +equality.  */

Should be:

/* Match CONST_*s for which pointer equality corresponds to value equality.  */

(probably an artefact of my work mailer, sorry)

 +
 +
 +

Rather a lot of whitespace there.  One line seems enough, since we're
just before the definition of CONST_INT_P.

OK with those changes, thanks.

Richard


Re: patch for machine independent rtl section to hide case statements for different types of constants.

2012-08-21 Thread Richard Sandiford
Richard Sandiford rdsandif...@googlemail.com writes:
switch (code)
  {
 -case CONST_INT:
 -case CONST_DOUBLE:
 -case CONST_FIXED:
 +CASE_CONST_UNIQUE:
  case SYMBOL_REF:
  case CONST:
  case LABEL_REF:

 This was supposed to be CASE_CONST_ANY too.  The omission of CONST_VECTOR
 looks like an oversight.

Sorry, snipped the all-important:

 --- gccBaseline/gcc/loop-invariant.c  2012-07-22 16:55:01.239982968 -0400
 +++ gccWCase/gcc/loop-invariant.c 2012-08-20 16:02:30.013430970 -0400
 @@ -203,9 +203,7 @@ check_maybe_invariant (rtx x)

Richard


Re: patch for machine independent rtl section to hide case statements for different types of constants.

2012-08-21 Thread Kenneth Zadeck

It would have been tough without the second snippet.
On 08/21/2012 01:02 PM, Richard Sandiford wrote:

Richard Sandiford rdsandif...@googlemail.com writes:

switch (code)
  {
-case CONST_INT:
-case CONST_DOUBLE:
-case CONST_FIXED:
+CASE_CONST_UNIQUE:
  case SYMBOL_REF:
  case CONST:
  case LABEL_REF:

This was supposed to be CASE_CONST_ANY too.  The omission of CONST_VECTOR
looks like an oversight.

Sorry, snipped the all-important:


--- gccBaseline/gcc/loop-invariant.c2012-07-22 16:55:01.239982968 -0400
+++ gccWCase/gcc/loop-invariant.c   2012-08-20 16:02:30.013430970 -0400
@@ -203,9 +203,7 @@ check_maybe_invariant (rtx x)

Richard




Re: [Patch,testsuite] Break gcc.dg/fixed-point/convert.c into manageable parts

2012-08-21 Thread Mike Stump
On Aug 21, 2012, at 4:32 AM, Georg-Johann Lay wrote:
 The patch breaks up convert.c in parts so that an AVR ATmega103 device
 with 128KiB for executable code (.text + .data + .rodata) can run them.
 
 Ok for trunk?

Ok, but watch out for any comments from the fixed-point or the C front-end 
folks.


Re: patch for machine independent rtl section to hide case statements for different types of constants.

2012-08-21 Thread Kenneth Zadeck
I am certainly not going to check it in if there are any issues with the 
patch.  However, this was basically a trivial lexicographical cleanup, 
and if no one has any comments on it after a reasonable amount of time, 
then I do feel it is OK.  Obviously, if anyone has any comments, that 
is a completely different issue.


I named it CASE_CONST_SCALAR_INTEGER because I need to 
introduce in the next patch a predicate that looks like:


/* Predicate yielding true iff X is an rtx for an integer const.  */
#if TARGET_SUPPORTS_WIDE_INT == 1
#define CONST_INTEGER_P(X) \
  (CONST_INT_P (X) || CONST_WIDE_INT_P (X))
#else
#define CONST_INTEGER_P(X) \
  (CONST_INT_P (X) || CONST_DOUBLE_AS_INT_P (X))
#endif

for all of the rtxs that represent an integer, and this name was 
consistent with that.  It may be that you have a suggestion for the 
name of the predicate as well, but it seemed to make more sense to have 
the rtxs be consistent rather than rtx/mode consistent.


kenny

On 08/21/2012 12:56 PM, Richard Sandiford wrote:

Kenneth Zadeck zad...@naturalbridge.com writes:

I plan to commit this in a few days unless someone has some comments.
This is a mostly trivial patch and the changes from that are Richard
Sandiford's and he is an rtl maintainer.

Please don't do this.  Patches need to be sent for review in their
final form.  Obviously, having got this far with the patch, you're free
to beat me up if I don't review it. :-)

Anyway, please do call it CASE_CONST_SCALAR_INT rather than
CASE_CONST_SCALAR_INTEGER.  Like I said in my original mail,
CASE_CONST_SCALAR_INT chimes nicely with SCALAR_INT_MODE_P, etc.,
and (as I didn't say) it'd be better not to have two spellings
of the same thing.


diff -upNr '--exclude=.svn' gccBaseline/gcc/combine.c gccWCase/gcc/combine.c
--- gccBaseline/gcc/combine.c   2012-08-17 09:35:24.802195795 -0400
+++ gccWCase/gcc/combine.c  2012-08-20 15:43:34.659362244 -0400
@@ -531,12 +531,10 @@ find_single_use_1 (rtx dest, rtx *loc)

switch (code)
  {
-case CONST_INT:
  case CONST:
  case LABEL_REF:
  case SYMBOL_REF:
-case CONST_DOUBLE:
-case CONST_VECTOR:
+CASE_CONST_UNIQUE:
  case CLOBBER:
return 0;

@@ -12788,10 +12786,8 @@ mark_used_regs_combine (rtx x)
  {
  case LABEL_REF:
  case SYMBOL_REF:
-case CONST_INT:
  case CONST:
-case CONST_DOUBLE:
-case CONST_VECTOR:
+CASE_CONST_UNIQUE:
  case PC:
  case ADDR_VEC:
  case ADDR_DIFF_VEC:

These were supposed to be CASE_CONST_ANY.  The omission of CONST_FIXED
looks like an oversight.


switch (code)
  {
-case CONST_INT:
-case CONST_DOUBLE:
-case CONST_FIXED:
+CASE_CONST_UNIQUE:
  case SYMBOL_REF:
  case CONST:
  case LABEL_REF:

This was supposed to be CASE_CONST_ANY too.  The omission of CONST_VECTOR
looks like an oversight.


+/* Match CONST_*s for which pointer equality corresponds to value
+equality.  */

Should be:

/* Match CONST_*s for which pointer equality corresponds to value equality.  */

(probably an artefact of my work mailer, sorry)


+
+
+

Rather a lot of whitespace there.  One line seems enough, since we're
just before the definition of CONST_INT_P.

OK with those changes, thanks.

Richard




Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Xinliang David Li
On Tue, Aug 21, 2012 at 12:34 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Teresa has done some tunings for the unroller so far. The inliner
 tuning is the next step.

 
  What concerns me is that it is greatly inaccurate - you have no idea how many
  instructions given counter is guarding and it can differ quite a lot. Also
  inlining/optimization makes working sets significantly different (by 
  factor of
  100 for tramp3d).

 The pre ipa-inline working set is the one that is needed for ipa
 inliner tuning. For post-ipa inline code increase transformations,
 some update is probably needed.

 But on the ohter hand any solution at this level will be
  greatly inaccurate. So I am curious how reliable data you can get from 
  this?
  How you take this into account for the heuristics?

 This effort is just the first step to allow good heuristics to develop.

 
  It seems to me that for this use perhaps the simple logic in histogram 
  merging
  maximizing the number of BBs for given bucket will work well?  It is
  inaccurate, but we are working with greatly inaccurate data anyway.
  Except for degenerated cases, the small and unimportant runs will have 
  small BB
  counts, while large runs will have larger counts and those are ones we 
  optimize
  for anyway.

 The working set curve for each type of applications contains lots of
 information that can be mined. The inaccuracy can also be mitigated by
 more data 'calibration'.

 Sure, I think I am leaning towards trying the solution 2) with maximizing
 counter count merging (probably it would make sense to rename it from BB count
 since it is not really BB count and thus it is misleading) and we will see how
 well it works in practice.

 We get the benefit of far fewer issues with profile locking/unlocking, and we
 lose a bit of precision on BB counts. I tend to believe that the error will not
 be that important in practice. Another loss is more histogram streaming into
 each gcda file, but with skipping of zero entries it should not be a major
 overhead problem, I hope.

 What do you think?

 
 
2) Do we plan to add some features in near future that will anyway 
   require global locking?
   I guess LIPO itself does not count since it streams its data into 
   independent file as you
   mentioned earlier and locking LIPO file is not that hard.
   Does LIPO stream everything into that common file, or does it use 
   combination of gcda files
   and common summary?
 
  Actually, LIPO module grouping information are stored in gcda files.
  It is also stored in a separate .imports file (one per object) ---
  this is primarily used by our build system for dependence information.
 
  I see, getting LIPO safe WRT parallel updates will be fun. How does LIPO 
  behave
  on GCC bootstrap?

 We have not tried gcc bootstrap with LIPO. Gcc compile time is not the
 main problem for application build -- the link time (for debug build)
 is.

 I was primarily curious how LIPO's runtime analysis fares in the situation
 where you do very many small train runs on a rather large app (sure, GCC is
 small compared to google's use case ;).


There will be a race, but as Teresa mentioned, there is a big chance
that the process which finishes the merge last is also the final
overrider of the LIPO summary data.



  (i.e. it does a lot more work in the libgcov module per each
  invocation, so I am curious if it is practically useful at all).
 
  With LTO based solution a lot can be probably pushed at link time? Before
  actual GCC starts from the linker plugin, LIPO module can read gcov CFGs 
  from
  gcda files and do all the merging/updating/CFG constructions that is 
  currently
  performed at runtime, right?

 The dynamic cgraph build and analysis is still done at runtime.
 However, with the new implementation, FE is no longer involved. Gcc
 driver is modified to understand module grouping, and lto is used to
 merge the streamed output from aux modules.

 I see. Are there any fundamental reasons why it can not be done at link-time
 when all gcda files are available?

For build parallelism, the decision should be made as early as
possible -- that is what makes LIPO 'light'.

 Why the grouping is not done inside linker
 plugin?

It is not delayed into link time. In fact linker plugin is not even involved.

David



 Honza


 David


Re: [wwwdocs] Document Runtime CPU detection builtins

2012-08-21 Thread Sriraman Tallam
Committed after making the changes.

One small problem, I am not sure how to fix this:

The hyper link I referenced is :
http://gcc.gnu.org/onlinedocs/gcc/X86-Built_002din-Functions.html#X86-Built_002din-Functions

whereas the committed changes.html is pointing to:
http://gcc.gnu.org/onlinedocs/gcc/X86-Built-in-Functions.html#X86-Built-in-Functions

Please note that the _002din is missing. This makes the link broken,
did I miss anything? I verified that I submitted the right link.

Thanks,
-Sri.

On Tue, Aug 21, 2012 at 5:41 AM, Diego Novillo dnovi...@google.com wrote:
 On 2012-08-20 22:41 , Sriraman Tallam wrote:

 Hi Gerald / Diego,

  I have made all the mentioned changes.  I also shortened the
 description like Diego mentioned by removing all the strings but kept
 the caveats. I have not added a reference to the documentation because
 I do not know what link to reference. The builtins are completely
 documented in extend.texi.


 Referring to the user's manual is OK, I think.

 +<p>Caveat: If these built-in functions are called before any static
 +constructors are invoked, like during IFUNC initialization, then the CPU
 +detection initialization must be explicity run using this newly provided


 s/explicity/explicitly/

 Other than that, it looks fine to me.


 Diego.


Re: C++ PATCH for c++/51675 (more constexpr unions)

2012-08-21 Thread H.J. Lu
On Wed, Feb 8, 2012 at 1:23 AM, Jason Merrill ja...@redhat.com wrote:
 More traffic on PR 51675 demonstrates that my earlier patch didn't fix the
 whole problem.  This patch improves handling of user-defined constructors.

 Tested x86_64-pc-linux-gnu, applying to trunk.

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54341

-- 
H.J.


[lra] patch to remove -flra option

2012-08-21 Thread Vladimir Makarov
The following patch mostly removes the -flra option by defining a 
machine-dependent hook, lra_p.  If the hook returns true, LRA is used; 
otherwise, the reload pass is used.  By default the hook returns false.  
It returns true for the 8 targets to which LRA was ported (i386, rs6000, 
arm, s390, ia64, sparc, mips, pa).


The patch was successfully bootstrapped on x86/x86-64.

Committed as rev. 190564.

2012-08-21  Vladimir Makarov  vmaka...@redhat.com

* targhooks.h (default_lra_p): Declare.
* targhooks.c (default_lra_p): New function.
* target.def (lra_p): New hook.
* ira.h (ira_use_lra_p): New external.
* ira.c (ira_init_once, ira_init, ira_finish_once): Call
lra_init_once, lra_init, lra_finish_once unconditionally.
(ira_setup_eliminable_regset, setup_reg_renumber): Use
ira_use_lra_p instead of flag_lra.
(ira_use_lra_p): Define.
(ira): Set up ira_use_lra_p.  Use ira_use_lra_p instead of
flag_lra.
* dwarf2out.c: Add ira.h.
(based_loc_descr, compute_frame_pointer_to_fb_displacement): Use
ira_use_lra_p instead of flag_lra.
* rtlanal.c (simplify_subreg_regno): Add comments.
* Makefile.in (dwarf2out.c): Add dependence ira.h.
* doc/passes.texi: Change LRA pass description.
* doc/tm.texi.in: Add TARGET_LRA_P.
* doc/tm.texi: Update.
* doc/invoke.texi: Remove -flra option.
* common.opt: Remove flra option.  Add description for
flra-reg-spill.
* reginfo.c (allocate_reg_info): Fix a comment typo.
* config/arm/arm.c (TARGET_LRA_P): Define.
(arm_lra_p): New function.
* config/sparc/sparc.c (TARGET_LRA_P): Define.
(sparc_lra_p): New function.
* config/s390/s390.c (TARGET_LRA_P): Define.
(s390_lra_p): New function.
* config/i386/i386.c (TARGET_LRA_P): Define.
(ix86_lra_p): New function.
* config/rs6000/rs6000.c (TARGET_LRA_P): Define.
(rs6000_lra_p): New function.
* config/mips/mips.c (TARGET_LRA_P): Define.
(mips_lra_p): New function.
* config/pa/pa.c (TARGET_LRA_P): Define.
(pa_lra_p): New function.
* config/ia64/ia64.c (TARGET_LRA_P): Define.
(ia64_lra_p): New function.

Index: targhooks.c
===================================================================
--- targhooks.c	(revision 190448)
+++ targhooks.c	(working copy)
@@ -840,6 +840,12 @@ default_branch_target_register_class (vo
   return NO_REGS;
 }
 
+extern bool
+default_lra_p (void)
+{
+  return false;
+}
+
 int
 default_register_bank (int hard_regno ATTRIBUTE_UNUSED)
 {
Index: target.def
===================================================================
--- target.def	(revision 190448)
+++ target.def	(working copy)
@@ -2332,6 +2332,16 @@ DEFHOOK
  tree, (tree type, tree expr),
  hook_tree_tree_tree_null)
 
+/* Return true if we use LRA instead of reload.  */
+DEFHOOK
+(lra_p,
+ "A target hook which returns true if we use LRA instead of reload pass.\
+  It means that LRA was ported to the target.\
+  \
+  The default version of this target hook returns always false.",
+ bool, (void),
+ default_lra_p)
+
 /* Return register bank of given hard regno for the current target.  */
 DEFHOOK
 (register_bank,
Index: targhooks.h
===================================================================
--- targhooks.h	(revision 190448)
+++ targhooks.h	(working copy)
@@ -132,6 +132,7 @@ extern rtx default_static_chain (const_t
 extern void default_trampoline_init (rtx, tree, rtx);
 extern int default_return_pops_args (tree, tree, int);
 extern reg_class_t default_branch_target_register_class (void);
+extern bool default_lra_p (void);
 extern int default_register_bank (int);
 extern bool default_different_addr_displacement_p (void);
 extern reg_class_t default_secondary_reload (bool, rtx, reg_class_t,
Index: rtlanal.c
===================================================================
--- rtlanal.c	(revision 190448)
+++ rtlanal.c	(working copy)
@@ -3501,6 +3501,7 @@ simplify_subreg_regno (unsigned int xreg
   if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT
       && GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT
       && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode)
+      /* We can use mode change in LRA for some transformations.  */
       && ! lra_in_progress)
     return -1;
 #endif
@@ -3511,6 +3512,8 @@ simplify_subreg_regno (unsigned int xreg
 return -1;
 
   if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
+      /* We should convert arg register in LRA after the elimination
+	 if it is possible.  */
       && xregno == ARG_POINTER_REGNUM
       && ! lra_in_progress)
     return -1;
Index: ira.c
===================================================================
--- ira.c	(revision 190448)
+++ ira.c	(working copy)
@@ -1634,8 +1634,7 @@ void
 ira_init_once (void)
 {
   ira_init_costs_once ();
-  if (flag_lra)
-lra_init_once ();
+  lra_init_once ();
 }
 
 /* Free 

[patch] two more bitmap obstacks

2012-08-21 Thread Steven Bosscher
Hello,

Two more bitmap obstacks, this time in tree-ssa-coalesce.c.

The advantage isn't so much in having the bitmaps on the non-default
obstack, but more in that the bitmaps can be free'ed all at once by
simply releasing the obstack.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?

Ciao!
Steven


more_obs.diff
Description: Binary data


Re: patch for machine independent rtl section to hide case statements for different types of constants.

2012-08-21 Thread Richard Sandiford
Kenneth Zadeck zad...@naturalbridge.com writes:
 I named it this way CASE_CONST_SCALAR_INTEGER because i need to 
 introduce in the next patch a predicate that looks like

  /* Predicate yielding true iff X is an rtx for an integer const.  */
 #if TARGET_SUPPORTS_WIDE_INT == 1
 #define CONST_INTEGER_P(X) \
(CONST_INT_P (X) || CONST_WIDE_INT_P (X))
 #else
 #define CONST_INTEGER_P(X) \
(CONST_INT_P (X) || CONST_DOUBLE_AS_INT_P (X))
 #endif

 for all of the rtxs that represent an integer.  And this name was 
 consistent with that.   It may be that you have a suggestion for the 
 name of predicate as well

Good guess.

 but it seemed to make more sense to have the 
 rtxs be consistent rather than rtx/mode consistent.

Yeah, I think CONST_SCALAR_INT_P would be better here too.  INTEGER
just isn't distinct enough from INT for the difference to be obvious.
It also doesn't indicate that complex integers and vector integers
are excluded.  SCALAR_INT seems a bit more precise, as well as
having precedent.

BTW, the == 1 above looks redundant.

Richard


[PATCH] fix wrong-code bug for -fstrict-volatile-bitfields

2012-08-21 Thread Sandra Loosemore
This patch is a followup to the addition of support for 
-fstrict-volatile-bitfields (required by the ARM EABI); see this thread


http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01889.html

for discussion of the original patch.

That patch only addressed the behavior when extracting the value of a 
volatile bit field, but the same problems affect storing values into a 
volatile bit field (or a field of a packed structure, which is 
effectively implemented as a bit field).  This patch makes the code for 
bitfield stores mirror that for bitfield loads.


Although the fix is in target-independent code, it's really for ARM; 
hence the test case, which (without this patch) generates wrong code. 
Code to determine the access width was correctly preserving the 
user-specified width, but was incorrectly falling through to code that 
assumes word mode.


As well as regression testing on arm-none-eabi, I've bootstrapped and 
regression-tested this patch on x86_64 Linux.  Earlier versions of this 
patch have also been present in our local 4.5 and 4.6 GCC trees for some 
time, so it's been well-tested on a variety of other platforms.  OK to 
check in on mainline?


-Sandra


2012-08-21  Paul Brook  p...@codesourcery.com
Joseph Myers jos...@codesourcery.com
Sandra Loosemore  san...@codesourcery.com

gcc/
* expr.h (store_bit_field): Add packedp parameter to prototype.
* expmed.c (store_bit_field, store_bit_field_1): Add packedp
parameter.  Adjust all callers.
(warn_misaligned_bitfield): New function, split from
extract_fixed_bit_field.
(store_fixed_bit_field): Add packedp parameter.  Fix wrong-code
behavior for the combination of misaligned bitfield and
-fstrict-volatile-bitfields.  Use warn_misaligned_bitfield.
(extract_fixed_bit_field): Use warn_misaligned_bitfield.
* expr.c: Adjust calls to store_bit_field.
(expand_assignment): Identify accesses to packed structures.
(store_field): Add packedp parameter.  Adjust callers.
* calls.c: Adjust calls to store_bit_field.
* ifcvt.c: Likewise.
* config/s390/s390.c: Likewise.

gcc/testsuite/
* gcc.target/arm/volatile-bitfields-5.c: New test case.

Index: gcc/expr.h
===================================================================
--- gcc/expr.h	(revision 190541)
+++ gcc/expr.h	(working copy)
@@ -693,7 +693,7 @@ extern void store_bit_field (rtx, unsign
 			 unsigned HOST_WIDE_INT,
 			 unsigned HOST_WIDE_INT,
 			 unsigned HOST_WIDE_INT,
-			 enum machine_mode, rtx);
+			 bool, enum machine_mode, rtx);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			  unsigned HOST_WIDE_INT, int, bool, rtx,
 			  enum machine_mode, enum machine_mode);
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	(revision 190541)
+++ gcc/expmed.c	(working copy)
@@ -50,7 +50,7 @@ static void store_fixed_bit_field (rtx, 
    unsigned HOST_WIDE_INT,
    unsigned HOST_WIDE_INT,
    unsigned HOST_WIDE_INT,
-   rtx);
+   rtx, bool);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
    unsigned HOST_WIDE_INT,
    unsigned HOST_WIDE_INT,
@@ -406,7 +406,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 		   unsigned HOST_WIDE_INT bitnum,
 		   unsigned HOST_WIDE_INT bitregion_start,
 		   unsigned HOST_WIDE_INT bitregion_end,
-		   enum machine_mode fieldmode,
+		   bool packedp, enum machine_mode fieldmode,
 		   rtx value, bool fallback_p)
 {
   unsigned int unit
@@ -638,7 +638,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	  if (!store_bit_field_1 (op0, new_bitsize,
   bitnum + bit_offset,
   bitregion_start, bitregion_end,
-  word_mode,
+  false, word_mode,
   value_word, fallback_p))
 	{
 	  delete_insns_since (last);
@@ -859,7 +859,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	  tempreg = copy_to_reg (xop0);
 	  if (store_bit_field_1 (tempreg, bitsize, xbitpos,
  bitregion_start, bitregion_end,
- fieldmode, orig_value, false))
+ false, fieldmode, orig_value, false))
 	{
 	  emit_move_insn (xop0, tempreg);
 	  return true;
@@ -872,7 +872,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 return false;
 
   store_fixed_bit_field (op0, offset, bitsize, bitpos,
-			 bitregion_start, bitregion_end, value);
+			 bitregion_start, bitregion_end, value, packedp);
   return true;
 }
 
@@ -885,6 +885,8 @@ store_bit_field_1 (rtx str_rtx, unsigned
These two fields are 0, if the C++ memory model does not apply,
or we are not interested in keeping track of bitfield regions.
 
+   PACKEDP is true for fields with the packed attribute.
+
FIELDMODE is the machine-mode of the FIELD_DECL node for this field.  */
 
 void
@@ -892,6 +894,7 @@ store_bit_field (rtx str_rtx, unsigned H
 		 unsigned HOST_WIDE_INT bitnum,
 		 unsigned HOST_WIDE_INT bitregion_start,
 		 

Re: [PATCH, MIPS] fix MIPS16 jump table overflow

2012-08-21 Thread Richard Sandiford
Sandra Loosemore san...@codesourcery.com writes:
 In config/mips/mips.h, there is presently this comment:

 /* ??? 16-bit offsets can overflow in large functions.  */
 #define TARGET_MIPS16_SHORT_JUMP_TABLES TARGET_MIPS16_TEXT_LOADS

 A while ago we had a bug report where a big switch statement did, in 
 fact, overflow the range of 16-bit offsets, causing a runtime error.

 GCC already has generic machinery to shorten offset tables for switch 
 statements that does the necessary range checking, but it only works 
 with casesi, not the lower-level tablejump expansion.  So, this 
 patch provides a casesi expander to handle this situation.

Nice.

 This patch has been in use on our local 4.5 and 4.6 branches for about a 
 year now.  When testing it on mainline, I found it tripped over the 
 recent change to add MIPS16 branch overflow checking in other 
 situations, causing it to get into an infinite loop.  I think telling it 
 to ignore these new jump insns it doesn't know how to process is the 
 right thing to do, but I'm not sure if there's a better way to restrict 
 the condition or make mips16_split_long_branches more robust.  Richard,
 since that's your code I assume you'll suggest an alternative if this 
 doesn't meet with your approval.

Changing it to:

if (JUMP_P (insn)
    && USEFUL_INSN_P (insn)
    && get_attr_length (insn) > 8
    && (any_condjump_p (insn) || any_uncond_jump_p (insn)))

should be OK.

 @@ -5937,6 +5933,91 @@
[(set_attr "type" "jump")
 (set_attr "mode" "none")])
  
 +;; For MIPS16, we don't know whether a given jump table will use short or
 +;; word-sized offsets until late in compilation, when we are able to
 +;; determine the sizes of the insns which comprise the containing function.
 +;; This necessitates the use of the casesi rather than the tablejump
 +;; pattern, since the latter tries to calculate the index of the offset to
 +;; jump through early in compilation, i.e. at expand time, when nothing is
 +;; known about the eventual function layout.
 +
 +(define_expand "casesi"
 +  [(match_operand:SI 0 "register_operand" "")	; index to jump on
 +   (match_operand:SI 1 "const_int_operand" "")	; lower bound
 +   (match_operand:SI 2 "const_int_operand" "")	; total range
 +   (match_operand:SI 3 "" "")			; table label
 +   (match_operand:SI 4 "" "")]			; out of range label

The last two are Pmode rather than SImode.  Since there aren't different
case* patterns for different Pmodes, we can't use :P instead, so let's just
drop the modes on 4 and 5.

Would be nice to add a compile test for -mabi=64 just to make sure
that Pmode == DImode works.  A copy of an existing test like
code-readable-1.c would be fine.

 +(define_insn "casesi_internal_mips16"
 +  [(set (pc)
 +	(if_then_else
 +	  (leu (match_operand:SI 0 "register_operand" "d")
 +	       (match_operand:SI 1 "arith_operand" "dI"))
 +	  (mem:SI (plus:SI (mult:SI (match_dup 0) (const_int 4))
 +			   (label_ref (match_operand 2 "" ""))))
 +	  (label_ref (match_operand 3 "" ""))))
 +   (clobber (match_scratch:SI 4 "=d"))
 +   (clobber (match_scratch:SI 5 "=d"))
 +   (clobber (reg:SI MIPS16_T_REGNUM))
 +   (use (label_ref (match_dup 2)))]

Although this is descriptive, the MEM is probably more trouble
than it's worth.  A hard-coded MEM like this will alias a lot
of things, whereas we're only reading from the function itself.
I think an unspec would be better.

This pattern should have :P for operands 4 and 5, with the pattern
name becoming:

casesi_internal_mips16_<mode>

PMODE_INSN should make it easy to wrap up the difference.

There shouldn't be any need for the final USE.  Let me know
if you found otherwise, because that sounds like a bug.

 +  "TARGET_MIPS16_SHORT_JUMP_TABLES"
 +{
 +  rtx diff_vec = PATTERN (next_real_insn (operands[2]));
 +
 +  gcc_assert (GET_CODE (diff_vec) == ADDR_DIFF_VEC);
 +
 +  output_asm_insn ("sltu\t%0, %1", operands);
 +  output_asm_insn ("bteqz\t%3", operands);
 +  output_asm_insn ("la\t%4, %2", operands);
 +
 +  switch (GET_MODE (diff_vec))
 +    {
 +    case HImode:
 +      output_asm_insn ("sll\t%5, %0, 1", operands);
 +      output_asm_insn ("addu\t%5, %4, %5", operands);
 +      output_asm_insn ("lh\t%5, 0(%5)", operands);
 +      break;
 +
 +    case SImode:
 +      output_asm_insn ("sll\t%5, %0, 2", operands);
 +      output_asm_insn ("addu\t%5, %4, %5", operands);
 +      output_asm_insn ("lw\t%5, 0(%5)", operands);
 +      break;
 +
 +    default:
 +      gcc_unreachable ();
 +    }
 +
 +  output_asm_insn ("addu\t%4, %4, %5", operands);
 +
 +  return "j\t%4";
 +}
 +  [(set_attr "length" "32")])

The addus here ought to be daddus after the :P change.

I think we can avoid the earlyclobber on operand 4 by moving the LA
after the SLL.

 +#define CASE_VECTOR_MODE ptr_mode
 +
 +/* Only use short offsets if their range will not overflow.  */
 +#define CASE_VECTOR_SHORTEN_MODE(MIN, MAX, BODY) \
 +  (TARGET_MIPS16_SHORT_JUMP_TABLES && ((MIN)

[PATCH] Fix some leaks and one uninitialized var read

2012-08-21 Thread Jakub Jelinek
Hi!

The recent change in find_assert_locations from XCNEWVEC to XNEWVEC
caused a valgrind warning, because bb_rpo[ENTRY_BLOCK] used to
be accessed, but was never initialized.

Fixed by ignoring edges from ENTRY_BLOCK altogether.

The rest are a couple of memory leak fixes.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-08-21  Jakub Jelinek  ja...@redhat.com

* tree-vrp.c (find_assert_locations): Skip also edges
from the entry block.

* tree-vect-loop-manip.c (slpeel_make_loop_iterate_ntimes): Call
free_stmt_vec_info on orig_cond after gsi_removing it.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Always
free body_cost_vec vector.
(vect_analyze_data_refs): If gather is unsuccessful,
free_data_ref (dr).
* tree-inline.c (tree_function_versioning): Free
old_transforms_to_apply vector.

--- gcc/tree-vrp.c.jj   2012-08-20 20:56:01.0 +0200
+++ gcc/tree-vrp.c  2012-08-21 12:15:32.501753048 +0200
@@ -5596,7 +5596,7 @@ find_assert_locations (void)
  FOR_EACH_EDGE (e, ei, bb->preds)
    {
      int pred = e->src->index;
- if (e->flags & EDGE_DFS_BACK)
+ if ((e->flags & EDGE_DFS_BACK) || pred == ENTRY_BLOCK)
continue;
 
  if (!live[pred])
--- gcc/tree-vect-loop-manip.c.jj   2012-08-15 10:55:24.0 +0200
+++ gcc/tree-vect-loop-manip.c  2012-08-21 15:01:02.600750196 +0200
@@ -788,6 +788,7 @@ slpeel_make_loop_iterate_ntimes (struct
 
   /* Remove old loop exit test:  */
   gsi_remove (loop_cond_gsi, true);
+  free_stmt_vec_info (orig_cond);
 
   loop_loc = find_loop_location (loop);
   if (dump_file && (dump_flags & TDF_DETAILS))
--- gcc/tree-vect-data-refs.c.jj2012-08-20 11:09:45.0 +0200
+++ gcc/tree-vect-data-refs.c   2012-08-21 16:32:13.631428796 +0200
@@ -1934,10 +1934,9 @@ vect_enhance_data_refs_alignment (loop_v
  gcc_assert (stat);
   return stat;
 }
-  else
-   VEC_free (stmt_info_for_cost, heap, body_cost_vec);
 }
 
+  VEC_free (stmt_info_for_cost, heap, body_cost_vec);
 
   /* (2) Versioning to force alignment.  */
 
@@ -3313,6 +3312,8 @@ vect_analyze_data_refs (loop_vec_info lo
gather = false;
  if (!gather)
{
+ STMT_VINFO_DATA_REF (stmt_info) = NULL;
+ free_data_ref (dr);
  if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
{
  fprintf (vect_dump,
--- gcc/tree-inline.c.jj2012-08-15 10:55:33.0 +0200
+++ gcc/tree-inline.c   2012-08-21 17:28:24.181069515 +0200
@@ -5089,6 +5089,7 @@ tree_function_versioning (tree old_decl,
   VEC_index (ipa_opt_pass,
  old_transforms_to_apply,
  i));
+  VEC_free (ipa_opt_pass, heap, old_transforms_to_apply);
 }
 
   id.copy_decl = copy_decl_no_change;

Jakub


libgcc patch committed: Increase non-split stack space

2012-08-21 Thread Ian Lance Taylor
When a -fsplit-stack function calls a non-split-stack function, the gold
linker automatically redirects the call to __morestack to call
__morestack_non_split instead.  I wrote __morestack_non_split to always
allocate at least 0x4000 bytes.  However, that was unclear thinking;
0x4000 bytes is sufficient for calling into libc, but it is not
sufficient for calling a general function.  This value leads to stack
overruns in ordinary code.  The default thread stack size on x86 and
x86_64 is 0x800000 bytes.  This patch significantly increases the stack
size allocated for non-split code, to 0x100000 bytes: still less than
the default, but much larger than before.

Probably the program should have a way to control this, but I'm not yet
sure what the right API would be for that.  In any case the default
should be larger.

Bootstrapped and ran Go testsuite and split-stack tests on
x86_64-unknown-linux-gnu.  Committed to mainline and 4.7 branch.

Ian


2012-08-21  Ian Lance Taylor  i...@google.com

* config/i386/morestack.S (__morestack_non_split): Increase amount
of space allocated for non-split code stack.


Index: config/i386/morestack.S
===
--- config/i386/morestack.S	(revision 190572)
+++ config/i386/morestack.S	(working copy)
@@ -83,6 +83,9 @@
 #endif
 
 
+# The amount of space we ask for when calling non-split-stack code.
+#define NON_SPLIT_STACK 0x100000
+
 # This entry point is for split-stack code which calls non-split-stack
 # code.  When the linker sees this case, it converts the call to
 # __morestack to call __morestack_non_split instead.  We just bump the
@@ -109,7 +112,7 @@ __morestack_non_split:
 
 	movl	%esp,%eax		# Current stack,
 	subl	8(%esp),%eax		# less required stack frame size,
-	subl	$0x4000,%eax		# less space for non-split code.
+	subl	$NON_SPLIT_STACK,%eax	# less space for non-split code.
 	cmpl	%gs:0x30,%eax		# See if we have enough space.
 	jb	2f			# Get more space if we need it.
 
@@ -171,7 +174,8 @@ __morestack_non_split:
 
 	.cfi_adjust_cfa_offset -4	# Account for popped register.
 
-	addl	$0x5000+BACKOFF,4(%esp)	# Increment space we request.
+	# Increment space we request.
+	addl	$NON_SPLIT_STACK+0x1000+BACKOFF,4(%esp)
 
 	# Fall through into morestack.
 
@@ -186,7 +190,7 @@ __morestack_non_split:
 
 	movq	%rsp,%rax		# Current stack,
 	subq	%r10,%rax		# less required stack frame size,
-	subq	$0x4000,%rax		# less space for non-split code.
+	subq	$NON_SPLIT_STACK,%rax	# less space for non-split code.
 
 #ifdef __LP64__
 	cmpq	%fs:0x70,%rax		# See if we have enough space.
@@ -219,7 +223,8 @@ __morestack_non_split:
 
 	.cfi_adjust_cfa_offset -8	# Adjust for popped register.
 
-	addq	$0x5000+BACKOFF,%r10	# Increment space we request.
+	# Increment space we request.
+	addq	$NON_SPLIT_STACK+0x1000+BACKOFF,%r10
 
 	# Fall through into morestack.
 


PATCH: PR middle-end/54332: [4.8 Regression] 481.wrf in SPEC CPU 2006 takes 10GB memory to compile

2012-08-21 Thread H.J. Lu
Hi,

This patch restores the df_free_collection_rec call inside the insn traversal
loop and removes the stack allocation check in vec_reserve.  It has
been approved in

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54332#c25

It has been tested on Linux/x86-64 and checked in.

Thanks.


H.J.
---
2012-08-21  H.J. Lu  hongjiu...@intel.com

PR middle-end/54332
* df-scan.c (df_bb_verify): Restore df_free_collection_rec call
inside the insn traversal loop.

* vec.h (vec_reserve): Remove the stack allocation check.

diff --git a/gcc/df-scan.c b/gcc/df-scan.c
index 55492fa..df90365 100644
--- a/gcc/df-scan.c
+++ b/gcc/df-scan.c
@@ -4448,6 +4448,7 @@ df_bb_verify (basic_block bb)
   if (!INSN_P (insn))
 continue;
   df_insn_refs_verify (&collection_rec, bb, insn, true);
+  df_free_collection_rec (&collection_rec);
 }
 
   /* Do the artificial defs and uses.  */
diff --git a/gcc/vec.h b/gcc/vec.h
index 5fdb859..1922616 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -1099,21 +1099,9 @@ vec_reserve (vec_t<T> *vec_, int reserve MEM_STAT_DECL)
  sizeof (T), false
  PASS_MEM_STAT);
   else
-{
-  /* Only allow stack vectors when re-growing them.  The initial
-allocation of stack vectors must be done with the
-VEC_stack_alloc macro, because it uses alloca() for the
-allocation.  */
-  if (vec_ == NULL)
-   {
- fprintf (stderr, "Stack vectors must be initially allocated "
-  "with VEC_stack_alloc.\n");
- gcc_unreachable ();
-   }
-  return (vec_t<T> *) vec_stack_o_reserve (vec_, reserve,
-  offsetof (vec_t<T>, vec),
-  sizeof (T) PASS_MEM_STAT);
-}
+    return (vec_t<T> *) vec_stack_o_reserve (vec_, reserve,
+					     offsetof (vec_t<T>, vec),
+					     sizeof (T) PASS_MEM_STAT);
 }
 
 


Another merge from gcc 4.7 branch to gccgo branch

2012-08-21 Thread Ian Lance Taylor
I merged revision 190574 from the gcc 4.7 branch to the gccgo branch.

Ian


Re: [PATCH] Allow dg-skip-if to use compiler flags specified through set_board_info cflags

2012-08-21 Thread Mike Stump
On Aug 11, 2012, at 10:39 AM, Senthil Kumar Selvaraj wrote:
 This patch allows cflags set in board config files using 
 set_board_info cflags to be used in the selectors of
 dg-skip-if and other dejagnu commands that use the check-flags
 proc.

Ok.


[Patch, Fortran, committed] free gfc_code of EXEC_END_PROCEDURE

2012-08-21 Thread Tobias Burnus
Background: There is currently a memory leakage cleanup in the middle 
end – and fixing PR 54332 would probably have been also easier without 
FE leaks.


I think we should join in and try to remove some leakage -- and try not to
introduce new ones.


* * *

Committed: For EXEC_END_PROCEDURE, I have committed one fix as obvious 
(parse.c). However, I have a test case where parse_contained still 
leaks memory; possibly another, similar patch is needed in addition.


* * *

There are also plenty of leaks related to the freeing of gfc_ss. I 
attached a draft patch (trans-expr.c, trans-intrinsics.c), which is 
probably okay, but not yet regtested.


OK with a changelog (and if it regtests)?

Note: The patch is incomplete, e.g. argss of gfc_conv_procedure_call 
is not (or not always) freed. Ditto for rss of gfc_trans_assignment_1; 
ditto for lss and rss of gfc_trans_pointer_assignment



* * * * * * * * * * * *

Additionally, there is a memory leak when generating more than one 
procedure per TU: the memory is allocated but not freed along 
gfc_generate_function_code -> (generate_coarray_init or 
trans_function_start) -> init_function_start -> prepare_function_start 
-> init_emit


The memory should be freed via (backend_init_target or 
lang_dependent_init_target) -> expand_dummy_function_end -> 
free_after_compilation


The latter seems to operate on cfun – hence, it only frees the last 
cfun and not all.


However, despite some longer debugging (e.g. using a main program, which 
calls create_main_function), I failed to find the problem.


* * *

And module.c can also leak plenty of memory ...

Tobias
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 190574)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,8 @@
+2012-08-21  Tobias Burnus  bur...@net-b.de
+
+	* parse.c (parse_contained): Include EXEC_END_PROCEDURE
	in ns->code to make sure the gfc_code is freed.
+
 2012-08-20  Tobias Burnus  bur...@net-b.de
 
 	PR fortran/54301
Index: gcc/fortran/parse.c
===
--- gcc/fortran/parse.c	(Revision 190574)
+++ gcc/fortran/parse.c	(Arbeitskopie)
@@ -4075,6 +4075,7 @@ parse_contained (int module)
 	case ST_END_PROGRAM:
 	case ST_END_SUBROUTINE:
 	  accept_statement (st);
+	  gfc_current_ns->code = s1.head;
 	  break;
 
 	default:
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 4f7d026..cfb0862 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -533,6 +533,7 @@ gfc_copy_class_to_class (tree from, tree to, tree nelems)
   loop.to[0] = nelems;
   gfc_trans_scalarizing_loops (&loop, &loopbody);
   gfc_add_block_to_block (&body, &loop.pre);
+  gfc_cleanup_loop (&loop);
   tmp = gfc_finish_block (&body);
 }
   else
@@ -6770,6 +6771,7 @@ gfc_trans_arrayfunc_assign (gfc_expr * expr1, gfc_expr * expr2)
   if (!expr2-value.function.isym)
 	{
 	  realloc_lhs_loop_for_fcn_call (&se, &expr1->where, &ss, &loop);
+	  gfc_cleanup_loop (&loop);
 	  ss->is_alloc_lhs = 1;
 	}
   else
@@ -6778,6 +6780,7 @@ gfc_trans_arrayfunc_assign (gfc_expr * expr1, gfc_expr * expr2)
 
   gfc_conv_function_expr (&se, expr2);
   gfc_add_block_to_block (&se.pre, &se.post);
+  gfc_free_ss (se.ss);
 
   return gfc_finish_block (&se.pre);
 }
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index fac29c7..d0aebe9 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -1328,6 +1328,7 @@ gfc_conv_intrinsic_rank (gfc_se *se, gfc_expr *expr)
   argse.descriptor_only = 1;
 
   gfc_conv_expr_descriptor (&argse, expr->value.function.actual->expr, ss);
+  gfc_free_ss (ss);
   gfc_add_block_to_block (&se->pre, &argse.pre);
   gfc_add_block_to_block (&se->post, &argse.post);
 


Re: [PATCH, ARM] Don't pull in unwinder for 64-bit division routines

2012-08-21 Thread Michael Hope
On 17 August 2012 07:29, Julian Brown jul...@codesourcery.com wrote:
 On Thu, 16 Aug 2012 19:56:52 +0100
 Ramana Radhakrishnan ramra...@arm.com wrote:

 On 07/24/12 13:27, Julian Brown wrote:
  On Fri, 20 Jul 2012 11:15:27 +0100
  Julian Brown jul...@codesourcery.com wrote:
 
  Anyway: this revised version of the patch removes the strange
  libgcc Makefile-fragment changes, the equivalent of which have
  since been incorporated into mainline GCC now anyway, so the patch
  is somewhat more straightforward than it was previously.
 
  Joey Ye contacted me offlist and suggested that the t-divmod-ef
  fragment might be better integrated into t-bpabi instead. Doing that
  makes the patch somewhat smaller/cleaner.
 
  Minimally re-tested, looks OK.

 The original submission makes no mention of testing ? The ARM
 specific portions look OK to me modulo no regressions.

 Thanks -- I'm sure I did test the patch, but just omitted to mention
 that fact in the mail :-O. We've also been carrying a version of this
 patch in our local source base for many years now.

Hi Julian.  The test case fails on arm-linux-gnueabi:
 http://gcc.gnu.org/ml/gcc-testresults/2012-08/msg02100.html

FAIL: gcc.target/arm/div64-unwinding.c execution test

The test aborts as _Unwind_RaiseException is not null.  _divdi3.o
itself looks fine and no longer pulls in the unwinder so I assume
something else in the environment is.  I've put the binaries up at
http://people.linaro.org/~michaelh/incoming/div64-unwinding.exe and
http://people.linaro.org/~michaelh/incoming/_divdi3.o if that helps.

-- Michael


Re: [SH] Use more multi-line asm outputs

2012-08-21 Thread Kaz Kojima
Oleg Endo oleg.e...@t-online.de wrote:
 This mainly converts the asm outputs to multi-line strings and uses tab
 chars instead of '\\t' in the asm strings, in the hope to make stuff
 easier to read and a bit more consistent.
 Tested on rev 190546 with
 make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
 
 and no new failures.
 OK?

OK.

Regards,
kaz


Re: [SH] PR 39423 - Add support for SH2A movu.w insn

2012-08-21 Thread Kaz Kojima
Oleg Endo oleg.e...@t-online.de wrote:
 This adds support for SH2A's movu.w insn for memory addressing cases as
 described in the PR.
 Tested on rev 190546 with
 make -k check RUNTESTFLAGS=--target_board=sh-sim
 \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}
 
 and no new failures.
 OK?

OK.

Regards,
kaz


Re: [PATCH] Combine location with block using block_locations

2012-08-21 Thread Dehao Chen
On Tue, Aug 21, 2012 at 6:25 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Mon, Aug 20, 2012 at 3:18 AM, Dehao Chen de...@google.com wrote:
 ping

 Conceptually I like the change.  Can a libcpp maintainer please have a 2nd
 look?

 Dehao, did you do any compile-time and memory-usage benchmarks?

I don't have memory benchmarks at hand, but I've tested it on
some huge apps, each of which takes more than 1 hour to build on a
modern machine. None of them showed a noticeable memory footprint
or compile time increase.

Thanks,
Dehao


 Thanks,
 Richard.

 Thanks,
 Dehao

 On Tue, Aug 14, 2012 at 10:13 AM, Dehao Chen de...@google.com wrote:
 Hi, Dodji,

 Thanks for the review. I've fixed all the addressed issues. I'm
 attaching the related changes:

 Thanks,
 Dehao

 libcpp/ChangeLog:
 2012-08-01  Dehao Chen  de...@google.com

 * include/line-map.h (MAX_SOURCE_LOCATION): New value.
 (location_adhoc_data_init): New.
 (location_adhoc_data_fini): New.
 (get_combined_adhoc_loc): New.
 (get_data_from_adhoc_loc): New.
 (get_location_from_adhoc_loc): New.
 (COMBINE_LOCATION_DATA): New.
 (IS_ADHOC_LOC): New.
 (expanded_location): New field.
 * line-map.c (location_adhoc_data): New.
 (location_adhoc_data_htab): New.
 (curr_adhoc_loc): New.
 (location_adhoc_data): New.
 (allocated_location_adhoc_data): New.
 (location_adhoc_data_hash): New.
 (location_adhoc_data_eq): New.
 (location_adhoc_data_update): New.
 (get_combined_adhoc_loc): New.
 (get_data_from_adhoc_loc): New.
 (get_location_from_adhoc_loc): New.
 (location_adhoc_data_init): New.
 (location_adhoc_data_fini): New.
 (linemap_lookup): Change to use new location.
 (linemap_ordinary_map_lookup): Likewise.
 (linemap_macro_map_lookup): Likewise.
 (linemap_macro_map_loc_to_def_point): Likewise.
 (linemap_macro_map_loc_unwind_toward_spel): Likewise.
 (linemap_get_expansion_line): Likewise.
 (linemap_get_expansion_filename): Likewise.
 (linemap_location_in_system_header_p): Likewise.
 (linemap_location_from_macro_expansion_p): Likewise.
 (linemap_macro_loc_to_spelling_point): Likewise.
 (linemap_macro_loc_to_def_point): Likewise.
 (linemap_macro_loc_to_exp_point): Likewise.
 (linemap_resolve_location): Likewise.
 (linemap_unwind_toward_expansion): Likewise.
 (linemap_unwind_to_first_non_reserved_loc): Likewise.
 (linemap_expand_location): Likewise.
 (linemap_dump_location): Likewise.

 Index: libcpp/line-map.c
 ===
 --- libcpp/line-map.c   (revision 190209)
 +++ libcpp/line-map.c   (working copy)
 @@ -25,6 +25,7 @@
  #include "line-map.h"
  #include "cpplib.h"
  #include "internal.h"
 +#include "hashtab.h"

  static void trace_include (const struct line_maps *, const struct line_map *);
  static const struct line_map * linemap_ordinary_map_lookup (struct line_maps *,
 @@ -50,6 +51,135 @@
  extern unsigned num_expanded_macros_counter;
  extern unsigned num_macro_tokens_counter;

 +/* Data structure to associate an arbitrary data to a source location.  */
 +struct location_adhoc_data {
 +  source_location locus;
 +  void *data;
 +};
 +
 +/* The following data structure encodes a location with some adhoc data
 +   and maps it to a new unsigned integer (called an adhoc location)
 +   that replaces the original location to represent the mapping.
 +
 +   The new adhoc_loc uses the highest bit as the enabling bit, i.e. if the
 +   highest bit is 1, then the number is an adhoc_loc. Otherwise, it serves as
 +   the original location. Once identified as an adhoc_loc, the lower 31
 +   bits of the integer are used to index the location_adhoc_data array,
 +   in which the locus and associated data are stored.  */
 +
 +static htab_t location_adhoc_data_htab;
 +static source_location curr_adhoc_loc;
 +static struct location_adhoc_data *location_adhoc_data;
 +static unsigned int allocated_location_adhoc_data;
 +
 +/* Hash function for location_adhoc_data hashtable.  */
 +
 +static hashval_t
 +location_adhoc_data_hash (const void *l)
 +{
 +  const struct location_adhoc_data *lb =
 +  (const struct location_adhoc_data *) l;
 +  return (hashval_t) lb->locus + (size_t) lb->data;
 +}
 +
 +/* Compare function for location_adhoc_data hashtable.  */
 +
 +static int
 +location_adhoc_data_eq (const void *l1, const void *l2)
 +{
 +  const struct location_adhoc_data *lb1 =
 +  (const struct location_adhoc_data *) l1;
 +  const struct location_adhoc_data *lb2 =
 +  (const struct location_adhoc_data *) l2;
 +  return lb1->locus == lb2->locus && lb1->data == lb2->data;
 +}
 +
 +/* Update the hashtable when location_adhoc_data is reallocated.  */
 +
 +static int
 +location_adhoc_data_update (void **slot, 

Build static libgcc with hidden visibility even with --disable-shared

2012-08-21 Thread Joseph S. Myers
As discussed in
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01219.html, it is
desirable for the libgcc build with inhibit_libc defined and
--disable-shared to be similar enough to that build without
inhibit_libc and --enable-shared to be usable to build glibc,
producing the same results as if glibc were built with a toolchain
that already included a shared libgcc and was built against previously
built glibc.  One source of differences noted there was functions in
libgcc.a being hidden only if shared libgcc is also being built.

This patch changes the logic so that the way libgcc.a is built in the
static-only case is more similar to how it is built when shared libgcc
is being built as well; in particular, libgcc symbols are generally
given hidden visibility (if supported) in the static libgcc.

Tested with cross to arm-none-linux-gnueabi that it fixes the
previously observed differences; rebuilding glibc with the second GCC
now produces identical stripped binaries to the results of building
with the first (static-only) GCC, except for the cases of nscd and
static libraries which differ between multiple glibc builds even with
identical compilers (in both cases because of embedded timestamps).
Also bootstrapped with no regressions on x86_64-unknown-linux-gnu as a
sanity check.  OK to commit?

2012-08-21  Joseph Myers  jos...@codesourcery.com

* Makefile.in (vis_hide, gen-hide-list): Do not make definitions
depend on --enable-shared.
($(lib1asmfuncs-o)): Use %.vis files independent of
--enable-shared.
* static-object.mk ($(base)$(objext), $(base).vis)
($(base)_s$(objext)): Use same rules for visibility handling as in
shared-object.mk.

Index: libgcc/Makefile.in
===
--- libgcc/Makefile.in  (revision 190577)
+++ libgcc/Makefile.in  (working copy)
@@ -363,6 +363,7 @@
   ifneq ($(LIBUNWIND),)
 install-libunwind = install-libunwind
   endif
+endif
 
 # For -fvisibility=hidden.  We need both a -fvisibility=hidden on
 # the command line, and a #define to prevent libgcc2.h etc from
@@ -386,11 +387,8 @@
 gen-hide-list = echo > $@
 endif
 
-else
-# Not enable_shared.
+ifneq ($(enable_shared),yes)
 iterator = $(srcdir)/empty.mk $(patsubst %,$(srcdir)/static-object.mk,$(iter-items))
-vis_hide =
-gen-hide-list = echo > \$@
 endif
 
 LIB2ADD += enable-execute-stack.c
@@ -439,7 +437,6 @@
   $(LIB2_DIVMOD_FUNCS))
 
 # Build libgcc1 (assembly) components.
-ifeq ($(enable_shared),yes)
 
 lib1asmfuncs-o = $(patsubst %,%$(objext),$(LIB1ASMFUNCS))
 $(lib1asmfuncs-o): %$(objext): $(srcdir)/config/$(LIB1ASMSRC) %.vis
@@ -451,15 +448,10 @@
 lib1asmfuncs-s-o = $(patsubst %,%_s$(objext),$(LIB1ASMFUNCS))
 $(lib1asmfuncs-s-o): %_s$(objext): $(srcdir)/config/$(LIB1ASMSRC)
	$(gcc_s_compile) -DL$* -xassembler-with-cpp -c $<
+ifeq ($(enable_shared),yes)
+
 libgcc-s-objects += $(lib1asmfuncs-s-o)
 
-else
-
-lib1asmfuncs-o = $(patsubst %,%$(objext),$(LIB1ASMFUNCS))
-$(lib1asmfuncs-o): %$(objext): $(srcdir)/config/$(LIB1ASMSRC)
-   $(gcc_compile) -DL$* -xassembler-with-cpp -c $<
-libgcc-objects += $(lib1asmfuncs-o)
-
 endif
 
 # Build lib2funcs.  For the static library also include LIB2FUNCS_ST.
Index: libgcc/static-object.mk
===
--- libgcc/static-object.mk (revision 190577)
+++ libgcc/static-object.mk (working copy)
@@ -24,7 +24,13 @@
 endif
 endif
 
-$(base)$(objext): $o
-   $(gcc_compile) -c -xassembler-with-cpp $<
+$(base)$(objext): $o $(base).vis
+   $(gcc_compile) -c -xassembler-with-cpp -include $*.vis $<
 
+$(base).vis: $(base)_s$(objext)
+   $(gen-hide-list)
+
+$(base)_s$(objext): $o
+   $(gcc_s_compile) -c -xassembler-with-cpp $<
+
 endif

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Build static libgcc with hidden visibility even with --disable-shared

2012-08-21 Thread Ian Lance Taylor
On Tue, Aug 21, 2012 at 5:33 PM, Joseph S. Myers
jos...@codesourcery.com wrote:

 2012-08-21  Joseph Myers  jos...@codesourcery.com

 * Makefile.in (vis_hide, gen-hide-list): Do not make definitions
 depend on --enable-shared.
 ($(lib1asmfuncs-o)): Use %.vis files independent of
 --enable-shared.
 * static-object.mk ($(base)$(objext), $(base).vis)
 ($(base)_s$(objext)): Use same rules for visibility handling as in
 shared-object.mk.

This is OK.

Thanks.

Ian


[Patch ARM] Update the test case to differ movs and lsrs for ARM mode and non-ARM mode

2012-08-21 Thread Terry Guo
Hi,

Because of ARM UAL, Thumb1 and Thumb2 mode use the LSRS instruction while
ARM mode uses the MOVS instruction. The following test case is updated
accordingly. Is it OK for trunk?

BR,
Terry

2012-08-21  Terry Guo  terry@arm.com

* gcc.target/arm/combine-movs.c: Check movs for ARM mode 
and lsrs for other mode.

diff --git a/gcc/testsuite/gcc.target/arm/combine-movs.c
b/gcc/testsuite/gcc.target/arm/combine-movs.c
index 4209a33..fbef9df 100644
--- a/gcc/testsuite/gcc.target/arm/combine-movs.c
+++ b/gcc/testsuite/gcc.target/arm/combine-movs.c
@@ -1,5 +1,4 @@
 /* { dg-do compile } */
-/* { dg-skip-if "" { arm_thumb1 } } */
 /* { dg-options -O }  */

 void foo (unsigned long r[], unsigned int d)
@@ -9,4 +8,5 @@ void foo (unsigned long r[], unsigned int d)
 r[i] = 0;
 }

-/* { dg-final { scan-assembler "movs\tr\[0-9\]" } } */
+/* { dg-final { scan-assembler "movs\tr\[0-9\]" { target arm_nothumb } } } */
+/* { dg-final { scan-assembler "lsrs\tr\[0-9\]" { target { ! arm_nothumb } } } } */





Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Jan Hubicka
  I can go ahead with the histogram approach. There is some roundoff
  error from the working set scaling approach that can affect different
  merging orders as you note, although I think this only really affects the
  small counter values. The other place where saving/merging the histogram

Do you have any intuition on why simple maximalization merging (which is safe wrt
ordering) would be a bad idea?

We care only about working set size around top of the histogram and I would say
that we should sort of optimize for the largest (in the number of blocks in hot
area) of the train runs.  One way where things will get messed up is when the
working set is about the same but the runs are of different size so all the
blocks get accounted into two different buckets.

But in general I do not think there is a reasonably accurate way to merge the
profiles without actually streaming out all counter IDs in every bucket, so
perhaps this will work well enough.  If not, we can in future introduce a
per-program global summary file that will contain the counters to be merged
accurately.

  would help is when the distribution of counter values in the histogram
  varies between runs, say for example, the hottest loop is much hotter in a
  subsequent run, but the rest of the counter values stay largely consistent.
  Note, however, that if the hotspots are different in different runs, then
  merging either the histogram or the working set will have issues. The only
  way to detect this is to recompute the histogram/working set from scratch
  from the merged counters.
 
  I wonder in practice, even when there are a lot of simultaneous runs going
  like in a gcc bootstrap, if we could get reasonably accurate summary
  recomputation without global locking. The reason is that as long as the
  actual counter updates are safe as they are now due to the individual file
  locking, the inaccuracy in the recomputed summary information will not grow
  over time, and there is a reasonable chance that the last run to complete
  and merge will recompute the summary from the final merged counter values
  and get it right (or only be off by a little bit if there are a couple of
  runs completing simultaneously at the end). But this can be investigated as
  a follow on to this patch.
 
 
 
 The main concern is probably the build reproducibility in gcc bootstrap
 with FDO.

Hmm, you mean in the first pass update every file with new counters and in the
second pass just update the summaries?
OK, so I guess we went through
 1) two pass updating with a race in between passes.
 2) two pass updating with the first pass updating counters and the second
    having a race only for the summary update (i.e. no races for counters).
 3) two pass updating with flocking (and some way to handle detected deadlocks).
 4) one pass updating with histogram merging + maximalization of the working set.
    (We do not really need to scale the buckets; we can simply merge the
    histograms and then multiply them by nruns before comparing to actual
    counters.  This assumes that the working sets of all runs are about the
    same, but it should work reasonably in practice, I think.)

I guess 3/4 are acceptable WRT bootstrap reproducibility.

I have no experience with flocking large numbers of files or with the
portability of this solution, i.e. to Windows.  If you think that 2) would be
too inaccurate in practice and 3) has a chance to be portable, we could go for
this.  It will solve the precision problems and will also work for LIPO
summaries.  I would be curious about the effect on profiledbootstrap time if
you implement it.

Honza
 
 David
 
 
 
 
  Thanks,
  Teresa
 
  
   
   
  2) Do we plan to add some features in near future that will
  anyway require global locking?
 I guess LIPO itself does not count since it streams its data
  into independent file as you
 mentioned earlier and locking LIPO file is not that hard.
 Does LIPO stream everything into that common file, or does it
  use combination of gcda files
 and common summary?
   
Actually, LIPO module grouping information are stored in gcda files.
It is also stored in a separate .imports file (one per object) ---
this is primarily used by our build system for dependence
  information.
   
I see, getting LIPO safe WRT parallel updates will be fun. How does
  LIPO behave
on GCC bootstrap?
  
   We have not tried gcc bootstrap with LIPO. Gcc compile time is not the
   main problem for application build -- the link time (for debug build)
   is.
 
  I was primarily curious how the LIPOs runtime analysis fare in the
  situation where
  you do very many small train runs on rather large app (sure GCC is small
  to google's
  use case ;).
  
(i.e. it does a lot more work in the libgcov module per each
invocation, so I am curious if it is practically useful at all).
   
With LTO based solution a lot can be probably pushed at link time?
  Before
actual GCC starts from the linker 

Re: [PATCH 4/4] Reduce the size of optabs representation

2012-08-21 Thread Mike Stump
On Jul 19, 2012, at 11:24 AM, Richard Henderson wrote:
 +# genopinit produces two files.
 +insn-opinit.c insn-opinit.h: s-opinit ; @true
 +s-opinit: $(MD_DEPS) build/genopinit$(build_exeext) insn-conditions.md
 + $(RUN_GEN) build/genopinit$(build_exeext) $(md_file) \
 +   insn-conditions.md -htmp-opinit.h -ctmp-opinit.c
 + $(SHELL) $(srcdir)/../move-if-change tmp-opinit.h insn-opinit.h
 + $(SHELL) $(srcdir)/../move-if-change tmp-opinit.c insn-opinit.c
 + $(STAMP) s-opinit

Breaks my port without the attached patch...

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 67f1d66..bd31c9b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3484,7 +3484,7 @@ s-attrtab : $(MD_DEPS) build/genattrtab$(build_exeext) \
 # genopinit produces two files.
 insn-opinit.c insn-opinit.h: s-opinit ; @true
 s-opinit: $(MD_DEPS) build/genopinit$(build_exeext) insn-conditions.md
-   $(RUN_GEN) build/genopinit$(build_exeext) $(md_file) \
+   $(RUN_GEN) build/genopinit$(build_exeext) $(MD_INCS) $(md_file) \
  insn-conditions.md -htmp-opinit.h -ctmp-opinit.c
$(SHELL) $(srcdir)/../move-if-change tmp-opinit.h insn-opinit.h
$(SHELL) $(srcdir)/../move-if-change tmp-opinit.c insn-opinit.c


Re: [PATCH] Add working-set size and hotness information to fdo summary (issue6465057)

2012-08-21 Thread Teresa Johnson
On Tue, Aug 21, 2012 at 6:56 PM, Jan Hubicka hubi...@ucw.cz wrote:
  I can go ahead with the histogram approach. There is some roundoff
  error from the working set scaling approach that can affect different
  merging orders as you note, although I think this only really affects the
  small counter values. The other place where saving/merging the histogram

 Do you have any intuition on why simple maximalization merging (that is
 safe wrt ordering) would be a bad idea?

When you say "maximalization merging", are you talking about the
histogram merging approach I mentioned a few emails back (my response
on Aug 19), where we assume the same relative order of hotness in the
counters between runs and accumulate the counter values in the
histogram in that order?

This would be inaccurate if different runs exercised different areas
of the code, and thus the counters would be ordered in the histogram
differently.
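
The rank-based merge described above can be sketched as follows. This is an
illustrative model only, not libgcov code; the function name and the
plain-list counter representation are invented for the example:

```python
def merge_by_rank(counters_a, counters_b):
    """Merge two runs' counters assuming the same relative hotness
    order in both runs: sort each run descending and add the values
    position by position (rank by rank)."""
    a = sorted(counters_a, reverse=True)
    b = sorted(counters_b, reverse=True)
    # Pad the shorter run with zeros so every rank has a partner.
    n = max(len(a), len(b))
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]
```

As noted above, this goes wrong when the runs exercise different code:
merging [100, 0] with [0, 100] by rank yields [200, 0], while a true
per-counter merge would give [100, 100].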


 We care only about the working set size around the top of the histogram,
 and I would say

For optimizations that care about the boundary between hot and cold
such as code layout I think we will also care about the smaller values
in the histogram (to have a good idea of what constitutes a cold block
counter value).
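
As a rough model of the quantity being argued about: a working-set size can
be defined as the number of hottest counters needed to cover a given
fraction of the total execution count. A minimal sketch (the name and
representation are assumptions for illustration, not gcov's actual code):

```python
def working_set_size(counters, fraction):
    """Return how many of the hottest counters are needed to cover
    `fraction` of the total execution count."""
    total = sum(counters)
    covered = 0
    # Walk counters from hottest to coldest until the target is covered.
    for i, c in enumerate(sorted(counters, reverse=True), start=1):
        covered += c
        if covered >= fraction * total:
            return i
    return len(counters)
```

With a low `fraction` only the top of the histogram matters; determining a
cold-block cutoff for code layout effectively asks about the tail as well.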

 that we should sort of optimize for the largest (in the number of blocks
 in the hot area) of the train runs.  One way where things will get messed
 up is when the working set is about the same but the runs are of different
 size, so all the blocks get accounted into two different buckets.

I'm not sure I understand the last sentence - is this something that
would not get handled by merging the histogram entries as I described
earlier? Or it sounds like you might have a different merging approach
in mind?


 But in general I do not think there is a reasonably accurate way to merge
 the profiles without actually streaming out all counter IDs in every
 bucket, so perhaps this will work well enough.  If not, we can in the
 future introduce a per-program global summary file that will contain the
 counters to be merged accurately.

Sounds good.


  would help is when the distribution of counter values in the histogram
  varies between runs, say for example, the hottest loop is much hotter in a
  subsequent run, but the rest of the counter values stay largely consistent.
  Note, however, that if the hotspots are different in different runs, then
  merging either the histogram or the working set will have issues. The only
  way to detect this is to recompute the histogram/working set from scratch
  from the merged counters.
 
  I wonder in practice, even when there are a lot of simultaneous runs going
  like in a gcc bootstrap, if we could get reasonably accurate summary
  recomputation without global locking. The reason is that as long as the
  actual counter updates are safe as they are now due to the individual file
  locking, the inaccuracy in the recomputed summary information will not grow
  over time, and there is a reasonable chance that the last run to complete
  and merge will recompute the summary from the final merged counter values
  and get it right (or only be off by a little bit if there are a couple of
  runs completing simultaneously at the end). But this can be investigated as
  a follow on to this patch.
 


 The main concern is probably the build reproducibility in gcc bootstrap
 with FDO.

 Hmm, you mean in the first pass update every file with new counters and
 in the second pass just update the summaries?

Right, that's what I had in mind (what you have described in #2 below).

 OK, so I guess we went through
  1) two-pass updating with a race in between passes.
  2) two-pass updating with the first pass updating counters and the second
 having a race only for the summary update (i.e. no races for counters).
  3) two-pass updating with flocking (and some way to handle detected
 deadlocks).
  4) one-pass updating with histogram merging + maximalization of the
 working set.  (We do not really need to scale the buckets; we can simply
 merge the histograms and then multiply them by nruns before comparing to
 the actual counters.)

By merging the histograms (and accumulating the counter values stored
there as we merge), I don't think we need to multiply the counter
values by nruns, do we?
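
A sketch contrasting the two one-pass merges being discussed: per-bucket
maximalization (safe with respect to merge order) versus accumulation
(which, as the reply notes, needs no nruns scaling because the bucketed
values grow just like the raw counters do). Illustrative only, with
invented names; not the libgcov merge code:

```python
def merge_histograms_max(hist_a, hist_b):
    """Order-safe "maximalization" merge: per-bucket maximum.
    Models the largest train run rather than the combined load."""
    n = max(len(hist_a), len(hist_b))
    a = hist_a + [0] * (n - len(hist_a))
    b = hist_b + [0] * (n - len(hist_b))
    return [max(x, y) for x, y in zip(a, b)]

def merge_histograms_sum(hist_a, hist_b):
    """Accumulating merge: per-bucket sum.  The result stays directly
    comparable to the merged raw counters, with no nruns multiply."""
    n = max(len(hist_a), len(hist_b))
    a = hist_a + [0] * (n - len(hist_a))
    b = hist_b + [0] * (n - len(hist_b))
    return [x + y for x, y in zip(a, b)]
```

Both are associative and commutative, so either gives the same answer
regardless of the order in which simultaneous runs complete.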

  This assumes that the working sets of all runs are about the same, but
 it should work reasonably in practice, I think.

 I guess 3/4 are acceptable WRT bootstrap reproducibility.

 I have no experience with flocking a large number of files and the
 portability of this solution, e.g. to Windows.  If you think that 2) would
 be too inaccurate in practice and 3) has a chance to be portable, we could
 go for it.  It will solve the precision problems and will also work for
 LIPO summaries.  I would be curious about the effect on profiledbootstrap
 time if you implement it.

I'm hoping that 2) will be accurate enough in practice, but it will
need some investigation.
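
Option 3) in miniature: take an exclusive lock on the counter file,
read-modify-write, release. This sketch uses POSIX `fcntl.flock`, which is
exactly the part that does not exist on Windows (the portability concern
above); the one-integer-per-line file format is invented for the example:

```python
import fcntl

def merge_with_flock(path, new_counts):
    """Merge new_counts into the counter file at `path` under an
    exclusive flock, so concurrent runs serialize on the file."""
    with open(path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # blocks until we own the file
        old = [int(line) for line in f.read().split()]
        # Pad the shorter list with zeros, then add element-wise.
        merged = [a + b for a, b in
                  zip(old + [0] * (len(new_counts) - len(old)),
                      new_counts + [0] * (len(old) - len(new_counts)))]
        f.seek(0)
        f.truncate()
        f.write("\n".join(str(c) for c in merged))
        # The lock is released when the file is closed.
    return merged
```

Deadlock handling (needed once a run locks many files) is not shown; a real
implementation would lock files in a fixed order or back off on failure.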

Thanks,
Teresa


 Honza

 David



 
  Thanks,
  Teresa
 

[Patch, Fortran, committed] Free loop and gfc_ss data

2012-08-21 Thread Tobias Burnus

Committed as Rev. 190586 after successful regtesting.

That's the version I also had attached to 
http://gcc.gnu.org/ml/fortran/2012-08/msg00118.html; as written there:


The patch is incomplete, e.g. argss of gfc_conv_procedure_call is not 
(or not always) freed. Ditto for rss of gfc_trans_assignment_1; ditto 
for lss and rss of gfc_trans_pointer_assignment.


Tobias
Index: gcc/fortran/trans-expr.c
===
--- gcc/fortran/trans-expr.c	(Revision 190585)
+++ gcc/fortran/trans-expr.c	(Arbeitskopie)
@@ -533,6 +533,7 @@ gfc_copy_class_to_class (tree from, tree to, tree
   loop.to[0] = nelems;
   gfc_trans_scalarizing_loops (&loop, &loopbody);
   gfc_add_block_to_block (&body, &loop.pre);
+  gfc_cleanup_loop (&loop);
   tmp = gfc_finish_block (&body);
 }
   else
@@ -6770,6 +6771,7 @@ gfc_trans_arrayfunc_assign (gfc_expr * expr1, gfc_
   if (!expr2->value.function.isym)
 	{
 	  realloc_lhs_loop_for_fcn_call (&se, &expr1->where, &ss, &loop);
+	  gfc_cleanup_loop (&loop);
 	  ss->is_alloc_lhs = 1;
 	}
   else
@@ -6778,6 +6780,7 @@ gfc_trans_arrayfunc_assign (gfc_expr * expr1, gfc_
 
   gfc_conv_function_expr (&se, expr2);
   gfc_add_block_to_block (&se.pre, &se.post);
+  gfc_free_ss (se.ss);
 
   return gfc_finish_block (&se.pre);
 }
Index: gcc/fortran/trans-intrinsic.c
===
--- gcc/fortran/trans-intrinsic.c	(Revision 190585)
+++ gcc/fortran/trans-intrinsic.c	(Arbeitskopie)
@@ -1328,6 +1328,7 @@ gfc_conv_intrinsic_rank (gfc_se *se, gfc_expr *exp
   argse.descriptor_only = 1;
 
   gfc_conv_expr_descriptor (&argse, expr->value.function.actual->expr, ss);
+  gfc_free_ss (ss);
   gfc_add_block_to_block (&se->pre, &argse.pre);
   gfc_add_block_to_block (&se->post, &argse.post);
 
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 190585)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,9 @@
+2012-08-22  Tobias Burnus  bur...@net-b.de
+
+	* trans-expr.c (gfc_copy_class_to_class,
+	gfc_trans_arrayfunc_assign): Free loop and ss data.
+	* trans-intrinsic.c (gfc_conv_intrinsic_rank): Free ss data.
+
 2012-08-21  Tobias Burnus  bur...@net-b.de
 
 	* parse.c (parse_contained): Include EXEC_END_PROCEDURE