RE: [PING Updated]: [PATCH GCC/ARM] Fix problem that hardreg_cprop opportunities are missed on thumb1

2012-10-08 Thread Bin Cheng
Ping.

 -----Original Message-----
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
 On Behalf Of Bin Cheng
 Sent: Tuesday, September 25, 2012 4:00 PM
 To: 'Richard Sandiford'
 Cc: Ramana Radhakrishnan; Richard Earnshaw; gcc-patches@gcc.gnu.org
 Subject: RE: [Updated]: [PATCH GCC/ARM] Fix problem that hardreg_cprop
 opportunities are missed on thumb1
 
 
  -----Original Message-----
  From: Richard Sandiford [mailto:rdsandif...@googlemail.com]
  Sent: Wednesday, September 05, 2012 6:09 AM
  To: Bin Cheng
  Cc: Ramana Radhakrishnan; 'Eric Botcazou'; gcc-patches@gcc.gnu.org
  Subject: Re: Ping: [PATCH GCC/ARM] Fix problem that hardreg_cprop
  opportunities are missed on thumb1
 
  Subtraction of zero isn't canonical rtl though.  Passes after peephole2
  would be well within their rights to simplify the expression back to a
  move.  From that point of view, making the passes recognise (plus X 0)
  and (minus X 0) as special cases would be inconsistent.
 
  Rather than make the Thumb 1 CC usage implicit in the rtl stream, and
  carry the current state around in cfun->machine, it seems like it would
  be better to get md_reorg to rewrite the instructions into a form that
  makes the use of condition codes explicit.
 
  md_reorg also sounds like a better place in the pipeline than peephole2
  to be doing this kind of transformation, although I admit I have zero
  evidence to back that up...
 
 
 Hi Richard,
 
 This is the updated patch according to your suggestions. I removed the
 peephole2 patterns and introduced function thumb1_reorg to rewrite
 instructions in md_reorg pass.
 
 In addition to missed propagation, this patch also detects the following case:
   mov r5, r0
   str r0, [r4]   ---miscellaneous irrelevant instructions
   [cmp r0, 0]---saved
   bne  .Lxxx
 
 Patch tested on arm-none-eabi/cortex-m0, no regressions introduced.
 
 Is it OK?
 
 Thanks.
 
 2012-09-25  Bin Cheng  bin.ch...@arm.com
 
   * config/arm/arm.c (thumb1_reorg): New function.
   (arm_reorg): Call thumb1_reorg.
   (thumb1_final_prescan_insn): Record src operand in thumb1_cc_op0.
   * config/arm/arm.md : Remove peephole2 patterns which rewrites move
   into subtract of ZERO.





Re: Fixup INTEGER_CST

2012-10-08 Thread Richard Guenther
On Sun, Oct 7, 2012 at 7:22 PM, Jan Hubicka hubi...@ucw.cz wrote:
 On Sun, Oct 7, 2012 at 5:15 PM, Jan Hubicka hubi...@ucw.cz wrote:
  Hi,
  I added a sanity check that after fixup all types that lost in the merging
  are really dead.  And it turns out we have some zombies around.
 
  INTEGER_CST needs special care because it is special cased by the
  streamer.  We also do not want to do in-place modifications on it because
  that would corrupt the hashtable used by tree.c's sharing code.
 
  Bootstrapped/regtested x86_64-linux, OK?

 No, I don't think we want to fixup INTEGER_CSTs this way.  Instead we
 want to fixup
 them where they end up used unfixed.

 Erm, I think it is what the patch does?

Ah, indeed.

 It replaces pointers to integer_cst with type that did not survive by pointer
 to new integer_cst. (with the optimization that INTEGER_CST with overflow
 is changed in place because it is allowed to do so).

Btw ...

  @@ -1526,6 +1549,11 @@ lto_ft_type (tree t)
 LTO_FIXUP_TREE (t->type_non_common.binfo);
 
 LTO_FIXUP_TREE (TYPE_CONTEXT (t));
  +
  +  if (TREE_CODE (t) == METHOD_TYPE)
  +TYPE_METHOD_BASETYPE (t);
  +  if (TREE_CODE (t) == OFFSET_TYPE)
  +TYPE_OFFSET_BASETYPE (t);

that looks like a no-op to me ... (both are TYPE_MAXVAL which
is already fixed up).

Thus, ok with removing the above hunk.

Thanks,
Richard.

   }
 
   /* Fix up fields of a BINFO T.  */


Re: handle isl and cloog in contrib/download_prerequisites

2012-10-08 Thread Richard Guenther
On Mon, Oct 8, 2012 at 3:16 AM, Jonathan Wakely jwakely@gmail.com wrote:
 On 7 October 2012 21:31, Manuel López-Ibáñez wrote:
 On 7 October 2012 22:13, Jonathan Wakely jwakely@gmail.com wrote:

 On Oct 7, 2012 12:00 AM, NightStrike nightstr...@gmail.com wrote:

 On Sat, Oct 6, 2012 at 7:30 AM, Manuel López-Ibáñez
 lopeziba...@gmail.com wrote:
  Hi,
 
  GCC now requires ISL and a very new CLOOG but download_prerequisites
  does not download those. Also, there is only one sensible place to

 As of what version is isl/cloog no longer optional?

 If they're really no longer optional then the prerequisites page and 4.8
 changes page need to be updated.

 The patch downloads isl and cloog unconditionally, does gcc build them
 unconditionally if they're found in the source dir?  If they are still
 optional I don't want download_prerequisites to fetch files that will slow
 down building gcc by building libs and enabling features I don't use.

 I guess they are optional in the sense that you can configure gcc to
 not require them. But the default configure in x86_64-gnu-linux
 requires isl and cloog.

 Are you sure?

 Seems to me the default is still the same as it always has been, i.e.
 Graphite optimisations can be enabled if ISL and cloog are present,
 but they're not required.  I can bootstrap without ISL anyway.

If good enough ISL and cloog are not found, graphite is simply disabled
unless you explicitly enabled it via specifying either of ISL or cloog
configury.

Richard.


Re: [lra] patch to speed more compilation of PR54146

2012-10-08 Thread Richard Guenther
On Sun, Oct 7, 2012 at 11:27 PM, Steven Bosscher stevenb@gmail.com wrote:
 On Sun, Oct 7, 2012 at 5:59 PM, Vladimir Makarov wrote:
 The following patch speeds LRA up more on PR54146.  Below times for
 compilation of the test on gcc17.fsffrance.org (an AMD machine):

 Before:
 real=1214.71 user=1192.05 system=22.48
 After:
 real=1144.37 user=1124.31 system=20.11

 Hi Vlad,

 The next bottle-neck in my timings is in
 lra-eliminate.c:lra_eliminate(), in this loop:

  FOR_EACH_BB (bb)
    FOR_BB_INSNS_SAFE (bb, insn, temp)
      {
        if (bitmap_bit_p (insns_with_changed_offsets, INSN_UID (insn)))
          process_insn_for_elimination (insn, final_p);
      }

 The problem is in bitmap_bit_p. Random access to a large bitmap can be
 very slow.

 I'm playing with a patch to expand the insns_with_changed_offsets
 bitmap to an sbitmap, and will send a patch if this works better.

Or make insns_with_changed_offsets a VEC of insns (or a pointer-set).

Richard.

 Ciao!
 Steven
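
The difference between the two bitmap flavours discussed above can be sketched as follows. This is a minimal illustration, not GCC code: GCC's bitmap is a linked list of elements, so a random bitmap_bit_p may have to walk that list, whereas a flat bit array indexed by UID (which is roughly what an sbitmap is) answers in constant time. The names flat_bitset, fb_init, fb_set, fb_test and fb_free are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical flat bit set: one bit per UID, constant-time access.  */
struct flat_bitset
{
  size_t n_bits;
  unsigned long *words;
};

#define FB_WORD_BITS (sizeof (unsigned long) * CHAR_BIT)

/* Allocate a zeroed set able to hold N_BITS bits; returns 0 on success.  */
static int
fb_init (struct flat_bitset *s, size_t n_bits)
{
  size_t n_words = (n_bits + FB_WORD_BITS - 1) / FB_WORD_BITS;
  s->n_bits = n_bits;
  s->words = calloc (n_words ? n_words : 1, sizeof (unsigned long));
  return s->words ? 0 : -1;
}

/* Mark BIT as a member of the set.  */
static void
fb_set (struct flat_bitset *s, size_t bit)
{
  assert (bit < s->n_bits);
  s->words[bit / FB_WORD_BITS] |= 1UL << (bit % FB_WORD_BITS);
}

/* Return 1 if BIT is a member, 0 otherwise: one index, one shift.  */
static int
fb_test (const struct flat_bitset *s, size_t bit)
{
  assert (bit < s->n_bits);
  return (s->words[bit / FB_WORD_BITS] >> (bit % FB_WORD_BITS)) & 1;
}

static void
fb_free (struct flat_bitset *s)
{
  free (s->words);
  s->words = NULL;
}
```

The trade-off is memory: the flat form costs one bit per possible UID even when only a few are set, which is why the thread also mentions a vector of insns or a per-insn flag as alternatives.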


Re: [ping patch] Predict for loop exits in short-circuit conditions

2012-10-08 Thread Richard Guenther
On Mon, Oct 8, 2012 at 4:50 AM, Dehao Chen de...@google.com wrote:
 Attached is the updated patch. Yes, if we add a VRP pass before
 profile pass, this patch would be unnecessary. Should we add a VRP
 pass?

No, we don't want VRP in early optimizations.

Richard.

 Thanks,
 Dehao

 On Sat, Oct 6, 2012 at 9:38 AM, Jan Hubicka hubi...@ucw.cz wrote:
 ping^2

 Honza, do you think this patch can make into 4.8 stage 1?

 +  if (check_value_one ^ integer_onep (val))

 Probably better as !=
 (especially because GNU coding standard allows predicates to return more than
 just boolean)
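
A small sketch of the concern raised here, under the assumption that a predicate may return any nonzero value for "true". The predicate is_even and the helpers differ_xor and differ_ne are hypothetical and only illustrate why XOR on unnormalized predicate results is fragile; the robust variant normalizes the result to 0/1 before comparing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate that returns "nonzero" (here 2) rather than
   exactly 1 for true.  */
static int
is_even (int x)
{
  return (x & 1) ? 0 : 2;
}

/* Fragile: XOR compares bit patterns, so (1 ^ 2) is 3, i.e. "differ",
   even though both operands mean "true".  */
static bool
differ_xor (bool flag, int x)
{
  return (flag ^ is_even (x)) != 0;
}

/* Robust: normalize the predicate to 0/1 before comparing.  */
static bool
differ_ne (bool flag, int x)
{
  return flag != (is_even (x) != 0);
}
```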


 +{
 +  edge e1;
 +  edge_iterator ei;
 +  tree val = gimple_phi_arg_def (phi_stmt, i);
 +  edge e = gimple_phi_arg_edge (phi_stmt, i);
 +
 +  if (!TREE_CONSTANT (val) || !(integer_zerop (val) || integer_onep (val)))
 +    continue;
 +  if (check_value_one ^ integer_onep (val))
 +    continue;
 +  if (VEC_length (edge, e->src->succs) != 1)
 +    {
 +      if (!predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS_GUESSED)
 +          && !predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS)
 +          && !predicted_by_p (exit_edge->src, PRED_LOOP_EXIT))
 +        predict_edge_def (e, PRED_LOOP_EXIT, NOT_TAKEN);
 +      continue;
 +    }
 +
 +  FOR_EACH_EDGE (e1, ei, e->src->preds)
 +    if (!predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS_GUESSED)
 +        && !predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS)
 +        && !predicted_by_p (exit_edge->src, PRED_LOOP_EXIT))
 +      predict_edge_def (e1, PRED_LOOP_EXIT, NOT_TAKEN);

 Here you found an edge that you know is going to terminate the loop
 and you want to predict all paths to this edge as unlikely.
 Perhaps you want to use predict_paths_leading_to_edge for edge?

 You do not need to check PRED_LOOP_ITERATIONS and
 PRED_LOOP_ITERATIONS_GUESSED because those never go to the non-exit edges.

 The nature of predict_paths_for_bb type heuristic is that they are not really
 additive: if the path leads to two different aborts it does not make it more
 sure that it will be unlikely.  So perhaps you can add !predicted_by_p (e, pred)
 prior to the predict_edge_def call in the function?

 I wonder, if we did VRP just before branch prediction to jump thread the
 shortcut conditions into loopback edges, would there still be cases where
 this optimization will match?

 Honza


[Patch ARM] Fix that miss DMB instruction for ARMv6-M

2012-10-08 Thread Terry Guo
Hi,

When running libstdc++ regression test on Cortex-M0, the case 49445.cc fails
with error message:

/tmp/ccMqZdgc.o: In function `std::atomic<float>::load(std::memory_order)
const':
/home/build/work/GCC-4-7-build/build-native/gcc-final/arm-none-eabi/armv6-m/
libstdc++-v3/include/atomic:202: undefined reference to
`__sync_synchronize'
/home/build/work/GCC-4-7-build/build-native/gcc-final/arm-none-eabi/armv6-m/
libstdc++-v3/include/atomic:202: undefined reference to
`__sync_synchronize'
/tmp/ccMqZdgc.o: In function `std::atomic<tacos>::load(std::memory_order)
const':
/home/build/work/GCC-4-7-build/build-native/gcc-final/arm-none-eabi/armv6-m/
libstdc++-v3/include/atomic:202: undefined reference to
`__sync_synchronize'
/home/build/work/GCC-4-7-build/build-native/gcc-final/arm-none-eabi/armv6-m/
libstdc++-v3/include/atomic:202: undefined reference to
`__sync_synchronize'
collect2: error: ld returned 1 exit status
compiler exited with status 1

After investigation, the reason is that current gcc doesn't think armv6-m
has the DMB instruction, although according to the ARM manuals it does.
With this wrong assumption, expand_mem_thread_fence generates a call to the
library function __sync_synchronize rather than a DMB instruction.  There is
no code implementing this library function, hence the link error.

The attached patch intends to fix this issue by letting gcc also think
armv6-m has DMB instruction. Is it OK to trunk?
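
For illustration only, a minimal sketch of user code that reaches this path. It assumes a GCC-compatible compiler, where __sync_synchronize is a builtin full memory barrier; whether it expands to an inline barrier instruction or to a call to a library routine of the same name depends on what the target claims to support, which is the subject of this patch. The helper name publish_and_read is hypothetical.

```c
#include <assert.h>

/* Store a value, issue a full memory barrier, then read it back.
   __sync_synchronize is a GCC builtin; its expansion (inline barrier
   or library call) is decided per target.  */
static int
publish_and_read (int *slot, int value)
{
  assert (slot != 0);
  *slot = value;
  __sync_synchronize ();
  return *slot;
}
```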

BR,
Terry

2012-10-08  Terry Guo  terry@arm.com

* config/arm/arm.c (arm_arch6m): New variable to denote armv6-m
architecture.
* config/arm/arm.h (TARGET_HAVE_DMB): The armv6-m also has DMB
instruction.



armv6m-dmb.patch
Description: Binary data


Re: handle isl and cloog in contrib/download_prerequisites

2012-10-08 Thread Manuel López-Ibáñez
On 8 October 2012 09:18, Richard Guenther richard.guent...@gmail.com wrote:
 On Mon, Oct 8, 2012 at 3:16 AM, Jonathan Wakely jwakely@gmail.com wrote:
 On 7 October 2012 21:31, Manuel López-Ibáñez wrote:
 On 7 October 2012 22:13, Jonathan Wakely jwakely@gmail.com wrote:

 On Oct 7, 2012 12:00 AM, NightStrike nightstr...@gmail.com wrote:

 On Sat, Oct 6, 2012 at 7:30 AM, Manuel López-Ibáñez
 lopeziba...@gmail.com wrote:
  Hi,
 
  GCC now requires ISL and a very new CLOOG but download_prerequisites
  does not download those. Also, there is only one sensible place to

 As of what version is isl/cloog no longer optional?

 If they're really no longer optional then the prerequisites page and 4.8
 changes page need to be updated.

 The patch downloads isl and cloog unconditionally, does gcc build them
 unconditionally if they're found in the source dir?  If they are still
 optional I don't want download_prerequisites to fetch files that will slow
 down building gcc by building libs and enabling features I don't use.

 I guess they are optional in the sense that you can configure gcc to
 not require them. But the default configure in x86_64-gnu-linux
 requires isl and cloog.

 Are you sure?

 Seems to me the default is still the same as it always has been, i.e.
 Graphite optimisations can be enabled if ISL and cloog are present,
 but they're not required.  I can bootstrap without ISL anyway.

 If good enough ISL and cloog are not found, graphite is simply disabled
 unless you explicitly enabled it via specifying either of ISL or cloog
 configury.

As I said, this didn't work for me, after trying quite a few things
(not specifying anything, using --with-isl/--with-cloog, building cloog/isl
in several ways...). I could try to reproduce the issues and open PRs,
but it doesn't seem worth the time. My advice would be: use the script
or disable graphite, and be happy. In any case, I am not going to
commit the patch, I'll keep it local. Anyone feel free to take it and
do what you wish with it. I think there are some nice parts even if
cloog and isl are removed.


[testsuite] Minor housekeeping work

2012-10-08 Thread Eric Botcazou
Recent tests added to gcc.dg/tree-ssa don't clean up after themselves.

Tested on x86_64-suse-linux, applied on the mainline as obvious.


2012-10-08  Eric Botcazou  ebotca...@adacore.com

* gcc.dg/tree-ssa/slsr-30.c: Use correct cleanup directive.
* gcc.dg/tree-ssa/attr-hotcold-2.c: Likewise.
* gcc.dg/tree-ssa/ldist-21.c: Add missing cleanup directive.


-- 
Eric Botcazou
Index: gcc.dg/tree-ssa/slsr-30.c
===
--- gcc.dg/tree-ssa/slsr-30.c	(revision 192137)
+++ gcc.dg/tree-ssa/slsr-30.c	(working copy)
@@ -21,4 +21,4 @@ f (int s, long c)
 }
 
 /* { dg-final { scan-tree-dump-times " \\* " 3 "dom2" } } */
-/* { dg-final { cleanup-tree-dump "optimized" } } */
+/* { dg-final { cleanup-tree-dump "dom2" } } */
Index: gcc.dg/tree-ssa/attr-hotcold-2.c
===
--- gcc.dg/tree-ssa/attr-hotcold-2.c	(revision 192137)
+++ gcc.dg/tree-ssa/attr-hotcold-2.c	(working copy)
@@ -25,4 +25,4 @@ void f(int x, int y)
the testcase around too much.  */
 /* { dg-final { scan-ipa-dump-times "block 5, loop depth 0, count 0, freq \[6-9\]\[0-9\]\[0-9\]\[0-9\]" 1 "profile_estimate" } } */
 
-/* { dg-final { cleanup-tree-dump "profile_estimate" } } */
+/* { dg-final { cleanup-ipa-dump "profile_estimate" } } */
Index: gcc.dg/tree-ssa/ldist-21.c
===
--- gcc.dg/tree-ssa/ldist-21.c	(revision 192137)
+++ gcc.dg/tree-ssa/ldist-21.c	(working copy)
@@ -9,3 +9,4 @@ void bar(char *p, int n)
 }
 
 /* { dg-final { scan-tree-dump "generated memmove" "ldist" } } */
+/* { dg-final { cleanup-tree-dump "ldist" } } */


Re: [lra] patch to speed more compilation of PR54146

2012-10-08 Thread Jakub Jelinek
On Mon, Oct 08, 2012 at 09:20:47AM +0200, Richard Guenther wrote:
 On Sun, Oct 7, 2012 at 11:27 PM, Steven Bosscher stevenb@gmail.com 
 wrote:
  The next bottle-neck in my timings is in
  lra-eliminate.c:lra_eliminate(), in this loop:
 
   FOR_EACH_BB (bb)
     FOR_BB_INSNS_SAFE (bb, insn, temp)
       {
         if (bitmap_bit_p (insns_with_changed_offsets, INSN_UID (insn)))
           process_insn_for_elimination (insn, final_p);
       }
 
  The problem is in bitmap_bit_p. Random access to a large bitmap can be
  very slow.
 
  I'm playing with a patch to expand the insns_with_changed_offsets
  bitmap to an sbitmap, and will send a patch if this works better.
 
 Or make insns_with_changed_offsets a VEC of insns (or a pointer-set).

Or use temporarily some rtx flag on the insns, from what I can see,
in_struct on *INSN is right now only used during scheduling and from reorg
till eoc, so for LRA sitting in between both scheduling passes it might
be possible to use that bit too.

Jakub


Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT

2012-10-08 Thread Richard Guenther
On Fri, 5 Oct 2012, Steven Bosscher wrote:

 On Fri, Sep 14, 2012 at 2:26 PM, Richard Guenther rguent...@suse.de wrote:
  If you can figure out a better name for the function we should
  probably move it to cfganal.c
 
 It looks like my previous e-mail about this appears to have gotten lost
 somehow, so retry:
 
 Your my_rev_post_order_compute is simply inverted_post_order_compute.
 The only difference is that you'll want to ignore EXIT_BLOCK, which is
 always added to the list by inverted_post_order_compute.

Indeed.  inverted_post_order_compute seems to handle a CFG without
infinite-loop and noreturns connected to exit though.  Possibly
that's why it doesn't care for not handling entry/exit.

I'm testing a patch to use inverted_post_order_compute from PRE.

Richard.


Re: vec_cond_expr adjustments

2012-10-08 Thread Richard Guenther
On Fri, Oct 5, 2012 at 5:01 PM, Marc Glisse marc.gli...@inria.fr wrote:
 [I am still a little confused, sorry for the long email...]


 On Tue, 2 Oct 2012, Richard Guenther wrote:

 +  if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
 +    {
 +      int count = VECTOR_CST_NELTS (op0);
 +      tree *elts = XALLOCAVEC (tree, count);
 +      gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
 +
 +      for (int i = 0; i < count; i++)
 +        {
 +          tree elem_type = TREE_TYPE (type);
 +          tree elem0 = VECTOR_CST_ELT (op0, i);
 +          tree elem1 = VECTOR_CST_ELT (op1, i);
 +
 +          elts[i] = fold_relational_const (code, elem_type,
 +                                           elem0, elem1);
 +
 +          if (elts[i] == NULL_TREE)
 +            return NULL_TREE;
 +
 +          elts[i] = fold_negate_const (elts[i], elem_type);



 I think you need to invent something new similar to STORE_FLAG_VALUE
 or use STORE_FLAG_VALUE here.  With the above you try to map
 {0, 1} to {0, -1} which is only true if the operation on the element
 types
 returns {0, 1} (thus, STORE_FLAG_VALUE is 1).


 Er, seems to me that constant folding of a scalar comparison in the
 front/middle-end only returns {0, 1}.

 [and later]

 I'd say adjust your fold-const patch to not negate the scalar result
 but build a proper -1 / 0 value based on integer_zerop().


 I don't mind doing it that way, but I would like to understand first.
 LT_EXPR on scalars is guaranteed (in generic.texi) to be 0 or 1. So negating
 should be the same as testing with integer_zerop to build -1 or 0. Is it
 just a matter of style (then I am ok), or am I missing a reason which makes
 the negation wrong?

Just a matter of style.  Negating is a lot less descriptive for the actual
set of return values we produce.

 The point is we need to define some semantics for vector comparison
 results.


 Yes. I think a documentation patch should come first: generic.texi is
 missing an entry for VEC_COND_EXPR and the entry for LT_EXPR doesn't mention
 vectors. But before that we need to decide what to put there...


 One variant is to make it target independent which in turn
 would inhibit (or make it more difficult) to exploit some target features.
 You for example use {0, -1} for truth values - probably to exploit target
 features -


 Actually it was mostly because that is the meaning in the language. OpenCL
 says that a<b is a vector of 0 and -1, and that ?: only looks at the MSB of
 the elements in the condition. The fact that it matches what some targets do
 is a simple consequence of the fact that OpenCL was based on what hardware
 already did.

Yes, it seems that the {0, -1} choice is most reasonable for GENERIC.  So
let's document that.


 even though the most natural middle-end way would be to
 use {0, 1} as for everything else


 I agree that it would be natural and convenient in a number of places.


 (caveat: there may be both signed and unsigned bools, we don't allow
 vector components with non-mode precision, thus you could argue that a
 signed bool : 1 is just sign-extended for your solution).


 Not sure how that would translate in the code.


 A different variant is to make it target dependent to leverage
 optimization opportunities


 That's an interesting possibility...


 that's why STORE_FLAG_VALUE exists.


 AFAICS it only appears when we go from gimple to rtl, not before (and there
 is already a VECTOR_STORE_FLAG_VALUE, although no target defines it). Which
 doesn't mean we couldn't make it appear earlier for vectors.


 For example with vector comparisons a < v result, when
 performing bitwise operations on it, you either have to make the target
 expand code to produce {0, -1} even if the natural compare instruction
 would, say, produce {0, 0x8} - or not constrain the possible values
 of its result (like forwprop would do with your patch).  In general we
 want constant folding to yield the same results as if the HW carried
 out the operation to make -O0 code not diverge from -O1.  Thus,

 v4si g;
 int main() { g = { 1, 2, 3, 4 } < { 4, 3, 2, 1}; }

 should not assign different values to g dependent on constant propagation
 performed or not.


 That one is clear, OpenCL constrains the answer to be {-1,-1,0,0}, whether
 your target likes it or not. Depending on how things are handled,
 comparisons could be constrained internally to only appear (possibly
 indirectly) in the first argument of a vec_cond_expr.

Yes, I realized that later.
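
The {0, -1} convention under discussion can be sketched in portable scalar C as follows. This is only an illustration of the semantics (0 for false, all bits set for true), not the fold-const code; the helper name vec_lt_mask is hypothetical, and the result is built directly from the truth value rather than by negating a {0, 1} result, in line with the style preference stated earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise "a < b" producing 0 for false and -1 (all bits set)
   for true, independent of what any target instruction would return.  */
static void
vec_lt_mask (const int *a, const int *b, int *out, size_t n)
{
  assert (a && b && out);
  for (size_t i = 0; i < n; i++)
    out[i] = (a[i] < b[i]) ? -1 : 0;
}
```

Applied to the operands { 1, 2, 3, 4 } and { 4, 3, 2, 1 } from the example above, this yields {-1, -1, 0, 0}, which is the value a constant folder would have to reproduce.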


 The easiest way out is something like STORE_FLAG_VALUE
 if there does not exist a middle-end choice for vector true / false
 components
 that can be easily generated from what the target produces.

 Like if you perform a FP comparison

 int main () { double x = 1.0; static _Bool b; b = x < 3.0; }

 you get without CCP on x86_64:

        ucomisd -8(%rbp), %xmm0
        seta    %al
        movb    %al, b.1715(%rip)

 thus the equivalent of

    flag_reg = x < 3.0;
    b = flag_reg ? 1 : 0;


 where 

Re: [ping patch] Predict for loop exits in short-circuit conditions

2012-10-08 Thread Jan Hubicka
 On Mon, Oct 8, 2012 at 4:50 AM, Dehao Chen de...@google.com wrote:
  Attached is the updated patch. Yes, if we add a VRP pass before
  profile pass, this patch would be unnecessary. Should we add a VRP
  pass?
 
 No, we don't want VRP in early optimizations.

I am not quite sure about that.  VRP
 1) makes branch prediction work better by doing jump threading early
 2) is, after FRE, most effective tree pass on removing code by my profile
statistics.

But that would require more analysis.
The patch is OK.
Honza


Re: patch to fix constant math - second small patch

2012-10-08 Thread Richard Guenther
On Sat, Oct 6, 2012 at 12:48 AM, Kenneth Zadeck
zad...@naturalbridge.com wrote:
 This patch adds machinery to genmodes.c so that largest possible sizes of
 various data structures can be determined at gcc build time.  These
 functions create 3 symbols that are available in insn-modes.h:
 MAX_BITSIZE_MODE_INT - the bitsize of the largest int.
 MAX_BITSIZE_MODE_PARTIAL_INT - the bitsize of the largest partial int.
 MAX_BITSIZE_MODE_ANY_INT - the largest bitsize of any kind of int.

Ok.  Please document these macros in rtl.texi.

Richard.


Re: patch to fix constant math - third small patch

2012-10-08 Thread Richard Guenther
On Sat, Oct 6, 2012 at 5:55 PM, Kenneth Zadeck zad...@naturalbridge.com wrote:
 This is the third patch in the series of patches to fix constant math.
 this one changes some predicates at the rtl level to use the new predicate
 CONST_SCALAR_INT_P.
 I did not include a few that were tightly intertwined with other changes.

 Not all of these changes are strictly mechanical.   Richard, when reviewing
 this had me make additional changes to remove what he thought were latent
 bugs at the rtl level.   However, it appears that the bugs were not latent.
 I do not know what is going on here but i am smart enough to not look a gift
 horse in the mouth.

 All of this was done on the same machine with no changes and identical
 configs.  It is an x86-64 with ubuntu 12-4.

 ok for commit?

Patch missing, but if it's just mechanical changes and introduction
of CONST_SCALAR_INT_P consider it pre-approved.

Richard.

 in the logs below, gbBaseline is a trunk from friday and the gbWide is the
 same revision but with my patches.  Some of this like gfortran.dg/pr32627 is
 obviously flutter, but the rest does not appear to be.

 =
 heracles:~/gcc(13) gccBaseline/contrib/compare_tests
 gbBaseline/gcc/testsuite/gcc/gcc.log gbWide/gcc/testsuite/gcc/gcc.log
 New tests that PASS:

 gcc.dg/builtins-85.c scan-assembler mysnprintf
 gcc.dg/builtins-85.c scan-assembler-not __chk_fail
 gcc.dg/builtins-85.c (test for excess errors)


 heracles:~/gcc(14) gccBaseline/contrib/compare_tests
 gbBaseline/gcc/testsuite/gfortran/gfortran.log
 gbWide/gcc/testsuite/gfortran/gfortran.log
 New tests that PASS:

 gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer -funroll-loops (test for
 excess errors)
 gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer  (test for excess errors)
 gfortran.dg/pr32627.f03  -Os  (test for excess errors)
 gfortran.dg/pr32635.f  -O0  execution test
 gfortran.dg/pr32635.f  -O0  (test for excess errors)
 gfortran.dg/substr_6.f90  -O2  (test for excess errors)

 Old tests that passed, that have disappeared: (Eeek!)

 gfortran.dg/pr32627.f03  -O1  (test for excess errors)
 gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer -funroll-all-loops
 -finline-functions  (test for excess errors)
 gfortran.dg/pr32627.f03  -O3 -g  (test for excess errors)
 gfortran.dg/substring_equivalence.f90  -O  (test for excess errors)
 Using /home/zadeck/gcc/gccBaseline/gcc/testsuite/config/default.exp as
 tool-and-target-specific interface file.

 === g++ Summary ===

 # of expected passes            49793
 # of expected failures          284
 # of unsupported tests          601

 runtest completed at Fri Oct  5 16:10:20 2012
 heracles:~/gcc(16) tail gbWide/gcc/testsuite/g++/g++.log
 Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
 Using /home/zadeck/gcc/gccWide/gcc/testsuite/config/default.exp as
 tool-and-target-specific interface file.

 === g++ Summary ===

 # of expected passes            50472
 # of expected failures          284
 # of unsupported tests          613

 runtest completed at Fri Oct  5 19:51:50 2012







Re: Fixup INTEGER_CST

2012-10-08 Thread Jan Hubicka
 On Sun, Oct 7, 2012 at 7:22 PM, Jan Hubicka hubi...@ucw.cz wrote:
  On Sun, Oct 7, 2012 at 5:15 PM, Jan Hubicka hubi...@ucw.cz wrote:
   Hi,
   I added a sanity check that after fixup all types that lost in the
   merging are really dead.  And it turns out we have some zombies around.
  
   INTEGER_CST needs special care because it is special cased by the
   streamer.  We also do not want to do in-place modifications on it because
   that would corrupt the hashtable used by tree.c's sharing code.
  
   Bootstrapped/regtested x86_64-linux, OK?
 
  No, I don't think we want to fixup INTEGER_CSTs this way.  Instead we
  want to fixup
  them where they end up used unfixed.
 
  Erm, I think it is what the patch does?
 
 Ah, indeed.
 
  It replaces pointers to integer_cst with type that did not survive by 
  pointer
  to new integer_cst. (with the optimization that INTEGER_CST with overflow
  is changed in place because it is allowed to do so).
 
 Btw ...
 
   @@ -1526,6 +1549,11 @@ lto_ft_type (tree t)
  LTO_FIXUP_TREE (t->type_non_common.binfo);
  
  LTO_FIXUP_TREE (TYPE_CONTEXT (t));
   +
   +  if (TREE_CODE (t) == METHOD_TYPE)
   +TYPE_METHOD_BASETYPE (t);
   +  if (TREE_CODE (t) == OFFSET_TYPE)
   +TYPE_OFFSET_BASETYPE (t);
 
 that looks like a no-op to me ... (both are TYPE_MAXVAL which
 is already fixed up).

Ah, indeed.  They were the result of experimenting with the stale pointers to
the obsolete types and field decls.  I now understand where they come from.
The reason is twofold.

  1) After merging records we replace field decls in the cache
     by new ones.  This however does not mean that they die, because
     the existing pointers to them are not replaced.
     I have a WIP patch for that, which however requires one extra pass
     over the list of all trees.
  2) As we query the type_hash while we are rewriting the types,
     we run into instability of the hashtable.  This manifests itself
     as an ICE when one adds a sanity check that, while merging function
     types, their arg types are equivalent, too.
     This ICEs when compiling e.g. sqlite, but I did not really manage to
     reduce it.  I tracked it down to the argument type being inserted
     into gimple_type_hash, but at the time we query the new argument type,
     the original is no longer found even though their hashes are equivalent.
     The problem is hidden when things fit into the leader cache,
     so one needs a rather big testcase.

So I tried to register all gimple types first, using TREE_VISITED within
the merging code to mark that a type is not a leader and then TREE_CHAIN
to point to the leader.  This avoids the need to re-query the hashtable
from the later fixups.  We only look for types with TREE_VISITED
and replace them by TREE_CHAIN.
This has two passes.  First we compute the main variants and mark
field_decls and type_decls for merging, and in the last pass we finally do
the fixup on what remained in the table.

This allows me to poison pointers in the removed types in a way
so that GGC would ICE if they stayed reachable.
I however need the extra pass because
 1) I can not update the type_decls/field_decls while registering
    types or I run into the hash table problems
 2) I can not merge the second two passes because at the time
    I find type/field decls equivalent there may be earlier pointers
    to them.

Honza
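
The forwarding scheme described in this message can be sketched as follows. This is a toy model, not the LTO code: struct node stands in for a tree node, its visited field plays the role of TREE_VISITED ("this node lost the merge") and its chain field the role of TREE_CHAIN (pointer to the surviving leader). The names node, mark_merged and fixup_slot are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a type node.  */
struct node
{
  int visited;         /* nonzero: not a leader, follow chain */
  struct node *chain;  /* the leader that replaces this node */
};

/* Record that N lost the merge and LEADER survives.  */
static void
mark_merged (struct node *n, struct node *leader)
{
  assert (n != leader && !leader->visited);
  n->visited = 1;
  n->chain = leader;
}

/* Fix up one pointer slot without consulting any hash table.  */
static void
fixup_slot (struct node **slot)
{
  if (*slot && (*slot)->visited)
    *slot = (*slot)->chain;
}
```

Because the forwarding information lives in the node itself, the later fixup passes never have to look anything up in a table that is being modified at the same time.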


Re: Check that unlinked uses do not contain ssa-names when renaming.

2012-10-08 Thread Richard Guenther
On Sun, Oct 7, 2012 at 12:44 PM, Tom de Vries tom_devr...@mentor.com wrote:
 Richard,

 attached patch checks that unlinked uses do not contain ssa-names when 
 renaming.

 This assert triggers when compiling (without the fix) the PR54735 example.

 AFAIU, it was due to chance that we caught the PR54735 bug by hitting the
 verification failure, because the new vdef introduced by renaming happened to 
 be
 the same name as the ssa name referenced in the invalid unlinked use (in terms
 of maybe_replace_use: rdef == use).

 The assert from this patch catches all cases that an unlinked use contains an
 ssa-name.

 Bootstrapped and reg-tested on x86_64 (Ada inclusive).

 OK for trunk?

I don't think that is exactly what we should assert here ... (I thought about
adding checking myself ...).  What we'd want to assert is that before
any new DEF is registered (which may re-allocate an SSA name) that
no uses with SSA_NAME_IN_FREELIST appear.  Thus, a light verification
pass would be necessary at the beginning of update_ssa
(which I queued onto my TODO list ...).  We'd want that anyway, for
example to catch the case where a non-virtual operand is partially renamed.

Thanks,
Richard.

 Thanks,
 - Tom

 2012-10-07  Tom de Vries  t...@codesourcery.com

 * tree-into-ssa.c (maybe_replace_use): Add assert.


Re: patch to fix constant math

2012-10-08 Thread Richard Guenther
On Sun, Oct 7, 2012 at 4:58 PM, Kenneth Zadeck zad...@naturalbridge.com wrote:

 On 10/07/2012 09:19 AM, Richard Guenther wrote:

 In fact, you could argue that the tree level did it wrong (not that i am
 suggesting to change this).   But it makes me think what was going on
  when
 the decision to make TYPE_PRECISION be an INT_CST rather than just a HWI
  was
 made.   For that there is an implication that it could never take more
  than
 a HWI since no place in the code even checks TREE_INT_CST_HIGH for
  these.

 Well - on the tree level we now always have two HWIs for all INTEGER_CSTs.
 If
 we can, based on the size of the underlying mode, reduce that to one
 HWI we already
 win something.  If we add an explicit length to allow a smaller
 encoding for larger modes
 (tree_base conveniently has an available 'int' for this ...) then we'd
 win in more cases.
 Thus, is CONST_INT really necessarily better than optimized CONST_WIDE_INT
 storage?

 i have to admit, that looking at these data structures gives me a headache.
 This all looks like something that Rube Goldberg would have invented had he
 done object oriented design  (Richard S did not know who Rube Goldberg was when
 i mentioned this name to him a few years ago since this is an American
 thing, but the british had their own equivalent and I assume the germans do
 too.).

 i did the first cut of changing the rtl level structure and Richard S threw
 up on it and suggested what is there now, which happily (for me) i was able
 to get mike to implement.

 mike also did the tree level version of the data structures for me.   i will
 make sure he used that left over length field.

 The bottom line is that you most likely just save the length, but that is a
 big percent of something this small.  Both of rtl ints have a mode, so if we
 can make that change later, it will be space neutral.

Yes.

Btw, Richard's idea of conditionally placing the length field in
rtx_def looks like overkill to me.  These days we'd merely want to
optimize for 64bit hosts, thus unconditionally adding a 32 bit
field to rtx_def looks ok to me (you can wrap that inside a union to
allow both descriptive names and eventual different use - see what
I've done to tree_base)

Richard.
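
A minimal sketch of the "wrap it inside a union" idea, assuming a made-up
header layout: struct node_header and its member names are hypothetical and
are not GCC's real rtx_def or tree_base.  It only shows how one 32-bit slot
can carry a descriptive name now and a different meaning later.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an rtx-like header; NOT GCC's real rtx_def.  */
struct node_header {
  uint16_t code;    /* which kind of node this is */
  uint8_t mode;     /* machine mode */
  uint8_t flags;
  union {
    uint32_t num_elem;   /* length of a wide-integer payload, in HWIs */
    uint32_t other_use;  /* placeholder for a different future meaning */
  } u;
};

/* Record the payload length; both union members alias the same 32 bits,
   so adding a second name costs no extra space.  */
static inline void
set_num_elem (struct node_header *h, uint32_t n)
{
  assert (n > 0);
  h->u.num_elem = n;
}
```

The union keeps the field unconditional (no host-dependent layout) while
leaving the name of the slot free to differ per node kind.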


Re: vec_cond_expr adjustments

2012-10-08 Thread Marc Glisse

On Mon, 8 Oct 2012, Richard Guenther wrote:


VEC_COND_EXPR is more complicated. We could for instance require that it
takes as first argument a vector of -1 and 0 (thus <0, !=0 and the neon
thing are equivalent). Which would leave to decide what the expansion of
vec_cond_expr passes to the targets when the first argument is not a
comparison, between !=0, <0, ==-1 or others (I vote for <0 because of
opencl). One issue is that targets wouldn't know if it was a dummy
comparison that can safely be ignored because the other part is the result
of logical operations on comparisons (thus composed of -1 and 0) or a
genuine comparison with an arbitrary vector, so a new optimization would be
needed (in the back-end I guess or we would need an alternate instruction to
vcond) to detect if a vector is a signed boolean vector.
We could instead say that vec_cond_expr really follows OpenCL's semantics
and looks at the MSB of each element. I am not sure that would change much,
it would mostly delay the appearance of <0 to RTL expansion time (and thus
make gimple slightly lighter).


I think we should delay the decision on how to optimize this.  It's indeed
not trivial and the GIMPLE middle-end aggressively forwards feeding
comparisons into the VEC_COND_EXPR expressions already (somewhat
defeating any CSE that might be possible here) in forwprop.


Thanks for going through the long email :-)

What does that imply for the first argument of VEC_COND_EXPR? Currently, 
the expander asserts that it is a comparison, but that is not reflected in 
the gimple checkers.


If we document that VEC_COND_EXPR takes a vector of -1 and 0 (which is the 
case for a comparison), I don't think it prevents from later relaxing that 
to <0 or !=0. But then I don't know how to handle expansion when the
argument is neither a comparison (vcond) nor a constant (vec_merge? I
haven't tried but that should be doable), I would have to pass <0 or !=0
to the target. So is the best choice to document that VEC_COND_EXPR takes 
as first argument a comparison and make gimple checking reflect that? 
(seems sad, but at least that would tell me what I can/can't do)


By the way, since we are documenting comparisons as returning 0 and -1, 
does that bring back the integer_truep predicate?


--
Marc Glisse


Re: Fixup INTEGER_CST

2012-10-08 Thread Richard Guenther
On Mon, Oct 8, 2012 at 11:18 AM, Jan Hubicka hubi...@ucw.cz wrote:
 On Sun, Oct 7, 2012 at 7:22 PM, Jan Hubicka hubi...@ucw.cz wrote:
  On Sun, Oct 7, 2012 at 5:15 PM, Jan Hubicka hubi...@ucw.cz wrote:
   Hi,
   I added a sanity check that after fixup all types that lost in the
   merging are
   really dead.  And it turns out we have some zombies around.
  
   INTEGER_CST needs special care because it is special cased by the
   streamer.  We also
   do not want to do in-place modifications on it because that would
   corrupt the hashtable
   used by tree.c's sharing code
  
   Bootstrapped/regtested x86_64-linux, OK?
 
  No, I don't think we want to fixup INTEGER_CSTs this way.  Instead we
  want to fixup
  them where they end up used unfixed.
 
  Erm, I think it is what the patch does?

 Ah, indeed.

  It replaces pointers to integer_cst with type that did not survive by 
  pointer
  to new integer_cst. (with the optimization that INTEGER_CST with overflow
  is changed in place because it is allowed to do so).

 Btw ...

   @@ -1526,6 +1549,11 @@ lto_ft_type (tree t)
  LTO_FIXUP_TREE (t->type_non_common.binfo);
  
  LTO_FIXUP_TREE (TYPE_CONTEXT (t));
   +
   +  if (TREE_CODE (t) == METHOD_TYPE)
   +TYPE_METHOD_BASETYPE (t);
   +  if (TREE_CODE (t) == OFFSET_TYPE)
   +TYPE_OFFSET_BASETYPE (t);

 that looks like a no-op to me ... (both are TYPE_MAXVAL which
 is already fixed up).

 Ah, indeed.  They were the result of experimenting with the stale pointers to the
 obsolete types and field decls.  I now understand where they come from.  The
 reason is twofold.

   1) after merging records we replace field decls in the cache
  by new ones.  This however does not mean that they die, because
  the existing pointers to them are not replaced.
  I have a WIP patch for that, which however requires one extra pass
  over the list of all trees.

Yes, I think this is also why we do

  /* ???  Not sure the above is all relevant in this
 path canonicalizing TYPE_FIELDS to that of the
 main variant.  */
  if (ix < i)
lto_fixup_types (f2);
  streamer_tree_cache_insert_at (cache, f1, ix);

something I dislike as well and something we should try to address in a
more formal way.

   2) As we query the type_hash while we are rewriting the types,
  we run into instability of the hashtable. This manifests itself
  as an ICE when one adds a sanity check that while merging function
  types their arg types are equivalent, too.
  This ICEs compiling e.g. sqlite but I did not really manage to
  reduce this.  I tracked it down to the argument type being inserted
  into gimple_type_hash but at the time we query the new argument type,
  the original is no longer found although their hashes are equivalent.
  The problem is hidden when things fit into the leader cache,
  so one needs rather big testcase.

Ugh.  For reduction you can disable those caches though.  The above
means there is a disconnect between hashing and comparing.
Maybe it's something weird with the early out

  if (TYPE_ARG_TYPES (t1) == TYPE_ARG_TYPES (t2))
goto same_types;
?

 So I tried to register all gimple types first.  Use TREE_VISITED within
 the merging code to mark that type is not a leader and then TREE_CHAIN
 to point to the leader.  This avoids need to re-query the hashtable
 from the later fixups.  We only look for types with TREE_VISITED
 and replace them by TREE_CHAIN.

TREE_CHAIN is unused for types?  But we probably shouldn't add a new
use ...

 This has two passes.  First we compute the main variants and mark
 field_decls and type_decls for merging and in last pass we finally do
 fixup on what remained in the table.

 This allows me to poison pointers in the removed types in a way
 so the GGC would ICE if they stayed reachable.
 I however need the extra pass because
  1) I can not update the type_decls/field_decls while registering
 types or I run into the hash table problems
  2) I can not merge the second two passes because at the time
  I find type/field decls equivalent there may be earlier pointers
 to them.

You need to merge all trees reachable from the one you start at once
(what I'm working on from time to time - work per tree SCC, in a DFS
walk).

Richard.

 Honza


Re: vec_cond_expr adjustments

2012-10-08 Thread Richard Guenther
On Mon, Oct 8, 2012 at 11:34 AM, Marc Glisse marc.gli...@inria.fr wrote:
 On Mon, 8 Oct 2012, Richard Guenther wrote:

 VEC_COND_EXPR is more complicated. We could for instance require that it
 takes as first argument a vector of -1 and 0 (thus <0, !=0 and the neon
 thing are equivalent). Which would leave to decide what the expansion of
 vec_cond_expr passes to the targets when the first argument is not a
 comparison, between !=0, <0, ==-1 or others (I vote for <0 because of
 opencl). One issue is that targets wouldn't know if it was a dummy
 comparison that can safely be ignored because the other part is the
 result
 of logical operations on comparisons (thus composed of -1 and 0) or a
 genuine comparison with an arbitrary vector, so a new optimization would
 be
 needed (in the back-end I guess or we would need an alternate instruction
 to
 vcond) to detect if a vector is a signed boolean vector.
 We could instead say that vec_cond_expr really follows OpenCL's semantics
 and looks at the MSB of each element. I am not sure that would change
 much,
 it would mostly delay the appearance of <0 to RTL expansion time (and
 thus
 make gimple slightly lighter).


 I think we should delay the decision on how to optimize this.  It's indeed
 not trivial and the GIMPLE middle-end aggressively forwards feeding
 comparisons into the VEC_COND_EXPR expressions already (somewhat
 defeating any CSE that might be possible here) in forwprop.


 Thanks for going through the long email :-)

 What does that imply for the first argument of VEC_COND_EXPR? Currently, the
 expander asserts that it is a comparison, but that is not reflected in the
 gimple checkers.

And I don't think we should reflect that in the gimple checkers; rather, fix up the
expander (transparently use p != 0 or p < 0).

 If we document that VEC_COND_EXPR takes a vector of -1 and 0 (which is the
 case for a comparison), I don't think it prevents from later relaxing that
 to <0 or !=0. But then I don't know how to handle expansion when the
 argument is neither a comparison (vcond) nor a constant (vec_merge? I
 haven't tried but that should be doable), I would have to pass <0 or !=0 to
 the target.

Yes.

 So is the best choice to document that VEC_COND_EXPR takes as
 first argument a comparison and make gimple checking reflect that? (seems
 sad, but at least that would tell me what I can/can't do)

No, that would just mean that in GIMPLE you'd add this p != 0 or p < 0.
And at some point in the future I really really want to push this embedded
expression to a separate statement so I have a SSA definition for it.

 By the way, since we are documenting comparisons as returning 0 and -1, does
 that bring back the integer_truep predicate?

Not sure, true would still be != 0 or all_onesp (all bits of the
precision are 1), no?

Richard.
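
To make the choice between p != 0 and p < 0 concrete, here is a small
scalar sketch, not GCC code: the function names and the fixed width NELT
are assumptions made for illustration.  It shows that the two tests (and a
plain bitwise blend) agree when the mask is a vector of -1 and 0, and stop
agreeing as soon as the mask holds any other value.

```c
#include <assert.h>
#include <stdint.h>

#define NELT 4

/* True if every mask element is 0 or -1 (a "signed boolean" vector).  */
static int
boolean_mask_p (const int32_t *mask)
{
  for (int i = 0; i < NELT; i++)
    if (mask[i] != 0 && mask[i] != -1)
      return 0;
  return 1;
}

/* Select a[i] where mask[i] != 0, else b[i].  */
static void
vcond_ne0 (const int32_t *mask, const int32_t *a, const int32_t *b,
           int32_t *out)
{
  for (int i = 0; i < NELT; i++)
    out[i] = mask[i] != 0 ? a[i] : b[i];
}

/* Select on the sign bit, i.e. mask[i] < 0 (the OpenCL-style reading).  */
static void
vcond_lt0 (const int32_t *mask, const int32_t *a, const int32_t *b,
           int32_t *out)
{
  for (int i = 0; i < NELT; i++)
    out[i] = mask[i] < 0 ? a[i] : b[i];
}

/* Bitwise blend; only meaningful when the mask is all 0 / -1.  */
static void
vcond_blend (const int32_t *mask, const int32_t *a, const int32_t *b,
             int32_t *out)
{
  assert (boolean_mask_p (mask));
  for (int i = 0; i < NELT; i++)
    out[i] = (mask[i] & a[i]) | (~mask[i] & b[i]);
}
```

With a mask such as {1, 0, -1, 0} the first element is selected by the
!= 0 reading but not by the < 0 reading, which is why the expander has to
pick one when the first operand is not known to be a comparison result.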

 --
 Marc Glisse


Re: Fixup INTEGER_CST

2012-10-08 Thread Jan Hubicka
2) As we query the type_hash while we are rewriting the types,
   we run into instability of the hashtable. This manifests itself
   as an ICE when one adds a sanity check that while merging function
   types their arg types are equivalent, too.
   This ICEs compiling e.g. sqlite but I did not really manage to
   reduce this.  I tracked it down to the argument type being inserted
   into gimple_type_hash but at the time we query the new argument type,
   the original is no longer found although their hashes are equivalent.
   The problem is hidden when things fit into the leader cache,
   so one needs rather big testcase.
 
 Ugh.  For reduction you can disable those caches though.  The above
 means there is a disconnect between hashing and comparing.
 Maybe it's something weird with the early out
 
   if (TYPE_ARG_TYPES (t1) == TYPE_ARG_TYPES (t2))
 goto same_types;
 ?

Well, the problem goes away when you process all types before changing, so I
think it really is instability of hash table computation. But I am not sure how
to test for it.
Even disabling the caching and recomputing after gimple_register_type leads
to different results.
 
  So I tried to register all gimple types first.  Use TREE_VISITED within
  the merging code to mark that type is not a leader and then TREE_CHAIN
  to point to the leader.  This avoids need to re-query the hashtable
  from the later fixups.  We only look for types with TREE_VISITED
  and replace them by TREE_CHAIN.
 
 TREE_CHAIN is unused for types?  But we probably shouldn't add a new
 use ...

It is used, but unused for type merging.  
 /* Nodes are chained together for many purposes.
   Types are chained together to record them for being output to the debugger
   (see the function `chain_type'). */

We know that types that lost merging will not be used later, so we can
overwrite pointers we don't need.

When one removes the type from variant list during registering, one can
also use TYPE_MAIN_VARIANT, for example.
 
  This has two passes.  First we compute the main variants and mark
  field_decls and type_decls for merging and in last pass we finally do
  fixup on what remained in the table.
 
  This allows me to poison pointers in the removed types in a way
  so the GGC would ICE if they stayed reachable.
  I however need the extra pass because
   1) I can not update the type_decls/field_decls while registering
  types or I run into the hash table problems
   2) I can not merge the second two passes because at the time
  I find type/field decls equivalent there may be earlier pointers
  to them.
 
 You need to merge all trees reachable from the one you start at once
 (what I'm working on from time to time - work per tree SCC, in a DFS
 walk).

Yep, doing things per-SCC is definitely a good idea.

It will also give a chance to improve the hash itself.  If you process in SCC
order you know that all references outside SCC have already leaders set and you
can hash their addresses rather than using the weak hash.

I would really love to see this done.  After updating Mozilla we now need 10GB
of RAM and about 18 minutes for merging (they merged in a new JIT that apparently
plays badly with our types). This makes any development/testing difficult.

Honza
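
A rough sketch of the "mark the loser, chain to the leader" scheme
described above, under stated assumptions: struct type_node and the helper
names are hypothetical stand-ins, not GCC's tree API.  The visited flag
plays the role of TREE_VISITED and the chain pointer the role of
TREE_CHAIN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a type node; not GCC's tree structure.  */
struct type_node {
  bool visited;             /* set when this node lost merging */
  struct type_node *chain;  /* then points at the surviving leader */
  struct type_node *elem;   /* an example outgoing reference to fix up */
};

/* First pass: record that LOSER was merged into LEADER.  */
static void
mark_merged (struct type_node *loser, struct type_node *leader)
{
  assert (loser != leader && !leader->visited);
  loser->visited = true;
  loser->chain = leader;
}

/* Resolve a node to its leader without re-querying any hash table.  */
static struct type_node *
leader_of (struct type_node *t)
{
  return (t && t->visited) ? t->chain : t;
}

/* Later pass: rewrite an outgoing reference to point at the leader.  */
static void
fixup_refs (struct type_node *t)
{
  t->elem = leader_of (t->elem);
}
```

Because the fixup pass only follows the stored pointer, it does not depend
on hash values that may have changed while types were being rewritten.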


Re: [ping patch] Predict for loop exits in short-circuit conditions

2012-10-08 Thread Jan Hubicka
 On Mon, Oct 8, 2012 at 11:04 AM, Jan Hubicka hubi...@ucw.cz wrote:
  On Mon, Oct 8, 2012 at 4:50 AM, Dehao Chen de...@google.com wrote:
   Attached is the updated patch. Yes, if we add a VRP pass before
   profile pass, this patch would be unnecessary. Should we add a VRP
   pass?
 
  No, we don't want VRP in early optimizations.
 
  I am not quite sure about that.  VRP
   1) makes branch prediction work better by doing jump threading early
 
 Well ... but jump threading may need basic-block duplication which may
 increase code size.  Also VRP and FRE have pass ordering issues.
 
   2) is, after FRE, most effective tree pass on removing code by my profile
  statistics.
 
 We also don't have DSE in early opts.  I don't want to end up with the
 situation that we do everything in early opts ... we should do _less_ there
 (but eventually iterate properly when processing cycles).

Yep, I am not quite sure about the most sane variant.  Missed simple jump threading
in early opts definitely confuses both the profile estimate and inline size
estimates.  But I am also not thrilled by adding more passes to early opts at
all.  Also last time I looked into this, CCP missed a lot of CCP opportunities,
making VRP artificially look more useful.

I have a patch that somewhat improves profile updating after jump threading (i.e.
re-does the profile for simple cases), but still jump threading is the most
common cause of the profile becoming inconsistent after expand.

On a related note, with -fprofile-report I can easily track how much code
each pass in the queue removed.  I was thinking about running this on Mozilla
and -O1 and removing those passes that did almost nothing.  Those are mostly
re-run passes, both at Gimple and RTL level. Our passmanager is not terribly
friendly for controlling pass per-repetition.

With introduction of -Og pass queue, do you think introducing -O1 pass queue
for late tree passes (that will be quite short) is sane? What about RTL
level?  I guess we can split the queues for RTL optimizations, too.
All optimizations passes prior register allocation are sort of optional
and I guess there are also -Og candidates.

I however find the three-times-duplicated queues a bit uncool, too, but I guess
it is most compatible with PM organization.

At -O3 the most effective passes on combine.c
are:

cfg (because of cfg cleanup) -1.5474%
Early inlining -0.4991%
FRE -7.9369%
VRP -0.9321% (if run early), ccp does -0.2273%
tailr -0.5305%

After IPA
copyrename -2.2850% (it packs cleanups after inlining)
forwprop -0.5432%
VRP -0.9700% (if rerun after early passes, otherwise it is about 2%)
PRE -2.4123%
DOM -0.5182%

RTL passes
into_cfglayout -3.1400% (i.e. first cleanup_cfg)
fwprop1 -3.0467%
cprop -2.7786%
combine -3.3346%
IRA -3.4912% (i.e. the cost model prefers hard regs)
bbro -0.9765%

The numbers on tramp3d and the LTO cc1 binary are not that different.
Honza


RE: [Patch] Fix PR53397

2012-10-08 Thread Kumar, Venkataramanan
Hi Richard,

I have incorporated your comments. 

 Yes, call dump_mem_ref then, instead of repeating parts of its body.

The reference object is not yet created at the place where we check for invariance;
it is still a tree expression.  I created a common function and used it in all places
to dump the step, base and delta values of the memory reference being
analyzed.

Please find the modified patch attached.

GCC regression make check -k passes with x86_64-unknown-linux-gnu.

Regards,
Venkat.

-Original Message-
From: Richard Guenther [mailto:richard.guent...@gmail.com] 
Sent: Thursday, October 04, 2012 6:26 PM
To: Kumar, Venkataramanan
Cc: Richard Guenther; gcc-patches@gcc.gnu.org
Subject: Re: [Patch] Fix PR53397

On Tue, Oct 2, 2012 at 6:40 PM, Kumar, Venkataramanan 
venkataramanan.ku...@amd.com wrote:
 Hi Richi,

 (Snip)
 + (!cst_and_fits_in_hwi (step))
 +{
 +  if( loop->inner != NULL)
 +{
 +  if (dump_file && (dump_flags & TDF_DETAILS))
 +{
 +  fprintf (dump_file, "Reference %p:\n", (void *) ref);
 +  fprintf (dump_file, "(base ");
 +  print_generic_expr (dump_file, base, TDF_SLIM);
 +  fprintf (dump_file, ", step ");
 +  print_generic_expr (dump_file, step, TDF_TREE);
 +  fprintf (dump_file, ")\n");

 No need to repeat this - all references are dumped when we gather them.
 (Snip)

 The dumping happens at record_ref which is called after these statements to 
 record these references.

 When the step is invariant  we return from the function without recording the 
 references.

  so I thought of dumping the references here.

 Is there a cleaner way to dump the references at one place?

Yes, call dump_mem_ref then, instead of repeating parts of its body.

Richard.

 Regards,
 Venkat.



 -Original Message-
 From: Richard Guenther [mailto:rguent...@suse.de]
 Sent: Tuesday, October 02, 2012 5:42 PM
 To: Kumar, Venkataramanan
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [Patch] Fix PR53397

 On Mon, 1 Oct 2012, venkataramanan.ku...@amd.com wrote:

 Hi,

 The below patch fixes the FFT/Scimark regression caused by useless 
 prefetch generation.

 This fix tries to make prefetch less aggressive by prefetching arrays 
 in the inner loop, when the step is invariant in the entire loop nest.

 GCC currently tries to prefetch invariant steps when they are in the 
 inner loop. But does not check if the step is variant in outer loops.

 In the scimark FFT case, the trip count of the inner loop varies by a 
 non constant step, which is invariant in the inner loop.
 But the step variable is varying in outer loop. This makes inner loop 
 trip count small (at run time varies sometimes as small as 1
 iteration)

 Prefetching ahead x iteration when the inner loop trip count is 
 smaller than x leads to useless prefetches.
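
The effect described above can be illustrated with a toy count, offered
here as a sketch rather than as the pass's actual cost model: the function
name and the simplification "one prefetch per iteration, aimed AHEAD
iterations forward" are assumptions.

```c
#include <assert.h>

/* Count the prefetches issued by a loop of TRIP_COUNT iterations that
   target an iteration the loop really executes.  A prefetch issued in
   iteration i is aimed at iteration i + AHEAD.  */
static unsigned
useful_prefetches (unsigned trip_count, unsigned ahead)
{
  unsigned useful = 0;

  assert (ahead > 0);
  for (unsigned i = 0; i < trip_count; i++)
    if (i + ahead < trip_count)  /* target iteration still executes */
      useful++;
  return useful;
}
```

Under these assumptions, a loop of 1024 iterations with a distance of 8
wastes only its last 8 prefetches, while a loop whose trip count is at or
below the distance issues nothing but useless ones.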

 Flag used: -O3 -march=amdfam10

 Before
 **  **
 ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
 ** for details. (Results can be submitted to p...@nist.gov) **
 **  **
 Using   2.00 seconds min time per kenel.
 Composite Score:          550.50
 FFT             Mflops:    38.66    (N=1024)
 SOR             Mflops:   617.61    (100 x 100)
 MonteCarlo:     Mflops:   173.74
 Sparse matmult  Mflops:   675.63    (N=1000, nz=5000)
 LU              Mflops:  1246.88    (M=100, N=100)


 After
 **  **
 ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
 ** for details. (Results can be submitted to p...@nist.gov) **
 **  **
 Using   2.00 seconds min time per kenel.
 Composite Score:          639.20
 FFT             Mflops:   479.19    (N=1024)
 SOR             Mflops:   617.61    (100 x 100)
 MonteCarlo:     Mflops:   173.18
 Sparse matmult  Mflops:   679.13    (N=1000, nz=5000)
 LU              Mflops:  1246.88    (M=100, N=100)

 GCC regression make check -k passes with x86_64-unknown-linux-gnu 
 New tests that PASS:

 gcc.dg/pr53397-1.c scan-assembler prefetcht0 gcc.dg/pr53397-1.c 
 scan-tree-dump aprefetch Issued prefetch
 gcc.dg/pr53397-1.c (test for excess errors) gcc.dg/pr53397-2.c 
 scan-tree-dump aprefetch loop variant step
 gcc.dg/pr53397-2.c scan-tree-dump aprefetch Not prefetching
 gcc.dg/pr53397-2.c (test for excess errors)


 Checked CPU2006 and polyhedron on latest AMD processor, no regressions noted.

 Ok to commit in trunk?

 regards,
 Venkat

 gcc/ChangeLog
 +2012-10-01  Venkataramanan Kumar  venkataramanan.ku...@amd.com
 +
 +   * tree-ssa-loop-prefetch.c (gather_memory_references_ref):
 +   Perform non constant step prefetching in inner loop, only
 +   when it is invariant in the entire loop nest.
 +   * testsuite/gcc.dg/pr53397-1.c: New test case
 +   Checks we are prefetching for loop invariant

Re: [RFC] Make vectorizer to skip loops with small iteration estimate

2012-10-08 Thread Richard Guenther
On Sat, Oct 6, 2012 at 11:34 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hi,
 I benchmarked the patch moving loop header copying and it is quite noticeable 
 win.

 Some testsuite updating is needed. In many cases it is just because the
 optimizations are now happening earlier.
 There are however a few testsuite failures I have trouble dealing with:
 ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-ssa/pr21559.c scan-tree-dump-times 
 vrp1 Threaded jump 3
 ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-ssa/ssa-dom-thread-2.c 
 scan-tree-dump-times vrp1 Jumps threaded: 1 1
 ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/vect/O3-slp-reduc-10.c 
 scan-tree-dump-times vect vectorized 1 loops 2
 ./testsuite/g++/g++.sum:FAIL: g++.dg/tree-ssa/pr18178.C -std=gnu++98  
 scan-tree-dump-times vrp1 if  1
 ./testsuite/g++/g++.sum:FAIL: g++.dg/tree-ssa/pr18178.C -std=gnu++11  
 scan-tree-dump-times vrp1 if  1

 This is mostly about VRP losing its ability to thread some jumps from the
 duplicated loop header out of the loop across the loopback edge.  This seems 
 to
 be due to loop updating logic.  Do we care about these?

Yes, I think so.  At least we care that the optimized result is the same.

Can you elaborate on "due to loop updating logic"?

Can you elaborate on the def_split_header_continue_p change?  Which probably
should be tested and installed separately?

Thanks,
Richard.
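
For readers following the def_split_header_continue_p question: the quoted
patch below replaces a pure loop-depth comparison with a walk up the loop
tree.  The sketch here uses a made-up struct toy_loop and helper names, not
GCC's struct loop, loop_depth or loop_outer, to show why the two tests can
disagree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature loop tree; not GCC's struct loop.  */
struct toy_loop {
  struct toy_loop *outer;  /* enclosing loop, NULL at the root */
  unsigned depth;          /* nesting depth, root is 0 */
};

/* Depth-only test: true for any loop at least as deep as TARGET,
   including unrelated sibling loops.  */
static bool
deep_enough_p (const struct toy_loop *l, const struct toy_loop *target)
{
  return l->depth >= target->depth;
}

/* Walk outwards: true only if L is TARGET or is nested inside it.  */
static bool
nested_in_p (const struct toy_loop *l, const struct toy_loop *target)
{
  assert (target != NULL);
  for (; l; l = l->outer)
    if (l == target)
      return true;
  return false;
}
```

A sibling loop at the same depth passes the depth test but fails the walk,
which appears to be the case the patch wants to exclude.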

 Honza

 Index: tree-ssa-threadupdate.c
 ===
 *** tree-ssa-threadupdate.c (revision 192123)
 --- tree-ssa-threadupdate.c (working copy)
 *** static bool
 *** 846,854 
   def_split_header_continue_p (const_basic_block bb, const void *data)
   {
 const_basic_block new_header = (const_basic_block) data;
 !   return (bb != new_header
 !           && (loop_depth (bb->loop_father)
 !               >= loop_depth (new_header->loop_father)));
   }

   /* Thread jumps through the header of LOOP.  Returns true if cfg changes.
 --- 846,860 
   def_split_header_continue_p (const_basic_block bb, const void *data)
   {
 const_basic_block new_header = (const_basic_block) data;
 !   const struct loop *l;
 !
 !   if (bb == new_header
 !       || loop_depth (bb->loop_father) < loop_depth (new_header->loop_father))
 !     return false;
 !   for (l = bb->loop_father; l; l = loop_outer (l))
 !     if (l == new_header->loop_father)
 !       return true;
 !   return false;
   }

   /* Thread jumps through the header of LOOP.  Returns true if cfg changes.
 Index: testsuite/gcc.dg/unroll_2.c
 ===
 *** testsuite/gcc.dg/unroll_2.c (revision 192123)
 --- testsuite/gcc.dg/unroll_2.c (working copy)
 ***
 *** 1,5 
   /* { dg-do compile  { target i?86-*-linux* x86_64-*-linux* } } */
 ! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo -fenable-rtl-loop2_unroll" } */

   unsigned a[100], b[100];
   inline void bar()
 --- 1,5 
   /* { dg-do compile  { target i?86-*-linux* x86_64-*-linux* } } */
 ! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo -fenable-rtl-loop2_unroll -fno-tree-dominator-opts" } */

   unsigned a[100], b[100];
   inline void bar()
 Index: testsuite/gcc.dg/unroll_3.c
 ===
 *** testsuite/gcc.dg/unroll_3.c (revision 192123)
 --- testsuite/gcc.dg/unroll_3.c (working copy)
 ***
 *** 1,5 
   /* { dg-do compile  { target i?86-*-linux* x86_64-*-linux* } } */
 ! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } */

   unsigned a[100], b[100];
   inline void bar()
 --- 1,5 
   /* { dg-do compile  { target i?86-*-linux* x86_64-*-linux* } } */
 ! /* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo -fno-tree-dominator-opts" } */

   unsigned a[100], b[100];
   inline void bar()
 Index: testsuite/gcc.dg/torture/pr23821.c
 ===
 *** testsuite/gcc.dg/torture/pr23821.c  (revision 192123)
 --- testsuite/gcc.dg/torture/pr23821.c  (working copy)
 ***
 *** 1,9 
   /* { dg-do compile } */
   /* { dg-skip-if "" { *-*-* } { "-O0" "-fno-fat-lto-objects" } { "" } } */
 ! /* At -O1 DOM threads a jump in a non-optimal way which leads to
 !    the bogus propagation.  */
 ! /* { dg-skip-if "" { *-*-* } { "-O1" } { "" } } */
 ! /* { dg-options "-fdump-tree-ivcanon-details" } */

   int a[199];

 --- 1,8 
   /* { dg-do compile } */
   /* { dg-skip-if "" { *-*-* } { "-O0" "-fno-fat-lto-objects" } { "" } } */
 ! /* DOM threads a jump in a non-optimal way which leads to
 !    the bogus propagation.  */
 ! /* { dg-options "-fdump-tree-ivcanon-details -fno-tree-dominator-opts" } */

   int a[199];

 

[C++ Patch/RFC] PR 54194

2012-10-08 Thread Paolo Carlini

Hi,

in this PR the submitter points out that in the -Wparentheses warning for, e.g.,

char in[4]={0}, out[6];
out[1] = in[1]  0x0F | ((in[3]  0x3C)  2);

warning: suggest parentheses around arithmetic in operand of ‘|’ 
[-Wparentheses]


the caret points to end of the expression, ie the final closing 
parenthesis, which is rather misleading, because the problem is actually 
in the first operand of '|'. Ideally I guess one would like to somehow 
point to that first operand, but our infrastructure (shared with the C 
front-end, at the moment) isn't really ready to do that, and probably we 
would like to use a range (more than a caret) below the whole first 
operand (the problem isn't really with  per se). Considering also what 
we are already doing elsewhere, it seems to me that a straightforward 
and good improvement is obtained by passing to warn_about_parentheses 
the location of the outer operand (together with its code), as per the 
attached patchlet: then in the example the caret points to the actual 
'|' operator mentioned in the error message. Post 4.8.0 we can imagine 
further improvements...


What do you think?

Thanks,
Paolo.
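
As background for the warning itself, here is a short sketch of the
precedence rule that -Wparentheses is guarding: in C and C++, binary &
binds tighter than |.  The function names are invented for illustration,
and both functions are written fully parenthesized so the block compiles
without triggering the warning.

```c
/* How the compiler groups  a & 0x0F | b  when no parentheses are given:
   the & is evaluated first because it has higher precedence than |.  */
static unsigned
as_parsed (unsigned a, unsigned b)
{
  return (a & 0x0F) | b;
}

/* The other grouping a reader might have had in mind.  */
static unsigned
other_grouping (unsigned a, unsigned b)
{
  return a & (0x0F | b);
}
```

For a = 0xF0 and b = 0x01 the two groupings give 0x01 and 0x00
respectively, which is the kind of silent difference the warning asks the
programmer to spell out.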

///

Index: cp/typeck.c
===
--- cp/typeck.c (revision 192130)
+++ cp/typeck.c (working copy)
@@ -3630,7 +3630,8 @@ build_x_binary_op (location_t loc, enum tree_code
       && !error_operand_p (arg2)
       && (code != LSHIFT_EXPR
           || !CLASS_TYPE_P (TREE_TYPE (arg1))))
-    warn_about_parentheses (code, arg1_code, orig_arg1, arg2_code, orig_arg2);
+    warn_about_parentheses (loc, code, arg1_code, orig_arg1,
+                            arg2_code, orig_arg2);
 
   if (processing_template_decl && expr != error_mark_node)
     return build_min_non_dep (code, expr, orig_arg1, orig_arg2);
Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 192130)
+++ c-family/c-common.c (working copy)
@@ -10428,7 +10428,7 @@ warn_array_subscript_with_type_char (tree index)
was enclosed in parentheses.  */
 
 void
-warn_about_parentheses (enum tree_code code,
+warn_about_parentheses (location_t loc, enum tree_code code,
enum tree_code code_left, tree arg_left,
enum tree_code code_right, tree arg_right)
 {
@@ -10449,26 +10449,26 @@ void
     {
     case LSHIFT_EXPR:
       if (code_left == PLUS_EXPR || code_right == PLUS_EXPR)
-        warning (OPT_Wparentheses,
-                 "suggest parentheses around %<+%> inside %<<<%>");
+        warning_at (loc, OPT_Wparentheses,
+                    "suggest parentheses around %<+%> inside %<<<%>");
       else if (code_left == MINUS_EXPR || code_right == MINUS_EXPR)
-        warning (OPT_Wparentheses,
-                 "suggest parentheses around %<-%> inside %<<<%>");
+        warning_at (loc, OPT_Wparentheses,
+                    "suggest parentheses around %<-%> inside %<<<%>");
       return;
 
     case RSHIFT_EXPR:
       if (code_left == PLUS_EXPR || code_right == PLUS_EXPR)
-        warning (OPT_Wparentheses,
-                 "suggest parentheses around %<+%> inside %<>>%>");
+        warning_at (loc, OPT_Wparentheses,
+                    "suggest parentheses around %<+%> inside %<>>%>");
       else if (code_left == MINUS_EXPR || code_right == MINUS_EXPR)
-        warning (OPT_Wparentheses,
-                 "suggest parentheses around %<-%> inside %<>>%>");
+        warning_at (loc, OPT_Wparentheses,
+                    "suggest parentheses around %<-%> inside %<>>%>");
       return;
 
     case TRUTH_ORIF_EXPR:
       if (code_left == TRUTH_ANDIF_EXPR || code_right == TRUTH_ANDIF_EXPR)
-        warning (OPT_Wparentheses,
-                 "suggest parentheses around %<&&%> within %<||%>");
+        warning_at (loc, OPT_Wparentheses,
+                    "suggest parentheses around %<&&%> within %<||%>");
       return;
 
 case BIT_IOR_EXPR:
@@ -10476,18 +10476,19 @@ void
           || code_left == PLUS_EXPR || code_left == MINUS_EXPR
           || code_right == BIT_AND_EXPR || code_right == BIT_XOR_EXPR
           || code_right == PLUS_EXPR || code_right == MINUS_EXPR)
-        warning (OPT_Wparentheses,
+        warning_at (loc, OPT_Wparentheses,
                  "suggest parentheses around arithmetic in operand of %<|%>");
       /* Check cases like x|y==z */
       else if (TREE_CODE_CLASS (code_left) == tcc_comparison
                || TREE_CODE_CLASS (code_right) == tcc_comparison)
-        warning (OPT_Wparentheses,
+        warning_at (loc, OPT_Wparentheses,
                  "suggest parentheses around comparison in operand of %<|%>");
       /* Check cases like !x | y */
       else if (code_left == TRUTH_NOT_EXPR
                && !APPEARS_TO_BE_BOOLEAN_EXPR_P (code_right, arg_right))
-        warning (OPT_Wparentheses, "suggest parentheses around operand of "
-                 "%<!%> or change %<|%> to %<||%> or %<!%> to %<~%>");
+        warning_at (loc, OPT_Wparentheses,
+                    "suggest parentheses around operand of "
+ 

[wwwdocs,avr]: Deprecate/remove -mshort-calls, --with-avrlibc is default

2012-10-08 Thread Georg-Johann Lay
Applied the following changes to 4.7/4.8 release notes caveats.



Index: htdocs/gcc-4.7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.127
retrieving revision 1.128
diff -u -p -r1.127 -r1.128
--- htdocs/gcc-4.7/changes.html	20 Sep 2012 06:34:08 -	1.127
+++ htdocs/gcc-4.7/changes.html	8 Oct 2012 08:54:49 -	1.128
@@ -107,6 +107,10 @@
   has been enhanced.  As a result, all objects contributing to an
   application must either be compiled with GCC versions up to 4.6.x or
   with GCC versions 4.7.0 or later.</li>
+
+<li>The AVR port's <code>-mshort-calls</code> command line option has
+  been deprecated.  It will be removed in the GCC 4.8 release.
+  See <code>-mrelax</code> for a replacement.</li>
   
 <li>The ARM port's <code>-mwords-little-endian</code> option has
 been deprecated.  It will be removed in a future release.</li>
Index: htdocs/gcc-4.8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.39
retrieving revision 1.40
diff -u -p -r1.39 -r1.40
--- htdocs/gcc-4.8/changes.html	6 Oct 2012 22:20:07 -	1.39
+++ htdocs/gcc-4.8/changes.html	8 Oct 2012 10:07:33 -	1.40
@@ -38,6 +38,18 @@ explicit use of vector types may be inco
 built with older versions of GCC.  Auto-vectorized code is not affected
 by this change.</p>
 
+<p>On AVR, support has been removed for the command line
+  option <code>-mshort-calls</code> deprecated in GCC 4.7.</p>
+
+<p>On AVR, the configure option <code>--with-avrlibc</code> supported since
+  GCC 4.7.2 is turned on per default for all non-RTEMS configurations.
+  This option arranges for a better integration of
+  <a href="http://www.nongnu.org/avr-libc/">AVR Libc</a> with avr-gcc.
+  For technical details, see <a href="http://gcc.gnu.org/PR54461">PR54461</a>.
+  To turn off the option in non-RTEMS configurations, use
+  <code>--with-avrlibc=no</code>.  If the compiler is configured for
+  RTEMS, the option is always turned off.</p>
+
 <h2>General Optimizer Improvements (and Changes)</h2>
 
   <ul>


Re: [ping patch] Predict for loop exits in short-circuit conditions

2012-10-08 Thread Richard Guenther
On Mon, Oct 8, 2012 at 12:01 PM, Jan Hubicka hubi...@ucw.cz wrote:
 On Mon, Oct 8, 2012 at 11:04 AM, Jan Hubicka hubi...@ucw.cz wrote:
  On Mon, Oct 8, 2012 at 4:50 AM, Dehao Chen de...@google.com wrote:
   Attached is the updated patch. Yes, if we add a VRP pass before
   profile pass, this patch would be unnecessary. Should we add a VRP
   pass?
 
  No, we don't want VRP in early optimizations.
 
  I am not quite sure about that.  VRP
   1) makes branch prediction work better by doing jump threading early

 Well ... but jump threading may need basic-block duplication which may
 increase code size.  Also VRP and FRE have pass ordering issues.

   2) is, after FRE, most effective tree pass on removing code by my profile
  statistics.

 We also don't have DSE in early opts.  I don't want to end up with the
 situation that we do everything in early opts ... we should do _less_ there
 (but eventually iterate properly when processing cycles).

 Yep, I am not quite sure about the most sane variant.  Missed simple jump
 threading in early opts definitely confuses both the profile estimate and
 the inline size estimates.  But I am also not thrilled by adding more passes
 to early opts at all.  Also, last time I looked into this, CCP missed a lot
 of CCP opportunities, making VRP artificially look more useful.

Eh .. that shouldn't happen.  Do you have testcases by any chance?
I used to duplicate each SSA propagator pass and checked with
-fdump-statistics-stats that the 2nd pass does nothing (thus chaining CCP
doesn't improve results).
But maybe that's not the issue you run into here?

 I have a patch that somewhat improves profile updating after jump threading
 (i.e. it re-does the profile for simple cases), but jump threading is still
 the most common reason for the profile to become inconsistent after expand.

 On a related note, with -fprofile-report I can easily track how much code
 each pass in the queue removed.  I was thinking about running this on Mozilla
 and -O1 and removing those passes that did almost nothing.  Those are mostly
 re-run passes, both at the GIMPLE and RTL level.  Our pass manager is not
 terribly friendly to controlling passes per repetition.

Sure.  You can also more thoroughly instrument passes and use
-fdump-statistics for that (I've done that), but we usually have testcases
that require that each pass that still is there is present ...

 With the introduction of the -Og pass queue, do you think introducing an -O1
 pass queue for late tree passes (that will be quite short) is sane?

Yes.  I don't like the dump-file naming mess that results though, but if
we want to support optimized attribute switching between -O1 and -O2
then I guess we have to live with that ...

Originally I wanted to base -Og on -O1 (thus have them mostly share the
pass queue) and retain the same pass queue for -O2 and -Os.  Maybe
that's what we eventually want to do.  Thus, add a (off for -Og) loop
optimizer sub-pass to the queue and schedule some scalar cleanups
after it but inside it.

 What about the RTL
 level?  I guess we can split the queues for RTL optimizations, too.
 All optimization passes prior to register allocation are sort of optional
 and I guess there are also -Og candidates.

Yes.  Though I first wanted to see actual issues with the RTL optimizers
and -Og.

 I however find the 3 times duplicated queues a bit uncool, too, but I guess
 it is most compatible with PM organization.

Indeed ;)  We should at least try to share the queues for -Og and -O1.

 At -O3 the most effective passes on combine.c
 are:

 cfg (because of cfg cleanup) -1.5474%
 Early inlining -0.4991%
 FRE -7.9369%
 VRP -0.9321% (if run early), ccp does -0.2273%

I think VRP has the advantage of taking loop iteration counts into account.
Maybe we can add something similar to CCP.  It's sad that VRP is too expensive,
it really is a form of CCP so merging both passes would be best (we can
at a single point, add_equivalence, turn off equivalence processing - the most
expensive part of VRP, and call that CCP ...).

 tailr -0.5305%

 After IPA
 copyrename -2.2850% (it packs cleanups after inlining)
 forwprop -0.5432%
 VRP -0.9700% (if rerun after early passes, otherwise it is about 2%)
 PRE -2.4123%
 DOM -0.5182%

 RTL passes
 into_cfglayout -3.1400% (i.e. first cleanup_cfg)
 fwprop1 -3.0467%
 cprop -2.7786%
 combine -3.3346%
 IRA -3.4912% (i.e. the cost model prefers hard regs)
 bbro -0.9765%

 The numbers on tramp3d and the LTO cc1 binary are not that different.

Yes.

Richard.

 Honza


Re: [Patch] Fix PR53397

2012-10-08 Thread Richard Guenther
On Mon, Oct 8, 2012 at 12:01 PM, Kumar, Venkataramanan
venkataramanan.ku...@amd.com wrote:
 Hi Richard,

 I have incorporated your comments.

 Yes, call dump_mem_ref then, instead of repeating parts of its body.

 The reference object is not yet created at the place where we check for
 invariance; it is still a tree expression.  I created a common function and
 used it at all places to dump the step, base and delta values of the memory
 reference being analyzed.

 Please find the modified patch attached.

 GCC regression make check -k passes with x86_64-unknown-linux-gnu.

I presume also bootstrapped.

Ok.

Thanks,
Richard.

 Regards,
 Venkat.

 -Original Message-
 From: Richard Guenther [mailto:richard.guent...@gmail.com]
 Sent: Thursday, October 04, 2012 6:26 PM
 To: Kumar, Venkataramanan
 Cc: Richard Guenther; gcc-patches@gcc.gnu.org
 Subject: Re: [Patch] Fix PR53397

 On Tue, Oct 2, 2012 at 6:40 PM, Kumar, Venkataramanan 
 venkataramanan.ku...@amd.com wrote:
 Hi Richi,

 (Snip)
 + (!cst_and_fits_in_hwi (step))
 +{
 +  if( loop->inner != NULL)
 +{
 +  if (dump_file && (dump_flags & TDF_DETAILS))
 +{
 +  fprintf (dump_file, "Reference %p:\n", (void *) ref);
 +  fprintf (dump_file, "(base ");
 +  print_generic_expr (dump_file, base, TDF_SLIM);
 +  fprintf (dump_file, ", step ");
 +  print_generic_expr (dump_file, step, TDF_TREE);
 +  fprintf (dump_file, ")\n");

 No need to repeat this - all references are dumped when we gather them.
 (Snip)

 The dumping happens at record_ref which is called after these statements 
 to record these references.

 When the step is invariant  we return from the function without recording 
 the references.

  so I thought of dumping the references here.

 Is there a cleaner way to dump the references at one place?

 Yes, call dump_mem_ref then, instead of repeating parts of its body.

 Richard.

 Regards,
 Venkat.



 -Original Message-
 From: Richard Guenther [mailto:rguent...@suse.de]
 Sent: Tuesday, October 02, 2012 5:42 PM
 To: Kumar, Venkataramanan
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [Patch] Fix PR53397

 On Mon, 1 Oct 2012, venkataramanan.ku...@amd.com wrote:

 Hi,

 The below patch fixes the FFT/Scimark regression caused by useless
 prefetch generation.

 This fix tries to make prefetch less aggressive by prefetching arrays
 in the inner loop, when the step is invariant in the entire loop nest.

 GCC currently tries to prefetch invariant steps when they are in the
 inner loop, but it does not check whether the step is variant in outer loops.

 In the SciMark FFT case, the trip count of the inner loop varies by a
 non-constant step, which is invariant in the inner loop but varies in
 the outer loop.  This makes the inner loop trip count small (at run time
 it is sometimes as small as 1 iteration).

 Prefetching ahead x iterations when the inner loop trip count is
 smaller than x leads to useless prefetches.

 Flag used: -O3 -march=amdfam10

 Before
 **  **
 ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
 ** for details. (Results can be submitted to p...@nist.gov) **
 **  **
 Using   2.00 seconds min time per kenel.
 Composite Score:          550.50
 FFT             Mflops:    38.66    (N=1024)
 SOR             Mflops:   617.61    (100 x 100)
 MonteCarlo:     Mflops:   173.74
 Sparse matmult  Mflops:   675.63    (N=1000, nz=5000)
 LU              Mflops:  1246.88    (M=100, N=100)


 After
 **  **
 ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
 ** for details. (Results can be submitted to p...@nist.gov) **
 **  **
 Using   2.00 seconds min time per kenel.
 Composite Score:          639.20
 FFT             Mflops:   479.19    (N=1024)
 SOR             Mflops:   617.61    (100 x 100)
 MonteCarlo:     Mflops:   173.18
 Sparse matmult  Mflops:   679.13    (N=1000, nz=5000)
 LU              Mflops:  1246.88    (M=100, N=100)

 GCC regression make check -k passes with x86_64-unknown-linux-gnu
 New tests that PASS:

 gcc.dg/pr53397-1.c scan-assembler prefetcht0
 gcc.dg/pr53397-1.c scan-tree-dump aprefetch Issued prefetch
 gcc.dg/pr53397-1.c (test for excess errors)
 gcc.dg/pr53397-2.c scan-tree-dump aprefetch loop variant step
 gcc.dg/pr53397-2.c scan-tree-dump aprefetch Not prefetching
 gcc.dg/pr53397-2.c (test for excess errors)


 Checked CPU2006 and polyhedron on latest AMD processor, no regressions 
 noted.

 Ok to commit in trunk?

 regards,
 Venkat

 gcc/ChangeLog
 +2012-10-01  Venkataramanan Kumar  venkataramanan.ku...@amd.com
 +
 +   * tree-ssa-loop-prefetch.c (gather_memory_references_ref):
 +   Perform non constant step prefetching in inner loop, only
 +   

Re: [lra] patch to speed more compilation of PR54146

2012-10-08 Thread Steven Bosscher
On Mon, Oct 8, 2012 at 10:18 AM, Jakub Jelinek ja...@redhat.com wrote:
  I'm playing with a patch to expand the insns_with_changed_offsets
  bitmap to an sbitmap, and will send a patch if this works better.

 Or make insns_with_changed_offsets a VEC of insns (or a pointer-set).

 Or use temporarily some rtx flag on the insns, from what I can see,
 in_struct on *INSN is right now only used during scheduling and from reorg
 till eoc, so for LRA sitting in between both scheduling passes it might
 be possible to use that bit too.

AFAICT neither of these ideas will work because only insn UIDs are
used when computing insns_with_changed_offsets. You'd need the actual
insn for a VEC, pointer map or flag. Also, with a VEC or pointer map,
it's difficult to take the union of the insn_bitmap sets.

The patch I have for this uses an sbitmap; it's posted in a new thread
starting here:
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00698.html

Ciao!
Steven


Re: [lra] patch to speed more compilation of PR54146

2012-10-08 Thread Steven Bosscher
On Sun, Oct 7, 2012 at 5:59 PM, Vladimir Makarov wrote:
 * lra-lives.c (lra_start_point_ranges, lra_finish_point_ranges):
 Remove.
 (process_bb_lives): Change start regno in
 EXECUTE_IF_SET_IN_BITMAP.  Iterate on DF_LR_IN (bb) instead of
 pseudos_live_through_calls.

This can be done a bit better still by checking whether the
pseudos_live_through_calls set is empty:

* lra-lives.c (process_bb_lives): At the top of a basic block, break
from the loop over pseudos_live_through_calls if the set is empty.

--- lra-lives.c.orig   2012-10-08 12:24:10.0 +0200
+++ lra-lives.c2012-10-08 12:26:07.0 +0200
@@ -751,8 +751,12 @@ process_bb_lives (basic_block bb)
 mark_pseudo_dead (i);

   EXECUTE_IF_SET_IN_BITMAP (DF_LR_IN (bb), FIRST_PSEUDO_REGISTER, j, bi)
-if (sparseset_bit_p (pseudos_live_through_calls, j))
-  check_pseudos_live_through_calls (j);
+{
+  if (sparseset_cardinality (pseudos_live_through_calls) == 0)
+   break;
+  if (sparseset_bit_p (pseudos_live_through_calls, j))
+   check_pseudos_live_through_calls (j);
+}

   incr_curr_point (freq);
 }


This test is extremely cheap (the load for the cardinality test is
re-used by sparseset_bit_p) and it cuts down the time spent in live
range chains even further (especially e.g. for blocks that don't
contain calls).
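
The early exit can be illustrated outside of GCC with a minimal sparse-set sketch.  This is not GCC's sparseset.h: the struct layout and the names ss_init, ss_add, ss_remove, ss_test and scan_candidates are made up for this example, and ss_remove merely stands in for whatever per-pseudo processing removes an element.  The only assumption carried over from the text is that the set keeps its member count in a field, so the emptiness check is a single load that the membership test needs anyway.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UNIVERSE 16

/* Simplified sparse set over [0, UNIVERSE): dense[0..members) lists the
   elements, sparse[v] is the position of v in dense.  */
struct sparse_set {
  unsigned members;
  unsigned dense[UNIVERSE];
  unsigned sparse[UNIVERSE];
};

static void ss_init(struct sparse_set *s)
{
  memset(s, 0, sizeof *s);
}

/* Membership test; note that it already reads s->members.  */
static int ss_test(const struct sparse_set *s, unsigned v)
{
  unsigned idx;
  assert(v < UNIVERSE);
  idx = s->sparse[v];
  return idx < s->members && s->dense[idx] == v;
}

static void ss_add(struct sparse_set *s, unsigned v)
{
  if (!ss_test(s, v)) {
    s->sparse[v] = s->members;
    s->dense[s->members++] = v;
  }
}

static void ss_remove(struct sparse_set *s, unsigned v)
{
  if (ss_test(s, v)) {
    unsigned idx = s->sparse[v];
    unsigned last = s->dense[--s->members];
    s->dense[idx] = last;
    s->sparse[last] = idx;
  }
}

/* Walk the candidates and process those that are in LIVE, leaving the
   loop as soon as LIVE is empty.  Returns how many candidates were
   actually examined.  */
static unsigned scan_candidates(struct sparse_set *live,
                                const unsigned *cand, size_t n)
{
  unsigned examined = 0;
  size_t i;
  for (i = 0; i < n; i++) {
    if (live->members == 0)
      break;                      /* nothing left to find */
    examined++;
    if (ss_test(live, cand[i]))
      ss_remove(live, cand[i]);   /* stand-in for the real processing */
  }
  return examined;
}
```

With an empty set the loop body is never entered past the first check, which is the situation described above for blocks without calls.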

OK for the branch if it passes bootstrap+testing on x86_64-unknown-linux-gnu?

Ciao!
Steven


[RFC] Implement load sinking in loops

2012-10-08 Thread Eric Botcazou
Hi,

we recently noticed that, even at -O3, the compiler doesn't figure out that 
the following loop is dumb:

#define SIZE 64

int foo (int v[])
{
  int r, i;

  for (i = 0; i < SIZE; i++)
    r = v[i];

  return r;
}

which was a bit of a surprise.  On second thoughts, this isn't entirely 
unexpected, as it probably matters only for (slightly) pathological cases.
The attached patch nevertheless implements a form of load sinking in loops so 
as to optimize these cases.  It's combined with invariant motion to optimize:

int foo (int v[], int a)
{
  int r, i;

  for (i = 0; i < SIZE; i++)
    r = v[i] + a;

  return r;
}

and with store sinking to optimize:

int foo (int v1[], int v2[])
{
  int r[SIZE];
  int i, j;

  for (j = 0; j < SIZE; j++)
    for (i = 0; i < SIZE; i++)
      r[j] = v1[j] + v2[i];

  return r[SIZE - 1];
}

The optimization is enabled at -O2 in the patch for measurement purposes but, 
given how rarely it triggers (e.g. exactly 10 occurrences in a GCC bootstrap, 
compiler-only, all languages except Go), it's probably best suited to -O3.
Or perhaps we don't care and it should simply be dropped...  Thoughts?
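
For reference, the effect on the first two examples can be written out by hand.  This is only a source-level sketch of the intended result, not output of the patch; the function names are made up for the illustration, and r is initialized here so that the functions stay well defined in isolation.

```c
#include <assert.h>

#define SIZE 64

/* Original shape: every iteration loads v[i], only the last load is used.  */
static int last_load_loop(const int v[], int a)
{
  int r = 0, i;

  for (i = 0; i < SIZE; i++)
    r = v[i] + a;

  return r;
}

/* Hand-written equivalent once the load (and the invariant addition that
   uses it) has been sunk below the loop: a single load remains.  */
static int last_load_sunk(const int v[], int a)
{
  return v[SIZE - 1] + a;
}
```

Both functions return the same value for any input; the second one simply makes explicit that the loop body contributes nothing but its final iteration.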

Tested on x86_64-suse-linux.


2012-10-08  Eric Botcazou  ebotca...@adacore.com

* gimple.h (gsi_insert_seq_on_edge_before): Declare.
* gimple-iterator.c (gsi_insert_seq_on_edge_before): New function.
* tree-ssa-loop-im.c (struct mem_ref_loc): Add LHS field.
(mem_ref_in_stmt): Remove gcc_assert.
(copy_load_and_single_use_chain): New function.
(execute_lm): Likewise.
(hoist_memory_references): Hoist the loads after the stores.
(ref_always_accessed_p): Rename into...
(ref_always_stored_p): ...this.  Remove STORE_P and add ONCE_P.
(can_lsm_ref_p): New function extracted from...
(can_sm_ref_p): ...here.  Call it.
(follow_invariant_single_use_chain): New function.
(can_lm_ref_p): Likewise.
(find_refs_for_sm): Rename into...
(find_refs_for_lsm): ...this.  Find load hoisting opportunities.
(loop_suitable_for_sm): Rename into...
(loop_suitable_for_lsm): ...this.
(store_motion_loop): Rename into...
(load_store_motion_loop): ...this.  Adjust calls to above functions.
(tree_ssa_lim): Likewise.


2012-10-08  Eric Botcazou  ebotca...@adacore.com

* gcc.dg/tree-ssa/loadmotion-1.c: New test.
* gcc.dg/tree-ssa/loadmotion-2.c: New test.
* gcc.dg/tree-ssa/loadmotion-3.c: New test.


-- 
Eric Botcazou

Index: gimple.h
===
--- gimple.h	(revision 192137)
+++ gimple.h	(working copy)
@@ -5196,6 +5196,7 @@ void gsi_move_before (gimple_stmt_iterat
 void gsi_move_to_bb_end (gimple_stmt_iterator *, basic_block);
 void gsi_insert_on_edge (edge, gimple);
 void gsi_insert_seq_on_edge (edge, gimple_seq);
+void gsi_insert_seq_on_edge_before (edge, gimple_seq);
 basic_block gsi_insert_on_edge_immediate (edge, gimple);
 basic_block gsi_insert_seq_on_edge_immediate (edge, gimple_seq);
 void gsi_commit_one_edge_insert (edge, basic_block *);
Index: gimple-iterator.c
===
--- gimple-iterator.c	(revision 192137)
+++ gimple-iterator.c	(working copy)
@@ -677,6 +677,16 @@ gsi_insert_seq_on_edge (edge e, gimple_s
   gimple_seq_add_seq (&PENDING_STMT (e), seq);
 }
 
+/* Likewise, but append it instead of prepending it.  */
+
+void
+gsi_insert_seq_on_edge_before (edge e, gimple_seq seq)
+{
+  gimple_seq pending = NULL;
+  gimple_seq_add_seq (&pending, seq);
+  gimple_seq_add_seq (&pending, PENDING_STMT (e));
+  PENDING_STMT (e) = pending;
+}
 
 /* Insert the statement pointed-to by GSI into edge E.  Every attempt
is made to place the statement in an existing basic block, but
Index: tree-ssa-loop-im.c
===
--- tree-ssa-loop-im.c	(revision 192137)
+++ tree-ssa-loop-im.c	(working copy)
@@ -103,6 +103,7 @@ typedef struct mem_ref_loc
 {
   tree *ref;			/* The reference itself.  */
   gimple stmt;			/* The statement in that it occurs.  */
+  tree lhs;			/* The (ultimate) LHS for a load.  */
 } *mem_ref_loc_p;
 
 DEF_VEC_P(mem_ref_loc_p);
@@ -674,7 +675,6 @@ mem_ref_in_stmt (gimple stmt)
 
   if (!mem)
 return NULL;
-  gcc_assert (!store);
 
   hash = iterative_hash_expr (*mem, 0);
   ref = (mem_ref_p) htab_find_with_hash (memory_accesses.refs, *mem, hash);
@@ -2192,6 +2192,140 @@ execute_sm (struct loop *loop, VEC (edge
   execute_sm_if_changed (ex, ref->mem, tmp_var, store_flag);
 }
 
+/* Copy the load and the chain of single uses described by LOC and return the
+   sequence of new statements.  Also set NEW_LHS to the copy of LOC->LHS.  */
+
+static gimple_seq
+copy_load_and_single_use_chain (mem_ref_loc_p loc, tree *new_lhs)
+{
+  tree mem = *loc->ref;
+  tree lhs, tmp_var, ssa_name;
+  gimple_seq seq = NULL;
+  gimple stmt;
+  unsigned n = 0;
+
+  /* First copy the 

Re: [RFC] Make vectorizer to skip loops with small iteration estimate

2012-10-08 Thread Jan Hubicka
 On Sat, Oct 6, 2012 at 11:34 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Hi,
  I benchmarked the patch moving loop header copying and it is quite 
  noticeable win.
 
  Some testsuite updating is needed. In many cases it is just because the
  optimizations are now happening earlier.
  There are however a few testsuite failures I have trouble dealing with:
  ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-ssa/pr21559.c 
  scan-tree-dump-times vrp1 Threaded jump 3
  ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/tree-ssa/ssa-dom-thread-2.c 
  scan-tree-dump-times vrp1 Jumps threaded: 1 1
  ./testsuite/gcc/gcc.sum:FAIL: gcc.dg/vect/O3-slp-reduc-10.c 
  scan-tree-dump-times vect vectorized 1 loops 2
  ./testsuite/g++/g++.sum:FAIL: g++.dg/tree-ssa/pr18178.C -std=gnu++98  
  scan-tree-dump-times vrp1 if  1
  ./testsuite/g++/g++.sum:FAIL: g++.dg/tree-ssa/pr18178.C -std=gnu++11  
  scan-tree-dump-times vrp1 if  1
 
  This is mostly about VRP losing its ability to thread some jumps from the
  duplicated loop header out of the loop across the loopback edge.  This
  seems to be due to loop updating logic.  Do we care about these?
 
 Yes, I think so.  At least we care that the optimized result is the same.

It is not; we really lose optimization in those testcases.
I updated the ones that are still optimized well in the patch below.
 
 Can you elaborate on "due to loop updating logic"?

The problem is:
  /* We do not allow VRP information to be used for jump threading
 across a back edge in the CFG.  Otherwise it becomes too
 difficult to avoid eliminating loop exit tests.  Of course
 EDGE_DFS_BACK is not accurate at this time so we have to
 recompute it.  */
  mark_dfs_back_edges ();

  /* Do not thread across edges we are about to remove.  Just marking
 them as EDGE_DFS_BACK will do.  */
  FOR_EACH_VEC_ELT (edge, to_remove_edges, i, e)
    e->flags |= EDGE_DFS_BACK;

Loop header copying puts a conditional before the loop, and we want to thread
it up to the exit out of the loop (which I think is a rather important
optimization).  But that no longer happens because the back edge is in the
way.  At least that was the case in the tree-ssa failures I analyzed.
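
For readers less familiar with the transformation, loop header copying can be sketched at the source level as follows.  This is a hand-written illustration, not code from the patch, and the function names are invented for it: the exit test of a while loop is duplicated in front of the loop, and the loop itself becomes a do-while that tests at the bottom.

```c
#include <assert.h>

/* Original shape: the exit test sits in the loop header.  */
static int sum_below(const int *a, int n, int limit)
{
  int i = 0, sum = 0;

  while (i < n && a[i] < limit)
    {
      sum += a[i];
      i++;
    }
  return sum;
}

/* After header copying: one copy of the test guards entry to the loop,
   the other sits in front of the back edge.  */
static int sum_below_rotated(const int *a, int n, int limit)
{
  int i = 0, sum = 0;

  if (i < n && a[i] < limit)
    {
      do
        {
          sum += a[i];
          i++;
        }
      while (i < n && a[i] < limit);
    }
  return sum;
}
```

The guard in front of the loop is the "conditional before the loop" mentioned above; threading a jump from it to the loop exit has to cross what the threader treats as a back edge.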
 
 Can you elaborate on the def_split_header_continue_p change?  Which probably
 should be tested and installed separately?

Yes, that one is a latent bug.  The code expects that a loop exit can be
recognized by the loop depth decreasing, which is not true.
It reproduces as an ICE during bootstrap with the patch.
I will regtest/bootstrap and commit it today.

Honza


Re: Scheduler: Save state at the end of a block

2012-10-08 Thread Bernd Schmidt
On 08/13/2012 05:42 PM, Vladimir Makarov wrote:
 On 08/13/2012 06:32 AM, Bernd Schmidt wrote:
 This is a small patch for sched-rgn that attempts to save DFA state at
 the end of a basic block and re-use it in successor blocks. This was a
 customer-requested optimization; I've not seen it make much of a
 difference in any macro benchmarks.
 Bootstrapped and tested on x86_64-linux and also tested on c6x-elf. OK?



 Yes.  Thanks for the patch, Bernd.

It's been a while, so I thought I'd better mention I've checked this in
now after retesting.


Bernd


Re: patch to fix constant math - third small patch

2012-10-08 Thread Kenneth Zadeck

yes, my bad.   here it is with the patches.
On 10/06/2012 11:55 AM, Kenneth Zadeck wrote:

This is the third patch in the series of patches to fix constant math.
This one changes some predicates at the rtl level to use the new 
predicate CONST_SCALAR_INT_P.

I did not include a few that were tightly intertwined with other changes.

Not all of these changes are strictly mechanical.   Richard, when 
reviewing this, had me make additional changes to remove what he 
thought were latent bugs at the rtl level.   However, it appears that 
the bugs were not latent.  I do not know what is going on here but I 
am smart enough to not look a gift horse in the mouth.


All of this was done on the same machine with no changes and identical 
configs.  It is an x86-64 with ubuntu 12-4.


ok for commit?

In the logs below, gbBaseline is a trunk from Friday and gbWide is 
the same revision but with my patches.  Some of this, like 
gfortran.dg/pr32627, is obviously flutter, but the rest does not appear 
to be.


=
heracles:~/gcc(13) gccBaseline/contrib/compare_tests 
gbBaseline/gcc/testsuite/gcc/gcc.log gbWide/gcc/testsuite/gcc/gcc.log

New tests that PASS:

gcc.dg/builtins-85.c scan-assembler mysnprintf
gcc.dg/builtins-85.c scan-assembler-not __chk_fail
gcc.dg/builtins-85.c (test for excess errors)


heracles:~/gcc(14) gccBaseline/contrib/compare_tests 
gbBaseline/gcc/testsuite/gfortran/gfortran.log 
gbWide/gcc/testsuite/gfortran/gfortran.log

New tests that PASS:

gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer -funroll-loops (test 
for excess errors)
gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer  (test for excess 
errors)

gfortran.dg/pr32627.f03  -Os  (test for excess errors)
gfortran.dg/pr32635.f  -O0  execution test
gfortran.dg/pr32635.f  -O0  (test for excess errors)
gfortran.dg/substr_6.f90  -O2  (test for excess errors)

Old tests that passed, that have disappeared: (Eeek!)

gfortran.dg/pr32627.f03  -O1  (test for excess errors)
gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer -funroll-all-loops 
-finline-functions  (test for excess errors)

gfortran.dg/pr32627.f03  -O3 -g  (test for excess errors)
gfortran.dg/substring_equivalence.f90  -O  (test for excess errors)
Using /home/zadeck/gcc/gccBaseline/gcc/testsuite/config/default.exp as 
tool-and-target-specific interface file.


=== g++ Summary ===

# of expected passes            49793
# of expected failures          284
# of unsupported tests          601

runtest completed at Fri Oct  5 16:10:20 2012
heracles:~/gcc(16) tail gbWide/gcc/testsuite/g++/g++.log
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/zadeck/gcc/gccWide/gcc/testsuite/config/default.exp as 
tool-and-target-specific interface file.


=== g++ Summary ===

# of expected passes            50472
# of expected failures          284
# of unsupported tests          613

runtest completed at Fri Oct  5 19:51:50 2012







diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 299150e..0404605 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -3633,9 +3633,8 @@ expand_debug_locations (void)
 
 	gcc_assert (mode == GET_MODE (val)
 			|| (GET_MODE (val) == VOIDmode
-			    && (CONST_INT_P (val)
+			    && (CONST_SCALAR_INT_P (val)
 || GET_CODE (val) == CONST_FIXED
-|| CONST_DOUBLE_AS_INT_P (val) 
 || GET_CODE (val) == LABEL_REF)));
 	  }
 
diff --git a/gcc/combine.c b/gcc/combine.c
index 4e0a579..b531305 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -2617,16 +2617,19 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p,
  constant.  */
   if (i1 == 0
       && (temp = single_set (i2)) != 0
-      && (CONST_INT_P (SET_SRC (temp))
-	  || CONST_DOUBLE_AS_INT_P (SET_SRC (temp)))
+      && CONST_SCALAR_INT_P (SET_SRC (temp))
       && GET_CODE (PATTERN (i3)) == SET
-      && (CONST_INT_P (SET_SRC (PATTERN (i3)))
-	  || CONST_DOUBLE_AS_INT_P (SET_SRC (PATTERN (i3))))
+      && CONST_SCALAR_INT_P (SET_SRC (PATTERN (i3)))
       && reg_subword_p (SET_DEST (PATTERN (i3)), SET_DEST (temp)))
 {
   rtx dest = SET_DEST (PATTERN (i3));
   int offset = -1;
   int width = 0;
+  
+  /* There are not explicit tests to make sure that this is not a
+	 float, but there is code here that would not be correct if it
+	 was.  */
+  gcc_assert (GET_MODE_CLASS (GET_MODE (SET_SRC (temp))) != MODE_FLOAT);
 
   if (GET_CODE (dest) == ZERO_EXTRACT)
 	{
@@ -5102,8 +5105,7 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
 	  if (GET_CODE (new_rtx) == CLOBBER && XEXP (new_rtx, 0) == const0_rtx)
 		return new_rtx;
 
-	  if (GET_CODE (x) == SUBREG
-	      && (CONST_INT_P (new_rtx) || CONST_DOUBLE_AS_INT_P (new_rtx)))
+	  if (GET_CODE (x) == SUBREG && CONST_SCALAR_INT_P (new_rtx))
 		{
 		  enum machine_mode mode = GET_MODE (x);
 
@@ -7133,7 +7135,7 @@ make_extraction (enum machine_mode mode, rtx inner, HOST_WIDE_INT pos,
   if (mode == tmode)
 	return new_rtx;
 
-  if (CONST_INT_P 

[PATCH] Remove my_rev_post_order_compute

2012-10-08 Thread Richard Guenther

This replaces my_rev_post_order_compute in PRE by the already
existing inverted_post_order_compute, with the necessary adjustments.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-10-08  Richard Guenther  rguent...@suse.de

* tree-ssa-pre.c (postorder_num): New global.
(compute_antic): Initialize all blocks and adjust for
generic postorder.
(my_rev_post_order_compute): Remove.
(init_pre): Use inverted_post_order_compute.
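
As background for the ordering used here, the following is a minimal sketch of computing a post-order on the inverted CFG, i.e. a depth-first walk from the exit block along predecessor edges.  It is not GCC's inverted_post_order_compute: the struct cfg representation and the function names are hypothetical, recursion replaces the explicit stack, and blocks that cannot reach the exit (which the real function has to deal with) are simply not visited.

```c
#include <assert.h>

#define MAX_BLOCKS 8
#define MAX_PREDS 4

/* Toy CFG: for each block, the list of its predecessor blocks.  */
struct cfg {
  int n_blocks;
  int n_preds[MAX_BLOCKS];
  int preds[MAX_BLOCKS][MAX_PREDS];
};

/* Depth-first walk along predecessor edges; a block is emitted after
   everything reachable backwards from it has been emitted.  */
static void visit(const struct cfg *g, int bb, int *visited,
                  int *order, int *num)
{
  int i;

  visited[bb] = 1;
  for (i = 0; i < g->n_preds[bb]; i++)
    {
      int p = g->preds[bb][i];
      if (!visited[p])
        visit(g, p, visited, order, num);
    }
  order[(*num)++] = bb;
}

/* Fill ORDER with a post-order of the inverted CFG starting at EXIT_BB
   and return the number of blocks recorded.  */
static int inverted_postorder(const struct cfg *g, int exit_bb, int *order)
{
  int visited[MAX_BLOCKS] = {0};
  int num = 0;

  assert(g->n_blocks <= MAX_BLOCKS);
  visit(g, exit_bb, visited, order, &num);
  return num;
}
```

Walking the resulting array from the last element down to the first visits the exit block first, which is the direction a backward problem such as ANTIC wants; the returned count plays the role of postorder_num in the loops below.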

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 192119)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -430,6 +430,7 @@ typedef struct bb_bitmap_sets
 
 /* Basic block list in postorder.  */
 static int *postorder;
+static int postorder_num;
 
 /* This structure is used to keep track of statistics on what
optimization PRE was able to perform.  */
@@ -2456,7 +2457,7 @@ compute_antic (void)
   has_abnormal_preds = sbitmap_alloc (last_basic_block);
   sbitmap_zero (has_abnormal_preds);
 
-  FOR_EACH_BB (block)
+  FOR_ALL_BB (block)
 {
   edge_iterator ei;
   edge e;
@@ -2480,9 +2481,7 @@ compute_antic (void)
 }
 
   /* At the exit block we anticipate nothing.  */
-  ANTIC_IN (EXIT_BLOCK_PTR) = bitmap_set_new ();
   BB_VISITED (EXIT_BLOCK_PTR) = 1;
-  PA_IN (EXIT_BLOCK_PTR) = bitmap_set_new ();
 
   changed_blocks = sbitmap_alloc (last_basic_block + 1);
   sbitmap_ones (changed_blocks);
@@ -2496,7 +2495,7 @@ compute_antic (void)
 for PA ANTIC computation.  */
   num_iterations++;
   changed = false;
-  for (i = n_basic_blocks - NUM_FIXED_BLOCKS - 1; i >= 0; i--)
+  for (i = postorder_num - 1; i >= 0; i--)
{
  if (TEST_BIT (changed_blocks, postorder[i]))
{
@@ -2525,7 +2524,7 @@ compute_antic (void)
	    fprintf (dump_file, "Starting iteration %d\n", num_iterations);
  num_iterations++;
  changed = false;
- for (i = n_basic_blocks - NUM_FIXED_BLOCKS - 1 ; i >= 0; i--)
+ for (i = postorder_num - 1 ; i >= 0; i--)
{
  if (TEST_BIT (changed_blocks, postorder[i]))
{
@@ -4593,78 +4592,6 @@ remove_dead_inserted_code (void)
   BITMAP_FREE (worklist);
 }
 
-/* Compute a reverse post-order in *POST_ORDER.  If INCLUDE_ENTRY_EXIT is
-   true, then then ENTRY_BLOCK and EXIT_BLOCK are included.  Returns
-   the number of visited blocks.  */
-
-static int
-my_rev_post_order_compute (int *post_order, bool include_entry_exit)
-{
-  edge_iterator *stack;
-  int sp;
-  int post_order_num = 0;
-  sbitmap visited;
-
-  if (include_entry_exit)
-post_order[post_order_num++] = EXIT_BLOCK;
-
-  /* Allocate stack for back-tracking up CFG.  */
-  stack = XNEWVEC (edge_iterator, n_basic_blocks + 1);
-  sp = 0;
-
-  /* Allocate bitmap to track nodes that have been visited.  */
-  visited = sbitmap_alloc (last_basic_block);
-
-  /* None of the nodes in the CFG have been visited yet.  */
-  sbitmap_zero (visited);
-
-  /* Push the last edge on to the stack.  */
-  stack[sp++] = ei_start (EXIT_BLOCK_PTR->preds);
-
-  while (sp)
-{
-  edge_iterator ei;
-  basic_block src;
-  basic_block dest;
-
-  /* Look at the edge on the top of the stack.  */
-  ei = stack[sp - 1];
-  src = ei_edge (ei)->src;
-  dest = ei_edge (ei)->dest;
-
-  /* Check if the edge source has been visited yet.  */
-      if (src != ENTRY_BLOCK_PTR && ! TEST_BIT (visited, src->index))
-        {
-          /* Mark that we have visited the destination.  */
-          SET_BIT (visited, src->index);
-
-          if (EDGE_COUNT (src->preds) > 0)
-            /* Since the SRC node has been visited for the first
-               time, check its predecessors.  */
-            stack[sp++] = ei_start (src->preds);
-          else
-            post_order[post_order_num++] = src->index;
-        }
-      else
-        {
-          if (ei_one_before_end_p (ei) && dest != EXIT_BLOCK_PTR)
-            post_order[post_order_num++] = dest->index;
-
-          if (!ei_one_before_end_p (ei))
-            ei_next (&stack[sp - 1]);
-          else
-            sp--;
-        }
-}
-
-  if (include_entry_exit)
-post_order[post_order_num++] = ENTRY_BLOCK;
-
-  free (stack);
-  sbitmap_free (visited);
-  return post_order_num;
-}
-
 
 /* Initialize data structures used by PRE.  */
 
@@ -4686,9 +4613,8 @@ init_pre (void)
   connect_infinite_loops_to_exit ();
   memset (&pre_stats, 0, sizeof (pre_stats));
 
-
-  postorder = XNEWVEC (int, n_basic_blocks - NUM_FIXED_BLOCKS);
-  my_rev_post_order_compute (postorder, false);
+  postorder = XNEWVEC (int, n_basic_blocks);
+  postorder_num = inverted_post_order_compute (postorder);
 
   alloc_aux_for_blocks (sizeof (struct bb_bitmap_sets));
 


[PATCH] Fix PR54825

2012-10-08 Thread Richard Guenther

This fixes PR54825, properly FRE/PRE vector BIT_FIELD_REFs.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-10-08  Richard Guenther  rguent...@suse.de

PR tree-optimization/54825
* tree-ssa-sccvn.c (vn_nary_length_from_stmt): Handle BIT_FIELD_REF.
(init_vn_nary_op_from_stmt): Likewise.
* tree-ssa-pre.c (compute_avail): Use vn_nary_op_lookup_stmt.
* tree-ssa-sccvn.h (sizeof_vn_nary_op): Avoid overflow.
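
One plausible reading of the "Avoid overflow" entry, sketched with fixed-width types so that it behaves the same on any host: with a length of zero, computing (length - 1) in 32-bit unsigned arithmetic wraps before the multiplication is widened, whereas subtracting one slot at the end stays in the wider type.  The constants HEADER_SIZE and SLOT_SIZE are hypothetical stand-ins for sizeof (struct vn_nary_op_s) and sizeof (tree), and uint64_t models a 64-bit size_t; none of this is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define HEADER_SIZE UINT64_C(40) /* hypothetical struct size, one slot included */
#define SLOT_SIZE   UINT64_C(8)  /* hypothetical size of one operand slot */

/* Old form: the subtraction happens in 32 bits and wraps for length 0.  */
static uint64_t size_old(uint32_t length)
{
  return HEADER_SIZE + SLOT_SIZE * (uint32_t) (length - 1);
}

/* New form: multiply first, subtract one slot in the wide type.  */
static uint64_t size_new(uint32_t length)
{
  return HEADER_SIZE + SLOT_SIZE * length - SLOT_SIZE;
}
```

For every non-zero length the two forms agree; they differ only for length 0, where the old form yields a value of several gigabytes.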

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 192120)
--- gcc/tree-ssa-sccvn.c(working copy)
*** vn_nary_length_from_stmt (gimple stmt)
*** 2194,2199 
--- 2194,2202 
  case VIEW_CONVERT_EXPR:
return 1;
  
+ case BIT_FIELD_REF:
+   return 3;
+ 
  case CONSTRUCTOR:
return CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt));
  
*** init_vn_nary_op_from_stmt (vn_nary_op_t
*** 2220,2225 
--- 2223,2235 
        vno->op[0] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
break;
  
+ case BIT_FIELD_REF:
+       vno->length = 3;
+       vno->op[0] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
+       vno->op[1] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 1);
+       vno->op[2] = TREE_OPERAND (gimple_assign_rhs1 (stmt), 2);
+   break;
+ 
  case CONSTRUCTOR:
        vno->length = CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt));
        for (i = 0; i < vno->length; ++i)
*** init_vn_nary_op_from_stmt (vn_nary_op_t
*** 2227,2232 
--- 2237,2243 
break;
  
  default:
+   gcc_checking_assert (!gimple_assign_single_p (stmt));
        vno->length = gimple_num_ops (stmt) - 1;
        for (i = 0; i < vno->length; ++i)
          vno->op[i] = gimple_op (stmt, i + 1);
Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 192120)
--- gcc/tree-ssa-pre.c  (working copy)
*** compute_avail (void)
*** 3850,3860 
  || code == VEC_COND_EXPR)
continue;
  
! vn_nary_op_lookup_pieces (gimple_num_ops (stmt) - 1,
!   code,
!   gimple_expr_type (stmt),
!   gimple_assign_rhs1_ptr (stmt),
!   &nary);
  if (!nary)
continue;
  
--- 3850,3856 
  || code == VEC_COND_EXPR)
continue;
  
! vn_nary_op_lookup_stmt (stmt, &nary);
  if (!nary)
continue;
  
Index: gcc/tree-ssa-sccvn.h
===
*** gcc/tree-ssa-sccvn.h(revision 192120)
--- gcc/tree-ssa-sccvn.h(working copy)
*** typedef const struct vn_nary_op_s *const
*** 51,57 
  static inline size_t
  sizeof_vn_nary_op (unsigned int length)
  {
!   return sizeof (struct vn_nary_op_s) + sizeof (tree) * (length - 1);
  }
  
  /* Phi nodes in the hashtable consist of their non-VN_TOP phi
--- 51,57 
  static inline size_t
  sizeof_vn_nary_op (unsigned int length)
  {
!   return sizeof (struct vn_nary_op_s) + sizeof (tree) * length - sizeof (tree);
  }
  
  /* Phi nodes in the hashtable consist of their non-VN_TOP phi


Re: patch to fix constant math - third small patch

2012-10-08 Thread Richard Guenther
On Mon, Oct 8, 2012 at 1:36 PM, Kenneth Zadeck zad...@naturalbridge.com wrote:
 yes, my bad.   here it is with the patches.

Just for the record, ok!

Thanks,
Richard.

 On 10/06/2012 11:55 AM, Kenneth Zadeck wrote:

 This is the third patch in the series of patches to fix constant math.
 this one changes some predicates at the rtl level to use the new predicate
 CONST_SCALAR_INT_P.
 I did not include a few that were tightly intertwined with other changes.

 Not all of these changes are strictly mechanical.   Richard, when
 reviewing this, had me make additional changes to remove what he thought were
 latent bugs at the rtl level.   However, it appears that the bugs were not
 latent.  I do not know what is going on here but I am smart enough to not
 look a gift horse in the mouth.

 All of this was done on the same machine with no changes and identical
 configs.  It is an x86-64 with ubuntu 12-4.

 ok for commit?

 in the logs below, gbBaseline is a trunk from friday and the gbWide is the
 same revision but with my patches.  Some of this like gfortran.dg/pr32627 is
 obviously flutter, but the rest does not appear to be.

 =
 heracles:~/gcc(13) gccBaseline/contrib/compare_tests
 gbBaseline/gcc/testsuite/gcc/gcc.log gbWide/gcc/testsuite/gcc/gcc.log
 New tests that PASS:

 gcc.dg/builtins-85.c scan-assembler mysnprintf
 gcc.dg/builtins-85.c scan-assembler-not __chk_fail
 gcc.dg/builtins-85.c (test for excess errors)


 heracles:~/gcc(14) gccBaseline/contrib/compare_tests
 gbBaseline/gcc/testsuite/gfortran/gfortran.log
 gbWide/gcc/testsuite/gfortran/gfortran.log
 New tests that PASS:

 gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer -funroll-loops (test for
 excess errors)
 gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer  (test for excess
 errors)
 gfortran.dg/pr32627.f03  -Os  (test for excess errors)
 gfortran.dg/pr32635.f  -O0  execution test
 gfortran.dg/pr32635.f  -O0  (test for excess errors)
 gfortran.dg/substr_6.f90  -O2  (test for excess errors)

 Old tests that passed, that have disappeared: (Eeek!)

 gfortran.dg/pr32627.f03  -O1  (test for excess errors)
 gfortran.dg/pr32627.f03  -O3 -fomit-frame-pointer -funroll-all-loops
 -finline-functions  (test for excess errors)
 gfortran.dg/pr32627.f03  -O3 -g  (test for excess errors)
 gfortran.dg/substring_equivalence.f90  -O  (test for excess errors)
 Using /home/zadeck/gcc/gccBaseline/gcc/testsuite/config/default.exp as
 tool-and-target-specific interface file.

 === g++ Summary ===

 # of expected passes            49793
 # of expected failures          284
 # of unsupported tests          601

 runtest completed at Fri Oct  5 16:10:20 2012
 heracles:~/gcc(16) tail gbWide/gcc/testsuite/g++/g++.log Using
 /usr/share/dejagnu/config/unix.exp as generic interface file for target.
 Using /home/zadeck/gcc/gccWide/gcc/testsuite/config/default.exp as
 tool-and-target-specific interface file.

 === g++ Summary ===

 # of expected passes            50472
 # of expected failures          284
 # of unsupported tests          613

 runtest completed at Fri Oct  5 19:51:50 2012








Re: [PATCH] PR 53528 c++/ C++11 Generalized Attribute support

2012-10-08 Thread Dodji Seketeli
Jason Merrill ja...@redhat.com writes:

 OK.

Thanks.  Committed to trunk at revision r192199.

-- 
Dodji


Re: RFA: darwin PATCH to fix build, internal visibility

2012-10-08 Thread Dominique Dhumieres
  It appears that the patch should also special case the scan-assembler 
  .internal.*Foo.methodEv
  tests in g++.dg/ext/visibility/pragma-override1.C and 
  g++.dg/ext/visibility/pragma-override2.C
  on darwin as well...

 Done, thanks.

Jason,

These tests are still failing on darwin. I think that
target { ! *-*-solaris2* } { ! *-*-darwin* }
should be replaced with
target { ! { *-*-solaris2* *-*-darwin* } }

TIA

Dominique


Re: [RFC] Implement load sinking in loops

2012-10-08 Thread Richard Guenther
On Mon, Oct 8, 2012 at 12:38 PM, Eric Botcazou ebotca...@adacore.com wrote:
 Hi,

 we recently noticed that, even at -O3, the compiler doesn't figure out that
 the following loop is dumb:

 #define SIZE 64

 int foo (int v[])
 {
   int r, i;

   for (i = 0; i < SIZE; i++)
     r = v[i];

   return r;
 }

 which was a bit of a surprise.  On second thoughts, this isn't entirely
 unexpected, as it probably matters only for (slightly) pathological cases.
 The attached patch nevertheless implements a form of load sinking in loops so
 as to optimize these cases.  It's combined with invariant motion to optimize:

 int foo (int v[], int a)
 {
   int r, i;

   for (i = 0; i < SIZE; i++)
     r = v[i] + a;

   return r;
 }

 and with store sinking to optimize:

 int foo (int v1[], int v2[])
 {
   int r[SIZE];
   int i, j;

   for (j = 0; j < SIZE; j++)
     for (i = 0; i < SIZE; i++)
       r[j] = v1[j] + v2[i];

   return r[SIZE - 1];
 }

 The optimization is enabled at -O2 in the patch for measurement purposes but,
 given how rarely it triggers (e.g. exactly 10 occurrences in a GCC bootstrap,
 compiler-only, all languages except Go), it's probably best suited to -O3.
 Or perhaps we don't care and it should simply be dropped...  Thoughts?
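
To make the intended effect concrete, here is a source-level sketch of the second example before and after the transformation. It assumes SIZE is a positive constant, so the loop body always runs at least once; foo_loop and foo_sunk are illustrative names and this is not output from the patch.

```c
#define SIZE 64

/* Second example from the message: every iteration overwrites r,
   so only the value from the last iteration is ever observed.  */
int
foo_loop (int v[], int a)
{
  int r, i;

  for (i = 0; i < SIZE; i++)
    r = v[i] + a;

  return r;
}

/* What sinking the load and the invariant addition out of the loop
   would ideally leave behind: one load from the final index.  */
int
foo_sunk (int v[], int a)
{
  return v[SIZE - 1] + a;
}
```

Both functions return the same value for any array of at least SIZE elements; the first example in the message is the special case without the addition.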

Incidentally we have scev-const-prop to deal with the similar case of
scalar computations.  But I realize this doesn't work for expressions that
are dependent on a loop variant load.

@@ -103,6 +103,7 @@ typedef struct mem_ref_loc
 {
   tree *ref;   /* The reference itself.  */
   gimple stmt; /* The statement in that it occurs.  */
+  tree lhs;/* The (ultimate) LHS for a load.  */
 } *mem_ref_loc_p;

isn't that the lhs of stmt?

+static gimple_seq
+copy_load_and_single_use_chain (mem_ref_loc_p loc, tree *new_lhs)
+{
+  tree mem = *loc->ref;
+  tree lhs, tmp_var, ssa_name;
+  gimple_seq seq = NULL;
+  gimple stmt;
+  unsigned n = 0;
+
+  /* First copy the load and create the new LHS for it.  */
+  lhs = gimple_assign_lhs (loc->stmt);
+  tmp_var = create_tmp_reg (TREE_TYPE (lhs), get_lsm_tmp_name (mem, n++));

use make_temp_ssa_name or simply copy_ssa_name (not sure you need
fancy names here).

+  if (gimple_assign_rhs1 (use_stmt) == lhs)
+   {
+ op1 = ssa_name;
+ op2 = gimple_assign_rhs2 (use_stmt);
+   }
+  else
+   {
+ op1 = gimple_assign_rhs1 (use_stmt);
+ op2 = ssa_name;
+   }

this may enlarge lifetime of the other operand?  And it looks like it would
break with unary stmts (accessing out-of-bounds op2).  Also for
is_gimple_min_invariant other operand which may be for example a.b
you need to unshare_expr it.

+  lhs = gimple_assign_lhs (use_stmt);
+  tmp_var = create_tmp_reg (TREE_TYPE (lhs), get_lsm_tmp_name (mem, n++));
+  stmt = gimple_build_assign_with_ops (rhs_code, tmp_var, op1, op2);
+  ssa_name = make_ssa_name (tmp_var, stmt);
+  gimple_assign_set_lhs (stmt, ssa_name);

see above.  This can now be simplified to

   lhs = gimple_assign_lhs (use_stmt);
   ssa_name = copy_ssa_name (lhs, NULL);
   stmt = gimple_build_assign_with_ops (rhs_code, ssa_name, op1, op2);

Btw - isn't this all a bit backward (I mean the analysis in execute_lm)?
What you want is to apply this transform to as many of the _DEF_s of
the loop-closed PHI nodes as possible - only values used outside of the loop
are interesting.  That's (sort-of) what SCEV const-prop does (well, it also
uses SCEV to compute the overall effect of the iterations).  So what
you want to know is whether, when walking the DEF chain of the
loop-closed PHI, you end up at definitions before the loop or at
definitions that are not otherwise used inside the loop.

Which means it is really expression sinking.  Does tree-ssa-sink manage
to sink anything out of a loop?  Even scalar computation parts I mean?  For

 for (..)
   {
 a = x[i];
 y[i] = a;
 b = a * 2;
   }
  ... = b;

it should be able to sink b = a*2.
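
A source-level sketch of that sinking, under the assumption that b is only read after the loop (the function names, the zero initializers and the explicit trip count n are illustrative additions, not part of the example above):

```c
/* Original shape: b is recomputed on every iteration but is only
   read after the loop.  */
int
scale_last_loop (const int *x, int *y, int n)
{
  int a, b = 0, i;

  for (i = 0; i < n; i++)
    {
      a = x[i];
      y[i] = a;
      b = a * 2;
    }
  return b;
}

/* After sinking: the multiplication runs once, on the value of a
   left behind by the last iteration.  The load stays in the loop
   because the store to y[i] still needs it.  */
int
scale_last_sunk (const int *x, int *y, int n)
{
  int a = 0, i;

  for (i = 0; i < n; i++)
    {
      a = x[i];
      y[i] = a;
    }
  return a * 2;
}
```

With both variables initialized to zero the two versions also agree when the loop does not execute at all.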

So I think the more natural place to implement this is either SCEV cprop
or tree-ssa-sink.c.  And process things from the loop-closed PHI use
walking the DEFs (first process all, marking interesting things to also
catch commonly used exprs for two PHI uses).

Again you might simply want to open a bugreport for this unless you
want to implement it yourself.

Thanks,
Richard.

 Tested on x86_64-suse-linux.


 2012-10-08  Eric Botcazou  ebotca...@adacore.com

 * gimple.h (gsi_insert_seq_on_edge_before): Declare.
 * gimple-iterator.c (gsi_insert_seq_on_edge_before): New function.
 * tree-ssa-loop-im.c (struct mem_ref_loc): Add LHS field.
 (mem_ref_in_stmt): Remove gcc_assert.
 (copy_load_and_single_use_chain): New function.
 (execute_lm): Likewise.
 (hoist_memory_references): Hoist the loads after the stores.
 (ref_always_accessed_p): Rename into...
 (ref_always_stored_p): ...this.  Remove STORE_P and add ONCE_P.
 (can_lsm_ref_p): New 

Re: [RFC] Implement load sinking in loops

2012-10-08 Thread Richard Guenther
On Mon, Oct 8, 2012 at 2:32 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Mon, Oct 8, 2012 at 12:38 PM, Eric Botcazou ebotca...@adacore.com wrote:
 Hi,

 we recently noticed that, even at -O3, the compiler doesn't figure out that
 the following loop is dumb:

 #define SIZE 64

 int foo (int v[])
 {
   int r, i;

   for (i = 0; i < SIZE; i++)
     r = v[i];

   return r;
 }

 which was a bit of a surprise.  On second thoughts, this isn't entirely
 unexpected, as it probably matters only for (slightly) pathological cases.
 The attached patch nevertheless implements a form of load sinking in loops so
 as to optimize these cases.  It's combined with invariant motion to optimize:

 int foo (int v[], int a)
 {
   int r, i;

   for (i = 0; i < SIZE; i++)
     r = v[i] + a;

   return r;
 }

 and with store sinking to optimize:

 int foo (int v1[], int v2[])
 {
   int r[SIZE];
   int i, j;

   for (j = 0; j < SIZE; j++)
     for (i = 0; i < SIZE; i++)
       r[j] = v1[j] + v2[i];

   return r[SIZE - 1];
 }

 The optimization is enabled at -O2 in the patch for measurement purposes but,
 given how rarely it triggers (e.g. exactly 10 occurrences in a GCC bootstrap,
 compiler-only, all languages except Go), it's probably best suited to -O3.
 Or perhaps we don't care and it should simply be dropped...  Thoughts?

 Incidentally we have scev-const-prop to deal with the similar case of
 scalar computations.  But I realize this doesn't work for expressions that
 are dependent on a loop variant load.

 @@ -103,6 +103,7 @@ typedef struct mem_ref_loc
  {
tree *ref;   /* The reference itself.  */
gimple stmt; /* The statement in that it occurs.  */
 +  tree lhs;/* The (ultimate) LHS for a load.  */
  } *mem_ref_loc_p;

 isn't that the lhs of stmt?

 +static gimple_seq
 +copy_load_and_single_use_chain (mem_ref_loc_p loc, tree *new_lhs)
 +{
 +  tree mem = *loc->ref;
 +  tree lhs, tmp_var, ssa_name;
 +  gimple_seq seq = NULL;
 +  gimple stmt;
 +  unsigned n = 0;
 +
 +  /* First copy the load and create the new LHS for it.  */
 +  lhs = gimple_assign_lhs (loc->stmt);
 +  tmp_var = create_tmp_reg (TREE_TYPE (lhs), get_lsm_tmp_name (mem, n++));

 use make_temp_ssa_name or simply copy_ssa_name (not sure you need
 fancy names here).

 +  if (gimple_assign_rhs1 (use_stmt) == lhs)
 +   {
 + op1 = ssa_name;
 + op2 = gimple_assign_rhs2 (use_stmt);
 +   }
 +  else
 +   {
 + op1 = gimple_assign_rhs1 (use_stmt);
 + op2 = ssa_name;
 +   }

 this may enlarge lifetime of the other operand?  And it looks like it would
 break with unary stmts (accessing out-of-bounds op2).  Also for
 is_gimple_min_invariant other operand which may be for example a.b
 you need to unshare_expr it.

 +  lhs = gimple_assign_lhs (use_stmt);
 +  tmp_var = create_tmp_reg (TREE_TYPE (lhs), get_lsm_tmp_name (mem, n++));
 +  stmt = gimple_build_assign_with_ops (rhs_code, tmp_var, op1, op2);
 +  ssa_name = make_ssa_name (tmp_var, stmt);
 +  gimple_assign_set_lhs (stmt, ssa_name);

 see above.  This can now be simplified to

lhs = gimple_assign_lhs (use_stmt);
ssa_name = copy_ssa_name (lhs, NULL);
stmt = gimple_build_assign_with_ops (rhs_code, ssa_name, op1, op2);

 Btw - isn't this all a bit backward (I mean the analysis in execute_lm)?
 What you want is to apply this transform to as many of the _DEF_s of
 the loop-closed PHI nodes as possible - only values used outside of the loop
 are interesting.  That's (sort-of) what SCEV const-prop does (well, it also
 uses SCEV to compute the overall effect of the iterations).  So what
 you want to know is whether, when walking the DEF chain of the
 loop-closed PHI, you end up at definitions before the loop or at
 definitions that are not otherwise used inside the loop.

 Which means it is really expression sinking.  Does tree-ssa-sink manage
 to sink anything out of a loop?  Even scalar computation parts I mean?  For

  for (..)
{
  a = x[i];
  y[i] = a;
  b = a * 2;
}
   ... = b;

 it should be able to sink b = a*2.

 So I think the more natural place to implement this is either SCEV cprop
 or tree-ssa-sink.c.  And process things from the loop-closed PHI use
 walking the DEFs (first process all, marking interesting things to also
 catch commonly used exprs for two PHI uses).

 Again you might simply want to open a bugreport for this unless you
 want to implement it yourself.

We indeed sink 2*tem but not a[i] here.  Because tree-ssa-sink.c doesn't
sink loads (IIRC) at all, but I've seen patches to fix that (IIRC).

int a[256];
int foo (int x)
{
  int i, k = 0;
  for (i = 0; i < x; ++i)
    {
      int tem = a[i];
      k = 2*tem;
    }
  return k;
}
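
As a source-level illustration of the difference, assuming the behaviour described above (the multiplication is sunk, the load is not), the two outcomes would look roughly as follows. The function names are made up for the sketch and neither body is actual compiler output.

```c
int a[256];

/* Multiplication sunk, load kept: the loop still reads a[i] on every
   iteration, but 2*tem is computed once after the loop.  */
int
foo_mul_sunk (int x)
{
  int i, tem = 0;

  for (i = 0; i < x; ++i)
    tem = a[i];

  return 2 * tem;
}

/* Load sunk as well: only the last element is read, guarded by the
   case where the loop would not have executed.  */
int
foo_load_sunk (int x)
{
  return x > 0 ? 2 * a[x - 1] : 0;
}
```

The guard in the second version stands in for the loop-entry condition that a real transformation would have to preserve.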

Richard.

 Thanks,
 Richard.

 Tested on x86_64-suse-linux.


 2012-10-08  Eric Botcazou  ebotca...@adacore.com

 * gimple.h (gsi_insert_seq_on_edge_before): Declare.
 * gimple-iterator.c 

gcc/lto/lto.c: Free lto_file struct after closing the file

2012-10-08 Thread Tobias Burnus

lto_obj_file_open allocates:
  lo = XCNEW (struct lto_simple_object);
However, the data is never freed - neither explicitly nor in 
lto_obj_file_close.


In the attached patch, I free the memory now after the call to 
lto_obj_file_close.


Built and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias


patch.diff
Description: application/unknown
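
Since the attachment is not reproduced here, the following is only a sketch of the ownership pattern described in the message: the open routine allocates a zeroed struct, the close routine does not release it, so the caller frees it after closing. All names ending in _sketch are hypothetical, and calloc stands in for XCNEW.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the object-file wrapper struct.  */
struct obj_file_sketch
{
  int is_open;
};

/* Allocates and zero-fills the wrapper, as XCNEW would.  */
static struct obj_file_sketch *
obj_file_open_sketch (void)
{
  struct obj_file_sketch *f = calloc (1, sizeof *f);

  if (f)
    f->is_open = 1;
  return f;
}

/* Releases the underlying resource but, as in the report, not the
   wrapper struct itself.  */
static void
obj_file_close_sketch (struct obj_file_sketch *f)
{
  f->is_open = 0;
}

/* Caller pattern from the message: close first, then free the
   struct.  Returns 0 on success, -1 if allocation failed.  */
static int
process_one_file_sketch (void)
{
  struct obj_file_sketch *f = obj_file_open_sketch ();

  if (!f)
    return -1;
  obj_file_close_sketch (f);
  free (f);
  return 0;
}
```

Freeing in the caller keeps the close routine unchanged; the alternative of freeing inside the close routine would work equally well as long as no caller touches the struct afterwards.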


Re: gcc/lto/lto.c: Free lto_file struct after closing the file

2012-10-08 Thread Richard Guenther
On Mon, Oct 8, 2012 at 2:39 PM, Tobias Burnus bur...@net-b.de wrote:
 lto_obj_file_open allocates:
   lo = XCNEW (struct lto_simple_object);
 However, the data is never freed - neither explicitly nor in
 lto_obj_file_close.

 In the attached patch, I free the memory now after the call to
 lto_obj_file_close.

 Built and regtested on x86-64-gnu-linux.
 OK for the trunk?

Ok.

Thanks,
Richard.

 Tobias


Re: [lra] another patch to speed more compilation of PR54146

2012-10-08 Thread Steven Bosscher
On Mon, Oct 8, 2012 at 1:00 AM, Steven Bosscher stevenb@gmail.com wrote:
 Hello,

 This patch changes the worklist-like bitmap in lra_eliminate() to an
 sbitmap.  Effect on compile time:

I have another patch to also make lra_constraint_insn_stack_bitmap an sbitmap.

Without patch:
log.0: LRA non-specific:  46.94 ( 6%)
log.0: LRA virtuals elimination:  51.56 ( 6%)
log.0: LRA reload inheritance  :   0.03 ( 0%)
log.0: LRA create live ranges  :  46.67 ( 6%)
log.0: LRA hard reg assignment :   0.55 ( 0%)

With patch:
log.3: LRA non-specific:  18.14 ( 2%)
log.3: LRA virtuals elimination:   8.04 ( 1%)
log.3: LRA reload inheritance  :   0.03 ( 0%)
log.3: LRA create live ranges  :  45.01 ( 6%)
log.3: LRA hard reg assignment :   0.63 ( 0%)

I'll go through the usual testing cycle again with my patch set and
post the final patch here for review today or tomorrow.

At this point I think it's clear that we can speed up LRA even on
crazy-large test cases, so I would not object anymore to a merge into
the trunk at this point.

Ciao!
Steven


Re: [patch] Add option to compute reaching and live definitions

2012-10-08 Thread Paolo Bonzini
Il 07/10/2012 19:18, Steven Bosscher ha scritto:
 Hello,
 
 The attached patch adds a DF changeable flag to compute a subset of
 reaching definitions that are also live at the program points they
 reach. This is an idea I discussed with Paolo many years ago already,
 but until today it hadn't really ever been close to the top of my todo
 list, but trying to compile the test case for PR54146 with -fweb
 finally changed that :-)
 
 The idea is to prune the DF_RD_OUT set of each basic block by
 registers live in DF_LR_OUT. I've implemented this pruning with the
 same approach as the sparse formulation of RD dataflow, expanding the
 regs in DF_LR_OUT to the corresponding set of DEFs and using that set
 to mask out dead DEFs in DF_RD_OUT. This is a convenient formulation
 because DF_LR is already expressed in terms of regnos (like
 sparse_kill & friends), and the formulation also works fine for the
 dense formulation, of course.
 
 The effect on compile time for a set of cc1-i files is negligible (not
 measurable, anyway), but for crazy large test cases like PR54146 this
 patch is the difference between triggering out-of-memory or completing
 the pass (at least -fweb, probably also the other affected passes).
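
The pruning step described above can be sketched with plain bit masks. This is a simplification, not the df-problems.c code: DEF ids and register numbers are assumed to fit in one machine word, and defs_of_reg is a hypothetical table mapping each register to the set of DEFs that write it.

```c
#include <stdint.h>

/* Expand the live-out register set to the DEFs of those registers and
   use the result to mask dead DEFs out of the reaching-definitions
   set.  Assumes nregs <= 32 and at most 64 DEF ids.  */
static uint64_t
prune_rd_out (uint64_t rd_out, uint32_t lr_out,
              const uint64_t defs_of_reg[], unsigned int nregs)
{
  uint64_t live_defs = 0;
  unsigned int r;

  for (r = 0; r < nregs; r++)
    if (lr_out & (UINT32_C (1) << r))
      live_defs |= defs_of_reg[r];

  return rd_out & live_defs;
}
```

A DEF survives only if it reaches the end of the block and its register is live there, which is the subset the message calls reaching and live definitions.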
 
 Bootstrapped&tested on powerpc64-unknown-linux-gnu. OK for trunk?

Ok.

I wonder if we actually need the non-pruned version anywhere...

Paolo

 df_rd_pruned.diff
 
   * bitmap.h (bitmap_and_into): Update prototype.
   * bitmap.c (bitmap_and_into): Return true if the target bitmap
   changed, false otherwise.
 
   * df.h (df_dump_insn_problem_function): New function type.
   (struct df_problem): Add two functions, to dump just before and
   just after an insn.
   (DF_RD_PRUNE_DEAD_DEFS): New changeable flag.
   (df_dump_insn_top, df_dump_insn_bottom): New prototypes.
   * df-core (df_dump_region): Use dump_bb.
   (df_dump_bb_problem_data): New function.
   (df_dump_top, df_dump_bottom): Rewrite using df_dump_bb_problem_data.
   (df_dump_insn_problem_data): New function.
   (df_dump_insn_top, df_dump_insn_bottom): New functions.
   * df-scan.c (problem_SCAN): Add NULL fields for new members.
   * df-problems.c (df_rd_local_compute): Ignore hard registers if
   DF_NO_HARD_REGS is in effect.
   (df_rd_transfer_function): If DF_RD_PRUNE_DEAD_DEFS is in effect,
   prune reaching defs using the LR problem.
   (df_rd_start_dump): Fix dumping of DEFs map.
   (df_rd_dump_defs_set): New function.
   (df_rd_top_dump, df_rd_bottom_dump): Use it.
   (problem_RD): Add NULL fields for new members.
   (problem_LR, problem_LIVE): Likewise.
   (df_chain_bb_dump): New function.
   (df_chain_top_dump): Dump only for artificial DEFs and USEs,
   using df_chain_bb_dump.
   (df_chain_bottom_dump): Likewise.
   (df_chain_insn_top_dump, df_chain_insn_bottom_dump): New functions.
   (problem_CHAIN): Add them as new members.
   (problem_WORD_LR, problem_NOTE): Add NULL fields for new members.
   (problem_MD): Likewise.
   * cfgrtl.c (rtl_dump_bb): Use df_dump_insn_top and df_dump_insn_bottom.
   (print_rtl_with_bb): Likewise.
 
   * dce.c (init_dce): Use DF_RD_PRUNE_DEAD_DEFS.
   * loop-invariant.c (find_defs): Likewise.
   * loop-iv.c (iv_analysis_loop_init): Likewise.
   * ree.c (find_and_remove_re): Likewise.
   * web.c (web_main): Likewise.
 
 Index: bitmap.h
 ===
 --- bitmap.h  (revision 192106)
 +++ bitmap.h  (working copy)
 @@ -224,7 +224,7 @@ extern unsigned long bitmap_count_bits (const_bitm
 are three operand versions that to not destroy the source bitmaps.
  The operations supported are &, & ~, |, ^.  */
  extern void bitmap_and (bitmap, const_bitmap, const_bitmap);
 -extern void bitmap_and_into (bitmap, const_bitmap);
 +extern bool bitmap_and_into (bitmap, const_bitmap);
  extern bool bitmap_and_compl (bitmap, const_bitmap, const_bitmap);
  extern bool bitmap_and_compl_into (bitmap, const_bitmap);
  #define bitmap_compl_and(DST, A, B) bitmap_and_compl (DST, B, A)
 Index: bitmap.c
 ===
 --- bitmap.c  (revision 192106)
 +++ bitmap.c  (working copy)
 @@ -916,17 +916,18 @@ bitmap_and (bitmap dst, const_bitmap a, const_bitm
   dst->indx = dst->current->indx;
  }
  
 -/* A &= B.  */
 +/* A &= B.  Return true if A changed.  */
  
 -void
 +bool
  bitmap_and_into (bitmap a, const_bitmap b)
  {
    bitmap_element *a_elt = a->first;
    const bitmap_element *b_elt = b->first;
    bitmap_element *next;
 +  bool changed = false;
  
    if (a == b)
 -    return;
 +    return false;
  
    while (a_elt && b_elt)
      {
 @@ -935,6 +936,7 @@ bitmap_and_into (bitmap a, const_bitmap b)
     next = a_elt->next;
     bitmap_element_free (a, a_elt);
     a_elt = next;
 +   changed = true;
   }
       else if (b_elt->indx < a_elt->indx)
   b_elt = 
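
The quoted diff is cut off here. The idea of an in-place AND that reports whether the destination changed can be sketched on flat word arrays; this deliberately ignores GCC's linked-list bitmap representation, and words_and_into is a made-up name.

```c
#include <stdbool.h>
#include <stddef.h>

/* A &= B over n words.  Returns true if any word of A changed, which
   lets a dataflow solver tell whether another iteration is needed.  */
static bool
words_and_into (unsigned long *a, const unsigned long *b, size_t n)
{
  bool changed = false;
  size_t i;

  for (i = 0; i < n; i++)
    {
      unsigned long r = a[i] & b[i];

      if (r != a[i])
        changed = true;
      a[i] = r;
    }
  return changed;
}
```

Calling it a second time with the same arguments returns false, since an AND with the same mask is idempotent.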

[Patch] Fix PR52945

2012-10-08 Thread Dominique Dhumieres
The following patch fixes PR52945 on Darwin. It has been approved
by Jan Hubicka in PR52945#c5. Since I don't have write permission,
could someone commit it for me?

TIA

Dominique

2012-10-08  Dominique d'Humieres  domi...@lps.ens.fr

PR gcc/52945
* testsuite/gcc.dg/lto/pr52634_0.c: skip the test on Darwin.

--- /opt/gcc/_clean/gcc/testsuite/gcc.dg/lto/pr52634_0.c	2012-04-10 08:58:02.0 +0200
+++ /opt/gcc/work/gcc/testsuite/gcc.dg/lto/pr52634_0.c	2012-06-19 15:09:29.0 +0200
@@ -1,3 +1,5 @@
+/* { dg-require-weak  } */
+/* { dg-require-alias  } */
 /* { dg-lto-do link } */
 /* { dg-lto-options {{-flto -r -nostdlib -flto-partition=1to1}} */
 extern int cfliteValueCallBacks;


Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure

2012-10-08 Thread Pavel Chupin
On Android NDK libstdc++ is configured, built and packaged separately.
The problem is not dependency on libgcc sources but rather dependency
on the symlink which is generated during libgcc build and cannot be
found if libstdc++ is configured and built separately.
It was working fine for 4.4 and 4.6. This issue has been introduced in 4.7.

Do you think libstdc++ should not be built separately?

2012/10/6 Andrew Pinski pins...@gmail.com:
 On Fri, Oct 5, 2012 at 12:13 PM, Andrew Pinski pins...@gmail.com wrote:
 On Fri, Oct 5, 2012 at 7:32 AM, Pavel Chupin pavel.v.chu...@gmail.com wrote:
 I can't configure libstdc++ separately. To reproduce:

 mkdir BUILD
 cd BUILD
 ../libstdc++-v3/configure

 Error:
 make: *** No rule to make target
 `/users/pvchupin/android/toolchain/gcc/gcc-4.8/BUILD/../libgcc/gthr-default.h',
 needed by `bits/gthr-default.h'.  Stop.

 See fix attached.

 Ok for trunk and 4.7?

 Why do you want to compile libstdc++ separately from GCC?  I think you
 need to explain why you want to do that.  In fact, that libstdc++ depends
 on libgcc internals is not a bug but rather a feature.

 One more thing is that for cases where target==host!=build, you can
 just use the libraries which are produced by the cross compiler and
 use make all-host and make install-host for the programs.

 This should simplify how Yocto builds the native GCC, without
 worrying about building libstdc++ separately.

 Thanks,
 Andrew Pinski


 Thanks,
 Andrew Pinski




 2012-10-05  Pavel Chupin  pavel.v.chu...@intel.com

 Fix missing gthr-default.h issue on separate libstdc++ configure
 * libstdc++-v3/acinclude.m4: Define glibcxx_thread_h.
 * libstdc++-v3/include/Makefile.am: Use glibcxx_thread_h.
 * libstdc++-v3/Makefile.in: Regenerate.
 * libstdc++-v3/configure: Regenerate.
 * libstdc++-v3/doc/Makefile.in: Regenerate.
 * libstdc++-v3/include/Makefile.in: Regenerate.
 * libstdc++-v3/libsupc++/Makefile.in: Regenerate.
 * libstdc++-v3/po/Makefile.in: Regenerate.
 * libstdc++-v3/python/Makefile.in: Regenerate.
 * libstdc++-v3/src/Makefile.in: Regenerate.
 * libstdc++-v3/src/c++11/Makefile.in: Regenerate.
 * libstdc++-v3/src/c++98/Makefile.in: Regenerate.
 * libstdc++-v3/testsuite/Makefile.in: Regenerate.

 --
 Pavel Chupin
 Intel Corporation



-- 
Pavel Chupin
Software Engineer
Intel Corporation


Re: [PATCH] Improve debug info for partial inlining (PR debug/54519, take 2)

2012-10-08 Thread H.J. Lu
On Fri, Oct 5, 2012 at 7:19 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Oct 05, 2012 at 03:59:55PM +0200, Richard Guenther wrote:
 I don't think we want to rely on that ... so just keep the push/pop_cfun.

 Ok, so this is what I'm retesting (basically just comments added and the two
 lines (subcode and set) swapped):

 2012-10-05  Jakub Jelinek  ja...@redhat.com

 PR debug/54519
 * ipa-split.c (split_function): Add debug args and
 debug source and normal stmts for args_to_skip which are
 gimple regs.
 * tree-inline.c (copy_debug_stmt): When inlining, adjust
 source debug bind stmts to debug binds of corresponding
 DEBUG_EXPR_DECL.


This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54837

-- 
H.J.


Re: [C++ Patch/RFC] PR 54194

2012-10-08 Thread Paolo Carlini

On 10/08/2012 03:57 PM, Jason Merrill wrote:
This is definitely an improvement, though for warnings about issues 
with the left or right argument, we could use the EXPR_LOCATION of the 
problematic argument rather than the location of the new operand.

I agree. Let me see if I can figure out something straightforward enough.

Thanks!
Paolo.


Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure

2012-10-08 Thread Paolo Carlini

On 10/08/2012 03:43 PM, Pavel Chupin wrote:

This issue has been introduced in 4.7.
Irrespective of what we are eventually going to do from a practical 
point of view, I think it would be important to understand when/what 
introduced the issue: did you analyze that in any detail?


Thanks,
Paolo.


Re: [patch] Add option to compute reaching and live definitions

2012-10-08 Thread Steven Bosscher
On Mon, Oct 8, 2012 at 3:27 PM, Paolo Bonzini wrote:
 I wonder if we actually need the non-pruned version anywhere...

I don't think so, but I'm not sure. Only ddg.c and loop-iv.c access
the DF_RD results directly (i.e. not via DU/UD chains). For loop-iv
the pruned version is fine. For ddg I didn't feel comfortable enough
with that code to perform the changes there as well.

Ciao!
Steven


Re: Fixup INTEGER_CST

2012-10-08 Thread Jan Hubicka
2) As we query the type_hash while we are rewriting the types,
   we run into instability of the hashtable. This manifests itself
   as an ICE when one adds a sanity check that while merging function
   types their arg types are equivalent, too.
   This ICEs compiling e.g. sqlite but I did not really manage to
   reduce this.  I tracked it down to the argument type being inserted
   into gimple_type_hash but at the time we query the new argument type,
   the original is no longer found despite their hashes being equivalent.
   The problem is hidden when things fit into the leader cache,
   so one needs rather big testcase.
 
 Ugh.  For reduction you can disable those caches though.  The above
 means there is a disconnect between hashing and comparing.
 Maybe it's something weird with the early out
 
   if (TYPE_ARG_TYPES (t1) == TYPE_ARG_TYPES (t2))
 goto same_types;
 ?

I filed http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54856; sadly the
testcase I reduced with yesterday's tree does not reproduce with today's tree
on a different machine.  Perhaps it is a hash table conflict with GGC or
something like that.

sqlite seems big enough to trigger the bug quite reproducibly. On current
mainline, however, I need to disable the leader cache (that was not true on
the weekend on the other machine ;)

Honza


Re: [i386] recognize haddpd

2012-10-08 Thread Marc Glisse

On Fri, 28 Sep 2012, Uros Bizjak wrote:


2) {v[0]-v[1], v[0]-v[1]} is not recognized as a hsubpd because
vec_duplicate doesn't match vec_concat. Do we really need to duplicate (no
pun intended) the pattern?


You can add this transformation to simplify-rtx.c. Probably vec_concat
with two equal operands can be canonicalized as vec_duplicate.


Actually, it is replacing vec_duplicate with vec_concat that would help. 
Well, I'll see about that later.


Here is what I came up with, trying to follow your other advice (thanks a 
lot!).


Passes bootstrap+testsuite.

2012-10-08  Marc Glisse  marc.gli...@inria.fr

gcc/
PR target/54400
* config/i386/i386.md (type attribute): Add sseadd1.
(unit attribute): Add support for sseadd1.
* config/i386/sse.md (sse3_hplusminus_insnv2df3): split into...
(sse3_haddv2df3): ... expander.
(*sse3_haddv2df3): ... define_insn. Accept permuted operands.
(sse3_hsubv2df3): ... define_insn.
(*sse3_haddv2df3_low): New define_insn.
(*sse3_hsubv2df3_low): New define_insn.

gcc/testsuite/
PR target/54400
* gcc.target/i386/pr54400.c: New testcase.

--
Marc Glisse
Index: gcc/testsuite/gcc.target/i386/pr54400.c
===
--- gcc/testsuite/gcc.target/i386/pr54400.c (revision 0)
+++ gcc/testsuite/gcc.target/i386/pr54400.c (revision 0)
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse3 -mfpmath=sse" } */
+
+#include <x86intrin.h>
+
+double f (__m128d p)
+{
+  return p[0] - p[1];
+}
+
+double g1 (__m128d p)
+{
+  return p[0] + p[1];
+}
+
+double g2 (__m128d p)
+{
+  return p[1] + p[0];
+}
+
+__m128d h (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] - p[1], q[0] - q[1] };
+  return r;
+}
+
+__m128d i1 (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] + p[1], q[0] + q[1] };
+  return r;
+}
+
+__m128d i2 (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] + p[1], q[1] + q[0] };
+  return r;
+}
+
+__m128d i3 (__m128d p, __m128d q)
+{
+  __m128d r = { p[1] + p[0], q[0] + q[1] };
+  return r;
+}
+
+__m128d i4 (__m128d p, __m128d q)
+{
+  __m128d r = { p[1] + p[0], q[1] + q[0] };
+  return r;
+}
+
+/* { dg-final { scan-assembler-times "hsubpd" 2 } } */
+/* { dg-final { scan-assembler-times "haddpd" 6 } } */
+/* { dg-final { scan-assembler-not "unpck" } } */

Property changes on: gcc/testsuite/gcc.target/i386/pr54400.c
___
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native

Index: gcc/config/i386/i386.md
===
--- gcc/config/i386/i386.md (revision 192206)
+++ gcc/config/i386/i386.md (working copy)
@@ -320,36 +320,36 @@
 ;; provided in other attributes.
 (define_attr type
   other,multi,
alu,alu1,negnot,imov,imovx,lea,
incdec,ishift,ishiftx,ishift1,rotate,rotatex,rotate1,imul,imulx,idiv,
icmp,test,ibr,setcc,icmov,
push,pop,call,callv,leave,
str,bitmanip,
fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
-   sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,ssediv,sseins,
-   ssemuladd,sse4arg,lwp,
+   sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
+   ssediv,sseins,ssemuladd,sse4arg,lwp,
mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft
   (const_string other))
 
 ;; Main data type used by the insn
 (define_attr mode
   unknown,none,QI,HI,SI,DI,TI,OI,SF,DF,XF,TF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF
   (const_string unknown))
 
 ;; The CPU unit operations uses.
 (define_attr unit integer,i387,sse,mmx,unknown
   (cond [(eq_attr type fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint)
   (const_string i387)
 (eq_attr type sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
- sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,
+ sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,
  ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg)
   (const_string sse)
 (eq_attr type mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft)
   (const_string mmx)
 (eq_attr type other)
   (const_string unknown)]
 (const_string integer)))
 
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr length_immediate 
Index: gcc/config/i386/sse.md
===
--- gcc/config/i386/sse.md  (revision 192206)
+++ gcc/config/i386/sse.md  (working copy)
@@ -1209,42 +1209,120 @@
  (vec_select:DF (match_dup 1) (parallel [(const_int 3)])))
(plusminus:DF
  (vec_select:DF (match_dup 2) (parallel [(const_int 2)]))
  (vec_select:DF (match_dup 2) (parallel [(const_int 3)]))]
   TARGET_AVX
   

RFA: PATCH to acinclude.m4 to fix gas version detection

2012-10-08 Thread Jason Merrill

On 10/04/2012 11:40 AM, Jason Merrill wrote:

Recent versions of binutils seem to have started putting ' around the
version number in bfd/configure.in, which was confusing gcc configure.


When this change was made to binutils, the other directories changed to 
using bfd/configure --version to get the version number, so this version 
of my patch uses that instead of changing the regexp.  This patch also 
fixes another issue I noticed with AIX configury.


OK for trunk?

Jason

commit 94d42e379702606ec09b241d54ed7ad72cfaff99
Author: Jason Merrill ja...@redhat.com
Date:   Fri Oct 5 18:59:08 2012 -0400

	* acinclude.m4 (gcc_cv_gas_version): Try bfd/configure --version first.
	* configure.ac (gcc_cv_gld_version): Likewise.
	(gcc_cv_as_aix_ref): Fix typo.
	* configure: Regenerate.

diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4
index c24464b..f7699ea 100644
--- a/gcc/acinclude.m4
+++ b/gcc/acinclude.m4
@@ -389,6 +389,8 @@ dnl # gcc_cv_as_gas_srcdir must be defined before this.
 dnl # This gross requirement will go away eventually.
 AC_DEFUN([_gcc_COMPUTE_GAS_VERSION],
 [gcc_cv_as_bfd_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/bfd
+gcc_cv_gas_version=`$gcc_cv_as_bfd_srcdir/configure --version | sed -n -e '1s,.* ,VERSION=,p'`
+if test x$gcc_cv_gas_version != x; then true; else
 for f in $gcc_cv_as_bfd_srcdir/configure \
  $gcc_cv_as_gas_srcdir/configure \
  $gcc_cv_as_gas_srcdir/configure.in \
@@ -397,7 +399,7 @@ for f in $gcc_cv_as_bfd_srcdir/configure \
   if test x$gcc_cv_gas_version != x; then
 break
   fi
-done
+done; fi
 gcc_cv_gas_major_version=`expr $gcc_cv_gas_version : VERSION=\([[0-9]]*\)`
 gcc_cv_gas_minor_version=`expr $gcc_cv_gas_version : VERSION=[[0-9]]*\.\([[0-9]]*\)`
 gcc_cv_gas_patch_version=`expr $gcc_cv_gas_version : VERSION=[[0-9]]*\.[[0-9]]*\.\([[0-9]]*\)`
diff --git a/gcc/configure b/gcc/configure
index 45bba8e..fe4f3c7 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -21233,6 +21233,8 @@ if test $gcc_cv_as = ../gas/as-new$build_exeext; then
 $as_echo newly built gas 6; }
   in_tree_gas=yes
   gcc_cv_as_bfd_srcdir=`echo $srcdir | sed -e 's,/gcc$,,'`/bfd
+gcc_cv_gas_version=`$gcc_cv_as_bfd_srcdir/configure --version | sed -n -e '1s,.* ,VERSION=,p'`
+if test x$gcc_cv_gas_version != x; then true; else
 for f in $gcc_cv_as_bfd_srcdir/configure \
  $gcc_cv_as_gas_srcdir/configure \
  $gcc_cv_as_gas_srcdir/configure.in \
@@ -21241,7 +21243,7 @@ for f in $gcc_cv_as_bfd_srcdir/configure \
   if test x$gcc_cv_gas_version != x; then
 break
   fi
-done
+done; fi
 gcc_cv_gas_major_version=`expr $gcc_cv_gas_version : VERSION=\([0-9]*\)`
 gcc_cv_gas_minor_version=`expr $gcc_cv_gas_version : VERSION=[0-9]*\.\([0-9]*\)`
 gcc_cv_gas_patch_version=`expr $gcc_cv_gas_version : VERSION=[0-9]*\.[0-9]*\.\([0-9]*\)`
@@ -21393,13 +21395,15 @@ $as_echo newly built ld 6; }
 	elif test $ld_is_gold = yes; then
 	  in_tree_ld_is_elf=yes
 	fi
+	gcc_cv_gld_version=`$gcc_cv_ld_bfd_srcdir/configure --version | sed -n -e '1s,.* ,VERSION=,p'`
+	if test x$gcc_cv_gld_version != x; then true; else
 	for f in $gcc_cv_ld_bfd_srcdir/configure $gcc_cv_ld_gld_srcdir/configure $gcc_cv_ld_gld_srcdir/configure.in $gcc_cv_ld_gld_srcdir/Makefile.in
 	do
 		gcc_cv_gld_version=`sed -n -e 's/^[ 	]*\(VERSION=[0-9]*\.[0-9]*.*\)/\1/p' < $f`
 		if test x$gcc_cv_gld_version != x; then
 			break
 		fi
-	done
+	done; fi
 	gcc_cv_gld_major_version=`expr $gcc_cv_gld_version : VERSION=\([0-9]*\)`
 	gcc_cv_gld_minor_version=`expr $gcc_cv_gld_version : VERSION=[0-9]*\.\([0-9]*\)`
 else
@@ -25346,8 +25350,8 @@ if test ${gcc_cv_as_aix_ref+set} = set; then :
 else
   gcc_cv_as_aix_ref=no
 if test $in_tree_gas = yes; then
-if test $gcc_cv_gas_vers -ge `expr \( \( 2.21.0 \* 1000 \) + gcc_cv_as_aix_ref=yes \) \* 1000 + `
-  then :
+if test $gcc_cv_gas_vers -ge `expr \( \( 2 \* 1000 \) + 21 \) \* 1000 + 0`
+  then gcc_cv_as_aix_ref=yes
 fi
   elif test x$gcc_cv_as != x; then
 $as_echo '	.csect stuff[rw]
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 6ad6d19..3013555 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -2046,6 +2046,8 @@ if test $gcc_cv_ld = ../ld/ld-new$build_exeext \
 	elif test $ld_is_gold = yes; then
 	  in_tree_ld_is_elf=yes
 	fi
+	gcc_cv_gld_version=`$gcc_cv_ld_bfd_srcdir/configure --version | sed -n -e '1s,.* ,VERSION=,p'`
+	if test x$gcc_cv_gld_version != x; then true; else
 	for f in $gcc_cv_ld_bfd_srcdir/configure $gcc_cv_ld_gld_srcdir/configure $gcc_cv_ld_gld_srcdir/configure.in $gcc_cv_ld_gld_srcdir/Makefile.in
 	do
 changequote(,)dnl
@@ -2053,7 +2055,7 @@ changequote(,)dnl
 		if test x$gcc_cv_gld_version != x; then
 			break
 		fi
-	done
+	done; fi
 	gcc_cv_gld_major_version=`expr $gcc_cv_gld_version : VERSION=\([0-9]*\)`
 	gcc_cv_gld_minor_version=`expr $gcc_cv_gld_version : VERSION=[0-9]*\.\([0-9]*\)`
 changequote([,])dnl
@@ -3878,7 +3880,7 @@ LCF0:
 case $target in
   *-*-aix*)
 	gcc_GAS_CHECK_FEATURE([.ref support],
-	  

Ping Re: Defining C99 predefined macros for whole translation unit

2012-10-08 Thread Joseph S. Myers
Ping.  This patch 
http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01907.html (non-C parts) is 
pending review.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure

2012-10-08 Thread Paolo Carlini
Hi,

Pavel Chupin pavel.v.chu...@gmail.com wrote:

It has been changed here:
http://gcc.gnu.org/git/?p=gcc.git;a=commit;h=630d52ca0a88d173f89634a5d7dd8aee07d04d80

subj:Move gthr to toplevel libgcc

I see, thanks. Let's add Rainer in CC, see if he expected this to happen or not.

Paolo



Re: patch to fix constant math

2012-10-08 Thread Nathan Froyd
- Original Message -
 Btw, as for Richards idea of conditionally placing the length field
 in
 rtx_def looks like overkill to me.  These days we'd merely want to
 optimize for 64bit hosts, thus unconditionally adding a 32 bit
 field to rtx_def looks ok to me (you can wrap that inside a union to
 allow both descriptive names and eventual different use - see what
 I've done to tree_base)

IMHO, unconditionally adding that field isn't "optimize for 64-bit
hosts", but "gratuitously make one of the major compiler data
structures bigger on 32-bit hosts".  Not everybody can cross-compile
from a 64-bit host.  And even those people who can don't necessarily
want to.  Please try to consider what's best for all the people who
use GCC, not just the cases you happen to be working with every day.

-Nathan


[testsuite] Require tls_runtime in gcc.target/i386/pr54445-1.c

2012-10-08 Thread Rainer Orth
gcc.target/i386/pr54445-1.c FAILs to execute on Solaris 9 with native TLS:

ld.so.1: pr54445-1.exe: fatal: pr54445-1.exe: object requires TLS, but TLS failed to initialize

The following patch fixes this by both requiring TLS runtime support and
adding the necessary options.

Tested with the appropriate runtest invocation in i386-pc-solaris2.9 and
x86_64-unknown-linux-gnu, installed on mainline.

Rainer


2012-10-08  Rainer Orth  r...@cebitec.uni-bielefeld.de

* gcc.target/i386/pr54445-1.c: Require tls_runtime, add tls options.

# HG changeset patch
# Parent 67ccd7a114e0eaf13cdb8c6d8f109c8fdfb86a96
Require tls_runtime in gcc.target/i386/pr54445-1.c

diff --git a/gcc/testsuite/gcc.target/i386/pr54445-1.c b/gcc/testsuite/gcc.target/i386/pr54445-1.c
--- a/gcc/testsuite/gcc.target/i386/pr54445-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr54445-1.c
@@ -1,5 +1,6 @@
-/* { dg-do run } */
+/* { dg-do run { target tls_runtime } } */
 /* { dg-options "-O2" } */
+/* { dg-add-options tls } */
 
 __thread unsigned char tls_array[64];
 

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: patch to fix constant math

2012-10-08 Thread Robert Dewar

On 10/8/2012 11:01 AM, Nathan Froyd wrote:

- Original Message -

Btw, as for Richards idea of conditionally placing the length field
in
rtx_def looks like overkill to me.  These days we'd merely want to
optimize for 64bit hosts, thus unconditionally adding a 32 bit
field to rtx_def looks ok to me (you can wrap that inside a union to
allow both descriptive names and eventual different use - see what
I've done to tree_base)


IMHO, unconditionally adding that field isn't "optimize for 64-bit
hosts", but "gratuitously make one of the major compiler data
structures bigger on 32-bit hosts".  Not everybody can cross-compile
from a 64-bit host.  And even those people who can don't necessarily
want to.  Please try to consider what's best for all the people who
use GCC, not just the cases you happen to be working with every day.


I think that's reasonable in general, but as time goes on, and every
$300 laptop is 64-bit capable, one should not go TOO far out of the
way trying to make sure we can compile everything on a 32-bit machine.
After all, we don't try to ensure we can compile on a 16-bit machine,
though when I helped write the Realia COBOL compiler, it was a major
consideration that we had to be able to compile arbitrarily large
programs on a 32-bit machine with one megabyte of memory. That was
achieved at the time, but is hardly relevant now!



Re: [PATCH] Fix inclusion of cxxabi_forced.h in dynamic_bitset

2012-10-08 Thread Joe Seymour
On 10/06/12 01:50, Paolo Carlini wrote:
 On 10/06/2012 02:33 AM, Joe Seymour wrote:
 I'm seeing tr2/headers/all.cc fail in the libstdc++ testsuite:

 In file included from
 src/gcc-mainline/libstdc++-v3/testsuite/tr2/headers/all.cc:22:0:
 /scratch/jseymour/mainline/i686-pc-linux-gnu/install/opt/codesourcery/include/c++/4.8.0/tr2/dynamic_bitset:42:27:
 fatal error: cxxabi_forced.h: No such file or directory
   #include <cxxabi_forced.h>
 ^
 compilation terminated.


  From libstdc++-v3/libsupc++/Makefile.am:
 bits_HEADERS = \
 atomic_lockfree_defines.h cxxabi_forced.h \
 exception_defines.h exception_ptr.h hash_bytes.h nested_exception.h
 Looking at how other headers in that list are treated, I believe it is the
 include of cxxabi_forced.h in dynamic_bitset at fault. This patch corrects 
 it.
 I'm pretty sure you are right. Any idea why the test isn't failing for 
 anybody else?

I was surprised not to find any other references to this failure as well,
especially as I observed the failure with pristine FSF sources. I've had a
closer look:

* We (CodeSourcery/Mentor) test the installation directory, with something like:

g++ -D_GLIBCXX_ASSERT -fmessage-length=0  -DLOCALEDIR=.
-I/scratch/jseymour/mainline/i686-pc-linux-gnu/src/gcc-mainline/libstdc++-v3/testsuite/util
\
/scratch/jseymour/mainline/i686-pc-linux-gnu/src/gcc-mainline/libstdc++-v3/testsuite/tr2/headers/all.cc
  -std=gnu++0x -S  -o all.s

* The standard make check invocation tests the objdir/srcdir with a longer
command, passing various paths etc, in particular:

-I/scratch/jseymour/mainline/i686-pc-linux-gnu/src/gcc-mainline/libstdc++-v3/libsupc++

Because all the headers in libstdc++-v3 are in that directory cxxabi_forced.h is
found successfully. It is the Makefile that places it in a different directory
during installation.

I suppose to get this test working correctly, we need to move the files listed
in bits_HEADERS into a bits/ directory in the source tree, then make appropriate
changes to cater for the adjusted directory layout.

Joe


Third ping: Re: Add a configure option to disable system header canonicalizations (issue6495088)

2012-10-08 Thread Simon Baldwin
Ping, again.

On 1 October 2012 16:56, Simon Baldwin sim...@google.com wrote:

 Ping, again.


 On 21 September 2012 12:45, Simon Baldwin sim...@google.com wrote:
 
  Ping.
 
  http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00459.html
 
  Full text of previous message and context at URL above.  No comments
  or code changes since.  Patch description left below for convenience.
 
  
   Add flags to disable system header canonicalizations.
  
   Libcpp may canonicalize system header paths with lrealpath() for
   diagnostics,
   dependency output, and similar.  If gcc is held in a symlink farm the
   canonicalized paths may be meaningless to users, and will also
   conflict with
   build frameworks that (for example) disallow absolute paths to header
   files.
  
   This change adds -f[no-]canonical-system-headers to the gcc command
   line, and
   a configure option --[en/dis]able-canonical-system-headers to set
   default
   behaviour, allowing the user to select whether or not to implement
   r186991.
   Default is enabled.  See also PR c++/52974.
  
   Tested for regressions with bootstrap builds of C and C++, both with
   and
   without configure --disable-canonical-system-headers.

--
Google UK Limited | Registered Office: Belgrave House, 76 Buckingham
Palace Road, London SW1W 9TQ | Registered in England Number: 3977902


[C++] Omit overflow check for new char[n]

2012-10-08 Thread Florian Weimer
If the size of the inner array elements is 1 and we do not need a 
cookie, we do not need to insert an overflow check.  This applies to the 
relatively frequent new char[n] case.
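
To illustrate why the check is redundant in that case, here is a simplified sketch of the size computation for `new T[n]`. It is not the code in build_new_1 (which works on trees and uses a tighter limit); `array_alloc_size` and its parameters are hypothetical names. On overflow it yields size_t(-1), which is the value the test case expects operator new[] to receive.

```cpp
#include <cstddef>

// Simplified model (assumption, not the GCC implementation): the number of
// bytes requested for new T[n], or size_t(-1) when n * elt_size + cookie
// does not fit in size_t.
constexpr std::size_t array_alloc_size(std::size_t n, std::size_t elt_size,
                                       std::size_t cookie)
{
  return (elt_size != 0
          && n > (static_cast<std::size_t>(-1) - cookie) / elt_size)
             ? static_cast<std::size_t>(-1)
             : n * elt_size + cookie;
}

// With elt_size == 1 and no cookie the guard reads "n > SIZE_MAX", which can
// never be true, so the result is always just n.
static_assert(array_alloc_size(12345, 1, 0) == 12345, "no arithmetic needed");

// With a larger element the guard does fire.
static_assert(array_alloc_size(static_cast<std::size_t>(-1) / 2 + 1, 2, 0)
                  == static_cast<std::size_t>(-1),
              "overflow is reported as size_t(-1)");
```

The patch applies the same reasoning at compile time: when the inner size is 1 and no cookie is added, outer_nelts_check is dropped.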


Built and regression-tested on x86_64-redhat-linux-gnu.  Okay for trunk?

--
Florian Weimer / Red Hat Product Security Team

gcc/:

2012-10-08  Florian Weimer  fwei...@redhat.com

	* init.c (build_new_1): Do not check for arithmetic overflow if
	inner array size is 1.

gcc/testsuite/:

2012-10-08  Florian Weimer  fwei...@redhat.com

	* g++.dg/init/new40.C: New.

Index: gcc/cp/ChangeLog
===
--- gcc/cp/ChangeLog	(revision 192206)
+++ gcc/cp/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2012-10-08  Florian Weimer  fwei...@redhat.com
+
+	* init.c (build_new_1): Do not check for arithmetic overflow if
+	inner array size is 1.
+
 2012-10-08  Dodji Seketeli  do...@redhat.com
 
 	PR c++/53528 C++11 attribute support
Index: gcc/cp/init.c
===
--- gcc/cp/init.c	(revision 192206)
+++ gcc/cp/init.c	(working copy)
@@ -2184,6 +2184,8 @@
   bool outer_nelts_from_type = false;
   double_int inner_nelts_count = double_int_one;
   tree alloc_call, alloc_expr;
+  /* Size of the inner array elements. */
+  double_int inner_size;
   /* The address returned by the call to operator new.  This node is
  a VAR_DECL and is therefore reusable.  */
   tree alloc_node;
@@ -2345,8 +2347,6 @@
   double_int max_size
 	= double_int_one.llshift (TYPE_PRECISION (sizetype) - 1,
   HOST_BITS_PER_DOUBLE_INT);
-  /* Size of the inner array elements. */
-  double_int inner_size;
   /* Maximum number of outer elements which can be allocated. */
   double_int max_outer_nelts;
   tree max_outer_nelts_tree;
@@ -2450,7 +2450,13 @@
 	  if (array_p && TYPE_VEC_NEW_USES_COOKIE (elt_type))
 	size = size_binop (PLUS_EXPR, size, cookie_size);
 	  else
-	cookie_size = NULL_TREE;
+	{
+	  cookie_size = NULL_TREE;
+	  /* No size arithmetic necessary, so the size check is
+		 not needed. */
+	  if (outer_nelts_check != NULL && inner_size == double_int_one)
+		outer_nelts_check = NULL_TREE;
+	}
 	  /* Perform the overflow check.  */
 	  if (outer_nelts_check != NULL_TREE)
 size = fold_build3 (COND_EXPR, sizetype, outer_nelts_check,
@@ -2486,7 +2492,13 @@
 	  /* Use a global operator new.  */
 	  /* See if a cookie might be required.  */
 	  if (!(array_p && TYPE_VEC_NEW_USES_COOKIE (elt_type)))
-	cookie_size = NULL_TREE;
+	{
+	  cookie_size = NULL_TREE;
+	  /* No size arithmetic necessary, so the size check is
+		 not needed. */
+	  if (outer_nelts_check != NULL && inner_size == double_int_one)
+		outer_nelts_check = NULL_TREE;
+	}
 
 	  alloc_call = build_operator_new_call (fnname, placement,
 		size, cookie_size,
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(revision 192206)
+++ gcc/testsuite/ChangeLog	(working copy)
@@ -1,3 +1,7 @@
+2012-10-08  Florian Weimer  fwei...@redhat.com
+
+	* g++.dg/init/new40.C: New.
+
 2012-10-08  Oleg Endo  olege...@gcc.gnu.org
 
 	PR target/54685
Index: gcc/testsuite/g++.dg/init/new40.C
===
--- gcc/testsuite/g++.dg/init/new40.C	(revision 0)
+++ gcc/testsuite/g++.dg/init/new40.C	(working copy)
@@ -0,0 +1,112 @@
+// Testcase for overflow handling in operator new[].
+// Optimization of unnecessary overflow checks.
+// { dg-do run }
+
+#include <assert.h>
+#include <stdlib.h>
+#include <stdexcept>
+
+static size_t magic_allocation_size
+  = 1 + (size_t (1) << (sizeof (size_t) * 8 - 1));
+
+struct exc : std::bad_alloc {
+};
+
+static size_t expected_size;
+
+struct pod_with_new {
+  char ch;
+  void *operator new[] (size_t sz)
+  {
+if (sz != expected_size)
+  abort ();
+throw exc ();
+  }
+};
+
+struct with_new {
+  char ch;
+  with_new () { }
+  ~with_new () { }
+  void *operator new[] (size_t sz)
+  {
+if (sz != size_t (-1))
+  abort ();
+throw exc ();
+  }
+};
+
+struct non_pod {
+  char ch;
+  non_pod () { }
+  ~non_pod () { }
+};
+
+void *
+operator new (size_t sz) _GLIBCXX_THROW (std::bad_alloc)
+{
+  if (sz != expected_size)
+abort ();
+  throw exc ();
+}
+
+int
+main ()
+{
+  if (sizeof (pod_with_new) == 1)
+expected_size = magic_allocation_size;
+  else
+expected_size = -1;
+
+  try {
+new pod_with_new[magic_allocation_size];
+abort ();
+  } catch (exc &) {
+  }
+
+  if (sizeof (with_new) == 1)
+expected_size = magic_allocation_size;
+  else
+expected_size = -1;
+
+  try {
+new with_new[magic_allocation_size];
+abort ();
+  } catch (exc &) {
+  }
+
+  expected_size = magic_allocation_size;
+  try {
+new char[magic_allocation_size];
+abort ();
+  } catch (exc &) {
+  }
+
+  expected_size = -1;
+
+  try {
+new 

Re: [C++] Mixed scalar-vector operations

2012-10-08 Thread Marc Glisse

On Fri, 5 Oct 2012, Jason Merrill wrote:


+   error_at (loc, "conversion of scalar to vector "
+  "involves truncation");


These errors should print the types involved.  They also need to be 
suppressed when !(complain & tf_error).


Hello,

here is a new version of the patch. Differences with the previous one 
should only be comments, testsuite, printing types and inhibiting error 
messages.


Passes bootstrap+testsuite. scal-to-vec1.c was failing but then Joseph 
showed me the \[^\\n\]* trick and I retested with:

make check-gcc 'RUNTESTFLAGS=dg.exp=scal-to-vec1.c'

2012-09-22  Marc Glisse  marc.gli...@inria.fr

PR c++/54427

c/
* c-typeck.c: Include c-common.h.
(enum stv_conv): Moved to c-common.h.
(scalar_to_vector): Moved to c-common.c.
(build_binary_op): Adapt to scalar_to_vector's new prototype.
* Make-lang.in: c-typeck.c depends on c-common.h.

c-family/
* c-common.c (scalar_to_vector): Moved from c-typeck.c. Support
more operations. Make error messages optional.
* c-common.h (enum stv_conv): Moved from c-typeck.c.
(scalar_to_vector): Declare.

cp/
* typeck.c (cp_build_binary_op): Handle mixed scalar-vector
operations.
[LSHIFT_EXPR, RSHIFT_EXPR]: Likewise.

gcc/
* fold-const.c (fold_binary_loc): Use build_zero_cst instead of
build_int_cst for a potential vector.

testsuite/
* c-c++-common/vector-scalar.c: New testcase.
* g++.dg/ext/vector18.C: New testcase.
* g++.dg/ext/vector5.C: This is not an error anymore.
* gcc.dg/init-vec-1.c: Move ...
* c-c++-common/init-vec-1.c: ... here. Adapt error message.
* gcc.c-torture/execute/vector-shift1.c: Move ...
* c-c++-common/torture/vector-shift1.c: ... here.
* gcc.dg/scal-to-vec1.c: Move ...
* c-c++-common/scal-to-vec1.c: ... here. Avoid narrowing for
C++11. Adapt error messages.
* gcc.dg/convert-vec-1.c: Move ...
* c-c++-common/convert-vec-1.c: ... here.
* gcc.dg/scal-to-vec2.c: Move ...
* c-c++-common/scal-to-vec2.c: ... here.



--
Marc Glisse

Index: testsuite/g++.dg/ext/vector18.C
===
--- testsuite/g++.dg/ext/vector18.C (revision 0)
+++ testsuite/g++.dg/ext/vector18.C (revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c++11" } */
+
+typedef signed char __attribute__((vector_size(128) )) vec;
+
+template <class A, class B>
+auto f (A *a, B b) -> decltype (*a + b);
+
+void f (...) {}
+
+void g (vec *v, long long l)
+{
+  f (v, l);
+}

Property changes on: testsuite/g++.dg/ext/vector18.C
___
Added: svn:eol-style
   + native
Added: svn:keywords
   + Author Date Id Revision URL

Index: testsuite/g++.dg/ext/vector5.C
===
--- testsuite/g++.dg/ext/vector5.C  (revision 192153)
+++ testsuite/g++.dg/ext/vector5.C  (working copy)
@@ -1,8 +1,8 @@
 // PR c++/30022
 // { dg-do compile }
 
 void foo()
 {
   int __attribute__((vector_size(8))) v;
-  v = 1/v;  // { dg-error "invalid operands of types" }
+  v = 1/v;
 }
Index: testsuite/c-c++-common/init-vec-1.c
===
--- testsuite/c-c++-common/init-vec-1.c (revision 191610)
+++ testsuite/c-c++-common/init-vec-1.c (working copy)
@@ -1,4 +1,4 @@
 /* Don't ICE or emit spurious errors when init a vector with a scalar.  */
 /* { dg-do compile } */
 typedef float v2sf __attribute__ ((vector_size (8)));
-v2sf a = 0.0;  /* { dg-error "incompatible types" } */
+v2sf a = 0.0;  /* { dg-error "incompatible types|cannot convert" } */
Index: testsuite/c-c++-common/torture/vector-shift1.c
===
--- testsuite/c-c++-common/torture/vector-shift1.c  (revision 191610)
+++ testsuite/c-c++-common/torture/vector-shift1.c  (working copy)
@@ -1,10 +1,11 @@
+/* { dg-do run } */
 #define vector __attribute__((vector_size(8*sizeof(short))))
 
 int main (int argc, char *argv[]) {
   vector short v0 = {argc,2,3,4,5,6,7};
   vector short v1 = {2,2,2,2,2,2,2};
   vector short r1,r2,r3,r4;
   int i = 8;
 
   r1 = v0  1;
   r2 = v0  1;
Index: testsuite/c-c++-common/scal-to-vec1.c
===
--- testsuite/c-c++-common/scal-to-vec1.c   (revision 191610)
+++ testsuite/c-c++-common/scal-to-vec1.c   (working copy)
@@ -6,38 +6,38 @@
 __attribute__((vector_size((elcount)*sizeof(type)))) type
 
 #define vidx(type, vec, idx) (*((type *) (vec) + idx))
 
 
 extern float sfl;
 extern int   sint;
 extern long long sll;
 
 int main (int argc, char *argv[]) {
-vector(8, short) v0 = {argc, 1,2,3,4,5,6,7};
+vector(8, short) v0 = {(short)argc, 1,2,3,4,5,6,7};
 vector(8, short) v1;
 
  

[PATCH] Fix up vt_add_function_parameter (PR debug/54831)

2012-10-08 Thread Marek Polacek
As the testcase shows, we ICEd when generating debug info for C++ code
compiled with -fno-split-wide-types, that is, without splitting wide types
into multiple registers.
The issue is that vt_add_function_parameter assumed that the DECL_RTL
expression was a pseudo register that got spilled to memory.  When that is
not the case, it is better to just give up than to ICE.
Regtested/bootstrapped on x86_64, ok for trunk?

2012-10-08  Marek Polacek  pola...@redhat.com

PR debug/54831
* var-tracking.c (vt_add_function_parameter): Use condition in place
of gcc_assert.

* testsuite/g++.dg/debug/pr54831.C: New test.

--- gcc/testsuite/g++.dg/debug/pr54831.C.mp 2012-10-08 12:14:55.790807737 +0200
+++ gcc/testsuite/g++.dg/debug/pr54831.C 2012-10-08 12:51:53.856042257 +0200
@@ -0,0 +1,20 @@
+// PR debug/54831
+// { dg-do compile }
+// { dg-options "-O -fno-split-wide-types -g" }
+
+struct S
+{
+  int m1();
+  int m2();
+};
+
+typedef void (S::*mptr) ();
+
+mptr gmp;
+void bar (mptr f);
+
+void foo (mptr f)
+{
+  f = gmp;
+  bar (f);
+}
--- gcc/var-tracking.c.mp   2012-10-08 10:56:32.354556352 +0200
+++ gcc/var-tracking.c  2012-10-08 12:50:09.627307344 +0200
@@ -9404,12 +9404,13 @@ vt_add_function_parameter (tree parm)
 
   if (parm != decl)
 {
-  /* Assume that DECL_RTL was a pseudo that got spilled to
-memory.  The spill slot sharing code will force the
+  /* If that DECL_RTL wasn't a pseudo that got spilled to
+memory, bail out.  The spill slot sharing code will force the
 memory to reference spill_slot_decl (%sfp), so we don't
 match above.  That's ok, the pseudo must have referenced
 the entire parameter, so just reset OFFSET.  */
-  gcc_assert (decl == get_spill_slot_decl (false));
+  if (decl != get_spill_slot_decl (false))
+return;
   offset = 0;
 }
 
Marek


Re: [i386] recognize haddpd

2012-10-08 Thread Uros Bizjak
On Mon, Oct 8, 2012 at 4:40 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Fri, 28 Sep 2012, Uros Bizjak wrote:

 2) {v[0]-v[1], v[0]-v[1]} is not recognized as a hsubpd because
 vec_duplicate doesn't match vec_concat. Do we really need to duplicate
 (no
 pun intended) the pattern?


 You can add this transformation to simplify-rtx.c. Probably vec_concat
 with two equal operands can be canonicalized as vec_duplicate.


 Actually, it is replacing vec_duplicate with vec_concat that would help.
 Well, I'll see about that later.

 Here is what I came up with, trying to follow your other advice (thanks a
 lot!).

 Passes bootstrap+testsuite.

 2012-10-08  Marc Glisse  marc.gli...@inria.fr

 gcc/
 PR target/54400
 * config/i386/i386.md (type attribute): Add sseadd1.
 (unit attribute): Add support for sseadd1.
 * config/i386/sse.md (sse3_hplusminus_insnv2df3): split into...
 (sse3_haddv2df3): ... expander.
 (*sse3_haddv2df3): ... define_insn. Accept permuted operands.
 (sse3_hsubv2df3): ... define_insn.
 (*sse3_haddv2df3_low): New define_insn.
 (*sse3_hsubv2df3_low): New define_insn.

 gcc/testsuite/
 PR target/54400

 * gcc.target/i386/pr54400.c: New testcase.

 --
 Marc Glisse

 Index: gcc/testsuite/gcc.target/i386/pr54400.c
 ===
 --- gcc/testsuite/gcc.target/i386/pr54400.c (revision 0)
 +++ gcc/testsuite/gcc.target/i386/pr54400.c (revision 0)
 @@ -0,0 +1,53 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -msse3 -mfpmath=sse" } */
 +
 +#include <x86intrin.h>
 +
 +double f (__m128d p)
 +{
 +  return p[0] - p[1];
 +}
 +
 +double g1 (__m128d p)
 +{
 +  return p[0] + p[1];
 +}
 +
 +double g2 (__m128d p)
 +{
 +  return p[1] + p[0];
 +}
 +
 +__m128d h (__m128d p, __m128d q)
 +{
 +  __m128d r = { p[0] - p[1], q[0] - q[1] };
 +  return r;
 +}
 +
 +__m128d i1 (__m128d p, __m128d q)
 +{
 +  __m128d r = { p[0] + p[1], q[0] + q[1] };
 +  return r;
 +}
 +
 +__m128d i2 (__m128d p, __m128d q)
 +{
 +  __m128d r = { p[0] + p[1], q[1] + q[0] };
 +  return r;
 +}
 +
 +__m128d i3 (__m128d p, __m128d q)
 +{
 +  __m128d r = { p[1] + p[0], q[0] + q[1] };
 +  return r;
 +}
 +
 +__m128d i4 (__m128d p, __m128d q)
 +{
 +  __m128d r = { p[1] + p[0], q[1] + q[0] };
 +  return r;
 +}
 +
 +/* { dg-final { scan-assembler-times "hsubpd" 2 } } */
 +/* { dg-final { scan-assembler-times "haddpd" 6 } } */
 +/* { dg-final { scan-assembler-not "unpck" } } */

 Property changes on: gcc/testsuite/gcc.target/i386/pr54400.c
 ___
 Added: svn:keywords
+ Author Date Id Revision URL
 Added: svn:eol-style
+ native

 Index: gcc/config/i386/i386.md
 ===
 --- gcc/config/i386/i386.md (revision 192206)
 +++ gcc/config/i386/i386.md (working copy)
 @@ -320,36 +320,36 @@
  ;; provided in other attributes.
  (define_attr type
other,multi,
 alu,alu1,negnot,imov,imovx,lea,
 incdec,ishift,ishiftx,ishift1,rotate,rotatex,rotate1,imul,imulx,idiv,
 icmp,test,ibr,setcc,icmov,
 push,pop,call,callv,leave,
 str,bitmanip,
 fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
 sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
 -
 sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,ssediv,sseins,
 -   ssemuladd,sse4arg,lwp,
 +   sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
 +   ssediv,sseins,ssemuladd,sse4arg,lwp,
 mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft
(const_string other))

  ;; Main data type used by the insn
  (define_attr mode

 unknown,none,QI,HI,SI,DI,TI,OI,SF,DF,XF,TF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF
(const_string unknown))

  ;; The CPU unit operations uses.
  (define_attr unit integer,i387,sse,mmx,unknown
(cond [(eq_attr type
 fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint)
(const_string i387)
  (eq_attr type
 sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
 - sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,
 +
 sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,
   ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg)
(const_string sse)
  (eq_attr type mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft)
(const_string mmx)
  (eq_attr type other)
(const_string unknown)]
  (const_string integer)))

You missed the most important sseadd1 addition, the one that prevents
checking of operand2 when calculating memory attribute:

 (and (eq_attr type
 !alu1,negnot,ishift1,
   imov,imovx,icmp,test,bitmanip,
   fmov,fcmp,fsgn,
   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1,
   sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt)
  

Re: [PATCH] Fix up vt_add_function_parameter (PR debug/54831)

2012-10-08 Thread Jakub Jelinek
On Mon, Oct 08, 2012 at 05:58:15PM +0200, Marek Polacek wrote:
 2012-10-08  Marek Polacek  pola...@redhat.com
 
   PR debug/54831
   * var-tracking.c (vt_add_function_parameter): Use condition in place
   of gcc_assert.

Perhaps s/in place/instead/ ?

 --- gcc/var-tracking.c.mp 2012-10-08 10:56:32.354556352 +0200
 +++ gcc/var-tracking.c2012-10-08 12:50:09.627307344 +0200
 @@ -9404,12 +9404,13 @@ vt_add_function_parameter (tree parm)
  
if (parm != decl)
  {
 -  /* Assume that DECL_RTL was a pseudo that got spilled to
 -  memory.  The spill slot sharing code will force the
 +  /* If that DECL_RTL wasn't a pseudo that got spilled to
 +  memory, bail out.  The spill slot sharing code will force the

I'd perhaps add s/The/Otherwise, the/ here.

memory to reference spill_slot_decl (%sfp), so we don't
match above.  That's ok, the pseudo must have referenced
the entire parameter, so just reset OFFSET.  */
 -  gcc_assert (decl == get_spill_slot_decl (false));
 +  if (decl != get_spill_slot_decl (false))
 +return;
offset = 0;
  }

Ok with those changes.

Jakub


Re: [i386] recognize haddpd

2012-10-08 Thread Uros Bizjak
On Mon, Oct 8, 2012 at 6:08 PM, Uros Bizjak ubiz...@gmail.com wrote:

 +(define_insn *sse3_haddv2df3
[(set (match_operand:V2DF 0 register_operand =x,x)
 (vec_concat:V2DF
 - (plusminus:DF
 + (plus:DF
 +   (vec_select:DF
 + (match_operand:V2DF 1 register_operand 0,x)
 + (parallel [(match_operand:SI 3 const_0_to_1_operand)]))
 +   (vec_select:DF
 + (match_dup 1)
 + (parallel [(match_operand:SI 4 const_0_to_1_operand)])))
 + (plus:DF
 +   (vec_select:DF
 + (match_operand:V2DF 2 nonimmediate_operand xm,xm)
 + (parallel [(match_operand:SI 5 const_0_to_1_operand)]))
 +   (vec_select:DF
 + (match_dup 2)
 + (parallel [(match_operand:SI 6 const_0_to_1_operand)])]
 +  TARGET_SSE3 && INTVAL (operands[3]) != INTVAL (operands[4])
 +   && INTVAL (operands[5]) != INTVAL (operands[6])
 +  @
 +   haddpd\t{%2, %0|%0, %2}
 +   vhaddpd\t{%2, %1, %0|%0, %1, %2}
 +  [(set_attr isa noavx,avx)
 +   (set_attr type sseadd)
 +   (set_attr prefix orig,vex)
 +   (set_attr mode V2DF)])

 Please use (match_dup 3) in place of (match_operand 5) and (match_dup
 4) in place of (match_operand 6) predicates. These should be the same.

Oh, I was too quick with this part. The code above is OK, since we can
permute every part independently.
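
A small sketch of why that holds: each half of the result is the sum of both elements of one input, and addition is commutative, so the order in which the two indices are selected within one half has no effect on that half and is unrelated to the other half. The `v2df` struct and `pair_sum` below are hypothetical stand-ins for illustration, not GCC or intrinsic types.

```cpp
// Hypothetical two-element vector; models a V2DF value only for illustration.
struct v2df { double e[2]; };

// Sum of the two selected elements of one input.  The haddpd instruction
// corresponds to indices (0, 1); the pattern also accepts (1, 0), as long as
// the two indices differ.
constexpr double pair_sum(const v2df &v, int first, int second)
{
  return v.e[first] + v.e[second];
}

constexpr v2df p = {{1.5, 2.25}};
constexpr v2df q = {{-4.0, 0.5}};

// Swapping the selection order in either half leaves that half unchanged.
static_assert(pair_sum(p, 0, 1) == pair_sum(p, 1, 0), "low half");
static_assert(pair_sum(q, 0, 1) == pair_sum(q, 1, 0), "high half");
```

This is why operands 3/4 and 5/6 only need to differ pairwise rather than match each other.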

Uros.


[patch, mips, testsuite] Fix test to handle optimizations

2012-10-08 Thread Steve Ellcey
The gcc.target/mips/ext_ins.c test was failing in little endian mode on MIPS
because the compiler is now smart enough to see that 'c' is uninitialized, so
it can insert the field 'a' into 'c' with a shift and a full store instead of
an insert, since the store just overwrites uninitialized data.  I changed the
code to force the compiler to preserve the other fields of 'c', which makes
it use the insert instruction in both big and little endian modes.

Tested on mips-mti-elf.

OK to checkin?

Steve Ellcey
sell...@mips.com



2012-10-08  Steve Ellcey  sell...@mips.com

* gcc.target/mips/ext_ins.c: Modify f2 to avoid uninitialized data.


diff --git a/gcc/testsuite/gcc.target/mips/ext_ins.c 
b/gcc/testsuite/gcc.target/mips/ext_ins.c
index f0169bc..36f0f3f 100644
--- a/gcc/testsuite/gcc.target/mips/ext_ins.c
+++ b/gcc/testsuite/gcc.target/mips/ext_ins.c
@@ -18,9 +18,8 @@ NOMIPS16 unsigned int f1 (struct A a)
   return a.j;
 }
 
-NOMIPS16 void f2 (int i)
+NOMIPS16 struct A f2 (struct A a, int i)
 {
-  struct A c;
-  c.j = i;
-  func (c);
+  a.j = i;
+  return a;
 }


Re: patch to fix constant math

2012-10-08 Thread Richard Guenther
On Mon, Oct 8, 2012 at 5:01 PM, Nathan Froyd froy...@mozilla.com wrote:
 - Original Message -
 Btw, as for Richards idea of conditionally placing the length field
 in
 rtx_def looks like overkill to me.  These days we'd merely want to
 optimize for 64bit hosts, thus unconditionally adding a 32 bit
 field to rtx_def looks ok to me (you can wrap that inside a union to
 allow both descriptive names and eventual different use - see what
 I've done to tree_base)

 IMHO, unconditionally adding that field isn't "optimize for 64-bit
 hosts", but "gratuitously make one of the major compiler data
 structures bigger on 32-bit hosts".  Not everybody can cross-compile
 from a 64-bit host.  And even those people who can don't necessarily
 want to.  Please try to consider what's best for all the people who
 use GCC, not just the cases you happen to be working with every day.

The challenge would of course be to have the overhead only for a minority
of all RTX codes.  After all that 32bits are free to be used for every one.
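
As an illustration of the size argument, here is a sketch with two hypothetical structs (they are assumptions for this example, not the real rtx_def layout): a 32-bit header followed by a pointer-sized operand, with and without an extra 32-bit field. Where pointers are 8-byte aligned the extra field fills what would otherwise be padding; where they are 4-byte aligned the structure grows.

```cpp
#include <cstdint>

// Hypothetical stand-ins, not the real rtx_def.
struct node_without_length {
  std::uint32_t header;   // code, mode and flag bits
  void *operand;          // first pointer-sized operand
};

struct node_with_length {
  std::uint32_t header;
  std::uint32_t length;   // the extra 32-bit field under discussion
  void *operand;
};

// Hosts with 8-byte (or stricter) pointer alignment: the field is free.
static_assert(alignof(void *) < 8
                  || sizeof(node_with_length) == sizeof(node_without_length),
              "extra field fits into existing padding");

// Hosts with 4-byte pointer alignment: the struct grows by four bytes.
static_assert(alignof(void *) != 4
                  || sizeof(node_with_length)
                         == sizeof(node_without_length) + 4,
              "extra field enlarges the struct");
```

Only one of the two assertions is meaningful on a given host; the other is trivially satisfied.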

And I would not consider RTX a 'major compiler data structure' - of course
that makes the whole issue somewhat moot ;)

Richard.

 -Nathan


[patch, mips, testsuite] Fix gcc.target/mips/octeon-bbit-2.c for -Os

2012-10-08 Thread Steve Ellcey
The gcc.target/mips/octeon-bbit-2.c test is failing with -Os because that
optimization level does not do whichever optimization it is that results in a
bbit[01] instead of a bbit[01]l.  I would like to skip this test for -Os the
way it already gets skipped for -O0.

Tested on mips-mti-elf.  Ok for checkin?

Steve Ellcey
sell...@mips.com



2012-10-08  Steve Ellcey  sell...@mips.com

* gcc.target/mips/octeon-bbit-2.c: Skip for -Os optimization level.


diff --git a/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c 
b/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
index 9bd8dce..7d88d68 100644
--- a/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
+++ b/gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-march=octeon -mbranch-likely -fno-unroll-loops" } */
-/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-Os" } { "" } } */
 /* { dg-final { scan-assembler "\tbbit\[01\]\t" } } */
 /* { dg-final { scan-assembler-not "\tbbit\[01\]l\t" } } */
 /* { dg-final { scan-assembler "\tbnel\t" } } */


[lra] 3rd patch to speed more compilation of PR54146

2012-10-08 Thread Steven Bosscher
Hello,

This patch makes lra_constraint_insn_stack_bitmap an sbitmap. This
reduces compile time by another minute or so on gcc17 for the test
case of PR54146, and I think it's a general improvement also for less
extreme code. For cc1-i files the compile time change tends to be a
little less but that may just be noise.

Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for lra-branch?

(This is the combined patch of all changes in my check-out of the
lra-branch. The lra.c and lra-constraints.c bits are new, the rest was
posted previously and is awaiting review also.)

Ciao!
Steven

* lra-int.h (lra_constraint_insn_stack_bitmap,
lra_constraint_insn_stack): Remove.
(lra_pop_insn, lra_insn_stack_length): New prototypes.
* lra.c (lra_constraint_insn_stack_bitmap): Make static sbitmap.
(lra_constraint_insn_stack): Make static.
(lra_push_insn_1): New function.
(lra_push_insn): Rewrite using lra_push_insn_1.
(lra_push_insn_and_update_insn_regno_info): Likewise.
(lra_pop_insn, lra_insn_stack_length): New functions.
* lra-constraints.c (lra_constraints): Use new interface to
insns stack instead of manipulating in-place.
* lra-eliminations.c (add_insn_bitmap_to_set): New function.
(update_reg_eliminate): Make argument an sbitmap.  Return a bool
telling whether the input sbitmap has changed.
(lra_eliminate): Allocate and free the worklist set as an sbitmap.

* lra-lives.c (curr_point): Make non-static in lra_create_live_ranges.
(mark_pseudo_live): Take POINT argument.
(mark_pseudo_dead): Likewise.
(mark_regno_live): Likewise, and return a bool to indicate that
something changed in the dataflow sets.
(mark_regno_dead): Likewise.
(next_program_point): Renamed from incr_curr_point, and take the
current program point as a by-reference argument.
(process_bb_lives): Take the current program point as by-ref argument.
Try to only do a program point increment if this is necessary.
(remove_some_program_points_and_update_live_ranges): If no compression
can be done, don't update the live ranges.
(lra_create_live_ranges): Make curr_point local, and pass it around.
Visit blocks in topological order of the reverse CFG.

* lra-int.h (lra_assert): Define as duplicate of gcc_checking_assert.


lra-patch3.diff
Description: Binary data


Re: [PATCH] Fix up vt_add_function_parameter (PR debug/54831)

2012-10-08 Thread Marek Polacek
On Mon, Oct 08, 2012 at 06:09:41PM +0200, Jakub Jelinek wrote:
 Ok with those changes.

Thanks, this is what I've checked in:

2012-10-08  Marek Polacek  pola...@redhat.com

PR debug/54831
* var-tracking.c (vt_add_function_parameter): Use condition instead
of gcc_assert.

* testsuite/g++.dg/debug/pr54831.C: New test.

--- gcc/testsuite/g++.dg/debug/pr54831.C.mp 2012-10-08 12:14:55.790807737 +0200
+++ gcc/testsuite/g++.dg/debug/pr54831.C 2012-10-08 19:20:45.771190631 +0200
@@ -0,0 +1,20 @@
+// PR debug/54831
+// { dg-do compile }
+// { dg-options "-O -fno-split-wide-types -g" }
+
+struct S
+{
+  int m1();
+  int m2();
+};
+
+typedef void (S::*mptr) ();
+
+mptr gmp;
+void bar (mptr f);
+
+void foo (mptr f)
+{
+  f = gmp;
+  bar (f);
+}
--- gcc/var-tracking.c.mp   2012-10-08 10:56:32.354556352 +0200
+++ gcc/var-tracking.c  2012-10-08 19:19:15.031950120 +0200
@@ -9404,12 +9404,13 @@ vt_add_function_parameter (tree parm)
 
   if (parm != decl)
 {
-  /* Assume that DECL_RTL was a pseudo that got spilled to
-memory.  The spill slot sharing code will force the
-memory to reference spill_slot_decl (%sfp), so we don't
-match above.  That's ok, the pseudo must have referenced
-the entire parameter, so just reset OFFSET.  */
-  gcc_assert (decl == get_spill_slot_decl (false));
+  /* If that DECL_RTL wasn't a pseudo that got spilled to
+memory, bail out.  Otherwise, the spill slot sharing code
+will force the memory to reference spill_slot_decl (%sfp),
+so we don't match above.  That's ok, the pseudo must have
+referenced the entire parameter, so just reset OFFSET.  */
+  if (decl != get_spill_slot_decl (false))
+return;
   offset = 0;
 }
 

Marek


Re: [C++] Mixed scalar-vector operations

2012-10-08 Thread Jason Merrill

OK.

Jason


Re: RFA: darwin PATCH to fix build, internal visibility

2012-10-08 Thread Jason Merrill

On 10/08/2012 08:28 AM, Dominique Dhumieres wrote:

These tests are still failing on darwin. I think that
target { ! *-*-solaris2* } { ! *-*-darwin* }
should be replaced with
target { ! { *-*-solaris2* *-*-darwin* } }


Could someone with a darwin box handy make the appropriate change?

Thanks.

Jason




Re: [patch, mips, testsuite] Fix gcc.target/mips/octeon-bbit-2.c for -Os

2012-10-08 Thread Mike Stump
On Oct 8, 2012, at 9:21 AM, Steve Ellcey sell...@mips.com wrote:
 The gcc.target/octeon-bbit-2.c is failing with -Os because that optimization
 level does not do whichever optimization it is that results in a bbit instead
 of a bbit[01]l.  I would like to skip this test for -Os the way it already 
 gets
 skipped for -O0.
 
 Tested on mips-mti-elf.  Ok for checkin?

Ideally I'd like a mips expert to weigh in on this.  The issue is, is the code 
smaller with the other instruction?  If so, is there a reasonable way to obtain 
that type of win more often in the port with -Os?  Now, if you are that mips 
expert, that's fine, but, trivially you don't need my approval to check it in.  
If the code is larger, trivially, the patch is ok.  If the optimization 
generally hurt code size and can't be made to win, the patch is ok.  If always 
the same size, it would seem ok.   I just don't have the mips specific 
background to know which case this is.


Re: [C++] Mixed scalar-vector operations

2012-10-08 Thread Mike Stump
On Oct 8, 2012, at 8:53 AM, Marc Glisse marc.gli...@inria.fr wrote:
 On Fri, 5 Oct 2012, Jason Merrill wrote:
 
 +   error_at (loc, "conversion of scalar to vector "
 +  "involves truncation");
 
 These errors should print the types involved.  They also need to be 
 suppressed when !(complain & tf_error).
 
 Hello,
 
 here is a new version of the patch.

All I can say is thank you for pressing forward and not being discouraged.  In 
the end, it feels like we'll have better vector support in C++.  :-)


Re: [patch, mips, testsuite] Fix test to handle optimizations

2012-10-08 Thread Mike Stump
On Oct 8, 2012, at 9:16 AM, Steve Ellcey sell...@mips.com wrote:
 The gcc.target/mips/ext_ins.c was failing in little endian mode on MIPS 
 because
 the compiler is smart enough now to see that 'c' is uninitialized and it can
 insert the field 'a' into 'c' with a shift and a full store instead of an
 insert because the store just overwrites uninitialized data.  I changed the
 code to force the compiler to preserve the other fields of 'c' and that makes
 it use the insert instruction in both big and little endian modes.
 
 Tested on mips-mti-elf.
 
 OK to checkin?

Ok.


Re: [patch, mips, testsuite] Fix gcc.target/mips/octeon-bbit-2.c for -Os

2012-10-08 Thread Steve Ellcey
On Mon, 2012-10-08 at 11:09 -0700, Mike Stump wrote:
 On Oct 8, 2012, at 9:21 AM, Steve Ellcey sell...@mips.com wrote:
  The gcc.target/octeon-bbit-2.c is failing with -Os because that optimization
  level does not do whichever optimization it is that results in a bbit 
  instead
  of a bbit[01]l.  I would like to skip this test for -Os the way it already 
  gets
  skipped for -O0.
  
  Tested on mips-mti-elf.  Ok for checkin?
 
 Ideally I'd like a mips expert to weigh in on this.  The issue is, is the 
 code smaller with the other instruction?
 If so, is there a reasonable way to obtain that type of win more often in the 
 port with -Os?  Now, if you are that
 mips expert, that's fine, but, trivially you don't need my approval to check 
 it in.  If the code is larger,
 trivially, the patch is ok.  If the optimization generally hurt code size and 
 can't be made to win, the patch is ok.
 If always the same size, it would seem ok.   I just don't have the mips 
 specific background to know which case this
 is.

Well, I checked -O1, -O2 and -Os.  The -Os code is smaller than -O1 but
larger than -O2.  I didn't dig deep enough to find out exactly which
optimization is causing the change in instruction usage.  Perhaps
Richard Sandiford will have an opinion on this change.

Steve Ellcey
sell...@mips.com





Re: [patch, mips, testsuite] Fix test to handle optimizations

2012-10-08 Thread David Daney

On 10/08/2012 11:15 AM, Mike Stump wrote:

On Oct 8, 2012, at 9:16 AM, Steve Ellcey sell...@mips.com wrote:

The gcc.target/mips/ext_ins.c was failing in little endian mode on MIPS because
the compiler is smart enough now to see that 'c' is uninitialized and it can
insert the field 'a' into 'c' with a shift and a full store instead of an
insert because the store just overwrites uninitialized data.  I changed the
code to force the compiler to preserve the other fields of 'c' and that makes
it use the insert instruction in both big and little endian modes.

Tested on mips-mti-elf.

OK to checkin?


Ok.


I don't think this is the proper fix for this.

Use of BBIT{0,1} instructions will always be smaller than the 
alternative.  So disabling the test for -Os doesn't fix the problem the 
test is designed to find.


The real problem is that some optimizer is broken.  Instead of disabling 
the tests, can we fix the problem instead?


The goal of the testsuite should be to detect problems, not yield clean 
results.


If Richard disagrees with me, then I would defer to him.


David Daney



[google/4_7] Patch committed: backport the static prediction for short-circuit patch from trunk

2012-10-08 Thread Dehao Chen
I have backported r192215 from trunk to google-4_7:

2012-10-08  Dehao Chen  de...@google.com

* predict.c (predict_extra_loop_exits): Use
predict_paths_leading_to_edge to replace predict_edge_def.

Bootstrapped and passed crosstool test.

Dehao


Re: [PATCH] PR c++/53540 - using fails to be equivalent to typedef

2012-10-08 Thread Jason Merrill
Let's move the alias template case from primary_template_instantiation_p 
into alias_template_specialization_p and call the latter from the 
former.  And also call it from tsubst.


Jason


Re: [patch, mips, testsuite] Fix test to handle optimizations

2012-10-08 Thread David Daney
Really I meant this in reply to the  'Fix 
gcc.target/mips/octeon-bbit-2.c for -Os' thread.  Sorry for confusing 
the issue here.


I don't really have an objection to this one.

David Daney

On 10/08/2012 11:28 AM, David Daney wrote:

On 10/08/2012 11:15 AM, Mike Stump wrote:

On Oct 8, 2012, at 9:16 AM, Steve Ellcey sell...@mips.com wrote:

The gcc.target/mips/ext_ins.c was failing in little endian mode on
MIPS because
the compiler is smart enough now to see that 'c' is uninitialized and
it can
insert the field 'a' into 'c' with a shift and a full store instead
of an
insert because the store just overwrites uninitialized data.  I
changed the
code to force the compiler to preserve the other fields of 'c' and
that makes
it use the insert instruction in both big and little endian modes.

Tested on mips-mti-elf.

OK to checkin?


Ok.


I don't think this is the proper fix for this.

Use of BBIT{0,1} instructions will always be smaller than the
alternative.  So disabling the test for -Os doesn't fix the problem the
test is designed to find.

The real problem is that some optimizer is broken.  Instead of disabling
the tests, can we fix the problem instead?

The goal of the testsuite should be to detect problems, not yield clean
results.

If Richard disagrees with me, then I would defer to him.


David Daney





New Spanish PO file for 'gcc' (version 4.7.2)

2012-10-08 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Spanish team of translators.  The file is available at:

http://translationproject.org/latest/gcc/es.po

(This file, 'gcc-4.7.2.es.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.
coordina...@translationproject.org



[C++ PATCH] Fix ICE in cp_tree_equal (PR c++/54858)

2012-10-08 Thread Jakub Jelinek
Hi!

The following testcase ICEs because cp_tree_equal doesn't handle
FIELD_DECLs (in 4.4 it was enough to have c0/d0 and c1/d1 in the testcase,
now 12 lines are needed due to introduction of a hash table).

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk/4.7?

2012-10-08  Jakub Jelinek  ja...@redhat.com

PR c++/54858
* tree.c (cp_tree_equal): Handle FIELD_DECL.

* g++.dg/template/crash113.C: New test.

--- gcc/cp/tree.c.jj 2012-10-05 21:26:54.0 +0200
+++ gcc/cp/tree.c   2012-10-08 18:19:00.897543649 +0200
@@ -2559,6 +2559,7 @@ cp_tree_equal (tree t1, tree t2)
 
 case VAR_DECL:
 case CONST_DECL:
+case FIELD_DECL:
 case FUNCTION_DECL:
 case TEMPLATE_DECL:
 case IDENTIFIER_NODE:
--- gcc/testsuite/g++.dg/template/pr54858.C.jj  2012-10-08 18:15:55.470586784 +0200
+++ gcc/testsuite/g++.dg/template/pr54858.C 2012-10-08 18:14:28.0 +0200
@@ -0,0 +1,21 @@
+// PR c++/54858
+// { dg-do compile }
+
+template <int> struct A {};
+template <typename T, T *> struct B {};
+template <typename D> struct C
+{
+  A<0> c0; B<A<0>, &C::c0> d0; // { dg-error "could not convert template argument" }
+  A<0> c1; B<A<0>, &C::c1> d1; // { dg-error "could not convert template argument" }
+  A<0> c2; B<A<0>, &C::c2> d2; // { dg-error "could not convert template argument" }
+  A<0> c3; B<A<0>, &C::c3> d3; // { dg-error "could not convert template argument" }
+  A<0> c4; B<A<0>, &C::c4> d4; // { dg-error "could not convert template argument" }
+  A<0> c5; B<A<0>, &C::c5> d5; // { dg-error "could not convert template argument" }
+  A<0> c6; B<A<0>, &C::c6> d6; // { dg-error "could not convert template argument" }
+  A<0> c7; B<A<0>, &C::c7> d7; // { dg-error "could not convert template argument" }
+  A<0> c8; B<A<0>, &C::c8> d8; // { dg-error "could not convert template argument" }
+  A<0> c9; B<A<0>, &C::c9> d9; // { dg-error "could not convert template argument" }
+  A<0> ca; B<A<0>, &C::ca> da; // { dg-error "could not convert template argument" }
+  A<0> cb; B<A<0>, &C::cb> db; // { dg-error "could not convert template argument" }
+};
+C<int> e;

Jakub


Re: patch to fix constant math - first small patch

2012-10-08 Thread Kenneth Zadeck

is this ok to commit with this change?

kenny
On 10/05/2012 08:14 PM, Joseph S. Myers wrote:

On Fri, 5 Oct 2012, Kenneth Zadeck wrote:


+# define HOST_HALF_WIDE_INT_PRINT "h"

This may cause problems on hosts not supporting %hd (MinGW?), and there's
no real need for using "h" here given the promotion of short to int; you
can just use "" (rather than e.g. needing special handling in xm-mingw32.h
like is done for HOST_LONG_LONG_FORMAT).





Re: [C++ PATCH] Fix ICE in cp_tree_equal (PR c++/54858)

2012-10-08 Thread Jason Merrill

OK.

Jason


Re: Convert more non-GTY htab_t to hash_table.

2012-10-08 Thread Mike Stump
On Oct 5, 2012, at 3:19 PM, Diego Novillo dnovi...@google.com wrote:
 On Fri, Oct 5, 2012 at 6:08 PM, Lawrence Crowl cr...@googlers.com wrote:
 
 For many people the time to compile (almost) empty file is very
 important, we are already bad about that right now, initializing
 too much stuff dynamically is going to make it worse.
 
 So far, we are looking at dynamic initializations that would
 take about 10 cycles.  Even on a slow processor, a thousand
 initializations would take a microsecond.  Our time reports don't
 even report anything less than 5 milliseconds.
 
 Is there any reason to believe that this anticipated static
 initialization overhead is not pretty low relative to other overhead?
 I'm thinking here of the fact that to even start, the driver launches
 cc1[plus] which has to parse all the options created by the driver.
 
 I agree.  I don't think this will be a real problem.

I hope you're right.  Experience tells me that the usual high cost of a single
dynamic initialization is 30,000,000 cycles; about 100 of them cost 1 second.
Costs on the low side are completely irrelevant.  I think the 10 cycle cost is
not the high side, but the irrelevant low side number.  If one wanted to
understand the actual cost, one can take a snap of the cycle counter before the
dynamic inits happen (or near the front of them) and take a snap of it after
they run, and examine the difference…  A difference of 0 means that, though one
might conceive of them as dynamic inits, they are not.  And the other number is
what it is.  A global cycle counter that free runs as a time of day counter can
see the page faults, tlb misses and all the other hair, while a per process cpu
used counter is less useful.


Re: [i386] recognize haddpd

2012-10-08 Thread Marc Glisse

On Mon, 8 Oct 2012, Uros Bizjak wrote:


You missed the most important sseadd1 addition, the one that prevents
checking of operand2 when calculating memory attribute:

 (and (eq_attr "type"
 "!alu1,negnot,ishift1,
   imov,imovx,icmp,test,bitmanip,
   fmov,fcmp,fsgn,
   sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1,
   sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt")
  (match_operand 2 "memory_operand"))

Please note "!" in the above expression.

[...]

Also note that you have to add handling of sseadd1 attribute in other
(scheduler) *.md files. Simply grep for sseadd and add ,sseadd1
everywhere.


Thank you, it makes more sense now. The attached passed 
bootstrap+testsuite. I didn't know if I should be more precise in the 
ChangeLog, but it would make the ChangeLog as long as the patch with about 
23 entries like:

(define_insn_reservation bdver1_ssemuladd_256): Likewise

Next goal would be to further recognize some DPPD potential uses, but that 
seems harder.



2012-10-09  Marc Glisse  marc.gli...@inria.fr

gcc/
PR target/54400
* config/i386/i386.md (type attribute): Add sseadd1.
(unit attribute): Add support for sseadd1.
(memory attribute): Likewise.
* config/i386/athlon.md: Likewise.
* config/i386/core2.md: Likewise.
* config/i386/atom.md: Likewise.
* config/i386/ppro.md: Likewise.
* config/i386/bdver1.md: Likewise.
* config/i386/sse.md (sse3_hplusminus_insnv2df3): split into...
(sse3_haddv2df3): ... expander.
(*sse3_haddv2df3): ... define_insn. Accept permuted operands.
(sse3_hsubv2df3): ... define_insn.
(*sse3_haddv2df3_low): New define_insn.
(*sse3_hsubv2df3_low): New define_insn.

gcc/testsuite/
PR target/54400
* gcc.target/i386/pr54400.c: New testcase.


--
Marc Glisse
Index: testsuite/gcc.target/i386/pr54400.c
===
--- testsuite/gcc.target/i386/pr54400.c (revision 0)
+++ testsuite/gcc.target/i386/pr54400.c (revision 0)
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse3 -mfpmath=sse" } */
+
+#include <x86intrin.h>
+
+double f (__m128d p)
+{
+  return p[0] - p[1];
+}
+
+double g1 (__m128d p)
+{
+  return p[0] + p[1];
+}
+
+double g2 (__m128d p)
+{
+  return p[1] + p[0];
+}
+
+__m128d h (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] - p[1], q[0] - q[1] };
+  return r;
+}
+
+__m128d i1 (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] + p[1], q[0] + q[1] };
+  return r;
+}
+
+__m128d i2 (__m128d p, __m128d q)
+{
+  __m128d r = { p[0] + p[1], q[1] + q[0] };
+  return r;
+}
+
+__m128d i3 (__m128d p, __m128d q)
+{
+  __m128d r = { p[1] + p[0], q[0] + q[1] };
+  return r;
+}
+
+__m128d i4 (__m128d p, __m128d q)
+{
+  __m128d r = { p[1] + p[0], q[1] + q[0] };
+  return r;
+}
+
+/* { dg-final { scan-assembler-times "hsubpd" 2 } } */
+/* { dg-final { scan-assembler-times "haddpd" 6 } } */
+/* { dg-final { scan-assembler-not "unpck" } } */

Property changes on: testsuite/gcc.target/i386/pr54400.c
___
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native

Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 192214)
+++ config/i386/i386.md (working copy)
@@ -320,36 +320,36 @@
 ;; provided in other attributes.
 (define_attr type
   other,multi,
alu,alu1,negnot,imov,imovx,lea,
incdec,ishift,ishiftx,ishift1,rotate,rotatex,rotate1,imul,imulx,idiv,
icmp,test,ibr,setcc,icmov,
push,pop,call,callv,leave,
str,bitmanip,
fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint,
sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
-   
sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,ssediv,sseins,
-   ssemuladd,sse4arg,lwp,
+   sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,
+   ssediv,sseins,ssemuladd,sse4arg,lwp,
mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft
   (const_string other))
 
 ;; Main data type used by the insn
 (define_attr mode
   unknown,none,QI,HI,SI,DI,TI,OI,SF,DF,XF,TF,V8SF,V4DF,V4SF,V2DF,V2SF,V1DF
   (const_string unknown))
 
 ;; The CPU unit operations uses.
 (define_attr unit integer,i387,sse,mmx,unknown
   (cond [(eq_attr type 
fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint)
   (const_string i387)
 (eq_attr type 
sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,
- sse,ssemov,sseadd,ssemul,ssecmp,ssecomi,ssecvt,
+ 
sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,
  ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg)
   (const_string sse)
 (eq_attr type mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft)
   

Re: patch to fix constant math - third small patch

2012-10-08 Thread Richard Sandiford
Kenneth Zadeck zad...@naturalbridge.com writes:
 diff --git a/gcc/combine.c b/gcc/combine.c
 index 4e0a579..b531305 100644
 --- a/gcc/combine.c
 +++ b/gcc/combine.c
 @@ -2617,16 +2617,19 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int 
 *new_direct_jump_p,
   constant.  */
   if (i1 == 0
 && (temp = single_set (i2)) != 0
 -   && (CONST_INT_P (SET_SRC (temp))
 -   || CONST_DOUBLE_AS_INT_P (SET_SRC (temp)))
 +   && CONST_SCALAR_INT_P (SET_SRC (temp))
 && GET_CODE (PATTERN (i3)) == SET
 -   && (CONST_INT_P (SET_SRC (PATTERN (i3)))
 -   || CONST_DOUBLE_AS_INT_P (SET_SRC (PATTERN (i3))))
 +   && CONST_SCALAR_INT_P (SET_SRC (PATTERN (i3)))
 && reg_subword_p (SET_DEST (PATTERN (i3)), SET_DEST (temp)))
  {
rtx dest = SET_DEST (PATTERN (i3));
int offset = -1;
int width = 0;
 +  
 +  /* There are not explicit tests to make sure that this is not a
 +  float, but there is code here that would not be correct if it
 +  was.  */
 +  gcc_assert (GET_MODE_CLASS (GET_MODE (SET_SRC (temp))) != MODE_FLOAT);

No need for this assert: CONST_SCALAR_INT_P (SET_SRC (temp)) should cover it.

 @@ -1009,9 +1007,7 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, enum machine_mode 
 memmode)
  static rtx
  wrap_constant (enum machine_mode mode, rtx x)
  {
 -  if (!CONST_INT_P (x)
 -   && GET_CODE (x) != CONST_FIXED
 -   && !CONST_DOUBLE_AS_INT_P (x))
 +  if ((!CONST_SCALAR_INT_P (x)) && GET_CODE (x) != CONST_FIXED)

Redundant brackets.

Looks good to me otherwise, thanks.

Richard


Re: [PATCH, libstdc++] Add proper OpenBSD support

2012-10-08 Thread Mark Kettenis
Jonathan,

Any further thoughts about this?  I've attached a diff that combines
my original diff with the change to use the newlib locale model on
OpenBSD since they probably should be committed together.

   On 10 September 2012 07:34, Mark Kettenis wrote:
   Date: Sun, 9 Sep 2012 21:07:39 +0100
   From: Jonathan Wakely jwakely@gmail.com
  
   On 4 September 2012 20:26, Mark Kettenis wrote:
Fixes a few testcases.  Mostly based on the existing
NetBSD/FreeBSD/Darwin code.
   
2012-09-04  Mark Kettenis  kette...@openbsd.org
   
* configure.host (*-*-openbsd*) Set cpu_include_dir.
* config/os/bsd/openbsd/ctype_base.h: New file.
* config/os/bsd/openbsd/ctype_configure_char.cc: New file.
* config/os/bsd/openbsd/ctype_inline.h: New file.
* config/os/bsd/openbsd/os_defines.h: New file.
  
   This patch is OK, thanks.  Do you want me to commit it for you?
  
   Yes please.
  
  It occurs to me now that the patch changes the size of
  ctype_base::mask, from the generic unsigned to char. I assume the
  OpenBSD system compiler uses char? How long has that change been
  present in the OpenBSD source tree?
 
 Yes, the system compiler uses char and has been doing so since mid-2005.
 
  I'm not sure whether or not it's better to change the size of that
  type in GCC 4.8, which would break compatibility with previous
  versions of the FSF sources but provide compatibility with the OpenBSD
  system compiler.  My guess would be that most people on OpenBSD are
  using the system compiler not upstream FSF sources.
 
 Indeed.  People either use the system compiler or install one from
 ports/packages.  Given the sorry state of OpenBSD support in the FSF
 source tree (barely buildable) I think binary compatibility with the
 system compiler is more important.

2012-10-08  Mark Kettenis  kette...@openbsd.org

* configure.host (*-*-openbsd*) Set cpu_include_dir.
* config/os/bsd/openbsd/ctype_base.h: New file.
* config/os/bsd/openbsd/ctype_configure_char.cc: New file.
* config/os/bsd/openbsd/ctype_inline.h: New file.
* config/os/bsd/openbsd/os_defines.h: New file.
* acinclude.m4 (GLIBCXX_ENABLE_CLOCALE): Use newlib locale model
for OpenBSD.
* configure: Regenerated.


Index: acinclude.m4
===
--- acinclude.m4(revision 192154)
+++ acinclude.m4(working copy)
@@ -1862,6 +1862,9 @@
   darwin* | freebsd*)
enable_clocale_flag=darwin
;;
+  openbsd*)
+   enable_clocale_flag=newlib
+   ;;
   *)
if test x$with_newlib = xyes; then
  enable_clocale_flag=newlib
Index: config/os/bsd/openbsd/ctype_base.h
===
--- config/os/bsd/openbsd/ctype_base.h  (revision 0)
+++ config/os/bsd/openbsd/ctype_base.h  (working copy)
@@ -0,0 +1,59 @@
+// Locale support -*- C++ -*-
+
+// Copyright (C) 2000, 2009, 2012 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// http://www.gnu.org/licenses/.
+
+//
+// ISO C++ 14882: 22.1  Locales
+//
+  
+// Information as gleaned from /usr/include/ctype.h on OpenBSD.
+  
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  /// @brief  Base class for ctype.
+  struct ctype_base
+  {
+// Non-standard typedefs.
+typedef const short*   __to_type;
+
+// NB: Offsets into ctype<char>::_M_table force a particular size
+// on the mask type. Because of this, we don't use an enum.
+typedef char   mask;
+
+static const mask upper= _U;
+static const mask lower= _L;
+static const mask alpha= _U | _L;
+static const mask digit= _N;
+static const mask xdigit   = _N | _X;
+static const mask space= _S;
+static const mask print= _P | _U | _L | _N | _B;
+static const mask graph= _P | _U | _L | _N;
+static const mask cntrl= _C;
+static const mask punct  

[google/4_7] Patch committed: backport the location_block bugfix patches from trunk

2012-10-08 Thread Dehao Chen
I have backported the following patches from trunk to google-4_7:

191931, 192049, 192120, 192165

gcc:
2012-10-08  Dehao Chen  de...@google.com

Backport 191931, 192049, 192120, 192165 from trunk.

* tree-vect-loop-manip.c (slpeel_make_loop_iterate_ntimes): Use
LOCATION_LOCUS to compare with UNKNOWN_LOCATION.
(slpeel_tree_peel_loop_to_edge): Likewise.
* tree-vectorizer.c (vectorize_loops): Likewise.
* tree-cfg.c (move_block_to_fn): Update lexical block for phi_args.
* tree-ssa-live.c (clear_unused_block_pointer_1): Look at
DECL_DEBUG_EXPR again.
* gimple-low.c (lower_stmt): Set the block for call args.

testsuite:
2012-10-08  Dehao Chen  de...@google.com

Backport r192049 from trunk:

* gcc.dg/pr54782.c: New test.

Bootstrapped and passed crosstool tests.

Dehao


Re: [PATCH, libstdc++] Add proper OpenBSD support

2012-10-08 Thread Jonathan Wakely
On 8 October 2012 20:45, Mark Kettenis wrote:
 Jonathan,

 Any further thoughts about this?  I've attached a diff that combines
 my original diff with the change to use the newlib locale model on
 OpenBSD since they probably should be committed together.

Hi,

Sorry for the delay, I realised over the weekend this never went in.

I'm happy to apply the combined diff if you think using newlib is the
right option for OpenBSD.

Jonathan


Re: patch to fix constant math

2012-10-08 Thread Richard Sandiford
Robert Dewar de...@adacore.com writes:
 On 10/8/2012 11:01 AM, Nathan Froyd wrote:
 - Original Message -
 Btw, as for Richards idea of conditionally placing the length field
 in
 rtx_def looks like overkill to me.  These days we'd merely want to
 optimize for 64bit hosts, thus unconditionally adding a 32 bit
 field to rtx_def looks ok to me (you can wrap that inside a union to
 allow both descriptive names and eventual different use - see what
 I've done to tree_base)

 IMHO, unconditionally adding that field isn't "optimize for 64-bit
 hosts", but "gratuitously make one of the major compiler data
 structures bigger on 32-bit hosts".  Not everybody can cross-compile
 from a 64-bit host.  And even those people who can don't necessarily
 want to.  Please try to consider what's best for all the people who
 use GCC, not just the cases you happen to be working with every day.

 I think that's reasonable in general, but as time goes on, and every
 $300 laptop is 64-bit capable, one should not go TOO far out of the
 way trying to make sure we can compile everything on a 32-bit machine.

It's not 64-bit machine vs. 32-bit machine.  It's an LP64 ABI vs.
an ILP32 ABI.  HJ & co. have put considerable effort into developing
the x32 ABI for x86_64 precisely because ILP32 is still useful for
64-bit machines.  Just as it was for MIPS when SGI invented n32
(which is still useful now).  I believe 64-bit SPARC has a similar
thing, and no doubt other architectures do too.

After all, there shouldn't be much need for more than 2GB of virtual
address space in an AVR cross compiler.  So why pay the cache penalty
of 64-bit pointers and longs (GCC generally tries to avoid using long
directly) when a 32-bit pointer will do?

Many years ago, I moved the HOST_WIDE_INT fields out of rtunion
and into the main rtx_def union because it produced a significant
speed-up on n32 IRIX.  That was before tree-level optimisation,
but I don't think we've really pruned that much RTL optimisation
since then, so I'd be surprised if much has changed.
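
To make the size argument concrete, here is a minimal sketch. The structures are hypothetical stand-ins, not GCC's real rtx_def or rtunion. It assumes a 32-bit header followed by a union whose alignment is that of a pointer. Under that assumption, an extra 32-bit field typically fits into existing padding on an LP64 host but grows the structure by four bytes on an ILP32 host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT GCC's real rtx_def or rtunion.  */
union payload {
  void *ptr;               /* pointer-sized operand slot */
  int i;
};

struct node {              /* 32 bits of header, then the operands */
  uint16_t code;
  uint8_t mode;
  uint8_t flags;
  union payload u[1];
};

struct node_with_len {     /* same header plus an unconditional 32-bit field */
  uint16_t code;
  uint8_t mode;
  uint8_t flags;
  uint32_t len;
  union payload u[1];
};

/* Bytes by which the extra field grows the structure on this host ABI.  */
static size_t growth_in_bytes(void)
{
  return sizeof(struct node_with_len) - sizeof(struct node);
}
```

With 8-byte pointer alignment, growth_in_bytes() is expected to return 0, because the new field occupies what used to be padding. With 4-byte alignment it is expected to return 4.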

Richard


Re: [i386] recognize haddpd

2012-10-08 Thread Uros Bizjak
On Mon, Oct 8, 2012 at 9:36 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Mon, 8 Oct 2012, Uros Bizjak wrote:

 You missed the most important sseadd1 addition, the one that prevents
 checking of operand2 when calculating memory attribute:

  (and (eq_attr type
  !alu1,negnot,ishift1,
imov,imovx,icmp,test,bitmanip,
fmov,fcmp,fsgn,

 sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1,
sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt)
   (match_operand 2 memory_operand))

 Please note ! in the above expression.
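
 As an illustration of the negated form, the following is a small C model of how an eq_attr test with a comma-separated value list behaves. It is a sketch only: eq_attr_p and in_list_p are hypothetical names, whitespace inside the list is not handled, and this is not how genattrtab implements the test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if VALUE is exactly one of the comma-separated names in LIST.  */
static bool in_list_p(const char *value, const char *list)
{
  size_t len = strlen(value);
  const char *p = list;

  while (*p != '\0') {
    const char *end = strchr(p, ',');
    size_t n = end ? (size_t)(end - p) : strlen(p);

    if (n == len && strncmp(p, value, n) == 0)
      return true;
    if (end == NULL)
      break;
    p = end + 1;
  }
  return false;
}

/* Model of (eq_attr NAME SPEC): a leading '!' negates the whole list.  */
static bool eq_attr_p(const char *value, const char *spec)
{
  if (spec[0] == '!')
    return !in_list_p(value, spec + 1);
  return in_list_p(value, spec);
}
```

 In this model a type that is absent from a "!" list passes the test, so operand 2 is examined for it. Adding sseadd1 to the list makes the test fail for that type.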

 [...]

 Also note that you have to add handling of sseadd1 attribute in other
 (scheduler) *.md files. Simply grep for sseadd and add ,sseadd1
 everywhere.


 Thank you, it makes more sense now. The attached passed bootstrap+testsuite.
 I didn't know if I should be more precise in the ChangeLog, but it would
 make the ChangeLog as long as the patch with about 23 entries like:
 (define_insn_reservation bdver1_ssemuladd_256): Likewise

 Next goal would be to further recognize some DPPD potential uses, but that
 seems harder.


 2012-10-09  Marc Glisse  marc.gli...@inria.fr


 gcc/
 PR target/54400
 * config/i386/i386.md (type attribute): Add sseadd1.
 (unit attribute): Add support for sseadd1.
 (memory attribute): Likewise.
 * config/i386/athlon.md: Likewise.
 * config/i386/core2.md: Likewise.
 * config/i386/atom.md: Likewise.
 * config/i386/ppro.md: Likewise.
 * config/i386/bdver1.md: Likewise.

 * config/i386/sse.md (sse3_hplusminus_insnv2df3): split into...
 (sse3_haddv2df3): ... expander.
 (*sse3_haddv2df3): ... define_insn. Accept permuted operands.
 (sse3_hsubv2df3): ... define_insn.
 (*sse3_haddv2df3_low): New define_insn.
 (*sse3_hsubv2df3_low): New define_insn.

 gcc/testsuite/
 PR target/54400
 * gcc.target/i386/pr54400.c: New testcase.

OK for mainline SVN with a couple of small changes below ...

 +(define_insn *sse3_haddv2df3
[(set (match_operand:V2DF 0 register_operand =x,x)
 (vec_concat:V2DF
 - (plusminus:DF
 + (plus:DF
 +   (vec_select:DF
 + (match_operand:V2DF 1 register_operand 0,x)
 + (parallel [(match_operand:SI 3 const_0_to_1_operand)]))
 +   (vec_select:DF
 + (match_dup 1)
 + (parallel [(match_operand:SI 4 const_0_to_1_operand)])))
 + (plus:DF
 +   (vec_select:DF
 + (match_operand:V2DF 2 nonimmediate_operand xm,xm)
 + (parallel [(match_operand:SI 5 const_0_to_1_operand)]))
 +   (vec_select:DF
 + (match_dup 2)
 + (parallel [(match_operand:SI 6 const_0_to_1_operand)])]
 +  TARGET_SSE3 && INTVAL (operands[3]) != INTVAL (operands[4])
 +   && INTVAL (operands[5]) != INTVAL (operands[6])

Please put every && expression on its own line:

TARGET_SSE3
   && INTVAL (operands[3]) != INTVAL (operands[4])
   && INTVAL (operands[5]) != INTVAL (operands[6])
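
The reason the condition only requires the two selectors of each pair to differ can be shown with a scalar model. This is a hedged sketch with hypothetical names (haddpd_model, pattern_model, condition_holds), not GCC code, and it leaves out the TARGET_SSE3 part of the condition.

```c
/* Hypothetical scalar model of a V2DF value.  */
struct v2df { double e[2]; };

/* Horizontal add: lane 0 sums the lanes of A, lane 1 sums the lanes of B.  */
static struct v2df haddpd_model(struct v2df a, struct v2df b)
{
  struct v2df r = { { a.e[0] + a.e[1], b.e[0] + b.e[1] } };
  return r;
}

/* What the permuted pattern computes for selector operands 3 to 6.  */
static struct v2df pattern_model(struct v2df a, struct v2df b,
                                 int s3, int s4, int s5, int s6)
{
  struct v2df r = { { a.e[s3] + a.e[s4], b.e[s5] + b.e[s6] } };
  return r;
}

/* Mirrors the insn condition: each pair must select two different lanes.  */
static int condition_holds(int s3, int s4, int s5, int s6)
{
  return s3 != s4 && s5 != s6;
}
```

Because addition is commutative, any selector choice that satisfies the condition yields the same result as the plain horizontal add. A choice such as (0, 0) would add a lane to itself, which is why it has to be rejected.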

 +(define_insn *sse3_haddv2df3_low
 +  [(set (match_operand:DF 0 register_operand =x,x)
 +   (plus:DF
 + (vec_select:DF
 +   (match_operand:V2DF 1 register_operand 0,x)
 +   (parallel [(match_operand:SI 2 const_0_to_1_operand)]))
 + (vec_select:DF
 +   (match_dup 1)
 +   (parallel [(match_operand:SI 3 const_0_to_1_operand)]]
 +  TARGET_SSE3 && INTVAL (operands[2]) != INTVAL (operands[3])

Also here.

Thanks,
Uros.


Small cleanup/memory leak plugs for lto

2012-10-08 Thread Tobias Burnus

Some more issues found by Coverity scanner.

lto-cgraph.c: The code seems to be unused; besides, it's a zero-trip
loop, as parm_num is set to 0 and then checked for nonzeroness.


lto-opts: The check whether first_p is non-NULL is always false: all
calls have a variable ref as argument, and first_p is unconditionally
dereferenced.


lto_obj_file_open: One could additionally check whether lo is NULL, but
that check would then have to come directly after the XCNEW, as
lto_file_init already dereferences lo.
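
For readers unfamiliar with these findings, here is a reduced sketch of two of the patterns. All names (loop_trips, struct lto_obj, init_obj, open_obj) are hypothetical and the code is not taken from the lto sources. It only shows why a loop whose counter starts at zero never runs, and why a NULL test is only useful before the first dereference.

```c
#include <stddef.h>

/* Zero-trip loop: the counter starts at 0 and is tested for nonzero,
   so the body is dead code.  */
static int loop_trips(void)
{
  int trips = 0;
  int parm_num = 0;

  while (parm_num != 0) {
    trips++;
    parm_num--;
  }
  return trips;
}

/* Hypothetical stand-in for an object file handle.  */
struct lto_obj { int fd; };

/* Dereferences LO, so LO must already be known to be non-NULL here.  */
static void init_obj(struct lto_obj *lo)
{
  lo->fd = -1;
}

/* The NULL test comes before the first dereference; placing it after
   init_obj would make it useless.  Returns 1 on success, 0 otherwise.  */
static int open_obj(struct lto_obj *lo)
{
  if (lo == NULL)
    return 0;
  init_obj(lo);
  return 1;
}
```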


Built and regtested on x86-64-gnu-linux.

Tobias


patch.diff
Description: application/unknown


Re: [patch][lra] Improve initial program point density in lra-lives.c (RFA)

2012-10-08 Thread Vladimir Makarov

On 10/07/2012 02:52 PM, Steven Bosscher wrote:

On Sat, Oct 6, 2012 at 4:52 AM, Vladimir Makarov wrote:

Without this patch:
Compressing live ranges: from 700458 to 391665 - 55%, pre_count
40730653, post_count 34363983
max per-reg pre_count 12978 (228090, 2 defs, 2 uses) (reg/f:DI 228090
[ SR.25009 ])
max per-reg post_count 10967 (228090, 2 defs, 2 uses) (reg/f:DI 228090
[ SR.25009 ])

With this patch:
Compressing live ranges: from 700458 to 372585 - 53%, pre_count
283937, post_count 271120
max per-reg pre_count 545 (230653, 542 defs, 542 uses) (reg/f:DI
230653 [ SR.13303 ])
max per-reg post_count 544 (230649, 542 defs, 542 uses) (reg/f:DI
230649 [ SR.13305 ])

(the per-reg counts are the lengths of the live range chains for the
mentioned regno).
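
The percentages and the size of the pre_count reduction can be re-derived from the figures quoted above. The helpers below are a hypothetical sketch using truncating integer division, which is an assumption about how the dump rounds.

```c
#include <stdint.h>

/* PART as a whole-number percentage of WHOLE, truncated.  */
static uint64_t percent_of(uint64_t part, uint64_t whole)
{
  return part * 100 / whole;
}

/* How many times smaller AFTER is than BEFORE, truncated.  */
static uint64_t reduction_factor(uint64_t before, uint64_t after)
{
  return before / after;
}
```

With the numbers from the two dumps, percent_of(391665, 700458) gives 55 and percent_of(372585, 700458) gives 53, matching the reported ratios. reduction_factor(40730653, 283937) gives 143, so the patch cuts pre_count by roughly two orders of magnitude.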

Yes, that is impressive.  But I think, #points in a live range is a real
parameter of the complexity.

Yes, that's probably true, except for the compression stuff.

Here's the final patch, bootstrapped and tested on
x86_64-unknown-linux-gnu. OK for the LRA-branch?



Yes.  Thanks, Steven.

Optimizing live ranges is real fun.  I guess there is some potential
to improve the functions that check live range intersection and merging.
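
As a point of reference for what such a check involves, here is a minimal sketch of an intersection test over two live range lists. It assumes each list is sorted by increasing start point and holds disjoint closed ranges. The names struct live_range and ranges_intersect_p are hypothetical, and this is not the implementation in lra-lives.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical live range: closed interval [start, finish] of program
   points, chained in order of increasing start.  */
struct live_range {
  int start;
  int finish;
  struct live_range *next;
};

/* Walk both lists in step; cost is linear in the total chain length.  */
static bool ranges_intersect_p(const struct live_range *r1,
                               const struct live_range *r2)
{
  while (r1 != NULL && r2 != NULL) {
    if (r1->finish < r2->start)
      r1 = r1->next;            /* r1 ends before r2 begins */
    else if (r2->finish < r1->start)
      r2 = r2->next;            /* r2 ends before r1 begins */
    else
      return true;              /* neither precedes the other: overlap */
  }
  return false;
}
```

Since the walk visits every element of both chains in the worst case, shorter chains (fewer program points per range) directly reduce the cost of each check.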




Re: [Dwarf Fission] Implement Fission Proposal (issue6305113)

2012-10-08 Thread Jason Merrill

On 07/25/2012 07:54 PM, Sterling Augustine wrote:

On Wed, Jul 25, 2012 at 4:00 PM, Cary Coutant ccout...@google.com wrote:

Perhaps instead of having a val_index field in each attribute you should
have the attribute point to something like an indirect_string_node for
addresses as well.


The potential savings here didn't seem worth the effort of adding a
pass over another table to assign slots in .debug_addr. In practice,
we're seeing very few slots zeroed out here.


And how many duplicate entries?  What strategy does Cary's patch use to 
avoid those?



It also requires carefully watching when die sizes are measured--if
a leb128 fit inside a byte and then grows to need two bytes, all the
size and die_offset calculations will need to be redone.


I would expect it to be straightforward to assign the indices before 
calculating die sizes.
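
The size concern comes from the variable-length encoding of the index. A minimal sketch of an unsigned LEB128 size computation follows; the function name uleb128_size is hypothetical and this is not the dwarf2out.c routine.

```c
#include <stdint.h>

/* Number of bytes needed to encode VALUE as unsigned LEB128:
   seven payload bits per byte, at least one byte.  */
static unsigned uleb128_size(uint64_t value)
{
  unsigned size = 0;

  do {
    value >>= 7;
    size++;
  } while (value != 0);
  return size;
}
```

An index of 127 still fits in one byte, while 128 needs two. If indices were assigned after DIE sizes had been computed, crossing such a boundary would invalidate the sizes and offsets already calculated, which is why assigning them first avoids the problem.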



Deferring the choice of representation of the address until output time
should also avoid the need for the force_direct parameter on various
functions.


I'm not sure about that. Even if we build a hash table for slots in
.debug_addr, we'll still need to know when we call add_AT_addr or
add_AT_lbl_id whether or not we want to use an indirect reference. In
the cases where force_direct is true, we won't want to add the label
to the hash table.


Right. We would have to track it even with a hash table.


I was thinking that the context of the reference would determine whether 
you want a direct or indirect reference, in a way that would be clear 
when we go to write out the reference.  But if that isn't convenient, I 
don't mind determining it when we build the reference.
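
To illustrate the trade-off being discussed, here is a hedged sketch of a slot table that avoids duplicate entries and skips the table for direct references. Everything here is hypothetical (struct addr_table, addr_slot, MAX_SLOTS, NO_SLOT), and a linear search stands in for the hash table mentioned above to keep the example short.

```c
#include <string.h>

#define MAX_SLOTS 64     /* arbitrary bound for this sketch */
#define NO_SLOT   (-1)   /* reference is direct, or the table is full */

/* Hypothetical table of labels that get a slot in .debug_addr.  */
struct addr_table {
  const char *labels[MAX_SLOTS];
  int count;
};

/* Return the slot index for LABEL, reusing an existing slot when the
   label was seen before.  With FORCE_DIRECT set the label is never
   entered into the table.  */
static int addr_slot(struct addr_table *t, const char *label,
                     int force_direct)
{
  int i;

  if (force_direct)
    return NO_SLOT;
  for (i = 0; i < t->count; i++)
    if (strcmp(t->labels[i], label) == 0)
      return i;
  if (t->count == MAX_SLOTS)
    return NO_SLOT;
  t->labels[t->count] = label;
  return t->count++;
}
```

In this model the caller still has to decide on force_direct at the time the reference is built, which matches the point made above: a table removes duplicates but does not remove the need to track direct references.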


The added documentation for force_direct tells me what it means, but not 
when you would want to pass true or false.  What is the pattern here?


Jason



Re: [lra] another patch to speed more compilation of PR54146

2012-10-08 Thread Steven Bosscher
On Mon, Oct 8, 2012 at 10:25 PM, Vladimir Makarov vmaka...@redhat.com wrote:

 Actually I have a simpler and better patch:

Ah, lra_insn_recog_data, I couldn't find out how to get the insn itself :-)

The OOM you're seeing on gcc17 is probably because we're both working
on that machine. If we're both trying to compile slow.cc we're using
more memory than there's available on gcc17. I've moved to another
machine now.

Ciao!
Steven


[PATCH, i386]: Merge a couple of attributes in atom.md

2012-10-08 Thread Uros Bizjak
Hello!

2012-10-08  Uros Bizjak  ubiz...@gmail.com

* config/i386/atom.md (atom_sse_4): Merge atom_sse_attr attributes.
(atom_sse_5): Ditto.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.
Index: config/i386/atom.md
===
--- config/i386/atom.md (revision 19)
+++ config/i386/atom.md (working copy)
@@ -544,16 +544,14 @@
 (define_insn_reservation  atom_sse_4 1
   (and (eq_attr cpu atom)
(and (eq_attr type sse)
-(ior (eq_attr atom_sse_attr fence)
- (eq_attr atom_sse_attr prefetch
+(eq_attr atom_sse_attr fence,prefetch)))
   atom-simple-0)
 
 ;; rcpps, rsqrtss, sqrt, ldmxcsr
 (define_insn_reservation  atom_sse_5 7
   (and (eq_attr cpu atom)
(and (eq_attr type sse)
-(ior (ior (eq_attr atom_sse_attr sqrt)
-  (eq_attr atom_sse_attr mxcsr))
+(ior (eq_attr atom_sse_attr sqrt,mxcsr)
  (and (eq_attr atom_sse_attr rcp)
   (eq_attr mode V4SF)
   atom-complex, atom-all-eu*6)


Build failure with [PATCH] PR 53528 c++/ C++11 Generalized Attribute support

2012-10-08 Thread Hans-Peter Nilsson
 From: Dodji Seketeli do...@redhat.com
 Date: Mon, 8 Oct 2012 14:12:04 +0200

 Jason Merrill ja...@redhat.com writes:
 
  OK.
 
 Thanks.  Committed to trunk at revision r192199.

This caused a build failure, see PR54860.

brgds, H-P

