Re: [Patch,AVR]: Auto-generate all -mmcu= options in documentation.

2012-04-17 Thread Denis Chertykov
2012/4/16 Georg-Johann Lay a...@gjlay.de:
 This patch adds a new file ./gcc/doc/avr-mmcu.texi that lists all valid
 -mmcu= settings and replaces the respective text in invoke.texi by
 @include avr-mmcu.texi

 Up to now, there is no complete list of -mmcu= options, and a list is
 hard to maintain by hand because it contains more than 180 devices.

 If, during the build of avr-gcc, a change of ./gcc/doc/avr-mmcu.texi
 is detected, the build aborts with a message that nags the user to
 copy the new content of avr-mmcu.texi to ./gcc/doc/avr-mmcu.texi.

 The error message's text is:

 *** Verify that you have permission to grant a
 *** GFDL license for all new text in
 *** avr-mmcu.texi, then copy it to $(srcdir)/doc/avr-mmcu.texi

 ./gcc/doc/avr-mmcu.texi is auto-generated, but there is no rule to
 automatically update it during the build process.

 Documents (HTML, PDF, ...) build fine.

 Ok for trunk?

 If it's appropriate for 4.7, I'd change invoke.texi accordingly by
 copy-pasting the auto-generated texi code into that file, i.e.
 into section AVR Options.

 Johann

        * Makefile.in (TEXI_GCC_FILES): Add avr-mmcu.texi.

        * doc/avr-mmcu.texi: New auto-generated file.
        * doc/invoke.texi (AVR Options): Include avr-mmcu.texi in order
        to document all valid -mmcu= arguments.

        * config/avr/avr.h (arch_info_s): New struct definition.
        * config/avr/avr-devices.c (avr_texinfo): New variable.
        * config/avr/gen-avr-mmcu-texi.c: New file.
        * config/avr/t-avr: New rules and dependencies to build avr-mmcu.texi.

AVR port modifications approved, but I think that somebody else must
approve your changes to Makefile.in and doc/*.

Denis.


Re: [patch] Remove strange case cost code

2012-04-17 Thread Jan Hubicka
 Hello,
 
 There is code in stmt.c since the initial checkin, that tries to
 balance a switch tree according to some ascii heuristics. I see a
 couple of problems with this code:
 
 1. It doesn't seem to help much. With the attached patch to remove the
 code, I see no compile time changes to e.g. compile GCC itself.
 
 2. It isn't clear what the heuristic is based on (no reference to any
 testing done, or a reference to a book or paper).
 
 3. The heuristic is applied for case values in the range -1,127
 (inclusive) even if the type of the switch expression isn't char or
 int but e.g. an enum. This results in funny application of this
 heuristic in GCC itself to e.g. some cases of enum rtx_code and enum
 tree_code.

Note that it would make a lot of sense to teach this heuristic to predict.c
and properly identify chars.
Also it is possible to get histograms from profile feedback into
switch expansion. I always wanted to do that once switch expansion code
is cleaned up and moved to gimple level...

Honza
 
 
 The attached patch removes the heuristic.
 
 Bootstrapped and tested on powerpc-unknown-linux-gnu. OK for trunk?
 
 Ciao!
 Steven




Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-17 Thread Richard Sandiford
Vladimir Makarov vmaka...@redhat.com writes:
 On 04/10/2012 09:35 AM, Richard Sandiford wrote:
 Hi Vlad,

 Back in December, when we were still very much in stage 3, I sent
 an RFC about an alternative implementation of -fsched-pressure.
 Just wanted to send a reminder now that we're in the proper stage:

 http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html

 Ulrich has benchmarked it on ARM, S/390 and Power7 (thanks), and got
 reasonable results.  (I mentioned bad Power 7 results in that message,
 because of the way the VSX_REGS class is handled.  Ulrich's results
 are without -mvsx though.)

 The condition I originally set myself was that this patch should only
 go in if it becomes the default on at least one architecture,
 specifically ARM.  Ulrich tells me that Linaro have now made it
 the default for ARM in their GCC 4.7 release, so hopefully Ramana
 would be OK with doing the same in upstream 4.8.

 I realise the whole thing is probably more complicated and ad-hoc
 than you'd like.  Saying it can't go in is a perfectly acceptable
 answer IMO.

 I have a mixed feeling with the patch.  I've tried it on SPEC2000 on 
 x86/x86-64 and ARM.  Model algorithm generates bigger code up to 3.5% 
 (SPECFP on x86), 2% (SPECFP on x86-64), and 0.23% (SPECFP on ARM) in
 comparison with the current algorithm.

Yeah.  That's expected really, since one of the points of the patch is
to allow more spilling than the current algorithm does.

One of the main problems I was seeing with the current algorithm was
that it was too conservative, and prevented spilling even when it would
be beneficial.  This included spilling outside loops.  E.g. if you have
14 available GPRs, as on ARM, and have 10 pseudos live across a loop
but not used within it, the current algorithm uses 10 registers as the
starting pressure when scheduling the loop.  So the current algorithm
tries hard to avoid spilling those 10 registers, even if that restricts
the amount of reordering within the loop.  The new algorithm was supposed
to allow such registers to be spilled, but those extra spills would
increase code size.
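
As a rough illustration of the arithmetic in the paragraph above (a sketch
only; regs_left_for_loop is a made-up helper and not part of GCC's
scheduler): with 14 GPRs and 10 pseudos live across the loop, 4 registers
remain for the loop body unless some of the 10 are spilled.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code: registers left for scheduling a loop
   once the values that are live across it (and not spilled) are counted
   against the available set.  */
static int
regs_left_for_loop (int available, int live_across, int spilled)
{
  assert (spilled >= 0 && spilled <= live_across);
  assert (live_across - spilled <= available);
  return available - (live_across - spilled);
}
```

With no spills, regs_left_for_loop (14, 10, 0) leaves 4 registers for the
loop; spilling all ten gives the loop the full 14, at the cost of the extra
spill code mentioned above.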

 It is slower too.  Although the difference is quite insignificant on
 Core i7, the compiler slowdown reaches 0.4% on SPECFP2000 on ARM.

Hmm, that's not good.

 The algorithm also generates slower code on x86 (1.5% on SPECINT and
 5% on SPECFP2000) and practically the same average code on x86-64 and
 ARM (I've tried only SPECINT on ARM).

Yeah, that's underwhelming too.  It looks like there's a danger that
this would become an A8- and A9-specific pass, so we would really need
better ARM results than that.

 On the other hand, I don't think that 1st insn scheduling will be ever 
 used for x86.  And although the SPECFP2000 rate is the same on x86-64 I 
 saw that some SPECFP2000 tests benefit from your algorithm on x86-64 
 (one amazing difference is 70% improvement on swim on x86-64 although it 
 might be because of different reasons like alignment or cache 
 behaviour).  So I think the algorithm might work better on processors 
 with more registers.

Notwithstanding that this is a geomean, I assume there were some bad
results to cancel out the gain?

 As for the patch itself, I think you should document the option in 
 doc/invoke.texi.  It is missing.

I forgot to say, but that was deliberate.  I see this as a developer
option rather than a user option.  The important thing was to allow
backends to default to the new algorithm if they want to.  Providing
command-line control gives developers an easier way of seeing which
default makes sense, but I don't think the option should be advertised
to users.

 Another minor mistake I found is one line garbage (I guess from
 -fira-algorithm) in description of -fsched-pressure-algorithm in
 common.opt.

Oops, thanks :-)

Anyway, given those results and your mixed feelings, I think it would
be best to drop the patch.  It's a lot of code to carry around, and its
ad-hoc nature would make it hard to change in future.  There must be
better ways of achieving the same thing.

Richard


Re: [patch] Remove strange case cost code

2012-04-17 Thread Richard Guenther
On Tue, Apr 17, 2012 at 8:49 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Hello,

 There is code in stmt.c since the initial checkin, that tries to
 balance a switch tree according to some ascii heuristics. I see a
 couple of problems with this code:

 1. It doesn't seem to help much. With the attached patch to remove the
 code, I see no compile time changes to e.g. compile GCC itself.

 2. It isn't clear what the heuristic is based on (no reference to any
 testing done, or a reference to a book or paper).

 3. The heuristic is applied for case values in the range -1,127
 (inclusive) even if the type of the switch expression isn't char or
 int but e.g. an enum. This results in funny application of this
 heuristic in GCC itself to e.g. some cases of enum rtx_code and enum
 tree_code.

 Note that it would make a lot of sense to teach this heuristic to predict.c
 and properly identify chars.

Indeed this would be the proper place to implement this logic.

 Also it is possible to get histograms from profile feedback into
 switch expansion. I always wanted to do that once switch expansion code
 is cleaned up and moved to gimple level...

Indeed.  At least the parts that expand switch stmts to (balanced) trees
should be moved to the GIMPLE level, retaining only the table-jump-like
expansions as switch stmts.



 The attached patch removes the heuristic.

 Bootstrapped and tested on powerpc-unknown-linux-gnu. OK for trunk?

Ok.

Thanks,
Richard.

 Ciao!
 Steven




Re: [patch] Remove strange case cost code

2012-04-17 Thread Jan Hubicka
  Note that it would make a lot of sense to teach this heuristic to predict.c
  and properly identify chars.
 
 Indeed this would be the proper place to implement this logic.

To a degree - switch expansion needs more info than it can obtain from edge
profile.  Having
switch
  case 1,3,5,7,8,9: aaa
  case 2,4,6,8,10,12: bbb
to produce a well-balanced decision tree, it is not enough to know how
often the value is even and how often it is odd...

Thus there is a need for value histograms.
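
A small sketch of the difference, under the simplifying assumption that odd
values go to one target and even values to the other (the struct and
function names are invented for illustration; this is not GCC's profile
code): the edge profile keeps one counter per outgoing edge, while the value
histogram keeps one counter per case value.

```c
#include <assert.h>

enum { MAX_VALUE = 12 };

/* What an edge profile records for a two-target switch:
   one counter per outgoing edge.  */
struct edge_profile
{
  unsigned long odd_edge;
  unsigned long even_edge;
};

/* What a value histogram records: one counter per case value.  */
struct value_histogram
{
  unsigned long count[MAX_VALUE + 1];
};

/* Record one execution of the switch with controlling value V.  */
static void
record (int v, struct edge_profile *ep, struct value_histogram *vh)
{
  assert (v >= 1 && v <= MAX_VALUE);
  if (v % 2)
    ep->odd_edge++;
  else
    ep->even_edge++;
  vh->count[v]++;
}
```

After many calls the edge profile can only say how often each target was
reached; the histogram can also say that, for example, the value 2 dominates
among the even cases, which is what a balanced decision tree would need.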
 
  Also it is possible to get histograms from profile feedback into
  switch expansion. I always wanted to do that once switch expansion code
  is cleaned up and moved to gimple level...
 
 Indeed.  At least the parts that expand switch stmts to (balanced) trees
 should be moved to the GIMPLE level, retaining only the table-jump-like
 expansions as switch stmts.

Yep.
Honza
 
 
 
  The attached patch removes the heuristic.
 
  Bootstrapped and tested on powerpc-unknown-linux-gnu. OK for trunk?
 
 Ok.
 
 Thanks,
 Richard.
 
  Ciao!
  Steven
 
 


Re: [patch] Remove strange case cost code

2012-04-17 Thread Paolo Bonzini
Il 17/04/2012 10:45, Richard Guenther ha scritto:
  Also it is possible to get histograms from profile feedback into
  switch expansion. I always wanted to do that once switch expansion code
  is cleaned up and moved to gimple level...
 
 Indeed.  At least the parts that expand switch stmts to (balanced) trees
 should be moved to the GIMPLE level, retaining only the table-jump-like
 expansions as switch stmts.

This would also make it much easier to drop the range checking from
switch statements (VRP would just fold those away).  Also, targets could
choose between casesi and tablejump.  ARM can benefit from that.
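
For illustration, a switch of the kind where a range check is redundant (a
sketch only; whether VRP actually folds the check depends on how the switch
is lowered): the controlling expression is masked to 0..3 and every value in
that range has a case.

```c
#include <assert.h>

/* Illustrative only: x & 3 is always in 0..3 and all four values are
   covered, so a bounds check in front of a jump table can never fail.  */
static int
classify (unsigned x)
{
  switch (x & 3)
    {
    case 0:
      return 10;
    case 1:
      return 11;
    case 2:
      return 12;
    case 3:
      return 13;
    }
  assert (!"unreachable");
  return -1;
}
```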

Paolo


Re: [PATCH] Prevent 'control reaches end of non-void function' warning for DO_WHILE

2012-04-17 Thread Tom de Vries
On 16/04/12 16:23, Jason Merrill wrote:
 On 04/14/2012 05:43 PM, Tom de Vries wrote:
  +  tree expr = NULL;
  +  append_to_statement_list (*block,expr);
  +  *block = expr;

  Rather than doing this dance here, I think it would be better to enhance
  append_to_statement_list to handle the case of the list argument being a
  non-list.

 Added return value to append_to_statement_list, so now it's:

 *block = append_to_statement_list (*block, NULL);
 
 That's different from what I was suggesting; if the list argument is a 
 pointer to a non-list, we can build up a list for it at that time, so we
 don't need the
 
 +  *block = append_to_statement_list (*block, NULL);
 
 line at all; when we see
 
 +  append_to_statement_list (build1 (LABEL_EXPR, void_type_node, label),
 +block);
 
 if *block isn't a STATEMENT_LIST we just make the necessary adjustments.
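
The idea can be sketched with a toy model (assumption: struct node, make_stmt,
list_push and append_stmt are invented here and only mimic the shape of the
problem; they are not GCC's tree or tree-iterator API). When the target is
a bare statement rather than a list, a list is built on the spot and the
existing statement becomes its first element.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model, not GCC's tree type: a node is either a single statement
   or a list of statements.  */
enum kind { STMT, LIST };

struct node
{
  enum kind kind;
  int id;              /* payload for STMT nodes */
  struct node **elts;  /* elements for LIST nodes */
  size_t n;            /* number of elements for LIST nodes */
};

static struct node *
make_stmt (int id)
{
  struct node *s = calloc (1, sizeof *s);
  assert (s);
  s->kind = STMT;
  s->id = id;
  return s;
}

static void
list_push (struct node *list, struct node *t)
{
  struct node **e = realloc (list->elts, (list->n + 1) * sizeof *e);
  assert (e);
  list->elts = e;
  list->elts[list->n++] = t;
}

/* Append T to *LIST_P.  If *LIST_P is empty or a bare statement, build a
   list first, keeping any existing statement as its first element.  */
static void
append_stmt (struct node *t, struct node **list_p)
{
  struct node *list = *list_p;
  if (!list || list->kind != LIST)
    {
      struct node *first = list;
      *list_p = list = calloc (1, sizeof *list);
      assert (list);
      list->kind = LIST;
      if (first)
        list_push (list, first);
    }
  list_push (list, t);
}
```

With this shape the caller never has to convert its block into a list
beforehand; appending a label to a single-statement body just works.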
 

I see. Patch adapted, bootstrapped and reg-tested on x86_64.

ok for trunk?

Thanks,
- Tom

 Jason

2012-04-17  Tom de Vries  t...@codesourcery.com

* tree-iterator.c (append_to_statement_list_1): Handle case that *list_p
is not a STMT_LIST.

* cp-gimplify.c (begin_bc_block): Add location parameter and use as
location argument to create_artificial_label.
(finish_bc_block): Change return type to void.  Remove body_seq
parameter, and add block parameter.  Append label to STMT_LIST and
return in block.
(gimplify_cp_loop, gimplify_for_stmt, gimplify_while_stmt)
(gimplify_do_stmt, gimplify_switch_stmt): Remove function.
(genericize_cp_loop, genericize_for_stmt, genericize_while_stmt)
(genericize_do_stmt, genericize_switch_stmt, genericize_continue_stmt)
(genericize_break_stmt, genericize_omp_for_stmt): New function.
(cp_gimplify_omp_for): Remove bc_continue processing.
(cp_gimplify_expr): Genericize VEC_INIT_EXPR.
(cp_gimplify_expr): Mark FOR_STMT, WHILE_STMT, DO_STMT, SWITCH_STMT,
CONTINUE_STMT, and BREAK_STMT as unreachable.
(cp_genericize_r): Genericize FOR_STMT, WHILE_STMT, DO_STMT,
SWITCH_STMT, CONTINUE_STMT, BREAK_STMT and OMP_FOR.
(cp_genericize_tree): New function, factored out of ...
(cp_genericize): ... this function.

* g++.dg/pr51264-4.C: New test.
Index: gcc/tree-iterator.c
===
--- gcc/tree-iterator.c (revision 185028)
+++ gcc/tree-iterator.c (working copy)
@@ -74,6 +74,13 @@ append_to_statement_list_1 (tree t, tree
 	}
   *list_p = list = alloc_stmt_list ();
 }
+  else if (TREE_CODE (list) != STATEMENT_LIST)
+{
+  tree first = list;
+  *list_p = list = alloc_stmt_list ();
+  i = tsi_last (list);
+  tsi_link_after (i, first, TSI_CONTINUE_LINKING);
+}
 
   i = tsi_last (list);
   tsi_link_after (i, t, TSI_CONTINUE_LINKING);
Index: gcc/cp/cp-gimplify.c
===
--- gcc/cp/cp-gimplify.c (revision 185028)
+++ gcc/cp/cp-gimplify.c (working copy)
@@ -34,6 +34,11 @@ along with GCC; see the file COPYING3.
 #include "flags.h"
 #include "splay-tree.h"
 
+/* Forward declarations.  */
+
+static tree cp_genericize_r (tree *, int *, void *);
+static void cp_genericize_tree (tree*);
+
 /* Local declarations.  */
 
 enum bc_t { bc_break = 0, bc_continue = 1 };
@@ -45,37 +50,36 @@ static tree bc_label[2];
 /* Begin a scope which can be exited by a break or continue statement.  BC
indicates which.
 
-   Just creates a label and pushes it into the current context.  */
+   Just creates a label with location LOCATION and pushes it into the current
+   context.  */
 
 static tree
-begin_bc_block (enum bc_t bc)
+begin_bc_block (enum bc_t bc, location_t location)
 {
-  tree label = create_artificial_label (input_location);
+  tree label = create_artificial_label (location);
   DECL_CHAIN (label) = bc_label[bc];
   bc_label[bc] = label;
   return label;
 }
 
 /* Finish a scope which can be exited by a break or continue statement.
-   LABEL was returned from the most recent call to begin_bc_block.  BODY is
+   LABEL was returned from the most recent call to begin_bc_block.  BLOCK is
an expression for the contents of the scope.
 
If we saw a break (or continue) in the scope, append a LABEL_EXPR to
-   body.  Otherwise, just forget the label.  */
+   BLOCK.  Otherwise, just forget the label.  */
 
-static gimple_seq
-finish_bc_block (enum bc_t bc, tree label, gimple_seq body)
+static void
+finish_bc_block (tree *block, enum bc_t bc, tree label)
 {
   gcc_assert (label == bc_label[bc]);
 
   if (TREE_USED (label))
-{
-  gimple_seq_add_stmt (body, gimple_build_label (label));
-}
+append_to_statement_list (build1 (LABEL_EXPR, void_type_node, label),
+			  block);
 
   bc_label[bc] = DECL_CHAIN (label);
   DECL_CHAIN (label) = NULL_TREE;
-  return body;
 }
 
 /* Get the LABEL_EXPR to 

Re: [PATCH] Dissociate store_expr's temp from exp so that it is not marked as addressable

2012-04-17 Thread Martin Jambor
Hi,

On Thu, Apr 12, 2012 at 07:21:12PM +0200, Eric Botcazou wrote:
  Well, the commit did not add a testcase and when I looked up the patch
  in the mailing list archive
  (http://gcc.gnu.org/ml/gcc-patches/2006-11/msg01449.html) it said it
  was fixing problems not reproducible on trunk so it's basically
  impossible for me to evaluate whether it is still necessary by some
  simple testing.  Having said that, I guess I can give it a round of
  regular testing on all the platforms I have currently set up.
 
 The problem was that, for the same address, you had the alias set of the type 
 on one MEM and the alias set of the reference on the other MEM.  If the alias 
 set of the reference doesn't conflict with that of the type (this can happen 
 in Ada because of DECL_NONADDRESSABLE_P), the RAW dependency may be missed.
 
 If we don't put the alias set of the reference on one of the MEM, then I
 don't think that we need to put it on the other MEM.  That's what's done
 for the first, non-bitfield temporary now.
 
  2012-04-10  Martin Jambor  mjam...@suse.cz
 
  * expr.c (expand_expr_real_1): Pass type, not the expression, to
  set_mem_attributes for a memory temporary. Do not call the function
  for the memory temporary created for a bitfield.
 
 Fine with me, but the now dangling code in the bitfield case is a bit 
 annoying.

In order to alleviate that feeling, I'd like to propose the following
patch, which I have successfully bootstrapped and tested on
x86_64-linux (including Ada and obj-c++), i686-linux (likewise),
sparc64-linux (with Ada but without Java), ia64-linux (default
languages, i.e. without Ada) and ppc64-linux (likewise).  Testsuite
run (no bootstrap) on hppa-linux (C and C++ only) is still running and
I expect to have results tomorrow.

Thus, OK for trunk?

Thanks,

Martin


2012-04-16  Martin Jambor  mjam...@suse.cz

* expr.c (expand_expr_real_1): Remove setting parent's alias set for
temporaries created for a bitfield (reverting revision 122014).

Index: src/gcc/expr.c
===
--- src.orig/gcc/expr.c
+++ src/gcc/expr.c
@@ -9866,19 +9866,11 @@ expand_expr_real_1 (tree exp, rtx target
   necessarily be constant.  */
if (mode == BLKmode)
  {
-   HOST_WIDE_INT size = GET_MODE_BITSIZE (ext_mode);
rtx new_rtx;
 
-   /* If the reference doesn't use the alias set of its type,
-  we cannot create the temporary using that type.  */
-   if (component_uses_parent_alias_set (exp))
- {
-   new_rtx = assign_stack_local (ext_mode, size, 0);
-   set_mem_alias_set (new_rtx, get_alias_set (exp));
- }
-   else
- new_rtx = assign_stack_temp_for_type (ext_mode, size, 0, type);
-
+   new_rtx = assign_stack_temp_for_type (ext_mode,
+  GET_MODE_BITSIZE (ext_mode),
+  0, type);
emit_move_insn (new_rtx, op0);
op0 = copy_rtx (new_rtx);
PUT_MODE (op0, BLKmode);



Re: [PATCH, i386, Android] Add Android support for i386 target

2012-04-17 Thread Ilya Enkovich

 It has nothing but defines for Android. It did not move any existing
 code to this file.


 Adding linux-common.h to i386 backend needs approval from
 i386 backend maintainer.   If a patch also adds Android support,
 i386 backend maintainer may not feel comfortable to review it.
 However, if you simply add linux-common.h with XXX_SPEC,
 i386 backend maintainer can review it easily.

 --
 H.J.

All XXX_SPEC in linux-common.h are Android related. I also believe
that my Android specific changes need i386 backend maintainer approval
anyway because wrong Android support implementation may break other
targets.

Could one of the maintainers please tell me whether this patch needs to be
split?

Thanks,
Ilya


Re: [patch] Remove strange case cost code

2012-04-17 Thread Steven Bosscher
On Tue, Apr 17, 2012 at 10:45 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 Also it is possible to get histograms from profile feedback into
 switch expansion. I always wanted to do that once switch expansion code
 is cleaned up and moved to gimple level...

 Indeed.  At least the parts that expand switch stmts to (balanced) trees
 should be moved to the GIMPLE level, retaining only the table-jump-like
 expansions as switch stmts.

My goal for GCC 4.8 is to do just that: Move switch expansion to
GIMPLE and add value profiling for switch expressions. I may put back
that heuristic as a branch predictor, but I doubt it makes much of a
difference. Besides, it is actually hard to figure out whether a
switch expression is for characters in an ascii string because char is
promoted to int.
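
A small sketch of why the type is of little help (in_ascii_window is a
hypothetical check mirroring the -1..127 window of the removed heuristic;
it is not the code from stmt.c): an enum value passes the range test just as
a character does, and a char operand has already been promoted to int by the
time the cases are matched.

```c
#include <assert.h>

enum color { RED, GREEN, BLUE };

/* Hypothetical check mirroring the -1..127 window of the removed
   heuristic.  */
static int
in_ascii_window (int v)
{
  return v >= -1 && v <= 127;
}

/* Size of a char operand after the integer promotions, which are also
   applied to the controlling expression of a switch.  */
static unsigned
promoted_char_size (char c)
{
  assert (sizeof (+c) == sizeof (int));
  return (unsigned) sizeof (+c);
}
```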

Ciao!
Steven


[C++ Patch] PR 52599

2012-04-17 Thread Paolo Carlini

Hi,

in order to avoid this ICE on invalid code, I think it makes sense to
explicitly check for a function-try-block in massage_constexpr_body, since
7.1.5/4 explicitly rules out such a function-body for constexpr constructors.


Tested x86_64-linux.

Thanks,
Paolo.

///
/cp
2012-04-17  Paolo Carlini  paolo.carl...@oracle.com

PR c++/52599
* semantics.c (massage_constexpr_body): Check for function-try-block
as constructor function-body.

/testsuite
2012-04-17  Paolo Carlini  paolo.carl...@oracle.com

PR c++/52599
* g++.dg/cpp0x/constexpr-ctor10.C: New.
Index: testsuite/g++.dg/cpp0x/constexpr-ctor10.C
===
--- testsuite/g++.dg/cpp0x/constexpr-ctor10.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-ctor10.C   (revision 0)
@@ -0,0 +1,6 @@
+// PR c++/52599
+// { dg-options "-std=c++11" }
+
+struct foo {
+  constexpr foo() try { } catch(...) { };  // { dg-error "constructor" }
+};
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 186523)
+++ cp/semantics.c  (working copy)
@@ -6001,8 +6001,18 @@ static tree
 massage_constexpr_body (tree fun, tree body)
 {
   if (DECL_CONSTRUCTOR_P (fun))
-body = build_constexpr_constructor_member_initializers
-  (DECL_CONTEXT (fun), body);
+{
+  if (TREE_CODE (body) == BIND_EXPR
+  && TREE_CODE (BIND_EXPR_BODY (body)) == TRY_BLOCK)
+   {
+ error ("body of %<constexpr%> constructor cannot be a "
+"function-try-block");
+ return error_mark_node;
+   }
+
+  body = build_constexpr_constructor_member_initializers
+   (DECL_CONTEXT (fun), body);
+}
   else
 {
   if (TREE_CODE (body) == EH_SPEC_BLOCK)


[PATCH] Fix loop bound computation based on undefined behavior

2012-04-17 Thread Richard Guenther

Loop bound computation uses undefined behavior when accessing arrays
outside of their domain.  Unfortunately, while it tries to honor
issues with trailing arrays in allocated storage, its implementation
is broken (for one, it does consider a TYPE_DECL after the array
as a sign that the array is not at struct end).  The following patch
moves array_at_struct_end_p to expr.c near its natural user
(it's also used by graphite) and re-implements it.  It also adjusts
array_ref_up_bound to not return any bound in the case of an
access to a trailing array - at present what it returns is a
conservative answer in the wrong sense in two of its four callers
(it returns a lower bound for the upper bound).  Given that
array_ref_low_bound returns an exact answer, not returning any
lower or upper bound for the upper bound, but only what we would
consider exact, sounds like the most reasonable solution.
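
For readers unfamiliar with the idiom, here is a minimal sketch of a trailing
array in allocated storage (struct packet and its helpers are invented for
illustration). A C99 flexible array member is used so that the sketch is well
defined; older code often declares the last member with a bound of 1 and
over-allocates in the same way, which is the case where the declared bound
says nothing about how far a loop may legitimately index.

```c
#include <assert.h>
#include <stdlib.h>

/* Trailing array in allocated storage.  The number of usable elements is
   decided at allocation time, not by the declaration.  */
struct packet
{
  size_t len;
  int data[];
};

static struct packet *
packet_alloc (size_t len)
{
  struct packet *p = malloc (sizeof *p + len * sizeof p->data[0]);
  assert (p);
  p->len = len;
  for (size_t i = 0; i < len; i++)
    p->data[i] = (int) i;
  return p;
}

/* The loop bound comes from p->len; nothing about the array's declared
   type limits the iteration count.  */
static long
packet_sum (const struct packet *p)
{
  long sum = 0;
  for (size_t i = 0; i < p->len; i++)
    sum += p->data[i];
  return sum;
}
```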

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

This does not yet fully recover bootstrap if you make use of
undefined behavior loop bound detection in VRP, but two miscompiles
of GCC vanish.

Richard.

2012-04-17  Richard Guenther  rguent...@suse.de

* tree-flow.h (array_at_struct_end_p): Move declaration ...
* tree.h (array_at_struct_end_p): ... here.
* tree-ssa-loop-niter.c (idx_infer_loop_bounds): Infer nothing
from array references at struct ends.
(array_at_struct_end_p): Move ...
* expr.c (array_at_struct_end_p): ... here.  Rewrite.
(array_ref_up_bound): Return NULL_TREE for array references
at struct ends.

Index: gcc/tree.h
===
*** gcc/tree.h  (revision 186496)
--- gcc/tree.h  (working copy)
*** extern bool contains_packed_reference (c
*** 5068,5073 
--- 5068,5075 
  
  extern tree array_ref_element_size (tree);
  
+ bool array_at_struct_end_p (tree);
+ 
  /* Return a tree representing the lower bound of the array mentioned in
 EXP, an ARRAY_REF or an ARRAY_RANGE_REF.  */
  
Index: gcc/expr.c
===
*** gcc/expr.c  (revision 186496)
--- gcc/expr.c  (working copy)
*** array_ref_low_bound (tree exp)
*** 6778,6783 
--- 6778,6820 
return build_int_cst (TREE_TYPE (TREE_OPERAND (exp, 1)), 0);
  }
  
+ /* Returns true if REF is an array reference to an array at the end of
+a structure.  If this is the case, the array may be allocated larger
+than its upper bound implies.  */
+ 
+ bool
+ array_at_struct_end_p (tree ref)
+ {
+   if (TREE_CODE (ref) != ARRAY_REF
+       && TREE_CODE (ref) != ARRAY_RANGE_REF)
+ return false;
+ 
+   while (handled_component_p (ref))
+ {
+   /* If the reference chain contains a component reference to a
+  non-union type and there follows another field the reference
+is not at the end of a structure.  */
+   if (TREE_CODE (ref) == COMPONENT_REF
+  && TREE_CODE (TREE_TYPE (TREE_OPERAND (ref, 0))) == RECORD_TYPE)
+   {
+ tree nextf = DECL_CHAIN (TREE_OPERAND (ref, 1));
+ while (nextf && TREE_CODE (nextf) != FIELD_DECL)
+   nextf = DECL_CHAIN (nextf);
+ if (nextf)
+   return false;
+   }
+ 
+   ref = TREE_OPERAND (ref, 0);
+ }
+ 
+   /* If the reference is based on a declared entity, the size of the array
+  is constrained by its given domain.  */
+   if (DECL_P (ref))
+ return false;
+ 
+   return true;
+ }
+ 
  /* Return a tree representing the upper bound of the array mentioned in
 EXP, an ARRAY_REF or an ARRAY_RANGE_REF.  */
  
*** array_ref_up_bound (tree exp)
*** 6789,6795 
/* If there is a domain type and it has an upper bound, use it, substituting
   for a PLACEHOLDER_EXPR as needed.  */
if (domain_type && TYPE_MAX_VALUE (domain_type))
! return SUBSTITUTE_PLACEHOLDER_IN_EXPR (TYPE_MAX_VALUE (domain_type), exp);
  
/* Otherwise fail.  */
return NULL_TREE;
--- 6826,6843 
/* If there is a domain type and it has an upper bound, use it, substituting
   for a PLACEHOLDER_EXPR as needed.  */
if (domain_type && TYPE_MAX_VALUE (domain_type))
! {
!   tree max = TYPE_MAX_VALUE (domain_type);
! 
!   /* For references to arrays at the end of dynamically allocated
!  structures TYPE_MAX_VALUE is not an upper bound for the array
!size.  */
!   if (TREE_CODE (max) == INTEGER_CST
!  && array_at_struct_end_p (exp))
!   return NULL_TREE;
! 
!   return SUBSTITUTE_PLACEHOLDER_IN_EXPR (max, exp);
! }
  
/* Otherwise fail.  */
return NULL_TREE;
Index: gcc/tree-flow.h
===
*** gcc/tree-flow.h (revision 186496)
--- gcc/tree-flow.h (working copy)
*** tree find_loop_niter (struct loop *, edg
*** 686,692 
  tree loop_niter_by_eval (struct loop *, edge);
  tree 

Re: [patch] Remove strange case cost code

2012-04-17 Thread Jan Hubicka
 On Tue, Apr 17, 2012 at 10:45 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
  Also it is possible to get histograms from profile feedback into
  switch expansion. I always wanted to do that once switch expansion code
  is cleaned up and moved to gimple level...
 
  Indeed.  At least the parts that expand switch stmts to (balanced) trees
  should be moved to the GIMPLE level, retaining only the table-jump-like
  expansions as switch stmts.
 
 My goal for GCC 4.8 is to do just that: Move switch expansion to
 GIMPLE and add value profiling for switch expressions. I may put back
 that heuristic as a branch predictor, but I doubt it makes much of a
 difference. Besides, it is actually hard to figure out whether a

I have my doubts, too; this is why it is not implemented.  Let's see if the
removal changes anything.
Currently the branch prediction code is extraordinarily stupid on switches.
We still could do better - i.e. be able to combine other types of predictions
on them (i.e. switch edge leading to abort() is unlikely) and eventually we
could teach VRP about value range histograms.

 switch expression is for characters in an ascii string because char is
 promoted to int.

Good plan! Can you two synchronize your efforts, please?

Honza
 
 Ciao!
 Steven


Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls

2012-04-17 Thread Richard Guenther
On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries tom_devr...@mentor.com wrote:
 On 27/01/12 21:37, Tom de Vries wrote:
 On 24/01/12 11:40, Richard Guenther wrote:
 On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries tom_devr...@mentor.com 
 wrote:
 Richard,
 Jakub,

 the following patch fixes PR51879.

 Consider the following test-case:
 ...
 int bar (int);
 void baz (int);

 void
 foo (int y)
 {
  int a;
  if (y == 6)
    a = bar (7);
  else
    a = bar (7);
  baz (a);
 }
 ...

 after compiling at -O2, the representation looks like this before
 tail-merging:
 ...
  # BLOCK 3 freq:1991
  # PRED: 2 [19.9%]  (true,exec)
  # .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)>
  # USE = nonlocal
  # CLB = nonlocal
  aD.1709_3 = barD.1703 (7);
  goto <bb 5>;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:8009
  # PRED: 2 [80.1%]  (false,exec)
  # .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)>
  # USE = nonlocal
  # CLB = nonlocal
  aD.1709_4 = barD.1703 (7);
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 5 freq:1
  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
  # aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)>
  # .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)>
  # .MEMD.1714_9 = VDEF <.MEMD.1714_5>
  # USE = nonlocal
  # CLB = nonlocal
  bazD.1705 (aD.1709_1);
  # VUSE <.MEMD.1714_9>
  return;
 ...

 the patch allows aD.1709_4 to be value numbered to aD.1709_3, and 
 .MEMD.1714_8
 to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5.

 The patch makes sure non-const/pure call results (gimple_vdef and
 gimple_call_lhs) are properly value numbered.

 Bootstrapped and reg-tested on x86_64.

 ok for stage1?

 The following cannot really work:

 @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl
    result = vn_reference_lookup_1 (vr1, NULL);
    if (result)
      {
 -      changed = set_ssa_val_to (lhs, result);
 +      tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result));
 +      if (vdef)
 +       changed |= set_ssa_val_to (vdef, result_vdef);
 +      changed |= set_ssa_val_to (lhs, result);

 because 'result' may be not an SSA name.  It might also not have
 a proper definition statement (if VN_INFO (result)->needs_insertion
 is true).  So you at least need to guard things properly.


 Right. And that also doesn't work if the function is without lhs, such as in 
 the
 new test-case pr51879-6.c.

 I fixed this by storing both lhs and vdef, such that I don't have to derive
 the vdef from the lhs.

 (On a side-note - I _did_ want to remove value-numbering virtual operands
 at some point ...)


 Doing so will hurt performance of tail-merging in its current form.
 OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that
 value numbering as used in tail-merging has its limitations too.
 Do you have any ideas how to address that one?

 @@ -3359,8 +3366,10 @@ visit_use (tree use)
           /* ???  We should handle stores from calls.  */
           else if (TREE_CODE (lhs) == SSA_NAME)
             {
 +             tree vuse = gimple_vuse (stmt);
               if (!gimple_call_internal_p (stmt)
 -                  && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
 +                  && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
 +                     || (vuse && SSA_VAL (vuse) != VN_TOP)))
                 changed = visit_reference_op_call (lhs, stmt);
               else
                 changed = defs_to_varying (stmt);

 ... exactly because of the issue that a stmt has multiple defs.  Btw,
 vuse should have been visited here or be part of our SCC, so, why do
 you need this check?


 Removed now, that was a workaround for a bug in an earlier version of the 
 patch,
 that I didn't need anymore.

 Bootstrapped and reg-tested on x86_64.

 OK for stage1?


 Richard,

 quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html:
 ...
 I think these fixes hint at that we should
 use structural equality as fallback if value-numbering doesn't equate
 two stmt effects.  Thus, treat two stmts with exactly the same operands
 and flags as equal and using value-numbering to canonicalize operands
 (when they are SSA names) for that comparison, or use VN entirely
 if there are no side-effects on the stmt.

 Changing value-numbering of virtual operands, even if it looks correct in the
 simple cases you change, doesn't look like a general solution for the missed
 tail merging opportunities.
 ...

 The test-case pr51879-6.c shows a case where improving value numbering will 
 help
 tail-merging, but structural equality comparison not:
 ...
  # BLOCK 3 freq:1991
  # PRED: 2 [19.9%]  (true,exec)
  # .MEMD.1717_7 = VDEF <.MEMD.1717_6(D)>
  # USE = nonlocal
  # CLB = nonlocal
  blaD.1708 (5);
  # .MEMD.1717_8 = VDEF <.MEMD.1717_7>
  # USE = nonlocal
  # CLB = nonlocal
  aD.1712_3 = barD.1704 (7);
  goto <bb 5>;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:8009
  # PRED: 2 [80.1%]  (false,exec)
  # .MEMD.1717_9 = VDEF <.MEMD.1717_6(D)>
  # USE = nonlocal
  # CLB = nonlocal
  blaD.1708 (5);

[Fixinclude]: Fix typo and default to twoprocess on VMS

2012-04-17 Thread Tristan Gingold
Hi,

one-process methodology cannot be used on VMS because fork/pipe/dup2 aren't 
fully supported.  To avoid a build failure, it is therefore better to build 
using two-process methodology.

But, when twoprocess is selected, gcc emits a warning due to a missing 
specifier in printf.  The patch fixes that.

Manually tested on x86_64-darwin by configuring with --enable-twoprocess.

I am pretty sure that fixinclude cannot be used as-is on VMS due to the 
filename convention and missing shell, but at least we can build a cross and a 
native canadian on UNIX.

Ok for trunk ?

Tristan.

fixincludes/
2012-04-17  Tristan Gingold  ging...@adacore.com

* fixincl.c (fix_with_system): Add missing specifier.
* configure.ac: Default to twoprocess on vms.
* configure: Regenerate.

diff --git a/fixincludes/configure.ac b/fixincludes/configure.ac
index e7de791..f1fb2ff 100644
--- a/fixincludes/configure.ac
+++ b/fixincludes/configure.ac
@@ -53,7 +53,8 @@ fi],
i?86-*-msdosdjgpp* | \
i?86-*-mingw32* | \
x86_64-*-mingw32* | \
-   *-*-beos* )
+   *-*-beos* | \
+*-*-*vms*)
TARGET=twoprocess
;;
 
diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
index 9f399ab..1133534 100644
--- a/fixincludes/fixincl.c
+++ b/fixincludes/fixincl.c
@@ -829,7 +829,7 @@ fix_with_system (tFixDesc* p_fixd,
   /*
*  Now add the fix number and file names that may be needed
*/
-  sprintf (pz_scan, " %ld '%s' '%s'",  (long) (p_fixd - fixDescList),
+  sprintf (pz_scan, " %ld '%s' '%s' '%s'", (long) (p_fixd - fixDescList),
   pz_fix_file, pz_file_source, pz_temp_file);
 }
   else /* NOT an internal fix: */
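
The mismatch fixed in the hunk above can be reproduced in isolation. The
following is only a minimal sketch, not code from fixincl.c: the helper name
format_fix_args and its parameters are assumptions made for illustration, and
it uses snprintf rather than the sprintf in the patch. Three strings follow
the fix number, so the format needs three '%s' conversions; with only two, the
last argument would be silently ignored and GCC would warn via
-Wformat-extra-args.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of fixincl.c: write a fix number and three
   quoted file names into BUF.  Every argument after the format string has a
   matching conversion.  Returns the number of characters the full result
   needs, as snprintf does.  */
int
format_fix_args (char *buf, size_t cap, long fix_no, const char *fix_file,
                 const char *source, const char *temp)
{
  assert (buf != NULL && cap > 0);
  return snprintf (buf, cap, " %ld '%s' '%s' '%s'", fix_no, fix_file,
                   source, temp);
}
```

Dropping the final '%s' from the format would still compile, but the
temporary file name would never reach the output string.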



Re: [PATCH, i386, Android] Add Android support for i386 target

2012-04-17 Thread Uros Bizjak
On Tue, Apr 17, 2012 at 12:16 PM, Ilya Enkovich enkovich@gmail.com wrote:

 It has nothing but defines for Android. It did not move any existing
 code to this file.


 Adding linux-common.h to i386 backend needs approval from
 i386 backend maintainer.   If a patch also adds Android support,
 i386 backend maintainer may not feel comfortable to review it.
 However, if you simply add linux-common.h with XXX_SPEC,
 i386 backend maintainer can review it easily.

 --
 H.J.

 All XXX_SPEC in linux-common.h are Android related. I also believe
 that my Android specific changes need i386 backend maintainer approval
 anyway because wrong Android support implementation may break other
 targets.

+#undef  ENDFILE_SPEC
+#define ENDFILE_SPEC \
+  GNU_USER_TARGET_MATHFILE_SPEC   \
+  GNU_USER_TARGET_ENDFILE_SPEC

Where is GNU_USER_TARGET_ENDFILE_SPEC defined?

Uros.


Re: [C++ Patch] PR 52599

2012-04-17 Thread Jason Merrill
I think build_constexpr_constructor_member_initializers is a better 
place for that check, since it's already looking at the tree structure.


Jason


Re: [PATCH, i386, Android] Add Android support for i386 target

2012-04-17 Thread Uros Bizjak
On Tue, Apr 17, 2012 at 3:16 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Apr 17, 2012 at 12:16 PM, Ilya Enkovich enkovich@gmail.com 
 wrote:

 It has nothing but defines for Android. It did not move any existing
 code to this file.


 Adding linux-common.h to i386 backend needs approval from
 i386 backend maintainer.   If a patch also adds Android support,
 i386 backend maintainer may not feel comfortable to review it.
 However, if you simply add linux-common.h with XXX_SPEC,
 i386 backend maintainer can review it easily.

 --
 H.J.

 All XXX_SPEC in linux-common.h are Android related. I also believe
 that my Android specific changes need i386 backend maintainer approval
 anyway because wrong Android support implementation may break other
 targets.

 +#undef  ENDFILE_SPEC
 +#define ENDFILE_SPEC \
 +  GNU_USER_TARGET_MATHFILE_SPEC   \
 +  GNU_USER_TARGET_ENDFILE_SPEC

 Where is GNU_USER_TARGET_ENDFILE_SPEC defined?

Oh, I found it.

The patch looks OK to me in the sense that there is no difference for
x86 targets.

So, OK for x86.

Thanks,
Uros.


Re: [PATCH] Fix loop bound computation based on undefined behavior

2012-04-17 Thread Richard Guenther
On Tue, 17 Apr 2012, Richard Guenther wrote:

 
 Loop bound computation uses undefined behavior when accessing arrays
 outside of their domain.  Unfortunately while it tries to honor
 issues with trailing arrays in allocated storage its implementation
 is broken (for one, it does consider a TYPE_DECL after the array
 as a sign that the array is not at struct end).  The following patch
 moves array_at_struct_end_p to expr.c near its natural user
 (it's also used by graphite) and re-implements it.  It also adjusts
 array_ref_up_bound to not return any bound in the case of an
 access to a trailing array - at present what it returns is a
 conservative answer in the wrong sense in two of its four callers
 (it returns a lower bound for the upper bound).  Given the fact
 that array_ref_low_bound returns an exact answer not returning
 any lower / upper bound for the upper bound but only what we would
 consider exact sounds like the most reasonable solution.
 
 Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
 
 This does not yet fully recover bootstrap if you make use of
 undefined behavior loop bound detection in VRP, but two miscompiles
 of GCC vanish.

I ended up with the following simplified variant because we have
testcases that test that niter still records an estimate for
a trailing a[5].

Bootstrapped and tested on x86_64-unknown-linux-gnu, committed.

Richard.
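
For readers who have not met the idiom this patch has to respect, here is a
minimal sketch. The names struct packet, packet_new and packet_sum are
hypothetical and do not come from GCC. The array is the last field of a
heap-allocated structure, so its storage extends past what the declaration
alone suggests. The sketch uses a C99 flexible array member to stay strictly
conforming; the thread is about the older form with a small declared bound,
such as a trailing a[5], for which the declared bound must not be taken as a
hard limit on the loop's iteration count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical type: the array is the last field, so the storage allocated
   for it can be larger than the declaration alone implies.  */
struct packet
{
  size_t len;
  int data[];   /* C99 flexible array member */
};

/* Allocate a packet with room for LEN trailing elements.  */
struct packet *
packet_new (size_t len)
{
  struct packet *p = malloc (sizeof *p + len * sizeof p->data[0]);
  if (p == NULL)
    return NULL;
  p->len = len;
  for (size_t i = 0; i < len; i++)
    p->data[i] = (int) i;
  return p;
}

/* The loop bound comes from P->len, not from the declared array type.  */
long
packet_sum (const struct packet *p)
{
  assert (p != NULL);
  long sum = 0;
  for (size_t i = 0; i < p->len; i++)
    sum += p->data[i];
  return sum;
}
```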

2012-04-17  Richard Guenther  rguent...@suse.de

* tree-flow.h (array_at_struct_end_p): Move declaration ...
* tree.h (array_at_struct_end_p): ... here.
* tree-ssa-loop-niter.c (array_at_struct_end_p): Move ...
* expr.c (array_at_struct_end_p): ... here.  Rewrite.

Index: gcc/tree.h
===
*** gcc/tree.h  (revision 186496)
--- gcc/tree.h  (working copy)
*** extern bool contains_packed_reference (c
*** 5068,5073 
--- 5068,5075 
  
  extern tree array_ref_element_size (tree);
  
+ bool array_at_struct_end_p (tree);
+ 
  /* Return a tree representing the lower bound of the array mentioned in
 EXP, an ARRAY_REF or an ARRAY_RANGE_REF.  */
  
Index: gcc/expr.c
===
*** gcc/expr.c  (revision 186496)
--- gcc/expr.c  (working copy)
*** array_ref_low_bound (tree exp)
*** 6778,6783 
--- 6778,6820 
return build_int_cst (TREE_TYPE (TREE_OPERAND (exp, 1)), 0);
  }
  
+ /* Returns true if REF is an array reference to an array at the end of
+    a structure.  If this is the case, the array may be allocated larger
+    than its upper bound implies.  */
+ 
+ bool
+ array_at_struct_end_p (tree ref)
+ {
+   if (TREE_CODE (ref) != ARRAY_REF
+       && TREE_CODE (ref) != ARRAY_RANGE_REF)
+     return false;
+ 
+   while (handled_component_p (ref))
+     {
+       /* If the reference chain contains a component reference to a
+          non-union type and there follows another field the reference
+          is not at the end of a structure.  */
+       if (TREE_CODE (ref) == COMPONENT_REF
+           && TREE_CODE (TREE_TYPE (TREE_OPERAND (ref, 0))) == RECORD_TYPE)
+         {
+           tree nextf = DECL_CHAIN (TREE_OPERAND (ref, 1));
+           while (nextf && TREE_CODE (nextf) != FIELD_DECL)
+             nextf = DECL_CHAIN (nextf);
+           if (nextf)
+             return false;
+         }
+ 
+       ref = TREE_OPERAND (ref, 0);
+     }
+ 
+   /* If the reference is based on a declared entity, the size of the array
+      is constrained by its given domain.  */
+   if (DECL_P (ref))
+     return false;
+ 
+   return true;
+ }
+ 
  /* Return a tree representing the upper bound of the array mentioned in
 EXP, an ARRAY_REF or an ARRAY_RANGE_REF.  */
  
Index: gcc/tree-flow.h
===
*** gcc/tree-flow.h (revision 186496)
--- gcc/tree-flow.h (working copy)
*** tree find_loop_niter (struct loop *, edg
*** 686,692 
  tree loop_niter_by_eval (struct loop *, edge);
  tree find_loop_niter_by_eval (struct loop *, edge *);
  void estimate_numbers_of_iterations (bool);
- bool array_at_struct_end_p (tree);
  bool scev_probably_wraps_p (tree, tree, gimple, struct loop *, bool);
  bool convert_affine_scev (struct loop *, tree, tree *, tree *, gimple, bool);
  
--- 686,691 
Index: gcc/tree-ssa-loop-niter.c
===
*** gcc/tree-ssa-loop-niter.c   (revision 186496)
--- gcc/tree-ssa-loop-niter.c   (working copy)
*** record_nonwrapping_iv (struct loop *loop
*** 2640,2686 
record_estimate (loop, niter_bound, max, stmt, false, realistic, upper);
  }
  
- /* Returns true if REF is a reference to an array at the end of a dynamically
-    allocated structure.  If this is the case, the array may be allocated larger
-    than its upper bound implies.  */
- 
- bool
- array_at_struct_end_p (tree ref)
- {
-   tree base = 

Re: [PATCH] Prevent 'control reaches end of non-void function' warning for DO_WHILE

2012-04-17 Thread Jason Merrill

OK, thanks.

Jason


Re: Change initialization order in sel-sched

2012-04-17 Thread Alexander Monakov

On Wed, 11 Apr 2012, Richard Guenther wrote:

 On Wed, Apr 11, 2012 at 4:16 PM, Bernd Schmidt ber...@codesourcery.com 
 wrote:
  The order of calls to sched_rgn_init and sched_init differs between
  sched-rgn and sel-sched. This caused a scheduler patch I was working on
  to segfault once sel-sched was enabled. The following patch swaps the
  two function calls.
 
  Bootstrapped & tested on i686-linux. Ok?
 
 Ok.
 
 Thanks,
 Richard.

Actually, this causes miscompilations with selective scheduler when
-fsel-sched-pipelining is enabled (as it is with -O3 on ia64).  The reason is,
with that flag we build custom regions that consist of a loop body and its
preheader in sel_find_rgns, which is called from sched_rgn_init.  We require
that sched_init is called afterwards, so that DF data is computed for any new
blocks that might have been created (i.e. preheaders); it's possible that DF
is not the only thing that forces this order.

Bernd, could you elaborate on the segfault you had seen?  Perhaps we could
offer some advice on fixing it then.

In the meanwhile, could you revert your patch?  I'm sorry to point out this
problem after the patch had been committed, but it's not immediately obvious :)
Andrey or I will add an explanatory comment in sel-sched afterwards.

Alexander


[PATCH] Fix PR53011

2012-04-17 Thread Richard Guenther

This fixes PR53011 - EH cleanup needs to cater for loops now
(or avoid some transforms).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

Index: gcc/tree-eh.c
===
*** gcc/tree-eh.c   (revision 186523)
--- gcc/tree-eh.c   (working copy)
*** cleanup_empty_eh_merge_phis (basic_block
*** 3916,3921 
--- 3916,3936 
    for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)); )
      if (e->flags & EDGE_EH)
        {
+         /* ???  CFG manipulation routines do not try to update loop
+            form on edge redirection.  Do so manually here for now.  */
+         /* If we redirect a loop entry or latch edge that will either create
+            a multiple entry loop or rotate the loop.  If the loops merge
+            we may have created a loop with multiple latches.
+            All of this isn't easily fixed thus cancel the affected loop
+            and mark the other loop as possibly having multiple latches.  */
+         if (current_loops
+             && e->dest == e->dest->loop_father->header)
+           {
+             e->dest->loop_father->header = NULL;
+             e->dest->loop_father->latch = NULL;
+             new_bb->loop_father->latch = NULL;
+             loops_state_set (LOOPS_NEED_FIXUP|LOOPS_MAY_HAVE_MULTIPLE_LATCHES);
+           }
          redirect_eh_edge_1 (e, new_bb, change_region);
          redirect_edge_succ (e, new_bb);
          flush_pending_stmts (e);
Index: gcc/testsuite/g++.dg/torture/pr53011.C
===
*** gcc/testsuite/g++.dg/torture/pr53011.C  (revision 0)
--- gcc/testsuite/g++.dg/torture/pr53011.C  (revision 0)
***
*** 0 
--- 1,66 
+ // { dg-do compile }
+ 
+ extern "C" class WvFastString;
+ typedef WvFastString WvStringParm;
+ struct WvFastString {
+   ~WvFastString();
+   operator char* () {}
+ };
+ class WvString : WvFastString {};
+ class WvAddr {};
+ class WvIPAddr : WvAddr {};
+ struct WvIPNet : WvIPAddr {
+   bool is_default() {}
+ };
+ template<class T, bool> struct WvTraits_Helper {
+   static void release(T *obj) {
+     delete obj;
+   }
+ };
+ template<class From> struct WvTraits {
+   static void release(From *obj) {
+     WvTraits_Helper<From, 0>::release(obj);
+   }
+ };
+ struct WvLink {
+   void   *data;
+   WvLink *next;
+   bool    autofree;
+   WvLink(bool, int) : autofree() {}
+   bool get_autofree() {}
+ 
+   void unlink() {
+     delete this;
+   }
+ };
+ struct WvListBase {
+   WvLink head, *tail;
+   WvListBase() : head(0, 0) {}
+ };
+ template<class T> struct WvList : WvListBase {
+   ~WvList() {
+     zap();
+   }
+ 
+   void zap(bool destroy = 1) {
+     while (head.next) unlink_after(&head, destroy);
+   }
+ 
+   void unlink_after(WvLink *after, bool destroy) {
+     WvLink *next = 0;
+     T *obj       = (destroy && next->get_autofree()) ?
+                    static_cast<T*>(next->data) : 0;
+ 
+     if (tail) tail = after;
+     next->unlink();
+     WvTraits<T>::release(obj);
+   }
+ };
+ typedef WvList<WvString> WvStringListBase;
+ class WvStringList : WvStringListBase {};
+ class WvSubProc {
+   WvStringList last_args, env;
+ };
+ void addroute(WvIPNet dest, WvStringParm table) {
+   if (dest.is_default() || (table != "default")) WvSubProc checkProc;
+ }


Re: Change initialization order in sel-sched

2012-04-17 Thread Bernd Schmidt
On 04/17/2012 03:33 PM, Alexander Monakov wrote:
 Bernd, could you elaborate on the segfault you had seen?  Perhaps we could
 offer some advice on fixing it then.

It was only seen with another patch which modified the sched-rgn
initialization code.

 In the meanwhile, could you revert your patch?  I'm sorry to point out this
 problem after the patch had been committed, but it's not immediately obvious 
 :)
 Andrey or I will add an explanatory comment in sel-sched afterwards.

Will revert for now. In general I think it would be better to change
sel-sched so that the init functions can always be called in the same
order, so as to avoid unnecessary surprises.


Bernd


Re: [C++ Patch] PR 53003

2012-04-17 Thread Jason Merrill

I have various thoughts:

It's odd that we still treat 'return' as starting a function body long 
after we removed that extension.


Maybe we shouldn't look for a function body if we already have an 
initializer and aren't dealing with a function declarator.


I guess we should set initializer_token_start for {} initializers as well.

But your patch is certainly the smallest change, and OK.

Jason


[PATCH] Use all bells and whistles for number of iteration analysis in VRP

2012-04-17 Thread Richard Guenther

This patch reverts the change originally done when adding 
number-of-iteration analysis uses to VRP, to have a flag
to toggle whether to derive number of iterations from undefined
behavior.  To be able to do so one error in VRP has to be fixed - we
have to check for the number of stmt executions, not for the number
of latch block executions.  Otherwise we miscompile IRA during bootstrap.
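
The difference between the two counts can be seen with a small hand-written
model. This is only an illustrative sketch; struct loop_counts and
count_executions are hypothetical names, not GCC internals. A statement in the
loop header runs once more than the latch, because the final pass through the
header takes the exit instead of the back edge.

```c
#include <assert.h>

/* Hypothetical counters for illustration only.  */
struct loop_counts
{
  unsigned header_stmt;   /* executions of a statement before the exit test */
  unsigned latch;         /* executions of the back edge */
};

/* Model a loop that iterates N times and count both kinds of execution.  */
struct loop_counts
count_executions (unsigned n)
{
  struct loop_counts c = { 0, 0 };
  unsigned i = 0;
  for (;;)
    {
      c.header_stmt++;      /* runs on every entry to the header */
      if (i >= n)
        break;              /* exit is taken before reaching the latch */
      i++;
      c.latch++;            /* back edge to the header */
    }
  /* The header statement always runs exactly once more than the latch.  */
  assert (c.header_stmt == c.latch + 1);
  return c;
}
```

A bound that is valid for the latch count is therefore one too small when
applied to a statement that sits before the exit test.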

Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

With this I can finally make the max loop bound preserved and
set it from loop version producers.  Yay.

Richard.

2012-04-17  Richard Guenther  rguent...@suse.de

* cfgloop.h (estimate_numbers_of_iterations_loop): Remove
use_undefined_p parameter.
* tree-flow.h (estimate_numbers_of_iterations): Likewise.
* tree-ssa-loop-niter.c (estimate_numbers_of_iterations_loop):
Likewise.
(estimate_numbers_of_iterations): Likewise.
(estimated_loop_iterations): Adjust.
(max_loop_iterations): Likewise.
(scev_probably_wraps_p): Likewise.
* tree-ssa-loop.c (tree_ssa_loop_bounds): Likewise.
* tree-vrp.c (adjust_range_with_scev): Use max_stmt_executions,
not max_loop_iterations.
(execute_vrp): Remove explicit number of iterations estimation.

Index: gcc/cfgloop.h
===
*** gcc/cfgloop.h   (revision 186526)
--- gcc/cfgloop.h   (working copy)
*** gcov_type expected_loop_iterations_unbou
*** 278,284 
  extern unsigned expected_loop_iterations (const struct loop *);
  extern rtx doloop_condition_get (rtx);
  
! void estimate_numbers_of_iterations_loop (struct loop *, bool);
  bool estimated_loop_iterations (struct loop *, double_int *);
  bool max_loop_iterations (struct loop *, double_int *);
  HOST_WIDE_INT estimated_loop_iterations_int (struct loop *);
--- 278,284 
  extern unsigned expected_loop_iterations (const struct loop *);
  extern rtx doloop_condition_get (rtx);
  
! void estimate_numbers_of_iterations_loop (struct loop *);
  bool estimated_loop_iterations (struct loop *, double_int *);
  bool max_loop_iterations (struct loop *, double_int *);
  HOST_WIDE_INT estimated_loop_iterations_int (struct loop *);
Index: gcc/tree-flow.h
===
*** gcc/tree-flow.h (revision 186527)
--- gcc/tree-flow.h (working copy)
*** bool number_of_iterations_exit (struct l
*** 685,691 
  tree find_loop_niter (struct loop *, edge *);
  tree loop_niter_by_eval (struct loop *, edge);
  tree find_loop_niter_by_eval (struct loop *, edge *);
! void estimate_numbers_of_iterations (bool);
  bool scev_probably_wraps_p (tree, tree, gimple, struct loop *, bool);
  bool convert_affine_scev (struct loop *, tree, tree *, tree *, gimple, bool);
  
--- 685,691 
  tree find_loop_niter (struct loop *, edge *);
  tree loop_niter_by_eval (struct loop *, edge);
  tree find_loop_niter_by_eval (struct loop *, edge *);
! void estimate_numbers_of_iterations (void);
  bool scev_probably_wraps_p (tree, tree, gimple, struct loop *, bool);
  bool convert_affine_scev (struct loop *, tree, tree *, tree *, gimple, bool);
  
Index: gcc/tree-ssa-loop-niter.c
===
*** gcc/tree-ssa-loop-niter.c   (revision 186527)
--- gcc/tree-ssa-loop-niter.c   (working copy)
*** gcov_type_to_double_int (gcov_type val)
*** 2950,2956 
 is true also use estimates derived from undefined behavior.  */
  
  void
! estimate_numbers_of_iterations_loop (struct loop *loop, bool use_undefined_p)
  {
VEC (edge, heap) *exits;
tree niter, type;
--- 2950,2956 
 is true also use estimates derived from undefined behavior.  */
  
  void
! estimate_numbers_of_iterations_loop (struct loop *loop)
  {
VEC (edge, heap) *exits;
tree niter, type;
*** estimate_numbers_of_iterations_loop (str
*** 2984,2991 
  }
VEC_free (edge, heap, exits);
  
!   if (use_undefined_p)
! infer_loop_bounds_from_undefined (loop);
  
/* If we have a measured profile, use it to estimate the number of
   iterations.  */
--- 2984,2990 
  }
VEC_free (edge, heap, exits);
  
!   infer_loop_bounds_from_undefined (loop);
  
/* If we have a measured profile, use it to estimate the number of
   iterations.  */
*** estimate_numbers_of_iterations_loop (str
*** 3013,3019 
  bool
  estimated_loop_iterations (struct loop *loop, double_int *nit)
  {
!   estimate_numbers_of_iterations_loop (loop, true);
    if (!loop->any_estimate)
  return false;
  
--- 3012,3018 
  bool
  estimated_loop_iterations (struct loop *loop, double_int *nit)
  {
!   estimate_numbers_of_iterations_loop (loop);
    if (!loop->any_estimate)
  return false;
  
*** estimated_loop_iterations (struct loop *
*** 3028,3034 
  bool
  max_loop_iterations (struct 

Re: RFA: Clean up ADDRESS handling in alias.c

2012-04-17 Thread Richard Guenther
On Sun, Apr 15, 2012 at 5:11 PM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 The comment in alias.c says:

   The contents of an ADDRESS is not normally used, the mode of the
   ADDRESS determines whether the ADDRESS is a function argument or some
   other special value.  Pointer equality, not rtx_equal_p, determines whether
   two ADDRESS expressions refer to the same base address.

   The only use of the contents of an ADDRESS is for determining if the
   current function performs nonlocal memory memory references for the
   purposes of marking the function as a constant function.  */

 The first paragraph is a bit misleading IMO.  AFAICT, rtx_equal_p has
 always given ADDRESS the full recursive treatment, rather than saying
 that pointer equality determines ADDRESS equality.  (This is in contrast
 to something like VALUE, where pointer equality is used.)  And AFAICT
 we've always had:

 static int
 base_alias_check (rtx x, rtx y, enum machine_mode x_mode,
                  enum machine_mode y_mode)
 {
  ...
  /* If the base addresses are equal nothing is known about aliasing.  */
  if (rtx_equal_p (x_base, y_base))
    return 1;
  ...
 }

 So I think the contents of an ADDRESS _are_ used to distinguish
 between different bases.

 The second paragraph ceased to be true in 2005 when the pure/const
 analysis moved to its own IPA pass.  Nothing now looks at the contents
 beyond rtx_equal_p.

 Also, base_alias_check effectively treats all arguments as a single base.
 That makes conceptual sense, because this analysis isn't strong enough
 to determine whether arguments are base values at all, never mind whether
 accesses based on different arguments conflict.  But the fact that we have
 a single base isn't obvious from the way the code is written, because we
 create several separate, non-rtx_equal_p, ADDRESSes to represent arguments.
 See:

   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
     /* Check whether this register can hold an incoming pointer
        argument.  FUNCTION_ARG_REGNO_P tests outgoing register
        numbers, so translate if necessary due to register windows.  */
     if (FUNCTION_ARG_REGNO_P (OUTGOING_REGNO (i))
         && HARD_REGNO_MODE_OK (i, Pmode))
       static_reg_base_value[i]
         = gen_rtx_ADDRESS (VOIDmode, gen_rtx_REG (Pmode, i));

 and:

      /* Check for an argument passed in memory.  Only record in the
         copying-arguments block; it is too hard to track changes
         otherwise.  */
       if (copying_arguments
           && (XEXP (src, 0) == arg_pointer_rtx
               || (GET_CODE (XEXP (src, 0)) == PLUS
                   && XEXP (XEXP (src, 0), 0) == arg_pointer_rtx)))
        return gen_rtx_ADDRESS (VOIDmode, src);

 I think it would be cleaner and less wasteful to use a single rtx for
 the single base (really potential base).

 So if we wanted to, we could now remove the operand from ADDRESS and
 simply rely on pointer equality.  I'm a bit reluctant to do that though.
 It would make debugging harder, and it would mean either adding knowledge
 of this alias-specific code to other files (specifically rtl.c:rtx_equal_p),
 or adding special ADDRESS shortcuts to alias.c.  But I think the code
 would be more obvious if we replaced the rtx operand with a unique id,
 which is what we already use for the REG_NOALIAS case:

      new_reg_base_value[regno] = gen_rtx_ADDRESS (Pmode,
                                                   GEN_INT (unique_id++));

 And if we do that, we can make the id a direct operand of the ADDRESS,
 rather than a CONST_INT subrtx[*].  That should make rtx_equal_p cheaper too.

  [*] I'm trying to get rid of CONST_INTs like these that have
      no obvious mode.

 All of which led to the patch below.  I checked that it didn't change
 the code generated at -O2 for a recent set of cc1 .ii files.  Also
 bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?

 To cover my back: I'm just trying to rewrite the current code according
 to its current assumptions.  Whether those assumptions are correct or not
 is always open to debate...

This all looks reasonable and matches what I discovered by reverse
engineering the last time I ran into ADDRESSes ...

So, ok, given that nobody else has commented yet.

Thanks,
Richard.

 Richard


 gcc/
        * rtl.def (ADDRESS): Turn operand into a HOST_WIDE_INT.
        * alias.c (reg_base_value): Expand and update comment.
        (arg_base_value): New variable.
        (unique_id): Move up file.
        (unique_base_value, unique_base_value_p, known_base_value_p): New.
        (find_base_value): Use arg_base_value and known_base_value_p.
        (record_set): Document REG_NOALIAS handling.  Use unique_base_value.
        (find_base_term): Use known_base_value_p.
        (base_alias_check): Use unique_base_value_p.
        (init_alias_target): Initialize arg_base_value.  Use unique_base_value.
        (init_alias_analysis): Use 1 as the first id for REG_NOALIAS bases.

 Index: gcc/rtl.def
 

Re: [Fixinclude]: Fix typo and default to twoprocess on VMS

2012-04-17 Thread Bruce Korb
Hi Tristan,

On Tue, Apr 17, 2012 at 5:57 AM, Tristan Gingold ging...@adacore.com wrote:
 Hi,

 one-process methodology cannot be used on VMS[...]
 But, when twoprocess is selected, gcc emits a warning[...]
 Ok for trunk ?

 diff --git a/fixincludes/configure.ac b/fixincludes/configure.ac
 index e7de791..f1fb2ff 100644
 --- a/fixincludes/configure.ac
 +++ b/fixincludes/configure.ac
 @@ -53,7 +53,8 @@ fi],
        i?86-*-msdosdjgpp* | \
        i?86-*-mingw32* | \
        x86_64-*-mingw32* | \
 -       *-*-beos* )
 +       *-*-beos* | \
 +        *-*-*vms*)
                TARGET=twoprocess
                ;;

This, definitely.

 diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
 index 9f399ab..1133534 100644
 --- a/fixincludes/fixincl.c
 +++ b/fixincludes/fixincl.c
 @@ -829,7 +829,7 @@ fix_with_system (tFixDesc* p_fixd,
       /*
        *  Now add the fix number and file names that may be needed
        */
 -      sprintf (pz_scan, " %ld '%s' '%s'",  (long) (p_fixd - fixDescList),
 +      sprintf (pz_scan, " %ld '%s' '%s' '%s'", (long) (p_fixd - fixDescList),
               pz_fix_file, pz_file_source, pz_temp_file);
     }
   else /* NOT an internal fix: */

This, almost certainly.  I'll take a peek at the source and convince myself of
this decade-old mistake tomorrow & send my grateful thanks and approval then.
(No access to source today.)

Thank you!  Cheers - Bruce


Re: [PATCH, i386, Android] Add Android support for i386 target

2012-04-17 Thread Ilya Enkovich
 On Tue, Apr 17, 2012 at 3:16 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Apr 17, 2012 at 12:16 PM, Ilya Enkovich enkovich@gmail.com 
 wrote:

 It has nothing but defines for Android. It did not move any existing
 code to this file.


 Adding linux-common.h to i386 backend needs approval from
 i386 backend maintainer.   If a patch also adds Android support,
 i386 backend maintainer may not feel comfortable to review it.
 However, if you simply add linux-common.h with XXX_SPEC,
 i386 backend maintainer can review it easily.

 --
 H.J.

 All XXX_SPEC in linux-common.h are Android related. I also believe
 that my Android specific changes need i386 backend maintainer approval
 anyway because wrong Android support implementation may break other
 targets.

 +#undef  ENDFILE_SPEC
 +#define ENDFILE_SPEC \
 +  GNU_USER_TARGET_MATHFILE_SPEC   \
 +  GNU_USER_TARGET_ENDFILE_SPEC

 Where is GNU_USER_TARGET_ENDFILE_SPEC defined?

 Oh, I found it.

 The patch looks OK to me in the sense that there is no difference for
 x86 targets.

 So, OK for x86.

 Thanks,
 Uros.

Thanks, Uros!

Maxim, could you please look at patch?

Thanks,
Ilya


CPU_NONE ix86_schedule cpu attribute for -march=nocona

2012-04-17 Thread Roman Zhuykov
Hello,

I found the following problem while investigating SMS on x86-64.
When I run gcc with -march=nocona (on pentium-4 with EM64T extension), all
latencies in data dependency graph become zeros. The global pointer
insn_default_latency points to insn_default_latency_none, which
returns zero for any instruction.
This happens because ix86_schedule cpu attribute is set to CPU_NONE for nocona.
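
To make the effect concrete, here is a small stand-alone model of the
dispatch described above. It is only a sketch: enum sched_model, latency_fn,
select_latency and critical_path are hypothetical names, and the latency
numbers are made up; none of this is the generated insn-attrtab code. A
function pointer is selected from the scheduling model, and the "none" model
returns zero for every instruction, so any sum of latencies collapses to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scheduling models, standing in for CPU_NONE and a real one.  */
enum sched_model { MODEL_NONE, MODEL_GENERIC };

typedef int (*latency_fn) (int insn_code);

/* With no pipeline description, every instruction has latency 0.  */
static int
latency_none (int insn_code)
{
  (void) insn_code;
  return 0;
}

/* Made-up numbers: insn code 1 is "slow", everything else takes 1 cycle.  */
static int
latency_generic (int insn_code)
{
  return insn_code == 1 ? 3 : 1;
}

/* Pick the latency function for a model, as a global pointer might be set.  */
latency_fn
select_latency (enum sched_model m)
{
  return m == MODEL_NONE ? latency_none : latency_generic;
}

/* Sum the latencies along a chain of dependent instructions.  */
int
critical_path (latency_fn latency, const int *insns, size_t n)
{
  assert (latency != NULL);
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += latency (insns[i]);
  return total;
}
```

With the zero-latency function every dependence chain looks free, so a
scheduler that relies on these numbers has nothing to prioritise.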

CPU_NONE was introduced by this patch:
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg00179.html

I think we don't want any scheduler to work with zero latencies on
such processors (with such -march).
The following patch fixes the problem for my case with -march=nocona.
Is it correct to fix the problem like this?
What to do with 32bit architectures (i386, i486, pentium4, pentium4m,
prescott) ?

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index af4af7c..38d64e9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2989,7 +2989,7 @@ ix86_option_override_internal (bool main_args_p)
   PTA_MMX | PTA_SSE | PTA_SSE2},
  {"prescott", PROCESSOR_NOCONA, CPU_NONE,
   PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3},
-  {"nocona", PROCESSOR_NOCONA, CPU_NONE,
+  {"nocona", PROCESSOR_NOCONA, CPU_GENERIC64,
   PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
   | PTA_CX16 | PTA_NO_SAHF},
  {"core2", PROCESSOR_CORE2_64, CPU_CORE2,
--
Roman Zhuykov


Re: [i386, patch, RFC] HLE support in GCC

2012-04-17 Thread Sergey Ostanevich

 Any other inputs?


I would suggest using snprintf in b/gcc/config/i386/i386-c.c to avoid a
possible buffer overrun.
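
A minimal sketch of that suggestion follows. The helper format_define and the
NAME=VALUE layout are assumptions for illustration and are not taken from
i386-c.c. snprintf never writes more than the given capacity and reports how
much space the full result would have needed, so truncation can be detected
instead of overrunning the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format "NAME=VALUE" into BUF, which holds CAP bytes.
   Returns 1 on success and 0 if the result did not fit, in which case BUF
   holds a truncated but still NUL-terminated string.  */
int
format_define (char *buf, size_t cap, const char *name, int value)
{
  assert (buf != NULL && cap > 0);
  int n = snprintf (buf, cap, "%s=%d", name, value);
  return n >= 0 && (size_t) n < cap;
}
```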

I also have a question regarding AS compatibility. If GCC is built with
an AS that supports HLE and this GCC is then used on a machine with an
older AS, compilation will fail because of the unsupported prefix. Can we
detect support at compile time rather than at configure time?

regards,
Sergos


Re: [i386, patch, RFC] HLE support in GCC

2012-04-17 Thread Andi Kleen
 I also have a question regarding AS compatibility. In case one built
 GCC using AS with support of HLE then using this GCC on a machine with
 old AS will cause fail because of unsupported prefix. Can we support it

I don't think that's a supported use case for gcc.
It also doesn't work with .cfi* intrinsics and some other things.

 compile time rather configure time?

The only way to do that would be to always generate .byte,
but the people who read the assembler output would hate you 
for it.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: CPU_NONE ix86_schedule cpu attribute for -march=nocona

2012-04-17 Thread H.J. Lu
On Tue, Apr 17, 2012 at 7:35 AM, Roman Zhuykov zhr...@ispras.ru wrote:
 Hello,

 I found the following problem while investigating SMS on x86-64.
 When I run gcc with -march=nocona (on pentium-4 with EM64T extension), all
 latencies in data dependency graph become zeros. The global pointer
 insn_default_latency points to insn_default_latency_none, which
 returns zero for any instruction.
 This happens because ix86_schedule cpu attribute is set to CPU_NONE for 
 nocona.

 CPU_NONE was introduced by this patch:
 http://gcc.gnu.org/ml/gcc-patches/2008-10/msg00179.html

 I think we don't want any scheduler to work with zero latencies on
 such processors (with such -march).
 The following patch fixes the problem for my case with -march=nocona.
 Is it correct to fix the problem like this?
 What to do with 32bit architectures (i386, i486, pentium4, pentium4m,
 prescott) ?

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index af4af7c..38d64e9 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -2989,7 +2989,7 @@ ix86_option_override_internal (bool main_args_p)
       PTA_MMX | PTA_SSE | PTA_SSE2},
      {"prescott", PROCESSOR_NOCONA, CPU_NONE,
       PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3},
 -      {"nocona", PROCESSOR_NOCONA, CPU_NONE,
 +      {"nocona", PROCESSOR_NOCONA, CPU_GENERIC64,
       PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
       | PTA_CX16 | PTA_NO_SAHF},
      {"core2", PROCESSOR_CORE2_64, CPU_CORE2,
 --

Should we replace all CPU_NONE with CPU_GENERIC32/CPU_GENERIC64?

-- 
H.J.


[PATCH, rs6000] Remove DImode from SLOW_UNALIGNED_ACCESS

2012-04-17 Thread Pat Haugen

DImode references do not suffer a major performance hit for < 4-byte aligned
access the way the float types do.

Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?

-Pat


2012-04-17  Pat Haugen pthau...@us.ibm.com

* config/rs6000/rs6000.h (SLOW_UNALIGNED_ACCESS): Remove DImode.

Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h	(revision 186389)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -771,8 +771,7 @@ extern unsigned rs6000_pointer_size;
 #define SLOW_UNALIGNED_ACCESS(MODE, ALIGN)                              \
   (STRICT_ALIGNMENT                                                     \
    || (((MODE) == SFmode || (MODE) == DFmode || (MODE) == TFmode        \
-        || (MODE) == SDmode || (MODE) == DDmode || (MODE) == TDmode     \
-        || (MODE) == DImode)                                            \
+        || (MODE) == SDmode || (MODE) == DDmode || (MODE) == TDmode)    \
        && (ALIGN) < 32)                                                 \
    || (VECTOR_MODE_P ((MODE)) && (((int)(ALIGN)) < VECTOR_ALIGN (MODE))))
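
The effect of the change can be sketched as an ordinary C predicate. This is only an illustration under stated assumptions: the enum is a hypothetical subset of machine modes, the vector clause of the macro is left out, and the dimode_slow parameter exists only to compare the behaviour before and after the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of machine modes, enough for the comparison.  */
enum mode { SImode, DImode, SFmode, DFmode, TFmode, SDmode, DDmode, TDmode };

static bool
float_like_mode_p (enum mode m)
{
  return (m == SFmode || m == DFmode || m == TFmode
          || m == SDmode || m == DDmode || m == TDmode);
}

/* ALIGN is in bits, as in the macro.  DIMODE_SLOW selects the behaviour
   before (true) or after (false) the patch.  The VECTOR_MODE_P clause of
   the real macro is deliberately omitted from this sketch.  */
static bool
slow_unaligned_access (enum mode m, unsigned align,
                       bool strict_alignment, bool dimode_slow)
{
  return (strict_alignment
          || ((float_like_mode_p (m) || (dimode_slow && m == DImode))
              && align < 32));
}

/* Self-check covering the cases the patch description mentions.  */
static void
check_slow_unaligned_access (void)
{
  /* Float types stay slow below 4-byte (32-bit) alignment.  */
  assert (slow_unaligned_access (DFmode, 8, false, false));
  /* DImode: slow before the patch, not slow after it.  */
  assert (slow_unaligned_access (DImode, 8, false, true));
  assert (!slow_unaligned_access (DImode, 8, false, false));
  /* At 32-bit alignment nothing in this sketch is slow.  */
  assert (!slow_unaligned_access (DFmode, 32, false, false));
}
```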
 


Re: [PR tree-optimization/52558]: RFC: questions on store data race

2012-04-17 Thread Aldy Hernandez

On 04/13/12 18:22, Boehm, Hans wrote:




-Original Message-
From: Aldy Hernandez [mailto:al...@redhat.com]
Sent: Thursday, April 12, 2012 3:12 PM
To: Richard Guenther
Cc: Andrew MacLeod; Boehm, Hans; gcc-patches; Torvald Riegel
Subject: [PR tree-optimization/52558]: RFC: questions on store data
race

Here we have a testcase that affects both the C++ memory model and
transactional memory.

[Hans, this is caused by the same problem that is causing the
speculative register promotion issue you and Torvald pointed me at].




In the following testcase (adapted from the PR), the loop invariant
motion pass caches a pre-existing value for g_2, and then performs a
store to g_2 on every path, causing a store data race:

int g_1 = 1;
int g_2 = 0;

int func_1(void)
{
  int l;
  for (l = 0; l < 1234; l++)
  {
    if (g_1)
      return l;
    else
      g_2 = 0;  /* <-- Store to g_2 should only happen if !g_1 */
  }
  return 999;
}

This gets transformed into something like:

g_2_lsm = g_2;
if (g_1) {
  g_2 = g_2_lsm;  // boo! hiss!
  return 0;
} else {
  g_2_lsm = 0;
  g_2 = g_2_lsm;
}

The spurious write to g_2 could cause a data race.

Presumably the g_2_lsm = g_2 is actually outside the loop?

Why does the second g_2 = g_2_lsm; get introduced?  I would have expected it 
before the return.  Was the example just over-abbreviated?


There is some abbreviation going on :).  To be exact, this is what -O2 
currently produces for the lim1 pass.


<bb 2>:
  pretmp.4_1 = g_1;
  g_2_lsm.6_12 = g_2;

<bb 3>:
  # l_13 = PHI <l_6(5), 0(2)>
  # g_2_lsm.6_10 = PHI <g_2_lsm.6_11(5), g_2_lsm.6_12(2)>
  g_1.0_4 = pretmp.4_1;
  if (g_1.0_4 != 0)
    goto <bb 7>;
  else
    goto <bb 4>;

<bb 4>:
  g_2_lsm.6_11 = 0;
  l_6 = l_13 + 1;
  if (l_6 != 1234)
    goto <bb 5>;
  else
    goto <bb 8>;

<bb 8>:
  # g_2_lsm.6_18 = PHI <g_2_lsm.6_11(4)>
  g_2 = g_2_lsm.6_18;
  goto <bb 6>;

<bb 5>:
  goto <bb 3>;

<bb 7>:
  # g_2_lsm.6_17 = PHI <g_2_lsm.6_10(3)>
  # l_19 = PHI <l_13(3)>
  g_2 = g_2_lsm.6_17;

<bb 6>:
  # l_2 = PHI <l_19(7), 999(8)>
  return l_2;

So yes, there seems to be another write to g_2 inside the else, but 
probably because we have merged some basic blocks along the way.




Other than that, this sounds right to me.  So does Richard's flag-based 
version, which is the approach I would have originally expected.  But that 
clearly costs you a register.  It would be interesting to see how they compare.


I am working on the flag based approach.

Thanks to both of you.
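
A source-level sketch of what a flag-based version of the testcase might look like, for comparison with the transformation quoted above. This is an assumption about the general idea, not the output of any GCC pass: g_2_flag, store_g_2 and the g_2_stores counter are hypothetical names added for illustration.

```c
#include <assert.h>
#include <stdbool.h>

int g_1 = 1;
int g_2 = 0;

/* Hypothetical instrumentation: counts how often g_2 is really written.  */
static int g_2_stores = 0;

static void
store_g_2 (int value)
{
  g_2 = value;
  g_2_stores++;
}

/* func_1 from the testcase with the store to g_2 sunk out of the loop.
   The flag records whether the original program would have stored, so
   no path writes g_2 unless the source did.  */
int
func_1_flagged (void)
{
  int l;
  int g_2_lsm = 0;        /* only read after being set in the loop */
  bool g_2_flag = false;

  for (l = 0; l < 1234; l++)
    {
      if (g_1)
        {
          if (g_2_flag)
            store_g_2 (g_2_lsm);
          return l;
        }
      g_2_lsm = 0;
      g_2_flag = true;
    }
  if (g_2_flag)
    store_g_2 (g_2_lsm);
  return 999;
}

/* Self-check: no store when g_1 is set, exactly one store otherwise.  */
static void
check_flagged_store (void)
{
  g_1 = 1;
  g_2 = 7;
  g_2_stores = 0;
  assert (func_1_flagged () == 0);
  assert (g_2 == 7 && g_2_stores == 0);

  g_1 = 0;
  assert (func_1_flagged () == 999);
  assert (g_2 == 0 && g_2_stores == 1);
}
```

The flag is the extra register Hans mentions: the store is still hoisted out of the loop, but it is guarded on every exit path.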


Re: CPU_NONE ix86_schedule cpu attribute for -march=nocona

2012-04-17 Thread Alexander Monakov


On Tue, 17 Apr 2012, H.J. Lu wrote:

 On Tue, Apr 17, 2012 at 7:35 AM, Roman Zhuykov zhr...@ispras.ru wrote:
  Hello,
 
  I found the following problem while investigating SMS on x86-64.
  When I run gcc with -march=nocona (on pentium-4 with EM64T extension), all
  latencies in data dependency graph become zeros. The global pointer
  insn_default_latency points to insn_default_latency_none, which
  returns zero for any instruction.
  This happens because ix86_schedule cpu attribute is set to CPU_NONE for 
  nocona.
 
  CPU_NONE was introduced by this patch:
  http://gcc.gnu.org/ml/gcc-patches/2008-10/msg00179.html
 
  I think we don't want any scheduler to work with zero latencies on
  such processors (with such -march).
  The following patch fixes the problem for my case with -march=nocona.
  Is it correct to fix the problem like this?
  What to do with 32bit architectures (i386, i486, pentium4, pentium4m,
  prescott) ?
 
  diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
  index af4af7c..38d64e9 100644
  --- a/gcc/config/i386/i386.c
  +++ b/gcc/config/i386/i386.c
  @@ -2989,7 +2989,7 @@ ix86_option_override_internal (bool main_args_p)
        PTA_MMX | PTA_SSE | PTA_SSE2},
       {prescott, PROCESSOR_NOCONA, CPU_NONE,
        PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3},
  -      {nocona, PROCESSOR_NOCONA, CPU_NONE,
  +      {nocona, PROCESSOR_NOCONA, CPU_GENERIC64,
        PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
        | PTA_CX16 | PTA_NO_SAHF},
       {core2, PROCESSOR_CORE2_64, CPU_CORE2,
  --
 
 Should we replace all CPU_NONE with CPU_GENERIC32/CPU_GENERIC64?

CPU_GENERIC32 had been removed by the 2008 patch Roman was referring to.  Did
you mean CPU_PENTIUMPRO?

PowerPC prologue and epilogue

2012-04-17 Thread Alan Modra
This is the first in a series of patches cleaning up rs6000 prologue
and epilogue generating code.  This one is just the formatting/style
changes plus renaming two variables to better reflect their usage,
and moving code around.

The patch series has been bootstrapped and regression tested
powerpc-linux, powerpc64-linux and powerpc-linux-gnuspe.  Please test
on darwin and aix.

* config/rs6000/rs6000.c (rs6000_emit_savres_rtx): Formatting.
(rs6000_emit_prologue, rs6000_emit_epilogue): Likewise.  Rename
sp_offset to frame_off.  Move world save code earlier.

diff -urp gcc-virgin/gcc/config/rs6000/rs6000.c 
gcc-alan1/gcc/config/rs6000/rs6000.c
--- gcc-virgin/gcc/config/rs6000/rs6000.c   2012-04-14 22:48:44.108432893 
+0930
+++ gcc-alan1/gcc/config/rs6000/rs6000.c2012-04-16 11:57:37.282242636 
+0930
@@ -19212,9 +19212,9 @@ rs6000_emit_savres_rtx (rs6000_stack_t *
 
   sym = rs6000_savres_routine_sym (info, savep, gpr, lr);
   RTVEC_ELT (p, offset++) = gen_rtx_USE (VOIDmode, sym);
-  use_reg = DEFAULT_ABI == ABI_AIX ? (gpr && !lr ? 12 : 1)
-                         : DEFAULT_ABI == ABI_DARWIN && !gpr ? 1
-                         : 11;
+  use_reg = (DEFAULT_ABI == ABI_AIX ? (gpr && !lr ? 12 : 1)
+            : DEFAULT_ABI == ABI_DARWIN && !gpr ? 1
+            : 11);
   RTVEC_ELT (p, offset++)
 = gen_rtx_USE (VOIDmode,
   gen_rtx_REG (Pmode, use_reg));
@@ -19224,7 +19224,7 @@ rs6000_emit_savres_rtx (rs6000_stack_t *
   rtx addr, reg, mem;
   reg = gen_rtx_REG (reg_mode, start_reg + i);
   addr = gen_rtx_PLUS (Pmode, frame_reg_rtx,
-  GEN_INT (save_area_offset + reg_size*i));
+  GEN_INT (save_area_offset + reg_size * i));
   mem = gen_frame_mem (reg_mode, addr);
 
   RTVEC_ELT (p, i + offset) = gen_rtx_SET (VOIDmode,
@@ -19293,9 +19293,9 @@ rs6000_emit_prologue (void)
   int saving_GPRs_inline;
   int using_store_multiple;
   int using_static_chain_p = (cfun->static_chain_decl != NULL_TREE
-                              && df_regs_ever_live_p (STATIC_CHAIN_REGNUM)
+                             && df_regs_ever_live_p (STATIC_CHAIN_REGNUM)
                              && call_used_regs[STATIC_CHAIN_REGNUM]);
-  HOST_WIDE_INT sp_offset = 0;
+  HOST_WIDE_INT frame_off = 0;
 
   if (flag_stack_usage_info)
     current_function_static_stack_size = info->total_size;
@@ -19323,52 +19323,6 @@ rs6000_emit_prologue (void)
   reg_size = 8;
 }
 
-  strategy = info->savres_strategy;
-  using_store_multiple = strategy & SAVRES_MULTIPLE;
-  saving_FPRs_inline = strategy & SAVE_INLINE_FPRS;
-  saving_GPRs_inline = strategy & SAVE_INLINE_GPRS;
-
-  /* For V.4, update stack before we do any saving and set back pointer.  */
-  if (! WORLD_SAVE_P (info)
-      && info->push_p
-      && (DEFAULT_ABI == ABI_V4
-          || crtl->calls_eh_return))
-    {
-      bool need_r11 = (TARGET_SPE
-                       ? (!saving_GPRs_inline
-                          && info->spe_64bit_regs_used == 0)
-                       : (!saving_FPRs_inline || !saving_GPRs_inline));
-      rtx copy_reg = need_r11 ? gen_rtx_REG (Pmode, 11) : NULL;
-
-      if (info->total_size < 32767)
-        sp_offset = info->total_size;
-      else if (need_r11)
-        frame_reg_rtx = copy_reg;
-      else if (info->cr_save_p
-               || info->lr_save_p
-               || info->first_fp_reg_save < 64
-               || info->first_gp_reg_save < 32
-               || info->altivec_size != 0
-               || info->vrsave_mask != 0
-               || crtl->calls_eh_return)
-        {
-          copy_reg = frame_ptr_rtx;
-          frame_reg_rtx = copy_reg;
-        }
-      else
-        {
-          /* The prologue won't be saving any regs so there is no need
-             to set up a frame register to access any frame save area.
-             We also won't be using sp_offset anywhere below, but set
-             the correct value anyway to protect against future
-             changes to this function.  */
-          sp_offset = info->total_size;
-        }
-      rs6000_emit_allocate_stack (info->total_size, copy_reg);
-      if (frame_reg_rtx != sp_reg_rtx)
-        rs6000_emit_stack_tie (frame_reg_rtx, false);
-    }
-
   /* Handle world saves specially here.  */
   if (WORLD_SAVE_P (info))
 {
@@ -19396,7 +19350,7 @@ rs6000_emit_prologue (void)
                   && info->push_p
                   && info->lr_save_p
                   && (!crtl->calls_eh_return
-                       || info->ehrd_offset == -432)
+                      || info->ehrd_offset == -432)
                   && info->vrsave_save_offset == -224
                   && info->altivec_save_offset == -416);
 
@@ -19423,14 +19377,14 @@ rs6000_emit_prologue (void)
 properly.  */
       for (i = 0; i < 64 - info->first_fp_reg_save; i++)
         {
-          rtx reg = gen_rtx_REG (((TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT)
-                                   ? DFmode : SFmode), 
- 

Re: [PATCH, rs6000] Remove DImode from SLOW_UNALIGNED_ACCESS

2012-04-17 Thread David Edelsohn
On Tue, Apr 17, 2012 at 10:55 AM, Pat Haugen
pthau...@linux.vnet.ibm.com wrote:
 DImode references do not suffer a major performance hit for < 4-byte aligned
 access like the float types.

 Bootstrap/regtest on powerpc64-linux with no new regressions. Ok for trunk?

 2012-04-17  Pat Haugen pthau...@us.ibm.com

        * config/rs6000/rs6000.h (SLOW_UNALIGNED_ACCESS): Remove DImode.

Okay.

I think this may have been introduced at the time the port was 32 bit
and DImode values sometimes were allocated to FPRs.

Thanks, David


PowerPC prologue and epilogue 2

2012-04-17 Thread Alan Modra
This fixes a lot of confusion in rs6000_frame_related call arguments.
At the time rs6000_frame_related first appeared, the prologue only
used sp_reg_rtx (r1) or frame_ptr_rtx (r12) as frame_reg_rtx to access
register save slots.  If r12 was used, it was necessary to add a note
that gave the equivalent offset relative to r1.

Nowadays, r11 is used as frame_reg_rtx too, when abiv4 and saving regs
out-of-line with a large frame.  When that change was made the calls
to rs6000_frame_related were not updated.  So rs6000_frame_related
won't replace r11 in register save rtl.  As it happens this isn't a
bug because when you look closely, out-of-line saves are disabled with
a large frame!  A fix for that will come later in this patch series.
I also optimize rs6000_frame_related a little to save generating
duplicate rtl.

* config/rs6000/rs6000.c (rs6000_frame_related): Don't emit a
REG_FRAME_RELATED_EXPR note when the instruction exactly matches
the replacement.
(emit_frame_save): Delete frame_ptr param.  Rename total_size to
frame_reg_to_sp.
(rs6000_emit_prologue): Add sp_off.  Update rs6000_frame_related
and emit_frame_save calls.  Cope with possibly missing note.

diff -urp gcc-alan1/gcc/config/rs6000/rs6000.c 
gcc-alan2/gcc/config/rs6000/rs6000.c
--- gcc-alan1/gcc/config/rs6000/rs6000.c2012-04-16 11:57:37.282242636 
+0930
+++ gcc-alan2/gcc/config/rs6000/rs6000.c2012-04-16 11:58:01.50108 
+0930
@@ -18751,7 +18751,10 @@ output_probe_stack_range (rtx reg1, rtx
with (plus:P (reg 1) VAL), and with REG2 replaced with RREG if REG2
is not NULL.  It would be nice if dwarf2out_frame_debug_expr could
deduce these equivalences by itself so it wasn't necessary to hold
-   its hand so much.  */
+   its hand so much.  Don't be tempted to always supply d2_f_d_e with
+   the actual cfa register, ie. r31 when we are using a hard frame
+   pointer.  That fails when saving regs off r1, and sched moves the
+   r31 setup past the reg saves.  */
 
 static rtx
 rs6000_frame_related (rtx insn, rtx reg, HOST_WIDE_INT val,
@@ -18759,6 +18762,25 @@ rs6000_frame_related (rtx insn, rtx reg,
 {
   rtx real, temp;
 
+  if (REGNO (reg) == 1 && reg2 == NULL_RTX)
+    {
+      /* No need for any replacement.  Just set RTX_FRAME_RELATED_P.  */
+      int i;
+
+      gcc_checking_assert (val == 0);
+      real = PATTERN (insn);
+      if (GET_CODE (real) == PARALLEL)
+        for (i = 0; i < XVECLEN (real, 0); i++)
+          if (GET_CODE (XVECEXP (real, 0, i)) == SET)
+            {
+              rtx set = XVECEXP (real, 0, i);
+
+              RTX_FRAME_RELATED_P (set) = 1;
+            }
+      RTX_FRAME_RELATED_P (insn) = 1;
+      return insn;
+    }
+
   /* copy_rtx will not make unique copies of registers, so we need to
  ensure we don't have unwanted sharing here.  */
   if (reg == reg2)
@@ -18772,10 +18794,13 @@ rs6000_frame_related (rtx insn, rtx reg,
   if (reg2 != NULL_RTX)
 real = replace_rtx (real, reg2, rreg);
 
-  real = replace_rtx (real, reg,
- gen_rtx_PLUS (Pmode, gen_rtx_REG (Pmode,
-   STACK_POINTER_REGNUM),
-   GEN_INT (val)));
+  if (REGNO (reg) == 1)
+gcc_checking_assert (val == 0);
+  else
+real = replace_rtx (real, reg,
+   gen_rtx_PLUS (Pmode, gen_rtx_REG (Pmode,
+ STACK_POINTER_REGNUM),
+ GEN_INT (val)));
 
   /* We expect that 'real' is either a SET or a PARALLEL containing
  SETs (and possibly other stuff).  In a PARALLEL, all the SETs
@@ -18893,8 +18918,8 @@ generate_set_vrsave (rtx reg, rs6000_sta
Save REGNO into [FRAME_REG + OFFSET] in mode MODE.  */
 
 static rtx
-emit_frame_save (rtx frame_reg, rtx frame_ptr, enum machine_mode mode,
-unsigned int regno, int offset, HOST_WIDE_INT total_size)
+emit_frame_save (rtx frame_reg, enum machine_mode mode,
+unsigned int regno, int offset, HOST_WIDE_INT frame_reg_to_sp)
 {
   rtx reg, offset_rtx, insn, mem, addr, int_rtx;
   rtx replacea, replaceb;
@@ -18930,7 +18955,8 @@ emit_frame_save (rtx frame_reg, rtx fram
 
   insn = emit_move_insn (mem, reg);
 
-  return rs6000_frame_related (insn, frame_ptr, total_size, replacea, replaceb);
+  return rs6000_frame_related (insn, frame_reg, frame_reg_to_sp,
+  replacea, replaceb);
 }
 
 /* Emit an offset memory reference suitable for a frame store, while
@@ -19295,7 +19321,9 @@ rs6000_emit_prologue (void)
   int using_static_chain_p = (cfun->static_chain_decl != NULL_TREE
                               && df_regs_ever_live_p (STATIC_CHAIN_REGNUM)
                               && call_used_regs[STATIC_CHAIN_REGNUM]);
+  /* Offset to top of frame for frame_reg and sp respectively.  */
   HOST_WIDE_INT frame_off = 0;
+  HOST_WIDE_INT sp_off = 0;
 
   if 

PowerPC prologue and epilogue 3

2012-04-17 Thread Alan Modra
This continues the prologue and epilogue cleanup.  Not many user
visible changes here, except for:
- a bugfix to the LR save RTL emitted by rs6000_emit_savres_rtx which
  may affect SPE,
- a bugfix for SPE code emitted when using a static chain,
- vector saves will be done using r1 for large frames just over 32k in
  size, and,
- using r11 as a frame pointer whenever we need to set up r11 for
  out-of-line saves, and merging two pointer reg setup insns.
The latter is a necessary prerequisite to enabling out-of-line
save/restore for large frames, as I do in a later patch.  Currently
this will only affect abiv4 -Os when using out-of-line saves.

eg. -m32 -Os -mno-multiple
int f (double x)
{
  char a[33];
  __asm __volatile ("#%0" : "=m" (a) : : "fr31", "r27", "r28");
  return (int) x;
}

old                     new
stwu 1,-96(1)           mflr 0
mflr 0                  addi 11,1,-8
addi 11,1,88            stwu 1,-96(1)
stw 0,100(1)            stw 0,12(11)
stfd 31,88(1)           bl _savegpr_27
bl _savegpr_27          stfd 31,0(11)


* config/rs6000/rs6000.c (rs6000_emit_stack_reset): Delete forward
decl.  Move logic selecting update reg to callers.  Update all callers.
(rs6000_emit_allocate_stack): Add copy_off param.
(emit_frame_save): Don't handle reg+reg addressing.
(ptr_regno_for_savres): New function, extracted from..
(rs6000_emit_savres_rtx): ..here.  Add lr_offset param.
(rs6000_emit_prologue): Generate frame_ptr_rtx as we need it.
Set frame_reg_rtx to r11 whenever r11 is needed, and merge
frame offset adjustment for out-of-line save with copy from sp.
Simplify condition controlling whether cr is saved early or
late.  Use ptr_regno_for_savres to verify correct reg is set
up for out-of-line saves.  Pass the actual pointer reg used to
rs6000_emit_savres_rtx so rtl matches insns in out-of-line
function.  Rearrange spe vars so code is similar to that
elsewhere in this function.  Don't update frame_off when spe
save code will restore r11.  Use emit_frame_save for spe and
gpr saves.  Consolidate darwin out-of-line gpr setup with that
for other abis.  Don't assume frame_offset is zero and frame
reg is sp when setting up altivec reg saves, and calculate
exact offset requirement.
(rs6000_emit_epilogue): Use HOST_WIDE_INT for frame_off.  Tidy
spe restore code.  Consolidate darwin out-of-line gpr setup
with that for other abis.

diff -urp gcc-alan2/gcc/config/rs6000/rs6000.c 
gcc-alan3/gcc/config/rs6000/rs6000.c
--- gcc-alan2/gcc/config/rs6000/rs6000.c2012-04-16 11:58:01.50108 
+0930
+++ gcc-alan3/gcc/config/rs6000/rs6000.c2012-04-17 07:19:42.927931887 
+0930
@@ -951,7 +951,6 @@ static void rs6000_eliminate_indexed_mem
 static const char *rs6000_mangle_type (const_tree);
 static void rs6000_set_default_type_attributes (tree);
 static rtx rs6000_savres_routine_sym (rs6000_stack_t *, bool, bool, bool);
-static rtx rs6000_emit_stack_reset (rs6000_stack_t *, rtx, rtx, int, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
@@ -18534,7 +18533,7 @@ rs6000_emit_stack_tie (rtx fp, bool hard
The generated code may use hard register 0 as a temporary.  */
 
 static void
-rs6000_emit_allocate_stack (HOST_WIDE_INT size, rtx copy_reg)
+rs6000_emit_allocate_stack (HOST_WIDE_INT size, rtx copy_reg, int copy_off)
 {
   rtx insn;
   rtx stack_reg = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
@@ -18578,7 +18577,12 @@ rs6000_emit_allocate_stack (HOST_WIDE_IN
 }
 
   if (copy_reg)
-emit_move_insn (copy_reg, stack_reg);
+{
+  if (copy_off != 0)
+   emit_insn (gen_add3_insn (copy_reg, stack_reg, GEN_INT (copy_off)));
+  else
+   emit_move_insn (copy_reg, stack_reg);
+}
 
   if (size > 32767)
 {
@@ -18921,42 +18925,22 @@ static rtx
 emit_frame_save (rtx frame_reg, enum machine_mode mode,
 unsigned int regno, int offset, HOST_WIDE_INT frame_reg_to_sp)
 {
-  rtx reg, offset_rtx, insn, mem, addr, int_rtx;
-  rtx replacea, replaceb;
-
-  int_rtx = GEN_INT (offset);
+  rtx reg, insn, mem, addr;
 
   /* Some cases that need register indexed addressing.  */
-  if ((TARGET_ALTIVEC_ABI && ALTIVEC_VECTOR_MODE (mode))
-      || (TARGET_VSX && ALTIVEC_OR_VSX_VECTOR_MODE (mode))
-      || (TARGET_E500_DOUBLE && mode == DFmode)
-      || (TARGET_SPE_ABI
-          && SPE_VECTOR_MODE (mode)
-          && !SPE_CONST_OFFSET_OK (offset)))
-{
-  /* Whomever calls us must make sure r11 is available in the
-flow path of instructions in the prologue.  */
-  offset_rtx = gen_rtx_REG (Pmode, 11);
-  emit_move_insn (offset_rtx, int_rtx);
-
-  replacea = offset_rtx;
-  replaceb = int_rtx;
-}
-  else

PowerPC prologue and epilogue 4

2012-04-17 Thread Alan Modra
This provides some protection against misuse of r0, r11 and r12.  I
found it useful when enabling out-of-line saves for large frames.  ;-)

* config/rs6000/rs6000.c (START_USE, END_USE, NOT_INUSE): Define.
(rs6000_emit_prologue): Use the above to catch register overlap.
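
The bookkeeping idea behind these macros can be tried outside GCC with a few lines of C. The sketch below is an illustration only: it uses plain functions and the standard assert in place of gcc_assert, and the function names are hypothetical lower-case analogues of the macros in the patch.

```c
#include <assert.h>

/* One bit per tracked register number; bit R set means register R is live.  */
static unsigned reg_inuse;

/* Claim register R; it must not already be in use.  */
static void
start_use (int r)
{
  assert ((reg_inuse & (1u << r)) == 0);
  reg_inuse |= 1u << r;
}

/* Release register R; it must currently be in use.  */
static void
end_use (int r)
{
  assert ((reg_inuse & (1u << r)) != 0);
  reg_inuse &= ~(1u << r);
}

/* Check that register R is free without claiming it.  */
static void
not_inuse (int r)
{
  assert ((reg_inuse & (1u << r)) == 0);
}

/* Self-check: r11 held as a frame pointer while r0 is borrowed briefly.  */
static void
check_register_tracking (void)
{
  reg_inuse = 0;
  start_use (11);
  not_inuse (12);
  start_use (0);
  end_use (0);
  assert (reg_inuse == (1u << 11));
  end_use (11);
  assert (reg_inuse == 0);
}
```

A second start_use (11) before the matching end_use (11) would trip the assertion, which is the kind of register overlap the checking build is meant to catch.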

diff -urp gcc-alan3/gcc/config/rs6000/rs6000.c 
gcc-alan4/gcc/config/rs6000/rs6000.c
--- gcc-alan3/gcc/config/rs6000/rs6000.c2012-04-17 07:19:42.927931887 
+0930
+++ gcc-alan4/gcc/config/rs6000/rs6000.c2012-04-17 09:11:31.760669589 
+0930
@@ -19301,6 +19301,29 @@ rs6000_emit_prologue (void)
   HOST_WIDE_INT frame_off = 0;
   HOST_WIDE_INT sp_off = 0;
 
+#ifdef ENABLE_CHECKING
+  /* Track and check usage of r0, r11, r12.  */
+  int reg_inuse = using_static_chain_p ? 1 << 11 : 0;
+#define START_USE(R) do \
+  {                                             \
+    gcc_assert ((reg_inuse & (1 << (R))) == 0); \
+    reg_inuse |= 1 << (R);                      \
+  } while (0)
+#define END_USE(R) do \
+  {                                             \
+    gcc_assert ((reg_inuse & (1 << (R))) != 0); \
+    reg_inuse &= ~(1 << (R));                   \
+  } while (0)
+#define NOT_INUSE(R) do \
+  {                                             \
+    gcc_assert ((reg_inuse & (1 << (R))) == 0); \
+  } while (0)
+#else
+#define START_USE(R) do {} while (0)
+#define END_USE(R) do {} while (0)
+#define NOT_INUSE(R) do {} while (0)
+#endif
+
   if (flag_stack_usage_info)
     current_function_static_stack_size = info->total_size;
 
@@ -19465,6 +19488,7 @@ rs6000_emit_prologue (void)
   if (need_r11)
{
  ptr_reg = gen_rtx_REG (Pmode, 11);
+ START_USE (11);
}
       else if (info->total_size < 32767)
         frame_off = info->total_size;
@@ -19477,6 +19501,7 @@ rs6000_emit_prologue (void)
                || crtl->calls_eh_return)
{
  ptr_reg = gen_rtx_REG (Pmode, 12);
+ START_USE (12);
}
   else
{
@@ -19509,6 +19534,7 @@ rs6000_emit_prologue (void)
   rtx addr, reg, mem;
 
   reg = gen_rtx_REG (Pmode, 0);
+  START_USE (0);
   insn = emit_move_insn (reg, gen_rtx_REG (Pmode, LR_REGNO));
   RTX_FRAME_RELATED_P (insn) = 1;
 
@@ -19524,6 +19550,7 @@ rs6000_emit_prologue (void)
  insn = emit_move_insn (mem, reg);
  rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
NULL_RTX, NULL_RTX);
+ END_USE (0);
}
 }
 
@@ -19536,6 +19563,7 @@ rs6000_emit_prologue (void)
   rtx set;
 
   cr_save_rtx = gen_rtx_REG (SImode, cr_save_regno);
+  START_USE (cr_save_regno);
   insn = emit_insn (gen_movesi_from_cr (cr_save_rtx));
   RTX_FRAME_RELATED_P (insn) = 1;
   /* Now, there's no way that dwarf2out_frame_debug_expr is going
@@ -19579,6 +19607,8 @@ rs6000_emit_prologue (void)
 /*savep=*/true, /*gpr=*/false, lr);
   rs6000_frame_related (insn, frame_reg_rtx, sp_off,
NULL_RTX, NULL_RTX);
+  if (lr)
+   END_USE (0);
 }
 
   /* Save GPRs.  This is done as a PARALLEL if we are using
@@ -19623,10 +19653,15 @@ rs6000_emit_prologue (void)
  if (using_static_chain_p)
{
  rtx r0 = gen_rtx_REG (Pmode, 0);
+
+ START_USE (0);
               gcc_assert (info->first_gp_reg_save > 11);
 
  emit_move_insn (r0, spe_save_area_ptr);
}
+ else if (REGNO (frame_reg_rtx) != 11)
+   START_USE (11);
+
  emit_insn (gen_addsi3 (spe_save_area_ptr,
 frame_reg_rtx, GEN_INT (offset)));
           if (!using_static_chain_p && REGNO (frame_reg_rtx) == 11)
@@ -19657,8 +19692,16 @@ rs6000_emit_prologue (void)
}
 
   /* Move the static chain pointer back.  */
-  if (using_static_chain_p && !spe_regs_addressable)
-   emit_move_insn (spe_save_area_ptr, gen_rtx_REG (Pmode, 0));
+  if (!spe_regs_addressable)
+   {
+ if (using_static_chain_p)
+   {
+ emit_move_insn (spe_save_area_ptr, gen_rtx_REG (Pmode, 0));
+ END_USE (0);
+   }
+ else if (REGNO (frame_reg_rtx) != 11)
+   END_USE (11);
+   }
 }
   else if (!WORLD_SAVE_P (info) && !saving_GPRs_inline)
 {
@@ -19679,10 +19722,13 @@ rs6000_emit_prologue (void)
 
  if (ptr_set_up)
frame_off = -end_save;
+ else
+   NOT_INUSE (ptr_regno);
  emit_insn (gen_add3_insn (ptr_reg, frame_reg_rtx, offset));
}
   else if (!ptr_set_up)
{
+ NOT_INUSE (ptr_regno);
  emit_move_insn (ptr_reg, frame_reg_rtx);
}
   ptr_off = -end_save;
@@ -19693,6 +19739,8 @@ rs6000_emit_prologue (void)
 /*savep=*/true, /*gpr=*/true, lr);
   rs6000_frame_related (insn, ptr_reg, sp_off - ptr_off,
NULL_RTX, NULL_RTX);
+  if (lr)
+ 

PowerPC prologue and epilogue 5

2012-04-17 Thread Alan Modra
This enables out-of-line save and restore for large frames, and for
ABI_AIX when using the static chain.

* config/rs6000/rs6000.c (rs6000_savres_strategy): Allow
out-of-line save/restore for large frames.  Don't disable
out-of-line saves on ABI_AIX when using static chain reg.
(rs6000_emit_prologue): Adjust cr_save_regno on ABI_AIX to not
clobber static chain reg, and tweak for out-of-line gpr saves
that use r1.

diff -urp gcc-alan4/gcc/config/rs6000/rs6000.c 
gcc-alan5/gcc/config/rs6000/rs6000.c
--- gcc-alan4/gcc/config/rs6000/rs6000.c2012-04-17 09:11:31.760669589 
+0930
+++ gcc-alan5/gcc/config/rs6000/rs6000.c2012-04-17 11:16:09.369537832 
+0930
@@ -17432,8 +17432,7 @@ rs6000_savres_strategy (rs6000_stack_t *
 strategy |= SAVRES_MULTIPLE;
 
   if (crtl->calls_eh_return
-      || cfun->machine->ra_need_lr
-      || info->total_size > 32767)
+      || cfun->machine->ra_need_lr)
 strategy |= (SAVE_INLINE_FPRS | REST_INLINE_FPRS
 | SAVE_INLINE_GPRS | REST_INLINE_GPRS);
 
@@ -17454,10 +17453,10 @@ rs6000_savres_strategy (rs6000_stack_t *
   /* Don't bother to try to save things out-of-line if r11 is occupied
  by the static chain.  It would require too much fiddling and the
  static chain is rarely used anyway.  FPRs are saved w.r.t the stack
- pointer on Darwin.  */
-  if (using_static_chain_p)
-strategy |= (DEFAULT_ABI == ABI_DARWIN ? 0 : SAVE_INLINE_FPRS)
-   | SAVE_INLINE_GPRS;
+ pointer on Darwin, and AIX uses r1 or r12.  */
+  if (using_static_chain_p && DEFAULT_ABI != ABI_AIX)
+strategy |= ((DEFAULT_ABI == ABI_DARWIN ? 0 : SAVE_INLINE_FPRS)
+| SAVE_INLINE_GPRS);
 
   /* If we are going to use store multiple, then don't even bother
  with the out-of-line routines, since the store-multiple
@@ -19555,7 +19554,10 @@ rs6000_emit_prologue (void)
 }
 
   /* If we need to save CR, put it into r12 or r11.  */
-  cr_save_regno = DEFAULT_ABI == ABI_AIX && !saving_GPRs_inline ? 11 : 12;
+  cr_save_regno = (DEFAULT_ABI == ABI_AIX
+                  && (strategy & SAVE_INLINE_GPRS) == 0
+                  && (strategy & SAVE_NOINLINE_GPRS_SAVES_LR) == 0
+                  && !using_static_chain_p ? 11 : 12);
   if (!WORLD_SAVE_P (info)
       && info->cr_save_p
       && REGNO (frame_reg_rtx) != cr_save_regno)

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Atom: Enabling unroll at O2 optimization level

2012-04-17 Thread Igor Zamyatin
On Thu, Apr 12, 2012 at 3:16 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Thu, Apr 12, 2012 at 1:05 PM, Igor Zamyatin izamya...@gmail.com wrote:
 On Wed, Apr 11, 2012 at 12:39 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Tue, Apr 10, 2012 at 8:43 PM, Igor Zamyatin izamya...@gmail.com wrote:
 Hi All!

 Here is a patch that enables unroll at O2 for Atom.

 This gives good performance boost on EEMBC 2.0 (~+8% in Geomean for 32
 bits) with quite moderate code size increase (~5% for EEMBC2.0, 32
 bits).

 5% is not moderate.  Your patch does enable unrolling at -O2 but not -O3,
 why? Why do you disable register renaming?  check_imull requires a function
 comment.

 Sure, enabling unroll for O3 could be the next step.
 We can't avoid code size increase with unroll - what number do you
 think will be appropriate?
 Register renaming was the reason of several degradations during tuning 
 process
 Comment for check_imull was added


 This completely looks like a hack for EEMBC2.0, so it's definitely not ok.

 Why? EEMBC was measured and result provided here just because this
 benchmark considers to be very relevant for Atom

 I'd say that SPEC INT (2000 / 2006) is more relevant for Atom (SPEC FP
 would be irrelevant OTOH).  Similar code size for, say, Mozilla Firefox
 or GCC itself would be important.

 -O2 is not supposed to give best benchmark results.

 O2 is wide-used so performance improvement could be important for users.

 But not at a 5% size cost.  Please also always check the compile-time effect
 which is important for -O2 as well.

What size and compile-time increase would be acceptable for O2 and O3
on EEMBC, SPEC INT 2000 and Mozilla?

Would it be possible, in general, to put Atom-specific unroll heuristics
under an option that could be mentioned in the GCC docs?


 Richard.


 Thanks,
 Richard.


 Tested for i386 and x86-64, ok for trunk?

 Updated patch attached


 Thanks,
 Igor

 ChangeLog:

 2012-04-10  Yakovlev Vladimir  vladimir.b.yakov...@intel.com

        * gcc/config/i386/i386.c (check_imul): New routine.
        (ix86_loop_unroll_adjust): New target hook.
        (ix86_option_override_internal): Enable unrolling on Atom at -O2.
        (TARGET_LOOP_UNROLL_ADJUST): New define.

Thanks,
Igor


[PATCH] Allow un-distribution with repeated factors (PR52976 follow-up)

2012-04-17 Thread William J. Schmidt
The emergency reassociation patch for PR52976 disabled un-distribution
in the presence of repeated factors to avoid ICEs in zero_one_operation.
This patch fixes such cases properly by teaching zero_one_operation
about __builtin_pow* calls.

Bootstrapped with no new regressions on powerpc64-linux.  Also built
SPEC cpu2000 and cpu2006 successfully.  Ok for trunk?
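
To make the multiply counts in the new tests concrete, here is the arithmetic of reassoc_7.f written out in C. This is a hand-worked sketch, not what the reassociation pass emits: the function names are hypothetical, and it assumes the constants fold at compile time and that reassociation is permitted (the tests use -ffast-math).

```c
#include <assert.h>

static const double c1 = 0.1, c2 = 0.2, c3 = 0.3;

/* The expression as written in reassoc_7.f: c1 + 2.*P*c2 + 3.*P**2*c3.  */
static double
dvdph_as_written (double p)
{
  return c1 + 2.0 * p * c2 + 3.0 * (p * p) * c3;
}

/* After folding the constants and pulling the repeated factor P out of
   both terms: c1 + P * (k2 + k3 * P).  Two run-time multiplies remain.  */
static double
dvdph_undistributed (double p)
{
  const double k2 = 2.0 * c2;   /* constant, foldable */
  const double k3 = 3.0 * c3;   /* constant, foldable */

  return c1 + p * (k2 + k3 * p);
}

static double
abs_diff (double a, double b)
{
  return a > b ? a - b : b - a;
}

/* Self-check: both forms agree up to rounding.  */
static void
check_undistribution (void)
{
  assert (abs_diff (dvdph_as_written (1.5),
                    dvdph_undistributed (1.5)) < 1e-12);
  assert (abs_diff (dvdph_as_written (-2.25),
                    dvdph_undistributed (-2.25)) < 1e-12);
}
```

The two forms are only equal up to floating-point rounding, which is why the comparison uses a tolerance rather than exact equality.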

Thanks,
Bill


gcc:

2012-04-17  Bill Schmidt  wschm...@linux.vnet.ibm.com

* tree-ssa-reassoc.c (stmt_is_power_of_op): New function.
(decrement_power): Likewise.
(propagate_op_to_single_use): Likewise.
(zero_one_operation): Handle __builtin_pow* calls in linearized
expression trees; factor logic into propagate_op_to_single_use.
(undistribute_ops_list): Allow operands with repeat counts > 1.


gcc/testsuite:

2012-04-17  Bill Schmidt  wschm...@linux.vnet.ibm.com

gfortran.dg/reassoc_7.f: New test.
gfortran.dg/reassoc_8.f: Likewise.
gfortran.dg/reassoc_9.f: Likewise.
gfortran.dg/reassoc_10.f: Likewise.


Index: gcc/testsuite/gfortran.dg/reassoc_10.f
===
--- gcc/testsuite/gfortran.dg/reassoc_10.f  (revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_10.f  (revision 0)
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+
+  SUBROUTINE S55199(P,Q,Dvdph)
+  implicit none
+  real(8) :: c1,c2,c3,P,Q,Dvdph
+  c1=0.1d0
+  c2=0.2d0
+  c3=0.3d0
+  Dvdph = c1 + 2.*P*c2 + 3.*P**2*Q**3*c3
+  END
+
+! There should be five multiplies following un-distribution
+! and power expansion.
+
+! { dg-final { scan-tree-dump-times " \\\* " 5 "optimized" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
Index: gcc/testsuite/gfortran.dg/reassoc_7.f
===
--- gcc/testsuite/gfortran.dg/reassoc_7.f   (revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_7.f   (revision 0)
@@ -0,0 +1,16 @@
+! { dg-do compile }
+! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+
+  SUBROUTINE S55199(P,Dvdph)
+  implicit none
+  real(8) :: c1,c2,c3,P,Dvdph
+  c1=0.1d0
+  c2=0.2d0
+  c3=0.3d0
+  Dvdph = c1 + 2.*P*c2 + 3.*P**2*c3
+  END
+
+! There should be two multiplies following un-distribution.
+
+! { dg-final { scan-tree-dump-times " \\\* " 2 "optimized" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
Index: gcc/testsuite/gfortran.dg/reassoc_8.f
===
--- gcc/testsuite/gfortran.dg/reassoc_8.f   (revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_8.f   (revision 0)
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+
+  SUBROUTINE S55199(P,Dvdph)
+  implicit none
+  real(8) :: c1,c2,c3,P,Dvdph
+  c1=0.1d0
+  c2=0.2d0
+  c3=0.3d0
+  Dvdph = c1 + 2.*P**2*c2 + 3.*P**3*c3
+  END
+
+! There should be three multiplies following un-distribution
+! and power expansion.
+
+! { dg-final { scan-tree-dump-times " \\\* " 3 "optimized" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
Index: gcc/testsuite/gfortran.dg/reassoc_9.f
===
--- gcc/testsuite/gfortran.dg/reassoc_9.f   (revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_9.f   (revision 0)
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+
+  SUBROUTINE S55199(P,Dvdph)
+  implicit none
+  real(8) :: c1,c2,c3,P,Dvdph
+  c1=0.1d0
+  c2=0.2d0
+  c3=0.3d0
+  Dvdph = c1 + 2.*P**2*c2 + 3.*P**4*c3
+  END
+
+! There should be three multiplies following un-distribution
+! and power expansion.
+
+! { dg-final { scan-tree-dump-times " \\\* " 3 "optimized" } }
+! { dg-final { cleanup-tree-dump "optimized" } }
Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c  (revision 186495)
+++ gcc/tree-ssa-reassoc.c  (working copy)
@@ -1020,6 +1020,98 @@ oecount_cmp (const void *p1, const void *p2)
   return c1->id - c2->id;
 }
 
+/* Return TRUE iff STMT represents a builtin call that raises OP
+   to some exponent.  */
+
+static bool
+stmt_is_power_of_op (gimple stmt, tree op)
+{
+  tree fndecl;
+
+  if (!is_gimple_call (stmt))
+return false;
+
+  fndecl = gimple_call_fndecl (stmt);
+
+  if (!fndecl
+  || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
+return false;
+
+  switch (DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)))
+{
+CASE_FLT_FN (BUILT_IN_POW):
+CASE_FLT_FN (BUILT_IN_POWI):
+  return (operand_equal_p (gimple_call_arg (stmt, 0), op, 0));
+  
+default:
+  return false;
+}
+}
+
+/* Given STMT which is a __builtin_pow* call, decrement its exponent
+   in place and return the result.  Assumes that stmt_is_power_of_op
+   was previously called 

Re: CPU_NONE ix86_schedule cpu attribute for -march=nocona

2012-04-17 Thread H.J. Lu
On Tue, Apr 17, 2012 at 8:04 AM, Alexander Monakov amona...@ispras.ru wrote:


 On Tue, 17 Apr 2012, H.J. Lu wrote:

 On Tue, Apr 17, 2012 at 7:35 AM, Roman Zhuykov zhr...@ispras.ru wrote:
  Hello,
 
  I found the following problem while investigating SMS on x86-64.
  When I run gcc with -march=nocona (on pentium-4 with EM64T extension), all
  latencies in data dependency graph become zeros. The global pointer
  insn_default_latency points to insn_default_latency_none, which
  returns zero for any instruction.
  This happens because ix86_schedule cpu attribute is set to CPU_NONE for 
  nocona.
 
  CPU_NONE was introduced by this patch:
  http://gcc.gnu.org/ml/gcc-patches/2008-10/msg00179.html
 
  I think we don't want any scheduler to work with zero latencies on
  such processors (with such -march).
  The following patch fixes the problem for my case with -march=nocona.
  Is it correct to fix the problem like this?
  What to do with 32bit architectures (i386, i486, pentium4, pentium4m,
  prescott) ?
 
  diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
  index af4af7c..38d64e9 100644
  --- a/gcc/config/i386/i386.c
  +++ b/gcc/config/i386/i386.c
  @@ -2989,7 +2989,7 @@ ix86_option_override_internal (bool main_args_p)
        PTA_MMX | PTA_SSE | PTA_SSE2},
       {"prescott", PROCESSOR_NOCONA, CPU_NONE,
        PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3},
  -      {"nocona", PROCESSOR_NOCONA, CPU_NONE,
  +      {"nocona", PROCESSOR_NOCONA, CPU_GENERIC64,
        PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
        | PTA_CX16 | PTA_NO_SAHF},
       {"core2", PROCESSOR_CORE2_64, CPU_CORE2,
  --

 Should we replace all CPU_NONE with CPU_GENERIC32/CPU_GENERIC64?

 CPU_GENERIC32 had been removed by the 2008 patch Roman was referring to.  Did
 you mean CPU_PENTIUMPRO?

Yes.

-- 
H.J.


Re: [C++ Patch] PR 52599

2012-04-17 Thread Paolo Carlini

Hi,
I think build_constexpr_constructor_member_initializers is a better 
place for that check, since it's already looking at the tree structure.

Indeed. I'm finishing testing the below. Ok if it passes?

Thanks,
Paolo.


/cp
2012-04-17  Paolo Carlini  paolo.carl...@oracle.com

PR c++/52599
* semantics.c (build_constexpr_constructor_member_initializers):
Check for function-try-block as function-body.

/testsuite
2012-04-17  Paolo Carlini  paolo.carl...@oracle.com

PR c++/52599
* g++.dg/cpp0x/constexpr-ctor10.C: New.
Index: testsuite/g++.dg/cpp0x/constexpr-ctor10.C
===
--- testsuite/g++.dg/cpp0x/constexpr-ctor10.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-ctor10.C   (revision 0)
@@ -0,0 +1,6 @@
+// PR c++/52599
+// { dg-options "-std=c++11" }
+
+struct foo {
+  constexpr foo() try { } catch(...) { };  // { dg-error "constructor" }
+};
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 186523)
+++ cp/semantics.c  (working copy)
@@ -5921,6 +5921,8 @@ build_constexpr_constructor_member_initializers (t
break;
}
 }
+  else if (TREE_CODE (body) == TRY_BLOCK)
+    error ("body of %<constexpr%> constructor cannot be a function-try-block");
   else if (EXPR_P (body))
 ok = build_data_member_initialization (body, vec);
   else


Re: [C++ Patch] PR 53003

2012-04-17 Thread Paolo Carlini

On 04/17/2012 03:55 PM, Jason Merrill wrote:

I have various thoughts:

It's odd that we still treat 'return' as starting a function body long 
after we removed that extension.


Maybe we shouldn't look for a function body if we already have an 
initializer and aren't dealing with a function declarator.


I guess we should set initializer_token_start for {} initializers as 
well.


But your patch is certainly the smallest change, and OK.
Thanks. Thus let's say I apply the very safe patchlet to mainline and 
branch and then, when time allows, I'll try and see if I clean up a bit 
mainline in this area.


Thanks,
Paolo.


Re: [C++ Patch] PR 52599

2012-04-17 Thread Paolo Carlini

On 04/17/2012 05:35 PM, Paolo Carlini wrote:

Hi,
I think build_constexpr_constructor_member_initializers is a better 
place for that check, since it's already looking at the tree structure.

Indeed. I'm finishing testing the below. Ok if it passes?
... uhm, actually like this seems more correct to me, I'm testing this 
variant instead. Sorry.


Paolo.

///
Index: testsuite/g++.dg/cpp0x/constexpr-ctor10.C
===
--- testsuite/g++.dg/cpp0x/constexpr-ctor10.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-ctor10.C   (revision 0)
@@ -0,0 +1,6 @@
+// PR c++/52599
+// { dg-options "-std=c++11" }
+
+struct foo {
+  constexpr foo() try { } catch(...) { };  // { dg-error "constructor" }
+};
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 186523)
+++ cp/semantics.c  (working copy)
@@ -5921,6 +5921,12 @@ build_constexpr_constructor_member_initializers (t
break;
}
 }
+  else if (TREE_CODE (body) == TRY_BLOCK)
+{
+  error ("body of %<constexpr%> constructor cannot be "
+         "a function-try-block");
+  return error_mark_node;
+}
   else if (EXPR_P (body))
 ok = build_data_member_initialization (body, vec);
   else


[patch] Cleanup tree-switch-conversion a bit

2012-04-17 Thread Steven Bosscher
 My goal for GCC 4.8 is to do just that: Move switch expansion to
 GIMPLE and add value profiling for switch expressions.

And the idea is to put all that code in tree-switch-conversion.c. But
there are a few clean-ups I wish to do on that code before that.
First, there is a global pass info structure that contains useful data
for all forms of GIMPLE_SWITCH lowering. I've un-globalized that
data with the attached patch. While there, I made the dump messages
uniform.

Bootstrapped and tested on powerpc-unknown-linux-gnu. OK?

Ciao!
Steven
* tree-switch-conversion.c (info): Remove global pass info.
(check_range, check_process_case, check_final_bb, create_temp_arrays,
free_temp_arrays, gather_default_values, build_constructors,
array_value_type, build_one_array, build_arrays, gen_def_assigns,
fix_phi_nodes, gen_inbound_check): Pass info around from ...
(process_switch): ... here.  Unify message format.  Return a const
char pointer to the failure reason message.
(do_switchconv): Unify message format.  Update process_switch usage.

Index: tree-switch-conversion.c
===
--- tree-switch-conversion.c(revision 186526)
+++ tree-switch-conversion.c(working copy)
@@ -24,8 +24,8 @@ Software Foundation, 51 Franklin Street, Fifth Flo
  Switch initialization conversion
 
 The following pass changes simple initializations of scalars in a switch
-statement into initializations from a static array.  Obviously, the values must
-be constant and known at compile time and a default branch must be
+statement into initializations from a static array.  Obviously, the values
+must be constant and known at compile time and a default branch must be
 provided.  For example, the following code:
 
 int a,b;
@@ -162,16 +162,12 @@ struct switch_conv_info
   basic_block bit_test_bb[2];
 };
 
-/* Global pass info.  */
-static struct switch_conv_info info;
-
-
 /* Checks whether the range given by individual case statements of the SWTCH
switch statement isn't too big and whether the number of branches actually
satisfies the size of the new array.  */
 
 static bool
-check_range (gimple swtch)
+check_range (gimple swtch, struct switch_conv_info *info)
 {
   tree min_case, max_case;
   unsigned int branch_num = gimple_switch_num_labels (swtch);
@@ -181,7 +177,7 @@ static bool
  is a default label which is the first in the vector.  */
 
   min_case = gimple_switch_label (swtch, 1);
-  info.range_min = CASE_LOW (min_case);
+  info->range_min = CASE_LOW (min_case);
 
   gcc_assert (branch_num > 1);
   gcc_assert (CASE_LOW (gimple_switch_label (swtch, 0)) == NULL_TREE);
@@ -191,22 +187,22 @@ static bool
   else
 range_max = CASE_LOW (max_case);
 
-  gcc_assert (info.range_min);
+  gcc_assert (info->range_min);
   gcc_assert (range_max);
 
-  info.range_size = int_const_binop (MINUS_EXPR, range_max, info.range_min);
+  info->range_size = int_const_binop (MINUS_EXPR, range_max, info->range_min);
 
-  gcc_assert (info.range_size);
-  if (!host_integerp (info.range_size, 1))
+  gcc_assert (info->range_size);
+  if (!host_integerp (info->range_size, 1))
 {
-  info.reason = "index range way too large or otherwise unusable.\n";
+  info->reason = "index range way too large or otherwise unusable";
   return false;
 }
 
-  if ((unsigned HOST_WIDE_INT) tree_low_cst (info.range_size, 1)
+  if ((unsigned HOST_WIDE_INT) tree_low_cst (info->range_size, 1)
       > ((unsigned) branch_num * SWITCH_CONVERSION_BRANCH_RATIO))
 {
-  info.reason = "the maximum range-branch ratio exceeded.\n";
+  info->reason = "the maximum range-branch ratio exceeded";
   return false;
 }
 
@@ -219,7 +215,7 @@ static bool
and returns true.  Otherwise returns false.  */
 
 static bool
-check_process_case (tree cs)
+check_process_case (tree cs, struct switch_conv_info *info)
 {
   tree ldecl;
   basic_block label_bb, following_bb;
@@ -228,48 +224,48 @@ static bool
   ldecl = CASE_LABEL (cs);
   label_bb = label_to_block (ldecl);
 
-  e = find_edge (info.switch_bb, label_bb);
+  e = find_edge (info->switch_bb, label_bb);
   gcc_assert (e);
 
   if (CASE_LOW (cs) == NULL_TREE)
 {
   /* Default branch.  */
-  info.default_prob = e->probability;
-  info.default_count = e->count;
+  info->default_prob = e->probability;
+  info->default_count = e->count;
 }
   else
 {
   int i;
-  info.other_count += e->count;
+  info->other_count += e->count;
   for (i = 0; i < 2; i++)
-   if (info.bit_test_bb[i] == label_bb)
+   if (info->bit_test_bb[i] == label_bb)
  break;
-   else if (info.bit_test_bb[i] == NULL)
+   else if (info->bit_test_bb[i] == NULL)
  {
-   info.bit_test_bb[i] = label_bb;
-   info.bit_test_uniq++;
+   info->bit_test_bb[i] = label_bb;
+   info->bit_test_uniq++;
break;
  }
   if (i 

Vector subscripts in C++

2012-04-17 Thread Marc Glisse

Hello,

this patch adds vector subscripting to C++ by reusing the C code. 
build_array_ref and cp_build_array_ref could probably share more, but I 
don't understand them enough to do it.


(note that I can't commit, so if you like the patch...)

gcc/cp/ChangeLog
2012-04-17  Marc Glisse  marc.gli...@inria.fr

PR c++/51033
* typeck.c (cp_build_array_ref): Handle VECTOR_TYPE.
* decl2.c (grok_array_decl): Likewise.

gcc/c-family/ChangeLog
2012-04-17  Marc Glisse  marc.gli...@inria.fr

PR c++/51033
* c-common.c (convert_vector_to_pointer_for_subscript): New function.
* c-common.h (convert_vector_to_pointer_for_subscript): Declare it.

gcc/ChangeLog
2012-04-17  Marc Glisse  marc.gli...@inria.fr

PR c++/51033
* c-typeck.c (build_array_ref): Call
convert_vector_to_pointer_for_subscript.
* doc/extend.texi (Vector Extensions): Subscripting not just for C.

gcc/testsuite/ChangeLog
2012-04-17  Marc Glisse  marc.gli...@inria.fr

PR c++/51033
* gcc.dg/vector-1.c: Move to ...
* c-c++-common/vector-1.c: ... here.
* gcc.dg/vector-2.c: Move to ...
* c-c++-common/vector-2.c: ... here.
* gcc.dg/vector-3.c: Move to ...
* c-c++-common/vector-3.c: ... here. Adapt to C++.
* gcc.dg/vector-4.c: Move to ...
* c-c++-common/vector-4.c: ... here.
* gcc.dg/vector-init-1.c: Move to ...
* c-c++-common/vector-init-1.c: ... here.
* gcc.dg/vector-init-2.c: Move to ...
* c-c++-common/vector-init-2.c: ... here.
* gcc.dg/vector-subscript-1.c: Move to ... Adapt to C++.
* c-c++-common/vector-subscript-1.c: ... here.
* gcc.dg/vector-subscript-2.c: Move to ...
* c-c++-common/vector-subscript-2.c: ... here.
* gcc.dg/vector-subscript-3.c: Move to ...
* c-c++-common/vector-subscript-3.c: ... here.

--
Marc Glisse

Index: cp/decl2.c
===
--- cp/decl2.c  (revision 186523)
+++ cp/decl2.c  (working copy)
@@ -373,7 +373,7 @@
 It is a little-known fact that, if `a' is an array and `i' is
 an int, you can write `i[a]', which means the same thing as
 `a[i]'.  */
-  if (TREE_CODE (type) == ARRAY_TYPE)
+  if (TREE_CODE (type) == ARRAY_TYPE || TREE_CODE (type) == VECTOR_TYPE)
p1 = array_expr;
   else
p1 = build_expr_type_conversion (WANT_POINTER, array_expr, false);
Index: cp/typeck.c
===
--- cp/typeck.c (revision 186523)
+++ cp/typeck.c (working copy)
@@ -2902,6 +2902,8 @@
   break;
 }
 
+  convert_vector_to_pointer_for_subscript (loc, array, idx);
+
   if (TREE_CODE (TREE_TYPE (array)) == ARRAY_TYPE)
 {
   tree rval, type;
Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 186523)
+++ c-family/c-common.c (working copy)
@@ -10831,4 +10831,29 @@
   return literal;
 }
 
+/* For vector[index], convert the vector to a
+   pointer of the underlying type.  */
+void
+convert_vector_to_pointer_for_subscript (location_t loc, tree* vecp, tree index)
+{
+  if (TREE_CODE (TREE_TYPE (*vecp)) == VECTOR_TYPE)
+    {
+      tree type = TREE_TYPE (*vecp);
+      tree type1;
+
+      if (TREE_CODE (index) == INTEGER_CST)
+        if (!host_integerp (index, 1)
+            || ((unsigned HOST_WIDE_INT) tree_low_cst (index, 1)
+               >= TYPE_VECTOR_SUBPARTS (type)))
+          warning_at (loc, OPT_Warray_bounds, "index value is out of bound");
+
+      c_common_mark_addressable_vec (*vecp);
+      type = build_qualified_type (TREE_TYPE (type), TYPE_QUALS (type));
+      type = build_pointer_type (type);
+      type1 = build_pointer_type (TREE_TYPE (*vecp));
+      *vecp = build1 (ADDR_EXPR, type1, *vecp);
+      *vecp = convert (type, *vecp);
+    }
+}
+
 #include "gt-c-family-c-common.h"
Index: c-family/c-common.h
===
--- c-family/c-common.h (revision 186523)
+++ c-family/c-common.h (working copy)
@@ -1119,4 +1119,6 @@
 
 extern tree build_userdef_literal (tree suffix_id, tree value, tree num_string);
 
+extern void convert_vector_to_pointer_for_subscript (location_t, tree*, tree);
+
 #endif /* ! GCC_C_COMMON_H */
Index: testsuite/c-c++-common/vector-3.c
===
--- testsuite/c-c++-common/vector-3.c   (revision 186523)
+++ testsuite/c-c++-common/vector-3.c   (working copy)
@@ -2,4 +2,7 @@
 
 /* Check that we error out when using vector_size on the bool type. */
 
+#ifdef __cplusplus
+#define _Bool bool
+#endif
 __attribute__((vector_size(16) )) _Bool a; /* { dg-error "" } */
Index: testsuite/c-c++-common/vector-subscript-1.c
===
--- testsuite/c-c++-common/vector-subscript-1.c (revision 186523)
+++ 

Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-17 Thread Vladimir Makarov

On 04/17/2012 04:29 AM, Richard Sandiford wrote:

Vladimir Makarov vmaka...@redhat.com writes:

On the other hand, I don't think that 1st insn scheduling will be ever
used for x86.  And although the SPECFP2000 rate is the same on x86-64 I
saw that some SPECFP2000 tests benefit from your algorithm on x86-64
(one amazing difference is 70% improvement on swim on x86-64 although it
might be because of different reasons like alignment or cache
behaviour).  So I think the algorithm might work better on processors
with more registers.
Notwithstanding that this is a geomean, I assume there were some bad
results to cancel out the gain?
Yes, mgrid had 4% degradation, mesa had 2%, facerec and ammp had 2.5%, 
lucas had 15%.  On the other hand, galgel had 2% improvement and equake 
had 1%.  All other differences are not considerable.


Oops, thanks :-)

Anyway, given those results and your mixed feelings, I think it would
be best to drop the patch.  It's a lot of code to carry around, and its
ad-hoc nature would make it hard to change in future.  There must be
better ways of achieving the same thing.


It is up to you, Richard.  I don't mind if you commit it into the trunk.

There is something in your approach too.  If it would be possible to get 
the best of the two methods, we could see a big improvement.




[v3] std::uninitialized_copy test

2012-04-17 Thread Benjamin De Kosnik

Found this bug in 4.4 branches, fixed with the following in later
branches:
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg01616.html

But the test case is useful anyway.

tested x86/linux

-benjamin

2012-04-16  Benjamin Kosnik  b...@redhat.com

	* testsuite/20_util/specialized_algorithms/uninitialized_copy/
	808590.cc: New.


diff --git a/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/808590.cc b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/808590.cc
new file mode 100644
index 000..7ccd8da
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/808590.cc
@@ -0,0 +1,48 @@
+// Copyright (C) 2012 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// http://www.gnu.org/licenses/.
+
+#include <vector>
+#include <stdexcept>
+
+// 4.4.x only
+struct c 
+{
+  void *m;
+
+  c(void* o = 0) : m(o) {}
+  c(const c &r) : m(r.m) {}
+
+  template<class T>
+    explicit c(T &o) : m((void*)0xdeadfbeef) { }
+};
+
+int main() 
+{
+  std::vector<c> cbs;
+  const c cb((void*)0xcafebabe);
+
+  for (int fd = 62; fd < 67; ++fd) 
+    {
+      cbs.resize(fd + 1);
+      cbs[fd] = cb;
+    }
+
+  for (int fd = 62; fd < 67; ++fd)
+    if (cb.m != cbs[fd].m)
+      throw std::runtime_error("wrong");
+  return 0;
+}


Re: [C++ Patch] PR 52599

2012-04-17 Thread Jason Merrill

OK.

Jason


Re: [v3, testsuite] Fix merging default libstdc++.log

2012-04-17 Thread Rainer Orth
Hi Mike,

 On Apr 16, 2012, at 8:03 AM, Rainer Orth wrote:
 I've long noticed that libstdc++.log (unlike libstdc++.sum) doesn't
 contain log entries for tests run from abi.exp, but hadn't looked
 closer, getting used to check libstdc++.log.sep instead which contained
 everything I expected.

 ok for mainline?

 Ok.  Would have been nice to see the before and after log file...

the full thing is pretty boring, but the gist of the change can be seen
by diffing the output of dg-extract-results.sh -L libstdc++.log.sep:

--- 11-gcc.old/i386-pc-solaris2.11/libstdc++-v3/testsuite/libstdc++.log.dist
2012-04-17 18:59:05.777535114 +0200
+++ 11-gcc/i386-pc-solaris2.11/libstdc++-v3/testsuite/libstdc++.log.fixed   
2012-04-17 18:57:26.890396807 +0200
@@ -1,4 +1,4 @@
-Test Run By ro on Sat Apr 14 19:57:36 2012
+Test Run By ro on Sun Apr 15 22:11:21 2012
 Native configuration is i386-pc-solaris2.11
 
=== libstdc++ tests ===
@@ -15,8 +15,8 @@
 libgomp support detected
 set_ld_library_path_env_vars: 
ld_library_path=:/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/../libgomp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/src/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc/amd64
 LD_LIBRARY_PATH = 
:/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/../libgomp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/src/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc/amd64:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libstdc++-v3/src/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libmudflap/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libssp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libgomp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libitm/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/./gcc:/var/gcc/gcc-4.8.0-20120413/11-gcc/./prev-gcc

[differences due to tmp file names omitted ...]

@@ -54,6 +54,226 @@
 Setting LD_LIBRARY_PATH to 
:/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/../libgomp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/src/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc/amd64::/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/../libgomp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/./libstdc++-v3/src/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/gcc/amd64:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libstdc++-v3/src/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libmudflap/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libssp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libgomp/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/i386-pc-solaris2.11/libitm/.libs:/var/gcc/gcc-4.8.0-20120413/11-gcc/./gcc:/var/gcc/gcc-4.8.0-20120413/11-gcc/./prev-gcc
 spawn [open ...]
 
+    libstdc++-v3 check-abi Summary 
+
+# of added symbols: 0
+# of missing symbols:   0
+# of undesignated symbols:  0
+# of incompatible symbols:  0
+
+using: baseline_symbols.txt
+PASS: libstdc++-abi/abi_check
+testcase 
/vol/gcc/src/hg/trunk/solaris/libstdc++-v3/testsuite/libstdc++-abi/abi.exp 
completed in 35 seconds
+Running 
/vol/gcc/src/hg/trunk/solaris/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp
 ...
+libgomp support detected
[rest of prettyprinters.exp tests omitted ...]
+testcase 
/vol/gcc/src/hg/trunk/solaris/libstdc++-v3/testsuite/libstdc++-prettyprinters/prettyprinters.exp
 completed in 45 seconds
+
=== libstdc++ Summary for unix ===
 
 Running target unix/-m64

[similar change for 64-bit variant omitted...]

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Use target_alias in validate_failures.py

2012-04-17 Thread Rainer Orth
Diego Novillo dnovi...@google.com writes:

 On 4/16/12 7:32 AM, Rainer Orth wrote:

 Btw., it occurred to me that it might be useful to add an option to
 locate out-of-tree manifests.  I often have several source trees
 (unmodified sources, ones with local patches) and would like to share
 manifests between them.  While this can be achieved with symlinks, a
 --manifest_dir or similar option might be an alternative.  Thoughts?

 That would be fantastic.  This is not the first time someone requests this,
 but I've never gotten around to implementing it.  The only thing there is
 that multiple manifests means that versioning needs to be handled
 externally to the script.  But that's not a big deal.

Indeed, but the advantage is that people can choose whatever naming
scheme they like for the different manifests instead of implementing
some (probably limited) scheme inside validate_failures.py.

I'll give it a whirl, but probably only in early May, once I return from
a trip to California.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [v3] std::uninitialized_copy test

2012-04-17 Thread Paolo Carlini

On 04/17/2012 06:52 PM, Benjamin De Kosnik wrote:

Found this bug in 4.4 branches, fixed with the following in later
branches:
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg01616.html

But the test case is useful anyway.
Definitely, thanks! The name of the new testcase seems a bit weird (for 
the FSF branches): shall we maybe refer to the original (Fedora or RHEL 
Bugzilla, I suppose) PR in a comment and then use either an explicative 
name (our current practice) or just a small number for the name of the 
testcase itself?


Thanks again,
Paolo.


Re: [v3] std::uninitialized_copy test

2012-04-17 Thread Benjamin Kosnik

 Definitely, thanks! The name of the new testcase seems a bit weird
 (for the FSF branches): shall we maybe refer to the original (Fedora
 or RHEL Bugzilla, I suppose) PR in a comment and then use either an
 explicative name (our current practice) or just a small number for
 the name of the testcase itself?

Yes, agreed. I couldn't think of anything descriptive for this test,
but if you can please feel free to assign it something meaningful.

-benjamin


[PATCH, i386]: Fix PR 53020, another victim of IOR vs OR naming difference

2012-04-17 Thread Uros Bizjak
Hello!

The correct name of the atomic OR named pattern is atomic_orM, not atomic_iorM.

Attached patch fixes this oversight.

2012-04-17  Uros Bizjak  ubiz...@gmail.com

PR target/53020
* config/i386/sync.md (atomic_<code><mode>): Rename to
atomic_<logic><mode>.

Patch was bootstrapped and tested on x86_64-pc-linux-gnu, will be
committed to all release branches.

Uros.
Index: config/i386/sync.md
===
--- config/i386/sync.md (revision 186539)
+++ config/i386/sync.md (working copy)
@@ -576,7 +576,7 @@
   return "lock{%;} sub{<imodesuffix>}\t{%1, %0|%0, %1}";
 })
 
-(define_insn "atomic_<code><mode>"
+(define_insn "atomic_<logic><mode>"
   [(set (match_operand:SWI 0 "memory_operand" "+m")
(unspec_volatile:SWI
  [(any_logic:SWI (match_dup 0)


[PATCH] Fix __builtin_powi placement (PR52976 follow-up)

2012-04-17 Thread William J. Schmidt
The emergency patch for PR52976 manipulated the operand rank system to
force inserted __builtin_powi calls to occur before uses of the call
results.  However, this is generally the wrong approach, as it forces
other computations to move unnecessarily, and extends the lifetimes of
other operands.

This patch fixes the problem in the proper way, by letting the rank
system determine where the __builtin_powi call belongs, and moving the
call to that location during the expression rewrite.

Bootstrapped with no new regressions on powerpc64-linux.  SPEC cpu2000
and cpu2006 also build cleanly.  Ok for trunk?

Thanks,
Bill


gcc:

2012-04-17  Bill Schmidt  wschm...@linux.vnet.ibm.com

* tree-ssa-reassoc.c (add_to_ops_vec_max_rank): Delete.
(possibly_move_powi): New function.
(rewrite_expr_tree): Call possibly_move_powi.
(rewrite_expr_tree_parallel): Likewise.
(attempt_builtin_powi): Change call of add_to_ops_vec_max_rank to
call add_to_ops_vec instead.


gcc/testsuite:

2012-04-17  Bill Schmidt  wschm...@linux.vnet.ibm.com

* gfortran.dg/reassoc_11.f: New test.



Index: gcc/testsuite/gfortran.dg/reassoc_11.f
===
--- gcc/testsuite/gfortran.dg/reassoc_11.f  (revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_11.f  (revision 0)
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! { dg-options "-O3 -ffast-math" }
+
+! This tests only for compile-time failure, which formerly occurred
+! when a __builtin_powi was introduced by reassociation in a bad place.
+
+  SUBROUTINE GRDURBAN(URBWSTR, ZIURB, GRIDHT)
+
+  IMPLICIT NONE
+  INTEGER :: I
+  REAL :: SW2, URBWSTR, ZIURB, GRIDHT(87)
+
+  SAVE 
+
+  SW2 = 1.6*(GRIDHT(I)/ZIURB)**0.667*URBWSTR**2
+
+  END
Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c  (revision 186495)
+++ gcc/tree-ssa-reassoc.c  (working copy)
@@ -544,28 +544,6 @@ add_repeat_to_ops_vec (VEC(operand_entry_t, heap)
   reassociate_stats.pows_encountered++;
 }
 
-/* Add an operand entry to *OPS for the tree operand OP, giving the
-   new entry a larger rank than any other operand already in *OPS.  */
-
-static void
-add_to_ops_vec_max_rank (VEC(operand_entry_t, heap) **ops, tree op)
-{
-  operand_entry_t oe = (operand_entry_t) pool_alloc (operand_entry_pool);
-  operand_entry_t oe1;
-  unsigned i;
-  unsigned max_rank = 0;
-
-  FOR_EACH_VEC_ELT (operand_entry_t, *ops, i, oe1)
-    if (oe1->rank > max_rank)
-      max_rank = oe1->rank;
-
-  oe->op = op;
-  oe->rank = max_rank + 1;
-  oe->id = next_operand_entry_id++;
-  oe->count = 1;
-  VEC_safe_push (operand_entry_t, heap, *ops, oe);
-}
-
 /* Return true if STMT is reassociable operation containing a binary
operation with tree code CODE, and is inside LOOP.  */
 
@@ -2162,6 +2242,47 @@ remove_visited_stmt_chain (tree var)
 }
 }
 
+/* If OP is an SSA name, find its definition and determine whether it
+   is a call to __builtin_powi.  If so, move the definition prior to
+   STMT.  Only do this during early reassociation.  */
+
+static void
+possibly_move_powi (gimple stmt, tree op)
+{
+  gimple stmt2;
+  tree fndecl;
+  gimple_stmt_iterator gsi1, gsi2;
+
+  if (!first_pass_instance
+  || !flag_unsafe_math_optimizations
+  || TREE_CODE (op) != SSA_NAME)
+return;
+  
+  stmt2 = SSA_NAME_DEF_STMT (op);
+
+  if (!is_gimple_call (stmt2)
+  || !has_single_use (gimple_call_lhs (stmt2)))
+return;
+
+  fndecl = gimple_call_fndecl (stmt2);
+
+  if (!fndecl
+  || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
+return;
+
+  switch (DECL_FUNCTION_CODE (fndecl))
+{
+CASE_FLT_FN (BUILT_IN_POWI):
+  break;
+default:
+  return;
+}
+
+  gsi1 = gsi_for_stmt (stmt);
+  gsi2 = gsi_for_stmt (stmt2);
+  gsi_move_before (gsi2, gsi1);
+}
+
 /* This function checks three consequtive operands in
passed operands vector OPS starting from OPINDEX and
swaps two operands if it is profitable for binary operation
@@ -2267,6 +2388,8 @@ rewrite_expr_tree (gimple stmt, unsigned int opind
  print_gimple_stmt (dump_file, stmt, 0, 0);
}
 
+ possibly_move_powi (stmt, oe1->op);
+ possibly_move_powi (stmt, oe2->op);
}
   return;
 }
@@ -2312,6 +2435,8 @@ rewrite_expr_tree (gimple stmt, unsigned int opind
  fprintf (dump_file,  into );
  print_gimple_stmt (dump_file, stmt, 0, 0);
}
+
+  possibly_move_powi (stmt, oe->op);
 }
   /* Recurse on the LHS of the binary operator, which is guaranteed to
  be the non-leaf side.  */
@@ -2485,6 +2610,9 @@ rewrite_expr_tree_parallel (gimple stmt, int width
  fprintf (dump_file,  into );
  print_gimple_stmt (dump_file, stmts[i], 0, 0);
}
+
+  possibly_move_powi (stmts[i], op1);
+  possibly_move_powi (stmts[i], op2);
 }
 
   remove_visited_stmt_chain (last_rhs1);

[PATCH, i386] V4DF __builtin_shuffle

2012-04-17 Thread Marc Glisse

Hello,

this patch expands __builtin_shuffle for V4DF mode in at most 3 insn. It 
is simple and works really well, often generates only 2 insn. It is not 
very generic, because other modes don't have an instruction equivalent to 
vshufpd. For V8SF (and likely V4DI and V8SI with AVX2, but I still need to 
do that), my patch default case in PR 52607 seems more interesting.


I tried calling this new function after expand_vec_perm_vperm2f128_vblend 
(instead of before as in the patch), but it generated more instructions 
for some permutations, and never less. That function is still useful for 
V8SF though.


I bootstrapped gcc on a non-avx platform, compiled a program that tests 
all 4096 shuffles with -mavx/-mavx2, and ran the result using Intel's 
emulator (SDE).


There are still a few V4DF permutations that don't generate an optimal 
sequence (3 insn instead of 2), but not that many I think. Of course, I am 
assuming a constant cost of 1 per insn, which is completely false, but 
seems like a sensible first approximation.


(note that I can't commit)


2012-04-17  Marc Glisse  marc.gli...@inria.fr

PR target/52607
* config/i386/i386.c (ix86_expand_vec_perm_const): Move code to ...
(canonicalize_perm): ... new function.
(expand_vec_perm_2vperm2f128_vshuf): New function.
(ix86_expand_vec_perm_const_1): Call it.

--
Marc Glisse

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 186523)
+++ config/i386/i386.c  (working copy)
@@ -32946,6 +32946,7 @@
   bool testing_p;
 };
 
+static bool canonicalize_perm (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_1 (struct expand_vec_perm_d *d);
 static bool expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d);
 
@@ -37003,6 +37004,57 @@
   return true;
 }
 
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Implement a V4DF
+   permutation using two vperm2f128, followed by a vshufpd insn blending
+   the two vectors together.  */
+
+static bool
+expand_vec_perm_2vperm2f128_vshuf (struct expand_vec_perm_d *d)
+{
+  struct expand_vec_perm_d dfirst, dsecond, dthird;
+  bool ok;
+
+  if (!TARGET_AVX || (d->vmode != V4DFmode))
+    return false;
+
+  if (d->testing_p)
+    return true;
+
+  dfirst = *d;
+  dsecond = *d;
+  dthird = *d;
+
+  dfirst.perm[0] = (d->perm[0] & ~1);
+  dfirst.perm[1] = (d->perm[0] & ~1) + 1;
+  dfirst.perm[2] = (d->perm[2] & ~1);
+  dfirst.perm[3] = (d->perm[2] & ~1) + 1;
+  dsecond.perm[0] = (d->perm[1] & ~1);
+  dsecond.perm[1] = (d->perm[1] & ~1) + 1;
+  dsecond.perm[2] = (d->perm[3] & ~1);
+  dsecond.perm[3] = (d->perm[3] & ~1) + 1;
+  dthird.perm[0] = (d->perm[0] % 2);
+  dthird.perm[1] = (d->perm[1] % 2) + 4;
+  dthird.perm[2] = (d->perm[2] % 2) + 2;
+  dthird.perm[3] = (d->perm[3] % 2) + 6;
+
+  dfirst.target = gen_reg_rtx (dfirst.vmode);
+  dsecond.target = gen_reg_rtx (dsecond.vmode);
+  dthird.op0 = dfirst.target;
+  dthird.op1 = dsecond.target;
+  dthird.one_operand_p = false;
+
+  canonicalize_perm (&dfirst);
+  canonicalize_perm (&dsecond);
+
+  ok = expand_vec_perm_1 (&dfirst)
+    && expand_vec_perm_1 (&dsecond)
+    && expand_vec_perm_1 (&dthird);
+
+  gcc_assert (ok);
+
+  return true;
+}
+
 /* A subroutine of expand_vec_perm_even_odd_1.  Implement the double-word
permutation with two pshufb insns and an ior.  We should have already
failed all two instruction sequences.  */
@@ -37652,6 +37704,9 @@
 
   /* Try sequences of three instructions.  */
 
+  if (expand_vec_perm_2vperm2f128_vshuf (d))
+return true;
+
   if (expand_vec_perm_pshufb2 (d))
 return true;
 
@@ -37689,12 +37744,56 @@
   return false;
 }
 
+/* If a permutation only uses one operand, make it clear. Returns true
+   if the permutation references both operands.  */
+
+static bool
+canonicalize_perm (struct expand_vec_perm_d *d)
+{
+  int i, which, nelt = d->nelt;
+
+  for (i = which = 0; i < nelt; ++i)
+    which |= (d->perm[i] < nelt ? 1 : 2);
+
+  d->one_operand_p = true;
+  switch (which)
+    {
+    default:
+      gcc_unreachable();
+
+    case 3:
+      if (!rtx_equal_p (d->op0, d->op1))
+        {
+          d->one_operand_p = false;
+          break;
+        }
+      /* The elements of PERM do not suggest that only the first operand
+         is used, but both operands are identical.  Allow easier matching
+         of the permutation by folding the permutation into the single
+         input vector.  */
+      /* FALLTHRU */
+
+    case 2:
+      for (i = 0; i < nelt; ++i)
+        d->perm[i] &= nelt - 1;
+      d->op0 = d->op1;
+      break;
+
+    case 1:
+      d->op1 = d->op0;
+      break;
+    }
+
+  return (which == 3);
+}
+
 bool
 ix86_expand_vec_perm_const (rtx operands[4])
 {
   struct expand_vec_perm_d d;
   unsigned char perm[MAX_VECT_LEN];
-  int i, nelt, which;
+  int i, nelt;
+  bool two_args;
   rtx sel;
 
   d.target = operands[0];
@@ -37711,45 +37810,16 @@
   gcc_assert (XVECLEN (sel, 0) == nelt);
   gcc_checking_assert (sizeof 

Re: RFA: Clean up ADDRESS handling in alias.c

2012-04-17 Thread H.J. Lu
On Sun, Apr 15, 2012 at 8:11 AM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 The comment in alias.c says:

   The contents of an ADDRESS is not normally used, the mode of the
   ADDRESS determines whether the ADDRESS is a function argument or some
   other special value.  Pointer equality, not rtx_equal_p, determines whether
   two ADDRESS expressions refer to the same base address.

   The only use of the contents of an ADDRESS is for determining if the
   current function performs nonlocal memory memory references for the
   purposes of marking the function as a constant function.  */

 The first paragraph is a bit misleading IMO.  AFAICT, rtx_equal_p has
 always given ADDRESS the full recursive treatment, rather than saying
 that pointer equality determines ADDRESS equality.  (This is in contrast
 to something like VALUE, where pointer equality is used.)  And AFAICT
 we've always had:

 static int
 base_alias_check (rtx x, rtx y, enum machine_mode x_mode,
                  enum machine_mode y_mode)
 {
  ...
  /* If the base addresses are equal nothing is known about aliasing.  */
  if (rtx_equal_p (x_base, y_base))
    return 1;
  ...
 }

 So I think the contents of an ADDRESS _are_ used to distinguish
 between different bases.

 The second paragraph ceased to be true in 2005 when the pure/const
 analysis moved to its own IPA pass.  Nothing now looks at the contents
 beyond rtx_equal_p.

 Also, base_alias_check effectively treats all arguments as a single base.
 That makes conceptual sense, because this analysis isn't strong enough
 to determine whether arguments are base values at all, never mind whether
 accesses based on different arguments conflict.  But the fact that we have
 a single base isn't obvious from the way the code is written, because we
 create several separate, non-rtx_equal_p, ADDRESSes to represent arguments.
 See:

  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
    /* Check whether this register can hold an incoming pointer
       argument.  FUNCTION_ARG_REGNO_P tests outgoing register
       numbers, so translate if necessary due to register windows.  */
    if (FUNCTION_ARG_REGNO_P (OUTGOING_REGNO (i))
        && HARD_REGNO_MODE_OK (i, Pmode))
      static_reg_base_value[i]
        = gen_rtx_ADDRESS (VOIDmode, gen_rtx_REG (Pmode, i));

 and:

      /* Check for an argument passed in memory.  Only record in the
         copying-arguments block; it is too hard to track changes
         otherwise.  */
      if (copying_arguments
          && (XEXP (src, 0) == arg_pointer_rtx
              || (GET_CODE (XEXP (src, 0)) == PLUS
                  && XEXP (XEXP (src, 0), 0) == arg_pointer_rtx)))
        return gen_rtx_ADDRESS (VOIDmode, src);

 I think it would be cleaner and less wasteful to use a single rtx for
 the single base (really potential base).

 So if we wanted to, we could now remove the operand from ADDRESS and
 simply rely on pointer equality.  I'm a bit reluctant to do that though.
 It would make debugging harder, and it would mean either adding knowledge
 of this alias-specific code to other files (specifically rtl.c:rtx_equal_p),
 or adding special ADDRESS shortcuts to alias.c.  But I think the code
 would be more obvious if we replaced the rtx operand with a unique id,
 which is what we already use for the REG_NOALIAS case:

      new_reg_base_value[regno] = gen_rtx_ADDRESS (Pmode,
                                                   GEN_INT (unique_id++));

 And if we do that, we can make the id a direct operand of the ADDRESS,
 rather than a CONST_INT subrtx[*].  That should make rtx_equal_p cheaper too.

  [*] I'm trying to get rid of CONST_INTs like these that have
      no obvious mode.

 All of which led to the patch below.  I checked that it didn't change
 the code generated at -O2 for a recent set of cc1 .ii files.  Also
 bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?

 To cover my back: I'm just trying to rewrite the current code according
 to its current assumptions.  Whether those assumptions are correct or not
 is always open to debate...

 Richard


 gcc/
        * rtl.def (ADDRESS): Turn operand into a HOST_WIDE_INT.
        * alias.c (reg_base_value): Expand and update comment.
        (arg_base_value): New variable.
        (unique_id): Move up file.
        (unique_base_value, unique_base_value_p, known_base_value_p): New.
        (find_base_value): Use arg_base_value and known_base_value_p.
        (record_set): Document REG_NOALIAS handling.  Use unique_base_value.
        (find_base_term): Use known_base_value_p.
        (base_alias_check): Use unique_base_value_p.
        (init_alias_target): Initialize arg_base_value.  Use unique_base_value.
        (init_alias_analysis): Use 1 as the first id for REG_NOALIAS bases.


This breaks bootstrap on Linux/x86:


home/regress/tbox/native/build/./gcc/xgcc
-B/home/regress/tbox/native/build/./gcc/
-B/home/regress/tbox/objs/i686-pc-linux-gnu/bin/

[google/google-main] Fix for unused variable warning in libgcov.c (issue6052049)

2012-04-17 Thread Teresa Johnson
I have a patch to fix a compile time warning about an unused variable due
to the use being guarded by #ifndef __GCOV_KERNEL__.

Tested with bootstrap. Ok for google-main?

Teresa

2012-04-17   Teresa Johnson  tejohn...@google.com

Google ref b/5910724.
* libgcc/libgcov.c (gcov_cur_module_id): Guard definition under
#ifndef __GCOV_KERNEL__.

Index: libgcov.c
===
--- libgcov.c   (revision 186282)
+++ libgcov.c   (working copy)
@@ -153,10 +153,10 @@ static gcov_unsigned_t gcov_crc32;
 
 /* Size of the longest file name. */
 static size_t gcov_max_filename = 0;
-#endif /* __GCOV_KERNEL__ */
 
 /* Unique identifier assigned to each module (object file).  */
 static gcov_unsigned_t gcov_cur_module_id = 0;
+#endif /* __GCOV_KERNEL__ */
 
 /* Pointer to the direct-call counters (per call-site counters).
Initialized by the caller.  */

--
This patch is available for review at http://codereview.appspot.com/6052049


Re: [patch] Remove strange case cost code

2012-04-17 Thread Xinliang David Li
On Tue, Apr 17, 2012 at 1:48 AM, Jan Hubicka hubi...@ucw.cz wrote:
  Note that it would make a lot of sense to teach this heuristic to predict.c
  and properly identify chars.

 Indeed this would be the proper place to implement this logic.

 To a degree - switch expansion needs more info than it can obtain from edge
 profile.  Having
 switch
  case 1,3,5,7,8,9: aaa
  case 2,4,6,8,10,12: bbb
 to produce a well-balanced decision tree, it is not enough to know how
 often the value is even and how often it is odd...

Why is that? In this case, the expanded switch case does not use BST,
but testing against bit patterns.
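
To make the bit-pattern point concrete, here is a hand-written C sketch of that kind of dispatch. It is only an illustration, not what the expander emits: the case sets are adjusted from the quoted example so that no value sits under two labels, and the names odd_mask, even_mask and dispatch are made up.

```c
#include <assert.h>

/* One bit per case value; both masks are compile-time constants.  */
static const unsigned odd_mask
  = (1u << 1) | (1u << 3) | (1u << 5) | (1u << 7) | (1u << 9);
static const unsigned even_mask
  = (1u << 2) | (1u << 4) | (1u << 6) | (1u << 8) | (1u << 10) | (1u << 12);

/* Return 'a' for the first label, 'b' for the second, 0 for default.  */
static char
dispatch (unsigned x)
{
  unsigned bit;

  /* The range check keeps the shift below well defined.  */
  if (x > 12)
    return 0;

  bit = 1u << x;
  if (bit & odd_mask)
    return 'a';
  if (bit & even_mask)
    return 'b';
  return 0;
}
```

With this shape there are at most three comparisons regardless of the value, so there is no tree to balance.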


 Thus there is a need for value histograms.

None of the existing value profiler will be powerful enough for this
though: the one_value profiler only tracks one value. The interval
profiler can potentially be used if the switch case range is small --
otherwise the runtime memory overhead will be too large.
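
A minimal sketch of an interval-style histogram, to show where the memory cost comes from. This is a toy written for this mail, not the libgcov implementation; STEPS, struct interval_hist and interval_record are hypothetical names.

```c
#include <assert.h>

#define STEPS 8   /* width of the profiled range; an arbitrary choice */

/* STEPS in-range counters, then one for values above the range and
   one for values below it.  */
struct interval_hist
{
  long start;
  unsigned long counters[STEPS + 2];
};

/* Record one observed switch index VALUE in H.  */
static void
interval_record (struct interval_hist *h, long value)
{
  long delta = value - h->start;

  if (delta < 0)
    h->counters[STEPS + 1]++;     /* below the range */
  else if (delta >= STEPS)
    h->counters[STEPS]++;         /* above the range */
  else
    h->counters[delta]++;         /* one counter per in-range value */
}
```

The counter array grows linearly with the case range, which is why this only looks practical for switches whose range is small.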

Thanks,

David


  Also it is possible to get histograms from profile feedback into
  switch expansion. I always wanted to do that once switch expansion code
  is cleaned up and moved to gimple level...

 Indeed.  At least the parts that expand switch stmts to (balanced) trees
 should be moved to the GIMPLE level, retaining only the table-jump-like
 expansions as switch stmts.

 Yep.
 Honza

 
 
  The attached patch removes the heuristic.
 
  Bootstrapped and tested on powerpc-unknown-linux-gnu. OK for trunk?

 Ok.

 Thanks,
 Richard.

  Ciao!
  Steven
 
 


Re: [google/google-main] Fix for unused variable warning in libgcov.c (issue6052049)

2012-04-17 Thread Xinliang David Li
ok.

David

On Tue, Apr 17, 2012 at 11:40 AM, Teresa Johnson tejohn...@google.com wrote:
 I have a patch to fix a compile time warning about an unused variable due
 to the use being guarded by #ifndef __GCOV_KERNEL__.

 Tested with bootstrap. Ok for google-main?

 Teresa

 2012-04-17   Teresa Johnson  tejohn...@google.com

        Google ref b/5910724.
        * libgcc/libgcov.c (gcov_cur_module_id): Guard definition under
        #ifndef __GCOV_KERNEL__.

 Index: libgcov.c
 ===
 --- libgcov.c   (revision 186282)
 +++ libgcov.c   (working copy)
 @@ -153,10 +153,10 @@ static gcov_unsigned_t gcov_crc32;

  /* Size of the longest file name. */
  static size_t gcov_max_filename = 0;
 -#endif /* __GCOV_KERNEL__ */

  /* Unique identifier assigned to each module (object file).  */
  static gcov_unsigned_t gcov_cur_module_id = 0;
 +#endif /* __GCOV_KERNEL__ */

  /* Pointer to the direct-call counters (per call-site counters).
    Initialized by the caller.  */

 --
 This patch is available for review at http://codereview.appspot.com/6052049


Re: [PATCH, i386, Android] Add Android support for i386 target

2012-04-17 Thread Maxim Kuvyrkov
On 18/04/2012, at 2:32 AM, Ilya Enkovich wrote:

 On Tue, Apr 17, 2012 at 3:16 PM, Uros Bizjak ubiz...@gmail.com wrote:
...
 
 The patch looks OK to me in the sense, that there is no difference for
 x86 targets.
 
 So, OK for x86.
 
 Thanks,
 Uros.
 
 Thanks, Uros!
 
 Maxim, could you please look at patch?

The Android parts of the patch are also good.  Given that Uros approved the x86 
pieces, you are clear to check in.

Ilya, thanks for bearing with us and reworking your patch after the reviews.
--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics




Re: RFA: Clean up ADDRESS handling in alias.c

2012-04-17 Thread Richard Sandiford
H.J. Lu hjl.to...@gmail.com writes:
 On Sun, Apr 15, 2012 at 8:11 AM, Richard Sandiford
 rdsandif...@googlemail.com wrote:
 The comment in alias.c says:

   The contents of an ADDRESS is not normally used, the mode of the
   ADDRESS determines whether the ADDRESS is a function argument or some
   other special value.  Pointer equality, not rtx_equal_p, determines whether
   two ADDRESS expressions refer to the same base address.

   The only use of the contents of an ADDRESS is for determining if the
   current function performs nonlocal memory memory references for the
   purposes of marking the function as a constant function.  */

 The first paragraph is a bit misleading IMO.  AFAICT, rtx_equal_p has
 always given ADDRESS the full recursive treatment, rather than saying
 that pointer equality determines ADDRESS equality.  (This is in contrast
 to something like VALUE, where pointer equality is used.)  And AFAICT
 we've always had:

 static int
 base_alias_check (rtx x, rtx y, enum machine_mode x_mode,
                  enum machine_mode y_mode)
 {
  ...
  /* If the base addresses are equal nothing is known about aliasing.  */
  if (rtx_equal_p (x_base, y_base))
    return 1;
  ...
 }

 So I think the contents of an ADDRESS _are_ used to distinguish
 between different bases.

 The second paragraph ceased to be true in 2005 when the pure/const
 analysis moved to its own IPA pass.  Nothing now looks at the contents
 beyond rtx_equal_p.

 Also, base_alias_check effectively treats all arguments as a single base.
 That makes conceptual sense, because this analysis isn't strong enough
 to determine whether arguments are base values at all, never mind whether
 accesses based on different arguments conflict.  But the fact that we have
 a single base isn't obvious from the way the code is written, because we
 create several separate, non-rtx_equal_p, ADDRESSes to represent arguments.
 See:

  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
    /* Check whether this register can hold an incoming pointer
       argument.  FUNCTION_ARG_REGNO_P tests outgoing register
       numbers, so translate if necessary due to register windows.  */
    if (FUNCTION_ARG_REGNO_P (OUTGOING_REGNO (i))
        && HARD_REGNO_MODE_OK (i, Pmode))
      static_reg_base_value[i]
        = gen_rtx_ADDRESS (VOIDmode, gen_rtx_REG (Pmode, i));

 and:

      /* Check for an argument passed in memory.  Only record in the
         copying-arguments block; it is too hard to track changes
         otherwise.  */
      if (copying_arguments
          && (XEXP (src, 0) == arg_pointer_rtx
              || (GET_CODE (XEXP (src, 0)) == PLUS
                  && XEXP (XEXP (src, 0), 0) == arg_pointer_rtx)))
        return gen_rtx_ADDRESS (VOIDmode, src);

 I think it would be cleaner and less wasteful to use a single rtx for
 the single base (really potential base).

 So if we wanted to, we could now remove the operand from ADDRESS and
 simply rely on pointer equality.  I'm a bit reluctant to do that though.
 It would make debugging harder, and it would mean either adding knowledge
 of this alias-specific code to other files (specifically rtl.c:rtx_equal_p),
 or adding special ADDRESS shortcuts to alias.c.  But I think the code
 would be more obvious if we replaced the rtx operand with a unique id,
 which is what we already use for the REG_NOALIAS case:

      new_reg_base_value[regno] = gen_rtx_ADDRESS (Pmode,
                                                   GEN_INT (unique_id++));

 And if we do that, we can make the id a direct operand of the ADDRESS,
 rather than a CONST_INT subrtx[*].  That should make rtx_equal_p cheaper too.

  [*] I'm trying to get rid of CONST_INTs like these that have
      no obvious mode.

 All of which led to the patch below.  I checked that it didn't change
 the code generated at -O2 for a recent set of cc1 .ii files.  Also
 bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?

 To cover my back: I'm just trying to rewrite the current code according
 to its current assumptions.  Whether those assumptions are correct or not
 is always open to debate...

 Richard


 gcc/
        * rtl.def (ADDRESS): Turn operand into a HOST_WIDE_INT.
        * alias.c (reg_base_value): Expand and update comment.
        (arg_base_value): New variable.
        (unique_id): Move up file.
        (unique_base_value, unique_base_value_p, known_base_value_p): New.
        (find_base_value): Use arg_base_value and known_base_value_p.
        (record_set): Document REG_NOALIAS handling.  Use unique_base_value.
        (find_base_term): Use known_base_value_p.
        (base_alias_check): Use unique_base_value_p.
        (init_alias_target): Initialize arg_base_value.  Use 
 unique_base_value.
        (init_alias_analysis): Use 1 as the first id for REG_NOALIAS bases.


 This breaks bootstrap on Linux/x86:


 home/regress/tbox/native/build/./gcc/xgcc
 -B/home/regress/tbox/native/build/./gcc/
 -B/home/regress/tbox/objs/i686-pc-linux-gnu/bin/
 

Re: [patch] Remove strange case cost code

2012-04-17 Thread Jan Hubicka
 On Tue, Apr 17, 2012 at 1:48 AM, Jan Hubicka hubi...@ucw.cz wrote:
   Note that it would make a lot of sense to teach this heuristic to predict.c
   and properly identify chars.
 
  Indeed this would be the proper place to implement this logic.
 
  To a degree - switch expansion needs more info than it can obtain from edge
  profile.  Having
  switch
   case 1,3,5,7,8,9: aaa
   case 2,4,6,8,10,12: bbb
  to produce a well-balanced decision tree, it is not enough to know how
  often the value is even and how often it is odd...
 
 Why is that? In this case, the expanded switch case does not use BST,
 but testing against bit patterns.

Yep, oversimplified example... 
 
 
  Thus there is a need for value histograms.
 
 None of the existing value profiler will be powerful enough for this
 though: the one_value profiler only tracks one value. The interval
 profiler can potentially be used if the switch case range is small --
 otherwise the runtime memory overhead will be too large.

Adding profiler to profile individual value ranges is not that hard...
But indeed, at the moment we have single value profiler only...

Honza


[PING] iwMMXt patches

2012-04-17 Thread Matt Turner
Are these patches ready to go in? It looks like they were ack'd.

http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01815.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01817.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01816.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01818.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01819.html

We (OLPC) will need these patches for reasonable iwMMXt performance
and the ability to use VFP and iwMMXt together.

Thanks,
Matt


Wider modes for partial int modes

2012-04-17 Thread Bernd Schmidt
This patch enables GET_MODE_WIDER_MODE for MODE_PARTIAL_INT (by setting
the wider mode to the one the partial mode is based on), which is useful
for the port I'm working on: I can avoid defining operations on the
partial modes. Also, convert_modes is changed so that unsignedp is taken
into account when widening partial modes.
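
To illustrate why unsignedp matters when widening, here is a small C sketch that extends a 20-bit value held in a 32-bit container both ways. The 20-bit width is only an assumed example of a partial mode, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PARTIAL_BITS 20   /* assumed precision of the partial mode */

/* Zero-extend: keep the low PARTIAL_BITS bits, clear the rest.  */
static uint32_t
zero_extend_partial (uint32_t raw)
{
  return raw & ((UINT32_C (1) << PARTIAL_BITS) - 1);
}

/* Sign-extend: treat bit PARTIAL_BITS - 1 as the sign bit.  */
static int32_t
sign_extend_partial (uint32_t raw)
{
  uint32_t low = zero_extend_partial (raw);
  uint32_t sign = UINT32_C (1) << (PARTIAL_BITS - 1);

  /* Both operands fit in int32_t, so the subtraction cannot overflow.  */
  return (int32_t) (low ^ sign) - (int32_t) sign;
}
```

The same 20 bits, all ones, widen to 1048575 when treated as unsigned and to -1 when treated as signed, so always using the sign-extending pattern would give the wrong result for unsigned conversions.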

I've tested this on m32c-elf as well as on my port, and bootstrapped on
i686-linux. Ok?


Bernd
* machmode.h (CLASS_HAS_WIDER_MODES_P): True for MODE_PARTIAL_INT.
* expr.c (convert_move): Honor unsignedp when extending partial int
modes.
* genmodes.c (power_of_two_p, regular_mode, make_complex_modes,
emit_mode_wider): Revert Spider hacks.
(complete_mode): Don't clear component field of partial int modes.
(emit_mode_inner): Don't emit it however.
(calc_wider_mode): Partial int modes widen to their component.

Index: machmode.h
===
--- machmode.h  (revision 186270)
+++ machmode.h  (working copy)
@@ -166,6 +166,7 @@ extern const unsigned char mode_class[NU
 /* Nonzero if CLASS modes can be widened.  */
 #define CLASS_HAS_WIDER_MODES_P(CLASS) \
   (CLASS == MODE_INT   \
+   || CLASS == MODE_PARTIAL_INT\
|| CLASS == MODE_FLOAT  \
|| CLASS == MODE_DECIMAL_FLOAT  \
|| CLASS == MODE_COMPLEX_FLOAT  \
Index: expr.c
===
--- expr.c  (revision 186270)
+++ expr.c  (working copy)
@@ -438,21 +438,20 @@ convert_move (rtx to, rtx from, int unsi
   rtx new_from;
   enum machine_mode full_mode
= smallest_mode_for_size (GET_MODE_BITSIZE (from_mode), MODE_INT);
+  convert_optab ctab = unsignedp ? zext_optab : sext_optab;
+  enum insn_code icode;
 
-  gcc_assert (convert_optab_handler (sext_optab, full_mode, from_mode)
- != CODE_FOR_nothing);
+  icode = convert_optab_handler (ctab, full_mode, from_mode);
+  gcc_assert (icode != CODE_FOR_nothing);
 
   if (to_mode == full_mode)
{
- emit_unop_insn (convert_optab_handler (sext_optab, full_mode,
-from_mode),
- to, from, UNKNOWN);
+ emit_unop_insn (icode, to, from, UNKNOWN);
  return;
}
 
   new_from = gen_reg_rtx (full_mode);
-  emit_unop_insn (convert_optab_handler (sext_optab, full_mode, from_mode),
- new_from, from, UNKNOWN);
+  emit_unop_insn (icode, new_from, from, UNKNOWN);
 
   /* else proceed to integer conversions below.  */
   from_mode = full_mode;
Index: genmodes.c
===
--- genmodes.c  (revision 186270)
+++ genmodes.c  (working copy)
@@ -360,7 +360,6 @@ complete_mode (struct mode_data *m)
   m->bytesize = m->component->bytesize;
 
   m->ncomponents = 1;
-  m->component = 0;  /* ??? preserve this */
   break;
 
 case MODE_COMPLEX_INT:
@@ -823,7 +822,13 @@ calc_wider_mode (void)
 
  sortbuf[i] = 0;
  for (j = 0; j < i; j++)
-   sortbuf[j]->next = sortbuf[j]->wider = sortbuf[j + 1];
+   {
+ sortbuf[j]->next = sortbuf[j + 1];
+ if (c == MODE_PARTIAL_INT)
+   sortbuf[j]->wider = sortbuf[j]->component;
+ else
+   sortbuf[j]->wider = sortbuf[j]->next;
+   }
 
  modes[c] = sortbuf[0];
}
@@ -1120,7 +1125,8 @@ emit_mode_inner (void)
 
   for_all_modes (c, m)
     tagged_printf ("%smode",
-  m->component ? m->component->name : void_mode->name,
+  c != MODE_PARTIAL_INT && m->component
+  ? m->component->name : void_mode->name,
   m->name);
 
   print_closer ();


[patch] Cleanup tree-switch-conversion a bit

2012-04-17 Thread Steven Bosscher
Hello,

This is another step towards moving GIMPLE_SWITCH expansion to an
earlier point in the pipeline.

With the attached patch, some of the logic from stmt.c:add_case_node()
is moved to gimplify.c:gimplify_switch_expr(). This includes:

* Code to drop case labels that are out of range for the switch index
expression. (Actually, I suspect this code hasn't worked properly
since gimplification was introduced, because the switch index
expression can be promoted by language specific gimplification, so
expand_case never actually sees the proper type with the current
implementation in stmt.c.)

* Code to fold_convert case label values to the right type. I've opted
to go for folding to the original type of the SWITCH_EXPR, rather than
to the post-gimplification switch index type.

* Code to canonicalize CASE_LABEL's subnodes, CASE_LOW and CASE_HIGH.
I've chosen to impose strict requirements that CASE_HIGH > CASE_LOW if
CASE_HIGH is non-zero. This is different from what add_case_node does,
but I think it makes sense to go for the minimal representation here:
The case labels in stmt.c never lived very long (only during expand)
but GIMPLE_SWITCH statements stay around for much of the compilation
process and can also be streamed out, etc.
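
The three steps above can be sketched in plain C on integer bounds. This is only a model of the intent, not the code in the patch: struct case_label and canonicalize_case are hypothetical, and dropping an empty range (high below low) is a choice made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* A case label: a single value, or a range when HAS_HIGH is set.  */
struct case_label
{
  long low;
  long high;
  bool has_high;
};

/* Canonicalize C for an index type spanning [TYPE_MIN, TYPE_MAX].
   Return false if the label can never match and should be dropped.  */
static bool
canonicalize_case (struct case_label *c, long type_min, long type_max)
{
  long high = c->has_high ? c->high : c->low;

  /* Empty range: dropped here as an assumption of this sketch.  */
  if (high < c->low)
    return false;

  /* Entirely outside the index type: unreachable, so drop it.  */
  if (high < type_min || c->low > type_max)
    return false;

  /* Clamp partially out-of-range bounds to the index type.  */
  if (c->low < type_min)
    c->low = type_min;
  if (high > type_max)
    high = type_max;

  /* Minimal form: keep a high bound only if it exceeds the low bound.  */
  c->has_high = high > c->low;
  c->high = c->has_high ? high : 0;
  return true;
}
```

For a signed char index, a label of 300 is dropped, the range 100 ... 200 becomes 100 ... 127, and the range 5 ... 5 collapses to the single value 5.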

Bootstrapped and tested on powerpc-unknown-linux-gnu. OK for trunk?

Ciao!
Steven
* targhooks.c (default_case_values_threshold): Fix code style nit.

* stmt.c (add_case_node, expand_case): Move logic to remove/reduce
case range and type folding from here...
* gimplify.c (gimplify_switch_expr): ... to here.

Index: targhooks.c
===
--- targhooks.c (revision 186526)
+++ targhooks.c (working copy)
@@ -1200,7 +1200,8 @@ default_target_can_inline_p (tree caller, tree cal
this means extra overhead for dispatch tables, which raises the
threshold for using them.  */
 
-unsigned int default_case_values_threshold (void)
+unsigned int
+default_case_values_threshold (void)
 {
   return (HAVE_casesi ? 4 : 5);
 }
Index: tree-switch-conversion.c
===
--- tree-switch-conversion.c(revision 186526)
+++ tree-switch-conversion.c(working copy)
@@ -24,8 +24,8 @@ Software Foundation, 51 Franklin Street, Fifth Flo
  Switch initialization conversion
 
 The following pass changes simple initializations of scalars in a switch
-statement into initializations from a static array.  Obviously, the values must
-be constant and known at compile time and a default branch must be
+statement into initializations from a static array.  Obviously, the values
+must be constant and known at compile time and a default branch must be
 provided.  For example, the following code:
 
 int a,b;
@@ -162,16 +162,12 @@ struct switch_conv_info
   basic_block bit_test_bb[2];
 };
 
-/* Global pass info.  */
-static struct switch_conv_info info;
-
-
 /* Checks whether the range given by individual case statements of the SWTCH
switch statement isn't too big and whether the number of branches actually
satisfies the size of the new array.  */
 
 static bool
-check_range (gimple swtch)
+check_range (gimple swtch, struct switch_conv_info *info)
 {
   tree min_case, max_case;
   unsigned int branch_num = gimple_switch_num_labels (swtch);
@@ -181,7 +177,7 @@ static bool
  is a default label which is the first in the vector.  */
 
   min_case = gimple_switch_label (swtch, 1);
-  info.range_min = CASE_LOW (min_case);
+  info->range_min = CASE_LOW (min_case);
 
   gcc_assert (branch_num > 1);
   gcc_assert (CASE_LOW (gimple_switch_label (swtch, 0)) == NULL_TREE);
@@ -191,22 +187,22 @@ static bool
   else
 range_max = CASE_LOW (max_case);
 
-  gcc_assert (info.range_min);
+  gcc_assert (info->range_min);
   gcc_assert (range_max);
 
-  info.range_size = int_const_binop (MINUS_EXPR, range_max, info.range_min);
+  info->range_size = int_const_binop (MINUS_EXPR, range_max, info->range_min);
 
-  gcc_assert (info.range_size);
-  if (!host_integerp (info.range_size, 1))
+  gcc_assert (info->range_size);
+  if (!host_integerp (info->range_size, 1))
 {
-  info.reason = "index range way too large or otherwise unusable.\n";
+  info->reason = "index range way too large or otherwise unusable";
   return false;
 }
 
-  if ((unsigned HOST_WIDE_INT) tree_low_cst (info.range_size, 1)
+  if ((unsigned HOST_WIDE_INT) tree_low_cst (info->range_size, 1)
 > ((unsigned) branch_num * SWITCH_CONVERSION_BRANCH_RATIO))
 {
-  info.reason = "the maximum range-branch ratio exceeded.\n";
+  info->reason = "the maximum range-branch ratio exceeded";
   return false;
 }
 
@@ -219,7 +215,7 @@ static bool
and returns true.  Otherwise returns false.  */
 
 static bool
-check_process_case (tree cs)
+check_process_case (tree cs, struct switch_conv_info *info)
 {
   tree ldecl;
   basic_block label_bb, following_bb;
@@ -228,48 +224,48 @@ static bool
   

[patch] Move add_case_node logic from stmt.c to gimplify.c

2012-04-17 Thread Steven Bosscher
On Wed, Apr 18, 2012 at 12:04 AM, Steven Bosscher stevenb@gmail.com wrote:
 Hello,

 This is another step towards moving GIMPLE_SWITCH expansion to an
 earlier point in the pipeline.

 With the attached patch, some of the logic from stmt.c:add_case_node()
 is moved to gimplify.c:gimplify_switch_expr(). This includes:

 * Code to drop case labels that are out of range for the switch index
 expression. (Actually, I suspect this code hasn't worked properly
 since gimplification was introduced, because the switch index
 expression can be promoted by language specific gimplification, so
 expand_case never actually sees the proper type with the current
 implementation in stmt.c.)

 * Code to fold_convert case label values to the right type. I've opted
 to go for folding to the original type of the SWITCH_EXPR, rather than
 to the post-gimplification switch index type.

 * Code to canonicalize CASE_LABEL's subnodes, CASE_LOW and CASE_HIGH.
 I've chosen to impose strict requirements that CASE_HIGH > CASE_LOW if
 CASE_HIGH is non-zero. This is different from what add_case_node does,
 but I think it makes sense to go for the minimal representation here:
 The case labels in stmt.c never lived very long (only during expand)
 but GIMPLE_SWITCH statements stay around for much of the compilation
 process and can also be streamed out, etc.

 Bootstrapped and tested on powerpc-unknown-linux-gnu. OK for trunk?

 Ciao!
 Steven

And this time with the right subject and the right patch attached.
Sorry for the inconvenience!
* targhooks.c (default_case_values_threshold): Fix code style nit.

* stmt.c (add_case_node, expand_case): Move logic to remove/reduce
case range and type folding from here...
* gimplify.c (gimplify_switch_expr): ... to here.

Index: targhooks.c
===
--- targhooks.c (revision 186526)
+++ targhooks.c (working copy)
@@ -1200,7 +1200,8 @@ default_target_can_inline_p (tree caller, tree cal
this means extra overhead for dispatch tables, which raises the
threshold for using them.  */
 
-unsigned int default_case_values_threshold (void)
+unsigned int
+default_case_values_threshold (void)
 {
   return (HAVE_casesi ? 4 : 5);
 }
Index: stmt.c
===
--- stmt.c  (revision 186526)
+++ stmt.c  (working copy)
@@ -1822,66 +1822,25 @@ expand_stack_restore (tree var)
fed to us in descending order from the sorted vector of case labels used
in the tree part of the middle end.  So the list we construct is
sorted in ascending order.  The bounds on the case range, LOW and HIGH,
-   are converted to case's index type TYPE.  */
+   are converted to case's index type TYPE.  Note that the original type
+   of the case index in the source code is usually lost during
+   gimplification due to type promotion, but the case labels retain the
+   original type.  */
 
 static struct case_node *
 add_case_node (struct case_node *head, tree type, tree low, tree high,
tree label, alloc_pool case_node_pool)
 {
-  tree min_value, max_value;
   struct case_node *r;
 
-  gcc_assert (TREE_CODE (low) == INTEGER_CST);
-  gcc_assert (!high || TREE_CODE (high) == INTEGER_CST);
+  gcc_checking_assert (low);
+  gcc_checking_assert (! high || (TREE_TYPE (low) == TREE_TYPE (high)));
 
-  min_value = TYPE_MIN_VALUE (type);
-  max_value = TYPE_MAX_VALUE (type);
-
-  /* If there's no HIGH value, then this is not a case range; it's
- just a simple case label.  But that's just a degenerate case
- range.
- If the bounds are equal, turn this into the one-value case.  */
-  if (!high || tree_int_cst_equal (low, high))
-    {
-      /* If the simple case value is unreachable, ignore it.  */
-      if ((TREE_CODE (min_value) == INTEGER_CST
-           && tree_int_cst_compare (low, min_value) < 0)
-          || (TREE_CODE (max_value) == INTEGER_CST
-              && tree_int_cst_compare (low, max_value) > 0))
-        return head;
-      low = fold_convert (type, low);
-      high = low;
-    }
-  else
-    {
-      /* If the entire case range is unreachable, ignore it.  */
-      if ((TREE_CODE (min_value) == INTEGER_CST
-           && tree_int_cst_compare (high, min_value) < 0)
-          || (TREE_CODE (max_value) == INTEGER_CST
-              && tree_int_cst_compare (low, max_value) > 0))
-        return head;
-
-      /* If the lower bound is less than the index type's minimum
-         value, truncate the range bounds.  */
-      if (TREE_CODE (min_value) == INTEGER_CST
-          && tree_int_cst_compare (low, min_value) < 0)
-        low = min_value;
-      low = fold_convert (type, low);
-
-      /* If the upper bound is greater than the index type's maximum
-         value, truncate the range bounds.  */
-      if (TREE_CODE (max_value) == INTEGER_CST
-          && tree_int_cst_compare (high, max_value) > 0)
-        high = max_value;
-  high = fold_convert (type, 

[committed] avoid @opindex before @item in invoke.texi

2012-04-17 Thread Manuel López-Ibáñez
Otherwise, it starts a new paragraph.  Tested by inspecting the
resulting html. Committed as obvious.

Cheers,

Manuel.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 186552)
+++ gcc/doc/invoke.texi (working copy)
@@ -2875,8 +2875,8 @@
 line-wrapping is done; each error message appears on a single
 line.

+@item -fdiagnostics-show-location=once
 @opindex fdiagnostics-show-location
-@item -fdiagnostics-show-location=once
 Only meaningful in line-wrapping mode.  Instructs the diagnostic messages
 reporter to emit @emph{once} source location information; that is, in
 case the message is too long to fit on a single physical line and has to
Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 186552)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2012-04-18  Manuel López-Ibáñez  m...@gcc.gnu.org
+
+* doc/invoke.texi (Language Independent Options): @item should be
+   before @opindex.
+


various minor obvious fixes in the track-macro-expansion code

2012-04-17 Thread Manuel López-Ibáñez
Hi Dodji,

I was going to commit this as obvious, but I want to make sure that it
doesn't conflict with your new track-macro-expansion patches. It can
also wait until you commit all your patches.

Cheers,

Manuel.

2012-04-18  Manuel López-Ibáñez  m...@gcc.gnu.org

* tree-diagnostic.c (maybe_unwind_expanded_macro_loc): Fix
comment. Delete unused parameter first_exp_point_map.
(virt_loc_aware_diagnostic_finalizer): Update call.
libcpp/
* line-map.c (linemap_resolve_location): Synchronize comments with
those in line-map.h.
* include/line-map.h (linemap_resolve_location): Fix spelling in
comment.


macro-fixes.diff
Description: Binary data


New Vietnamese PO file for 'cpplib' (version 4.7.0)

2012-04-17 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Vietnamese team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/vi.po

(This file, 'cpplib-4.7.0.vi.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.
coordina...@translationproject.org



Contents of PO file 'cpplib-4.7.0.vi.po'

2012-04-17 Thread Translation Project Robot


cpplib-4.7.0.vi.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.
coordina...@translationproject.org


Re: [PATCH, Android] MIPS support

2012-04-17 Thread Maxim Kuvyrkov
On 5/04/2012, at 10:16 AM, Maxim Kuvyrkov wrote:

 Chao,
 
 Let's take discussion of MIPS changes to gcc-patches@.  Please follow up here.
 
 --
 Maxim Kuvyrkov
 CodeSourcery / Mentor Graphics
 
 On 5/04/2012, at 10:10 AM, Fu, Chao-Ying wrote:
 
 For now, two MIPS changes in gnu-user.h and unwind-dw2-fde-dip.c can be 
 posted for comment.
 (I didn't tested this patch, though.)

You need to test your patches before posting them for review.  Below are a 
couple of comments on your current version.

 After starting to build toolchains for Android with Bionic, we may find new 
 files to
 patch.  Ex: Comment out getpagesize() for bionic.
 
 Any comment?  Thanks a lot!
 
 Regards,
 Chao-ying
 
 Index: gcc/gcc/config/mips/gnu-user.h
 ===
 --- gcc.orig/gcc/config/mips/gnu-user.h  2012-04-03 17:39:50.0 
 -0700
 +++ gcc/gcc/config/mips/gnu-user.h   2012-04-04 14:31:50.804236000 -0700
 @@ -45,8 +45,8 @@ along with GCC; see the file COPYING3.  
 /* A standard GNU/Linux mapping.  On most targets, it is included in
   CC1_SPEC itself by config/linux.h, but mips.h overrides CC1_SPEC
   and provides this hook instead.  */
 -#undef SUBTARGET_CC1_SPEC
 -#define SUBTARGET_CC1_SPEC "%{profile:-p}"
 +#undef GNU_USER_SUBTARGET_CC1_SPEC
 +#define GNU_USER_SUBTARGET_CC1_SPEC "%{profile:-p}"
 
 /* -G is incompatible with -KPIC which is the default, so only allow objects
   in the small data section if the user explicitly asks for it.  */
 @@ -54,8 +54,8 @@ along with GCC; see the file COPYING3.  
 #define MIPS_DEFAULT_GVALUE 0
 
 /* Borrowed from sparc/linux.h */
 -#undef LINK_SPEC
 -#define LINK_SPEC \
 +#undef GNU_USER_TARGET_LINK_SPEC
 +#define GNU_USER_TARGET_LINK_SPEC \
 %(endian_spec) \
  %{shared:-shared} \
  %{!shared: \
 @@ -89,8 +89,8 @@ along with GCC; see the file COPYING3.  
 #undef ASM_OUTPUT_REG_PUSH
 #undef ASM_OUTPUT_REG_POP
 
 -#undef LIB_SPEC
 -#define LIB_SPEC \
 +#undef GNU_USER_TARGET_LIB_SPEC
 +#define GNU_USER_TARGET_LIB_SPEC \
 %{pthread:-lpthread} \
 %{shared:-lc} \
 %{!shared: \
 @@ -133,7 +133,34 @@ extern const char *host_detect_local_cpu
  LINUX_DRIVER_SELF_SPECS
 
 /* Similar to standard Linux, but adding -ffast-math support.  */
 -#undef  ENDFILE_SPEC
 -#define ENDFILE_SPEC \
 +#undef  GNU_USER_TARGET_ENDFILE_SPEC
 +#define GNU_USER_TARGET_ENDFILE_SPEC \
  %{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} \
   %{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s

Above definitions are OK.

 +
 +#undef  LINK_SPEC
 +#define LINK_SPEC                                                      \
 +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_LINK_SPEC,                      \
 +                       GNU_USER_TARGET_LINK_SPEC " " ANDROID_LINK_SPEC)
 +
 +#undef  SUBTARGET_CC1_SPEC
 +#define SUBTARGET_CC1_SPEC                                             \
 +  LINUX_OR_ANDROID_CC (GNU_USER_SUBTARGET_CC1_SPEC,                    \
 +                       GNU_USER_SUBTARGET_CC1_SPEC " " ANDROID_CC1_SPEC)
 +
 +#undef  CC1PLUS_SPEC
 +#define CC1PLUS_SPEC                                                   \
 +  LINUX_OR_ANDROID_CC ("", ANDROID_CC1PLUS_SPEC)
 +
 +#undef  LIB_SPEC
 +#define LIB_SPEC                                                       \
 +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_LIB_SPEC,                       \
 +                       GNU_USER_TARGET_LIB_SPEC " " ANDROID_LIB_SPEC)
 +
 +#undef  STARTFILE_SPEC
 +#define STARTFILE_SPEC                                                 \
 +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_STARTFILE_SPEC, ANDROID_STARTFILE_SPEC)
 +
 +#undef  ENDFILE_SPEC
 +#define ENDFILE_SPEC                                                   \
 +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_ENDFILE_SPEC, ANDROID_ENDFILE_SPEC)

The LINUX_OR_ANDROID_* definitions should be moved out of gnu-user.h, as this 
header is used for systems besides Linux, e.g., kFreeBSD and Hurd.  Please move 
these definitions to mips/linux-common.h, which will be a new file, similarly 
as i386 did in http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html .
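
How a spec-selecting macro of this kind composes its arguments can be sketched as follows.  This is only an illustration: the DEMO_* names and the spec strings are hypothetical and are not GCC's definitions of LINUX_OR_ANDROID_LD or ANDROID_LINK_SPEC.  It relies on two things that are certain: adjacent string literals are concatenated by the compiler, and the spec syntax %{S:X;:Y} selects X when switch S is given and Y otherwise.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins: these names and strings are illustrative only
// and are not taken from GCC's headers.
#define DEMO_LINUX_LINK_SPEC   "%{shared:-shared}"
#define DEMO_ANDROID_LINK_SPEC "%{shared:-Bsymbolic}"

// Build one spec that picks LINUX_SPEC when -mno-android is given and
// ANDROID_SPEC otherwise.  The pieces are adjacent string literals, so
// the compiler joins them into a single literal.
#define DEMO_LINUX_OR_ANDROID(LINUX_SPEC, ANDROID_SPEC) \
  "%{mno-android:" LINUX_SPEC ";:" ANDROID_SPEC "}"

// The Android branch extends the generic spec; the " " keeps the two
// parts separated once they are concatenated.
#define DEMO_LINK_SPEC \
  DEMO_LINUX_OR_ANDROID (DEMO_LINUX_LINK_SPEC, \
                         DEMO_LINUX_LINK_SPEC " " DEMO_ANDROID_LINK_SPEC)

// Expose the fully expanded spec so it can be inspected.
inline const char *demo_link_spec ()
{
  return DEMO_LINK_SPEC;
}
```

The expanded string is "%{mno-android:%{shared:-shared};:%{shared:-shared} %{shared:-Bsymbolic}}", which is why such wrappers can live in a Linux-only header while the GNU_USER_* building blocks stay in the shared one.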

 Index: gcc/libgcc/unwind-dw2-fde-dip.c
 ===
 --- gcc.orig/libgcc/unwind-dw2-fde-dip.c 2012-04-03 17:07:28.0 
 -0700
 +++ gcc/libgcc/unwind-dw2-fde-dip.c  2012-04-04 14:51:01.338074000 -0700
 @@ -48,8 +48,9 @@
 #include "gthr.h"
 
 #if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
 -    && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
 -        || (__GLIBC__ == 2 && __GLIBC_MINOR__ == 2 && defined(DT_CONFIG)))
 +    && ((defined(__BIONIC__) && (defined(mips) || defined(__mips__))) \
 +        || (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
 +        || (__GLIBC__ == 2 && __GLIBC_MINOR__ == 2 && defined(DT_CONFIG))))
 # define USE_PT_GNU_EH_FRAME
 #endif

What is this change for?

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



RE: [PATCH, Android] MIPS support

2012-04-17 Thread Fu, Chao-Ying
Maxim Kuvyrkov wrote:

  
  For now, two MIPS changes in gnu-user.h and 
 unwind-dw2-fde-dip.c can be posted for comment.
  (I didn't tested this patch, though.)
 
 You need to test your patches before posting them for review. 
  Below are a couple of comments on your current version.

  I can test if this patch doesn't break existing MIPS Linux GCC build.

 
  After starting to build toolchains for Android with 
 Bionic, we may find new files to
  patch.  Ex: Comment out getpagesize() for bionic.
  
  Any comment?  Thanks a lot!
  
  Regards,
  Chao-ying
  
  Index: gcc/gcc/config/mips/gnu-user.h
  ===
  --- gcc.orig/gcc/config/mips/gnu-user.h2012-04-03 
 17:39:50.0 -0700
  +++ gcc/gcc/config/mips/gnu-user.h 2012-04-04 
 14:31:50.804236000 -0700
  @@ -45,8 +45,8 @@ along with GCC; see the file COPYING3.  
  /* A standard GNU/Linux mapping.  On most targets, it is 
 included in
CC1_SPEC itself by config/linux.h, but mips.h overrides CC1_SPEC
and provides this hook instead.  */
  -#undef SUBTARGET_CC1_SPEC
  -#define SUBTARGET_CC1_SPEC "%{profile:-p}"
  +#undef GNU_USER_SUBTARGET_CC1_SPEC
  +#define GNU_USER_SUBTARGET_CC1_SPEC "%{profile:-p}"
  
  /* -G is incompatible with -KPIC which is the default, so 
 only allow objects
in the small data section if the user explicitly asks for it.  */
  @@ -54,8 +54,8 @@ along with GCC; see the file COPYING3.  
  #define MIPS_DEFAULT_GVALUE 0
  
  /* Borrowed from sparc/linux.h */
  -#undef LINK_SPEC
  -#define LINK_SPEC \
  +#undef GNU_USER_TARGET_LINK_SPEC
  +#define GNU_USER_TARGET_LINK_SPEC \
  %(endian_spec) \
   %{shared:-shared} \
   %{!shared: \
  @@ -89,8 +89,8 @@ along with GCC; see the file COPYING3.  
  #undef ASM_OUTPUT_REG_PUSH
  #undef ASM_OUTPUT_REG_POP
  
  -#undef LIB_SPEC
  -#define LIB_SPEC \
  +#undef GNU_USER_TARGET_LIB_SPEC
  +#define GNU_USER_TARGET_LIB_SPEC \
  %{pthread:-lpthread} \
  %{shared:-lc} \
  %{!shared: \
  @@ -133,7 +133,34 @@ extern const char *host_detect_local_cpu
   LINUX_DRIVER_SELF_SPECS
  
  /* Similar to standard Linux, but adding -ffast-math support.  */
  -#undef  ENDFILE_SPEC
  -#define ENDFILE_SPEC \
  +#undef  GNU_USER_TARGET_ENDFILE_SPEC
  +#define GNU_USER_TARGET_ENDFILE_SPEC \
   %{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} \
%{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s
 
 Above definitions are OK.

  Thanks!

 
  +
  +#undef  LINK_SPEC
  +#define LINK_SPEC                                                     \
  +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_LINK_SPEC,                     \
  +                       GNU_USER_TARGET_LINK_SPEC " " ANDROID_LINK_SPEC)
  +
  +#undef  SUBTARGET_CC1_SPEC
  +#define SUBTARGET_CC1_SPEC                                            \
  +  LINUX_OR_ANDROID_CC (GNU_USER_SUBTARGET_CC1_SPEC,                   \
  +                       GNU_USER_SUBTARGET_CC1_SPEC " " ANDROID_CC1_SPEC)
  +
  +#undef  CC1PLUS_SPEC
  +#define CC1PLUS_SPEC                                                  \
  +  LINUX_OR_ANDROID_CC ("", ANDROID_CC1PLUS_SPEC)
  +
  +#undef  LIB_SPEC
  +#define LIB_SPEC                                                      \
  +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_LIB_SPEC,                      \
  +                       GNU_USER_TARGET_LIB_SPEC " " ANDROID_LIB_SPEC)
  +
  +#undef  STARTFILE_SPEC
  +#define STARTFILE_SPEC                                                \
  +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_STARTFILE_SPEC, ANDROID_STARTFILE_SPEC)
  +
  +#undef  ENDFILE_SPEC
  +#define ENDFILE_SPEC                                                  \
  +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_ENDFILE_SPEC, ANDROID_ENDFILE_SPEC)
 
 The LINUX_OR_ANDROID_* definitions should be moved out of 
 gnu-user.h, as this header is used for systems besides Linux, 
 e.g., kFreeBSD and Hurd.  Please move these definitions to 
 mips/linux-common.h, which will be a new file, similarly as 
 i386 did in http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00944.html .

  I will check this message.

 
  Index: gcc/libgcc/unwind-dw2-fde-dip.c
  ===
  --- gcc.orig/libgcc/unwind-dw2-fde-dip.c   2012-04-03 
 17:07:28.0 -0700
  +++ gcc/libgcc/unwind-dw2-fde-dip.c2012-04-04 
 14:51:01.338074000 -0700
  @@ -48,8 +48,9 @@
  #include "gthr.h"
  
  #if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
  -    && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
  -        || (__GLIBC__ == 2 && __GLIBC_MINOR__ == 2 && defined(DT_CONFIG)))
  +    && ((defined(__BIONIC__) && (defined(mips) || defined(__mips__))) \
  +        || (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
  +        || (__GLIBC__ == 2 && __GLIBC_MINOR__ == 2 && defined(DT_CONFIG))))
  # define USE_PT_GNU_EH_FRAME
  #endif
 
 What is this change for?

  For stack unwinding, MIPS needs supporting functions in libgcc to 
work with eh_frame for Android.
(Note that ARM has its own 

Re: [PATCH, Android] MIPS support

2012-04-17 Thread Maxim Kuvyrkov
On 18/04/2012, at 1:10 PM, Fu, Chao-Ying wrote:

 Maxim Kuvyrkov wrote:
 
 Above definitions are OK.
 
  Thanks!

For avoidance of doubt, please wait for the whole patch to be approved before 
committing it.

 Index: gcc/libgcc/unwind-dw2-fde-dip.c
 ===
 --- gcc.orig/libgcc/unwind-dw2-fde-dip.c   2012-04-03 
 17:07:28.0 -0700
 +++ gcc/libgcc/unwind-dw2-fde-dip.c2012-04-04 
 14:51:01.338074000 -0700
 @@ -48,8 +48,9 @@
 #include "gthr.h"
 
 #if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
 -    && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
 -        || (__GLIBC__ == 2 && __GLIBC_MINOR__ == 2 && defined(DT_CONFIG)))
 +    && ((defined(__BIONIC__) && (defined(mips) || defined(__mips__))) \
 +        || (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
 +        || (__GLIBC__ == 2 && __GLIBC_MINOR__ == 2 && defined(DT_CONFIG))))
 # define USE_PT_GNU_EH_FRAME
 #endif
 
 What is this change for?
 
  For stack unwinding, MIPS needs supporting functions in libgcc to 
 work with eh_frame for Android.
 (Note that ARM has its own unwinding functions in gcc/config/arm/.  It 
 doesn't use eh_frame.)
 The file is enabled for GLIBC originally.  Thus, I add a new test to enable it
 for MIPS Android BIONIC build.

Please use the format that other C libraries use (instead of mixing together GLIBC 
and Bionic definitions):

#if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
    && defined(__BIONIC__)
# define USE_PT_GNU_EH_FRAME
#endif

Also, as far as I can tell, this change would also apply for x86, and for ARM 
having USE_PT_GNU_EH_FRAME defined will not hurt.  So please make the 
definition architecture-agnostic.
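
A minimal sketch of that layout, with one self-contained #if block per C library and no architecture test, might look like the following.  HAVE_LD_EH_FRAME_HDR is defined by hand here only so the fragment stands alone (in libgcc it comes from configure), and demo_uses_pt_gnu_eh_frame is a hypothetical helper added purely to make the outcome observable.

```cpp
#include <cassert>
#include <cstdlib>   // Pulls in the libc feature macros (__GLIBC__ etc.).

// Assumption for this sketch only: normally set by configure.
#define HAVE_LD_EH_FRAME_HDR 1

// glibc: the version check stays in its own block.
#if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
    && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 2) \
        || (__GLIBC__ == 2 && __GLIBC_MINOR__ == 2 && defined(DT_CONFIG)))
# define USE_PT_GNU_EH_FRAME
#endif

// Bionic: a separate block, with no architecture condition.
#if !defined(inhibit_libc) && defined(HAVE_LD_EH_FRAME_HDR) \
    && defined(__BIONIC__)
# define USE_PT_GNU_EH_FRAME
#endif

// Hypothetical helper: reports whether either block enabled the macro.
constexpr bool demo_uses_pt_gnu_eh_frame ()
{
#ifdef USE_PT_GNU_EH_FRAME
  return true;
#else
  return false;
#endif
}
```

Keeping the blocks separate means a new C library can be added, or an old one removed, without touching the conditions of the others.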

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



[C++ Patch] PR 52422 (new patch)

2012-04-17 Thread Paolo Carlini

Hi Jason,

I have a new patch for this issue, another SFINAE issue noticed by 
Daniel. Compared to the last version, I extended the complain-ization ;) 
to a few more functions in typeck.c (I think the set is more consistent 
now) and thoroughly double checked that the return values of all the 
functions which now get a tsubst_flags_t argument are checked for 
error_mark_node and in case early return back error_mark_node itself, as 
you requested last time.
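
The shape of that change can be sketched in miniature as below.  Every name here (demo_flags, demo_node, demo_decay, demo_build_call) is hypothetical and only mimics the roles of tsubst_flags_t, tree, decay_conversion and its callers; the sketch is not the code in the patch.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical miniature of the pattern; none of these names are GCC's.
enum demo_flags { demo_none = 0, demo_error = 1 };

struct demo_node { bool is_void; };

// Sentinel playing the role of error_mark_node.
static demo_node demo_error_node = { false };

// Counts diagnostics so the effect of the flag can be observed.
static int demo_diagnostics_emitted = 0;

// Leaf routine: prints a diagnostic only when the caller asked for one,
// but reports failure through the sentinel in both cases.
demo_node *demo_decay (demo_node *exp, int complain)
{
  if (exp->is_void)
    {
      if (complain & demo_error)
        {
          std::fputs ("void value not ignored as it ought to be\n", stderr);
          ++demo_diagnostics_emitted;
        }
      return &demo_error_node;
    }
  return exp;
}

// Caller: forwards its own flags and returns the sentinel early instead
// of continuing with a failed operand.
demo_node *demo_build_call (demo_node *fn, int complain)
{
  demo_node *decayed = demo_decay (fn, complain);
  if (decayed == &demo_error_node)
    return &demo_error_node;
  return decayed;
}
```

In a SFINAE context the caller would pass demo_none, so the failure is reported only through the return value and overload resolution can quietly move on to the next candidate.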


As usual, tested x86_64-linux.

Ok for mainline?

Thanks,
Paolo.

/
/cp
2012-04-17  Paolo Carlini  paolo.carl...@oracle.com

PR c++/52422
* cp-tree.h (build_addr_func, decay_conversion,
get_member_function_from_ptrfunc,
build_m_component_ref, convert_member_func_to_ptr):
Add tsubst_flags_t parameter.
* typeck.c (cp_default_conversion): Add.
(decay_conversion, default_conversion,
get_member_function_from_ptrfunc, convert_member_func_to_ptr):
Add tsubst_flags_t parameter and use it throughout.
(cp_build_indirect_ref, cp_build_array_ref,
cp_build_function_call_vec, convert_arguments, build_x_binary_op,
cp_build_binary_op, cp_build_unary_op, build_reinterpret_cast_1,
build_const_cast_1, expand_ptrmemfunc_cst,
convert_for_initialization): Adjust.
* init.c (build_vec_init): Adjust.
* decl.c (grok_reference_init, get_atexit_node): Likewise.
* rtti.c (build_dynamic_cast_1, tinfo_base_init): Likewise.
* except.c (build_throw): Likewise.
* typeck2.c (build_x_arrow): Likewise.
(build_m_component_ref): Add tsubst_flags_t parameter and
use it throughout.
* pt.c (convert_nontype_argument): Adjust.
* semantics.c (finish_asm_stmt, maybe_add_lambda_conv_op): Likewise.
* decl2.c (build_offset_ref_call_from_tree): Likewise.
* call.c (build_addr_func): Add tsubst_flags_t parameter and
use it throughout.
(build_call_a, build_conditional_expr_1, build_new_op_1,
convert_like_real, convert_arg_to_ellipsis, build_over_call,
build_special_member_call): Adjust.
* cvt.c (cp_convert_to_pointer, force_rvalue,
build_expr_type_conversion): Likewise.

/testsuite
2012-04-17  Paolo Carlini  paolo.carl...@oracle.com

PR c++/52422
* g++.dg/cpp0x/sfinae33.C: New.
* g++.dg/cpp0x/sfinae34.C: Likewise.
Index: testsuite/g++.dg/cpp0x/sfinae33.C
===
--- testsuite/g++.dg/cpp0x/sfinae33.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/sfinae33.C   (revision 0)
@@ -0,0 +1,27 @@
+// PR c++/52422
+// { dg-options "-std=c++11" }
+
+template<class T>
+struct add_rval_ref
+{
+  typedef T&& type;
+};
+
+template<>
+struct add_rval_ref<void>
+{
+  typedef void type;
+};
+
+template<class T>
+typename add_rval_ref<T>::type create();
+
+template<class T,
+  class = decltype(create<T>()())
+>
+auto f(int) -> char(&)[1];
+
+template<class>
+auto f(...) -> char(&)[2];
+
+static_assert(sizeof(f<void>(0)) != 1, "");
Index: testsuite/g++.dg/cpp0x/sfinae34.C
===
--- testsuite/g++.dg/cpp0x/sfinae34.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/sfinae34.C   (revision 0)
@@ -0,0 +1,27 @@
+// PR c++/52422
+// { dg-options "-std=c++11" }
+
+template<class T>
+struct add_rval_ref
+{
+  typedef T&& type;
+};
+
+template<>
+struct add_rval_ref<void>
+{
+  typedef void type;
+};
+
+template<class T>
+typename add_rval_ref<T>::type create();
+
+template<class T, class U,
+  class = decltype( (create<T>().*create<U>())() )
+>
+auto f(int) -> char(&)[1];
+
+template<class, class>
+auto f(...) -> char(&)[2];
+
+static_assert(sizeof(f<void, void>(0)) != 1, "");
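
What these two tests check can be shown with a smaller, standalone sketch.  It uses std::declval in place of the tests' own create helper, and the demo_* names are hypothetical.  When the expression inside decltype is ill-formed for a given T, the first overload should drop out silently and the variadic fallback should be selected rather than a hard error being issued.

```cpp
#include <cassert>
#include <utility>

// First overload: viable only if calling a value of type T is valid.
template<class T, class = decltype (std::declval<T> () ())>
auto demo_probe (int) -> char (&)[1];

// Fallback: always viable, but a worse match for the argument 0.
template<class>
auto demo_probe (...) -> char (&)[2];

// Hypothetical type with a call operator, for the positive case.
struct demo_callable { void operator() (); };

// A callable type selects the first overload.
static_assert (sizeof (demo_probe<demo_callable> (0)) == 1,
               "callable type should pick the first overload");

// int cannot be called, so substitution fails and the fallback wins.
static_assert (sizeof (demo_probe<int> (0)) == 2,
               "non-callable type should pick the fallback");
```

The void case exercised by the tests behaves like the int case here: the call expression is invalid, so the result must have size 2, not 1.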
Index: cp/typeck.c
===
--- cp/typeck.c (revision 186552)
+++ cp/typeck.c (working copy)
@@ -1818,7 +1818,7 @@ unlowered_expr_type (const_tree exp)
that the return value is no longer an lvalue.  */
 
 tree
-decay_conversion (tree exp)
+decay_conversion (tree exp, tsubst_flags_t complain)
 {
   tree type;
   enum tree_code code;
@@ -1832,7 +1832,8 @@ tree
   exp = resolve_nondeduced_context (exp);
   if (type_unknown_p (exp))
 {
-  cxx_incomplete_type_error (exp, TREE_TYPE (exp));
+  if (complain & tf_error)
+   cxx_incomplete_type_error (exp, TREE_TYPE (exp));
   return error_mark_node;
 }
 
@@ -1851,13 +1852,14 @@ tree
   code = TREE_CODE (type);
   if (code == VOID_TYPE)
 {
-  error ("void value not ignored as it ought to be");
+  if (complain & tf_error)
+   error ("void value not ignored as it ought to be");
   return error_mark_node;
 }
-  if (invalid_nonstatic_memfn_p (exp, tf_warning_or_error))
+  if (invalid_nonstatic_memfn_p (exp, complain))
 return error_mark_node;
   if (code == FUNCTION_TYPE || is_overloaded_fn (exp))
-return cp_build_addr_expr (exp, tf_warning_or_error);
+return 

[PATCH, PR38785] Throttle PRE at -O3

2012-04-17 Thread Maxim Kuvyrkov
Steven,
Jorn,

I am looking into fixing a performance regression on EEMBC's bitmnp01, and a 
version of your combined patch attached to PR38785 still works very well.  
Would you mind if I take it through upstream review, or are there any issues 
with contributing this patch to GCC mainline?

We (CodeSourcery/Mentor) have been carrying this patch in our toolchains since 
GCC 4.4, and it has not shown any performance or correctness problems on x86, 
ARM, MIPS, and other architectures.  It also reliably fixes the bitmnp01 
regression, which is still present in current mainline.

I have tested this patch on recent mainline on i686-linux-gnu with no 
regressions.  Unless I hear from you to the contrary, I will push this patch 
for upstream review and, hopefully, get it checked in.

Previous discussion of this patch is at 
http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00250.html

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



pr38785.ChangeLog
Description: Binary data


pr38785.patch
Description: Binary data


[google/integration] Add -Xclang-only option (issue6047048)

2012-04-17 Thread Ollie Wild
To be submitted to the google/integration branch and merged into
google/{main,gcc-4_6,gcc-4_7}.

Add -Xclang-only option (which is ignored).

This is used by certain drivers to pass options selectively to clang.  Adding
support to the gcc driver makes it easier to test GCC in the absence of these
drivers.

Google ref 6302116.

2012-04-17   Ollie Wild  a...@google.com

* gcc/common.opt (Xclang-only): New option.
* gcc/doc/invoke.texi (Xclang-only): Document new option.
* gcc/gcc.c (display_help): Print new option.
(driver_handle_option): Support new option (ignoring args).


diff --git a/gcc/common.opt b/gcc/common.opt
index 4a751a9..39f0843 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -743,6 +743,9 @@ Warn when a vector operation is compiled outside the SIMD
 Xassembler
 Driver Separate
 
+Xclang-only
+Driver Joined
+
 Xlinker
 Driver Separate
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d980e9f..1b61e76 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9560,6 +9560,11 @@ systems using the GNU linker.  On some targets, such as bare-board
 targets without an operating system, the @option{-T} option may be required
 when linking to avoid references to undefined symbols.
 
+@item -Xclang-only @var{option}
+@opindex Xclang-only
+Ignore @var{option}.  This is used by some custom drivers to pass options
+to Clang but not GCC.
+
 @item -Xlinker @var{option}
 @opindex Xlinker
 Pass @var{option} as an option to the linker.  You can use this to
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 5f789fd..c6b48a6 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -2983,6 +2983,7 @@ display_help (void)
   fputs (_("  -Xassembler <arg>        Pass <arg> on to the assembler\n"), stdout);
   fputs (_("  -Xpreprocessor <arg>     Pass <arg> on to the preprocessor\n"), stdout);
   fputs (_("  -Xlinker <arg>           Pass <arg> on to the linker\n"), stdout);
+  fputs (_("  -Xclang-only=<arg>       Ignore <arg>\n"), stdout);
   fputs (_("  -save-temps              Do not delete intermediate files\n"), stdout);
   fputs (_("  -save-temps=<arg>        Do not delete intermediate files\n"), stdout);
   fputs (_("\
@@ -3353,6 +3354,11 @@ driver_handle_option (struct gcc_options *opts,
   do_save = false;
   break;
 
+case OPT_Xclang_only:
+  /* Ignore the argument.  Used by some drivers to selectively pass
+ arguments to clang.  */
+  break;
+
 case OPT_Xlinker:
   add_infile (arg, "*");
   do_save = false;
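
The intended effect, accepting the option and then discarding it, can be sketched outside the driver as follows.  demo_filter_args is a hypothetical helper and not part of gcc.c; it assumes the joined -Xclang-only=<arg> spelling shown in the help text above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of GCC: drop every argument that starts
// with the joined prefix "-Xclang-only=" and keep the rest in order.
std::vector<std::string>
demo_filter_args (const std::vector<std::string> &args)
{
  const std::string prefix = "-Xclang-only=";
  std::vector<std::string> kept;
  for (const std::string &arg : args)
    {
      // Accepted but ignored, like the new OPT_Xclang_only case.
      if (arg.compare (0, prefix.size (), prefix) == 0)
        continue;
      kept.push_back (arg);
    }
  return kept;
}
```

For example, passing {"-O2", "-Xclang-only=-Wfoo", "a.c"} yields {"-O2", "a.c"}, so a command line written for a wrapper driver can be handed to GCC unchanged.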

--
This patch is available for review at http://codereview.appspot.com/6047048


[PATCH][ARM][Testsuite] Skip thumb1 test in non-thumb1 target

2012-04-17 Thread Joey Ye
Fix the test case that failed in ARM state.

* gcc.target/arm/thumb1-imm.c: Skip it in non-thumb1 target

Index: gcc/testsuite/gcc.target/arm/thumb1-imm.c
===
--- gcc/testsuite/gcc.target/arm/thumb1-imm.c   (revision 186517)
+++ gcc/testsuite/gcc.target/arm/thumb1-imm.c   (working copy)
@@ -1,5 +1,7 @@
 /* Check for thumb1 imm [255-510] moves.  */
 /* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-options "-Os" } */
+/* { dg-skip-if "" { ! { arm_thumb1 } } } */
 
 int f()
 {