date:20131113

Re: [PATCH] Merge cgraph_get_create_node and cgraph_get_create_real_symbol_node

2013-11-13 Thread Uros Bizjak

On Wed, Nov 13, 2013 at 6:18 PM, Uros Bizjak  wrote:

>> as discussed with Honza on many occasions, all users of
>> cgraph_get_create_node really want cgraph_get_create_real_symbol_node,
>> i.e. they are not interested in inline nodes and should get a
>> standalone node instead.  So this patch changes cgraph_get_create_node
>> to do what cgraph_get_create_real_symbol_node currently does and
>> removes the latter altogether.
>>
>> I had to change a call to cgraph_get_create_node to cgraph_get_node in
>> lto-streamer-in.c so that it does not make the node it operates on a
>> clone of another one because this made ipa_pta_execute abort on assert
>> after calling cgraph_get_body (visionary points to Richi for putting
>> the assert there).
>>
>> The patch successfully passed bootstrap and testing ("all" languages +
>> Ada) and LTO-bootstrap (C and C++ only) on x86_64-linux.
>>
>> 2013-11-12  Martin Jambor  
>>
>> * cgraph.c (cgraph_get_create_node): Do what
>> cgraph_get_create_real_symbol_node used to do.
>> (cgraph_get_create_real_symbol_node): Removed.  Changed all users to
>> call cgraph_get_create_node.
>> * cgraph.h (cgraph_get_create_real_symbol_node): Removed.
>> * lto-streamer-in.c (input_function): Call cgraph_get_node instead of
>> cgraph_get_create_node.  Assert we get a node.
>
> This patch breaks lto-profiledbootstrap on x86_64-pc-linux-gnu with:
>
> In function ‘colorize_start’:
> lto1: internal compiler error: in input_function, at lto-streamer-in.c:919
> 0xa585c1 input_function
> /home/uros/gcc-svn/trunk/gcc/lto-streamer-in.c:919
> 0xa585c1 lto_read_body
> /home/uros/gcc-svn/trunk/gcc/lto-streamer-in.c:1067
> 0xa585c1 lto_input_function_body(lto_file_decl_data*, cgraph_node*, char const*)
> /home/uros/gcc-svn/trunk/gcc/lto-streamer-in.c:1109
> 0x66eb2c cgraph_get_body(cgraph_node*)
> /home/uros/gcc-svn/trunk/gcc/cgraph.c:2967
> 0x999339 ipa_merge_profiles(cgraph_node*, cgraph_node*)
> /home/uros/gcc-svn/trunk/gcc/ipa-utils.c:699
> 0x5979a6 lto_cgraph_replace_node
> /home/uros/gcc-svn/trunk/gcc/lto/lto-symtab.c:82
> 0x598079 lto_symtab_merge_symbols_1
> /home/uros/gcc-svn/trunk/gcc/lto/lto-symtab.c:561
> 0x598079 lto_symtab_merge_symbols()
> /home/uros/gcc-svn/trunk/gcc/lto/lto-symtab.c:589
> 0x586fad read_cgraph_and_symbols
> /home/uros/gcc-svn/trunk/gcc/lto/lto.c:2945
> 0x586fad lto_main()
> /home/uros/gcc-svn/trunk/gcc/lto/lto.c:3254
>
> You will need patches from Teresa [1],[2] to get up to there in the
> lto-profiledbootstrap.

These patches are now in mainline; the failure is confirmed by HJ's
buildbot at http://gcc.gnu.org/ml/gcc-regression/2013-11/msg00350.html

Uros.

Re: [PATCH 2/6] Hand-written port of various accessors within gimple.h

2013-11-13 Thread Jeff Law


On 10/31/13 10:26, David Malcolm wrote:

* gimple.h (gimple_use_ops): Port from union to usage of
dyn_cast.
(gimple_set_use_ops): Port from union to usage of as_a.
(gimple_set_vuse): Likewise.
(gimple_set_vdef): Likewise.
(gimple_call_internal_fn): Port from union to a static_cast,
given that the type has already been asserted.
(gimple_omp_body_ptr): Port from unchecked union usage to
a static_cast.
(gimple_omp_set_body): Likewise.

OK with usual conditions.

It's getting late and I just fired off my overnight regression tests. 
So I'll have to look at 1/6 tomorrow.


jeff

Re: [PATCH 3/6] Automated part of conversion of gimple types to use C++ inheritance

2013-11-13 Thread Jeff Law


On 10/31/13 10:26, David Malcolm wrote:

gcc/

Patch autogenerated by refactor_gimple.py from
https://github.com/davidmalcolm/gcc-refactoring-scripts
revision 74cd3d5f06565c318749d0fb9f35b565dae28daa

[ ... ]
This is fine with the usual conditions.

 diff --git a/gcc/gimple-iterator.c b/gcc/gimple-iterator.c

index e430050..ed0d6df 100644
--- a/gcc/gimple-iterator.c
+++ b/gcc/gimple-iterator.c
@@ -67,7 +67,7 @@ update_bb_for_stmts (gimple_seq_node first, gimple_seq_node last,
  {
gimple_seq_node n;

-  for (n = first; n; n = n->gsbase.next)
+  for (n = first; n; n = n->next)
So just a quick note.  If I'm reading this correctly, this should make
things marginally easier in the debugger when looking at objects?  It
drives me absolutely nuts having to figure out how to get through the
base class to the fields I care about.


I didn't look at every hunk in this patch carefully.  Just spot checked
things.




  }

  /* Set the nowait flag on OMP_RETURN statement S.  */
@@ -1661,7 +1973,7 @@ static inline void
  gimple_omp_return_set_nowait (gimple s)
  {
GIMPLE_CHECK (s, GIMPLE_OMP_RETURN);
-  s->gsbase.subcode |= GF_OMP_RETURN_NOWAIT;
+  s->subcode |= GF_OMP_RETURN_NOWAIT;
So is there some reason the GIMPLE_CHECK was left in here rather than 
doing the downcasting?  This happens in other places.




  }


@@ -1681,8 +1993,9 @@ gimple_omp_return_nowait_p (const_gimple g)
  static inline void
  gimple_omp_return_set_lhs (gimple g, tree lhs)
  {
-  GIMPLE_CHECK (g, GIMPLE_OMP_RETURN);
-  g->gimple_omp_atomic_store.val = lhs;
+  gimple_statement_omp_atomic_store *omp_atomic_store_stmt =
+as_a  (g);
+  omp_atomic_store_stmt->val = lhs;
Contrast to prior hunk.  This one, AFAICT, eliminates the GIMPLE_CHECK here
and does it as part of the downcasting, right?


I wonder how far we have to go with this before GIMPLE_CHECK goes away :-)



@@ -1723,7 +2038,7 @@ static inline void
  gimple_omp_section_set_last (gimple g)
  {
GIMPLE_CHECK (g, GIMPLE_OMP_SECTION);
-  g->gsbase.subcode |= GF_OMP_SECTION_LAST;
+  g->subcode |= GF_OMP_SECTION_LAST;

Another example of the GIMPLE_CHECK hanging around.  On purpose?


  }


@@ -1746,9 +2061,9 @@ gimple_omp_parallel_set_combined_p (gimple g, bool combined_p)
  {
GIMPLE_CHECK (g, GIMPLE_OMP_PARALLEL);
if (combined_p)
-g->gsbase.subcode |= GF_OMP_PARALLEL_COMBINED;
+g->subcode |= GF_OMP_PARALLEL_COMBINED;
else
-g->gsbase.subcode &= ~GF_OMP_PARALLEL_COMBINED;
+g->subcode &= ~GF_OMP_PARALLEL_COMBINED;

Likewise.


  }


@@ -1771,7 +2086,7 @@ gimple_omp_atomic_set_need_value (gimple g)
  {
if (gimple_code (g) != GIMPLE_OMP_ATOMIC_LOAD)
  GIMPLE_CHECK (g, GIMPLE_OMP_ATOMIC_STORE);
-  g->gsbase.subcode |= GF_OMP_ATOMIC_NEED_VALUE;
+  g->subcode |= GF_OMP_ATOMIC_NEED_VALUE;

Likewise.


And so on.

I don't see anything objectionable.  Just want to make sure the script 
and/or the by-hand stuff didn't miss some of the conversions.


Jeff
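
For readers following the GIMPLE_CHECK versus as_a question above, here is a minimal standalone sketch of how a checked downcast can fold the code check into the cast itself. It is a toy model only: the types stmt_base and stmt_omp_atomic_store, the enum values, and the helpers code_test, checked_cast and try_cast are hypothetical stand-ins and are not GCC's actual is-a.h or gimple.h definitions.

```cpp
#include <cassert>

// Hypothetical statement codes; GCC's real enum is much larger.
enum stmt_code { CODE_OMP_RETURN, CODE_OMP_ATOMIC_STORE, CODE_ASM };

// Base class: every statement carries its code and a subcode bitfield.
struct stmt_base {
  stmt_code code;
  unsigned subcode;
  explicit stmt_base(stmt_code c) : code(c), subcode(0) {}
};

// Subclass with one extra field, loosely modelled on the "val" field above.
struct stmt_omp_atomic_store : stmt_base {
  int val;
  stmt_omp_atomic_store() : stmt_base(CODE_OMP_ATOMIC_STORE), val(0) {}
};

// Per-type predicate, playing the role a "test" specialization plays in the
// thread.  Only one specialization is provided in this sketch.
template <typename T> struct code_test;
template <> struct code_test<stmt_omp_atomic_store> {
  static bool test(const stmt_base *s) {
    return s->code == CODE_OMP_ATOMIC_STORE;
  }
};

// Checked downcast: asserts that the dynamic code matches, so the caller
// needs no separate check before touching subclass fields.
template <typename T> T *checked_cast(stmt_base *s) {
  assert(code_test<T>::test(s));
  return static_cast<T *>(s);
}

// Conditional downcast: yields nullptr on mismatch instead of asserting.
template <typename T> T *try_cast(stmt_base *s) {
  return code_test<T>::test(s) ? static_cast<T *>(s) : nullptr;
}

// Setter in the style of the converted hunk: the cast carries the check.
inline void set_store_val(stmt_base *s, int v) {
  checked_cast<stmt_omp_atomic_store>(s)->val = v;
}
```

Accessors that only touch base-class fields (such as the subcode flag setters) have nothing to downcast to, which may be one reason an explicit check survives there; that is a guess, not something the thread confirms.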

Re: [PATCH 4/6] Implement is_a_helper <>::test specializations for various gimple types

2013-11-13 Thread Jeff Law


On 10/31/13 10:26, David Malcolm wrote:

* gimple.h (is_a_helper ::test): New.
(is_a_helper ::test): New.
(is_a_helper ::test): New.
(is_a_helper ::test): New.
OK with the usual conditions.  Check with Andrew as to the location of 
these helpers since he's in the middle of ripping apart gimple.h.

jeff

Re: [PATCH 5/6] Port various places from union access to subclass access.

2013-11-13 Thread Jeff Law


On 10/31/13 10:26, David Malcolm wrote:


diff --git a/gcc/gimple-streamer-in.c b/gcc/gimple-streamer-in.c
index 4f31b83..2555dbe 100644
--- a/gcc/gimple-streamer-in.c
+++ b/gcc/gimple-streamer-in.c
@@ -129,13 +129,14 @@ input_gimple_stmt (struct lto_input_block *ib, struct data_in *data_in,
  case GIMPLE_ASM:
{
/* FIXME lto.  Move most of this into a new gimple_asm_set_string().  */
+   gimple_statement_asm *asm_stmt = as_a  (stmt);
tree str;
-   stmt->gimple_asm.ni = streamer_read_uhwi (ib);
-   stmt->gimple_asm.no = streamer_read_uhwi (ib);
-   stmt->gimple_asm.nc = streamer_read_uhwi (ib);
-   stmt->gimple_asm.nl = streamer_read_uhwi (ib);
+   asm_stmt->ni = streamer_read_uhwi (ib);
+   asm_stmt->no = streamer_read_uhwi (ib);
+   asm_stmt->nc = streamer_read_uhwi (ib);
+   asm_stmt->nl = streamer_read_uhwi (ib);
str = streamer_read_string_cst (data_in, ib);
-   stmt->gimple_asm.string = TREE_STRING_POINTER (str);
+   asm_stmt->string = TREE_STRING_POINTER (str);
The in/out streaming seems like another natural fit for virtual 
functions in the long term.




}
/* Fallthru  */

diff --git a/gcc/gimple.c b/gcc/gimple.c
index 9b1337a..e9ef8e0 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -641,21 +641,22 @@ static inline gimple
  gimple_build_asm_1 (const char *string, unsigned ninputs, unsigned noutputs,
  unsigned nclobbers, unsigned nlabels)
  {
-  gimple p;
+  gimple_statement_asm *p;
int size = strlen (string);

/* ASMs with labels cannot have outputs.  This should have been
   enforced by the front end.  */
gcc_assert (nlabels == 0 || noutputs == 0);

-  p = gimple_build_with_ops (GIMPLE_ASM, ERROR_MARK,
-ninputs + noutputs + nclobbers + nlabels);
+  p = as_a  (
+gimple_build_with_ops (GIMPLE_ASM, ERROR_MARK,
+  ninputs + noutputs + nclobbers + nlabels));

-  p->gimple_asm.ni = ninputs;
-  p->gimple_asm.no = noutputs;
-  p->gimple_asm.nc = nclobbers;
-  p->gimple_asm.nl = nlabels;
-  p->gimple_asm.string = ggc_alloc_string (string, size);
+  p->ni = ninputs;
+  p->no = noutputs;
+  p->nc = nclobbers;
+  p->nl = nlabels;
+  p->string = ggc_alloc_string (string, size);
As noted in a prior message, having build methods would eliminate this 
downcasting.  Not necessary for this patch, just wanted to point it out 
to anyone reading.  I won't point it out everywhere ;-)



So given the prior discussions around as_a, I'm not going to object to
these as_a instances.  I see them as warts/markers that we have further
work to do in terms of fleshing out the class and possibly refactoring
code.


Conditionally OK.  Conditional on the other related patches going in and
keeping it updated with Andrew's churn.  If/when the set goes in, post
the final version you actually check in -- no re-review is needed for
this hunk so long as any changes are the obvious fixing of fallout from
Andrew's work.


Jeff

Re: [PATCH 6/6] Update gdb hooks to reflect changes to gimple types

2013-11-13 Thread Jeff Law


On 10/31/13 10:26, David Malcolm wrote:

gcc/
* gdbhooks.py (GimplePrinter.to_string): Update lookup of
code field to reflect inheritance, rather than embedding of
the base gimple type.
Conditionally approved.  Obvious condition is the other 5 patches get 
approved.


Jeff

Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)

2013-11-13 Thread Jeff Law


On 10/31/13 10:26, David Malcolm wrote:


The following series of patches convert it to a C++ hierarchy, using the
existing structs, eliminating the union. The "gimple" typedef changes
from being a
   (union gimple_statement_d *)
to being a:
   (struct gimple_statement_base *)

There are no virtual functions in the new code: the sizes of the various
structs are unchanged.
Seems like a reasonable place to start (no virtuals).  As I mentioned in 
my earlier reply today, virtuals may be one way to cut down on the 
downcasting.  They have obvious downsides, but I think we now have some 
code samples to think about so we can sensibly evaluate downcasting vs 
adding virtual functions and how each affects the code we write.





Again, as noted in the earlier patch series, the names of the structs
are rather verbose.  I would prefer to also rename them all to eliminate
the "_statement" component:
   "gimple_statement_base" -> "gimple_base"
   "gimple_statement_phi"  -> "gimple_phi"
   "gimple_statement_omp"  -> "gimple_omp"
etc, but I didn't do this to minimize the patch size.  But if the core
maintainers are up for that, I can redo the patch series with that
change also, or do that as a followup.
If we do that, I think it'd be a followup -- it's a fair amount of churn 
with marginal benefit though, IMHO.


Jeff

Re: [PATCH] Add gimple subclasses for every gimple code (was Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3))

2013-11-13 Thread Jeff Law


On 11/08/13 12:02, David Malcolm wrote:

I wouldn't mind seeing a small example proof of concept posted to help
those who don't see where this is going understand the goal.  I would
recommend against posting another large patch for inclusion at this time.

Attached is a proof-of-concept patch which uses the
gimple_statement_switch subclass (as a "gimple_switch" typedef).  This
is one of the subclasses that the earlier patch added, which has no new
fields, but which carries the invariant that, if non-NULL,
gimple_code (gs) == GIMPLE_SWITCH.

[ ... ]

Thanks.  It's pretty much what I expected.  Obviously for other codes 
there may be a lot more changes that you have to slog through, but I 
think this example shows the main concepts.


Presumably in this new world order, the various gimple statement types 
will continue to inherit from a base class.  That seems somewhat 
inevitable and implies a certain amount of downcasting (via whatever 
means we agree upon).  The worry, in my mind, is how pervasive the
downcasting will be and how much of it we can get rid of over time.


I may be wrong, but ISTM some of the downcasting is a result of not 
providing certain capabilities via (pure?) virtual methods.  For 
example, expand_gimple_stmt_1 seems ripe for implementing as virtual 
methods.   ISTM you could also have virtuals to build the statements, 
dump/pretty-print them, verify them, branch/edge redirection, 
estimations for inlining, etc.  ISTM that would eliminate a good chunk 
of the downcasting.


Just to be clear, I'm not asking you to make those changes, just for 
your thoughts on approaches to eliminate the downcasting based on what 
we've seen so far.


Thanks for pulling this together to help illustrate how some of this
might look in practice.  I hope others take the time to look closely at
this example and think about what it means in terms of how we would be
writing code 6 months from now.



Jeff
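
To make the virtual-method alternative discussed above concrete, here is a small standalone sketch in which each statement kind supplies its own dump routine, so the caller never downcasts. All names here (stmt, switch_stmt, return_stmt, dump_all) are hypothetical and are not taken from the patch series.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Abstract base: each statement kind implements its own pretty-printer.
struct stmt {
  virtual ~stmt() = default;
  virtual std::string dump() const = 0;
};

struct switch_stmt : stmt {
  unsigned num_labels;
  explicit switch_stmt(unsigned n) : num_labels(n) {}
  std::string dump() const override {
    return "switch <" + std::to_string(num_labels) + " labels>";
  }
};

struct return_stmt : stmt {
  std::string dump() const override { return "return"; }
};

// The caller dispatches through the vtable; there is no switch on a
// statement code and no downcast to reach subclass fields.
inline std::string dump_all(const std::vector<std::unique_ptr<stmt>> &seq) {
  std::string out;
  for (const auto &s : seq) {
    assert(s != nullptr);
    out += s->dump();
    out += '\n';
  }
  return out;
}
```

The trade-off is the one mentioned in the thread: on common ABIs a class with virtual functions carries a hidden vtable pointer in every object, so the struct sizes would no longer be unchanged.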

Re: [PATCH] Avoid duplicate calls to REG_PARM_STACK_SPACE

2013-11-13 Thread Jeff Law


On 11/11/13 07:33, Ulrich Weigand wrote:

Hello,

this is another tweak to the middle-end to help support the new
powerpc64le-linux ABI.

In the new ABI, we make a distinction between functions that pass
all arguments solely in registers, and those that do not.  Only when
calling one of the latter type (or a varargs routine) does the caller
need to provide REG_PARM_STACK_SPACE; in the former case, this can
be omitted to save stack space.

This leads to a definition of REG_PARM_STACK_SPACE that is a lot
more complex than usual, which means it would be helpful if the
middle-end were to evaluate it sparingly (e.g. once per function
definition / call).

The middle-end already does cache REG_PARM_STACK_SPACE results,
however, this cache is not consistently used.  In particular,
the locate_and_pad_parm routine will re-evaluate the macro
every time it is called, even though all callers of the routine
already have a cached copy.

This patch adds a new argument to locate_and_pad_parm which is
used instead of evaluating REG_PARM_STACK_SPACE, and updates
the callers to pass in the correct value.

There should be no change in generated code on any platform.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich

2013-11-11  Ulrich Weigand  
Alan Modra  

ChangeLog:

* function.c (assign_parms): Use all.reg_parm_stack_space instead
of re-evaluating REG_PARM_STACK_SPACE target macro.
(locate_and_pad_parm): New parameter REG_PARM_STACK_SPACE.  Use it
instead of evaluating target macro REG_PARM_STACK_SPACE every time.
(assign_parm_find_entry_rtl): Update call.
* calls.c (initialize_argument_information): Update call.
(emit_library_call_value_1): Likewise.
* expr.h (locate_and_pad_parm): Update prototype.

This is fine for the trunk.  Please go ahead and install.

Thanks,
Jeff
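
The idea of the patch, evaluating a costly query once and handing the cached value down to the helper, can be sketched as follows. This is only an illustration under stated assumptions: expensive_stack_space_query, locate_offset, layout_all and the made-up sizing rule are hypothetical and do not reproduce function.c or the real macro.

```cpp
#include <cassert>

// Counts how often the "expensive" query runs.  It stands in for a complex
// REG_PARM_STACK_SPACE definition; the rule below is invented.
static int query_count = 0;

static int expensive_stack_space_query(int n_args) {
  ++query_count;
  return n_args > 8 ? 64 : 0;
}

// The helper receives the already-computed value as a parameter instead of
// re-evaluating the query on every call.
static int locate_offset(int arg_index, int reg_parm_stack_space) {
  assert(arg_index >= 0);
  int offset = arg_index * 8;
  return offset < reg_parm_stack_space ? reg_parm_stack_space : offset;
}

// The caller evaluates the query once and passes the cached result down.
static int layout_all(int n_args) {
  const int reserved = expensive_stack_space_query(n_args);
  int last = 0;
  for (int i = 0; i < n_args; ++i)
    last = locate_offset(i, reserved);
  return last;
}
```

With ten arguments the query runs once rather than ten times, while the helper's results are the same as if it had evaluated the query itself.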

Re: [PATCH] Fix failing assertion in calls.c:store_unaligned_arguments_into_pseudos

2013-11-13 Thread Jeff Law


On 11/11/13 07:32, Ulrich Weigand wrote:

Hello,

when implementing the new ABI for powerpc64le-linux, I ran into an assertion
failure in store_unaligned_arguments_into_pseudos:
 gcc_assert (args[i].partial % UNITS_PER_WORD == 0);

This can happen in the new ABI since we pass "homogeneous structures"
consisting solely of floating point elements of the same type in
floating-point registers until they run out, and the rest of the
structure in memory.  If the structure member type is a 4-byte float,
and we only have an odd number of floating point registers available,
then args[i].partial can legitimately end up not being a multiple
of UNITS_PER_WORD (i.e. 8 on the platform).

Now, there are a number of similar checks that args[i].partial is
aligned elsewhere in calls.c and function.c.  But for all of those
the logic is: if args[i].reg is a REG, then args[i].partial must be
a multiple of the word size; but if args[i].reg is a PARALLEL, then
args[i].partial can be any arbitrary value.  In the powerpc64le-linux
use case, the back-end always generates PARALLEL in this situation,
so it seemed the middle-end ought to support this -- and it does,
except for this one case.

However, looking more closely, it seems store_unaligned_arguments_into_pseudos
is not really useful for PARALLEL arguments in the first place.  What this
routine does is load arguments into args[i].aligned_regs.  But if we have
an argument where args[i].reg is a PARALLEL, args[i].aligned_regs will in
fact never be used later on at all!   Instead, PARALLEL will always be
handled directly via emit_group_move (in load_register_parameters), so
the code generated by store_unaligned_arguments_into_pseudos for such cases
is simply dead anyway.

Thus, I'd suggest to simply have store_unaligned_arguments_into_pseudos
skip PARALLEL arguments.

Tested on powerpc64-linux and powerpc64le-linux.

OK for mainline?

Bye,
Ulrich



ChangeLog:

2013-11-11  Ulrich Weigand  

* calls.c (store_unaligned_arguments_into_pseudos): Skip PARALLEL
arguments.
OK, so after a lot of worrying, I think this is OK.  I kept thinking 
this had to tie into the BLKmode return value braindamage that we have 
to support on the PA.  But that's not the case here.  This is strictly 
arguments.


If the argument is a PARALLEL, then yes, there's no sense in using
store_unaligned_arguments_into_pseudos; at least AFAICT, they are
handled by emit_group_move.


OK for the trunk, thanks for your patience,
Jeff
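
A small worked example of the arithmetic and of the proposed skip may help here. It assumes the figures from the message (4-byte floats, UNITS_PER_WORD of 8); the types reg_kind and arg_info and the function names are hypothetical simplifications, not the structures in calls.c.

```cpp
#include <cassert>
#include <vector>

const int UNITS_PER_WORD = 8;  // value given in the message for this platform

// Hypothetical simplification: an argument's register is either a single
// REG or a PARALLEL group.
enum reg_kind { KIND_REG, KIND_PARALLEL };

struct arg_info {
  reg_kind kind;
  int partial;  // bytes of the argument passed in registers
};

// Bytes held in registers when each FP register carries one 4-byte float.
inline int partial_bytes(int n_float_regs) { return n_float_regs * 4; }

// Mirrors the proposed behaviour: PARALLEL arguments are skipped before the
// word-alignment assertion; only plain REG arguments are processed.
inline int count_processed(const std::vector<arg_info> &args) {
  int processed = 0;
  for (const arg_info &a : args) {
    if (a.kind == KIND_PARALLEL)
      continue;
    assert(a.partial % UNITS_PER_WORD == 0);
    ++processed;
  }
  return processed;
}
```

With three floating-point registers left, partial is 12 bytes and 12 % 8 is 4, which is exactly the case that tripped the assertion before PARALLEL arguments were skipped.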

Re: [PATCH] Do not set flag_complex_method to 2 for C++ by default.

2013-11-13 Thread Andrew Pinski

On Wed, Nov 13, 2013 at 5:26 PM, Cong Hou  wrote:
> This patch is for PR58963.
>
> In the patch http://gcc.gnu.org/ml/gcc-patches/2005-02/msg00560.html,
> the builtin function is used to perform complex multiplication and
> division. This is to comply with C99 standard, but I am wondering if
> C++ also needs this.
>
> There is no complex keyword in C++, and no content in C++ standard
> about the behavior of operations on complex types. The 
> header file is all written in source code, including complex
> multiplication and division. GCC should not do too much for them by
> using builtin calls by default (although we can set -fcx-limited-range
> to prevent GCC doing this), which has a big impact on performance
> (there may exist vectorization opportunities).
>
> In this patch flag_complex_method will not be set to 2 for C++.
> Bootstrapped and tested on an x86-64 machine.

I think you need to look into this issue deeper as the original patch
only enabled it for C99:
http://gcc.gnu.org/ml/gcc-patches/2005-02/msg01483.html .

Looking just a little deeper will find
http://gcc.gnu.org/ml/gcc/2007-07/msg00124.html which says yes, C++
needs this.

Thanks,
Andrew Pinski

>
>
> thanks,
> Cong
>
>
> Index: gcc/c-family/c-opts.c
> ===
> --- gcc/c-family/c-opts.c (revision 204712)
> +++ gcc/c-family/c-opts.c (working copy)
> @@ -198,8 +198,10 @@ c_common_init_options_struct (struct gcc
>opts->x_warn_write_strings = c_dialect_cxx ();
>opts->x_flag_warn_unused_result = true;
>
> -  /* By default, C99-like requirements for complex multiply and divide.  */
> -  opts->x_flag_complex_method = 2;
> +  /* By default, C99-like requirements for complex multiply and divide.
> + But for C++ this should not be required.  */
> +  if (c_language != clk_cxx && c_language != clk_objcxx)
> +opts->x_flag_complex_method = 2;
>  }
>
>  /* Common initialization before calling option handlers.  */
> Index: gcc/c-family/ChangeLog
> ===
> --- gcc/c-family/ChangeLog (revision 204712)
> +++ gcc/c-family/ChangeLog (working copy)
> @@ -1,3 +1,8 @@
> +2013-11-13  Cong Hou  
> +
> + * c-opts.c (c_common_init_options_struct): Don't let C++ comply with
> + C99-like requirements for complex multiply and divide.
> +
>  2013-11-12  Joseph Myers  
>
>   * c-common.c (c_common_reswords): Add _Thread_local.

Re: [PATCH][1-3] New configure option to enable Position independent executable as default.

2013-11-13 Thread Ian Lance Taylor

On Wed, Nov 13, 2013 at 8:07 PM, Mike Stump  wrote:
> On Nov 13, 2013, at 5:14 PM, Ian Lance Taylor  wrote:
>> On Wed, Nov 13, 2013 at 3:23 PM, Mike Stump  wrote:
>>> On Nov 13, 2013, at 2:28 PM, Magnus Granberg  wrote:
>>>> This patchset will add a new configure option --enable-default-pie.
>>>
>>> Ick.  Would be nice to figure out on what systems one can do this and just 
>>> do it without the configure option.  Is there some reason that we need an 
>>> option for it?
>>
>> I either don't understand the patch description or I don't understand
>> your comment.  I think the patch is making -fPIE the default.
>
> No, what the patch does is 'New configure option'.  What I prefer is, the 
> patch titled, 'make pie default'.  The difference is the word option.

OK, I didn't understand your comment.

I don't think we should make -fPIE the default for all users.

Ian

Re: [PATCH][1-3] New configure option to enable Position independent executable as default.

2013-11-13 Thread Mike Stump

On Nov 13, 2013, at 5:14 PM, Ian Lance Taylor  wrote:
> On Wed, Nov 13, 2013 at 3:23 PM, Mike Stump  wrote:
>> On Nov 13, 2013, at 2:28 PM, Magnus Granberg  wrote:
>>> This patchset will add a new configure option --enable-default-pie.
>> 
>> Ick.  Would be nice to figure out on what systems one can do this and just 
>> do it without the configure option.  Is there some reason that we need an 
>> option for it?
> 
> I either don't understand the patch description or I don't understand
> your comment.  I think the patch is making -fPIE the default.

No, what the patch does is 'New configure option'.  What I prefer is, the patch 
titled, 'make pie default'.  The difference is the word option.

> It is not normally the default.  This option would change the default
> behaviour of the compiler, so it doesn't really make sense to ask
> whether we can just do it without the configure option.

Sure it does.  Explain how the default can't be changed otherwise.

For example, we can change the default for -O, to be -O2.

int optimize = 2;

instead of

int optimize = 0;

One can add a configure option to select the default for -O, but, it isn't 
necessary to merely change the default.

Re: [PATCH, rs6000] ELFv2 ABI 1/8: Add options and infrastructure

2013-11-13 Thread Alan Modra

On Tue, Nov 12, 2013 at 10:33:46PM +, Joseph S. Myers wrote:
> On Tue, 12 Nov 2013, Ulrich Weigand wrote:
> > Well, we had been thinking about this, but right now it seems we're not
> > going to be able to make that change throughout the ecosystem quickly
> > enough, so for now, it'll probably have to remain the IBM long double.
> 
> What's the difficulty?  GCC is supposed to support IEEE binary128 long 
> double already for Power, and IEEE binary128 works much better than IBM 
> long double in glibc.  The time a new ABI is being introduced is the time 
> to get things right rather than suggesting yet another ABI change in 
> future.

The major difficulty is that we don't want to use the existing
powerpc32 parameter passing and return scheme for ieee128 (ie. pass
pointers to memory).  The thought was to pass ieee128 in vsx
registers.  This means work in gcc, in glibc, in gdb, and likely other
places, and as Uli has already said, we have time and resource
constraints.

-- 
Alan Modra
Australia Development Lab, IBM

Re: patch to implement thread coloring in IRA

2013-11-13 Thread Vladimir Makarov


On 11/13/2013, 5:48 PM, Steven Bosscher wrote:

On Wed, Nov 13, 2013 at 6:56 PM, Vladimir Makarov wrote:

   The following patch improves coloring.  The order of pushing allocnos onto
the coloring stack from a bunch of colorable allocnos has always been important
for generated code performance.  LRA has a mechanism for allocating pseudos
by threads.  A thread in LRA is a set of non-conflicting pseudos connected by
moves (or by future reload insns).  Allocating pseudos by threads in LRA
improves code by increasing the chance of removing the move insns.

   So the same mechanism can be used for IRA.  The patch implements it.  The
only difference is that LRA forms threads statically before the allocation
sub-pass.  That is because the basic allocation is already done in IRA.
Static thread forming works well for IRA too, but even better
results can be obtained by forming threads dynamically.  This means that we
form threads during allocation and include only colorable allocnos.

   The results of using threads in IRA are pretty good.  The average code size
(text segment) of SPEC2000 is improved (by >0.1% for x86 SPECFP2000 and >0.3%
for x86-64 SPECFP2000).  The biggest code performance improvement (>1%)
is obtained on x86-64 SPECFP2000.  Performance tools report that the additional
code accounts for only about 0.05% more executed insns.


Nice!

Can you please also update the leading comment in ira.c? It seems
worth mentioning this approach under the "Coloring" bullet
(ira.c:176).
Thanks.  I've added a comment.  It is not a big change but a pretty
important one (e.g. to get about the same improvement 3 years ago I
needed 1K lines to implement coloring for irregular reg file
architectures).  So it is worth documenting.

(Not sure if that whole comment block is otherwise up to date??)


It is pretty much up to date.  I don't see any important details missing, although
it would be nice to describe the new pre-RA optimizations: shrink-wrapping,
find-moveable, and decrease-live-ranges.


Index: ira.c
===
--- ira.c   (revision 204753)
+++ ira.c   (working copy)
@@ -192,7 +192,14 @@ along with GCC; see the file COPYING3.
  this point.  There is some freedom in the order of putting
  allocnos on the stack which can affect the final result of
  the allocation.  IRA uses some heuristics to improve the
- order.
+ order.  The major one is to form *threads* from colorable
+ allocnos and push them on the stack by threads. Thread is a
+ set of non-conflicting colorable allocnos connected by
+ copies.  The thread contains allocnos from the colorable
+ bucket or colorable allocnos already pushed onto the coloring
+ stack.  Pushing thread allocnos one after another onto the
+ stack increases chances of removing copies when the allocnos
+ get the same hard reg.

 We also use a modification of Chaitin-Briggs algorithm which
  works for intersected register classes of allocnos.  To
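
As a rough illustration of the thread idea described in this message (a set of non-conflicting allocnos connected by copies), here is a standalone sketch that merges allocnos along copies, most frequent copy first, whenever the two groups do not conflict. It is an assumption-laden toy: copy_edge, conflict_set, conflict_p and form_threads are hypothetical names, and the real ira-color.c code works on its own allocno data structures and forms threads during coloring.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// A copy (move) between allocnos a and b with an execution frequency.
struct copy_edge { int a, b, freq; };

// Unordered pairs (min, max) of allocnos whose live ranges conflict.
typedef std::set<std::pair<int, int> > conflict_set;

inline bool conflict_p(const conflict_set &c, int x, int y) {
  return c.count(std::make_pair(std::min(x, y), std::max(x, y))) != 0;
}

// Returns a thread id for each of the n allocnos.  Copies are visited in
// decreasing frequency; two threads are merged only if no member of one
// conflicts with a member of the other.
inline std::vector<int> form_threads(int n, std::vector<copy_edge> copies,
                                     const conflict_set &conflicts) {
  std::vector<int> thread(n);
  for (int i = 0; i < n; ++i)
    thread[i] = i;  // every allocno starts as its own thread
  std::sort(copies.begin(), copies.end(),
            [](const copy_edge &l, const copy_edge &r) {
              return l.freq > r.freq;
            });
  for (const copy_edge &cp : copies) {
    assert(cp.a >= 0 && cp.a < n && cp.b >= 0 && cp.b < n);
    const int ta = thread[cp.a], tb = thread[cp.b];
    if (ta == tb)
      continue;  // already in the same thread
    bool clash = false;
    for (int i = 0; i < n && !clash; ++i)
      for (int j = 0; j < n && !clash; ++j)
        if (thread[i] == ta && thread[j] == tb && conflict_p(conflicts, i, j))
          clash = true;
    if (clash)
      continue;  // merging would put conflicting allocnos together
    for (int i = 0; i < n; ++i)
      if (thread[i] == tb)
        thread[i] = ta;
  }
  return thread;
}
```

Allocnos that end up in the same thread are the ones that would be pushed onto the coloring stack one after another, which raises the chance that they receive the same hard register and the copy disappears.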

Re: [PATCH] Introducing SAD (Sum of Absolute Differences) operation to GCC vectorizer.

2013-11-13 Thread Cong Hou

Ping?


thanks,
Cong


On Mon, Nov 11, 2013 at 11:25 AM, Cong Hou  wrote:
> Hi James
>
> Sorry for the late reply.
>
>
> On Fri, Nov 8, 2013 at 2:55 AM, James Greenhalgh
>  wrote:
>>> On Tue, Nov 5, 2013 at 9:58 AM, Cong Hou  wrote:
>>> > Thank you for your detailed explanation.
>>> >
>>> > Once GCC detects a reduction operation, it will automatically
>>> > accumulate all elements in the vector after the loop. In the loop the
>>> > reduction variable is always a vector whose elements are reductions of
>>> > corresponding values from other vectors. Therefore in your case the
>>> > only instruction you need to generate is:
>>> >
>>> > VABAL   ops[3], ops[1], ops[2]
>>> >
>>> > It is OK if you accumulate the elements into one in the vector inside
>>> > of the loop (if one instruction can do this), but you have to make
>>> > sure the other elements in the vector remain zero so that the final
>>> > result is correct.
>>> >
>>> > If you are confused about the documentation, check the one for
>>> > udot_prod (just above usad in md.texi), as it has very similar
>>> > behavior to usad. Actually I copied the text from there and made some
>>> > changes. As those two instruction patterns are both for vectorization,
>>> > their behavior should not be difficult to explain.
>>> >
>>> > If you have more questions or think that the documentation is still
>>> > improper please let me know.
>>
>> Hi Cong,
>>
>> Thanks for your reply.
>>
>> I've looked at Dorit's original patch adding WIDEN_SUM_EXPR and
>> DOT_PROD_EXPR and I see that the same ambiguity exists for
>> DOT_PROD_EXPR. Can you please add a note in your tree.def
>> that SAD_EXPR, like DOT_PROD_EXPR can be expanded as either:
>>
>>   tmp = WIDEN_MINUS_EXPR (arg1, arg2)
>>   tmp2 = ABS_EXPR (tmp)
>>   arg3 = PLUS_EXPR (tmp2, arg3)
>>
>> or:
>>
>>   tmp = WIDEN_MINUS_EXPR (arg1, arg2)
>>   tmp2 = ABS_EXPR (tmp)
>>   arg3 = WIDEN_SUM_EXPR (tmp2, arg3)
>>
>> Where WIDEN_MINUS_EXPR is a signed MINUS_EXPR, returning
>> a value of the same (widened) type as arg3.
>>
>
>
> I have added it, although we currently don't have WIDEN_MINUS_EXPR (I
> mentioned it in tree.def).
>
>
>> Also, while looking for the history of DOT_PROD_EXPR I spotted this
>> patch:
>>
>>   [autovect] [patch] detect mult-hi and sad patterns
>>   http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01394.html
>>
>> I wonder what the reason was for that patch to be dropped?
>>
>
> It has been 8 years.  I have no idea why this patch was never
> accepted.  There is not even a reply in that thread.  But I believe the SAD
> pattern is very important to recognize.  ARM also provides
> instructions for it.
>
>
> Thank you for your comment again!
>
>
> thanks,
> Cong
>
>
>
>> Thanks,
>> James
>>
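
For reference, the scalar meaning of the sum-of-absolute-differences operation discussed in this thread (widening subtract, absolute value, accumulate) can be written as below. This is a plain reference loop under the assumption of unsigned 8-bit inputs and a 32-bit accumulator; the function name sad_u8 is hypothetical, and nothing here models how the vectorizer or a target pattern lays out partial sums.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Scalar reference: acc + sum over i of |a[i] - b[i]|.
inline std::uint32_t sad_u8(const std::uint8_t *a, const std::uint8_t *b,
                            std::size_t n, std::uint32_t acc) {
  assert(a != nullptr && b != nullptr);
  for (std::size_t i = 0; i < n; ++i) {
    // Widen to int before subtracting so the difference can be negative.
    int diff = static_cast<int>(a[i]) - static_cast<int>(b[i]);
    // Absolute value, then accumulate into the wider type.
    acc += static_cast<std::uint32_t>(std::abs(diff));
  }
  return acc;
}
```

For inputs {10, 0, 255, 7} and {3, 5, 0, 7} the differences are 7, 5, 255 and 0, giving 267 on top of whatever accumulator value is passed in.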

Re: patch to implement thread coloring in IRA

2013-11-13 Thread Vladimir Makarov


On 11/13/2013, 8:34 PM, Ulrich Weigand wrote:

Vladimir Makarov wrote:


  PR rtl-optimization/59036
  * ira-color.c (struct allocno_color_data): Add new members
  first_thread_allocno, next_thread_allocno, thread_freq.
  (sorted_copies): New static var.
  (allocnos_conflict_by_live_ranges_p, copy_freq_compare_func): Move
  up.
  (allocno_thread_conflict_p, merge_threads)
  (form_threads_from_copies, form_threads_from_bucket)
  (form_threads_from_colorable_allocno, init_allocno_threads): New
  functions.
  (bucket_allocno_compare_func): Add comparison by thread frequency
  and threads.
  (add_allocno_to_ordered_bucket): Rename to
  add_allocno_to_ordered_colorable_bucket.  Remove parameter.
  (push_only_colorable): Call form_threads_from_bucket.
  (color_pass): Call init_allocno_threads.  Use
  consideration_allocno_bitmap instead of coloring_allocno_bitmap
  for nullifying allocno color data.
  (ira_initiate_assign, ira_finish_assign): Allocate/free
  sorted_copies.
  (coalesce_allocnos): Use static sorted copies.

Unfortunately, this patch causes cc1 for powerpc64-linux to crash for me
even when compiling "int main () { return 0; }" with -O due to a memory
corruption somewhere:



I've just fixed it, Ulrich.  Really sorry for the inconvenience.

Re: patch to implement thread coloring in IRA

2013-11-13 Thread Ulrich Weigand

> Vladimir Makarov wrote:
> 
> >  PR rtl-optimization/59036
> >  * ira-color.c (struct allocno_color_data): Add new members
> >  first_thread_allocno, next_thread_allocno, thread_freq.
> >  (sorted_copies): New static var.
> >  (allocnos_conflict_by_live_ranges_p, copy_freq_compare_func): Move
> >  up.
> >  (allocno_thread_conflict_p, merge_threads)
> >  (form_threads_from_copies, form_threads_from_bucket)
> >  (form_threads_from_colorable_allocno, init_allocno_threads): New
> >  functions.
> >  (bucket_allocno_compare_func): Add comparison by thread frequency
> >  and threads.
> >  (add_allocno_to_ordered_bucket): Rename to
> >  add_allocno_to_ordered_colorable_bucket.  Remove parameter.
> >  (push_only_colorable): Call form_threads_from_bucket.
> >  (color_pass): Call init_allocno_threads.  Use
> >  consideration_allocno_bitmap instead of coloring_allocno_bitmap
> >  for nuillify allocno color data.
> >  (ira_initiate_assign, ira_finish_assign): Allocate/free
> >  sorted_copies.
> >  (coalesce_allocnos): Use static sorted copies.
> 
> Unfortunately, this patch causes cc1 for powerpc64-linux to crash for me
> even when compiling "int main () { return 0; }" with -O due to a memory
> corruption somewhere:

valgrind says it's a double free:

==15063== Invalid free() / delete / delete[] / realloc()
==15063==    at 0x4026FB8: free (vg_replace_malloc.c:446)
==15063==    by 0x1060F92F: ira_free(void*) (ira.c:666)
==15063==    by 0x10643233: ira_finish_assign() (ira-color.c:4682)
==15063==    by 0x10618AEF: (anonymous namespace)::pass_reload::execute() (ira.c:5430)
==15063==    by 0x1071E8E7: execute_one_pass(opt_pass*) (passes.c:2217)
==15063==    by 0x1071EC53: execute_pass_list(opt_pass*) (passes.c:2270)
==15063==    by 0x1071EC9B: execute_pass_list(opt_pass*) (passes.c:2271)
==15063==    by 0x10335B97: expand_function(cgraph_node*) (cgraphunit.c:1753)
==15063==    by 0x103361BF: expand_all_functions() (cgraphunit.c:1858)
==15063==    by 0x1033713B: compile() (cgraphunit.c:2195)
==15063==    by 0x10337393: finalize_compilation_unit() (cgraphunit.c:2272)
==15063==    by 0x1012F043: c_write_global_declarations() (c-decl.c:10374)
==15063==  Address 0x43faf90 is 0 bytes inside a block of size 1 free'd
==15063==    at 0x4026FB8: free (vg_replace_malloc.c:446)
==15063==    by 0x1060F92F: ira_free(void*) (ira.c:666)
==15063==    by 0x1063F227: coalesce_allocnos() (ira-color.c:3793)
==15063==    by 0x1064044B: ira_sort_regnos_for_alter_reg(int*, int, unsigned int*) (ira-color.c:4095)
==15063==    by 0x1077CBC7: reload(rtx_def*, int) (reload1.c:790)
==15063==    by 0x106193D3: (anonymous namespace)::pass_reload::execute() (ira.c:5419)
==15063==    by 0x1071E8E7: execute_one_pass(opt_pass*) (passes.c:2217)

Re: patch to implement thread coloring in IRA

2013-11-13 Thread Vladimir Makarov


On 11/13/2013, 12:56 PM, Vladimir Makarov wrote:


2013-11-13  Vladimir Makarov  

PR rtl-optimization/59036
* ira-color.c (struct allocno_color_data): Add new members
first_thread_allocno, next_thread_allocno, thread_freq.
(sorted_copies): New static var.
(allocnos_conflict_by_live_ranges_p, copy_freq_compare_func): Move
up.
(allocno_thread_conflict_p, merge_threads)
(form_threads_from_copies, form_threads_from_bucket)
(form_threads_from_colorable_allocno, init_allocno_threads): New
functions.
(bucket_allocno_compare_func): Add comparison by thread frequency
and threads.
(add_allocno_to_ordered_bucket): Rename to
add_allocno_to_ordered_colorable_bucket.  Remove parameter.
(push_only_colorable): Call form_threads_from_bucket.
(color_pass): Call init_allocno_threads.  Use
consideration_allocno_bitmap instead of coloring_allocno_bitmap
for nuillify allocno color data.
(ira_initiate_assign, ira_finish_assign): Allocate/free
sorted_copies.
(coalesce_allocnos): Use static sorted copies.


Sorry, I did not check reload based targets and this patch broke them.  
The following patch fixes it.


Committed as rev. 204771.

2013-11-13  Vladimir Makarov  

* ira-color.c (coalesce_allocnos): Don't allocate and free
sorted_copies.

Index: ira-color.c
===
--- ira-color.c (revision 204767)
+++ ira-color.c (working copy)
@@ -3725,8 +3725,6 @@ coalesce_allocnos (void)
   int i, n, cp_num, regno;
   bitmap_iterator bi;
 
-  sorted_copies = (ira_copy_t *) ira_allocate (ira_copies_num
-  * sizeof (ira_copy_t));
   cp_num = 0;
   /* Collect copies.  */
   EXECUTE_IF_SET_IN_BITMAP (coloring_allocno_bitmap, 0, j, bi)
@@ -3790,7 +3788,6 @@ coalesce_allocnos (void)
}
   cp_num = n;
 }
-  ira_free (sorted_copies);
 }
 
 /* Usage cost and order number of coalesced allocno set to which
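
For readers following the valgrind report, here is a minimal sketch of the
ownership rule the fix restores: the shared buffer is allocated once when
the pass starts and freed once when it ends, and helpers in between only
borrow it.  All names below (pass_init, helper_sum, pass_finish, scratch)
are hypothetical stand-ins, not the ira-color.c code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer shared by several routines of one
   pass.  NULL means "not allocated".  */
static int *scratch;

/* Called once when the pass starts: the only place that allocates.  */
void
pass_init (size_t n)
{
  assert (scratch == NULL);
  scratch = malloc (n * sizeof *scratch);
  assert (scratch != NULL);
}

/* A helper that only borrows the buffer.  If it also freed SCRATCH on
   its way out, pass_finish would later free the same block again, which
   is the kind of double free valgrind reports.  */
int
helper_sum (const int *vals, size_t n)
{
  int sum = 0;

  assert (scratch != NULL);
  for (size_t i = 0; i < n; i++)
    {
      scratch[i] = vals[i];
      sum += scratch[i];
    }
  return sum;
}

/* Called once when the pass ends: the only place that frees.  */
void
pass_finish (void)
{
  assert (scratch != NULL);
  free (scratch);
  scratch = NULL;
}

/* Nonzero while the buffer is allocated.  */
int
scratch_is_live (void)
{
  return scratch != NULL;
}
```

Resetting the pointer to NULL after the free is an extra safety net chosen
for this sketch; it turns a second pass_finish call into an assertion
failure instead of heap corruption.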

[PATCH] Do not set flag_complex_method to 2 for C++ by default.

2013-11-13 Thread Cong Hou

This patch is for PR58963.

In the patch http://gcc.gnu.org/ml/gcc-patches/2005-02/msg00560.html,
the builtin function is used to perform complex multiplication and
division. This is to comply with C99 standard, but I am wondering if
C++ also needs this.

There is no complex keyword in C++, and no content in C++ standard
about the behavior of operations on complex types. The <complex>
header file is all written in source code, including complex
multiplication and division. GCC should not do too much for them by
using builtin calls by default (although we can set -fcx-limited-range
to prevent GCC doing this), which has a big impact on performance
(there may exist vectorization opportunities).
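
To make the trade-off concrete, here is a rough sketch of the difference
between the textbook multiplication and a C99-style one that recovers
infinities.  It is a simplified rendering of the example in C99 Annex G
(the intermediate-overflow case is left out), written on a hypothetical
struct cplx so that the compiler's own complex multiply is not involved.
It is not the libgcc routine.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical complex type for illustration only.  */
struct cplx { double re, im; };

/* Textbook formula: a few multiplies and adds, easy to vectorize.  */
struct cplx
mul_naive (struct cplx z, struct cplx w)
{
  struct cplx r;
  r.re = z.re * w.re - z.im * w.im;
  r.im = z.re * w.im + z.im * w.re;
  return r;
}

/* Map an infinite component to +-1 and any other to +-0, keeping sign.  */
static double
box (double v)
{
  double unit = isinf (v) ? 1.0 : 0.0;
  return signbit (v) ? -unit : unit;
}

/* Replace a NaN by a zero of the same sign.  */
static double
nan_to_zero (double v)
{
  if (!isnan (v))
    return v;
  return signbit (v) ? -0.0 : 0.0;
}

/* Same product, but if both parts came out NaN although an operand was
   infinite, recompute so that the result is an infinity.  */
struct cplx
mul_checked (struct cplx z, struct cplx w)
{
  struct cplx r = mul_naive (z, w);

  if (isnan (r.re) && isnan (r.im))
    {
      double a = z.re, b = z.im, c = w.re, d = w.im;
      int recalc = 0;

      if (isinf (a) || isinf (b))
        {
          a = box (a);
          b = box (b);
          c = nan_to_zero (c);
          d = nan_to_zero (d);
          recalc = 1;
        }
      if (isinf (c) || isinf (d))
        {
          c = box (c);
          d = box (d);
          a = nan_to_zero (a);
          b = nan_to_zero (b);
          recalc = 1;
        }
      if (recalc)
        {
          r.re = INFINITY * (a * c - b * d);
          r.im = INFINITY * (a * d + b * c);
        }
    }
  return r;
}
```

For (inf + inf*i) * (1 + 0*i) the naive version yields NaN in both parts,
while the checked version yields an infinity; for ordinary finite inputs
the two agree.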

In this patch flag_complex_method will not be set to 2 for C++.
Bootstrapped and tested on an x86-64 machine.


thanks,
Cong


Index: gcc/c-family/c-opts.c
===
--- gcc/c-family/c-opts.c (revision 204712)
+++ gcc/c-family/c-opts.c (working copy)
@@ -198,8 +198,10 @@ c_common_init_options_struct (struct gcc
   opts->x_warn_write_strings = c_dialect_cxx ();
   opts->x_flag_warn_unused_result = true;

-  /* By default, C99-like requirements for complex multiply and divide.  */
-  opts->x_flag_complex_method = 2;
+  /* By default, C99-like requirements for complex multiply and divide.
+ But for C++ this should not be required.  */
+  if (c_language != clk_cxx && c_language != clk_objcxx)
+opts->x_flag_complex_method = 2;
 }

 /* Common initialization before calling option handlers.  */
Index: gcc/c-family/ChangeLog
===
--- gcc/c-family/ChangeLog (revision 204712)
+++ gcc/c-family/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2013-11-13  Cong Hou  
+
+ * c-opts.c (c_common_init_options_struct): Don't let C++ comply with
+ C99-like requirements for complex multiply and divide.
+
 2013-11-12  Joseph Myers  

  * c-common.c (c_common_reswords): Add _Thread_local.

Re: [PATCH, testsuite] Add lp64 to target requirements of new IRA shrink wrapping preparation testcases

2013-11-13 Thread H.J. Lu

On Wed, Nov 13, 2013 at 4:17 PM, Martin Jambor  wrote:
> On Wed, Nov 13, 2013 at 12:41:54PM -0800, H.J. Lu wrote:
>> On Wed, Nov 13, 2013 at 7:27 AM, Martin Jambor  wrote:
>> > Hi,
>> >
>> > the testcases I have added for IRA shrink wrapping preparation code
>> > were not intended for -m32 on x86_64 and should not be tested with it,
>> > thus I'm adding lp64 to the target requirements.
>> >
>> > Let me also briefly mention that I would like to make the testcases
>> > also run on ppc64 and therefore I did not put them into
>> > gcc.target/i386.
>> >
>> > I have tested the changes by running
>> >
>> > make -k check RUNTESTFLAGS="dg.exp=ira-shrinkwrap-prep-?.c
>> > --target_board=unix/\{,32\}" and
>> > RUNTESTFLAGS="dg.exp=pr10474.c --target_board=unix/\{,32\}"
>> >
>> > and examining the gcc.sum files.  Since this was recommended by Richi
>> > on IRC, I will commit the change in a few minutes.
>> >
>> > Thanks,
>> >
>> > Martin
>> >
>> >
>> > 2013-11-13  Martin Jambor  
>> >
>> > * testsuite/gcc.dg/ira-shrinkwrap-prep-1.c: Add lp64 to target
>> > requirements.
>> > * testsuite/gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
>>
>> Those 2 pass on x32.
>>
>> > * testsuite/gcc.dg/pr10474.c: Likewise.
>> >
>>
>> This fails on x32.  Should it pass on x32?
>>
>
> I would assume all three should be ignored as UNSUPPORTED on x32 now.

The first 2 passed before this checkin.

> How exactly does it fail?
>

I got

FAIL: gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing
shrink-wrapping"

with

make check-gcc RUNTESTFLAGS="--target_board='unix{-mx32}' dg.exp=pr10474.c"

before your checkin.

-- 
H.J.

Re: [PATCH][1-3] New configure option to enable Position independent executable as default.

2013-11-13 Thread Ian Lance Taylor

On Wed, Nov 13, 2013 at 3:23 PM, Mike Stump  wrote:
> On Nov 13, 2013, at 2:28 PM, Magnus Granberg  wrote:
>> This patchset will add a new configure options --enable-default-pie.
>
> Ick.  Would be nice to figure out on what systems one can do this and just do 
> it without the configure option.  Is there some reason that we need an option 
> for it?

I either don't understand the patch description or I don't understand
your comment.  I think the patch is making -fPIE the default.  It is
not normally the default.  This option would change the default
behaviour of the compiler, so it doesn't really make sense to ask
whether we can just do it without the configure option.

Ian

RE: [PATCH, i386] Fix -mpreferred-stack-boundary

2013-11-13 Thread Bernd Edlinger

On Tue, 12 Nov 2013 17:50:27, Sriraman Tallam wrote:
>
> On Tue, Nov 12, 2013 at 5:17 PM, Sriraman Tallam  wrote:
>> On Tue, Nov 12, 2013 at 2:53 PM, Bernd Edlinger
>>  wrote:
>>> Hi,
>>>
>>>
>>> On Tue, 12 Nov 2013 10:30:16, Sriraman Tallam wrote:

 On Mon, Nov 11, 2013 at 11:30 PM, Uros Bizjak  wrote:
> There was something wrong with Bernd's address, retrying.
>
>>> Currently on trunk the option -mpreferred-stack-boundary does not work
>>> together with #pragma GCC target("sse") or 
>>> __attribute__((target("sse"))).
>>>
>>> There is already a test case that detects this: 
>>> gcc.target/i386/fastcall-sseregparm.c
>>>
>>> The attached patch fixes this test case under i686-pc-linux-gnu.
>>>
>>> Boot-strapped and regression-tested under i686-pc-linux-gnu.
>>>
>>> OK for trunk?
>>
>> No, this is not what I had in mind. This is simply reverting my
>> refactoring work which was to make ix86_option_override_internal get
>> rid of the global_options dependency. Here is the problem:
>> global_options gets some flags set after command-line options are read
>> (ix86_preferred_stack_boundary_arg in this case). But, this does not
>> get saved into target_option_default_node because there is no
>> corresponding field in cl_target_option for
>> ix86_preferred_stack_boundary_arg. So, when you restore
>> target_option_default_node to func_options in
>> ix86_valid_target_attribute_p, this particular flag does not get
>> copied. So, you can either copy this explicitly to func_options which
>> was your first patch or you could extend cl_target_option to include
>> this field too which is done by making
>> ix86_preferred_stack_boundary_arg a Variable in i386.opt. The latter
>> is cleaner because it always saves the default flags into
>> target_option_default_node.
>
> I quickly hacked up what I had in mind and attached the patch. Can you
> check if this fixes your problem?
>
> Thanks
> Sri
>
>

Well, this way it could be fixed too. 

But opts->x_ix86_preferred_stack_boundary_arg is not dependent on any
pragma or target attribute. Correct me if that is wrong.

And this code 

  if (opts_set->x_ix86_preferred_stack_boundary_arg)
    {
  int min = (TARGET_64BIT_P (opts->x_ix86_isa_flags)
 ? (TARGET_SSE_P (opts->x_ix86_isa_flags) ? 4 : 3) : 2);
  int max = (TARGET_SEH ? 4 : 12);

  if (opts->x_ix86_preferred_stack_boundary_arg < min
  || opts->x_ix86_preferred_stack_boundary_arg > max)

checks func_options against global_options_set:

  new_target = ix86_valid_target_attribute_tree (args, &func_options,
 &global_options_set);

So this code as it is will fail if this option was ever made target specific.
There is still a reason why this check needs to be executed each time
the opts->x_ix86_isa_flags changes.
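
As an illustration of why the range check depends on the ISA flags, the
test quoted above can be reduced to the following sketch.  The struct and
function names are hypothetical; in GCC the inputs come from
opts->x_ix86_isa_flags and TARGET_SEH.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the target state the check reads.  */
struct target_state
{
  bool is_64bit;
  bool has_sse;
  bool is_seh;
};

/* Return true if ARG (log2 of the boundary in bytes) lies within the
   limits used by the quoted code: the minimum is 4 for 64-bit with SSE,
   3 for 64-bit without SSE and 2 otherwise; the maximum is 4 for SEH
   targets and 12 otherwise.  */
bool
preferred_stack_boundary_arg_ok (int arg, struct target_state t)
{
  int min = t.is_64bit ? (t.has_sse ? 4 : 3) : 2;
  int max = t.is_seh ? 4 : 12;

  return arg >= min && arg <= max;
}
```

With these limits the same argument value 3 is accepted for a 64-bit
target without SSE and rejected once SSE is enabled, so a check done only
once against the command-line flags can become stale.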

Because of this I still would prefer my second attempt of fixing this issue,
because it is simple and it removes the different handling between
-mpreferred-stack-boundary and -mincoming-stack-boundary.

If that should be re-factored for any reason, I think all similar options
should be changed in one sweep. But in that case the global_options_set
must somehow also become target specific.
And we need to invent something like a target pragma to change these options
because it must somehow be possible to test this code.

Thanks
Bernd.

>>
>> Thanks
>> Sri
>>
>>
>>
>> I'm not experienced enough in this new option handling stuff, let's
>> ask Sriraman for his opinion on the patch.


 I do not think this is the right fix, I am wondering how many other
 target flags we may have to copy this way from global_options. I
 notice that other flags like ix86_regparm and
 ix86_incoming_stack_boundary_arg are very similar. Why should this
 need to be restored from global_options explicitly? This patch may fix
 the issue but it seems like a hack to me. We should be able to restore
 whatever we need from target_option_default_node via
 ix86_function_specific_restore. Maybe modifying the i386.opt file to
 make ix86_preferred_stack_boundary_arg a variable might fix it. See
 ix86_isa_flags for instance in i386.opt.


 Please let me know what you think.
>>>
>>> Thanks, now I see what you mean. I can change it the other way round
>>> and implement ix86_preferred_stack_boundary like 
>>> ix86_incoming_stack_boundary.
>>>
>>> using this define in options.h:
>>> #define ix86_preferred_stack_boundary_arg 
>>> global_options.x_ix86_preferred_stack_boundary_arg
>>>
>>> the global option is never copied into function specific options.
>>>
>>> Attached is the updated patch.
>>>
>>> OK for trunk after boot-strap and regression-testing?
>>>
>>> Bernd.
>>>
>>>
>>> PS: I have one more patch pending, and would like to know what you think
>>> about it: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00133.html
>>> That has also to do with global sta

Re: [PATCH, rs6000] ELFv2 ABI preparation: Refactor rs6000_function_arg

2013-11-13 Thread David Edelsohn

* config/rs6000/rs6000.c (rs6000_psave_function_arg): New function.
(rs6000_finish_function_arg): Likewise.
(rs6000_function_arg): Use rs6000_psave_function_arg and
rs6000_finish_function_arg to handle both vector and floating
point arguments that are also passed in GPRs / the stack.

This is okay, but

@@ -9711,85 +9781,33 @@ rs6000_function_arg (cumulative_args_t c
  {
   rtx rvec[GP_ARG_NUM_REG + 1];
   rtx r;
-  int k;
-  bool needs_psave;
-  enum machine_mode fmode = mode;
+  int k = 0;
   unsigned long n_fpreg = (GET_MODE_SIZE (mode) + 7) >> 3;

+  /* Do we also need to pass this argument in the parameter
+ save area?  */
+  if (type && (cum->nargs_prototype <= 0
+   || (DEFAULT_ABI == ABI_AIX
+   && TARGET_XL_COMPAT
+   && align_words >= GP_ARG_NUM_REG)))
+k = rs6000_psave_function_arg (mode, type, align_words, rvec);
+
+  /* Describe where this argument goes in the fprs.  */
+
+  /* Check if the argument is split over registers and memory.
+ This can only ever happen for long double or _Decimal128;
+ complex types are handled via split_complex_arg.  */
+  enum machine_mode fmode = mode;

Why did you move the declaration of fmode away from the beginning of
the scope block? C++ and C99 accept it, but I would prefer it in the
beginning of the block.

Thanks, David

Re: patch to implement thread coloring in IRA

2013-11-13 Thread Ulrich Weigand

Vladimir Makarov wrote:

>  PR rtl-optimization/59036
>  * ira-color.c (struct allocno_color_data): Add new members
>  first_thread_allocno, next_thread_allocno, thread_freq.
>  (sorted_copies): New static var.
>  (allocnos_conflict_by_live_ranges_p, copy_freq_compare_func): Move
>  up.
>  (allocno_thread_conflict_p, merge_threads)
>  (form_threads_from_copies, form_threads_from_bucket)
>  (form_threads_from_colorable_allocno, init_allocno_threads): New
>  functions.
>  (bucket_allocno_compare_func): Add comparison by thread frequency
>  and threads.
>  (add_allocno_to_ordered_bucket): Rename to
>  add_allocno_to_ordered_colorable_bucket.  Remove parameter.
>  (push_only_colorable): Call form_threads_from_bucket.
>  (color_pass): Call init_allocno_threads.  Use
>  consideration_allocno_bitmap instead of coloring_allocno_bitmap
>  for nuillify allocno color data.
>  (ira_initiate_assign, ira_finish_assign): Allocate/free
>  sorted_copies.
>  (coalesce_allocnos): Use static sorted copies.

Unfortunately, this patch causes cc1 for powerpc64-linux to crash for me
even when compiling "int main () { return 0; }" with -O due to a memory
corruption somewhere:

 <*free_lang_data>   <*free_inline_summary>
Assembling functions:
 main*** glibc detected *** ./cc1: corrupted double-linked list: 0x010035a3b4c0 ***
=== Backtrace: =
/lib64/libc.so.6[0x80a284fe04]
/lib64/libc.so.6[0x80a2850144]
/lib64/libc.so.6[0x80a2852b18]
./cc1(_Z16empty_alloc_poolP14alloc_pool_def-0x1306f10)[0x10263bc8]
./cc1(_Z15free_alloc_poolP14alloc_pool_def-0x1306e90)[0x10263c58]
./cc1[0x103746cc]
./cc1(_Z13df_scan_allocP15bitmap_head_def-0x11fe808)[0x10374810]
./cc1[0x106189c8]
./cc1(_Z16execute_one_passP8opt_pass-0xe6ebe0)[0x1071e8e8]
./cc1(_Z17execute_pass_listP8opt_pass-0xe6e884)[0x1071ec54]
./cc1(_Z17execute_pass_listP8opt_pass-0xe6e83c)[0x1071ec9c]
./cc1[0x10335b98]
./cc1[0x103361c0]
./cc1(_Z7compilev-0x123a0ec)[0x1033713c]
./cc1(_Z25finalize_compilation_unitv-0x1239ea4)[0x10337394]
./cc1(_Z27c_write_global_declarationsv-0x1437034)[0x1012f044]
./cc1[0x10820814]
./cc1[0x10824334]
./cc1(_Z11toplev_mainiPPc-0xd7068c)[0x108245bc]


.

Re: [PATCH, testsuite] Add lp64 to target requirements of new IRA shrink wrapping preparation testcases

2013-11-13 Thread Martin Jambor

On Wed, Nov 13, 2013 at 12:41:54PM -0800, H.J. Lu wrote:
> On Wed, Nov 13, 2013 at 7:27 AM, Martin Jambor  wrote:
> > Hi,
> >
> > the testcases I have added for IRA shrink wrapping preparation code
> > were not intended for -m32 on x86_64 and should not be tested with it,
> > thus I'm adding lp64 to the target requirements.
> >
> > Let me also briefly mention that I would like to make the testcases
> > also run on ppc64 and therefore I did not put them into
> > gcc.target/i386.
> >
> > I have tested the changes by running
> >
> > make -k check RUNTESTFLAGS="dg.exp=ira-shrinkwrap-prep-?.c
> > --target_board=unix/\{,32\}" and
> > RUNTESTFLAGS="dg.exp=pr10474.c --target_board=unix/\{,32\}"
> >
> > and examining the gcc.sum files.  Since this was recommended by Richi
> > on IRC, I will commit the change in a few minutes.
> >
> > Thanks,
> >
> > Martin
> >
> >
> > 2013-11-13  Martin Jambor  
> >
> > * testsuite/gcc.dg/ira-shrinkwrap-prep-1.c: Add lp64 to target
> > requirements.
> > * testsuite/gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
> 
> Those 2 pass on x32.
> 
> > * testsuite/gcc.dg/pr10474.c: Likewise.
> >
> 
> This fails on x32.  Should it pass on x32?
> 

I would assume all three should be ignored as UNSUPPORTED on x32 now.
How exactly does it fail?

Martin

Re: [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-13 Thread Yufeng Zhang


Hi Bill,

On 11/13/13 20:54, Bill Schmidt wrote:

Hi Yufeng,

The second version of your original patch is ok with me with the
following changes.  Sorry for the little side adventure into the
next-interp logic; in the end that's going to hurt more than it helps in
this case.  Thanks for having a look at it, anyway.  Thanks also for
cleaning up this version to be less intrusive to common interfaces; I
appreciate it.


Thanks a lot for the review.  I've attached an updated patch with the 
suggested changes incorporated.


For the next-interp adventure, I was quite happy to do the experiment; 
it's a good chance of gaining insight into the pass.  Many thanks for 
your prompt replies and patience in guiding!



Everything else looks OK to me.  Please ask Richard for final approval,
as I'm not a maintainer.


Hi Richard, would you be happy to OK the patch?

Regards,
Yufeng

gcc/

* gimple-ssa-strength-reduction.c: Include tree-affine.h.
(name_expansions): New static variable.
(alt_base_map): Ditto.
(get_alternative_base): New function.
(find_basis_for_candidate): For CAND_REF, optionally call
find_basis_for_base_expr with the returned value from
get_alternative_base.
(record_potential_basis): Add new parameter 'base' of type 'tree';
add an assertion of non-NULL base; use base to set node->base_expr.
(alloc_cand_and_find_basis): Update; call record_potential_basis
for CAND_REF with the returned value from get_alternative_base.
(execute_strength_reduction): Call pointer_map_create for
alt_base_map; call free_affine_expand_cache with &name_expansions.

gcc/testsuite/

* gcc.dg/tree-ssa/slsr-41.c: New test.

diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c
index 88afc91..26502c3 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "hash-table.h"
 #include "tree-ssa-address.h"
+#include "tree-affine.h"
 
 /* Information about a strength reduction candidate.  Each statement
in the candidate table represents an expression of one of the
@@ -420,6 +421,42 @@ cand_chain_hasher::equal (const value_type *chain1, const compare_type *chain2)
 /* Hash table embodying a mapping from base exprs to chains of candidates.  */
 static hash_table <cand_chain_hasher> base_cand_map;
 
+/* Pointer map used by tree_to_aff_combination_expand.  */
+static struct pointer_map_t *name_expansions;
+/* Pointer map embodying a mapping from bases to alternative bases.  */
+static struct pointer_map_t *alt_base_map;
+
+/* Given BASE, use the tree affine combiniation facilities to
+   find the underlying tree expression for BASE, with any
+   immediate offset excluded.  */
+
+static tree
+get_alternative_base (tree base)
+{
+  tree *result = (tree *) pointer_map_contains (alt_base_map, base);
+
+  if (result == NULL)
+{
+  tree expr;
+  aff_tree aff;
+
+  tree_to_aff_combination_expand (base, TREE_TYPE (base),
+ &aff, &name_expansions);
+  aff.offset = tree_to_double_int (integer_zero_node);
+  expr = aff_combination_to_tree (&aff);
+
+  result = (tree *) pointer_map_insert (alt_base_map, base);
+  gcc_assert (!*result);
+
+  if (expr == base)
+   *result = NULL;
+  else
+   *result = expr;
+}
+
+  return *result;
+}
+
 /* Look in the candidate table for a CAND_PHI that defines BASE and
return it if found; otherwise return NULL.  */
 
@@ -440,8 +477,9 @@ find_phi_def (tree base)
 }
 
 /* Helper routine for find_basis_for_candidate.  May be called twice:
-   once for the candidate's base expr, and optionally again for the
-   candidate's phi definition.  */
+   once for the candidate's base expr, and optionally again either for
+   the candidate's phi definition or for a CAND_REF's alternative base
+   expression.  */
 
 static slsr_cand_t
 find_basis_for_base_expr (slsr_cand_t c, tree base_expr)
@@ -518,6 +556,13 @@ find_basis_for_candidate (slsr_cand_t c)
}
 }
 
+  if (!basis && c->kind == CAND_REF)
+{
+  tree alt_base_expr = get_alternative_base (c->base_expr);
+  if (alt_base_expr)
+   basis = find_basis_for_base_expr (c, alt_base_expr);
+}
+
   if (basis)
 {
   c->sibling = basis->dependent;
@@ -528,17 +573,21 @@ find_basis_for_candidate (slsr_cand_t c)
   return 0;
 }
 
-/* Record a mapping from the base expression of C to C itself, indicating that
-   C may potentially serve as a basis using that base expression.  */
+/* Record a mapping from BASE to C, indicating that C may potentially serve
+   as a basis using that base expression.  BASE may be the same as
+   C->BASE_EXPR; alternatively BASE can be a different tree that share the
+   underlining expression of C->BASE_EXPR.  */
 
 static void
-record_potential_basis (slsr_cand_t c)
+recor

gimple-ssa-isolate-paths comment fix

2013-11-13 Thread Steven Bosscher

Committed, obvious.

   * gimple-ssa-isolate-paths.c (pass_isolate_erroneous_paths): Comment fix.

Index: gimple-ssa-isolate-paths.c
===
--- gimple-ssa-isolate-paths.c  (revision 204761)
+++ gimple-ssa-isolate-paths.c  (working copy)
@@ -415,7 +415,7 @@
   bool gate () { return gate_isolate_erroneous_paths (); }
   unsigned int execute () { return gimple_ssa_isolate_erroneous_paths (); }

-}; // class pass_uncprop
+}; // class pass_isolate_erroneous_paths
 }

 gimple_opt_pass *

Re: [PATCH][1-3] New configure option to enable Position independent executable as default.

2013-11-13 Thread Mike Stump

On Nov 13, 2013, at 2:28 PM, Magnus Granberg  wrote:
> This patchset will add a new configure options --enable-default-pie.

Ick.  Would be nice to figure out on what systems one can do this and just do 
it without the configure option.  Is there some reason that we need an option 
for it?

[wide-int] Fix memory stomper in dwarf

2013-11-13 Thread Mike Stump

This fixes a memory stomper that Kenny found.

This also improves the code in the face of vector of partial ints...

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index acee2000..ab8852f 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -3163,7 +3163,7 @@ static void add_AT_location_description   (dw_die_ref, enum dwarf_attribute,
 static void add_data_member_location_attribute (dw_die_ref, tree);
 static bool add_const_value_attribute (dw_die_ref, rtx);
 static void insert_int (HOST_WIDE_INT, unsigned, unsigned char *);
-static void insert_wide_int (const wide_int &, unsigned char *);
+static void insert_wide_int (const wide_int &, unsigned char *, int);
 static void insert_float (const_rtx, unsigned char *);
 static rtx rtl_for_decl_location (tree);
 static bool add_location_or_const_value_attribute (dw_die_ref, tree, bool,
@@ -13464,7 +13464,7 @@ loc_descriptor (rtx rtl, enum machine_mode mode,
  for (i = 0, p = array; i < length; i++, p += elt_size)
{
  rtx elt = CONST_VECTOR_ELT (rtl, i);
- insert_wide_int (std::make_pair (elt, imode), p);
+ insert_wide_int (std::make_pair (elt, imode), p, elt_size);
}
  break;
 
@@ -15103,18 +15103,29 @@ extract_int (const unsigned char *src, unsigned int size)
 /* Writes wide_int values to dw_vec_const array.  */
 
 static void
-insert_wide_int (const wide_int &val, unsigned char *dest)
+insert_wide_int (const wide_int &val, unsigned char *dest, int elt_size)
 {
   int i;
 
+  if (elt_size <= HOST_BITS_PER_WIDE_INT/BITS_PER_UNIT)
+{
+  insert_int ((HOST_WIDE_INT) val.elt (0), elt_size, dest);
+  return;
+}
+
+  // We'd have to extend this code to support odd sizes.
+  gcc_assert (elt_size % (HOST_BITS_PER_WIDE_INT/BITS_PER_UNIT) == 0);
+
+  int n = elt_size / (HOST_BITS_PER_WIDE_INT/BITS_PER_UNIT);
+
   if (WORDS_BIG_ENDIAN)
-for (i = (int)get_full_len (val) - 1; i >= 0; i--)
+for (i = n - 1; i >= 0; i--)
   {
insert_int ((HOST_WIDE_INT) val.elt (i), sizeof (HOST_WIDE_INT), dest);
dest += sizeof (HOST_WIDE_INT);
   }
   else
-for (i = 0; i < (int)get_full_len (val); i++)
+for (i = 0; i < n; i++)
   {
insert_int ((HOST_WIDE_INT) val.elt (i), sizeof (HOST_WIDE_INT), dest);
dest += sizeof (HOST_WIDE_INT);
@@ -15207,7 +15218,7 @@ add_const_value_attribute (dw_die_ref die, rtx rtl)
for (i = 0, p = array; i < length; i++, p += elt_size)
  {
rtx elt = CONST_VECTOR_ELT (rtl, i);
-   insert_wide_int (std::make_pair (elt, imode), p);
+   insert_wide_int (std::make_pair (elt, imode), p, elt_size);
  }
break;
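
The idea of the fix, writing exactly elt_size bytes per vector element
rather than a whole host word, can be sketched as follows.  This is a
simplified little-endian illustration with hypothetical names (put_int,
put_wide) and int64_t standing in for HOST_WIDE_INT; it is not the
dwarf2out.c code.

```c
#include <assert.h>
#include <stdint.h>

/* Store the low SIZE bytes of VAL at DEST, least significant byte
   first.  */
void
put_int (int64_t val, unsigned size, unsigned char *dest)
{
  uint64_t bits = (uint64_t) val;

  for (unsigned i = 0; i < size; i++)
    {
      dest[i] = (unsigned char) (bits & 0xff);
      bits >>= 8;
    }
}

/* Write one element that occupies ELT_SIZE bytes at DEST.  WORDS holds
   the value in 64-bit pieces, least significant piece first.  */
void
put_wide (const int64_t *words, unsigned char *dest, unsigned elt_size)
{
  unsigned word_bytes = sizeof (int64_t);

  /* Small element: write only ELT_SIZE bytes.  Writing a full word here
     would run past the element and stomp on whatever follows.  */
  if (elt_size <= word_bytes)
    {
      put_int (words[0], elt_size, dest);
      return;
    }

  /* Larger elements must be a whole number of words in this sketch.  */
  assert (elt_size % word_bytes == 0);
  for (unsigned i = 0; i < elt_size / word_bytes; i++)
    put_int (words[i], word_bytes, dest + i * word_bytes);
}
```

Passing the element size explicitly means the loop bound no longer
depends on how many words the value representation happens to use.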

[PATCH][PR middle-end/59119] Avoid referencing released SSA_NAMEs

2013-11-13 Thread Jeff Law



We have a known API wart with the SSA_NAME manager.  Specifically, once 
you start releasing names, you can't have dangling references (say in 
unreachable blocks) and allocate new names.


The erroneous path optimization could run afoul of that requirement as 
it removed statements after an explicit erroneous statement, 
particularly in the code to adjust debug statements.  That resulted in a 
segfault after we tried to use a NULL TREE_TYPE of a released SSA_NAME.


I think I know how we can fix the name manager, but that's future work.

For the erroneous path optimizer, two changes avoid these problems. 
First on a local basis, when we remove statements we do so from the end 
of the block up to, but not including the builtin_trap.
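
A toy model of why the removal order matters, under the assumption that
each statement defines one name and may use a name defined by an earlier
statement in the same block.  The data structure and function names are
hypothetical and much simpler than GIMPLE.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STMTS 8

/* Hypothetical toy block: statement I defines name I and may use the
   name defined by one earlier statement (USE[I], or -1 for none).  */
struct block
{
  int n;                     /* Statements still in the block.  */
  int use[MAX_STMTS];
  bool released[MAX_STMTS];  /* True once name I has been released.  */
};

/* Remove the last statement and release its name.  Return false if the
   statement still refers to a name that was already released.  */
bool
remove_last (struct block *b)
{
  int i = b->n - 1;
  bool ok = b->use[i] < 0 || !b->released[b->use[i]];

  b->released[i] = true;
  b->n--;
  return ok;
}

/* Delete everything after statement KEEP, walking from the end of the
   block towards KEEP.  Return true if no dangling reference was seen.  */
bool
remove_trailing (struct block *b, int keep)
{
  bool ok = true;

  while (b->n - 1 > keep)
    ok = remove_last (b) && ok;
  return ok;
}

/* Would deleting the same statements front to back release a name that
   a later, not yet deleted statement still uses?  */
bool
forward_order_dangles (const struct block *b, int keep)
{
  for (int i = keep + 1; i < b->n; i++)
    for (int j = i + 1; j < b->n; j++)
      if (b->use[j] == i)
        return true;
  return false;
}
```

Because uses only point backwards in this model, walking from the end
always removes a use before the definition it refers to.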


Second, on a global basis, we first find all the implicit erroneous 
paths that have to be isolated.  That process creates SSA_NAMEs, but 
does not remove edges in the CFG (it redirects edges).  So it should be 
safe.


After we find all the implicit erroneous paths in the current function, 
we then look for any explicit erroneous statements in each block and 
optimize those (which can create unreachable code and dangling references).


Bootstrapped and regression tested on x86_64-unknown-linux-gnu. 
Installed on the trunk.


* PR middle-end/59119
* gimple-ssa-isolate-paths.c (find_implicit_erroneous_behaviour): New
function, extracted from gimple_ssa_isolate_erroneous_paths.
(find_explicit_erroneous_behaviour): Similarly.
(insert_trap_and_remove_trailing_statements): Remove statements
in reverse order.

* PR middle-end/59119
* gcc.c-torture/compile/pr59119.c: New test.

diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c
index 2d8f176..0051da2 100644
--- a/gcc/gimple-ssa-isolate-paths.c
+++ b/gcc/gimple-ssa-isolate-paths.c
@@ -100,14 +100,16 @@ insert_trap_and_remove_trailing_statements (gimple_stmt_iterator *si_p, tree op)
   else
 gsi_insert_before (si_p, seq, GSI_NEW_STMT);
 
-  /* The iterator points to the __builtin_trap.  Advance the iterator
- and delete everything else in the block.  */
-  gsi_next (si_p);
-  for (; !gsi_end_p (*si_p);)
+  /* We must remove statements from the end of the block so that we
+ never reference a released SSA_NAME.  */
+  basic_block bb = gimple_bb (gsi_stmt (*si_p));
+  for (gimple_stmt_iterator si = gsi_last_bb (bb);
+   gsi_stmt (si) != gsi_stmt (*si_p);
+   si = gsi_last_bb (bb))
 {
-  stmt = gsi_stmt (*si_p);
+  stmt = gsi_stmt (si);
   unlink_stmt_vdef (stmt);
-  gsi_remove (si_p, true);
+  gsi_remove (&si, true);
   release_defs (stmt);
 }
 }
@@ -192,40 +194,19 @@ isolate_path (basic_block bb, basic_block duplicate,
   return duplicate;
 }
 
-/* Search the function for statements which, if executed, would cause
-   the program to fault such as a dereference of a NULL pointer.
-
-   Such a program can't be valid if such a statement was to execute
-   according to ISO standards.
-
-   We detect explicit NULL pointer dereferences as well as those implied
-   by a PHI argument having a NULL value which unconditionally flows into
-   a dereference in the same block as the PHI.
-
-   In the former case we replace the offending statement with an
-   unconditional trap and eliminate the outgoing edges from the statement's
-   basic block.  This may expose secondary optimization opportunities.
-
-   In the latter case, we isolate the path(s) with the NULL PHI 
-   feeding the dereference.  We can then replace the offending statement
-   and eliminate the outgoing edges in the duplicate.  Again, this may
-   expose secondary optimization opportunities.
+/* Look for PHI nodes which feed statements in the same block where
+   the value of the PHI node implies the statement is erroneous.
 
-   A warning for both cases may be advisable as well.
+   For example, a NULL PHI arg value which then feeds a pointer
+   dereference.
 
-   Other statically detectable violations of the ISO standard could be
-   handled in a similar way, such as out-of-bounds array indexing.  */
-
-static unsigned int
-gimple_ssa_isolate_erroneous_paths (void)
+   When found isolate and optimize the path associated with the PHI
+   argument feeding the erroneous statement.  */
+static void
+find_implicit_erroneous_behaviour (void)
 {
   basic_block bb;
 
-  initialize_original_copy_tables ();
-
-  /* Search all the blocks for edges which, if traversed, will
- result in undefined behaviour.  */
-  cfg_altered = false;
   FOR_EACH_BB (bb)
 {
   gimple_stmt_iterator si;
@@ -288,6 +269,21 @@ gimple_ssa_isolate_erroneous_paths (void)
}
}
}
+}
+}
+
+/* Look for statements which exhibit erroneous behaviour.  For example
+   a NULL pointer dereference. 
+
+   When found, optimize the block containing the erroneous behaviour.  */
+static void
+find_explicit_erroneous_behaviour (void)
+{
+

Re: patch to implement thread coloring in IRA

2013-11-13 Thread Steven Bosscher

On Wed, Nov 13, 2013 at 6:56 PM, Vladimir Makarov wrote:
>   The following patch improves coloring.  The order of pushing allocnos on
> the coloring stack from a bunch of colorable allocnos has always been important
> for generated code performance.  LRA has a mechanism of allocating pseudos
> by threads.  A thread in LRA is a set of non-conflicting pseudos connected by
> moves (or by future reload insns).  Allocating pseudos by threads in LRA
> improves code by increasing the chance of removing the move insns.
>
>   So the same mechanism can be used for IRA.  The patch implements it.  The
> only difference is that LRA forms threads statically before the allocation
> sub-pass.  That is because the basic allocation is already done in IRA.
> Static thread forming works well for IRA too.  But even better
> results can be obtained by forming threads dynamically.  That means we
> form threads during allocation and include only colorable allocnos.
>
>   The results of using threads in IRA are pretty good.  The average code size
> (text segment) of SPEC2000 is improved (by >0.1% for x86 SPECFP2000 and >0.3%
> for x86-64 SPECFP2000).  The biggest code performance improvement (>1%)
> is obtained on x86-64 SPECFP2000.  Performance tools report that additional
> code takes only about 0.05% additionally executed insns.


Nice!

Can you please also update the leading comment in ira.c? It seems
worth mentioning this approach under the "Coloring" bullet
(ira.c:176).
(Not sure if that whole comment block is otherwise up to date??)

Ciao!
Steven
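
For readers unfamiliar with the idea, static thread forming as described in the quoted mail can be sketched roughly as follows. This is only an illustration under simplifying assumptions (at most 32 allocnos, conflicts stored as bitmasks, moves processed in decreasing frequency order); it is not the code from the patch, and every name in it (struct copy, struct threads, form_threads) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ALLOCNOS 32

/* A move insn between allocnos A and B, executed FREQ times (hypothetical).  */
struct copy { int a, b; int freq; };

struct threads {
  int n;                             /* number of allocnos */
  int leader[MAX_ALLOCNOS];          /* thread representative of each allocno */
  uint32_t members[MAX_ALLOCNOS];    /* per leader: bitmask of thread members */
  uint32_t conflicts[MAX_ALLOCNOS];  /* per leader: union of members' conflicts */
};

/* Start with every allocno in its own thread.  CONFLICT_MASKS[i] has bit j
   set when allocnos i and j conflict.  */
static void threads_init(struct threads *t, int n, const uint32_t *conflict_masks)
{
  assert(n > 0 && n <= MAX_ALLOCNOS);
  t->n = n;
  for (int i = 0; i < n; i++) {
    t->leader[i] = i;
    t->members[i] = UINT32_C(1) << i;
    t->conflicts[i] = conflict_masks[i];
  }
}

/* Sort copies by decreasing frequency, using comparisons, not subtraction.  */
static int cmp_copy_freq_desc(const void *pa, const void *pb)
{
  const struct copy *a = pa, *b = pb;
  return (a->freq < b->freq) - (a->freq > b->freq);
}

/* Merge threads connected by a move unless a member of one thread
   conflicts with a member of the other.  Returns the number of merges.  */
static int form_threads(struct threads *t, struct copy *copies, size_t ncopies)
{
  int merged = 0;
  qsort(copies, ncopies, sizeof *copies, cmp_copy_freq_desc);
  for (size_t i = 0; i < ncopies; i++) {
    int la = t->leader[copies[i].a];
    int lb = t->leader[copies[i].b];
    if (la == lb)
      continue;                      /* already in the same thread */
    if ((t->members[la] & t->conflicts[lb])
        || (t->members[lb] & t->conflicts[la]))
      continue;                      /* threads conflict: keep them apart */
    for (int j = 0; j < t->n; j++)   /* fold thread lb into thread la */
      if (t->leader[j] == lb)
        t->leader[j] = la;
    t->members[la] |= t->members[lb];
    t->conflicts[la] |= t->conflicts[lb];
    merged++;
  }
  return merged;
}
```

Allocnos that end up with the same leader would then be pushed and colored together, which is what raises the chance that the connecting moves become removable. The dynamic variant mentioned in the mail would run this kind of merging during allocation, restricted to colorable allocnos.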

Re: [PATCH] Fix failing assertion in calls.c:store_unaligned_arguments_into_pseudos

2013-11-13 Thread Jeff Law


On 11/11/13 12:10, Ulrich Weigand wrote:
> Jeff Law wrote:
>> On 11/11/13 07:32, Ulrich Weigand wrote:
>>> However, looking more closely, it seems store_unaligned_arguments_into_pseudos
>>> is not really useful for PARALLEL arguments in the first place.  What this
>>> routine does is load arguments into args[i].aligned_regs.  But if we have
>>> an argument where args[i].reg is a PARALLEL, args[i].aligned_regs will in
>>> fact never be used later on at all!   Instead, PARALLEL will always be
>>> handled directly via emit_group_move (in load_register_parameters), so
>>> the code generated by store_unaligned_arguments_into_pseudos for such cases
>>> is simply dead anyway.
>>
>> Does this work on the PA, particularly the 32bit ABI?
>>    /* Structures 5 to 8 bytes in size are passed in the general
>>    registers in the same manner as other non floating-point
>>    objects.  The data is right-justified and zero-extended
>>    to 64 bits.  This is opposite to the normal justification
>>    used on big endian targets and requires special treatment.
>>    We now define BLOCK_REG_PADDING to pad these objects.
>>    Aggregates, complex and vector types are passed in the same
>>    manner as structures.  */
>
> It seems this is exactly one of the cases I mention above, where current
> code generates code to load up "aligned registers" which are then never
> used again.

Agreed.  I built a cross compiler and managed to get some old grey
matter activated enough to read the resulting code.

> I'm not really set up to test PA, but I built a cross to hppa-linux,
> and compiled this simple test passing a 5-byte struct:

Few people are, I think the cross testing is fine.

> struct test { char x[5]; };
>
> struct test x;
> void func (struct test);
>
> void caller (void)
> {
>    func (x);
> }
>
> And indeed with my patch, I'm getting a lot less code at -O0.  However,
> building with -O2 both with and without my patch, the resulting assembler
> files are identical again -- which appears to confirm that this extra
> code is in fact dead ...

Just as importantly, if I'm reading the resulting code correctly, the
values we actually end up passing in the argument registers (%r26,%r25)
are the same before/after your patch.


Let me take a look at the actual patch again since it seems nobody is 
ready to go out on a limb and approve this stuff :-)


jeff
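
As a side note on the PA comment quoted above, the difference between right-justified and left-justified placement of a small aggregate can be illustrated in portable C. This is only a sketch of the two layouts for a 5-byte object viewed as a big-endian 64-bit value; it does not model the PA back end, BLOCK_REG_PADDING, or the %r26/%r25 register pair, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Right-justified, zero-extended: the bytes occupy the low-order end
   of the 64-bit value, as the quoted comment describes.  */
static uint64_t right_justify_be(const unsigned char *bytes, size_t n)
{
  uint64_t reg = 0;
  assert(n <= 8);
  for (size_t i = 0; i < n; i++)
    reg = (reg << 8) | bytes[i];
  return reg;
}

/* Left-justified: the bytes occupy the high-order end, which is what
   the comment calls the normal justification on big-endian targets.  */
static uint64_t left_justify_be(const unsigned char *bytes, size_t n)
{
  assert(n >= 1 && n <= 8);
  /* The shift count is at most 56, so it is always well defined.  */
  return right_justify_be(bytes, n) << (8 * (8 - n));
}
```

For the bytes 11 22 33 44 55 (hex), the first helper yields 0x0000001122334455 and the second 0x1122334455000000.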

Re: Revert libsanitizer patches or fix 59009

2013-11-13 Thread Peter Bergner

On Wed, 2013-11-13 at 18:29 +0100, Jakub Jelinek wrote:
> On Wed, Nov 13, 2013 at 11:25:06AM -0600, Peter Bergner wrote:
> > >   * sanitizer_common/sanitizer_platform_limits_linux.cc: Temporarily
> > >   ifdef out almost the whole source.
> > >   * sanitizer_common/sanitizer_common_syscalls.inc: Likewise.
> > 
> Ok, thanks.

Ok, committed as revision 204757.  Thanks.

Looking at the testsuite results, I'm seeing some new ASAN failures,
but they seem related to the new merge and not my patch, so I'll
open a new bugzilla entry for it.

Peter



[bergner@igoo asan]$ cat foo.i 
extern int printf(const char *format, ...);
void
Child (void)
{
  char x[32] = {0};
  printf ("Child:  \n", x);
}

int
main (void)
{
  return 0;
}
[bergner@igoo asan]$ /home/bergner/gcc/build/gcc-fsf-mainline-asan/gcc/xgcc \
    -B/home/bergner/gcc/build/gcc-fsf-mainline-asan/gcc/ -fsanitize=address \
    -O1 -m32 -c foo.i
/tmp/ccadY6mS.s: Assembler messages:
/tmp/ccadY6mS.s:136: Error: symbol `.LASANPC0' is already defined

[PATCH][1-3] New configure option to enable Position independent executable as default.

2013-11-13 Thread Magnus Granberg

Hi
This patchset adds a new configure option, --enable-default-pie.
When the option is enabled, the gcc and g++ front ends pass -fPIE and -pie
by default. I have only added support for two targets, but it should work on
more targets. In configure.ac we add the new option. We can't compile the
compiler or the crt files with -fPIE because it would break PCH and the
crtbegin and crtend files; this is disabled in the Makefiles. The needed spec
is added to DRIVER_SELF_SPECS. We disable all the profiling tests because
linking would fail. Tested on x86_64 linux (Gentoo).

/Magnus Granberg

ChangeLog

2013-11-10  Magnus Granberg  

/gcc
* config/gnu-user.h: Define PIE_DRIVER_SELF_SPECS for PIE 
as default and GNU_DRIVER_SELF_SPECS.
* config/i386/gnu-user-common.h: Define DRIVER_SELF_SPECS.
* configure.ac: Add new option that enables PIE as default.
* configure, config.in: Rebuild.
* Makefile.in: Disable PIE when building the compiler.
* doc/install.texi: Add the new configure option default PIE.
* doc/invoke.texi: Add note for the new configure option default PIE.
* testsuite/gcc/default-pie.c: New test for new configure option
--enable-default-pie.
* testsuite/gcc.dg/other/anon5.C: Skip the test as it fails to link
on effective_target default_pie.
* testsuite/lib/target-supports.exp (check_profiling_available):
We can't use profiling on effective target default_pie. 
(check_effective_target_pie): Add check_effective_target_default_pie.

/libgcc
* Makefile.in: Disable PIE when building the crtbegin/end files.


--- a/gcc/config/gnu-user.h	2013-08-20 10:31:40.0 +0200
+++ b/gcc/config/gnu-user.h	2013-10-23 22:01:42.337238981 +0200
@@ -134,3 +134,17 @@ see the files COPYING3 and COPYING.RUNTI
 /* Additional libraries needed by -static-libtsan.  */
 #undef STATIC_LIBTSAN_LIBS
 #define STATIC_LIBTSAN_LIBS "-ldl -lpthread"
+
+/* We use this to make the compiler use -fPIE as default and link
+   with -pie.  */
+#ifdef ENABLE_DEFAULT_PIE
+#define PIE_DRIVER_SELF_SPECS \
+"%{pie|fpic|fPIC|fpie|fPIE|fno-pic|fno-PIC|fno-pie|fno-PIE| \
+  shared|static|nostdlib|nostartfiles:;:-fPIE -pie}"
+#else
+#define PIE_DRIVER_SELF_SPECS ""
+#endif
+
+#ifndef GNU_DRIVER_SELF_SPECS
+#define GNU_DRIVER_SELF_SPECS PIE_DRIVER_SELF_SPECS
+#endif
--- a/gcc/config/i386/gnu-user-common.h	2013-01-10 21:38:27.0 +0100
+++ b/gcc/config/i386/gnu-user-common.h	2013-10-23 17:37:45.432767049 +0200
@@ -70,3 +70,8 @@ along with GCC; see the file COPYING3.
 
 /* Static stack checking is supported by means of probes.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
+
+/* Use GNU_DRIVER_SELF_SPECS.  */
+#ifndef DRIVER_SELF_SPECS
+#define DRIVER_SELF_SPECS GNU_DRIVER_SELF_SPECS
+#endif
--- a/gcc/configure.ac	2013-09-25 18:10:35.0 +0200
+++ b/gcc/configure.ac	2013-10-22 21:26:56.287602139 +0200
@@ -5434,6 +5434,31 @@ if test x"${LINKER_HASH_STYLE}" != x; th
  [The linker hash style])
 fi
 
+# Check whether --enable-default-pie was given and the target supports it.
+AC_ARG_ENABLE(default-pie,
+[AS_HELP_STRING([--enable-default-pie], [Enable position independent executables by default,
+ if supported when compiling and linking.
+ Supported Linux targets: i?86 and x86_64.])],
+enable_default_pie=$enableval,
+enable_default_pie=no)
+if test x$enable_default_pie = xyes; then
+  AC_MSG_CHECKING(if $target support to default with -fPIE and link with -pie as default)
+  enable_default_pie=no
+  case $target in
+i?86*-*-linux* | x86_64*-*-linux*)
+  enable_default_pie=yes
+  ;;
+*)
+  ;;
+esac
+  AC_MSG_RESULT($enable_default_pie)
+fi
+if test x$enable_default_pie = xyes ; then
+  AC_DEFINE(ENABLE_DEFAULT_PIE, 1,
+  [Define if your target supports default-pie and you have enabled it.])
+fi
+AC_SUBST([enable_default_pie])
+
 # Configure the subdirectories
 # AC_CONFIG_SUBDIRS($subdirs)
 
--- a/gcc/Makefile.in	2013-10-02 21:52:27.0 +0200
+++ b/gcc/Makefile.in	2013-10-24 17:46:22.055357122 +0200
@@ -957,14 +957,23 @@ CONTEXT_H = context.h
 # cross compiler which does not use the native headers and libraries.
 INTERNAL_CFLAGS = -DIN_GCC @CROSS@
 
+# We don't want to compile the compiler with -fPIE, it makes PCH fail.
+enable_default_pie = @enable_default_pie@
+ifeq ($(enable_default_pie),yes)
+NOPIE_CFLAGS = -fno-PIE
+else
+NOPIE_CFLAGS=
+endif
+
 # This is the variable actually used when we compile. If you change this,
 # you probably want to update BUILD_CFLAGS in configure.ac
-ALL_CFLAGS = $(T_CFLAGS) $(CFLAGS-$@) \
+ALL_CFLAGS = $(NOPIE_CFLAGS) $(T_CFLAGS) $(CFLAGS-$@) \
   $(CFLAGS) $(INTERNAL_CFLAGS) $(COVERAGE_FLAGS) $(WARN_CFLAGS) @DEFS@
 
 # The C++ version.
-ALL_CXXFLAGS = $(T_CFLAGS) $(CFLAGS-$@) $(CXXFLAGS) $(INTERNAL_CFLAGS) \
-  $(COVERAGE_FLAGS) $(NOEXCEPTION_FLAGS) $(WARN_CXXFLAGS)

Re: [patch 1/3] Create gimple-iterator.h and gimple-walk.[ch]

2013-11-13 Thread Jeff Law


On 11/13/13 07:46, Andrew MacLeod wrote:

This set of 3 patches creates gimple-iterator.h to hold the prototypes
for the existing gimple-iterator.c.  It also extracts the gimple-stmt
'walker' routines into their own file. I didn't originally intend to do
that, but I discovered that the include dependencies between statements,
iterators and walkers made the separation occur naturally, so went with it.

There were a couple of routines that got moved from .h to .c files.
tree-phinodes.h::set_phi_nodes()  and
gimple.h::gimple_seq_set_location() were moved into their .c files
because the inlined functions made use of statement iterators; moving
them into the .c files removes the need for gimple-iterator.h just to
parse the header file.  I do not believe either will have any
measurable impact.

patch 1 has the core changes.
patch 2 updates the core compiler files to include one or both .h files
as needed.
patch 3 updates a few other files.. 2 target config files and 2
testsuite plugin tests which require the new includes.

Note that at this moment, gimple.h still has "enum
gsi_iterator_update".  I can't move that out until I split gimplify.h
into the gimplify-be.h component mentioned in the previous patch set
(gimplify-be.h will require gimple-iterator.h and contain only
gimplification routines that the back end uses).  That'll be the next
patch set...  and then once that is out, I'll clean up the remaining
gimple.h prototypes and then see about flattening all the #includes in
gimple.h and gimplify.h

bootstraps on x86_64-unknown-linux-gnu with no new regressions.

I built stage 1 successfully for aarch64-elf, aarch64-linux-gnu,
powerpc-darwin8, powerpc-eabi, and  rs6000-ibm-aix6.0 to verify the
config changes compile for those targets.   I'm also currently building
the other targets I built when gimplify.h was split out just to make
sure I didn't miss anything there but I don't think there are any
dependencies in those targets.

OK for mainline?

All 3 patches in the series are fine.

Thanks,
jeff

[PATCH] Time profiler - phase 2

2013-11-13 Thread Martin Liška

Hello,
   this patch introduces a new function reordering feature that is based
on the patch I submitted a few days ago.

The idea of the new flag (-fprofile-reorder-functions) is to reorder
functions according to their first execution in the instrumented run. Of
course, the option operates only in LTO mode, where the compiler can
see all call graph nodes.

I've tested the patch on programs like Gimp, Inkscape, Firefox. There
are still some inaccuracies caused by weak symbols in C++, which are
under investigation.

Thank you,
Martin
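
A minimal sketch of the ordering idea, for illustration only: nodes are sorted by the time of their first execution in the training run. It is not the comparator from the patch below. The struct and function names are hypothetical, a first_run value of 0 is taken to mean "no time profile" (as the counting code in the patch suggests), and placing such nodes last is a choice made for this example. The comparator uses explicit comparisons instead of subtraction so it cannot overflow.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a call graph node with a time profile.  */
struct func_node {
  const char *name;
  int first_run;   /* order of first execution; 0 = not profiled */
};

/* qsort comparator over an array of pointers to nodes: earlier first
   run sorts first, unprofiled nodes sort last.  */
static int cmp_first_run(const void *pa, const void *pb)
{
  const struct func_node *a = *(const struct func_node *const *) pa;
  const struct func_node *b = *(const struct func_node *const *) pb;

  if ((a->first_run == 0) != (b->first_run == 0))
    return a->first_run == 0 ? 1 : -1;
  return (a->first_run > b->first_run) - (a->first_run < b->first_run);
}

/* Reorder ORDER (N node pointers) by first execution time.  */
static void order_by_first_run(struct func_node **order, size_t n)
{
  assert(order != NULL || n == 0);
  if (n > 1)
    qsort(order, n, sizeof *order, cmp_first_run);
}
```

Emitting functions in this order places code that runs early during startup close together in the text segment, which is the code placement effect the flag is after.
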
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c566a85..1562098 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,15 @@
+2013-11-13	Martin Liska	
+		Jan Hubicka  
+
+	* cgraphunit.c (node_cmp): New function.
+	(expand_all_functions): Function ordering added.
+	* common.opt: New profile based function reordering flag introduced.
+	* coverage.c (get_coverage_counts): Wrong profile handled.
+	* ipa.c (cgraph_externally_visible_p): New late flag introduced.
+	* lto-partition.c: Support for time profile added.
+	* lto.c: Likewise.
+	* value-prof.c: Histogram instrumentation switch added.
+
 2013-11-13  Vladimir Makarov  
 
 	PR rtl-optimization/59036
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 4765e6a..7cdd9a4 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1821,6 +1821,17 @@ expand_function (struct cgraph_node *node)
   ipa_remove_all_references (&node->ref_list);
 }
 
+/* Node comparer that is responsible for the order that corresponds
+   to time when a function was launched for the first time.  */
+
+static int
+node_cmp (const void *pa, const void *pb)
+{
+  const struct cgraph_node *a = *(const struct cgraph_node * const *) pa;
+  const struct cgraph_node *b = *(const struct cgraph_node * const *) pb;
+
+  return b->tp_first_run - a->tp_first_run;
+}
 
 /* Expand all functions that must be output.
 
@@ -1832,11 +1843,14 @@ expand_function (struct cgraph_node *node)
to use subsections to make the output functions appear in top-down
order).  */
 
+
 static void
 expand_all_functions (void)
 {
   struct cgraph_node *node;
   struct cgraph_node **order = XCNEWVEC (struct cgraph_node *, cgraph_n_nodes);
+
+  unsigned int expanded_func_count = 0, profiled_func_count = 0;
   int order_pos, new_order_pos = 0;
   int i;
 
@@ -1849,19 +1863,35 @@ expand_all_functions (void)
 if (order[i]->process)
   order[new_order_pos++] = order[i];
 
+  if (flag_profile_reorder_functions)
+qsort (order, new_order_pos, sizeof (struct cgraph_node *), node_cmp);
+
   for (i = new_order_pos - 1; i >= 0; i--)
 {
   node = order[i];
+
   if (node->process)
 	{
+ expanded_func_count++;
+ if(node->tp_first_run)
+   profiled_func_count++;
+
 	  node->process = 0;
 	  expand_function (node);
 	}
 }
+
+if (in_lto_p && dump_file)
+  fprintf (dump_file, "Expanded functions with time profile (%s):%u/%u\n",
+   main_input_filename, profiled_func_count, expanded_func_count);
+
+  if (cgraph_dump_file && flag_profile_reorder_functions && in_lto_p)
+fprintf (cgraph_dump_file, "Expanded functions with time profile:%u/%u\n",
+ profiled_func_count, expanded_func_count);
+
   cgraph_process_new_functions ();
 
   free (order);
-
 }
 
 /* This is used to sort the node types by the cgraph order number.  */
@@ -2186,6 +2216,7 @@ compile (void)
 #endif
 
   cgraph_state = CGRAPH_STATE_EXPANSION;
+
   if (!flag_toplevel_reorder)
 output_in_order ();
   else
diff --git a/gcc/common.opt b/gcc/common.opt
index d5971df..85d5c74 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1708,6 +1708,10 @@ fprofile-report
 Common Report Var(profile_report)
 Report on consistency of profile
 
+fprofile-reorder-functions
+Common Report Var(flag_profile_reorder_functions)
+Enable function reordering that improves code placement
+
 frandom-seed
 Common Var(common_deferred_options) Defer
 
diff --git a/gcc/ipa-profile.c b/gcc/ipa-profile.c
index 1260069..eff4b51 100644
--- a/gcc/ipa-profile.c
+++ b/gcc/ipa-profile.c
@@ -465,6 +465,7 @@ ipa_propagate_frequency (struct cgraph_node *node)
   if (d.maybe_unlikely_executed)
 {
   node->frequency = NODE_FREQUENCY_UNLIKELY_EXECUTED;
+  node->tp_first_run = 0;
   if (dump_file)
 	fprintf (dump_file, "Node %s promoted to unlikely executed.\n",
 		 cgraph_node_name (node));
diff --git a/gcc/ipa.c b/gcc/ipa.c
index a11b1c7..d92a332 100644
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -761,10 +761,14 @@ cgraph_externally_visible_p (struct cgraph_node *node,
  This improves code quality and we know we will duplicate them at most twice
  (in the case that we are not using plugin and link with object file
   implementing same COMDAT)  */
-  if ((in_lto_p || whole_program)
-  && DECL_COMDAT (node->decl)
-  && comdat_can_be_unshared_p (node))
-return false;
+  if ((in_lto_p || whole_program || profile_arc_flag)
+ && DECL_COMDAT (node->decl)
+ && comdat_can_be_unsh

Re: [PATCH, ia64] [PR target/57491] internal compiler error: in ia64_split_tmode -O2, quadmath

2013-11-13 Thread Steve Ellcey

On Tue, 2013-11-12 at 11:38 +0300, Kirill Yukhin wrote:
> Hello,
> 
> On 07 Nov 15:42, Kirill Yukhin wrote:
> > Could you pls take a look?
> 
> Ping?
> 
> --
> Thanks, K

Looks OK to me, go ahead and check it in.

Steve Ellcey
sell...@mips.com

Re: [PATCH] Fix PR ipa/58862 (overflow in edge_badness computation)

2013-11-13 Thread Jan Hubicka

> The following fixes PR ipa/58862, which caused failures in lto
> profiledbootstrap and in several spec cpu2006 profile-use builds.
> 
> Bootstrapped and tested on x86-64-unknown-linux-gnu. Also ensured that
> it fixed the lto profiledbootstrap and cpu2006 failures. Ok for trunk?
> 
> Thanks,
> Teresa
> 
> 2013-11-13  Teresa Johnson  
> 
> PR ipa/58862
> * ipa-inline.c (edge_badness): Fix overflow.

OK,
thanks!
Honza
> 
> Index: ipa-inline.c
> ===
> --- ipa-inline.c(revision 204703)
> +++ ipa-inline.c(working copy)
> @@ -909,7 +909,7 @@ edge_badness (struct cgraph_edge *edge, bool dump)
>/* Capping edge->count to max_count. edge->count can be larger than
>  max_count if an inline adds new edges which increase max_count
>  after max_count is computed.  */
> -  int edge_count = edge->count > max_count ? max_count : edge->count;
> +  gcov_type edge_count = edge->count > max_count ? max_count : 
> edge->count;
> 
>sreal_init (&relbenefit_real, relbenefit, 0);
>sreal_init (&growth_real, growth, 0);
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
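
To illustrate the class of bug fixed here: capping a 64-bit profile count and then storing the result in an int loses the value whenever the cap itself does not fit in int. The sketch below is not the GCC code; count_t is a stand-in for gcov_type (assumed here to be a 64-bit signed integer), int is assumed to be 32 bits wide, and the helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

typedef int64_t count_t;   /* stand-in for gcov_type (assumption) */

/* Cap COUNT at MAX_COUNT, keeping the full 64-bit width.  */
static count_t cap_count(count_t count, count_t max_count)
{
  assert(max_count >= 0);
  return count > max_count ? max_count : count;
}

/* Nonzero when V can be stored in an int without changing its value.  */
static int fits_in_int(count_t v)
{
  return v >= INT_MIN && v <= INT_MAX;
}
```

With a max_count of 5000000000, the capped value of an even larger edge count is still 5000000000, which fails fits_in_int on a 32-bit int. Keeping the result in the wide type, as the one-line change above does, avoids that truncation.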

Re: [PATCH] decide edge's hotness when there is profile info

2013-11-13 Thread Jan Hubicka

> Ok, that sounds good. Here is the new patch. Is this ok for trunk if
> testing (bootstrap regression and lto profiledbootstrap) succeeds?
> 
> Thanks,
> Teresa
> 
> 2013-11-13  Teresa Johnson  
> 
> * predict.c (drop_profile): Error is currently too strict.
> (handle_missing_profiles): Pass call_count to drop_profile.

OK, thanks

Honza
> 
> Index: predict.c
> ===
> --- predict.c   (revision 204704)
> +++ predict.c   (working copy)
> @@ -2766,12 +2766,17 @@ estimate_loops (void)
>  }
> 
>  /* Drop the profile for NODE to guessed, and update its frequency based on
> -   whether it is expected to be HOT.  */
> +   whether it is expected to be hot given the CALL_COUNT.  */
> 
>  static void
> -drop_profile (struct cgraph_node *node, bool hot)
> +drop_profile (struct cgraph_node *node, gcov_type call_count)
>  {
>struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +  /* In the case where this was called by another function with a
> + dropped profile, call_count will be 0. Since there are no
> + non-zero call counts to this function, we don't know for sure
> + whether it is hot, and therefore it will be marked normal below.  */
> +  bool hot = maybe_hot_count_p (NULL, call_count);
> 
>if (dump_file)
>  fprintf (dump_file,
> @@ -2781,8 +2786,13 @@ static void
>/* We only expect to miss profiles for functions that are reached
>   via non-zero call edges in cases where the function may have
>   been linked from another module or library (COMDATs and extern
> - templates). See the comments below for handle_missing_profiles.  */
> -  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
> + templates). See the comments below for handle_missing_profiles.
> + Also, only warn in cases where the missing counts exceed the
> + number of training runs. In certain cases with an execv followed
> + by a no-return call the profile for the no-return call is not
> + dumped and there can be a mismatch.  */
> +  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl)
> +  && call_count > profile_info->runs)
>  {
>if (flag_profile_correction)
>  {
> @@ -2792,8 +2802,8 @@ static void
>   cgraph_node_name (node), node->order);
>  }
>else
> -error ("Missing counts for called function %s/%i",
> -   cgraph_node_name (node), node->order);
> +warning (0, "Missing counts for called function %s/%i",
> + cgraph_node_name (node), node->order);
>  }
> 
>profile_status_for_function (fn)
> @@ -2839,9 +2849,7 @@ handle_missing_profiles (void)
>&& fn && fn->cfg
>&& (call_count * unlikely_count_fraction >= profile_info->runs))
>  {
> -  bool maybe_hot = maybe_hot_count_p (NULL, call_count);
> -
> -  drop_profile (node, maybe_hot);
> +  drop_profile (node, call_count);
>worklist.safe_push (node);
>  }
>  }
> @@ -2863,11 +2871,7 @@ handle_missing_profiles (void)
>if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>&& profile_status_for_function (fn) == PROFILE_READ)
>  {
> -  /* Since there are no non-0 call counts to this function,
> - we don't know for sure whether it is hot. Indicate to
> - the drop_profile routine that function should be marked
> - normal, rather than hot.  */
> -  drop_profile (node, false);
> +  drop_profile (node, 0);
>worklist.safe_push (callee);
>  }
>  }
> 
> >
> > Honza
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: XFAIL a couple of gnat.dg testcases on MIPS

2013-11-13 Thread H.J. Lu

On Wed, Nov 13, 2013 at 12:27 PM, H.J. Lu  wrote:
> On Sun, Oct 21, 2012 at 1:26 PM, Eric Botcazou  wrote:
>> They are reported as failing with the n32 ABI, but the failures are spurious.
>>
>> Tested on mips64el-linux-gnu, applied on the mainline and 4.7 branch.
>>
>>
>> 2012-10-21  Eric Botcazou  
>>
>> * gnat.dg/specs/atomic1.ads: XFAIL on MIPS.
>> * gnat.dg/specs/addr1.ads: Likewise.
>>
>
> They are also failing on x32, which is similar to n32:
>
> FAIL: gnat.dg/specs/addr1.ads  (test for bogus messages, line 24)
> FAIL: gnat.dg/specs/atomic1.ads  (test for errors, line 9)
> FAIL: gnat.dg/specs/atomic1.ads  (test for errors, line 13)
>

Here is a patch.  OK to install?

Thanks.


-- 
H.J.
2013-11-13  H.J. Lu  

	* gnat.dg/specs/addr1.ads: XFAIL on x32.
	* gnat.dg/specs/atomic1.ads: Likewise.

diff --git a/gcc/testsuite/gnat.dg/specs/addr1.ads b/gcc/testsuite/gnat.dg/specs/addr1.ads
index bcb833b..b357115 100644
--- a/gcc/testsuite/gnat.dg/specs/addr1.ads
+++ b/gcc/testsuite/gnat.dg/specs/addr1.ads
@@ -21,7 +21,7 @@ package Addr1 is
   for Obj1'Address use A'Address; -- { dg-bogus "(alignment|erroneous)" }
 
   Obj2: Rec2;
-  for Obj2'Address use A'Address; -- { dg-bogus "(alignment|erroneous)" "" { xfail mips*-*-* } }
+  for Obj2'Address use A'Address; -- { dg-bogus "(alignment|erroneous)" "" { xfail mips*-*-* { { i?86-*-* x86_64-*-* } && x32 } } }
 
   Obj3: Rec1;
   for Obj3'Address use A(1)'Address; -- { dg-bogus "(alignment|erroneous)" }
diff --git a/gcc/testsuite/gnat.dg/specs/atomic1.ads b/gcc/testsuite/gnat.dg/specs/atomic1.ads
index 02e98b6..2994f2a 100644
--- a/gcc/testsuite/gnat.dg/specs/atomic1.ads
+++ b/gcc/testsuite/gnat.dg/specs/atomic1.ads
@@ -6,11 +6,11 @@ package Atomic1 is
   type UA is access all Arr;
 
   U : UA;
-  pragma Atomic (U);  -- { dg-error "atomic access" "" { xfail mips*-*-* } }
+  pragma Atomic (U);  -- { dg-error "atomic access" "" { xfail mips*-*-* { { i?86-*-* x86_64-*-* } && x32 } } }
 
   type R is record
 U : UA;
-pragma Atomic (U);  -- { dg-error "atomic access" "" { xfail mips*-*-* } }
+pragma Atomic (U);  -- { dg-error "atomic access" "" { xfail mips*-*-* { { i?86-*-* x86_64-*-* } && x32 } } }
   end record;
 
 end Atomic1;
-- 
1.8.3.1

Re: Add __auto_type C extension, use it in <stdatomic.h>

2013-11-13 Thread Joseph S. Myers

On Wed, 13 Nov 2013, Basile Starynkevitch wrote:

> I have no idea, but does anyone know if other free compilers (notably
> Clang/LLVM) are adding a similar feature?

I looked at the list of Clang language extensions before adding this one 
and didn't see mention of anything similar as a C language extension.  
(Clang uses a different set of built-in functions for C11 atomics, so it's 
possible the motivation from <stdatomic.h> doesn't apply there.)

> And I also like that feature, but it should be documented outside of the
> support of <stdatomic.h> since it is genuinely useful by itself (e.g. as
> an alternative to typeof).

The patch includes documentation in extend.texi.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-13 Thread Bill Schmidt

Hi Yufeng,

The second version of your original patch is ok with me with the
following changes.  Sorry for the little side adventure into the
next-interp logic; in the end that's going to hurt more than it helps in
this case.  Thanks for having a look at it, anyway.  Thanks also for
cleaning up this version to be less intrusive to common interfaces; I
appreciate it.


>diff --git a/gcc/gimple-ssa-strength-reduction.c 
>b/gcc/gimple-ssa-strength-reduction.c
>index 88afc91..d069246 100644
>--- a/gcc/gimple-ssa-strength-reduction.c
>+++ b/gcc/gimple-ssa-strength-reduction.c
>@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "params.h"
> #include "hash-table.h"
> #include "tree-ssa-address.h"
>+#include "tree-affine.h"
> 
> /* Information about a strength reduction candidate.  Each statement
>in the candidate table represents an expression of one of the
>@@ -420,6 +421,42 @@ cand_chain_hasher::equal (const value_type *chain1, const 
>compare_type *chain2)
> /* Hash table embodying a mapping from base exprs to chains of candidates.  */
> static hash_table <cand_chain_hasher> base_cand_map;
> 
>+/* Pointer map used by tree_to_aff_combination_expand.  */
>+static struct pointer_map_t *name_expansions;
>+/* Pointer map embodying a mapping from bases to alternative bases.  */
>+static struct pointer_map_t *alt_base_map;
>+
>+/* Given BASE, use the tree affine combination facilities to
>+   find the underlying tree expression for BASE, with any
>+   immediate offset excluded.  */
>+
>+static tree
>+get_alternative_base (tree base)
>+{
>+  tree *result = (tree *) pointer_map_contains (alt_base_map, base);
>+
>+  if (result == NULL)
>+{
>+  tree expr;
>+  aff_tree aff;
>+
>+  tree_to_aff_combination_expand (base, TREE_TYPE (base),
>+&aff, &name_expansions);
>+  aff.offset = tree_to_double_int (integer_zero_node);
>+  expr = aff_combination_to_tree (&aff);
>+
>+  result = (tree *) pointer_map_insert (alt_base_map, base);
>+  gcc_assert (!*result);
>+
>+  if (expr == base)
>+  *result = NULL;
>+  else
>+  *result = expr;
>+}
>+
>+  return *result;
>+}
>+
> /* Look in the candidate table for a CAND_PHI that defines BASE and
>return it if found; otherwise return NULL.  */
> 
>@@ -439,9 +476,10 @@ find_phi_def (tree base)
>   return c->cand_num;
> }
> 
>-/* Helper routine for find_basis_for_candidate.  May be called twice:
>+/* Helper routine for find_basis_for_candidate.  May be called three times:
>once for the candidate's base expr, and optionally again for the
>-   candidate's phi definition.  */
>+   candidate's phi definition, as well as for an alternative base expr
>+   in the case of CAND_REF.  */

Technically this will never be called three times.  It will be called
once for the candidate's base expression, and optionally either for the
candidate's phi definition or for a CAND_REF's alternative base
expression.  (There is no phi processing for a CAND_REF.)

> 
> static slsr_cand_t
> find_basis_for_base_expr (slsr_cand_t c, tree base_expr)
>@@ -518,6 +556,13 @@ find_basis_for_candidate (slsr_cand_t c)
>   }
> }
> 
>+  if (!basis && c->kind == CAND_REF)
>+{
>+  tree alt_base_expr = get_alternative_base (c->base_expr);
>+  if (alt_base_expr)
>+  basis = find_basis_for_base_expr (c, alt_base_expr);
>+}
>+
>   if (basis)
> {
>   c->sibling = basis->dependent;
>@@ -528,17 +573,22 @@ find_basis_for_candidate (slsr_cand_t c)
>   return 0;
> }
> 
>-/* Record a mapping from the base expression of C to C itself, indicating that
>-   C may potentially serve as a basis using that base expression.  */
>+/* Record a mapping from BASE to C, indicating that C may potentially serve
>+   as a basis using that base expression.  BASE may be the same as
>+   C->BASE_EXPR; alternatively BASE can be a different tree that shares the
>+   underlying expression of C->BASE_EXPR.  */
> 
> static void
>-record_potential_basis (slsr_cand_t c)
>+record_potential_basis (slsr_cand_t c, tree base)
> {
>   cand_chain_t node;
>   cand_chain **slot;
> 
>+  if (base == NULL)
>+return;

Please do this check outside the common code; it's not necessary except
for CAND_REFs.  Replace with:

  gcc_assert (base);

>+
>   node = (cand_chain_t) obstack_alloc (&chain_obstack, sizeof (cand_chain));
>-  node->base_expr = c->base_expr;
>+  node->base_expr = base;
>   node->cand = c;
>   node->next = NULL;
>   slot = base_cand_map.find_slot (node, INSERT);
>@@ -554,10 +604,18 @@ record_potential_basis (slsr_cand_t c)
> }
> 
> /* Allocate storage for a new candidate and initialize its fields.
>-   Attempt to find a basis for the candidate.  */
>+   Attempt to find a basis for the candidate.
>+
>+   For CAND_REF, an alternative base may also be recorded and used
>+   to find a basis.  This helps cases where the expression hidden
>+   behind BASE (which is usually an SSA_NAME) has immediate offset,
>+   e.g.
>+
>+

Re: Add __auto_type C extension, use it in <stdatomic.h>

2013-11-13 Thread Basile Starynkevitch

On Wed, 2013-11-13 at 11:39 +0100, Richard Biener wrote:
> On Wed, Nov 13, 2013 at 1:39 AM, Joseph S. Myers
>  wrote:
> > <stdatomic.h> contains what C11 describes as "generic functions".
> > Although DR#419 makes clear that users cannot #undef these macros (or
> > otherwise suppress use of a macro definition) and expect to find an
> > underlying function, they still need to behave like functions as
> > regards evaluating their arguments exactly once (see C11 7.1.4).
> >
> > I noted when adding <stdatomic.h> to mainline that some of the macro
> > definitions there failed that requirement in the case where the
> > pointer argument had variably modified type, because then typeof
> > evaluates its argument and so that argument would be evaluated twice.
> > Avoiding such double evaluation requires defining the type of a
> > temporary variable, and initializing it with the pointer argument,
> > with a single evaluation.  To achieve this, this patch adds a new GNU
> > C extension __auto_type, essentially a restricted version of C++11
> > auto, and uses it in <stdatomic.h>.
> 
> I suppose you didn't use '__auto' because that's much more likely
> used elsewhere than '__auto_type'?

I have no idea, but does anyone know if other free compilers (notably
Clang/LLVM) are adding a similar feature?

If they do, perhaps (if it is not too painful) we should use the same
keyword (i.e. __auto_type) and a similar semantics.

And I also like that feature, but it should be documented outside of the
support of <stdatomic.h> since it is genuinely useful by itself (e.g. as
an alternative to typeof).

Regards.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***
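
To make the "evaluate the argument exactly once" requirement concrete, here is a small sketch of a function-like macro that uses __auto_type to declare a temporary and initialize it with a single evaluation of the argument. It assumes a compiler that supports both __auto_type and GNU statement expressions; the macro and function names are hypothetical, and this is not the macro code from the patch. The double evaluation it contrasts with comes from a naive macro, whereas the case described in the quoted mail arises from typeof applied to a variably modified type.

```c
#include <assert.h>

/* Naive macro: whichever argument is larger gets evaluated twice.  */
#define MAX_TWICE(a, b) ((a) > (b) ? (a) : (b))

/* Each argument is evaluated exactly once and captured in a
   temporary whose type is deduced from the initializer.  */
#define MAX_ONCE(a, b)              \
  ({ __auto_type _a = (a);          \
     __auto_type _b = (b);          \
     _a > _b ? _a : _b; })

static int evaluations;

/* An argument expression with a visible side effect.  */
static int next_value(void)
{
  evaluations++;
  return 42;
}

/* Number of times next_value() runs for one use of MAX_ONCE.  */
static int evaluations_with_auto_type(void)
{
  evaluations = 0;
  int m = MAX_ONCE(next_value(), 7);
  assert(m == 42);
  return evaluations;
}

/* Number of times next_value() runs for one use of MAX_TWICE.  */
static int evaluations_with_naive_macro(void)
{
  evaluations = 0;
  int m = MAX_TWICE(next_value(), 7);
  assert(m == 42);
  return evaluations;
}
```

The first helper reports one evaluation and the second reports two, which is the behavioural difference a "generic function" implemented as a macro has to avoid.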

Re: [PATCH, testsuite] Add lp64 to target requirements of new IRA shrink wrapping preparation testcases

2013-11-13 Thread H.J. Lu

On Wed, Nov 13, 2013 at 7:27 AM, Martin Jambor  wrote:
> Hi,
>
> the testcases I have added for IRA shrink wrapping preparation code
> were not intended for -m32 on x86_64 and should not be tested with it,
> thus I'm adding lp64 to the target requirements.
>
> Let me also briefly mention that I would like to make the testcases
> also run on ppc64 and therefore I did not put them into
> gcc.target/i386.
>
> I have tested the changes by running
>
> make -k check RUNTESTFLAGS="dg.exp=ira-shrinkwrap-prep-?.c
> --target_board=unix/\{,32\}" and
> RUNTESTFLAGS="dg.exp=pr10474.c --target_board=unix/\{,32\}"
>
> and examining the gcc.sum files.  Since this was recommended by Richi
> on IRC, I will commit the change in a few minutes.
>
> Thanks,
>
> Martin
>
>
> 2013-11-13  Martin Jambor  
>
> * testsuite/gcc.dg/ira-shrinkwrap-prep-1.c: Add lp64 to target
> requirements.
> * testsuite/gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.

Those 2 pass on x32.

> * testsuite/gcc.dg/pr10474.c: Likewise.
>

This fails on x32.  Should it pass on x32?


-- 
H.J.
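
As background for the x32 question: an lp64 target requirement is a
data-model check, so it excludes x32 (32-bit long and pointers on x86_64)
as well as -m32.  A minimal C sketch of that distinction follows; the
function names are hypothetical, and the exact check behind the
testsuite's lp64 keyword is assumed here rather than quoted.

```c
#include <assert.h>

/* LP64: 32-bit int, 64-bit long and pointers (e.g. x86_64 with -m64).  */
static int looks_like_lp64(void)
{
  return sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8;
}

/* ILP32: 32-bit int, long and pointers (e.g. -m32, and also x32).  */
static int looks_like_ilp32(void)
{
  return sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4;
}
```

A test restricted to lp64 is therefore reported as unsupported on x32
even if it would pass there.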

Re: XFAIL a couple of gnat.dg testcases on MIPS

2013-11-13 Thread H.J. Lu

On Sun, Oct 21, 2012 at 1:26 PM, Eric Botcazou  wrote:
> They are reported as failing with the n32 ABI, but the failures are spurious.
>
> Tested on mips64el-linux-gnu, applied on the mainline and 4.7 branch.
>
>
> 2012-10-21  Eric Botcazou  
>
> * gnat.dg/specs/atomic1.ads: XFAIL on MIPS.
> * gnat.dg/specs/addr1.ads: Likewise.
>

They are also failing on x32, which is similar to n32:

FAIL: gnat.dg/specs/addr1.ads  (test for bogus messages, line 24)
FAIL: gnat.dg/specs/atomic1.ads  (test for errors, line 9)
FAIL: gnat.dg/specs/atomic1.ads  (test for errors, line 13)


-- 
H.J.

Re: [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-13 Thread Bill Schmidt

Hi Yufeng,

On Wed, 2013-11-13 at 19:32 +, Yufeng Zhang wrote:
> Hi Bill,
> 
> On 11/13/13 18:04, Bill Schmidt wrote:
> > Hi Yufeng,
> >
> > On Tue, 2013-11-12 at 22:34 +, Yufeng Zhang wrote:
> >> Hi Bill,
> >>
> >> Many thanks for the review.
> >>
> >> I find your suggestion on using the next_interp field quite
> >> enlightening.  I prepared a patch which adds changes without modifying
> >> the framework.  With the patch, the slsr pass now tries to create a
> >> second candidate for each memory accessing gimple statement, and chain
> >> it to the first one via the next_interp field.
> >>
> >> There are two implications in this approach though:
> >>
> >> 1) For each memory accessing gimple statement, there can be two
> >> candidates, and these two candidates can be part of different dependency
> >> graphs respectively (based on different base expr).  Only one of the
> >> dependency graph should be traversed to do replace_refs.  Most of the
> >> changes in the patch is to handle this implication.
> >>
> >> I am aware that you suggest to follow the next-interp chain only when
> >> the searching fails for the first interpretation.  However, that doesn't
> >> work very well, as it can result in worse code-gen.  Taking a varied
> >> form of the added test slsr-41.c for example:
> >>
> >> i1:  a2 [i] [j] = 1;
> >> i2:  a2 [i] [j+1] = 2;
> >> i3:  a2 [i+20] [j] = i;
> >>
> >> With the 2nd interpretation created conditionally, the following two
> >> dependency chains will be established:
> >>
> >> i1 -->  i2  (base expr is an SSA_NAME defined as (a2 + i * 200))
> >> i1 -->  i3  (base expr is a tree expression of (a2 + i * 200))
> >
> > So it seems to me that really what needs to happen is to unify those two
> > base_exprs.  We don't currently have logic in this pass to look up an
> > SSA name based on {base, index, stride, cand_type}, but that could be
> > done with a hash table.  For now to save processing time it would make
> > sense to only do that for MEM candidates, though the cand_type should be
> > included in the hash to allow this to be used for other candidate types
> > if necessary.  Of course, the SSA name definition must dominate the
> > candidate to be eligible as a basis, and that should be checked, but
> > this should generally be the case.
> 
> I'm not quite sure if the SSA_NAME look-up works; maybe I haven't fully 
> understood what you suggest.
> 
> For i1 --> i3, the base_expr is the tree expression (a2 + i * 200), 
> which is the result of a sequence of operations (conversion to affine, 
> immediate offset removal and conversion to tree), with another SSA_NAME 
> as the input.  In other words, there are two SSA_NAMEs involved in the 
> example:
> 
>_s1: (a2 + i * 200).
>_s2: (a2 + (i * 200 + 4000))
> 
> their strides and indexes are different.
> 
> I guess what you suggest is that given the tree expression (a2 + i * 
> 200), look up an SSA_NAME and return _s1.  If that is the case, the 
> challenge will be how to analyze the tree expression and get the 
> information on its {base, index, stride, cand_type}.  While it would be 
> too specific and narrative to check for a POINTER_PLUS_EXPR expression, 
> the existing framework (e.g. create_add_ssa_cand) seems to assume that 
> the analyzed tree represent a genuine gimple statement.
> 
> Moreover, there may not be an SSA_NAME exists, for example in the 
> following case:
> 
>i1:  a2 [i+1] [j] = 1;
>i2:  a2 [i+1] [j+1] = 2;
>i3:  a2 [i+20] [j] = i;
> 
> you wouldn't be able to find an SSA_NAME for (a2 + i * 200).

Ok.  It is probably too much to hope for to get a sufficiently general
approach to handle all of these cases cleanly.

Bleah.  The whole preferred_ref_cand business seems very ad hoc to me,
and to some extent is closing the barn door after the cows have escaped.
Perhaps we can't use the next-interpretation infrastructure to solve
this problem ideally, in which case I apologize for leading you down
this path.  The alternate patch at least keeps the candidate tree in a
straightforward state, and the new version is less intrusive than the
original.

Let me look that version over more carefully and I'll get back to you.
Thanks for your patience.

Bill

> 
> [snip]
> > A couple of quick comments on the next_interp patch:
> >
> >   * You don't need num_of_dependents ().  You should be able to add a
> > forward declaration for count_candidates () and use it.
> 
> Missed count_candidates (); thanks!
> 
> >   * Your new test case is missing a final newline, so your patch doesn't
> > apply cleanly.
> 
> I'll fix it.
> 
> > Please look into unifying the base expressions, as I believe you should
> > not need the preferred_ref_cand logic if you do that.
> 
> I would also like to live without preferred_ref_cand if feasible . :)
> 
> > I still prefer the approach of using next_interp for its generality and
> > expandibility.
> 
> Sure; this approach indeed fit the framework better.
> 
> 
> Regards,
> Yufeng
>
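
The address arithmetic behind the base expressions in this exchange can
be checked with a small C sketch.  It assumes rows of 50 ints (a 200-byte
row stride where int is 4 bytes); the array arr and the helper
byte_offset are hypothetical and are not taken from slsr-41.c.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 64, COLS = 50 };   /* assumed shape: 50 ints per row */

static int arr[ROWS][COLS];

/* Byte offset of a2[row][col] from the start of the array.  */
static ptrdiff_t byte_offset(int (*a2)[COLS], int row, int col)
{
  return (char *)&a2[row][col] - (char *)a2;
}

/* Bytes in one row: 200 when int is 4 bytes wide.  */
static ptrdiff_t row_stride(void)
{
  return (ptrdiff_t)(COLS * sizeof(int));
}
```

With 4-byte ints, a2[i][j] and a2[i+20][j] differ by 20 * 200 = 4000
bytes, which is the constant separating (a2 + i * 200) from
(a2 + (i * 200 + 4000)).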

Re: [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-13 Thread Yufeng Zhang


Hi Bill,

On 11/13/13 18:04, Bill Schmidt wrote:

> Hi Yufeng,
>
> On Tue, 2013-11-12 at 22:34 +, Yufeng Zhang wrote:
>> Hi Bill,
>>
>> Many thanks for the review.
>>
>> I find your suggestion on using the next_interp field quite
>> enlightening.  I prepared a patch which adds changes without modifying
>> the framework.  With the patch, the slsr pass now tries to create a
>> second candidate for each memory accessing gimple statement, and chain
>> it to the first one via the next_interp field.
>>
>> There are two implications in this approach though:
>>
>> 1) For each memory accessing gimple statement, there can be two
>> candidates, and these two candidates can be part of different dependency
>> graphs respectively (based on different base expr).  Only one of the
>> dependency graph should be traversed to do replace_refs.  Most of the
>> changes in the patch is to handle this implication.
>>
>> I am aware that you suggest to follow the next-interp chain only when
>> the searching fails for the first interpretation.  However, that doesn't
>> work very well, as it can result in worse code-gen.  Taking a varied
>> form of the added test slsr-41.c for example:
>>
>> i1:  a2 [i] [j] = 1;
>> i2:  a2 [i] [j+1] = 2;
>> i3:  a2 [i+20] [j] = i;
>>
>> With the 2nd interpretation created conditionally, the following two
>> dependency chains will be established:
>>
>> i1 -->  i2  (base expr is an SSA_NAME defined as (a2 + i * 200))
>> i1 -->  i3  (base expr is a tree expression of (a2 + i * 200))
>
> So it seems to me that really what needs to happen is to unify those two
> base_exprs.  We don't currently have logic in this pass to look up an
> SSA name based on {base, index, stride, cand_type}, but that could be
> done with a hash table.  For now to save processing time it would make
> sense to only do that for MEM candidates, though the cand_type should be
> included in the hash to allow this to be used for other candidate types
> if necessary.  Of course, the SSA name definition must dominate the
> candidate to be eligible as a basis, and that should be checked, but
> this should generally be the case.

I'm not quite sure if the SSA_NAME look-up works; maybe I haven't fully
understood what you suggest.

For i1 --> i3, the base_expr is the tree expression (a2 + i * 200),
which is the result of a sequence of operations (conversion to affine,
immediate offset removal and conversion to tree), with another SSA_NAME
as the input.  In other words, there are two SSA_NAMEs involved in the
example:

  _s1: (a2 + i * 200).
  _s2: (a2 + (i * 200 + 4000))

their strides and indexes are different.

I guess what you suggest is that given the tree expression (a2 + i *
200), look up an SSA_NAME and return _s1.  If that is the case, the
challenge will be how to analyze the tree expression and get the
information on its {base, index, stride, cand_type}.  While it would be
too specific and narrative to check for a POINTER_PLUS_EXPR expression,
the existing framework (e.g. create_add_ssa_cand) seems to assume that
the analyzed tree represent a genuine gimple statement.

Moreover, there may not be an SSA_NAME exists, for example in the
following case:

  i1:  a2 [i+1] [j] = 1;
  i2:  a2 [i+1] [j+1] = 2;
  i3:  a2 [i+20] [j] = i;

you wouldn't be able to find an SSA_NAME for (a2 + i * 200).

[snip]
> A couple of quick comments on the next_interp patch:
>
>   * You don't need num_of_dependents ().  You should be able to add a
> forward declaration for count_candidates () and use it.

Missed count_candidates (); thanks!

>   * Your new test case is missing a final newline, so your patch doesn't
> apply cleanly.

I'll fix it.

> Please look into unifying the base expressions, as I believe you should
> not need the preferred_ref_cand logic if you do that.

I would also like to live without preferred_ref_cand if feasible . :)

> I still prefer the approach of using next_interp for its generality and
> expandibility.


Sure; this approach indeed fit the framework better.


Regards,
Yufeng

Re: [PATCH] Fix *anddi_2 (PR target/59101)

2013-11-13 Thread Uros Bizjak

On Wed, Nov 13, 2013 at 7:12 PM, Jakub Jelinek  wrote:
> On Wed, Nov 13, 2013 at 06:38:00PM +0100, Uros Bizjak wrote:
>> > --- gcc/config/i386/i386.md.jj  2013-11-12 11:31:31.0 +0100
>> > +++ gcc/config/i386/i386.md 2013-11-13 10:14:10.981609589 +0100
>> > @@ -7978,7 +7978,12 @@ (define_insn "*anddi_2"
>> >  (const_int 0)))
>> > (set (match_operand:DI 0 "nonimmediate_operand" "=r,r,rm")
>> > (and:DI (match_dup 1) (match_dup 2)))]
>> > -  "TARGET_64BIT && ix86_match_ccmode (insn, CCNOmode)
>> > +  "TARGET_64BIT
>> > +   && ix86_match_ccmode (insn, CONST_INT_P (operands[2])
>> > +  && INTVAL (operands[2]) > 0
>> > +  && (INTVAL (operands[2])
>> > +  & (HOST_WIDE_INT_1 << 31)) != 0
>> > +  ? CCZmode : CCNOmode)
>>
>> "Z" constraint, a.k.a x86_64_zext_immediate_operand won't allow
>> constants with high bits set.
>
> But x86_64_immediate_operand will.

Yes. I hadn't noticed "e" in operands[2].

>> It looks to me, we can use mode_signbit_p here, and simplify the expression 
>> to
>
> No, because mode_signbit_p tests if the constant is equal to the
> sign bit, while we need to test whether the sign bit is set.
> For & 0x80000000 it is the same thing, but for & 0x80000001 it is not.
>
>> ix86_match_ccmode (insn, mode_signbit_p (SImode, operands[2])
>> ? CCZmode : CCNOmode)
>
> Even if mode_signbit_p did something different, that would pessimize:
> long long
> foo (long long a)
> {
>   long long b = a & 0xfffffffff0000000LL;
>   if (b > 0)
> bar ();
>   return b;

Eh, it should read val_signbit_known_set_p.

> So what about this version?  If x86_64_zext_immediate_operand
> returns true for non-CONST_INT (CONST_DOUBLE won't happen for DImode
> operand with 64-bit HWI), it might be SYMBOL_REF/LABEL_REF etc. and
> it is IMHO better to just assume it might have bit 31 set.

Yes, this is correct.
>
> 2013-11-13  Jakub Jelinek  
>
> PR target/59101
> * config/i386/i386.md (*anddi_2): Only allow CCZmode if
> operands[2] is x86_64_zext_immediate_operand that might
> have bit 31 set.
>
> * gcc.c-torture/execute/pr59101.c: New test.
>
> --- gcc/config/i386/i386.md.jj  2013-11-13 18:32:48.586808734 +0100
> +++ gcc/config/i386/i386.md 2013-11-13 19:07:05.648122323 +0100
> @@ -7978,7 +7978,20 @@ (define_insn "*anddi_2"
>  (const_int 0)))
> (set (match_operand:DI 0 "nonimmediate_operand" "=r,r,rm")
> (and:DI (match_dup 1) (match_dup 2)))]
> -  "TARGET_64BIT && ix86_match_ccmode (insn, CCNOmode)
> +  "TARGET_64BIT
> +   && ix86_match_ccmode (insn,
> +/* If we are going to emit andl instead of andq,
> +   and the operands[2] constant might have the
> +   SImode sign bit set, make sure the sign flag isn't
> +   tested, because the instruction will set the sign
> +   flag based on bit 31 rather than bit 63.
> +   If it isn't CONST_INT, conservatively assume
> +   it might have bit 31 set.  */
> +(x86_64_zext_immediate_operand (operands[2], 
> VOIDmode)
> + && (!CONST_INT_P (operands[2])
> + || (INTVAL (operands[2])
> + & (HOST_WIDE_INT_1 << 31)) != 0))
> +? CCZmode : CCNOmode)

I'd suggest adding the following condition (together with your comment);
it is equivalent to the condition above, but IMO much more
descriptive:

Index: i386.md
===
--- i386.md (revision 204750)
+++ i386.md (working copy)
@@ -7978,7 +7978,12 @@
 (const_int 0)))
(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,rm")
(and:DI (match_dup 1) (match_dup 2)))]
-  "TARGET_64BIT && ix86_match_ccmode (insn, CCNOmode)
+  "TARGET_64BIT
+   && ix86_match_ccmode
+  (insn, (satisfies_constraint_Z (operands[2])
+&& (!CONST_INT_P (operands[2])
+  || val_signbit_known_set_p (SImode, INTVAL (operands[2]))))
+   ? CCZmode : CCNOmode)
&& ix86_binary_operator_ok (AND, DImode, operands)"
   "@
and{l}\t{%k2, %k0|%k0, %k2}

But I'll leave the decision to you.

Patch is OK everywhere, with or without the suggested change.

Thanks,
Uros.

[PATCH] fix PR sanitizer/58994 on darwin via correct linkage

2013-11-13 Thread Jack Howarth

Currently, the libasan shared library in FSF gcc is linked without the 
Foundation framework
on darwin...

% otool -L /sw/lib/gcc4.9/lib/libasan.1.dylib
/sw/lib/gcc4.9/lib/libasan.1.dylib:
/sw/lib/gcc4.9/lib/libasan.1.dylib (compatibility version 2.0.0, 
current version 2.0.0)
/sw/lib/gcc4.9/lib/libstdc++.6.dylib (compatibility version 7.0.0, 
current version 7.19.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
version 169.3.0)
/sw/lib/gcc4.9/lib/libgcc_s.1.dylib (compatibility version 1.0.0, 
current version 1.0.0)

compared to llvm svn...

% otool -L 
/sw/opt/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.asan_osx_dynamic.dylib
/sw/opt/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.asan_osx_dynamic.dylib:

/sw/opt/llvm-3.4/lib/clang/3.4/lib/darwin/libclang_rt.asan_osx_dynamic.dylib 
(compatibility version 0.0.0, current version 0.0.0)
/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation 
(compatibility version 300.0.0, current version 945.18.0)
/usr/lib/libstdc++.6.dylib (compatibility version 7.0.0, current 
version 56.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
version 169.3.0)

The attached patch adds the missing linkage of the Foundation framework to 
libasan_la_LDFLAGS
which produces the desired linkage of...

% otool -L /sw/lib/gcc4.9/lib/libasan.1.dylib
/sw/lib/gcc4.9/lib/libasan.1.dylib:
/sw/lib/gcc4.9/lib/libasan.1.dylib (compatibility version 2.0.0, 
current version 2.0.0)
/sw/lib/gcc4.9/lib/libstdc++.6.dylib (compatibility version 7.0.0, 
current version 7.19.0)
/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation 
(compatibility version 300.0.0, current version 945.18.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
version 169.3.0)
/sw/lib/gcc4.9/lib/libgcc_s.1.dylib (compatibility version 1.0.0, 
current version 1.0.0)

Bootstrap and regression tested on x86_64-apple-darwin12 against Xcode 5.0.1 at 
r204750. 

Native configuration is x86_64-apple-darwin12.5.0

=== g++ tests ===


Running target unix/-m32

=== g++ Summary for unix/-m32 ===

# of expected passes            481
# of unsupported tests          132

Running target unix/-m64

=== g++ Summary for unix/-m64 ===

# of expected passes            481
# of unsupported tests          132

=== g++ Summary ===

# of expected passes            962
# of unsupported tests          264
/sw/src/fink.build/gcc49-4.9.0-1000/darwin_objdir/gcc/testsuite/g++/../../xg++  
version 4.9.0 20131113 (experimental) (GCC) 

=== gcc tests ===


Running target unix/-m32

=== gcc Summary for unix/-m32 ===

# of expected passes            326
# of unsupported tests          101

Running target unix/-m64

=== gcc Summary for unix/-m64 ===

# of expected passes            326
# of unsupported tests          101

=== gcc Summary ===

# of expected passes            652
# of unsupported tests          202
/sw/src/fink.build/gcc49-4.9.0-1000/darwin_objdir/gcc/xgcc  version 4.9.0 
20131113 (experimental) (GCC) 

Compiler version: 4.9.0 20131113 (experimental) (GCC) 
Platform: x86_64-apple-darwin12.5.0
configure flags: --prefix=/sw --prefix=/sw/lib/gcc4.9 --mandir=/sw/share/man 
--infodir=/sw/lib/gcc4.9/info 
--enable-languages=c,c++,fortran,lto,objc,obj-c++,java --with-gmp=/sw 
--with-libiconv-prefix=/sw --with-isl=/sw --with-cloog=/sw --with-mpc=/sw 
--with-system-zlib --enable-checking=yes --x-includes=/usr/X11R6/include 
--x-libraries=/usr/X11R6/lib --program-suffix=-fsf-4.9

   This change is sufficient to suppress the failures in sanitizer/PR58994 at 
-m64. Okay for
gcc trunk after libsanitizer is synced with the alternative fix from

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-2013/194748.html

so that we are following the norms for linkage on darwin and adhere closer to 
the llvm
libsanitizer build approach for libasan?
   Jack
ps Note that "-Wl,-framework,Foundation" is used to prevent libtool from 
munging the flags.

libsanitizer/

2013-11-13  Jack Howarth  

PR sanitizer/58994
* asan/Makefile.am (libasan_la_LDFLAGS): Link to Foundation framework 
on darwin.
* asan/Makefile.in: Regenerate.


Index: libsanitizer/asan/Makefile.am
===
--- libsanitizer/asan/Makefile.am   (revision 204750)
+++ libsanitizer/asan/Makefile.am   (working copy)
@@ -43,7 +43,11 @@ libasan_la_LIBADD = $(top_builddir)/sani
 endif
 libasan_la_LIBADD += $(LIBSTDCXX_RAW_CXX_LDFLAGS)
 
+if USING_MAC_INTERPOSE
+libasan_la_LDFLAGS = -version-info `grep -v '^\#' $(srcdir)/libtool-version` 
-Wl,-framework,Foundation -lpthread -ldl
+else
 libasan_la_LDFLAGS = -version-info `grep -v '^\#' $(srcdir)/li

Re: Add __auto_type C extension, use it in

2013-11-13 Thread Mike Stump

On Nov 12, 2013, at 4:39 PM, Joseph S. Myers  wrote:
> with a single evaluation.  To achieve this, this patch adds a new GNU
> C extension __auto_type,

Nice, I like it.

Re: [PATCH] Fix *anddi_2 (PR target/59101)

2013-11-13 Thread Jakub Jelinek

On Wed, Nov 13, 2013 at 06:38:00PM +0100, Uros Bizjak wrote:
> > --- gcc/config/i386/i386.md.jj  2013-11-12 11:31:31.0 +0100
> > +++ gcc/config/i386/i386.md 2013-11-13 10:14:10.981609589 +0100
> > @@ -7978,7 +7978,12 @@ (define_insn "*anddi_2"
> >  (const_int 0)))
> > (set (match_operand:DI 0 "nonimmediate_operand" "=r,r,rm")
> > (and:DI (match_dup 1) (match_dup 2)))]
> > -  "TARGET_64BIT && ix86_match_ccmode (insn, CCNOmode)
> > +  "TARGET_64BIT
> > +   && ix86_match_ccmode (insn, CONST_INT_P (operands[2])
> > +  && INTVAL (operands[2]) > 0
> > +  && (INTVAL (operands[2])
> > +  & (HOST_WIDE_INT_1 << 31)) != 0
> > +  ? CCZmode : CCNOmode)
> 
> "Z" constraint, a.k.a x86_64_zext_immediate_operand won't allow
> constants with high bits set.

But x86_64_immediate_operand will.

> It looks to me, we can use mode_signbit_p here, and simplify the expression to

No, because mode_signbit_p tests if the constant is equal to the
sign bit, while we need to test whether the sign bit is set.
For & 0x80000000 it is the same thing, but for & 0x80000001 it is not.

> ix86_match_ccmode (insn, mode_signbit_p (SImode, operands[2])
> ? CCZmode : CCNOmode)

Even if mode_signbit_p did something different, that would pessimize:
long long
foo (long long a)
{
  long long b = a & 0xfffffffff0000000LL;
  if (b > 0)
bar ();
  return b;
}
because then we can use *anddi_2 just fine:
#(insn 7 25 8 2 (parallel [
#(set (reg:CCNO 17 flags)
#(compare:CCNO (and:DI (reg/v:DI 3 bx [orig:87 b ] [87])
#(const_int -268435456 [0xfffffffff0000000]))
#(const_int 0 [0])))
#(set (reg/v:DI 3 bx [orig:87 b ] [87])
#(and:DI (reg/v:DI 3 bx [orig:87 b ] [87])
#(const_int -268435456 [0xfffffffff0000000])))
#]) pr59101-3.c:7 392 {*anddi_2}
# (nil))
andq    $-268435456, %rbx   # 7 *anddi_2/2  [length = 7]
#(jump_insn 8 7 9 2 (set (pc)
#(if_then_else (le (reg:CCNO 17 flags)
#(const_int 0 [0]))
#(label_ref 12)
#(pc))) pr59101-3.c:7 607 {*jcc_1}
# (expr_list:REG_DEAD (reg:CCNO 17 flags)
#(int_list:REG_BR_PROB 3666 (nil)))
# -> 12)
jle .L2 # 8 *jcc_1  [length = 2]

as we aren't using the first alternative (andl), but another one.

> Please also add comment, this issue is quite non-obvious.

So what about this version?  If x86_64_zext_immediate_operand
returns true for non-CONST_INT (CONST_DOUBLE won't happen for DImode
operand with 64-bit HWI), it might be SYMBOL_REF/LABEL_REF etc. and
it is IMHO better to just assume it might have bit 31 set.

2013-11-13  Jakub Jelinek  

PR target/59101
* config/i386/i386.md (*anddi_2): Only allow CCZmode if
operands[2] is x86_64_zext_immediate_operand that might
have bit 31 set.

* gcc.c-torture/execute/pr59101.c: New test.

--- gcc/config/i386/i386.md.jj  2013-11-13 18:32:48.586808734 +0100
+++ gcc/config/i386/i386.md 2013-11-13 19:07:05.648122323 +0100
@@ -7978,7 +7978,20 @@ (define_insn "*anddi_2"
 (const_int 0)))
(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,rm")
(and:DI (match_dup 1) (match_dup 2)))]
-  "TARGET_64BIT && ix86_match_ccmode (insn, CCNOmode)
+  "TARGET_64BIT
+   && ix86_match_ccmode (insn,
+/* If we are going to emit andl instead of andq,
+   and the operands[2] constant might have the
+   SImode sign bit set, make sure the sign flag isn't
+   tested, because the instruction will set the sign
+   flag based on bit 31 rather than bit 63.
+   If it isn't CONST_INT, conservatively assume
+   it might have bit 31 set.  */
+(x86_64_zext_immediate_operand (operands[2], VOIDmode)
+ && (!CONST_INT_P (operands[2])
+ || (INTVAL (operands[2])
+ & (HOST_WIDE_INT_1 << 31)) != 0))
+? CCZmode : CCNOmode)
&& ix86_binary_operator_ok (AND, DImode, operands)"
   "@
and{l}\t{%k2, %k0|%k0, %k2}
--- gcc/testsuite/gcc.c-torture/execute/pr59101.c.jj2013-11-13 
18:49:04.922736108 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr59101.c   2013-11-13 
18:49:04.922736108 +0100
@@ -0,0 +1,15 @@
+/* PR target/59101 */
+
+__attribute__((noinline, noclone)) int
+foo (int a)
+{
+  return (~a & 4102790424LL) > 0 | 6;
+}
+
+int
+main ()
+{
+  if (foo (0) != 7)
+__builtin_abort ();
+  return 0;
+}


Jakub
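
The difference between "equal to the sign bit" and "sign bit set", and
the reason a 32-bit andl can disagree with a 64-bit andq about the sign
flag, can be sketched in plain C.  The helper names below are
hypothetical and only model the flag; they are not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

#define SI_SIGNBIT UINT64_C(0x80000000)   /* bit 31, the SImode sign bit */

/* Models a test for "the constant equals the sign bit".  */
static int equals_si_signbit(uint64_t v)
{
  return v == SI_SIGNBIT;
}

/* Models a test for "the sign bit is set", which is what is needed.  */
static int si_signbit_set(uint64_t v)
{
  return (v & SI_SIGNBIT) != 0;
}

/* Sign flag after a 64-bit AND: bit 63 of the result.  */
static int sf_after_andq(uint64_t a, uint64_t imm)
{
  return (int)(((a & imm) >> 63) & 1);
}

/* Sign flag after a 32-bit AND: bit 31 of the result.  */
static int sf_after_andl(uint64_t a, uint64_t imm)
{
  return (int)(((a & imm) >> 31) & 1);
}
```

For the constant in the new test, 4102790424, bit 31 is set and bit 63
is clear, so a signed "> 0" decision based on the andl flags would treat
a positive 64-bit result as negative.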

Re: [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-11-13 Thread Bill Schmidt

Hi Yufeng,

On Tue, 2013-11-12 at 22:34 +, Yufeng Zhang wrote:
> Hi Bill,
> 
> Many thanks for the review.
> 
> I find your suggestion on using the next_interp field quite 
> enlightening.  I prepared a patch which adds changes without modifying 
> the framework.  With the patch, the slsr pass now tries to create a 
> second candidate for each memory accessing gimple statement, and chain 
> it to the first one via the next_interp field.
> 
> There are two implications in this approach though:
> 
> 1) For each memory accessing gimple statement, there can be two 
> candidates, and these two candidates can be part of different dependency 
> graphs respectively (based on different base expr).  Only one of the 
> dependency graph should be traversed to do replace_refs.  Most of the 
> changes in the patch is to handle this implication.
> 
> I am aware that you suggest to follow the next-interp chain only when 
> the searching fails for the first interpretation.  However, that doesn't 
> work very well, as it can result in worse code-gen.  Taking a varied 
> form of the added test slsr-41.c for example:
> 
> i1:  a2 [i] [j] = 1;
> i2:  a2 [i] [j+1] = 2;
> i3:  a2 [i+20] [j] = i;
> 
> With the 2nd interpretation created conditionally, the following two 
> dependency chains will be established:
> 
>i1 --> i2  (base expr is an SSA_NAME defined as (a2 + i * 200))
>i1 --> i3  (base expr is a tree expression of (a2 + i * 200))

So it seems to me that really what needs to happen is to unify those two
base_exprs.  We don't currently have logic in this pass to look up an
SSA name based on {base, index, stride, cand_type}, but that could be
done with a hash table.  For now to save processing time it would make
sense to only do that for MEM candidates, though the cand_type should be
included in the hash to allow this to be used for other candidate types
if necessary.  Of course, the SSA name definition must dominate the
candidate to be eligible as a basis, and that should be checked, but
this should generally be the case.

The goal should be for all of these references to have the same base
expr so that i3 can choose either i1 or i2 as a basis.  (For now the
logic in the pass chooses the most dominating basis, but eventually I
would like to add heuristics to make better choices.)

If all three of these use the same base expr, that should eliminate your
concerns, right?

> 
> the result is that three gimples will be lowered to MEM_REFs differently 
> (as the candidates have different base_exprs); the later passes can get 
> confused, generating worse code.
> 
> What this patch does is to create two interpretations where possible (if 
> different base exprs exist); the following dependency chains will be 
> produced:
> 
>i1 --> i2  (base expr is an SSA_NAME defined as (a2 + i * 200))
>i1 --> i2 --> i3  (base expr is a tree expression of (a2 + i * 200))
> 
> In analyze_candidates_and_replace, a new function preferred_ref_cand is 
> called to analyze a root CAND_REF and replace_refs is only called if 
> this root CAND_REF is found to be part of a larger dependency graph (or 
> longer dependency chain in simple cases).  In the example above, the 2nd 
> dependency chain will be picked up to do replace_refs.
> 
> 2) The 2nd implication is that the alternative candidate may expose the 
> underlying tree expression of a base expr, which can cause more 
> aggressive extraction and folding of immediate offsets.  Taking the new 
> test slsr-41 for example, the code-gen difference on x86_64 with the 
> original patch and this patch is (-O2):
> 
> -   leal    5(%rsi), %edx
> +   leal    5(%rsi), %eax
>  movslq  %esi, %rsi
> -   salq    $2, %rsi
> -   movslq  %edx, %rax
> -   leaq    (%rax,%rax,4), %rax
> -   leaq    (%rax,%rax,4), %rcx
> -   salq    $3, %rcx
> -   leaq    (%rdi,%rcx), %rax
> -   addq    %rsi, %rax
> -   movl    $2, -1980(%rax)
> -   movl    %edx, 20(%rax)
> -   movl    %edx, 4024(%rax)
> -   leaq    -600(%rdi,%rcx), %rax
> -   addl    $1, 16(%rsi,%rax)
> +   imulq   $204, %rsi, %rsi
> +   addq    %rsi, %rdi
> +   movl    $2, -980(%rdi)
> +   movl    %eax, 1020(%rdi)
> +   movl    %eax, 5024(%rdi)
> +   addl    $1, 416(%rdi)
>  ret
> 
> As you can see, the larger offsets are produced as the affine expander 
> is able to look deep into the tree expression.  This raises concern that 
> larger immediates can cause worse code-gen when the immediates are out 
> of the supported range on a target.  On x86_64 it is not obvious (as it 
> allows larger ranges), on arm cortex-a15 the load with the immediate 
> 5024 will be done by
> 
>  movw    r2, #5024
>  str     r3, [r0, r2]
> 
> which is not optimal.  Things can get worse when there are multiple 
> loads/stores with large immediates as each one may require an extra mov 
> immediate instruction.  One thing can potentially be done is to reduce 
> th

patch to implement thread coloring in IRA

2013-11-13 Thread Vladimir Makarov

  The following patch improves coloring.  The order in which allocnos
from a set of colorable allocnos are pushed onto the coloring stack has
always been important for the performance of the generated code.  LRA
has a mechanism for allocating pseudos by threads.  A thread in LRA is a
set of non-conflicting pseudos connected by moves (or by future reload
insns).  Allocating pseudos by threads in LRA improves the code by
increasing the chance of removing the move insns.


  The same mechanism can be used for IRA, and the patch implements it.
The only difference is that LRA forms threads statically, before its
allocation sub-pass, because the basic allocation has already been done
in IRA.  Static thread forming works well for IRA too, but even better
results can be obtained by forming threads dynamically, that is, by
forming them during allocation and including only colorable allocnos.


  The results of using threads in IRA are pretty good.  The average code
size (text segment) of SPEC2000 improves (by > 0.1% for x86 SPECFP2000
and > 0.3% for x86-64 SPECFP2000).  The biggest performance
improvement (> 1%) is obtained on x86-64 SPECFP2000.  Performance tools
report that the additional code accounts for only about 0.05% more
executed insns.
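
A much simplified sketch of the thread idea described above: allocnos
joined by a copy are merged into one thread unless some members of the
two threads conflict.  Everything here is hypothetical (four allocnos,
a hand-written conflict matrix, the names nodes, try_merge and so on);
it only mirrors the first/next/frequency bookkeeping and is not the
code in ira-color.c.

```c
#include <assert.h>
#include <stdbool.h>

enum { N = 4 };                  /* hypothetical: four allocnos, 0..3 */

struct node {
  int first;        /* representative of the thread this allocno is in */
  int next;         /* next allocno in the thread's circular list */
  int thread_freq;  /* summed frequency, valid for the representative */
};

static struct node nodes[N];
static bool conflict[N][N];      /* made-up live-range conflicts */

/* Start with every allocno in a thread of its own.  */
static void init_threads(const int freq[N])
{
  for (int i = 0; i < N; i++) {
    nodes[i].first = i;
    nodes[i].next = i;
    nodes[i].thread_freq = freq[i];
  }
}

/* True if any member of thread t1 conflicts with any member of t2.  */
static bool threads_conflict(int t1, int t2)
{
  int a = t1;
  do {
    int b = t2;
    do {
      if (conflict[a][b])
        return true;
      b = nodes[b].next;
    } while (b != t2);
    a = nodes[a].next;
  } while (a != t1);
  return false;
}

/* Merge the threads of the two ends of a copy, if that is allowed.  */
static bool try_merge(int x, int y)
{
  int t1 = nodes[x].first, t2 = nodes[y].first;
  if (t1 == t2 || threads_conflict(t1, t2))
    return false;
  int last = t2, a = t2;
  do {                           /* relabel t2's members, find its tail */
    nodes[a].first = t1;
    last = a;
    a = nodes[a].next;
  } while (a != t2);
  nodes[last].next = nodes[t1].next;   /* splice the two circular lists */
  nodes[t1].next = t2;
  nodes[t1].thread_freq += nodes[t2].thread_freq;
  return true;
}
```

Processing the copies in decreasing order of frequency, as the sorted
copies array in the patch suggests, would let the hottest moves claim
thread membership first.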


  The patch also removes 2 insns from the code generated for PR59036

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59036

  Therefore I put the PR in the ChangeLog.

  The patch was successfully bootstrapped and tested on x86/x86-64.

  Committed as rev. 204752.

2013-11-13  Vladimir Makarov  

PR rtl-optimization/59036
* ira-color.c (struct allocno_color_data): Add new members
first_thread_allocno, next_thread_allocno, thread_freq.
(sorted_copies): New static var.
(allocnos_conflict_by_live_ranges_p, copy_freq_compare_func): Move
up.
(allocno_thread_conflict_p, merge_threads)
(form_threads_from_copies, form_threads_from_bucket)
(form_threads_from_colorable_allocno, init_allocno_threads): New
functions.
(bucket_allocno_compare_func): Add comparison by thread frequency
and threads.
(add_allocno_to_ordered_bucket): Rename to
add_allocno_to_ordered_colorable_bucket.  Remove parameter.
(push_only_colorable): Call form_threads_from_bucket.
(color_pass): Call init_allocno_threads.  Use
consideration_allocno_bitmap instead of coloring_allocno_bitmap
for nuillify allocno color data.
(ira_initiate_assign, ira_finish_assign): Allocate/free
sorted_copies.
(coalesce_allocnos): Use static sorted copies.


Index: ira-color.c
===
--- ira-color.c (revision 204594)
+++ ira-color.c (working copy)
@@ -142,6 +142,15 @@ struct allocno_color_data
  used to restore original hard reg costs of allocnos connected to
  this allocno by copies.  */
   struct update_cost_record *update_cost_records;
+  /* Threads.  We collect allocnos connected by copies into threads
+ and try to assign hard regs to allocnos by threads.  */
+  /* Allocno representing all thread.  */
+  ira_allocno_t first_thread_allocno;
+  /* Allocnos in thread forms a cycle list through the following
+ member.  */
+  ira_allocno_t next_thread_allocno;
+  /* All thread frequency.  Defined only for first thread allocno.  */
+  int thread_freq;
 };
 
 /* See above.  */
@@ -1863,6 +1872,250 @@ assign_hard_reg (ira_allocno_t a, bool r
 
 
 
+/* An array used to sort copies.  */
+static ira_copy_t *sorted_copies;
+
+/* Return TRUE if live ranges of allocnos A1 and A2 intersect.  It is
+   used to find a conflict for new allocnos or allocnos with the
+   different allocno classes.  */
+static bool
+allocnos_conflict_by_live_ranges_p (ira_allocno_t a1, ira_allocno_t a2)
+{
+  rtx reg1, reg2;
+  int i, j;
+  int n1 = ALLOCNO_NUM_OBJECTS (a1);
+  int n2 = ALLOCNO_NUM_OBJECTS (a2);
+
+  if (a1 == a2)
+return false;
+  reg1 = regno_reg_rtx[ALLOCNO_REGNO (a1)];
+  reg2 = regno_reg_rtx[ALLOCNO_REGNO (a2)];
+  if (reg1 != NULL && reg2 != NULL
+  && ORIGINAL_REGNO (reg1) == ORIGINAL_REGNO (reg2))
+return false;
+
+  for (i = 0; i < n1; i++)
+{
+  ira_object_t c1 = ALLOCNO_OBJECT (a1, i);
+
+  for (j = 0; j < n2; j++)
+   {
+ ira_object_t c2 = ALLOCNO_OBJECT (a2, j);
+
+ if (ira_live_ranges_intersect_p (OBJECT_LIVE_RANGES (c1),
+  OBJECT_LIVE_RANGES (c2)))
+   return true;
+   }
+}
+  return false;
+}
+
+/* The function is used to sort copies according to their execution
+   frequencies.  */
+static int
+copy_freq_compare_func (const void *v1p, const void *v2p)
+{
+  ira_copy_t cp1 = *(const ira_copy_t *) v1p, cp2 = *(const ira_copy_t *) v2p;
+  int pri1, pri2;
+
+  pri1 = cp1->freq;
+  pri2 = cp2->freq;
+  if (pri2 - pri1)
+return pri2 - pri1;
+
+  /* If frequencies are equal, sort by copies, so that the results of
+
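
For readers following the thread idea above (allocnos linked by copies are collected into a cyclic list with one representative that carries the total frequency), here is a minimal standalone sketch in plain C.  It is not the code from the patch: struct node, init_thread, merge_threads and thread_length below are hypothetical stand-ins that only model the three fields added to allocno_color_data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an allocno; only the three thread fields
   from the patch are modelled.  */
struct node
{
  struct node *first_thread;  /* Representative of the whole thread.  */
  struct node *next_thread;   /* Next member of the cyclic list.  */
  int thread_freq;            /* Meaningful only on the representative.  */
};

/* Make N a thread of its own with frequency FREQ.  */
static void
init_thread (struct node *n, int freq)
{
  assert (n != NULL);
  n->first_thread = n;
  n->next_thread = n;
  n->thread_freq = freq;
}

/* Merge the thread containing N2 into the thread containing N1.  */
static void
merge_threads (struct node *n1, struct node *n2)
{
  struct node *t1 = n1->first_thread;
  struct node *t2 = n2->first_thread;
  struct node *a, *last;

  if (t1 == t2)
    return;
  /* Point every member of T2's cycle at the new representative and
     remember the member that precedes T2 in the cycle.  */
  for (last = t2, a = t2->next_thread;; a = a->next_thread)
    {
      a->first_thread = t1;
      if (a == t2)
        break;
      last = a;
    }
  /* Splice T2's cycle in right after T1.  */
  last->next_thread = t1->next_thread;
  t1->next_thread = t2;
  t1->thread_freq += t2->thread_freq;
}

/* Count the members of the thread that N belongs to.  */
static int
thread_length (const struct node *n)
{
  const struct node *a;
  int len = 1;

  for (a = n->next_thread; a != n; a = a->next_thread)
    len++;
  return len;
}
```

Merging is constant-time apart from the walk that re-points first_thread, which is presumably why a cyclic list with a representative is a convenient shape here.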

Re: Revert libsanitizer patches or fix 59009

2013-11-13 Thread Jakub Jelinek

On Wed, Nov 13, 2013 at 11:25:06AM -0600, Peter Bergner wrote:
> On Wed, 2013-11-13 at 00:49 +0100, Jakub Jelinek wrote:
> > 2013-11-12  Jakub Jelinek  
> > 
> > * sanitizer_common/sanitizer_platform_limits_linux.cc: Temporarily
> > ifdef out almost the whole source.
> > * sanitizer_common/sanitizer_common_syscalls.inc: Likewise.
> 
> That helps, but as Pat reported in the bugzilla, it still is failing.
> With the following patch, we can now bootstrap on powerpc64-linux.
> 
> Is this ok for trunk?
> 
> Does this help the other architectures that are failing for the same
> build error?

Ok, thanks.

>   PR sanitizer/59009
>   * sanitizer_common/sanitizer_platform_limits_posix.cc: Temporarily
>   ifdef out more source.
> 
> Index: libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc
> ===
> --- libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc (revision 204747)
> +++ libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc (working copy)
> @@ -855,6 +855,7 @@ CHECK_STRUCT_SIZE_AND_OFFSET(sigaction, 
>  CHECK_STRUCT_SIZE_AND_OFFSET(sigaction, sa_restorer);
>  #endif
>  
> +#ifdef SYSCALL_INTERCEPTION
>  #if SANITIZER_LINUX
>  CHECK_TYPE_SIZE(__sysctl_args);
>  CHECK_SIZE_AND_OFFSET(__sysctl_args, name);
> @@ -872,6 +873,7 @@ CHECK_TYPE_SIZE(__kernel_off_t);
>  CHECK_TYPE_SIZE(__kernel_loff_t);
>  CHECK_TYPE_SIZE(__kernel_fd_set);
>  #endif
> +#endif
>  
>  #if !SANITIZER_ANDROID
>  CHECK_TYPE_SIZE(wordexp_t);
> 
> 

Jakub

Re: [PATCH] Generate a label for the split cold function while using -freorder-blocks-and-partition

2013-11-13 Thread Sriraman Tallam

On Tue, Nov 12, 2013 at 10:10 AM, Cary Coutant  wrote:
>>>> Is there a format for compiler-defined labels that would not be able
>>>> to clash with other user-generated labels?
>>>
>>> My understanding is that the "." in the generated name ensures that it
>>> will not clash with user-generated labels which cannot contain ".". So
>>> this should not be an issue.
>>
>> Is that true for all languages? I don't think we can make that
>> assumption. Also, there may be clashes with other compiler generated
>> symbols. I know of at least one Fortran compiler that composes names
>> for module symbols as "modulename.symbolname", and if symbolname is
>> "cold" then you can have a clash.
>
> GCC already uses "." to generate names for cloned functions, and picks
> a different separator for targets where "." is legal (see
> clone_function_name in cgraphclones.c). It would probably be a good
> idea to use clone_function_name to generate the new name (in
> particular, because the C++ demangler already knows how to handle that
> style when it's applied to mangled names).

I have modified my original patch to use clone_function_name to
generate the cold part label and attached it.

Thanks
Sri


>
> -cary
Patch to generate labels for cold function parts that are split when
using the option -freorder-blocks-and-partition.  The cold label name
is generated by suffixing "cold" to the assembler name of the hot
function in a demangler friendly way.

This is useful when getting back traces from gdb when the cold function
part does get executed. 

* final.c (final_scan_insn): Generate cold label name by suffixing
"cold" to function's name.
* gcc.dg/tree-prof/cold_partition_label.c: New test.

Index: final.c
===
--- final.c (revision 204712)
+++ final.c (working copy)
@@ -2167,6 +2167,15 @@ final_scan_insn (rtx insn, FILE *file, int optimiz
  targetm.asm_out.function_switched_text_sections (asm_out_file,
   current_function_decl,
   in_cold_section_p);
+ /* Emit a label for the split cold section.  Form label name by
+suffixing "cold" to the original function's name.  */
+ if (in_cold_section_p)
+   {
+ tree cold_function_name
+   = clone_function_name (current_function_decl, "cold");
+ ASM_OUTPUT_LABEL (asm_out_file,
+   IDENTIFIER_POINTER (cold_function_name));
+   }
  break;
 
case NOTE_INSN_BASIC_BLOCK:
Index: testsuite/gcc.dg/tree-prof/cold_partition_label.c
===
--- testsuite/gcc.dg/tree-prof/cold_partition_label.c   (revision 0)
+++ testsuite/gcc.dg/tree-prof/cold_partition_label.c   (revision 0)
@@ -0,0 +1,39 @@
+/* Test case to check if function foo gets split and the cold function
+   gets a label.  */
+/* { dg-require-effective-target freorder } */
+/* { dg-options "-O2 -freorder-blocks-and-partition --save-temps" } */
+
+#define SIZE 1
+
+const char *sarr[SIZE];
+const char *buf_hot;
+const char *buf_cold;
+
+__attribute__((noinline))
+void 
+foo (int path)
+{
+  int i;
+  if (path)
+{
+  for (i = 0; i < SIZE; i++)
+   sarr[i] = buf_hot;
+}
+  else
+{
+  for (i = 0; i < SIZE; i++)
+   sarr[i] = buf_cold;
+}
+}
+
+int
+main (int argc, char *argv[])
+{
+  buf_hot =  "hello";
+  buf_cold = "world";
+  foo (argc);
+  return 0;
+}
+
+/* { dg-final-use { scan-assembler "foo.cold" } } */
+/* { dg-final-use { cleanup-saved-temps } } */
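
To illustrate the naming scheme discussed in this thread (function name, a separator that user identifiers cannot contain, the suffix, and a counter), here is a small standalone sketch.  make_private_name is a hypothetical helper written for this note; it is not clone_function_name, and the exact layout "name.suffix.N" is an assumption about the kind of name such a helper yields, consistent with the test scanning for "foo.cold".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not GCC's clone_function_name: build
   NAME SEP SUFFIX SEP COUNTER into BUF of SIZE bytes.  Returns 1 on
   success and 0 if the result did not fit.  SEP would be '.' on
   targets where '.' is not valid in user identifiers.  */
static int
make_private_name (char *buf, size_t size, const char *name,
                   const char *suffix, char sep, unsigned counter)
{
  int n;

  assert (buf != NULL && size > 0);
  assert (strlen (name) > 0 && strlen (suffix) > 0);
  n = snprintf (buf, size, "%s%c%s%c%u", name, sep, suffix, sep, counter);
  return n >= 0 && (size_t) n < size;
}
```

With name "foo", suffix "cold" and separator '.', the sketch yields a label that still contains the substring the new test looks for.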

Re: [PATCH 2/3] libstdc++-v3: ::tmpnam depends on uClibc SUSV4_LEGACY

2013-11-13 Thread Jonathan Wakely

On 13 November 2013 09:22, Bernhard Reutner-Fischer wrote:
> On 11 November 2013 12:30, Jonathan Wakely  wrote:
>> How does __UCLIBC_SUSV4_LEGACY__ get defined?  We'd have a problem if
>> users defined that at configure time but not later when using the
>> library.
> That would be defined by uClibc's configury, but the latest
> "commit-6f2faa2" i attached does not mention this anymore, but does
> the check in a libc-agnostic manner?

Yes, but I was concerned about whether the value of that macro can
change between configuring libstdc++ and users compiling code using
libstdc++.  If it could change (e.g. by users compiling with
-D_POSIX_C_SOURCE=200112L or some other feature test macro) then the
value of _GLIBCXX_USE_TMPNAM (which doesn't change) would be
unreliable and we could end up with a "using ::tmpnam" in the library
that causes errors when users compile.

If it's set when configuring uClibc then it is a constant for a given
libstdc++ installation, so the value of _GLIBCXX_USE_TMPNAM is
reliable.  In that case your change is OK to commit (with or without
the "XYZ" change) - thanks.

Re: [patch] [arm] ARM Cortex-M3/M4 tuning

2013-11-13 Thread Janis Johnson

On 11/12/2013 10:20 PM, Joey Ye wrote:
> Janis, can you please take a look at test case changes.
> 
> Thanks,
> Joey

They look fine.

Janis

>> -Original Message-
>> From: Ramana Radhakrishnan
>> Sent: Friday, November 08, 2013 17:11
>> To: Joey Ye
>> Cc: gcc-patches@gcc.gnu.org; jani...@codesourcery.com
>> Subject: Re: [patch] [arm] ARM Cortex-M3/M4 tuning
>>
>>>> ChangeLog:
>>>>
>>>>  2013-11-01  Julian Brown  
>>>>  Joey Ye  
>>>>
>>>>  * config/arm/arm.c (arm_cortex_m_branch_cost): New.
>>>>  (arm_v7m_tune): New.
>>>>  (arm_*_tune): Add comments for Sched adj cost.
>>
>> List all names here please rather than a regexp.
>>
>>>>  * config/arm/arm-cores.def (cortex-m4, cortex-m3):
>>>>  Use arm_v7m_tune tuning.
>>>>
>>
>> The ARM parts are ok but I'd like a testsuite maintainer to look at the
>> testsuite changes before committing.
>>
>> regards
>> Ramana
>>
>>>> testsuite:
>>>>  2013-11-01  Joey Ye  
>>>>
>>>>  * gcc.dg/tree-ssa/forwprop-28.c: Disable for cortex_m.
>>>>  * gcc.dg/tree-ssa/vrp47.c: Likewise.
>>>>  * gcc.dg/tree-ssa/vrp87.c: Likewise.
>>>>  * gcc.dg/tree-ssa/ssa-dom-thread-4.c: Ignore for cortex_m.
>>>>  * gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Likewise.

[patch] Fix ICEs when DEBUG_MANGLE is enabled

2013-11-13 Thread Cary Coutant

This patch fixes a few ICEs I encountered when enabling DEBUG_MANGLE.
I've also changed dump_substitution_candidates to output the substitution
index in base 36, to match the actual mangled name.

OK for trunk?

-cary


2013-11-13  Cary Coutant  

gcc/
* cp/mangle.c (to_base36): New function.
(dump_substitution_candidates): Add checks for NULL.
Print substitution index in base 36.


commit 5ece725d55f104dd6499ac261380a9c9c4002613
Author: Cary Coutant 
Date:   Wed Nov 13 09:28:58 2013 -0800

Fix ICEs when DEBUG_MANGLE is enabled.

diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 202fafc..56c1844 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -301,6 +301,19 @@ decl_is_template_id (const tree decl, tree* const template_info)
   return 0;
 }
 
+/* Convert VAL to base 36.  */
+
+static const char *
+to_base36 (int val)
+{
+  static char buffer[sizeof (HOST_WIDE_INT) * 8 + 1];
+  unsigned count;
+
+  count = hwint_to_ascii (val, 36, buffer + sizeof (buffer) - 1, 1);
+  buffer[sizeof (buffer) - 1] = '\0';
+  return buffer + sizeof (buffer) - 1 - count;
+}
+
 /* Produce debugging output of current substitution candidates.  */
 
 static void
@@ -317,12 +330,27 @@ dump_substitution_candidates (void)
   if (i > 0)
fprintf (stderr, "");
   if (DECL_P (el))
-   name = IDENTIFIER_POINTER (DECL_NAME (el));
+{
+  if (DECL_NAME (el))
+name = IDENTIFIER_POINTER (DECL_NAME (el));
+}
   else if (TREE_CODE (el) == TREE_LIST)
-   name = IDENTIFIER_POINTER (DECL_NAME (TREE_VALUE (el)));
+{
+  tree val = TREE_VALUE (el);
+  if (TREE_CODE (val) == IDENTIFIER_NODE)
+name = IDENTIFIER_POINTER (val);
+  else if (DECL_P (val))
+name = IDENTIFIER_POINTER (DECL_NAME (val));
+}
   else if (TYPE_NAME (el))
-   name = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (el)));
-  fprintf (stderr, " S%d_ = ", i - 1);
+{
+  if (DECL_NAME (TYPE_NAME (el)))
+name = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (el)));
+}
+  if (i == 0)
+fprintf (stderr, " S_ = ");
+  else
+fprintf (stderr, " S%s_ = ", to_base36 (i - 1));
   if (TYPE_P (el) &&
  (CP_TYPE_RESTRICT_P (el)
   || CP_TYPE_VOLATILE_P (el)
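
As a standalone illustration of the numbering this patch prints (the first candidate is "S_", later ones are "S", then the index minus one in base 36, then "_"), here is a sketch that does not depend on GCC internals.  The helper names and signatures are made up for this note; in particular this to_base36 takes a caller-supplied buffer, unlike the static-buffer version in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write VAL in base 36 (digits 0-9, then A-Z) at the end of BUF, which
   has SIZE bytes, and return a pointer to the first digit.  */
static const char *
to_base36 (unsigned val, char *buf, size_t size)
{
  static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  char *p = buf + size - 1;

  assert (size >= 2);
  *p = '\0';
  do
    {
      assert (p > buf);
      *--p = digits[val % 36];
      val /= 36;
    }
  while (val != 0);
  return p;
}

/* Format the label of substitution candidate number I into OUT:
   "S_" for the first one, then "S0_", "S1_", and so on.  */
static void
substitution_label (unsigned i, char *out, size_t size)
{
  char tmp[16];

  if (i == 0)
    snprintf (out, size, "S_");
  else
    snprintf (out, size, "S%s_", to_base36 (i - 1, tmp, sizeof tmp));
}
```

The point of the base-36 output is that the debug dump then lines up with the substitution references that appear in the mangled name itself.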

Re: [PATCH] Fix *anddi_2 (PR target/59101)

2013-11-13 Thread Uros Bizjak

On Wed, Nov 13, 2013 at 6:03 PM, Jakub Jelinek  wrote:

> If *anddi_2 is used with 64-bit integer constant that matches
> "Z" constraint, but has bit 31 set, we emit it as andl instead,
> but that is wrong unless just ZF is tested in the flags, because
> SF might be different (if operands[1] has bit 31 set, then SF
> from andl will be set, but for the 64-bit operation it would be clear).
> If constant doesn't have bit 31 set (nor any other higher bits), then
> SF will be 0 both when using andl or when using andq, so it is fine
> to keep using andl in that case.
>
> At -O2 VRP will optimize
>   _3 = ~a_2(D);
>   _4 = (long long int) _3;
>   _5 = _4 & 4102790424;
>   if (_5 > 0)
> into:
>   _3 = ~a_2(D);
>   _4 = (long long int) _3;
>   _5 = _4 & 4102790424;
>   if (_5 != 0)
> but at -O1, -Og (or -O2 -fno-tree-vrp) it will not and then we can end
> up with wrong-code.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
> for trunk/4.8?
>
> 2013-11-13  Jakub Jelinek  
>
> PR target/59101
> * config/i386/i386.md (*anddi_2): Only allow CCZmode if
> operands[2] is positive 64-bit constant that is negative in
> SImode.
>
> * gcc.c-torture/execute/pr59101.c: New test.
>
> --- gcc/config/i386/i386.md.jj  2013-11-12 11:31:31.0 +0100
> +++ gcc/config/i386/i386.md 2013-11-13 10:14:10.981609589 +0100
> @@ -7978,7 +7978,12 @@ (define_insn "*anddi_2"
>  (const_int 0)))
> (set (match_operand:DI 0 "nonimmediate_operand" "=r,r,rm")
> (and:DI (match_dup 1) (match_dup 2)))]
> -  "TARGET_64BIT && ix86_match_ccmode (insn, CCNOmode)
> +  "TARGET_64BIT
> +   && ix86_match_ccmode (insn, CONST_INT_P (operands[2])
> +  && INTVAL (operands[2]) > 0
> +  && (INTVAL (operands[2])
> +  & (HOST_WIDE_INT_1 << 31)) != 0
> +  ? CCZmode : CCNOmode)

"Z" constraint, a.k.a x86_64_zext_immediate_operand won't allow
constants with high bits set.

It looks to me, we can use mode_signbit_p here, and simplify the expression to

ix86_match_ccmode (insn, mode_signbit_p (SImode, operands[2])
? CCZmode : CCNOmode)

Please also add comment, this issue is quite non-obvious.

Uros.

Re: Revert libsanitizer patches or fix 59009

2013-11-13 Thread Peter Bergner

On Wed, 2013-11-13 at 00:49 +0100, Jakub Jelinek wrote:
> 2013-11-12  Jakub Jelinek  
> 
>   * sanitizer_common/sanitizer_platform_limits_linux.cc: Temporarily
>   ifdef out almost the whole source.
>   * sanitizer_common/sanitizer_common_syscalls.inc: Likewise.

That helps, but as Pat reported in the bugzilla, it still is failing.
With the following patch, we can now bootstrap on powerpc64-linux.

Is this ok for trunk?

Does this help the other architectures that are failing for the same
build error?

Peter

PR sanitizer/59009
* sanitizer_common/sanitizer_platform_limits_posix.cc: Temporarily
ifdef out more source.

Index: libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc
===
--- libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc (revision 204747)
+++ libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc (working copy)
@@ -855,6 +855,7 @@ CHECK_STRUCT_SIZE_AND_OFFSET(sigaction, 
 CHECK_STRUCT_SIZE_AND_OFFSET(sigaction, sa_restorer);
 #endif
 
+#ifdef SYSCALL_INTERCEPTION
 #if SANITIZER_LINUX
 CHECK_TYPE_SIZE(__sysctl_args);
 CHECK_SIZE_AND_OFFSET(__sysctl_args, name);
@@ -872,6 +873,7 @@ CHECK_TYPE_SIZE(__kernel_off_t);
 CHECK_TYPE_SIZE(__kernel_loff_t);
 CHECK_TYPE_SIZE(__kernel_fd_set);
 #endif
+#endif
 
 #if !SANITIZER_ANDROID
 CHECK_TYPE_SIZE(wordexp_t);

Re: Revert libsanitizer patches or fix 59009

2013-11-13 Thread Kostya Serebryany

On Wed, Nov 13, 2013 at 9:21 AM, Michael Meissner
 wrote:
> On Wed, Nov 13, 2013 at 10:45:54AM +0400, Kostya Serebryany wrote:
>> Many thanks, Jakub.
>>
>> I don't want to appear in this situation again.
>> Would you suggest a place to create a wiki page which would list all
>> required steps to test libsanitizer?
>>
>> libsanitizer is (unfortunately) a very system-dependent beast and our
>> upstream commits will break other platforms regularly;
>> that's unavoidable unless each platform's community helps us test the
>> code upstream. (I.e. I encourage PowerPC folks to help us in the LLVM
>> land)
>
> Maybe it should be removed completely then, if you are going to break things on
> a regular basis.  Or at least made a configuration option that is OFF by
> default.  Or kept in a branch.

Ok, unless someone commits to support libsanitizer for PowerPC in the gcc
repository, I am going to disable it before the next merge.

--kcc


>
>> For gcc merges, all we can promise to do is to run any amount of
>> testing (described on a to-be-created wiki) on an x86_64 linux
>> machine.
>> For other kinds of testing we'll rely on the platform owners.
>> If we break someone's platform, we expect the owners to send us
>> patches which we can commit upstream. That's what happened with x32
>> last week.
>
> NO, NO, NO, NO.  We have the GCC compile farm for a reason.  Use it to test
> system dependent changes before committing them to the trunk.
>
> I have too much on my plate that I'm scrambling to get my changes done before
> stage1 closes.  I don't have time or energy to fix code that other people
> broke.
>
> I'm sorry, but I'm really getting annoyed by the length of time it has taken to
> get this resolved.
>
> --
> Michael Meissner, IBM
> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
> email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
>

Re: [RFA][PATCH] Isolate erroneous paths optimization

2013-11-13 Thread Ulrich Weigand

Jeff Law wrote:

>   * Makefile.in (OBJS): Add gimple-ssa-isolate-paths.o
>   * common.opt (-fisolate-erroneous-paths): Add option and
>   documentation.
>   * gimple-ssa-isolate-paths.c: New file.

This causes compiler segfaults for me when building Python 2.7.5.
See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59119
for a reduced test case ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: Revert libsanitizer patches or fix 59009

2013-11-13 Thread Michael Meissner

On Wed, Nov 13, 2013 at 10:45:54AM +0400, Kostya Serebryany wrote:
> Many thanks, Jakub.
> 
> I don't want to appear in this situation again.
> Would you suggest a place to create a wiki page which would list all
> required steps to test libsanitizer?
> 
> libsanitizer is (unfortunately) a very system-dependent beast and our
> upstream commits will break other platforms regularly;
> that's unavoidable unless each platform's community helps us test the
> code upstream. (I.e. I encourage PowerPC folks to help us in the LLVM
> land)

Maybe it should be removed completely then, if you are going to break things on
a regular basis.  Or at least made a configuration option that is OFF by
default.  Or kept in a branch.

> For gcc merges, all we can promise to do is to run any amount of
> testing (described on a to-be-created wiki) on an x86_64 linux
> machine.
> For other kinds of testing we'll rely on the platform owners.
> If we break someone's platform, we expect the owners to send us
> patches which we can commit upstream. That's what happened with x32
> last week.

NO, NO, NO, NO.  We have the GCC compile farm for a reason.  Use it to test
system dependent changes before committing them to the trunk.

I have too much on my plate that I'm scrambling to get my changes done before
stage1 closes.  I don't have time or energy to fix code that other people
broke.

I'm sorry, but I'm really getting annoyed by the length of time it has taken to
get this resolved.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Re: [PATCH] Merge cgraph_get_create_node and cgraph_get_create_real_symbol_node

2013-11-13 Thread Uros Bizjak

Hello!

> as discussed with Honza on many occasions, all users of
> cgraph_get_create_node really want cgraph_get_create_real_symbol_node,
> i.e. they are not interested in inline nodes and should get a
> standalone node instead.  So this patch changes cgraph_get_create_node
> to do what cgraph_get_create_real_symbol_node currently does and
> removes the latter altogether.
>
> I had to change a call to cgraph_get_create_node to cgraph_get_node in
> lto-streamer-in.c so that it does not make the node it operates on a
> clone of another one because this made ipa_pta_execute abort on assert
> after calling cgraph_get_body (visionary points to Richi for putting
> the assert there).
>
> The patch successfully passed bootstrap and testing ("all" languages +
> Ada) and LTO-bootstrap (C and C++ only) on x86_64-linux.
>
> 2013-11-12  Martin Jambor  
>
> * cgraph.c (cgraph_get_create_node): Do what
> cgraph_get_create_real_symbol_node used to do.
> (cgraph_get_create_real_symbol_node): Removed.  Changed all users to
> call cgraph_get_create_node.
> * cgraph.h (cgraph_get_create_real_symbol_node): Removed.
> * lto-streamer-in.c (input_function): Call cgraph_get_node instead of
> cgraph_get_create_node.  Assert we get a node.

This patch breaks lto-profiledbootstrap on x86_64-pc-linux-gnu with:

In function ‘colorize_start’:
lto1: internal compiler error: in input_function, at lto-streamer-in.c:919
0xa585c1 input_function
/home/uros/gcc-svn/trunk/gcc/lto-streamer-in.c:919
0xa585c1 lto_read_body
/home/uros/gcc-svn/trunk/gcc/lto-streamer-in.c:1067
0xa585c1 lto_input_function_body(lto_file_decl_data*, cgraph_node*, char const*)
/home/uros/gcc-svn/trunk/gcc/lto-streamer-in.c:1109
0x66eb2c cgraph_get_body(cgraph_node*)
/home/uros/gcc-svn/trunk/gcc/cgraph.c:2967
0x999339 ipa_merge_profiles(cgraph_node*, cgraph_node*)
/home/uros/gcc-svn/trunk/gcc/ipa-utils.c:699
0x5979a6 lto_cgraph_replace_node
/home/uros/gcc-svn/trunk/gcc/lto/lto-symtab.c:82
0x598079 lto_symtab_merge_symbols_1
/home/uros/gcc-svn/trunk/gcc/lto/lto-symtab.c:561
0x598079 lto_symtab_merge_symbols()
/home/uros/gcc-svn/trunk/gcc/lto/lto-symtab.c:589
0x586fad read_cgraph_and_symbols
/home/uros/gcc-svn/trunk/gcc/lto/lto.c:2945
0x586fad lto_main()
/home/uros/gcc-svn/trunk/gcc/lto/lto.c:3254

You will need patches from Teresa [1],[2] to get up to there in the
lto-profiledbootstrap.

[1] http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01455.html
[2] http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01453.html

Uros.

Re: [gomp4 simd, RFC] Simple fix to override vectorization cost estimation.

2013-11-13 Thread Sergey Ostanevich

I will get some tests.
As for cost analysis - simply consider the pragma as a request to
vectorize. How can I - as a developer - enforce it beyond the pragma?

On Wed, Nov 13, 2013 at 12:55 PM, Richard Biener  wrote:
> On Tue, 12 Nov 2013, Sergey Ostanevich wrote:
>
>> The reason the patch was in its original state is because we want
>> to notify the user that his assumption of profitability may be wrong.
>> This is not a part of any spec and as far as I know ICC does not
>> notify the user about the case. Still it can be a good hint for those
>> users who try to get as much performance as possible.
>>
>> Richard's comment on the vectorization problems is about the same -
>> to inform the user that his attempt to force vectorization has failed.
>>
>> As for profitable or not - sometimes I believe it's impossible to be
>> precise. For OMP we have the case of a vector version of a function
>> and we have no chance to figure out whether it is profitable to use
>> it or to lose it. If we can't map the loop for any vector length
>> other than 1 - I believe in this case we have to bail out and report.
>> Is it about 'never profitable'?
>
> For example.  I think we should report non-vectorized loops
> that are marked with force_vect anyway, with -Wdisabled-optimization.
> Another case is that a loop may be profitable to vectorize if
> the ISA supports a gather instruction but otherwise not.  Or if the
> ISA supports efficient vector construction from N not loop
> invariant scalars (for vectorization of strided loads).
>
> Simply disregarding all of the cost analysis sounds completely
> bogus to me.
>
> I'd simply go for the diagnostic for now, not changing anything else.
> We want to have a good understanding about why the cost model is
> so bad that we have to force to ignore it for #pragma simd - thus we
> want testcases.
>
> Richard.
>
>>
>> On Tue, Nov 12, 2013 at 6:35 PM, Richard Biener  wrote:
>> > On 11/12/13 3:16 PM, Jakub Jelinek wrote:
>> >> On Tue, Nov 12, 2013 at 05:46:14PM +0400, Sergey Ostanevich wrote:
>> >>> ivdep just substitutes all cross-iteration data analysis,
>> >>> nothing related to cost model. ICC does not cancel its
>> >>> cost model in case of #pragma ivdep
>> >>>
>> >>> as for the safelen - OMP standard treats it as a limitation
>> >>> for the vector length. this means if no safelen is present
>> >>> an arbitrary vector length can be used.
>> >>
>> >> I was talking about GCC loop->safelen, which is INT_MAX for #pragma omp simd
>> >> without safelen clause or #pragma simd without vectorlength clause.
>> >>
>> >>> so I believe loop->force_vect is the only trigger to disregard
>> >>> the cost model
>> >>
>> >> Anyway, in that case I think the originally posted patch is wrong,
>> >> if we want to treat force_vect as disregard all the cost model and
>> >> force vectorization (well, the name of the field already kind of suggests
>> >> that), then IMHO we should treat it the same as -fvect-cost-model=unlimited
>> >> for those loops.
>> >
>> > Err - the user may have a specific sub-architecture in mind when using
>> > #pragma simd, if you say we should completely ignore the cost model
>> > then should we also sorry () if we cannot vectorize the loop (either
>> > because of GCC deficiencies or lack of sub-target support)?
>> >
>> > That said, at least in the cases that the cost model says the loop
>> > is never profitable to vectorize we should follow its advice.
>> >
>> > Richard.
>> >
>> >> Thus (untested):
>> >>
>> >> 2013-11-12  Jakub Jelinek  
>> >>
>> >>   * tree-vect-loop.c (vect_estimate_min_profitable_iters): Use
>> >>   unlimited cost model also for force_vect loops.
>> >>
>> >> --- gcc/tree-vect-loop.c.jj   2013-11-12 12:09:40.0 +0100
>> >> +++ gcc/tree-vect-loop.c  2013-11-12 15:11:43.821404330 +0100
>> >> @@ -2702,7 +2702,7 @@ vect_estimate_min_profitable_iters (loop
>> >>void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
>> >>
>> >>/* Cost model disabled.  */
>> >> -  if (unlimited_cost_model ())
>> >> +  if (unlimited_cost_model () || LOOP_VINFO_LOOP (loop_vinfo)->force_vect)
>> >>  {
>> >>dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
>> >>*ret_min_profitable_niters = 0;
>> >>
>> >>   Jakub
>> >>
>> >
>>
>>
>
> --
> Richard Biener 
> SUSE / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> GF: Jeff Hawn, Jennifer Guild, Felix Imend

[PATCH] Fix *anddi_2 (PR target/59101)

2013-11-13 Thread Jakub Jelinek

Hi!

If *anddi_2 is used with 64-bit integer constant that matches
"Z" constraint, but has bit 31 set, we emit it as andl instead,
but that is wrong unless just ZF is tested in the flags, because
SF might be different (if operands[1] has bit 31 set, then SF
from andl will be set, but for the 64-bit operation it would be clear).
If constant doesn't have bit 31 set (nor any other higher bits), then
SF will be 0 both when using andl or when using andq, so it is fine
to keep using andl in that case.

At -O2 VRP will optimize
  _3 = ~a_2(D);
  _4 = (long long int) _3;
  _5 = _4 & 4102790424;
  if (_5 > 0)
into:
  _3 = ~a_2(D);
  _4 = (long long int) _3;
  _5 = _4 & 4102790424;
  if (_5 != 0)
but at -O1, -Og (or -O2 -fno-tree-vrp) it will not and then we can end
up with wrong-code.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
for trunk/4.8?

2013-11-13  Jakub Jelinek  

PR target/59101
* config/i386/i386.md (*anddi_2): Only allow CCZmode if
operands[2] is positive 64-bit constant that is negative in
SImode.

* gcc.c-torture/execute/pr59101.c: New test.

--- gcc/config/i386/i386.md.jj  2013-11-12 11:31:31.0 +0100
+++ gcc/config/i386/i386.md 2013-11-13 10:14:10.981609589 +0100
@@ -7978,7 +7978,12 @@ (define_insn "*anddi_2"
 (const_int 0)))
(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,rm")
(and:DI (match_dup 1) (match_dup 2)))]
-  "TARGET_64BIT && ix86_match_ccmode (insn, CCNOmode)
+  "TARGET_64BIT
+   && ix86_match_ccmode (insn, CONST_INT_P (operands[2])
+  && INTVAL (operands[2]) > 0
+  && (INTVAL (operands[2])
+  & (HOST_WIDE_INT_1 << 31)) != 0
+  ? CCZmode : CCNOmode)
&& ix86_binary_operator_ok (AND, DImode, operands)"
   "@
and{l}\t{%k2, %k0|%k0, %k2}
--- gcc/testsuite/gcc.c-torture/execute/pr59101.c.jj 2013-11-13 10:22:08.489154035 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr59101.c   2013-11-13 10:23:01.734870201 +0100
@@ -0,0 +1,15 @@
+/* PR target/59101 */
+
+__attribute__((noinline, noclone)) int
+foo (int a)
+{
+  return (~a & 4102790424LL) > 0 | 6;
+}
+
+int
+main ()
+{
+  if (foo (0) != 7)
+__builtin_abort ();
+  return 0;
+}

Jakub
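
For anyone who wants to see the flag difference described above without reading the insn pattern, here is a small model in portable C.  It only computes what the zero and sign flags would be for a 32-bit versus a 64-bit AND with a zero-extended 32-bit mask; the function names are invented for this note and nothing here is GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Sign flag after a 64-bit AND (andq): bit 63 of the result.  MASK is
   assumed to be a zero-extended 32-bit immediate.  */
static int
sf_and64 (uint64_t x, uint64_t mask)
{
  assert (mask <= UINT32_MAX);
  return (int) (((x & mask) >> 63) & 1);
}

/* Sign flag after a 32-bit AND (andl) on the low halves: bit 31.  */
static int
sf_and32 (uint64_t x, uint64_t mask)
{
  uint32_t r = (uint32_t) x & (uint32_t) mask;

  assert (mask <= UINT32_MAX);
  return (int) ((r >> 31) & 1);
}

/* Zero flag after the 64-bit AND.  */
static int
zf_and64 (uint64_t x, uint64_t mask)
{
  assert (mask <= UINT32_MAX);
  return (x & mask) == 0;
}

/* Zero flag after the 32-bit AND on the low halves.  */
static int
zf_and32 (uint64_t x, uint64_t mask)
{
  assert (mask <= UINT32_MAX);
  return ((uint32_t) x & (uint32_t) mask) == 0;
}
```

With the constant from the testcase, 4102790424, and x being ~0 sign-extended to 64 bits, the two zero flags agree but the sign flags do not, so a comparison that looks at SF gives a different answer when andl is used.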

Re: [patch] [arm] New option for PIC offset unfixed

2013-11-13 Thread Richard Earnshaw

On 13/11/13 15:57, Joey Ye wrote:
> 
> 
>> -Original Message-
>> From: Richard Earnshaw
>> Sent: Wednesday, November 13, 2013 19:17
>> To: Joey Ye
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: [patch] [arm] New option for PIC offset unfixed
>>
>> On 13/11/13 10:20, Joey Ye wrote:
> +  if (TARGET_VXWORKS_RTP)
>>> +arm_pic_data_is_text_relative = 0;
>
> Why is this needed?  Surely, even a VxWorks user should have the
> right to force the compiler to behave differently.  You've set
> things up through
>>> the
> default, now just accept what the user has asked for.
>>> The reason is that TARGET_VXWORKS_RTP isn't a compile time value to
>>> initiate arm.opt. Instead it is true only when -mrtp is specified in
>>> runtime. Also enable text relative may result in runtime error on
>>> vxworks, it is better to prevent it here.
>>
>> I'd be happier if this was only done if the command-line option was not
>> explicitly set on the command line.
> So you are suggesting change like this:
> + Target Report Var(arm_pic_data_is_text_relative) Init(-1)
> 
> +   if (arm_pic_data_is_text_relative < 0 && TARGET_VXWORKS_RTP)
> + arm_pic_data_is_text_relative = 0;
> +   else
> + arm_pic_data_is_text_relative = 1;
> 

No, use the global_options_set structure to find out if the user has set
the value.
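
The pattern being asked for, as I understand it, is a second structure that mirrors the option variables and records whether each one was given explicitly, so a target default is applied only when the user said nothing.  Below is a minimal model of that idea; struct opts, struct opts_set and override_options are hypothetical names for this sketch and are not the real global_options_set interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option values.  */
struct opts
{
  int pic_data_is_text_relative;
};

/* Hypothetical mirror: true if the user set the option explicitly.  */
struct opts_set
{
  bool pic_data_is_text_relative;
};

/* Apply the VxWorks RTP default only when the user did not choose a
   value on the command line.  */
static void
override_options (struct opts *o, const struct opts_set *set,
                  bool vxworks_rtp)
{
  assert (o != NULL && set != NULL);
  if (vxworks_rtp && !set->pic_data_is_text_relative)
    o->pic_data_is_text_relative = 0;
}
```

This avoids the Init(-1) sentinel from the proposal above: the option keeps its normal default of 1, and the "was it set" question is answered by the mirror.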

Ping Re: Clean up configure glibc version detection, add --with-glibc-version

2013-11-13 Thread Joseph S. Myers

Ping.  This patch is pending
review (build system or global reviewer).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH, i386]: AMD bdver4 enablement

2013-11-13 Thread Uros Bizjak

On Tue, Nov 12, 2013 at 8:01 AM, Gopalasubramanian, Ganesh
 wrote:

> The attached patch (bd4-enablement.patch) enables the next version of AMD's core.
> New additions to the ISA (AVX2 and BMI2) are enabled for the new core.
> Presently, the tuning is mostly copied from bdver3.  This includes the pipeline modeling too.
> X86_TUNE_REASSOC_FP_TO_PARALLEL is not enabled (which might be future work).
>
> Bootstrapping passes. Is it OK for upstream?
>
> Regards
> Ganesh
>
> 2013-11-12 Ganesh Gopalasubramanian  
>
> * config.gcc (i[34567]86-*-linux* | ...): Add bdver4.
> (case ${target}): Add bdver4.
> * config/i386/bdver3.md: Add bdver4.
> * config/i386/driver-i386.c: (host_detect_local_cpu): Let
> -march=native recognize bdver4 processors.
> * config/i386/i386-c.c (ix86_target_macros_internal): Add
> bdver4 def_and_undef
> * config/i386/i386.c (struct processor_costs bdver4_cost): New.
> (m_BDVER4): New definition.
> (m_AMD_MULTIPLE): Includes m_BDVER4.
> (processor_target_table): Add bdver4 entry.
> (static const char *const cpu_names): Add bdver4 entry.
> (software_prefetching_beneficial_p): Add bdver3.
> (ix86_option_override_internal): Add bdver4 instruction sets.
> (ix86_issue_rate): Add bdver4.
> (ix86_adjust_cost): Add bdver4.
> (ia32_multipass_dfa_lookahead): Add bdver4.
> (enum processor_model): Add M_AMDFAM15H_BDVER4.
> (struct _arch_names_table): Add M_AMDFAM15H_BDVER4.
> (has_dispatch): Add bdver4.
> * config/i386/i386.h (TARGET_BDVER4): New definition.
> (enum target_cpu_default): Add TARGET_CPU_DEFAULT_bdver4.
> (enum processor_type): Add PROCESSOR_BDVER4.
> * config/i386/i386.md (define_attr "cpu"): Add bdver4.
> * config/i386/i386.opt (flag_dispatch_scheduler): Add bdver4.
> * gcc/doc/extend.texi: Add details about bdver4.
> * gcc/doc/invoke.texi: Add details about bdver4.

@@ -14526,6 +14526,11 @@ AMD Family 15h core based CPUs with x86-64
instruction set support.  (This
 supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
 SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set
 extensions.
+@item bdver4
+AMD Family 15h core based CPUs with x86-64 instruction set support.  (This
+supersets BMI, BMI2, TBM, F16C, FMA, AVX, AVX2, XOP, LWP, AES, PCL_MUL, CX16,
+MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit
+instruction set extensions.

Please also mention FSGS and FMA4 (both are also missing for other
bdver targets).

OK with this change.

Uros.

Re: Add __auto_type C extension, use it in

2013-11-13 Thread Joseph S. Myers

On Wed, 13 Nov 2013, Michael Matz wrote:

> Hi,
> 
> On Wed, 13 Nov 2013, Joseph S. Myers wrote:
> 
> > +In GNU C, but not GNU C++, you may also declare the type of a variable
> > +as @code{__auto_type}.  In that case, the declaration must declare
> > +only one variable,
> 
> What's the reason for this restriction?  I can't see what would become 
> ambiguous with allowing multiple declarations (even when mixing types):
> 
>   int i;
>   short s;
>   __auto_type i2 = i, s2 = s;
> 
> (i2 would be int, s2 be short).

__auto_type is thought of as being equivalent to typeof (initializer), 
except for avoiding multiple evaluation; there aren't any existing cases 
in GNU C where the type specifier is interpreted separately for each 
identifier being declared.  Obviously you can define semantics (following 
C++) for more cases, but the minimal version is sufficient for 
 and other similar uses in macros, and keeping it minimal 
reduces the risk of incompatibility with any future addition of such a 
feature to ISO C.  (It's also simplest to implement.)

-- 
Joseph S. Myers
jos...@codesourcery.com
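
To illustrate the point above about avoiding multiple evaluation, here is a
minimal sketch of a macro that uses __auto_type where typeof (initializer)
would otherwise be written.  MAX_ONCE, next_value and demo_max_once are names
made up for this illustration, and the macro relies on GNU C statement
expressions.

```c
/* Sketch only: MAX_ONCE and next_value are illustrative names.
   Each macro argument appears exactly once, as an initializer, and the
   temporary takes the type of that initializer.  */
#define MAX_ONCE(a, b)                    \
  ({ __auto_type max_a_ = (a);            \
     __auto_type max_b_ = (b);            \
     max_a_ > max_b_ ? max_a_ : max_b_; })

static int calls;

/* Side-effecting helper used to count how often an argument is
   evaluated.  */
static int
next_value (void)
{
  return ++calls;
}

/* Resets the counter and takes the maximum of one call and zero.  */
static int
demo_max_once (void)
{
  calls = 0;
  return MAX_ONCE (next_value (), 0);
}
```

Each declaration in the macro declares a single variable, so the
one-variable restriction discussed above is not a limitation for this kind
of use.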

Re: [testsuite] Fix gcc.dg/atomic/c11-atomic-exec-[45].c tests on Solaris 10+

2013-11-13 Thread Joseph S. Myers

On Wed, 13 Nov 2013, Rainer Orth wrote:

> The following patch fixes this.  Tested by the appropriate runtest
> invocation on i386-pc-solaris2.10, ok for mainline?

OK.  The _POSIX_C_SOURCE definition was intended to allow for any systems 
whose system headers don't like pthreads in strict ISO C mode without a 
feature test macro, but I suppose it's inevitable that the particular 
feature test macros needed are going to vary from system to system.

-- 
Joseph S. Myers
jos...@codesourcery.com

[gomp4, WIP] Elementals improvements

2013-11-13 Thread Jakub Jelinek

Hi!

Here is my latest elemental tweaks patch.
While the patch has code to pass some arguments in multiple vector arguments,
it doesn't have anything similar for return types.  A lot of the
decisions whether to create an elemental clone are made in a target hook
(because, as long as we pass arguments in vector registers, it is quite
tied to the ABI for the particular CPU and most probably requires target
attribute support (which the hook now uses on i?86)).

As discussed earlier, if we strictly follow the Intel ABI for simds,
we run into various issues.  The clones then have to use __regcall calling
convention which e.g. mandates that on x86_64 up to 16 vector arguments
are passed in xmm/ymm registers (problem, because the dynamic linker
during lazy binding can clobber ymm8 through ymm15), requires up to 16
vector values returned in xmm/ymm registers (for e.g.
#pragma omp declare simd simdlen(16)
_Complex double foo (double);
) - we don't have infrastructure for that, plus we'd need to teach the
backend(s) about that new calling convention; it also declares {x,y}mm4-7
for 32-bit and {x,y}mm8-15 for 64-bit to be call saved (on 64-bit there is
again a problem with that because the dynamic linker may clobber them, plus
it is an issue for bt/up in the debugger: we don't save/restore those in
unwind info, and how big vectors would we save?  Note that elementals aren't
allowed to throw or setjmp/longjmp (the standard doesn't mention
setcontext/swapcontext etc. though)).

So, shall we just use different ISA letters to make it clear we are ABI
incompatible with ICC?  How should we return values if we need a wider
vector than the hardware supports?  Just returning a wider vector type has
the problem that it is ABI unstable; say we have:
#pragma omp declare simd simdlen(4)
double foo (double);
If this is the SSE2 ISA variant (originally the x ISA letter) and
we return double V __attribute__((vector_size (32))); then, if
the definition of that function is compiled with -mavx, the value would be
returned in %ymm0, otherwise as a BLKmode vector (with a warning about
the ABI changing).  So, do we want to return it as, say, a structure
containing an array of 4 doubles?  Or pass a hidden argument
pointing to 4 doubles?  Something different?
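
The two alternatives just mentioned (a structure containing an array of 4
doubles, or a hidden pointer argument) can be sketched in plain C as follows.
This is only an illustration under assumed names (foo_result,
foo_simd_struct, foo_simd_hidden) and an arbitrary body for foo; it is not
what the patch generates.

```c
#define SIMDLEN 4

/* Hypothetical scalar function that the clones vectorize.  */
static double
foo (double x)
{
  return 2.0 * x + 1.0;
}

/* Option 1: return a structure containing an array of SIMDLEN doubles.
   Its layout does not depend on whether -mavx is enabled.  */
struct foo_result
{
  double v[SIMDLEN];
};

static struct foo_result
foo_simd_struct (const double *x)
{
  struct foo_result r;
  for (int i = 0; i < SIMDLEN; i++)
    r.v[i] = foo (x[i]);
  return r;
}

/* Option 2: the caller passes a hidden first argument that points to
   storage for SIMDLEN doubles.  */
static void
foo_simd_hidden (double *ret, const double *x)
{
  for (int i = 0; i < SIMDLEN; i++)
    ret[i] = foo (x[i]);
}
```

Neither form mentions a vector type in its signature, so neither is affected
by the %ymm0 versus BLKmode difference described above.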

What shall be done for targets that don't support the target attribute?
Currently the patch just doesn't create simd clones there (well, on ppc* it
could/should, because ppc* supports it).  But what about, say,
ARM/AArch64/MIPS/SPARC/etc.?  I wonder whether the generic representation
shouldn't just be ISA 'a', which would pass all non-uniform/non-linear
arguments as pointers to arrays of simdlen elements, and likewise the return
value through a first hidden argument.  For x86_64/i?86, because (at least on
a tiny benchmark I've tried) the pointer arguments variant is somewhat
slower, we would use ISA 'b', 'c', 'd' for SSE2/AVX/AVX2 (shall we do
anything for AVX512-F too?) if simdlen is between 2 and 16; otherwise
we'd use 'a' and arrays too.

Perhaps in the future we'll also want some way to say that
two SIMD clones could be aliases of each other (and which one to
use as the primary one), and, as discussed earlier, also thunks.

For the aliases, because the integer vector size is the same between
x and y and floating vector size is the same between y and Y, if you
have say int (int, int) simd, then the calling convention is the same
between x and y, similarly for float (double, float) simd between y and Y.
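
The aliasing observation can be checked with a little arithmetic.  The sketch
below assumes that the letters x, y and Y stand for SSE2, AVX and AVX2 and
uses the corresponding register widths; the table and the helper
vectors_needed are illustrative and not taken from the patch.

```c
#include <stddef.h>

/* Illustrative table: vector register width in bytes per ISA letter,
   separately for integer and floating-point elements.  */
struct isa_widths
{
  char letter;
  size_t int_bytes;
  size_t fp_bytes;
};

static const struct isa_widths isa_table[] = {
  { 'x', 16, 16 },  /* Assumed SSE2: 128-bit for both.  */
  { 'y', 16, 32 },  /* Assumed AVX: 256-bit for floating point only.  */
  { 'Y', 32, 32 },  /* Assumed AVX2: 256-bit for both.  */
};

/* Number of vector arguments needed to pass SIMDLEN elements of
   ELEM_SIZE bytes when one vector register holds VEC_BYTES bytes.  */
static size_t
vectors_needed (size_t simdlen, size_t elem_size, size_t vec_bytes)
{
  return (simdlen * elem_size + vec_bytes - 1) / vec_bytes;
}
```

With these widths an int argument is split into the same number of vectors
for x and y, and a float or double argument into the same number for y and Y,
which is the condition for the clones to share a calling convention.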

Testcases I've been using are e.g.:
/* { dg-options "-Ofast -fopenmp -mavx" } */

#pragma omp declare simd notinbranch
__attribute__((noinline)) float
foo (float a, float b, float c)
{
  return a + b + c;
}

float a[1024], b[1024], c[1024];

int
main ()
{
  int i, j;
  asm volatile ("" : : : "memory");
  for (j = 0; j < 1000; j++)
#pragma omp simd
    for (i = 0; i < 1024; i++)
      c[i] = foo (a[i], b[i], c[i]);
  return 0;
}
and corresponding 'a' ISA hand written variant (-Ofast -mavx):
typedef float V __attribute__((vector_size (32)));

__attribute__((noinline)) void
foo (V *ret, V *a, V *b, V *c)
{
  *ret = *a + *b + *c;
}

float a[1024], b[1024], c[1024];

int
main ()
{
  int i, j;
  asm volatile ("" : : : "memory");
  for (j = 0; j < 1000; j++)
    for (i = 0; i < 1024; i += 8)
      {
        V rt, at, bt, ct;
        at = *(V *)&a[i];
        bt = *(V *)&b[i];
        ct = *(V *)&c[i];
        foo (&rt, &at, &bt, &ct);
        *(V *)&c[i] = rt;
      }
  return 0;
}
or compile time testcase:
/* { dg-options "-O3 -fopenmp -mavx2" } */

#pragma omp declare simd
#pragma omp declare simd uniform(b) linear(c:3)
__attribute__((noinline)) short
foo (int a, long int b, short c)
{
  if (a == b + c)
    return 5;
  else
    return 6;
}

int a[1024];
long int b[1024];
short c[1024];

void
bar (int x)
{
  int i;
  if (x == 0)
    {
      #pragma omp simd
      for (i = 0; i < 1024; i++)
        c[i] = foo (a[i], b[i], c[i]);
    }
  else
    {
      #pragma omp simd
      for (i = 0; i < 1024; i++)
        c[i] = foo (a[i], x, i * 3);

Re: [PATCH] Fix infinite recursion between store_fixed_bit_field/store_split_bit_field with STRICT_ALIGNMENT

2013-11-13 Thread Joseph S. Myers

On Wed, 13 Nov 2013, Julian Brown wrote:

> * gcc.dg/packed-struct-mode-1.c: New.

I think this should be a torture test rather than specifying -O2, and 
should declare malloc itself rather than using .

-- 
Joseph S. Myers
jos...@codesourcery.com

RE: [patch] [arm] New option for PIC offset unfixed

2013-11-13 Thread Joey Ye

This patch addresses all comments.

Thanks,
Joey

> -Original Message-
> From: Richard Earnshaw
> Sent: Wednesday, November 13, 2013 19:07
> To: Joey Ye
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [patch] [arm] New option for PIC offset unfixed
> 
> On 13/11/13 10:20, Joey Ye wrote:
> >>> +@item -mpic-data-is-text-relative
> >>> > > +@opindex mpic-data-is-text-relative Assume that each data
> >>> > > +segments are relative to text segment at load
> > time.
> >> >
> >>> > > +Therefore, prevent PC relative and GOTOFF style relocations to
> >>> > > +reference data.
> >> >
> >> > I think the sense of this sentence is now backwards.  I'd also try
> >> > to
> > avoid
> >> > GOTOFF in the user part of the manual.
> > How about
> > "Therefore, prevent addressing data with relocation types that doesn't
> > apply in such circumstance."
> >
> >> >
> 
> No, that's still backwards.  Remember, the option is now
> pic-data-is-text-relative, so the option /permits/ addressing data using
> PC-relative operations.
> 
> R.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7757e86..5a95399 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2504,6 +2504,11 @@ arm_option_override (void)
arm_pic_register = pic_register;
 }
 
+  if (arm_pic_data_is_text_relative < 0 && TARGET_VXWORKS_RTP)
+arm_pic_data_is_text_relative = 0;
+  else
+arm_pic_data_is_text_relative = 1;
+
   /* Enable -mfix-cortex-m3-ldrd by default for Cortex-M3 cores.  */
   if (fix_cm3_ldrd == 2)
 {
@@ -6020,7 +6025,7 @@ legitimize_pic_address (rtx orig, enum machine_mode mode, rtx reg)
   || (GET_CODE (orig) == SYMBOL_REF &&
   SYMBOL_REF_LOCAL_P (orig)))
  && NEED_GOT_RELOC
- && !TARGET_VXWORKS_RTP)
+ && arm_pic_data_is_text_relative)
insn = arm_pic_static_addr (orig, reg);
   else
{
@@ -21498,7 +21503,7 @@ arm_assemble_integer (rtx x, unsigned int size, int aligned_p)
{
  /* See legitimize_pic_address for an explanation of the
 TARGET_VXWORKS_RTP check.  */
- if (TARGET_VXWORKS_RTP
+ if (!arm_pic_data_is_text_relative
  || (GET_CODE (x) == SYMBOL_REF && !SYMBOL_REF_LOCAL_P (x)))
fputs ("(GOT)", asm_out_file);
  else
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 9b74038..adac749 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -158,6 +158,10 @@ mlong-calls
 Target Report Mask(LONG_CALLS)
 Generate call insns as indirect calls, if necessary
 
+mpic-data-is-text-relative
+Target Report Var(arm_pic_data_is_text_relative) Init(-1)
+Assume data segments are relative to text segment.
+
 mpic-register=
 Target RejectNegative Joined Var(arm_pic_register_string)
 Specify the register to be used for PIC addressing
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 863e518..fbe77e6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12120,6 +12120,12 @@ before execution begins.
 Specify the register to be used for PIC addressing.  The default is R10
 unless stack-checking is enabled, when R9 is used.
 
+@item -mpic-data-is-text-relative
+@opindex mpic-data-is-text-relative
+Assume that each data segment is relative to the text segment at load time.
+Therefore, this option permits addressing data using PC-relative operations.
+This option is on by default for targets other than VxWorks RTP.
+
 @item -mpoke-function-name
 @opindex mpoke-function-name
 Write the name of each function into the text section, directly
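
Because the option is declared with Init(-1), a negative value means it was
not given on the command line.  A minimal sketch of resolving such a
tri-state value while keeping an explicit user choice follows; the function
name and its parameters are illustrative and not part of the patch.  Unlike
the arm_option_override hunk as posted, whose else branch assigns 1 even when
the option was given explicitly, this version only fills in the default.

```c
/* Sketch: resolve a tri-state option where -1 means "not given on the
   command line".  The names are illustrative, not GCC's.  */
static int
resolve_pic_data_is_text_relative (int opt_value, int is_vxworks_rtp)
{
  /* An explicit positive or negative form of the option is kept.  */
  if (opt_value >= 0)
    return opt_value;

  /* Otherwise the default is off for VxWorks RTP and on elsewhere.  */
  return is_vxworks_rtp ? 0 : 1;
}
```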

RE: [PATCH] Fix for PR bootstrap/58951

2013-11-13 Thread Gerald Pfeifer

On Tue, 12 Nov 2013, Iyer, Balaji V wrote:
> This worked for me and I checked it in. Gerald, please let me know if 
> you still have issues.

Thanks Balaji and H.J. -- my tester successfully bootstrapped
again (for the first time this month).

Gerald

Re: [PATCH, libiberty]: Add a couple of missing casts

2013-11-13 Thread Ian Lance Taylor

On Wed, Nov 13, 2013 at 7:30 AM, Gary Benson  wrote:
> Richard Biener wrote:
>> On Tue, Nov 12, 2013 at 8:55 PM, Ian Lance Taylor  wrote:
>> > On Tue, Nov 12, 2013 at 11:24 AM, Uros Bizjak  wrote:
>> > >
>> > > This was uncovered by x86 lto-profiledbootstrap. The patch allows
>> > > lto-profiledbootstrap to proceed further.
>> > >
>> > > 2013-11-12  Uros Bizjak  
>> > >
>> > > * cp-demangle.c (d_copy_templates): Cast result of malloc
>> > > to (struct d_print_template *).
>> > > (d_print_comp): Cast result of realloc to (struct d_saved_scope *).
>> > >
>> > > Tested on x86_64-pc-linux-gnu.
>> > >
>> > > OK for mainline?
>> >
>> > The patch is OK, but this code is troubling.  I obviously should
>> > have looked at it earlier.  The C++ demangler is sometimes used in
>> > panic situations, when malloc is not available.  The interface was
>> > designed to be usable without requiring malloc, by passing in a
>> > sufficiently large buffer.  I'm concerned that we apparently now
>> > require malloc to work.
>>
>> That indeed looks like an important regression - Gary, can you
>> please work to fix this?
>
> I'm on it.

Thanks.  See also the cplus_demangle_print_callback function.

Ian
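
For readers unfamiliar with the malloc-free style referred to above, here is
a minimal sketch of a callback that collects output into a caller-provided
buffer.  All names (emit_fn, fixed_sink, fixed_sink_emit, print_pieces) are
hypothetical; this is not libiberty's interface, only an illustration of the
idea in the same spirit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives each chunk of output.  */
typedef void (*emit_fn) (const char *chunk, size_t len, void *opaque);

/* Caller-provided fixed buffer; no heap allocation is involved.
   CAP must be at least 1 and BUF must start out as an empty string.  */
struct fixed_sink
{
  char *buf;
  size_t cap;
  size_t used;
  int overflowed;
};

static void
fixed_sink_emit (const char *chunk, size_t len, void *opaque)
{
  /* The explicit cast is not required in C but keeps the code valid
     when compiled as C++.  */
  struct fixed_sink *s = (struct fixed_sink *) opaque;

  /* Keep one byte for the terminator; drop chunks that do not fit.  */
  if (len >= s->cap - s->used)
    {
      s->overflowed = 1;
      return;
    }
  memcpy (s->buf + s->used, chunk, len);
  s->used += len;
  s->buf[s->used] = '\0';
}

/* Stand-in for a printer that reports its output in pieces.  */
static void
print_pieces (emit_fn emit, void *opaque)
{
  emit ("foo", 3, opaque);
  emit ("(int)", 5, opaque);
}
```

A caller in a panic situation can place the buffer on the stack or in static
storage and check the overflowed flag afterwards.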

Re: Add __auto_type C extension, use it in

2013-11-13 Thread Michael Matz

Hi,

On Wed, 13 Nov 2013, Joseph S. Myers wrote:

> +In GNU C, but not GNU C++, you may also declare the type of a variable
> +as @code{__auto_type}.  In that case, the declaration must declare
> +only one variable,

What's the reason for this restriction?  I can't see what would become 
ambiguous with allowing multiple declarations (even when mixing types):

  int i;
  short s;
  __auto_type i2 = i, s2 = s;

(i2 would be int, s2 be short).

Ciao,
Michael.

Re: [v3] Missing uglification

2013-11-13 Thread Paolo Carlini


Hi,

On 11/13/2013 03:40 PM, Marc Glisse wrote:

> Bootstrap and testsuite on x86_64-unknown-linux-gnu.

Ok, thanks. If you like, I think the patch is safe for 4.8.3 too.

> The main other issue in that PR will require a UDL specialist.

Let's ping Ed, then.

Thanks,
Paolo.

Re: [PATCH, PR 10474] Take two on splitting live-ranges of function arguments to help shrink-wrapping

2013-11-13 Thread H.J. Lu

On Wed, Nov 6, 2013 at 8:26 AM, Martin Jambor  wrote:
> Hi,
>
> last Thursday I had to revert a previous version of this patch because
> it has caused a lot of trouble on various platforms I did not test it
> on.  The patch is still very similar to its previous iteration
> (http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02183.html) the only
> major difference is that I moved more after-the-fact re-analyzing from
> find_moveable_pseudos to a point when my transformation also finished.
>
> The problem with it was that REG_N_REFS of the pseudos introduced by
> my code was not set and thus reload ignored them and let them slip through
> instead of generating spills.  This is fixed by moving the
> re-computation of regstat_n_sets_and_refs in
> regstat_init_n_sets_and_refs to after my transformation.  In order to
> avoid similar surprises I have moved all other re-computations from
> find_moveable_pseudos to the caller in ira, namely:
>
>   fix_reg_equiv_init ();
>   expand_reg_info ();
>   regstat_free_n_sets_and_refs ();
>   regstat_free_ri ();
>   regstat_init_n_sets_and_refs ();
>   regstat_compute_ri ();
>
> I have also bootstrapped and tested the patch not only on x86_64-linux
> but also on i686-linux and ppc64-linux (without Ada though).  I have
> made sure that the reported problem does not occur on cris-elf and
> sh-none-elf cross compilers.  (I could not reproduce it on arm, and do
> not have access to sparc but it was also reported there.)
>
> Another minor change which I erroneously omitted the last time is that
> the testcases are run on x86_64 only because that is the only platform
> where I know the transformation currently takes place.  The reason why
> I did not move them to a target-specific directory is that I believe
> the transformation can be beneficial on other platforms as well.  For
> example, PR 10474 was actually filed against PPC but this patch does
> not work there because the initial move of the parameter is done in a
> parallel insn:
>
> (parallel [
>         (set (reg:CC 124)
>             (compare:CC (reg:DI 3 3 [ i ])
>                 (const_int 0 [0])))
>         (set (reg/v/f:DI 123 [ i ])
>             (reg:DI 3 3 [ i ]))
>     ])
>
> which fails my single_set test.  However, the mechanism can be
> extended to handle these situations as well and afterwards we could
> run the test also on ppc64.
>
> So, Vlad, Steven, do you think that this time I have re-computed all
> that is necessary?  Do you think the patch is OK?
>
> Thanks a lot and sorry for the breakage,
>
> Martin
>
>
> 2013-11-04  Martin Jambor  
>
> PR rtl-optimization/10474
> * ira.c (interesting_dest_for_shprep): New function.
> (split_live_ranges_for_shrink_wrap): Likewise.
> (find_moveable_pseudos): Move calculation of dominance info,
> df_analysis and the final analyses to...
> (ira): ...here, call split_live_ranges_for_shrink_wrap.
>
> testsuite/
> * gcc.dg/pr10474.c: New testcase.
> * gcc.dg/ira-shrinkwrap-prep-1.c: Likewise.
> * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59099

-- 
H.J.

[AArch64] [-mtune cleanup 4/5] Remove "example-1", "example-2" tuning options.

2013-11-13 Thread James Greenhalgh


Hi,

"example-1" and "example-2" provide a "large"-like tuning option and
a "small"-like tuning option.

Now that we have wired up tuning for "cortex-a57" and "cortex-a53"
we no longer need these options.

Remove them.

Tested in series on aarch64-none-elf with no regressions.

OK?

Thanks,
James

---
gcc/

2013-11-13  James Greenhalgh  

* config/aarch64/aarch64-cores.def (example-1): Remove.
(example-2): Likewise.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.md: Do not include "large.md" or "small.md".
(generic_sched): Remove "large", "small".
* config/aarch64/large.md: Delete.
* config/aarch64/small.md: Delete.

gcc/testsuite/

2013-11-13  James Greenhalgh  

* gcc.target/aarch64/cpu-diagnostics-2.c: Change "-mcpu="
to "cortex-a53".
* gcc.target/aarch64/cpu-diagnostics-3.c: Change "-mcpu="
to "cortex-a53".
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 1845358..51c1ff8 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -36,5 +36,3 @@
 
 AARCH64_CORE("cortex-a53",	  cortexa53,	 8,  AARCH64_FL_FPSIMD,generic)
 AARCH64_CORE("cortex-a57",	  cortexa15,	 8,  AARCH64_FL_FPSIMD,generic)
-AARCH64_CORE("example-1",	  large,	 8,  AARCH64_FL_FPSIMD,generic)
-AARCH64_CORE("example-2",	  small,	 8,  AARCH64_FL_FPSIMD,generic)
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index 1bde99bec57c5defc35d24eb4c141aab70f616d2..84081d1ba57e306398e4449e55bf4c4dadf2e391 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa53,cortexa15,large,small"
+	"cortexa53,cortexa15"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6f828e26c594994701d150396972b2a3dcd9196f..5f35344154a65480cd520b1e5743d82bc6e56be9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -312,13 +312,11 @@ (define_attr "enabled" "no,yes"
 
 (define_attr "generic_sched" "yes,no"
   (const (if_then_else
-  (eq_attr "tune" "large,small,cortexa53,cortexa15")
+  (eq_attr "tune" "cortexa53,cortexa15")
   (const_string "no")
   (const_string "yes"))))
 
 ;; Scheduling
-(include "large.md")
-(include "small.md")
 (include "../arm/cortex-a53.md")
 (include "../arm/cortex-a15.md")
 
diff --git a/gcc/config/aarch64/large.md b/gcc/config/aarch64/large.md
index 4316cc7dfafff8a2a2e48e581a1bb06d5c9f866b..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 100644
--- a/gcc/config/aarch64/large.md
+++ b/gcc/config/aarch64/large.md
@@ -1,312 +0,0 @@
-;; Copyright (C) 2012-2013 Free Software Foundation, Inc.
-;;
-;; Contributed by ARM Ltd.
-;;
-;; This file is part of GCC.
-;;
-;; GCC is free software; you can redistribute it and/or modify it
-;; under the terms of the GNU General Public License as published by
-;; the Free Software Foundation; either version 3, or (at your option)
-;; any later version.
-;;
-;; GCC is distributed in the hope that it will be useful, but
-;; WITHOUT ANY WARRANTY; without even the implied warranty of
-;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-;; General Public License for more details.
-;;
-;; You should have received a copy of the GNU General Public License
-;; along with GCC; see the file COPYING3.  If not see
-;; .
-
-;; In the absence of any ARMv8-A implementations, two examples derived
-;; from ARM's most recent ARMv7-A cores (Cortex-A7 and Cortex-A15) are
-;; included by way of example.  This is a temporary measure.
-
-;; Example pipeline description for an example 'large' core
-;; implementing AArch64
-
-;;---
-;; General Description
-;;---
-
-(define_automaton "large_cpu")
-
-;; The core is modelled as a triple issue pipeline that has
-;; the following dispatch units.
-;; 1. Two pipelines for simple integer operations: int1, int2
-;; 2. Two pipelines for SIMD and FP data-processing operations: fpsimd1, fpsimd2
-;; 3. One pipeline for branch operations: br
-;; 4. One pipeline for integer multiply and divide operations: multdiv
-;; 5. Two pipelines for load and store operations: ls1, ls2
-;;
-;; We can issue into three pipelines per-cycle.
-;;
-;; We assume that where we have unit pairs xxx1 is always filled before xxx2.
-
-;;---
-;; CPU Units and Reservations
-;;---
-
-;; The three issue units
-(define_cpu_unit "large_cpu_unit_i1, large_cpu_unit_i2, large_cpu_unit_i3" "large_cpu")
-
-(def

[AArch64] [-mtune cleanup 3/5] [Temporary] When asked to tune for Cortex-A57, tune for Cortex-A15

2013-11-13 Thread James Greenhalgh


Hi,

We do not yet have a pipeline model for Cortex-A57. The most sensible
thing we can use to generate pipeline schedules is another "big"-like
processor.

For that we can use the Cortex-A15 model.

Tested in series on aarch64-none-elf with no regressions.

OK?

Thanks,
James

---
gcc/

2013-11-13  James Greenhalgh  

* config/aarch64/aarch64-cores.def (cortex-a57): Tune for cortexa15.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.md: Include cortex-a15 pipeline model.
(generic_sched): "no" if we are tuning for cortexa15.
* config/arm/cortex-a15.md: Include cortex-a15-neon.md by
relative path.
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index c840aa0..1845358 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -35,6 +35,6 @@
therefore serves as a template for adding more CPUs in the future.  */
 
 AARCH64_CORE("cortex-a53",	  cortexa53,	 8,  AARCH64_FL_FPSIMD,generic)
-AARCH64_CORE("cortex-a57",	  cortexa57,	 8,  AARCH64_FL_FPSIMD,generic)
+AARCH64_CORE("cortex-a57",	  cortexa15,	 8,  AARCH64_FL_FPSIMD,generic)
 AARCH64_CORE("example-1",	  large,	 8,  AARCH64_FL_FPSIMD,generic)
 AARCH64_CORE("example-2",	  small,	 8,  AARCH64_FL_FPSIMD,generic)
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index 02699e35c3fca8e00b45347c68e3b17286df721b..1bde99bec57c5defc35d24eb4c141aab70f616d2 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa53,cortexa57,large,small"
+	"cortexa53,cortexa15,large,small"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index d66b41dcb42e458735c907f8e2de9f6ef206ac03..6f828e26c594994701d150396972b2a3dcd9196f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -312,7 +312,7 @@ (define_attr "enabled" "no,yes"
 
 (define_attr "generic_sched" "yes,no"
   (const (if_then_else
-  (eq_attr "tune" "large,small,cortexa53")
+  (eq_attr "tune" "large,small,cortexa53,cortexa15")
   (const_string "no")
   (const_string "yes"))))
 
@@ -320,6 +320,7 @@ (define_attr "generic_sched" "yes,no"
 (include "large.md")
 (include "small.md")
 (include "../arm/cortex-a53.md")
+(include "../arm/cortex-a15.md")
 
 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/arm/cortex-a15.md b/gcc/config/arm/cortex-a15.md
index ccad62076089b5e095f472fdbf298ba7226ae4ec..5a31a097918f7a01b38671416ece350049f00e28 100644
--- a/gcc/config/arm/cortex-a15.md
+++ b/gcc/config/arm/cortex-a15.md
@@ -158,7 +158,7 @@ (define_insn_reservation "cortex_a15_sto
   "ca15_issue2,ca15_ls1+ca15_ls2,ca15_str,ca15_str")
 
 ;; We include Neon.md here to ensure that the branch can block the Neon units.
-(include "cortex-a15-neon.md")
+(include "../arm/cortex-a15-neon.md")
 
 ;; We lie with calls.  They take up all issue slots, and form a block in the
 ;; pipeline.  The result however is available the next cycle.

[AArch64] [-mtune cleanup 2/5] Tune for Cortex-A53 by default.

2013-11-13 Thread James Greenhalgh


Hi,

This patch makes Cortex-A53 the processor we choose to tune for in
the following situations:

  * No -mtune, -mcpu or -march
  * -march=armv8-a
  * -mtune/cpu=generic
  * -mtune/cpu=cortex-a15

That is to say, we will tune for cortex-a53 by default.

This seems the pragmatic choice as we currently only have a
pipeline model implemented for cortex-a53.

Having made this decision, we remove aarch64-generic.md, which
does nothing of interest.

Tested on aarch64-none-elf in series with no issues.

OK?

Thanks,
James

---
gcc/

2013-11-13  James Greenhalgh  

* config/aarch64/aarch64-arches.def (armv8-a): Tune for cortex-a53.
* config/aarch64/aarch64.md: Do not include aarch64-generic.md.
* config/aarch64/aarch64.c (aarch64_tune): Initialize to cortexa53.
(all_cores): Use cortexa53 when tuning for "generic".
(aarch64_override_options): Fix comment.
* config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Set to cortexa53.
* config/aarch64/aarch64-generic.md: Delete.
diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index b66e33e..683c34c 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -26,4 +26,4 @@
this architecture.  ARCH is the architecture revision.  FLAGS are
the flags implied by the architecture.  */
 
-AARCH64_ARCH("armv8-a",	  generic,	 8,  AARCH64_FL_FOR_ARCH8)
+AARCH64_ARCH("armv8-a",	  cortexa53,	 8,  AARCH64_FL_FOR_ARCH8)
diff --git a/gcc/config/aarch64/aarch64-generic.md b/gcc/config/aarch64/aarch64-generic.md
index 12faac84348c72c44c1c144d268ea9751a0665ac... .
--- a/gcc/config/aarch64/aarch64-generic.md
+++ b/gcc/config/aarch64/aarch64-generic.md
@@ -1,40 +0,0 @@
-;; Machine description for AArch64 architecture.
-;; Copyright (C) 2009-2013 Free Software Foundation, Inc.
-;; Contributed by ARM Ltd.
-;;
-;; This file is part of GCC.
-;;
-;; GCC is free software; you can redistribute it and/or modify it
-;; under the terms of the GNU General Public License as published by
-;; the Free Software Foundation; either version 3, or (at your option)
-;; any later version.
-;;
-;; GCC is distributed in the hope that it will be useful, but
-;; WITHOUT ANY WARRANTY; without even the implied warranty of
-;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-;; General Public License for more details.
-;;
-;; You should have received a copy of the GNU General Public License
-;; along with GCC; see the file COPYING3.  If not see
-;; .
-
-;; Generic scheduler
-
-(define_automaton "aarch64")
-
-(define_cpu_unit "core" "aarch64")
-
-(define_attr "is_load" "yes,no"
-  (if_then_else (eq_attr "v8type" "fpsimd_load,fpsimd_load2,load1,load2")
-	(const_string "yes")
-	(const_string "no")))
-
-(define_insn_reservation "load" 2
-  (and (eq_attr "generic_sched" "yes")
-   (eq_attr "is_load" "yes"))
-  "core")
-
-(define_insn_reservation "nonload" 1
-  (and (eq_attr "generic_sched" "yes")
-   (eq_attr "is_load" "no"))
-  "core")
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5fba692..1714f43 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -127,7 +127,7 @@ static bool aarch64_vectorize_vec_perm_const_ok (enum machine_mode vmode,
 		 const unsigned char *sel);
 
 /* The processor for which instructions should be scheduled.  */
-enum aarch64_processor aarch64_tune = generic;
+enum aarch64_processor aarch64_tune = cortexa53;
 
 /* The current tuning set.  */
 const struct tune_params *aarch64_tune_params;
@@ -240,7 +240,7 @@ static const struct processor all_cores[] =
   {NAME, IDENT, #ARCH, FLAGS | AARCH64_FL_FOR_ARCH##ARCH, &COSTS##_tunings},
 #include "aarch64-cores.def"
 #undef AARCH64_CORE
-  {"generic", generic, "8", AARCH64_FL_FPSIMD | AARCH64_FL_FOR_ARCH8, &generic_tunings},
+  {"generic", cortexa53, "8", AARCH64_FL_FPSIMD | AARCH64_FL_FOR_ARCH8, &generic_tunings},
   {NULL, aarch64_none, NULL, 0, NULL}
 };
 
@@ -5182,7 +5182,7 @@ aarch64_override_options (void)
 
   /* If the user did not specify a processor, choose the default
  one for them.  This will be the CPU set during configuration using
- --with-cpu, otherwise it is "generic".  */
+ --with-cpu, otherwise it is "cortex-a53".  */
   if (!selected_cpu)
 {
   selected_cpu = &all_cores[TARGET_CPU_DEFAULT & 0x3f];
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 7a80e96..914d42d 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -468,10 +468,10 @@ enum target_cpus
   TARGET_CPU_generic
 };
 
-/* If there is no CPU defined at configure, use "generic" as default.  */
+/* If there is no CPU defined at configure, use "cortex-a53" as default.  */
 #ifndef TARGET_CPU_DEFAULT
 #define TARGET_CPU_DEFAULT \
-  (TARGET_CPU_generic | (AARCH64_CPU_DEFAULT_FLAGS << 6))
+  (TARGET_CPU_cortexa53 | (AARCH64_CPU_DEFAULT_
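
The TARGET_CPU_DEFAULT expression in the hunk above combines a CPU index
with default flags shifted left by 6, and aarch64_override_options recovers
the index with "& 0x3f".  A minimal sketch of that packing, using made-up
names (sketch_cpu, pack_cpu_default and so on) and an illustrative
enumeration order, might look like this:

```c
/* Illustrative enumeration; the real one is generated from
   aarch64-cores.def and may be ordered differently.  */
enum sketch_cpu
{
  SKETCH_CPU_cortexa53,
  SKETCH_CPU_cortexa57,
  SKETCH_CPU_generic
};

#define SKETCH_CPU_BITS 6
#define SKETCH_CPU_MASK ((1u << SKETCH_CPU_BITS) - 1)  /* 0x3f */

/* Pack a CPU index (low 6 bits) and its default feature flags (the
   bits above) into one value.  */
static unsigned long
pack_cpu_default (enum sketch_cpu cpu, unsigned long flags)
{
  return (unsigned long) cpu | (flags << SKETCH_CPU_BITS);
}

/* Recover the CPU index from the low 6 bits.  */
static enum sketch_cpu
unpack_cpu (unsigned long packed)
{
  return (enum sketch_cpu) (packed & SKETCH_CPU_MASK);
}

/* Recover the feature flags from the remaining bits.  */
static unsigned long
unpack_flags (unsigned long packed)
{
  return packed >> SKETCH_CPU_BITS;
}
```

The 6-bit field limits the scheme to 64 distinct CPU indices.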

[AArch64] [-mtune cleanup 5/5] Update invoke.texi

2013-11-13 Thread James Greenhalgh


Hi,

This patch finishes the series, updating invoke.texi
to reflect the options supported by -mtune and -mcpu.

Thanks,
James

---
gcc/

2013-11-13  James Greenhalgh  

* doc/invoke.texi: Update documentation for AArch64's -mcpu
and -mtune options.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 25e3eb5..25f4983 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11169,9 +11169,9 @@ instead of the @option{-mcpu=} option.
 Specify the name of the target processor, optionally suffixed by one or more
 feature modifiers.  This option has the form
 @option{-mcpu=@var{cpu}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where the
-possible values for @var{cpu} are @samp{generic}, @samp{large}.  The
-possible values for @var{feature} are documented in the sub-section
-below.
+possible values for @var{cpu} are @samp{generic}, @samp{cortex-a53},
+@samp{cortex-a57}.  The possible values for @var{feature} are documented
+in the sub-section below.
 
 Where conflicting feature modifiers are specified, the right-most feature is
 used.
@@ -11184,8 +11184,12 @@ generating assembly code.
 Specify the name of the processor to tune the performance for.  The code will
 be tuned as if the target processor were of the type specified in this option,
 but still using instructions compatible with the target processor specified
-by a @option{-mcpu=} option.  This option cannot be suffixed by feature
-modifiers.
+by a @option{-mcpu=} option.  Where no @option{-mtune=} option is
+specified, the code will be tuned to perform well on the target processor
+given by @option{-mcpu=} or @option{-march=}.  Where none of
+@option{-mtune=}, @option{-mcpu=} or @option{-march=} are specified,
+the code will be tuned to perform well across a range of target
+processors.  This option cannot be suffixed by feature modifiers.
 
 @end table
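
The fallback order described in the new -mtune text can be sketched as a
small selection function.  The function and parameter names are
illustrative; in particular, preferring -mcpu over -march when both are
given is an assumption of this sketch, since the documentation text above
does not spell that case out.

```c
#include <stddef.h>

/* Sketch of the documented fallback.  Each of the first three arguments
   is NULL when the corresponding option was not given; MARCH_TUNE
   stands for the tuning target associated with the -march value.  */
static const char *
select_tune (const char *mtune, const char *mcpu, const char *march_tune,
             const char *default_tune)
{
  if (mtune != NULL)
    return mtune;        /* An explicit -mtune= wins.  */
  if (mcpu != NULL)
    return mcpu;         /* Otherwise tune for the -mcpu= processor.  */
  if (march_tune != NULL)
    return march_tune;   /* Otherwise use the -march= tuning target.  */
  return default_tune;   /* Nothing given: the configured default.  */
}
```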

[AArch64] [-mtune cleanup 1/5] Remove -march=generic.

2013-11-13 Thread James Greenhalgh


Hi,

This option was never documented anywhere, is meaningless
and generates unhelpful code. Better we remove it.

Tested on aarch64-none-elf in series with no regressions.

OK?

Thanks,
James

---
gcc/

2013-11-13  James Greenhalgh  

* config/aarch64/aarch64.c (all_architectures): Remove "generic".
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8458cac..5fba692 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -251,7 +251,6 @@ static const struct processor all_architectures[] =
   {NAME, CORE, #ARCH, FLAGS, NULL},
 #include "aarch64-arches.def"
 #undef AARCH64_ARCH
-  {"generic", generic, "8", AARCH64_FL_FOR_ARCH8, NULL},
   {NULL, aarch64_none, NULL, 0, NULL}
 };

[AArch64] [0/5 -mtune cleanup] Update options for -mtune.

2013-11-13 Thread James Greenhalgh

Hi,

This patch series performs a number of cleanups to the
 -mtune/-mcpu/-march infrastructure for AArch64.

Our goals are:

  * Remove the example pipeline models.
  * Tune for Cortex-A53 by default.
  * Provide sensible tuning for Cortex-A57.

The patches which implement these goals are:

[AArch64] [-mtune cleanup 1/5] Remove -march=generic.
  -march=generic has no sensible meaning - remove it.

[AArch64] [-mtune cleanup 2/5] Tune for Cortex-A53 by default.
  The current "generic" scheduler is not very smart. We would like
  to try to tune for something sensible when given -march=armv8-a.
  As it is currently the only pipeline model we have implemented,
  tuning for the Cortex-A53 seems a pragmatic decision.

[AArch64] [-mtune cleanup 3/5] [Temporary] When asked to tune for
Cortex-A57, tune for Cortex-A15
  Cortex-A57 is a "big" core. We do not yet have a pipeline model
  for Cortex-A57, so it would be sensible in the interim to tune for
  another "big" core.

[AArch64] [-mtune cleanup 4/5] Remove "example-1", "example-2" tuning
options.
  These were transitory options which would give example tunings for
  a "big" core and a "little" core. Now we have "cortex-a57" and
  "cortex-a53" wired up, they are not needed.

[AArch64] [-mtune cleanup 5/5] Update invoke.texi
  Finally, update the documentation for the above changes.

The patch series has been regression tested for aarch64-none-elf
with no issues.

OK?

Thanks,
James

---
gcc/

[AArch64] [-mtune cleanup 1/5] Remove -march=generic.

2013-11-13  James Greenhalgh  

* config/aarch64/aarch64.c (all_architectures): Remove "generic".

[AArch64] [-mtune cleanup 2/5] Tune for Cortex-A53 by default.

2013-11-13  James Greenhalgh  

* config/aarch64/aarch64-arches.def (armv8-a): Tune for cortex-a53.
* config/aarch64/aarch64.md: Do not include aarch64-generic.md.
* config/aarch64/aarch64.c (aarch64_tune): Initialize to cortexa53.
(all_cores): Use cortexa53 when tuning for "generic".
(aarch64_override_options): Fix comment.
* config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Set to cortexa53.
* config/aarch64/aarch64-generic.md: Delete.

[AArch64] [-mtune cleanup 3/5] [Temporary] When asked to tune for
Cortex-A57, tune for Cortex-A15

2013-11-13  James Greenhalgh  

* config/aarch64/aarch64-cores.def (cortex-a57): Tune for cortexa15.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.md: Include cortex-a15 pipeline model.
(generic_sched): "no" if we are tuning for cortexa15.
* config/arm/cortex-a15.md: Include cortex-a15-neon.md by
relative path.

[AArch64] [-mtune cleanup 4/5] Remove "example-1", "example-2" tuning
options.

2013-11-13  James Greenhalgh  

* config/aarch64/aarch64-cores.def (example-1): Remove.
(example-2): Likewise.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.md: Do not include "large.md" or "small.md".
(generic_sched): Remove "large", "small".
* config/aarch64/large.md: Delete.
* config/aarch64/small.md: Delete.

[AArch64] [-mtune cleanup 5/5] Update invoke.texi

2013-11-13  James Greenhalgh  

* doc/invoke.texi: Update documentation for AArch64's -mcpu
and -mtune options.

gcc/testsuite/

[AArch64] [-mtune cleanup 4/5] Remove "example-1", "example-2" tuning
options.

2013-11-13  James Greenhalgh  

* gcc.target/aarch64/cpu-diagnostics-2.c: Change "-mcpu="
to "cortex-a53".
* gcc.target/aarch64/cpu-diagnostics-3.c: Change "-mcpu="
to cortex-a53.

Re: [PATCH, libiberty]: Add a couple of missing casts

2013-11-13 Thread Gary Benson

Richard Biener wrote:
> On Tue, Nov 12, 2013 at 8:55 PM, Ian Lance Taylor  wrote:
> > On Tue, Nov 12, 2013 at 11:24 AM, Uros Bizjak  wrote:
> > >
> > > This was uncovered by x86 lto-profiledbootstrap. The patch allows
> > > lto-profiledbootstrap to proceed further.
> > >
> > > 2013-11-12  Uros Bizjak  
> > >
> > > * cp-demangle.c (d_copy_templates): Cast result of malloc
> > > to (struct d_print_template *).
> > > (d_print_comp): Cast result of realloc to (struct d_saved_scope *).
> > >
> > > Tested on x86_64-pc-linux-gnu.
> > >
> > > OK for mainline?
> >
> > The patch is OK, but this code is troubling.  I obviously should
> > have looked at it earlier.  The C++ demangler is sometimes used in
> > panic situations, when malloc is not available.  The interface was
> > designed to be usable without requiring malloc, by passing in a
> > sufficiently large buffer.  I'm concerned that we apparently now
> > require malloc to work.
> 
> That indeed looks like an important regression - Gary, can you
> please work to fix this?

I'm on it.

Thanks,
Gary

-- 
http://gbenson.net/

[PATCH, testsuite] Add lp64 to target requirements of new IRA shrink wrapping preparation testcases

2013-11-13 Thread Martin Jambor

Hi,

the testcases I have added for IRA shrink wrapping preparation code
were not intended for -m32 on x86_64 and should not be tested with it,
thus I'm adding lp64 to the target requirements.
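
As a rough illustration of what the lp64 requirement filters on (a
sketch only, not the DejaGnu implementation; the helper name
is_lp64_like is made up for this note): the selector is meant to match
data models with a 32-bit int and 64-bit long and pointers, which is
not the case for -m32 on x86_64.

```c
/* Hypothetical helper, not part of the GCC testsuite: returns 1 when
   the current compilation uses an LP64-style data model (32-bit int,
   64-bit long and pointers) and 0 otherwise.  */
int
is_lp64_like (void)
{
  return sizeof (int) == 4
	 && sizeof (long) == 8
	 && sizeof (void *) == 8;
}
```

Built with -m32 on x86_64 this would be expected to return 0, which is
the configuration the testcases are now skipped for.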

Let me also briefly mention that I would like to make the testcases
also run on ppc64 and therefore I did not put them into
gcc.target/i386.

I have tested the changes by running

make -k check RUNTESTFLAGS="dg.exp=ira-shrinkwrap-prep-?.c
--target_board=unix/\{,32\}" and
RUNTESTFLAGS="dg.exp=pr10474.c --target_board=unix/\{,32\}"

and examining the gcc.sum files.  Since this was recommended by Richi
on IRC, I will commit the change in a few minutes.

Thanks,

Martin


2013-11-13  Martin Jambor  

* testsuite/gcc.dg/ira-shrinkwrap-prep-1.c: Add lp64 to target
requirements.
* testsuite/gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
* testsuite/gcc.dg/pr10474.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c 
b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c
index 16095e3..4fc00b2 100644
--- a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c
+++ b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-do compile { target { x86_64-*-* && lp64 } } } */
 /* { dg-options "-O3 -fdump-rtl-ira -fdump-rtl-pro_and_epilogue"  } */
 
 int __attribute__((noinline, noclone))
diff --git a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c 
b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c
index 2b00c5b..bb725e1 100644
--- a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c
+++ b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-2.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-do compile { target { x86_64-*-* && lp64 } } } */
 /* { dg-options "-O3 -fdump-rtl-ira -fdump-rtl-pro_and_epilogue"  } */
 
 int __attribute__((noinline, noclone))
diff --git a/gcc/testsuite/gcc.dg/pr10474.c b/gcc/testsuite/gcc.dg/pr10474.c
index 4fcd75d..08324d8 100644
--- a/gcc/testsuite/gcc.dg/pr10474.c
+++ b/gcc/testsuite/gcc.dg/pr10474.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-do compile { target { x86_64-*-* && lp64 } } } */
 /* { dg-options "-O3 -fdump-rtl-pro_and_epilogue"  } */
 
 void f(int *i)

[testsuite] Fix gcc.dg/atomic/c11-atomic-exec-[45].c tests on Solaris 10+

2013-11-13 Thread Rainer Orth

Two of the new gcc.dg/atomic tests were failing to compile on Solaris 10+:

FAIL: gcc.dg/atomic/c11-atomic-exec-4.c  -O0  (test for excess errors)
Excess errors:
/var/gcc/regression/trunk/10-gcc/build/gcc/include-fixed/sys/feature_tests.h:346:2:
 error: #error "Compiler or options invalid for pre-UNIX 03 X/Open applications 
 and pre-2001 POSIX applications"

WARNING: gcc.dg/atomic/c11-atomic-exec-4.c  -O0  compilation failed to produce 
executable

<sys/feature_tests.h> has

#if defined(_STDC_C99) && (defined(__XOPEN_OR_POSIX) && !defined(_XPG6))
#error "Compiler or options invalid for pre-UNIX 03 X/Open applications \
and pre-2001 POSIX applications"

This doesn't happen on Solaris 9 which lacks C99 support in system
headers.

The following patch fixes this.  Tested by the appropriate runtest
invocation on i386-pc-solaris2.10, ok for mainline?

Rainer


2013-11-13  Rainer Orth  

* gcc.dg/atomic/c11-atomic-exec-4.c: Define _XOPEN_SOURCE=600 on
*-*-solaris2.1[0-9]*.
* gcc.dg/atomic/c11-atomic-exec-5.c: Likewise.

# HG changeset patch
# Parent c636e28757231d8549c20a35a4cd3f94296082f5
Fix gcc.dg/atomic/c11-atomic-exec-[45].c tests on Solaris 10+

diff --git a/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c b/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c
--- a/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c
+++ b/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-4.c
@@ -3,6 +3,7 @@
out in two threads.  */
 /* { dg-do run } */
 /* { dg-options "-std=c11 -pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L" } */
+/* { dg-additional-options "-D_XOPEN_SOURCE=600" { target *-*-solaris2.1[0-9]* } } */
 /* { dg-require-effective-target pthread } */
 
 #include 
diff --git a/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c b/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
--- a/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
+++ b/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
@@ -4,6 +4,7 @@
get properly cleared).  */
 /* { dg-do run } */
 /* { dg-options "-std=c11 -pedantic-errors -pthread -D_POSIX_C_SOURCE=200809L" } */
+/* { dg-additional-options "-D_XOPEN_SOURCE=600" { target *-*-solaris2.1[0-9]* } } */
 /* { dg-require-effective-target fenv_exceptions } */
 /* { dg-require-effective-target pthread } */
 

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: Recent Go patch broke Alpha bootstrap

2013-11-13 Thread Uros Bizjak

On Tue, Nov 12, 2013 at 8:52 AM, Uros Bizjak  wrote:

>>> panic: runtime error: invalid memory address or nil pointer dereference
>>> [signal 0xb code=0x1 addr=0x1c]
>
>>> FAIL: runtime/pprof
>>> gmake[2]: *** [runtime/pprof/check] Error 1
>>>
>>> This one is new, I have to look into it a bit deeper.
>>
>>
>> I don't know what is happening here.  I can't recreate it.  There was
>> a different problem that could arise in runtime/pprof, that was fixed
>> by a patch I submitted on Saturday
>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01016.html).  So it's
>> possible that this is fixed now.
>
> The failure is specific to !USING_SPLIT_STACK targets:

The same error triggered on CentOS 5.10 x86_64 (another
!USING_SPLIT_STACK target) for 32bit lib (net, runtime). The panic:
string is the same, only addr=0x9f. There are also a couple of
segfaults (database/sql, net/http) and abort in sync/atomic.

Uros.

[PATCH] Fix infinite recursion between store_fixed_bit_field/store_split_bit_field with STRICT_ALIGNMENT

2013-11-13 Thread Julian Brown

Hi,

This patch addresses an issue where the compiler gets stuck in an
infinite mutually-recursive loop between store_fixed_bit_field and
store_split_bit_field. This affects versions back at least as far as
4.6 (or so). We observed this happening on PowerPC E500, but other
targets may be affected too. (The symptom is the same as PR 55438, but
the cause is different.)

A small testcase is as follows (compile with a toolchain targeting
"powerpc-linux-gnuspe" and configured with "--with-cpu=8548",
currently requiring minor hacks to work around e.g. libsanitizer
breakage):

#include <stdlib.h>

typedef struct {
  char pad;
  int arr[0];
} __attribute__((packed)) str;

str *
foo (int* src)
{
  str *s = malloc (sizeof (str) + sizeof (int));
  s->arr[0] = 0x12345678;
  return s;
}

$ powerpc-linux-gnuspe-gcc -O2 -c min.c 
(Segfault)

The problem is as follows: in stor-layout.c:compute_record_mode, the
record (struct) "str" is considered to have a single element (the "char
pad"), since only the size is checked and not the elements themselves:
as an optimisation the record as a whole is given the mode of the first
element, since that fits nicely into a machine word and then (the idea
is that) the record can be held in a register. In this case, the mode
given will be QImode.

Now, E500 cores cannot handle misaligned data accesses, at least for
some subset of instructions (STRICT_ALIGNMENT is true on such cores), so
accessing elements of the array "arr" in the packed structure will
typically use read-modify-write operations.

The function expmed.c:store_fixed_bit_field uses get_best_mode to try
to find a suitable mode for that read-modify-write operation: the
mode passed into get_best_mode is taken from op0 (inside the "if (MEM_P
(op0))" clause). Because the record type we are accessing has
QImode, this looks something like:

(mem:QI (reg:SI ...))

Now stor-layout.c:bit_field_iterator::next_mode will reject any mode
which is smaller than the size of the access we want to do (32 bits, or
24 bits after store_split_bit_field has been called once), skipping over
QImode and HImode. The SImode value returned is then rejected in
get_best_mode because it is bigger than largest_mode, which is QImode
(from before), so it returns VOIDmode.
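
The selection logic described above can be modelled roughly as follows.
This is a toy sketch only: the enum and function names are invented for
illustration and are not GCC's real data structures, and the real
get_best_mode also takes alignment and other constraints into account.

```c
/* Toy model: modes are represented by their width in bits, and 0
   stands in for VOIDmode.  */
enum toy_mode { TOY_VOID = 0, TOY_QI = 8, TOY_HI = 16, TOY_SI = 32 };

/* Pick the narrowest mode that is at least ACCESS_BITS wide, but
   refuse anything wider than LARGEST.  */
enum toy_mode
toy_best_mode (unsigned access_bits, enum toy_mode largest)
{
  static const enum toy_mode candidates[] = { TOY_QI, TOY_HI, TOY_SI };
  unsigned i;

  for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
    {
      if ((unsigned) candidates[i] < access_bits)
	continue;		/* Too narrow for the access: skip.  */
      if (candidates[i] > largest)
	return TOY_VOID;	/* Wide enough, but over the limit.  */
      return candidates[i];
    }
  return TOY_VOID;
}
```

With a 32-bit (or 24-bit) access and a QImode limit, every candidate is
either too narrow or too wide, so the model yields the VOIDmode
stand-in, matching the behaviour described above.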

That means that store_split_bit_field is called (from
store_fixed_bit_field), but now the damage has been done: we still have
a MEM for op0, so the "else" clause "word = op0" is executed, and we
recurse back into store_fixed_bit_field at the end of the function, and
we're back where we started -- this leads to infinite recursion between
those two functions, which eventually blows up the stack and crashes
the compiler.

Anyway: the short story is that a record that finishes with a
zero-length array should never be given the mode of its
"only" (non-zero-sized) element to start with. The attached patch stops
that from happening. (A flexible trailing array member, "int arr[];" is
handled correctly -- left as BLKmode -- due to the existing "DECL_SIZE
(field) == 0" check.)
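
For comparison, here is a small sketch of the two spellings side by
side (the type and function names are mine, and the layout values
assume GCC's handling of the packed attribute). At the C level both
structs are expected to look the same: one byte in size, with the array
starting at the misaligned offset 1. The difference discussed above is
only in how the trailing member's size is represented internally.

```c
#include <stddef.h>

/* Same layout as the testcase: a zero-length array (GNU extension)
   following a single char in a packed struct.  */
typedef struct {
  char pad;
  int arr[0];
} __attribute__((packed)) str_zero;

/* The C99 flexible-array-member spelling.  */
typedef struct {
  char pad;
  int arr[];
} __attribute__((packed)) str_flex;

/* Size and array offset of the zero-length-array variant.  */
size_t str_zero_size (void) { return sizeof (str_zero); }
size_t str_zero_arr_offset (void) { return offsetof (str_zero, arr); }

/* Size and array offset of the flexible-array-member variant.  */
size_t str_flex_size (void) { return sizeof (str_flex); }
size_t str_flex_arr_offset (void) { return offsetof (str_flex, arr); }
```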

Tested (gcc/g++/libstdc++) with an E500 cross-compiler as configured
above. The newly-added test fails without the patch, and passes with. OK
to apply, or any comments?

Thanks,

Julian

ChangeLog

gcc/
* stor-layout.c (compute_record_mode): Handle zero-sized array
members.

gcc/testsuite/
* gcc.dg/packed-struct-mode-1.c: New.

Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c	(revision 204674)
+++ gcc/stor-layout.c	(working copy)
@@ -1601,6 +1601,7 @@ compute_record_mode (tree type)
 		   && integer_zerop (TYPE_SIZE (TREE_TYPE (field)
 	  || ! host_integerp (bit_position (field), 1)
 	  || DECL_SIZE (field) == 0
+	  || integer_zerop (DECL_SIZE (field))
 	  || ! host_integerp (DECL_SIZE (field), 1))
 	return;
 
Index: gcc/testsuite/gcc.dg/packed-struct-mode-1.c
===
--- gcc/testsuite/gcc.dg/packed-struct-mode-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/packed-struct-mode-1.c	(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99 -O2" } */
+
+#include <stdlib.h>
+
+typedef struct {
+  char pad;
+  int arr[0];
+} __attribute__((packed)) str;
+
+str *
+foo (int* src)
+{
+  str *s = malloc (sizeof (str) + sizeof (int));
+  s->arr[0] = 0x12345678;
+  return s;
+}

[patch 3/3] Create gimple-iterator.h and gimple-walk.[ch] - config and testsuite

2013-11-13 Thread Andrew MacLeod


And finally the changes to the config and testsuite files.

	* config/aarch64/aarch64-builtins.c: Include gimple-iterator.h.
	* config/rs6000/rs6000.c: Include gimple-iterator.h and gimple-walk.h.
	* testsuite/g++.dg/plugin/selfassign.c: Include gimple-iterator.h.
	* testsuite/gcc.dg/plugin/selfassign.c: Likewise.

Index: config/aarch64/aarch64-builtins.c
===
*** config/aarch64/aarch64-builtins.c	(revision 204721)
--- config/aarch64/aarch64-builtins.c	(working copy)
***
*** 31,36 
--- 31,37 
  #include "diagnostic-core.h"
  #include "optabs.h"
  #include "gimple.h"
+ #include "gimple-iterator.h"
  
  enum aarch64_simd_builtin_type_mode
  {
Index: config/rs6000/rs6000.c
===
*** config/rs6000/rs6000.c	(revision 204721)
--- config/rs6000/rs6000.c	(working copy)
***
*** 52,57 
--- 52,59 
  #include "cfgloop.h"
  #include "sched-int.h"
  #include "gimplify.h"
+ #include "gimple-iterator.h"
+ #include "gimple-walk.h"
  #include "intl.h"
  #include "params.h"
  #include "tm-constrs.h"
Index: testsuite/g++.dg/plugin/selfassign.c
===
*** testsuite/g++.dg/plugin/selfassign.c	(revision 204721)
--- testsuite/g++.dg/plugin/selfassign.c	(working copy)
***
*** 11,16 
--- 11,17 
  #include "toplev.h"
  #include "basic-block.h"
  #include "gimple.h"
+ #include "gimple-iterator.h"
  #include "tree.h"
  #include "tree-pass.h"
  #include "intl.h"
Index: testsuite/gcc.dg/plugin/selfassign.c
===
*** testsuite/gcc.dg/plugin/selfassign.c	(revision 204721)
--- testsuite/gcc.dg/plugin/selfassign.c	(working copy)
***
*** 11,16 
--- 11,17 
  #include "toplev.h"
  #include "basic-block.h"
  #include "gimple.h"
+ #include "gimple-iterator.h"
  #include "tree.h"
  #include "tree-pass.h"
  #include "intl.h"

[patch 2/3] Create gimple-iterator.h and gimple-walk.[ch] - #include changes

2013-11-13 Thread Andrew MacLeod


This has the core compiler files #include changes



	* asan.c: Update Include list as required for gimple-iterator.h and
	gimple-walk.h.
	* cfgexpand.c: Likewise.
	* cfgloop.c: Likewise.
	* cfgloopmanip.c: Likewise.
	* cgraph.c: Likewise.
	* cgraphbuild.c: Likewise.
	* cgraphunit.c: Likewise.
	* gimple-fold.c: Likewise.
	* gimple-low.c: Likewise.
	* gimple-pretty-print.c: Likewise.
	* gimple-ssa-isolate-paths.c: Likewise.
	* gimple-ssa-strength-reduction.c: Likewise.
	* gimple-streamer-in.c: Likewise.
	* gimple-streamer-out.c: Likewise.
	* gimplify.c: Likewise.
	* graphite-blocking.c: Likewise.
	* graphite-clast-to-gimple.c: Likewise.
	* graphite-dependences.c: Likewise.
	* graphite-interchange.c: Likewise.
	* graphite-optimize-isl.c: Likewise.
	* graphite-poly.c: Likewise.
	* graphite-scop-detection.c: Likewise.
	* graphite-sese-to-poly.c: Likewise.
	* graphite.c: Likewise.
	* ipa-inline-analysis.c: Likewise.
	* ipa-profile.c: Likewise.
	* ipa-prop.c: Likewise.
	* ipa-pure-const.c: Likewise.
	* ipa-split.c: Likewise.
	* lto-streamer-in.c: Likewise.
	* lto-streamer-out.c: Likewise.
	* omp-low.c: Likewise.
	* predict.c: Likewise.
	* profile.c: Likewise.
	* sese.c: Likewise.
	* tracer.c: Likewise.
	* trans-mem.c: Likewise.
	* tree-call-cdce.c: Likewise.
	* tree-cfg.c: Likewise.
	* tree-cfgcleanup.c: Likewise.
	* tree-complex.c: Likewise.
	* tree-data-ref.c: Likewise.
	* tree-dfa.c: Likewise.
	* tree-eh.c: Likewise.
	* tree-emutls.c: Likewise.
	* tree-if-conv.c: Likewise.
	* tree-inline.c: Likewise.
	* tree-into-ssa.c: Likewise.
	* tree-loop-distribution.c: Likewise.
	* tree-nested.c: Likewise.
	* tree-nrv.c: Likewise.
	* tree-object-size.c: Likewise.
	* tree-outof-ssa.c: Likewise.
	* tree-parloops.c: Likewise.
	* tree-predcom.c: Likewise.
	* tree-profile.c: Likewise.
	* tree-scalar-evolution.c: Likewise.
	* tree-sra.c: Likewise.
	* tree-ssa-ccp.c: Likewise.
	* tree-ssa-coalesce.c: Likewise.
	* tree-ssa-copy.c: Likewise.
	* tree-ssa-copyrename.c: Likewise.
	* tree-ssa-dce.c: Likewise.
	* tree-ssa-dom.c: Likewise.
	* tree-ssa-dse.c: Likewise.
	* tree-ssa-forwprop.c: Likewise.
	* tree-ssa-ifcombine.c: Likewise.
	* tree-ssa-live.c: Likewise.
	* tree-ssa-loop-ch.c: Likewise.
	* tree-ssa-loop-im.c: Likewise.
	* tree-ssa-loop-ivcanon.c: Likewise.
	* tree-ssa-loop-ivopts.c: Likewise.
	* tree-ssa-loop-manip.c: Likewise.
	* tree-ssa-loop-niter.c: Likewise.
	* tree-ssa-loop-prefetch.c: Likewise.
	* tree-ssa-loop.c: Likewise.
	* tree-ssa-math-opts.c: Likewise.
	* tree-ssa-phiopt.c: Likewise.
	* tree-ssa-phiprop.c: Likewise.
	* tree-ssa-pre.c: Likewise.
	* tree-ssa-propagate.c: Likewise.
	* tree-ssa-reassoc.c: Likewise.
	* tree-ssa-sink.c: Likewise.
	* tree-ssa-strlen.c: Likewise.
	* tree-ssa-structalias.c: Likewise.
	* tree-ssa-tail-merge.c: Likewise.
	* tree-ssa-ter.c: Likewise.
	* tree-ssa-threadedge.c: Likewise.
	* tree-ssa-threadupdate.c: Likewise.
	* tree-ssa-uncprop.c: Likewise.
	* tree-ssa-uninit.c: Likewise.
	* tree-ssa.c: Likewise.
	* tree-stdarg.c: Likewise.
	* tree-switch-conversion.c: Likewise.
	* tree-tailcall.c: Likewise.
	* tree-vect-data-refs.c: Likewise.
	* tree-vect-generic.c: Likewise.
	* tree-vect-loop-manip.c: Likewise.
	* tree-vect-loop.c: Likewise.
	* tree-vect-patterns.c: Likewise.
	* tree-vect-slp.c: Likewise.
	* tree-vect-stmts.c: Likewise.
	* tree-vectorizer.c: Likewise.
	* tree-vrp.c: Likewise.
	* tree.c: Likewise.
	* tsan.c: Likewise.
	* value-prof.c: Likewise.
	* vtable-verify.c: Likewise.

Index: asan.c
===
*** asan.c	(revision 204721)
--- asan.c	(working copy)
*** along with GCC; see the file COPYING3.
*** 24,29 
--- 24,30 
  #include "coretypes.h"
  #include "tree.h"
  #include "gimplify.h"
+ #include "gimple-iterator.h"
  #include "tree-iterator.h"
  #include "cgraph.h"
  #include "tree-ssanames.h"
Index: cfgexpand.c
===
*** cfgexpand.c	(revision 204721)
--- cfgexpand.c	(working copy)
*** along with GCC; see the file COPYING3.
*** 31,36 
--- 31,38 
  #include "langhooks.h"
  #include "bitmap.h"
  #include "gimple.h"
+ #include "gimple-iterator.h"
+ #include "gimple-walk.h"
  #include "gimple-ssa.h"
  #include "cgraph.h"
  #include "tree-cfg.h"
Index: cfgloop.c
===
*** cfgloop.c	(revision 204721)
--- cfgloop.c	(working copy)
*** along with GCC; see the file COPYING3.
*** 29,34 
--- 29,35 
  #include "flags.h"
  #include "tree.h"
  #include "gimple.h"
+ #include "gimple-iterator.h"
  #include "gimple-ssa.h"
  #include "pointer-set.h"
  #include "ggc.h"
Index: cfgloopmanip.c
===
*** cfgloopmanip.c	(revision 204721)
--- cfgloopmanip.c	(working copy)
*** along with GCC; see the file COPYING3.
*** 26,31 
--- 26,32 
  #include "cfgloop.h"
  #include "tree.h"
  #

[PATCH] Fix PR ipa/58862 (overflow in edge_badness computation)

2013-11-13 Thread Teresa Johnson

The following fixes PR ipa/58862, which caused failures in lto
profiledbootstrap and in several spec cpu2006 profile-use builds.

Bootstrapped and tested on x86_64-unknown-linux-gnu. Also ensured that
it fixed the lto profiledbootstrap and cpu2006 failures. Ok for trunk?

Thanks,
Teresa

2013-11-13  Teresa Johnson  

PR ipa/58862
* ipa-inline.c (edge_badness): Fix overflow.

Index: ipa-inline.c
===
--- ipa-inline.c(revision 204703)
+++ ipa-inline.c(working copy)
@@ -909,7 +909,7 @@ edge_badness (struct cgraph_edge *edge, bool dump)
   /* Capping edge->count to max_count. edge->count can be larger than
 max_count if an inline adds new edges which increase max_count
 after max_count is computed.  */
-  int edge_count = edge->count > max_count ? max_count : edge->count;
+  gcov_type edge_count = edge->count > max_count ? max_count : edge->count;

   sreal_init (&relbenefit_real, relbenefit, 0);
   sreal_init (&growth_real, growth, 0);
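
To illustrate why the narrower type is a problem, here is a small
standalone sketch. It assumes gcov_type is a 64-bit signed integer and
int is 32 bits; the names sketch_gcov_type, cap_edge_count and
fits_in_int are invented for this note and do not exist in GCC.

```c
#include <limits.h>
#include <stdint.h>

/* Stand-in for gcov_type, assumed here to be a 64-bit signed integer.  */
typedef int64_t sketch_gcov_type;

/* Mirrors the capping expression from the patch.  */
sketch_gcov_type
cap_edge_count (sketch_gcov_type count, sketch_gcov_type max_count)
{
  return count > max_count ? max_count : count;
}

/* Returns 1 if VALUE can be stored in an 'int' without changing.  */
int
fits_in_int (sketch_gcov_type value)
{
  return value >= INT_MIN && value <= INT_MAX;
}
```

A capped count of, say, 5000000000 is a perfectly valid profile count
but does not fit in a 32-bit int, so assigning it to an int variable
changes its value before it reaches the badness computation.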


-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH] decide edge's hotness when there is profile info

2013-11-13 Thread Teresa Johnson

On Tue, Nov 12, 2013 at 2:35 PM, Jan Hubicka  wrote:
>> On Tue, Nov 12, 2013 at 1:33 PM, Jan Hubicka  wrote:
>> >> More info on the lto bootstrap failure:
>> >>
>> >> /usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c:790:1:
>> >> warning: Missing counts for called function pex_child_error.isra.1/73
>> >> [enabled by default]
>> >>  }
>> >>
>> >> This is an error handling routine that invokes _exit. Looking at the
>> >> ipa-profile dump, I can see that all the counts read in for
>> >> pex_child_error have the value 0. But there is a non-zero callsite,
>> >> that not surprisingly complains of a profile insanity in the bb's
>> >> outgoing edge weights, since pex_child_error never returns:
>> >>
>> >> ;;   basic block 38, loop depth 1, count 192, freq 5000
>> >> ;;   Invalid sum of outgoing counts 0, should be 192
>> >> ...
>> >>   pex_child_error.isra.1D.5005 (_92, executable_40(D),
>> >> [/usr/local/google/home/tejohnson/gcc_trunk_9/libiberty/pex-unix.c :
>> >> 677] "execv", _91);
>> >> ;;succ:   3 [100.0%]  
>> >> (ABNORMAL,DFS_BACK,IRREDUCIBLE_LOOP,EXECUTABLE)
>> >>
>> >> Not sure why the counts were all 0 for this routine though, I wouldn't
>> >> think the early exit should prevent us from updating the counts. But
>> >> IMO the best thing to do here is to issue a warning, since I don't
>> >> want more issues like this to cause compile errors when we handled
>> >> them fine before.
>> >>
>> >> Let me know if the patch below is ok.
>> >
>> > OK, so _exit actually makes us miss streaming of the path leading to the
>> > error message?  Handling of vfork is somewhat broken, as I noticed with
>> > Martin Liska.  One problem is that gcov_exit is not clearing the counts.
>> > With vfork being used in addition to execvs we stream out the data but
>> > the counts stay in the original process memory and then are streamed
>> > again.  If execve fails and leads to _exit I think we can get the
>> > miscompare you see.
>> >
>> > Does calling gcov_clear within gcov_exit help?  (It is what I have in
>> > our tree)
>>
>> I thought this was already handled by the __gcov_execv wrapper around
>> execv which invokes __gcov_flush that in turns does a gcov_exit
>> followed by gcov_clear?
>>
>> I think the issue is that we register gcov_exit with atexit(), but
>> _exit does not execute any atexit functions by definition. So that
>> would explain why the counters from pex_child_error didn't get dumped.
>> The caller bb invokes execv before invoking pex_child_error:
>>   execv (executable, to_ptr32 (argv));
>>   pex_child_error (obj, executable, "execv", errno);
>> So the counters of the caller bb are dumped (hence the caller bb has
>> non-zero counts) and cleared, and the pex_child_error does not dump
>> its counters since it uses _exit instead of exit and therefore doesn't
>> invoke gcov_exit atexit.
>
> Hmm, I see, the problem here is that execv gets BB split (and the profile
> is correct) but before we get into ipa_profile the BBs get re-merged and
> we incorrectly put a count of 1 on pex_child_error.
>
> I suppose this will always happen when you have a function terminating the
> program (execve) before another function call.  Perhaps we can warn only
> when the difference in counts is greater than the number of train runs?

Ok, that sounds good. Here is the new patch. Is this ok for trunk if
testing (bootstrap regression and lto profiledbootstrap) succeeds?
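
For reference, the atexit()/_exit() behaviour discussed in the quoted
text can be demonstrated with a small standalone POSIX sketch. This is
not libgcov code; report_at_exit and handler_ran_in_child are names
invented for the illustration, with the handler playing the role of an
exit-time dump routine.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the pipe used by the handler; illustration only.  */
static int report_fd = -1;

/* Plays the role of an exit-time dump routine registered via atexit().  */
static void
report_at_exit (void)
{
  char mark = 'x';
  if (write (report_fd, &mark, 1) != 1)
    _exit (2);
}

/* Fork a child that registers the handler and then terminates with
   either exit() or _exit().  Returns 1 if the handler ran, 0 if it did
   not, and -1 on a setup failure.  */
int
handler_ran_in_child (int use_underscore_exit)
{
  int fds[2];
  char buf;
  ssize_t n;
  pid_t pid;

  if (pipe (fds) != 0)
    return -1;

  pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  if (pid == 0)
    {
      close (fds[0]);
      report_fd = fds[1];
      if (atexit (report_at_exit) != 0)
	_exit (3);
      if (use_underscore_exit)
	_exit (0);		/* Skips atexit handlers.  */
      exit (0);			/* Runs atexit handlers first.  */
    }

  close (fds[1]);
  n = read (fds[0], &buf, 1);	/* 0 bytes: the handler never ran.  */
  close (fds[0]);
  waitpid (pid, NULL, 0);
  return n == 1 ? 1 : 0;
}
```

The exit() path reports that the handler ran, while the _exit() path
does not, which is consistent with the counters of a routine ending in
_exit never being dumped.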

Thanks,
Teresa

2013-11-13  Teresa Johnson  

* predict.c (drop_profile): Error is currently too strict.
(handle_missing_profiles): Pass call_count to drop_profile.

Index: predict.c
===
--- predict.c   (revision 204704)
+++ predict.c   (working copy)
@@ -2766,12 +2766,17 @@ estimate_loops (void)
 }

 /* Drop the profile for NODE to guessed, and update its frequency based on
-   whether it is expected to be HOT.  */
+   whether it is expected to be hot given the CALL_COUNT.  */

 static void
-drop_profile (struct cgraph_node *node, bool hot)
+drop_profile (struct cgraph_node *node, gcov_type call_count)
 {
   struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+  /* In the case where this was called by another function with a
+ dropped profile, call_count will be 0. Since there are no
+ non-zero call counts to this function, we don't know for sure
+ whether it is hot, and therefore it will be marked normal below.  */
+  bool hot = maybe_hot_count_p (NULL, call_count);

   if (dump_file)
 fprintf (dump_file,
@@ -2781,8 +2786,13 @@ static void
   /* We only expect to miss profiles for functions that are reached
  via non-zero call edges in cases where the function may have
  been linked from another module or library (COMDATs and extern
- templates). See the comments below for handle_missing_profiles.  */
-  if (!DECL_COMDAT (node->decl) && !DECL_EXTERNAL (node->decl))
+ templates). See the comments be

[v3] Missing uglification

2013-11-13 Thread Marc Glisse


Bootstrap and testsuite on x86_64-unknown-linux-gnu.
The main other issue in that PR will require a UDL specialist.

2013-11-13  Marc Glisse  

PR libstdc++/59087
* include/ext/pod_char_traits.h: Uglify V, I and S.

--
Marc Glisse

Index: include/ext/pod_char_traits.h
===
--- include/ext/pod_char_traits.h   (revision 204739)
+++ include/ext/pod_char_traits.h   (working copy)
@@ -38,27 +38,27 @@
 
 namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // POD character abstraction.
   // NB: The char_type parameter is a subset of int_type, as to allow
   // int_type to properly hold the full range of char_type values as
   // well as EOF.
   /// @brief A POD class that serves as a character abstraction class.
-  template<typename V, typename I, typename S>
+  template<typename _Value, typename _Int, typename _St>
 struct character
 {
-  typedef V   value_type;
-  typedef I   int_type;
-  typedef S   state_type;
-  typedef character<V, I, S>   char_type;
+  typedef _Value   value_type;
+  typedef _Int int_type;
+  typedef _St  state_type;
+  typedef character<_Value, _Int, _St> char_type;
 
   value_type   value;
 
   template<typename V2>
 static char_type
 from(const V2& v)
 {
  char_type ret = { static_cast<value_type>(v) };
  return ret;
}
@@ -66,46 +66,48 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename V2>
 static V2
 to(const char_type& c)
 {
  V2 ret = { static_cast<V2>(c.value) };
  return ret;
}
 
 };
 
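-  template<typename V, typename I, typename S>
+  template<typename _Value, typename _Int, typename _St>
 inline bool
-operator==(const character<V, I, S>& lhs, const character<V, I, S>& rhs)
+operator==(const character<_Value, _Int, _St>& lhs,
+  const character<_Value, _Int, _St>& rhs)
 { return lhs.value == rhs.value; }
 
-  template<typename V, typename I, typename S>
+  template<typename _Value, typename _Int, typename _St>
 inline bool
-operator<(const character<V, I, S>& lhs, const character<V, I, S>& rhs)
+operator<(const character<_Value, _Int, _St>& lhs,
+ const character<_Value, _Int, _St>& rhs)
 { return lhs.value < rhs.value; }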
 { return lhs.value < rhs.value; }
 
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// char_traits<__gnu_cxx::character> specialization.
-  template<typename V, typename I, typename S>
-struct char_traits<__gnu_cxx::character<V, I, S> >
+  template<typename _Value, typename _Int, typename _St>
+struct char_traits<__gnu_cxx::character<_Value, _Int, _St> >
 {
-  typedef __gnu_cxx::character<V, I, S>   char_type;
-  typedef typename char_type::int_type int_type;
-  typedef typename char_type::state_type   state_type;
-  typedef fpos<state_type> pos_type;
-  typedef streamoff   off_type;
+  typedef __gnu_cxx::character<_Value, _Int, _St>  char_type;
+  typedef typename char_type::int_type int_type;
+  typedef typename char_type::state_type   state_type;
+  typedef fpos<state_type> pos_type;
+  typedef streamoff   off_type;
 
   static void
   assign(char_type& __c1, const char_type& __c2)
   { __c1 = __c2; }
 
   static bool
   eq(const char_type& __c1, const char_type& __c2)
   { return __c1 == __c2; }
 
   static bool

Re: [PATCH] Add check for aarch64 in vect_cmdline_needed

2013-11-13 Thread Marcus Shawcroft

On 7 November 2013 15:20, Cesar Philippidis  wrote:
> On 11/6/13, 5:06 PM, Joseph S. Myers wrote:
>
>> You should be testing aarch64*-*-* so as to match aarch64_be targets.
>
> Thank you for catching that. Please commit this new patch if is OK. I
> don't have SVN access.

Applied as 204745 thanks.

/Marcus

Re: [buildrobot] [PATCH] c6x.c: `mark_addressable' not declared

2013-11-13 Thread Andrew MacLeod


On 11/13/2013 06:09 AM, Richard Biener wrote:

On Wed, Nov 13, 2013 at 11:18 AM, Jan-Benedict Glaw  wrote:

Hi Andrew!

Some fallout for tic6x-uclinux:

g++ -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual 
-Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. 
-I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/. 
-I/home/jbglaw/repos/gcc/gcc/../include 
-I/home/jbglaw/repos/gcc/gcc/../libcpp/include  
-I/home/jbglaw/repos/gcc/gcc/../libdecnumber 
-I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I/home/jbglaw/repos/gcc/gcc/../libbacktrace -o c6x.o -MT c6x.o -MMD -MP -MF 
./.deps/c6x.TPo /home/jbglaw/repos/gcc/gcc/config/c6x/c6x.c
/home/jbglaw/repos/gcc/gcc/config/c6x/c6x.c: In function ‘bool 
c6x_expand_movmem(rtx_def*, rtx_def*, rtx_def*, rtx_def*, rtx_def*, rtx_def*)’:
/home/jbglaw/repos/gcc/gcc/config/c6x/c6x.c:1723: error: ‘mark_addressable’ was 
not declared in this scope
/home/jbglaw/repos/gcc/gcc/config/c6x/c6x.c:1725: error: ‘mark_addressable’ was 
not declared in this scope
/home/jbglaw/repos/gcc/gcc/config/c6x/c6x.c: In function ‘rtx_def* 
c6x_expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)’:
/home/jbglaw/repos/gcc/gcc/config/c6x/c6x.c:6654: warning: comparison between 
signed and unsigned integer expressions
/home/jbglaw/repos/gcc/gcc/config/c6x/c6x.c:6659: warning: comparison between 
signed and unsigned integer expressions
make[1]: *** [c6x.o] Error 1

(See http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=28623 .
The signed-unsigned is actually old, will not touch it right now.)

This is needed to make it build:


2013-11-13  Jan-Benedict Glaw  

 * config/c6x/c6x.c: Include "gimple-expr.h".

diff --git a/gcc/config/c6x/c6x.c b/gcc/config/c6x/c6x.c
index a7c3683..a37e02f 100644
--- a/gcc/config/c6x/c6x.c
+++ b/gcc/config/c6x/c6x.c
@@ -52,6 +52,7 @@
  #include "hw-doloop.h"
  #include "regrename.h"
  #include "dumpfile.h"
+#include "gimple-expr.h"

  /* Table of supported architecture variants.  */
  typedef struct


Ok?

Ok.

Thanks,
Richard.




Thanks!
Andrew

Re: [patch 3/4] Separate gimple.[ch] and gimplify.[ch] - front end files

2013-11-13 Thread Andrew MacLeod


On 11/13/2013 04:40 AM, Richard Biener wrote:

On Mon, Nov 11, 2013 at 10:07 PM, Andrew MacLeod  wrote:

This one covers the front end files which included gimple.h

Bootstraps on x86_64-unknown-linux-gnu with no new regressions.  OK?

* c-family/c-omp.c: Include gimple-expr.h instead of gimple.h.

can you explain why gimple-expr.h is not as bad as gimple.h?
(gimple-expr.h sounds oxymoronish ... but I didn't follow the thread
that ended up creating this beast, it seems a better matching name
would be gimple-tree.h ... haha).

Otherwise gimple.h -> gimplify.h indeed looks like an improvement.


There needs to be a place for gimple componentry that is not related 
to, and does not require, a statement.  gimple.h is becoming the home 
for just gimple statements.  There are 3 (for the moment) major classes 
of things that are in statements and are also used by other parts of the 
compiler: Types, Decls, and Expressions.  I could have split it into 
those 3 files right now, but it didn't seem like that granularity was 
needed yet.


I was going to call it gimple-decl.h since most of the things are decl 
related... but I figured that eventually there will be all 3 files, and 
it's likely that gimple-expr.h will eventually include gimple-type.h and 
gimple-decl.h, so just include gimple-expr.h now and have less include 
turmoil eventually.  Perhaps that won't be the case, and gimple-decl.h 
may have been a more appropriate name for now.


It's true that gimple-tree would in fact be a more appropriate name at 
the moment, but these gimple-* files are the core ones I'll be changing 
first, so the tree part would no longer be meaningful.  The 'expr' part 
is supposed to represent the abstract purpose: the stuff required to 
represent an expression in gimple IL.  And yes, that is currently a tree :-)


Andrew

Re: Some wide-int review comments

2013-11-13 Thread Richard Sandiford

Kenneth Zadeck  writes:
>>  From fold-const.c:
>>
>> @@ -13686,14 +13548,17 @@ fold_binary_loc (location_t loc,
>>break;
>>  }
>>   
>> -else if (TREE_INT_CST_HIGH (arg1) == signed_max_hi
>> - && TREE_INT_CST_LOW (arg1) == signed_max_lo
>> +else if (wi::eq_p (arg1, signed_max)
>>   && TYPE_UNSIGNED (arg1_type)
>> + /* KENNY QUESTIONS THE CHECKING OF THE BITSIZE
>> +HERE.  HE FEELS THAT THE PRECISION SHOULD BE
>> +CHECKED */
>> +
>>   /* We will flip the signedness of the comparison operator
>>  associated with the mode of arg1, so the sign bit is
>>  specified by this mode.  Check that arg1 is the signed
>>  max associated with this sign bit.  */
>> - && width == GET_MODE_BITSIZE (TYPE_MODE (arg1_type))
>> + && prec == GET_MODE_BITSIZE (TYPE_MODE (arg1_type))
>>   /* signed_type does not work on pointer types.  */
>>   && INTEGRAL_TYPE_P (arg1_type))
>>{
>>
>> Looks like it should be resolved one way or the other before the merge.
> i am the one who asked the question.

OK, but this sort of thing should be handled separately from the wide-int
conversion, e.g. by a question to gcc@ or a patch to gcc-patches@ that
adds a "??? ..."-style comment.

>>  From gcse.c:
>>
>> --- wide-int-base/gcc/gcc/gcse.c 2013-11-05 13:09:32.148376180 +
>> +++ wide-int/gcc/gcc/gcse.c  2013-11-05 13:07:28.431495118 +
>> @@ -1997,6 +1997,13 @@ prune_insertions_deletions (int n_elems)
>>  bitmap_clear_bit (pre_delete_map[i], j);
>>   }
>>   
>> +  if (dump_file)
>> +{
>> +  dump_bitmap_vector (dump_file, "pre_insert_map", "", pre_insert_map, 
>> n_edges);
>> +  dump_bitmap_vector (dump_file, "pre_delete_map", "", pre_delete_map,
>> +   last_basic_block);
>> +}
>> +
>> sbitmap_free (prune_exprs);
>> free (insertions);
>> free (deletions);
>>
>> This doesn't look related.
> when we were doing a/b tests with trunk we added this to trunk also.  This 
> is useful, but it could also be removed.
>
>> 
>>
>>  From lcm.c:
>>
>> diff -udpr '--exclude=.svn' '--exclude=.pc' '--exclude=patches' 
>> wide-int-base/gcc/gcc/lcm.c wide-int/gcc/gcc/lcm.c
>> --- wide-int-base/gcc/gcc/lcm.c  2013-08-22 09:00:23.068716382 +0100
>> +++ wide-int/gcc/gcc/lcm.c   2013-10-26 13:19:16.287277520 +0100
>> @@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.
>>   #include "sbitmap.h"
>>   #include "dumpfile.h"
>>   
>> +#define LCM_DEBUG_INFO 1
>>   /* Edge based LCM routines.  */
>>   static void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, 
>> sbitmap *);
>>   static void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap 
>> *,
>> @@ -106,6 +107,7 @@ compute_antinout_edge (sbitmap *antloc,
>> /* We want a maximal solution, so make an optimistic initialization of
>>ANTIN.  */
>> bitmap_vector_ones (antin, last_basic_block);
>> +  bitmap_vector_clear (antout, last_basic_block);
>>   
>> /* Put every block on the worklist; this is necessary because of the
>>optimistic initialization of ANTIN above.  */
>> @@ -432,6 +434,7 @@ pre_edge_lcm (int n_exprs, sbitmap *tran
>>   
>> /* Allocate an extra element for the exit block in the laterin vector.  
>> */
>> laterin = sbitmap_vector_alloc (last_basic_block + 1, n_exprs);
>> +  bitmap_vector_clear (laterin, last_basic_block);
>> compute_laterin (edge_list, earliest, antloc, later, laterin);
>>   
>>   #ifdef LCM_DEBUG_INFO
> this is less so.  it was added for debugging.  But the problem was 
> that the bitvectors were never initialized.  it could be argued that 
> in correct code this would not matter unless you were actually 
> looking at the values for debugging, which we were doing.  I would 
> certainly say that you could remove the define, but others should 
> comment on removing the clears.

But here too I think this kind of change should be separate from the
wide-int conversion.  If you'd like to see some of these changes made
then please submit a patch against mainline.

>> +/* Returns 1 if OP is an operand that is a CONST_WIDE_INT of mode
>> +   MODE.  This most likely is not as useful as
>> +   const_scalar_int_operand since it does not accept CONST_INTs, but
>> +   is here for consistancy.  */
>> +int
>> +const_wide_int_operand (rtx op, enum machine_mode mode)
>> +{
>> +  if (!CONST_WIDE_INT_P (op))
>> +return 0;
>> +
>> +  return const_scalar_int_operand (op, mode);
>> +}
>>
>> Hmm, but have you found any use cases?  Like you say in the comment,
>> it just seems wrong to require a CONST_WIDE_INT over a CONST_INT,
>> since that's a host-dependent choice.  I think we should drop this
>> from the initial merge.
> no, but i have only converted the ppc to use this so
