Commit: Xstormy16: Add modes to post_inc and pre_dec patterns

2014-02-21 Thread Nick Clifton
Hi Guys,

  I am applying the patch below to add modes to the POST_INC and PRE_DEC
  patterns in the XStormy16 backend.  The lack of the modes was leading
  to some build problems.

Cheers
  Nick

gcc/ChangeLog
2014-02-21  Nick Clifton  ni...@redhat.com

* config/stormy16/stormy16.md (pushqi1): Add mode to post_inc.
(pushhi1): Likewise.
(popqi1): Add mode to pre_dec.
(pophi1): Likewise.

Index: gcc/config/stormy16/stormy16.md
===
--- gcc/config/stormy16/stormy16.md (revision 207983)
+++ gcc/config/stormy16/stormy16.md (working copy)
@@ -114,7 +114,7 @@
 ;; insns like this one are never generated.
 
 (define_insn "pushqi1"
-  [(set (mem:QI (post_inc (reg:HI 15)))
+  [(set (mem:QI (post_inc:HI (reg:HI 15)))
        (match_operand:QI 0 "register_operand" "r"))]
   ""
   "push %0"
@@ -123,7 +123,7 @@
 
 (define_insn "popqi1"
   [(set (match_operand:QI 0 "register_operand" "=r")
-       (mem:QI (pre_dec (reg:HI 15))))]
+       (mem:QI (pre_dec:HI (reg:HI 15))))]
   ""
   "pop %0"
   [(set_attr "psw_operand" "nop")
@@ -168,7 +168,7 @@
    (set_attr "psw_operand" "0,0,0,0,nop,0,nop,0,0")])
 
 (define_insn "pushhi1"
-  [(set (mem:HI (post_inc (reg:HI 15)))
+  [(set (mem:HI (post_inc:HI (reg:HI 15)))
        (match_operand:HI 0 "register_operand" "r"))]
   ""
   "push %0"
@@ -177,7 +177,7 @@
 
 (define_insn "pophi1"
   [(set (match_operand:HI 0 "register_operand" "=r")
-       (mem:HI (pre_dec (reg:HI 15))))]
+       (mem:HI (pre_dec:HI (reg:HI 15))))]
   ""
   "pop %0"
   [(set_attr "psw_operand" "nop")


Re: [PATCH] Fix PR c++/60065.

2014-02-21 Thread Adam Butcher

On 2014-02-20 16:18, Jason Merrill wrote:
> On 02/19/2014 10:00 PM, Adam Butcher wrote:
>> +  if (current_template_parms)
>> +    {
>> +      cp_binding_level *maybe_tmpl_scope = current_binding_level->level_chain;
>> +      while (maybe_tmpl_scope && maybe_tmpl_scope->kind == sk_class)
>> +        maybe_tmpl_scope = maybe_tmpl_scope->level_chain;
>> +      if (maybe_tmpl_scope && maybe_tmpl_scope->kind == sk_template_parms)
>> +        declaring_template_p = true;
>> +    }
>
> Won't this return true for a member function of a class template?  i.e.
>
> template <class T>
> struct A {
>   void f(auto x);
> };

Yes I think you're right.  I was thinking about that yesterday but 
hadn't had a chance to get to my PC to check or post a reply.  The 
intent is to deal with out-of-line implicit member templates.  But I 
think the issue is more complex; and I think it may be true for the 
synthesize code as well as this new code.


A class template with an out-of-line generic function definition will 
give the same issue I think:


  template <typename T>
  void A<T>::f(auto x) {}  // should inject a new list

It needs to know when to extend a function template parameter list and 
when to insert a new one.  Another case:


  struct B
  {
    template <int N>
    void f(auto x);
  };

  template <int N>
  void B::f(auto x) {}  // should extend existing inner list

And also:

  template <typename T>
  struct C
  {
    template <int N>
    void f(auto x);
  };

  template <typename T>
  template <int N>
  void C<T>::f(auto x) {}  // should extend existing inner list

Obviously there can be an arbitrary depth of classes and class templates.

Need to look further into it when I get some more time.


Once it's resolved I think it'd be useful to create a new function to 
determine this rather than doing the scope walk in a number of places.  
Something like 'templ_parm_scope_for_fn_being_declared' --- or hopefully 
some more elegant name!
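
For illustration only, the scope walk from the quoted hunk could be factored
into a helper roughly like the one below.  This is a sketch under stated
assumptions: scope_kind, binding_level and enclosing_template_parm_scope are
simplified, hypothetical stand-ins, not GCC's cp_binding_level or any existing
function.  It also reproduces the limitation discussed above: for a member
function of a class template it finds the class template's parameter list, so
on its own it cannot decide between extending a list and injecting a new one.

```cpp
#include <cassert>

// Hypothetical stand-ins for the front end's scope kinds and binding
// levels; only the fields needed for the walk are modelled.
enum scope_kind { sk_namespace, sk_class, sk_template_parms };

struct binding_level
{
  scope_kind kind;
  const binding_level *level_chain;  // enclosing scope, or nullptr
};

// Walk outwards from SCOPE, skipping class scopes, and return the first
// enclosing template-parameter scope.  Return nullptr if some other kind
// of scope (or the outermost level) is reached first.
const binding_level *
enclosing_template_parm_scope (const binding_level *scope)
{
  while (scope && scope->kind == sk_class)
    scope = scope->level_chain;
  return (scope && scope->kind == sk_template_parms) ? scope : nullptr;
}

// Usage: models `template <class T> struct A { void f(auto x); };`.
// Starting at A's class scope, the walk finds A's own parameter list.
void
demo ()
{
  binding_level ns = { sk_namespace, nullptr };
  binding_level tmpl = { sk_template_parms, &ns };
  binding_level cls = { sk_class, &tmpl };
  assert (enclosing_template_parm_scope (&cls) == &tmpl);
  assert (enclosing_template_parm_scope (&ns) == nullptr);
}
```

Keeping the walk in one place would at least mean that whatever extra check
is eventually needed only has to be added once.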




> Why doesn't num_template_parameter_lists work as a predicate here?

It works in the lambda case as it is updated there, but for generic 
functions I think the following prevents it:


  cp/parser.c:17063:

      /* Inside the function parameter list, surrounding
         template-parameter-lists do not apply.  */
      saved_num_template_parameter_lists
        = parser->num_template_parameter_lists;
      parser->num_template_parameter_lists = 0;

      begin_scope (sk_function_parms, NULL_TREE);

      /* Parse the parameter-declaration-clause.  */
      params = cp_parser_parameter_declaration_clause (parser);

      /* Restore saved template parameter lists accounting for implicit
         template parameters.  */
      parser->num_template_parameter_lists
        += saved_num_template_parameter_lists;


Cheers,
Adam



[PATCH] Bound number of recursive compute_control_dep_chain calls with a param (PR tree-optimization/56490)

2014-02-21 Thread Jakub Jelinek
Hi!

As discussed in the PR, on larger functions a single
compute_control_dep_chain call can trigger over 3 million nested
compute_control_dep_chain calls; on that testcase all that effort
yields zero or at most one (useless) control dep path.
The problem is that the function is effectively unbounded, even with the
6 element path length limit (recursion depth) and the limit of 8
find_pdom calls: everything still iterates over all the successor edges at
each level.  The function is also often called on the same basic block
again and again, even at a particular depth level (e.g. over 20 times for
the same bb at the same depth level).  But the preceding edge list is
slightly different in each case, so in theory it could give different answers.

Fixed by bounding the total number of nested calls.
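
To illustrate the idea outside of GCC, here is a minimal sketch of a
depth-limited recursive search that additionally shares one call counter
across the whole recursion.  The graph type, find_path, max_depth and the
max_calls argument are made up for this example; only the shape of the
num_calls check mirrors the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical graph: succs[n] lists the successors of node n.
typedef std::vector<std::vector<int> > graph;

const int max_depth = 6;  // per-path depth limit (assumed value)

// Depth-limited search for a path from FROM to TO.  *NUM_CALLS counts every
// invocation across the whole search; once it exceeds MAX_CALLS, further
// calls give up immediately and report failure.
bool
find_path (const graph &succs, int from, int to, int depth,
           int *num_calls, int max_calls)
{
  if (*num_calls > max_calls)
    return false;
  ++*num_calls;

  if (from == to)
    return true;
  if (depth >= max_depth)
    return false;
  for (int next : succs[from])
    if (find_path (succs, next, to, depth + 1, num_calls, max_calls))
      return true;
  return false;
}

// Usage: a small layered graph in which node 7 is unreachable from node 0.
void
demo ()
{
  graph g = { {1, 2}, {3, 4}, {3, 4}, {5, 6}, {5, 6}, {}, {}, {} };

  int calls = 0;
  assert (!find_path (g, 0, 7, 0, &calls, 1000));
  assert (calls == 15);   // every path is explored

  calls = 0;
  assert (!find_path (g, 0, 7, 0, &calls, 5));
  assert (calls == 6);    // the shared budget cuts the search short
}
```

The depth limit alone only bounds the length of each path; the shared counter
is what bounds the total work when many paths pass through the same nodes.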

Additionally, I've made a couple of cleanups: heap allocating an 8 element
array instead of using an automatic array makes no sense, the chain length
is at most 6 and thus we can use a stack vector, etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-02-21  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/56490
* params.def (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS): New param.
* tree-ssa-uninit.c: Include params.h.
(compute_control_dep_chain): Add num_calls argument, return false
if it exceeds PARAM_UNINIT_CONTROL_DEP_ATTEMPTS param, pass
num_calls to recursive call.
(find_predicates): Change dep_chain into normal array,
cur_chain into auto_vec<edge, MAX_CHAIN_LEN + 1>, add num_calls
variable and adjust compute_control_dep_chain caller.
(find_def_preds): Likewise.

--- gcc/params.def.jj   2014-01-09 19:09:47.0 +0100
+++ gcc/params.def  2014-02-20 19:30:37.467597338 +0100
@@ -1078,6 +1078,12 @@ DEFPARAM (PARAM_ASAN_USE_AFTER_RETURN,
          "asan-use-after-return",
          "Enable asan builtin functions protection",
          1, 0, 1)
+
+DEFPARAM (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS,
+         "uninit-control-dep-attempts",
+         "Maximum number of nested calls to search for control dependencies "
+         "during uninitialized variable analysis",
+         1000, 1, 0)
 /*
 
 Local variables:
--- gcc/tree-ssa-uninit.c.jj   2014-02-04 01:35:58.0 +0100
+++ gcc/tree-ssa-uninit.c   2014-02-20 19:31:14.198385817 +0100
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.
 #include "hashtab.h"
 #include "tree-pass.h"
 #include "diagnostic-core.h"
+#include "params.h"
 
 /* This implements the pass that does predicate aware warning on uses of
possibly uninitialized variables. The pass first collects the set of
@@ -390,8 +391,8 @@ find_control_equiv_block (basic_block bb
 
 /* Computes the control dependence chains (paths of edges)
for DEP_BB up to the dominating basic block BB (the head node of a
-   chain should be dominated by it).  CD_CHAINS is pointer to a
-   dynamic array holding the result chains. CUR_CD_CHAIN is the current
+   chain should be dominated by it).  CD_CHAINS is pointer to an
+   array holding the result chains.  CUR_CD_CHAIN is the current
chain being computed.  *NUM_CHAINS is total number of chains.  The
function returns true if the information is successfully computed,
return false if there is no control dependence or not computed.  */
@@ -400,7 +401,8 @@ static bool
 compute_control_dep_chain (basic_block bb, basic_block dep_bb,
                            vec<edge> *cd_chains,
                            size_t *num_chains,
-                           vec<edge> *cur_cd_chain)
+                           vec<edge> *cur_cd_chain,
+                           int *num_calls)
 {
   edge_iterator ei;
   edge e;
@@ -411,6 +413,10 @@ compute_control_dep_chain (basic_block b
   if (EDGE_COUNT (bb->succs) < 2)
     return false;
 
+  if (*num_calls > PARAM_VALUE (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS))
+    return false;
+  ++*num_calls;
+
   /* Could use a set instead.  */
   cur_chain_len = cur_cd_chain->length ();
   if (cur_chain_len > MAX_CHAIN_LEN)
@@ -450,7 +456,7 @@ compute_control_dep_chain (basic_block b
 
   /* Now check if DEP_BB is indirectly control dependent on BB.  */
   if (compute_control_dep_chain (cd_bb, dep_bb, cd_chains,
- num_chains, cur_cd_chain))
+num_chains, cur_cd_chain, num_calls))
 {
   found_cd_chain = true;
   break;
@@ -595,14 +601,12 @@ find_predicates (pred_chain_union *preds
  basic_block use_bb)
 {
   size_t num_chains = 0, i;
-  vec<edge> *dep_chains = 0;
-  vec<edge> cur_chain = vNULL;
+  int num_calls = 0;
+  vec<edge> dep_chains[MAX_NUM_CHAINS];
+  auto_vec<edge, MAX_CHAIN_LEN + 1> cur_chain;
   bool has_valid_pred = false;
   basic_block cd_root = 0;
 
-  typedef vec<edge> vec_edge_heap;
-  dep_chains = XCNEWVEC (vec_edge_heap, MAX_NUM_CHAINS);
-
   /* First find the closest bb that is control equivalent to 

Re: [RFA/dwarf v2] Add DW_AT_GNAT_use_descriptive_type flag for Ada units.

2014-02-21 Thread Joel Brobecker
Hello,

Would anyone be able to (re-)approve this patch, please?

It should be really really straightforward, and only adds a DWARF
flag to Ada Compilation Units, so I should think that the risk
is near zero. I've tested the patch as usual regardless.

Parallel to that, we have also started working on producing
standard DWARF in place of our encodings, and small progress has been
made. But this is even more of a huge task than we thought, and in
the meantime, this little flag will help non-AdaCore users...

Thank you!

On Fri, Jan 31, 2014 at 09:09:05AM +0400, Joel Brobecker wrote:
> On Tue, Feb 19, 2013 at 10:50:46PM -0500, Jason Merrill wrote:
> > On 02/19/2013 10:42 PM, Joel Brobecker wrote:
> > > This is useful when a DIE does not have a descriptive type attribute.
> > > In that case, the debugger needs to determine whether the unit
> > > was compiled with a compiler that normally provides that information,
> > > or not.
> >
> > Ah.  OK, then.  But I'd prefer to call it
> > DW_AT_GNAT_use_descriptive_type, to follow the convention of keeping
> > the vendor tag at the beginning of the name.
>
> Almost a year ago, you privately approved a small patch of mine,
> with the small request above. I'm sorry I let it drag so long!
> Here is the updated patch.
>
> include/ChangeLog:
>
> * dwarf2.def: Rename DW_AT_use_GNAT_descriptive_type into
> DW_AT_GNAT_use_descriptive_type.
>
> gcc/ChangeLog:
>
> * dwarf2out.c (gen_compile_unit_die): Add
> DW_AT_GNAT_use_descriptive_type attribute for Ada units.
>
> Tested on x86_64-linux.
>
> I should also adjust the Wiki page accordingly, but the login process
> keeps timing out. I know I have the right login and passwd since
> I successfully reset them using the passwd recovery procedure, just
> in case the error was due to bad credentials. I'll try again later.
>
> If approved, I will also take care of coordinating the dwarf2.def
> change with binutils-gdb.git.
>
> Is this patch still OK to commit?
>
> Thank you,
> -- 
> Joel

 From 7aae3721addf6905113d9f0287a5cbb5301a462b Mon Sep 17 00:00:00 2001
 From: Joel Brobecker brobec...@adacore.com
 Date: Thu, 3 Jan 2013 09:25:12 -0500
 Subject: [PATCH] [dwarf] Add DW_AT_GNAT_use_descriptive_type flag for Ada 
 units.
 
 This patch first renames the DW_AT_use_GNAT_descriptive_type DWARF
 attribute into DW_AT_GNAT_use_descriptive_type to better follow
 the usual convention of keeping the vendor tag at the beginning
 of the name.
 
 It then modifies dwarf2out to generate this attribute for Ada units.
 
 include/ChangeLog:
 
 * dwarf2.def: Rename DW_AT_use_GNAT_descriptive_type into
 DW_AT_GNAT_use_descriptive_type.
 
 gcc/ChangeLog:
 
 * dwarf2out.c (gen_compile_unit_die): Add
 DW_AT_GNAT_use_descriptive_type attribute for Ada units.
 ---
  gcc/dwarf2out.c|4 
  include/dwarf2.def |2 +-
  2 files changed, 5 insertions(+), 1 deletions(-)
 
 diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
 index d1ca4ba..057605c 100644
 --- a/gcc/dwarf2out.c
 +++ b/gcc/dwarf2out.c
 @@ -19318,6 +19318,10 @@ gen_compile_unit_die (const char *filename)
/* The default DW_ID_case_sensitive doesn't need to be specified.  */
break;
  }
 +
 +  if (language == DW_LANG_Ada95)
 +add_AT_flag (die, DW_AT_GNAT_use_descriptive_type, 1);
 +
return die;
  }
  
 diff --git a/include/dwarf2.def b/include/dwarf2.def
 index 71a37b3..4dd636e 100644
 --- a/include/dwarf2.def
 +++ b/include/dwarf2.def
 @@ -398,7 +398,7 @@ DW_AT (DW_AT_VMS_rtnbeg_pd_address, 0x2201)
  /* GNAT extensions.  */
  /* GNAT descriptive type.
 See http://gcc.gnu.org/wiki/DW_AT_GNAT_descriptive_type .  */
 -DW_AT (DW_AT_use_GNAT_descriptive_type, 0x2301)
 +DW_AT (DW_AT_GNAT_use_descriptive_type, 0x2301)
  DW_AT (DW_AT_GNAT_descriptive_type, 0x2302)
  /* UPC extension.  */
  DW_AT (DW_AT_upc_threads_scaled, 0x3210)
 -- 
 1.7.0.4
 


-- 
Joel


Re: [Patch, Fortran, OOP, Regression] PR 60234: ICE in generate_finalization_wrapper at fortran/class.c:1883

2014-02-21 Thread Janus Weil
2014-02-21 8:25 GMT+01:00 Tobias Burnus bur...@net-b.de:
> Hi Janus,
>
> Janus Weil wrote:
>>
>> What the patch does is to defer the building of the vtabs to a later
>> stage. Previously this was done only for some rare cases, now we do it
>> basically for all vtabs. This is necessary with finalization, since
>> building the vtab also implies building the finalization wrapper, for
>> which it is necessary that the finalizers have been resolved.
>>
>> Anyway, the patch regtests cleanly on x86_64-unknown-linux-gnu. Ok for
>> trunk?
>
> Looks good to me.
>
> Does
>
>   comp_is_finalizable (gfc_component *comp)
>   {
>  -  if (comp->attr.allocatable && comp->ts.type != BT_CLASS)
>  +  if (comp->attr.proc_pointer)
>  +    return false;
>  +  else if (comp->attr.allocatable && comp->ts.type != BT_CLASS)
>
> fix another PR - or did you just spot it when looking at the code? It is
> certainly simple, correct and should go in.

This became necessary after the vtab changes (although I don't
remember which test case triggered it). comp_is_finalizable is called
(more or less directly) from generate_finalization_wrapper. Since the
latter was called too early, the problem with PPCs was not triggered
previously, it seems.

I have committed the patch as r207986. Thanks for the review!

Cheers,
Janus



>> 2014-02-20  Janus Weil  ja...@gcc.gnu.org
>>
>>  PR fortran/60234
>>  * gfortran.h (gfc_build_class_symbol): Removed argument.
>>  * class.c (gfc_add_component_ref): Fix up missing vtype if necessary.
>>  (gfc_build_class_symbol): Remove argument 'delayed_vtab'. vtab is always
>>  delayed now, except for unlimited polymorphics.
>>  (comp_is_finalizable): Procedure pointer components are not finalizable.
>>  * decl.c (build_sym, build_struct, attr_decl1): Removed argument of
>>  'gfc_build_class_symbol'.
>>  * match.c (copy_ts_from_selector_to_associate, select_type_set_tmp):
>>  Ditto.
>>  * symbol.c (gfc_set_default_type): Ditto.
>>
>>
>> 2014-02-20  Janus Weil  ja...@gcc.gnu.org
>>
>>  PR fortran/60234
>>  * gfortran.dg/finalize_23.f90: New.


Re: [PATCH] Bound number of recursive compute_control_dep_chain calls with a param (PR tree-optimization/56490)

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Jakub Jelinek wrote:

> Hi!
>
> As discussed in the PR, on larger functions a single
> compute_control_dep_chain call can trigger over 3 million nested
> compute_control_dep_chain calls; on that testcase all that effort
> yields zero or at most one (useless) control dep path.
> The problem is that the function is effectively unbounded, even with the
> 6 element path length limit (recursion depth) and the limit of 8
> find_pdom calls: everything still iterates over all the successor edges at
> each level.  The function is also often called on the same basic block
> again and again, even at a particular depth level (e.g. over 20 times for
> the same bb at the same depth level).  But the preceding edge list is
> slightly different in each case, so in theory it could give different answers.
>
> Fixed by bounding the total number of nested calls.
>
> Additionally, I've made a couple of cleanups: heap allocating an 8 element
> array instead of using an automatic array makes no sense, the chain length
> is at most 6 and thus we can use a stack vector, etc.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.


Re: PING: Fwd: Re: [patch] implement Cilk Plus simd loops on trunk

2014-02-21 Thread Thomas Schwinge
Hi!

On Fri, 15 Nov 2013 14:44:45 -0700, Aldy Hernandez al...@redhat.com wrote:
> Attached is the final version of the patch I have committed to trunk.

> --- a/gcc/gimple-pretty-print.c
> +++ b/gcc/gimple-pretty-print.c
> @@ -1118,6 +1118,8 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
>    case GF_OMP_FOR_KIND_SIMD:
>      kind = " simd";
>      break;
> +  case GF_OMP_FOR_KIND_CILKSIMD:
> +    kind = " cilksimd";
>    case GF_OMP_FOR_KIND_DISTRIBUTE:
>      kind = " distribute";
>      break;

Fixed (untested, but obvious) in r207987:

commit b12563e00026b48b817fd3532fc3df2db2a0f460
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Fri Feb 21 09:18:15 2014 +

Correct TDF_RAW pretty-printing of GIMPLE_OMP_FOR's
GF_OMP_FOR_KIND_CILKSIMD.

gcc/
* gimple-pretty-print.c (dump_gimple_omp_for) [flags & TDF_RAW]
<case GF_OMP_FOR_KIND_CILKSIMD>: Add missing break statement.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@207987 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git gcc/ChangeLog gcc/ChangeLog
index 67299af..cc9031b 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-21  Thomas Schwinge  tho...@codesourcery.com
+
+   * gimple-pretty-print.c (dump_gimple_omp_for) [flags & TDF_RAW]
+   <case GF_OMP_FOR_KIND_CILKSIMD>: Add missing break statement.
+
 2014-02-21  Nick Clifton  ni...@redhat.com
 
* config/stormy16/stormy16.md (pushdqi1): Add mode to post_inc.
diff --git gcc/gimple-pretty-print.c gcc/gimple-pretty-print.c
index 2d1e1c7..741cd92 100644
--- gcc/gimple-pretty-print.c
+++ gcc/gimple-pretty-print.c
@@ -1121,6 +1121,7 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple gs, int spc, int flags)
  break;
case GF_OMP_FOR_KIND_CILKSIMD:
  kind = " cilksimd";
+ break;
case GF_OMP_FOR_KIND_DISTRIBUTE:
  kind = " distribute";
  break;
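
For readers not familiar with the failure mode, the following standalone
sketch shows what a missing break does in a switch of this shape.  The
loop_kind enum and both functions are hypothetical stand-ins written for this
example; they are not the GF_OMP_FOR_KIND_* values or the code in
gimple-pretty-print.c.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the loop kinds.
enum loop_kind { KIND_SIMD, KIND_CILKSIMD, KIND_DISTRIBUTE };

// Shape of the code before the fix: the CILKSIMD case has no break, so
// control falls into the next case and the name is overwritten.
std::string
kind_name_before_fix (loop_kind k)
{
  std::string kind;
  switch (k)
    {
    case KIND_SIMD:
      kind = " simd";
      break;
    case KIND_CILKSIMD:
      kind = " cilksimd";
      /* falls through */
    case KIND_DISTRIBUTE:
      kind = " distribute";
      break;
    }
  return kind;
}

// Shape of the code after the fix: each case ends with its own break.
std::string
kind_name_after_fix (loop_kind k)
{
  std::string kind;
  switch (k)
    {
    case KIND_SIMD:
      kind = " simd";
      break;
    case KIND_CILKSIMD:
      kind = " cilksimd";
      break;
    case KIND_DISTRIBUTE:
      kind = " distribute";
      break;
    }
  return kind;
}

// Usage: only the CILKSIMD kind is affected by the missing break.
void
demo ()
{
  assert (kind_name_before_fix (KIND_CILKSIMD) == " distribute");
  assert (kind_name_after_fix (KIND_CILKSIMD) == " cilksimd");
  assert (kind_name_before_fix (KIND_SIMD) == kind_name_after_fix (KIND_SIMD));
}
```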


Grüße,
 Thomas




[PATCH][1/n] Improve PR60291

2014-02-21 Thread Richard Biener

This improves compile-time of PR60291 at -O1 from 210s to 85s,
getting remove unused locals out of the profile.  There, walking
DECL_INITIAL of globals is quadratic when a global is referred to from
multiple functions.  We had the same issue with
add_referenced_vars when that still existed.

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk
and branch?

I've verified that I can still properly debug

int **
foo (void)
{
  static int a = 0;
  static int *b = &a;
  static int **c = &b;
  return c;
}
int main()
{
  return **foo();
}

(step into foo and print a, b and c).  Note that even with 4.8
right now

int **
foo (void)
{
  int **q;
    {
      static int a = 0;
      static int *b = &a;
      static int **c = &b;
      q = c;
    }
  return q;
}
int main()
{
  return **foo();
}

is broken (with -O1 -fno-inline; with inlining both cases are
broken).  But none of that regresses with the following patch, and
if we fix it then we should fix it another way, not by walking
global initializers.

Thanks,
Richard.

2014-02-21  Richard Biener  rguent...@suse.de

PR middle-end/60291
* tree-ssa-live.c (mark_all_vars_used_1): Do not walk
DECL_INITIAL.

Index: gcc/tree-ssa-live.c
===
*** gcc/tree-ssa-live.c (revision 207960)
--- gcc/tree-ssa-live.c (working copy)
*** mark_all_vars_used_1 (tree *tp, int *wal
*** 432,443 
/* Only need to mark VAR_DECLS; parameters and return results are not
   eliminated as unused.  */
if (TREE_CODE (t) == VAR_DECL)
! {
!   /* When a global var becomes used for the first time also walk its
!  initializer (non global ones don't have any).  */
!   if (set_is_used (t) && is_global_var (t))
!   mark_all_vars_used (DECL_INITIAL (t));
! }
/* remove_unused_scope_block_p requires information about labels
   which are not DECL_IGNORED_P to tell if they might be used in the IL.  */
else if (TREE_CODE (t) == LABEL_DECL)
--- 432,438 
/* Only need to mark VAR_DECLS; parameters and return results are not
   eliminated as unused.  */
if (TREE_CODE (t) == VAR_DECL)
! set_is_used (t);
/* remove_unused_scope_block_p requires information about labels
   which are not DECL_IGNORED_P to tell if they might be used in the IL.  */
else if (TREE_CODE (t) == LABEL_DECL)


[PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener

This fixes the slowness of RTL expansion in PR60291 which is caused
by excessive collisions in mem-attr sharing.  The issue here is
that sharing attempts happen across functions and we have a _lot_
of functions in this testcase referencing the same lexically
equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
means those get the same hash value.  But they don't compare
equal because an SSA name _5 from function A is of course not equal
to one from function B.

The following fixes that by not doing mem-attr sharing across functions
by clearing the mem-attrs hashtable in rest_of_clean_state.
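
The effect can be reproduced with a toy model.  The sketch below is only an
illustration under stated assumptions: attrs, attrs_table and
count_comparisons are hypothetical and much simpler than the real mem-attrs
hash table, but they show how entries that hash alike yet never compare equal
make each lookup scan everything inserted so far, and how emptying the table
after each function avoids that.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a shared attribute keyed by an SSA name: only
// the version number is hashed, but the owning function decides equality.
struct attrs
{
  int function_id;
  int ssa_version;
};

// Minimal chained hash table that counts equality comparisons.
struct attrs_table
{
  std::vector<std::vector<attrs> > buckets;
  std::size_t comparisons;

  attrs_table () : buckets (64), comparisons (0) {}

  // Insert A unless an equal entry already exists (the sharing lookup).
  void find_or_insert (const attrs &a)
  {
    std::vector<attrs> &chain = buckets[a.ssa_version % buckets.size ()];
    for (const attrs &e : chain)
      {
        ++comparisons;
        if (e.function_id == a.function_id && e.ssa_version == a.ssa_version)
          return;
      }
    chain.push_back (a);
  }

  // Drop all entries but keep the comparison count.
  void clear ()
  {
    for (std::vector<attrs> &chain : buckets)
      chain.clear ();
  }
};

// Insert the entry for SSA version 5 once per function, optionally emptying
// the table after each function, and return the number of comparisons.
std::size_t
count_comparisons (int n_functions, bool clear_between_functions)
{
  attrs_table table;
  for (int fn = 0; fn < n_functions; ++fn)
    {
      attrs a = { fn, 5 };
      table.find_or_insert (a);
      if (clear_between_functions)
        table.clear ();
    }
  return table.comparisons;
}

// Usage: 100 functions give 0 + 1 + ... + 99 comparisons without clearing.
void
demo ()
{
  assert (count_comparisons (100, false) == 4950);
  assert (count_comparisons (100, true) == 0);
}
```

Without clearing, the comparison count grows quadratically with the number of
functions; with clearing it stays flat in this model.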

Another fix may be to do what the comment in iterative_hash_expr
says for SSA names:

case SSA_NAME:
  /* We can just compare by pointer.  */
  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);

(probably blame me for changing that to hashing the SSA version).  But
I'm not sure that doesn't uncover issues with various hashtables and
walking them, generating code dependent on the order.  It's IMHO just not
expected that you compare function-local expressions from different
functions.

The other thing would be to discard mem-attr sharing altogether,
but that doesn't seem appropriate at this stage (but it would
also simplify quite some code).  With only one function in RTL
at a time that shouldn't be too bad (see several suggestions
along that line, even with statistics).

Bootstrap / regtest running on x86_64-unknown-linux-gnu, ok for
trunk and 4.8 branch?

Thanks,
Richard.

2014-02-21  Richard Biener  rguent...@suse.de

PR middle-end/60291
* rtl.h (clear_mem_attrs_htab): Declare.
* emit-rtl.c (clear_mem_attrs_htab): New function.
* final.c (rest_of_clean_state): Call clear_mem_attrs_htab
to avoid sharing mem-attrs between functions.

Index: gcc/rtl.h
===
*** gcc/rtl.h   (revision 207960)
--- gcc/rtl.h   (working copy)
*** extern int in_sequence_p (void);
*** 2546,2551 
--- 2546,2552 
  extern void init_emit (void);
  extern void init_emit_regs (void);
  extern void init_emit_once (void);
+ extern void clear_mem_attrs_htab (void);
  extern void push_topmost_sequence (void);
  extern void pop_topmost_sequence (void);
  extern void set_new_first_and_last_insn (rtx, rtx);
Index: gcc/emit-rtl.c
===
*** gcc/emit-rtl.c  (revision 207960)
--- gcc/emit-rtl.c  (working copy)
*** init_emit_once (void)
*** 5913,5918 
--- 5913,5926 
simple_return_rtx = gen_rtx_fmt_ (SIMPLE_RETURN, VOIDmode);
cc0_rtx = gen_rtx_fmt_ (CC0, VOIDmode);
  }
+ 
+ /* Clear the mem-attrs sharing hash table.  */
+ 
+ void
+ clear_mem_attrs_htab (void)
+ {
+   htab_empty (mem_attrs_htab);
+ }
  
  /* Produce exact duplicate of insn INSN after AFTER.
 Care updating of libcall regions if present.  */
Index: gcc/final.c
===
*** gcc/final.c (revision 207960)
--- gcc/final.c (working copy)
*** rest_of_clean_state (void)
*** 4678,4683 
--- 4678,4686 
  
init_recog_no_volatile ();
  
+   /* Reset mem-attrs sharing.  */
+   clear_mem_attrs_htab ();
+ 
/* We're done with this function.  Free up memory if we can.  */
free_after_parsing (cfun);
free_after_compilation (cfun);


Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Biener wrote:

 
> This fixes the slowness of RTL expansion in PR60291 which is caused
> by excessive collisions in mem-attr sharing.  The issue here is
> that sharing attempts happen across functions and we have a _lot_
> of functions in this testcase referencing the same lexically
> equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
> means those get the same hash value.  But they don't compare
> equal because an SSA name _5 from function A is of course not equal
> to one from function B.
>
> The following fixes that by not doing mem-attr sharing across functions
> by clearing the mem-attrs hashtable in rest_of_clean_state.
>
> Another fix may be to do what the comment in iterative_hash_expr
> says for SSA names:
>
> case SSA_NAME:
>   /* We can just compare by pointer.  */
>   return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
>
> (probably blame me for changing that to hashing the SSA version).

It was lxo.

> But I'm not sure that doesn't uncover issues with various hashtables and
> walking them, generating code dependent on the order.  It's IMHO just not
> expected that you compare function-local expressions from different
> functions.

Same speedup result from

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 207960)
+++ gcc/tree.c  (working copy)
@@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
   }
 case SSA_NAME:
   /* We can just compare by pointer.  */
-  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
+  return iterative_hash_hashval_t ((uintptr_t)t>>3, val);
 case PLACEHOLDER_EXPR:
   /* The node itself doesn't matter.  */
   return val;

and from

Index: gcc/tree.c
===
--- gcc/tree.c  (revision 207960)
+++ gcc/tree.c  (working copy)
@@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
   }
 case SSA_NAME:
   /* We can just compare by pointer.  */
-  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
+  return iterative_hash_host_wide_int
+ (DECL_UID (cfun->decl),
+  iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
 case PLACEHOLDER_EXPR:
   /* The node itself doesn't matter.  */
   return val;

The latter is better than hashing pointers, but requiring cfun != NULL
in this function isn't good either.

At this point I'm more comfortable with clearing the hashtable
than with changing iterative_hash_expr in any way.  It's also
along the way to get rid of the hash completely.

Oh, btw, the speedup is going from

 expand  : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) wall  293891 kB (15%) ggc

to

 expand  :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) wall  262544 kB (13%) ggc

at -O0 (less dramatic slowness for -On).

> The other thing would be to discard mem-attr sharing altogether,
> but that doesn't seem appropriate at this stage (but it would
> also simplify quite some code).  With only one function in RTL
> at a time that shouldn't be too bad (see several suggestions
> along that line, even with statistics).
>
> Bootstrap / regtest running on x86_64-unknown-linux-gnu, ok for
> trunk and 4.8 branch?
>
> Thanks,
> Richard.
 

[PATCH] Fix PR60276

2014-02-21 Thread Richard Biener

This attempts to fix PR60276: the vectorizer dependence analysis is
run too early, and later decisions invalidate assumptions it makes
there.  The specific issue in question arises when the vectorizer
needs to effectively unroll the loop.  Because it performs all
vectorized loads first and all vectorized stores last, the idea that
it can ignore known dependences with negative distance doesn't work
out if that distance is too short.

The following is the shortest (and eventually backportable) change
I could come up with - record the negative distance during
dependence analysis and re-validate it when decisions about
stmt copying and group sizes are fixed.
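
As a rough illustration of that record-then-re-validate idea, here is a
minimal sketch in plain C.  It is not the GCC code: the names load_info,
record_neg_dist and load_still_vectorizable are made up for this example,
and only the shape of the final check (more than one copy, a nonzero
recorded distance, a vectorization factor larger than that distance) is
taken from the patch in this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-statement vectorizer info.  */
struct load_info
{
  unsigned int min_neg_dist;  /* 0 means no negative distance was recorded.  */
};

/* Analysis phase: remember the smallest negative dependence distance
   (passed here as its magnitude) that this load participates in.  */
void
record_neg_dist (struct load_info *info, unsigned int dist)
{
  assert (dist != 0);  /* 0 is reserved for "none recorded".  */
  if (info->min_neg_dist == 0 || info->min_neg_dist > dist)
    info->min_neg_dist = dist;
}

/* Transform phase: once the number of copies and the vectorization
   factor are fixed, check that ignoring the dependence is still safe.  */
bool
load_still_vectorizable (const struct load_info *info,
                         unsigned int ncopies, unsigned int vf)
{
  if (ncopies > 1 && info->min_neg_dist != 0 && vf > info->min_neg_dist)
    return false;
  return true;
}
```

With a recorded distance of 4, a load that needs two copies at a
vectorization factor of 8 is rejected, while a single copy or a factor
of 4 is still accepted.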

Bootstrapped and tested on x86_64-unknown-linux-gnu - does this look
ok?

Thanks,
Richard.

2014-02-21  Richard Biener  rguent...@suse.de

PR tree-optimization/60276
* tree-vectorizer.h (struct _stmt_vec_info): Add min_neg_dist field.
(STMT_VINFO_MIN_NEG_DIST): New macro.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Record
STMT_VINFO_MIN_NEG_DIST.
* tree-vect-stmts.c (vectorizable_load): Verify if assumptions
made for negative dependence distances still hold.

* gcc.dg/vect/pr60276.c: New testcase.

Index: gcc/tree-vectorizer.h
===
*** gcc/tree-vectorizer.h   (revision 207938)
--- gcc/tree-vectorizer.h   (working copy)
*** typedef struct _stmt_vec_info {
*** 622,627 
--- 622,631 
   is 1.  */
unsigned int gap;
  
+   /* The minimum negative dependence distance this stmt participates in
+  or zero if none.  */
+   unsigned int min_neg_dist;
+ 
/* Not all stmts in the loop need to be vectorized. e.g, the increment
   of the loop induction variable and computation of array indexes. relevant
   indicates whether the stmt needs to be vectorized.  */
*** typedef struct _stmt_vec_info {
*** 677,682 
--- 681,687 
  #define STMT_VINFO_GROUP_SAME_DR_STMT(S)   (S)->same_dr_stmt
  #define STMT_VINFO_GROUPED_ACCESS(S)  ((S)->first_element != NULL && (S)->data_ref_info)
  #define STMT_VINFO_LOOP_PHI_EVOLUTION_PART(S) (S)->loop_phi_evolution_part
+ #define STMT_VINFO_MIN_NEG_DIST(S)    (S)->min_neg_dist
  
  #define GROUP_FIRST_ELEMENT(S)  (S)->first_element
  #define GROUP_NEXT_ELEMENT(S)   (S)->next_element
Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 207938)
--- gcc/tree-vect-data-refs.c   (working copy)
*** vect_analyze_data_ref_dependence (struct
*** 403,408 
--- 425,437 
            if (dump_enabled_p ())
              dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                               "dependence distance negative.\n");
+           /* Record a negative dependence distance to later limit the
+              amount of stmt copying / unrolling we can perform.
+              Only need to handle read-after-write dependence.  */
+           if (DR_IS_READ (drb)
+               && (STMT_VINFO_MIN_NEG_DIST (stmtinfo_b) == 0
+                   || STMT_VINFO_MIN_NEG_DIST (stmtinfo_b) > dist))
+             STMT_VINFO_MIN_NEG_DIST (stmtinfo_b) = dist;
            continue;
}
  
Index: gcc/tree-vect-stmts.c
===
*** gcc/tree-vect-stmts.c   (revision 207938)
--- gcc/tree-vect-stmts.c   (working copy)
*** vectorizable_load (gimple stmt, gimple_s
*** 5629,5634 
--- 5629,5648 
return false;
  }
  
+   /* Invalidate assumptions made by dependence analysis when vectorization
+      on the unrolled body effectively re-orders stmts.  */
+   if (ncopies > 1
+       && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0
+       && ((unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+           > STMT_VINFO_MIN_NEG_DIST (stmt_info)))
+     {
+       if (dump_enabled_p ())
+         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                          "cannot perform implicit CSE when unrolling "
+                          "with negative dependence distance\n");
+       return false;
+     }
+ 
    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
      return false;
  
*** vectorizable_load (gimple stmt, gimple_s
*** 5686,5691 
--- 5700,5719 
  else if (!vect_grouped_load_supported (vectype, group_size))
return false;
}
+ 
+       /* Invalidate assumptions made by dependence analysis when vectorization
+          on the unrolled body effectively re-orders stmts.  */
+       if (!PURE_SLP_STMT (stmt_info)
+           && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0
+           && ((unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+               > STMT_VINFO_MIN_NEG_DIST (stmt_info)))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+cannot 

Re: [PATCH] Fix latent bug in replace_uses_by

2014-02-21 Thread Bin.Cheng
On Thu, Feb 20, 2014 at 10:51 PM, Richard Biener rguent...@suse.de wrote:

 The following fixes an ICE I got when building libjava with a local
 patch.  This causes us to substitute MEM[a, 5] into MEM[_3, 0]
 to MEM[MEM[a, 5], 0] and then asking stmt_ends_bb_p which doesn't
 expect such bogus MEM_REFs.  The MEM_REF is canonicalized by
 calling fold_stmt on it later, but the fix is of course to move
 the marking of altered BBs before doing the actual substitution
 (only then we are sure to catch all previous bb-ending stmts).

 I also noticed we don't verify MEM_REFs on LHSs.

 Bootstrapped and tested on x86_64-unknown-linux-gnu, applied
 to trunk and branch (it's a regression uncovered by the fix for PR60115).

 Richard.

 2014-02-20  Richard Biener  rguent...@suse.de

 * tree-cfg.c (replace_uses_by): Mark altered BBs before
 doing the substitution.
 (verify_gimple_assign_single): Also verify bare MEM_REFs
 on the lhs.

 Index: gcc/tree-cfg.c
 ===
 --- gcc/tree-cfg.c  (revision 207936)
 +++ gcc/tree-cfg.c  (working copy)
 @@ -1677,6 +1677,11 @@ replace_uses_by (tree name, tree val)

FOR_EACH_IMM_USE_STMT (stmt, imm_iter, name)
  {
 +  /* Mark the block if we change the last stmt in it.  */
 +  if (cfgcleanup_altered_bbs
 +      && stmt_ends_bb_p (stmt))
 +    bitmap_set_bit (cfgcleanup_altered_bbs, gimple_bb (stmt)->index);
 +
FOR_EACH_IMM_USE_ON_STMT (use, imm_iter)
  {
   replace_exp (use, val);
 @@ -1701,11 +1706,6 @@ replace_uses_by (tree name, tree val)
   gimple orig_stmt = stmt;
   size_t i;

 - /* Mark the block if we changed the last stmt in it.  */
 - if (cfgcleanup_altered_bbs
 -     && stmt_ends_bb_p (stmt))
 -   bitmap_set_bit (cfgcleanup_altered_bbs, gimple_bb (stmt)->index);
 -
Hi Richard,
I also noticed this with a local patch, but would it be OK to just move
the above code after fold_stmt? In other words, do PHI nodes matter
(according to the comments before cfgcleanup_altered_bbs)?

Thanks in advance.


   /* FIXME.  It shouldn't be required to keep TREE_CONSTANT
  on ADDR_EXPRs up-to-date on GIMPLE.  Propagation will
  only change sth from non-invariant to invariant, and only
 @@ -3986,7 +3986,9 @@ verify_gimple_assign_single (gimple stmt
return true;
  }

 -  if (handled_component_p (lhs))
 +  if (handled_component_p (lhs)
 +  || TREE_CODE (lhs) == MEM_REF
 +  || TREE_CODE (lhs) == TARGET_MEM_REF)
  res |= verify_types_in_gimple_reference (lhs, true);

/* Special codes we cannot handle via their class.  */



-- 
Best Regards.


Re: [PATCH] Fix PR60276

2014-02-21 Thread Jakub Jelinek
On Fri, Feb 21, 2014 at 11:32:41AM +0100, Richard Biener wrote:
 
 This attempts to fix PR60276 - the fact that the vectorizer dependence
 analysis is run too early and that it invalidates assumptions it
 makes there later.  The specific issue in question arises when
 the vectorizer needs to effectively unroll the loop and by
 performing all vectorized loads first and vectorized stores last
 the idea that it can ignore known dependences with negative
 distance doesn't work out if that distance is too short.
 
 The following is the shortest (and eventually backportable) change
 I could come up with - record the negative distance during
 dependence analysis and re-validate it when decisions about
 stmt copying and group sizes are fixed.
 
 Bootstrapped and tested on x86_64-unknown-linux-gnu - does this look
 ok?

Ok, thanks.

 2014-02-21  Richard Biener  rguent...@suse.de
 
   PR tree-optimization/60276
   * tree-vectorizer.h (struct _stmt_vec_info): Add min_neg_dist field.
   (STMT_VINFO_MIN_NEG_DIST): New macro.
   * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Record
   STMT_VINFO_MIN_NEG_DIST.
   * tree-vect-stmts.c (vectorizable_load): Verify if assumptions
   made for negative dependence distances still hold.
 
   * gcc.dg/vect/pr60276.c: New testcase.

Jakub


Re: [PATCH][1/n] Improve PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Biener wrote:

 
 This improves compile-time of PR60291 at -O1 from 210s to 85s,
 getting remove unused locals out of the profile.  There, walking
 DECL_INITIAL of globals is quadratic when a global is referred to from
 multiple functions.  We've had the same issue with
 add_referenced_vars when that still existed.
 
 Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk
 and branch?
 
 I've verified that I can still properly debug
 
 int **
 foo (void)
 {
   static int a = 0;
   static int *b = &a;
   static int **c = &b;
   return c;
 }
 int main()
 {
   return **foo();
 }
 
 (step into foo and print a, b and c).  Note that even with 4.8
 right now
 
 int **
 foo (void)
 {
   int **q;
 {
   static int a = 0;
   static int *b = &a;
   static int **c = &b;
   q = c;
 }
   return q;
 }
 int main()
 {
   return **foo();
 }
 
 is broken (with -O1 -fno-inline, with inlining both cases are
 broken).  But that all doesn't regress with the following and
 if we fix it then we should fix it another way, not by walking
 global initializers.

So I checked if this all is a regression and this particular
piece is a regression from 4.7 where we only walk global
initializers for VAR_DECLs with DECL_CONTEXT == current_function_decl.

So at this point it's easiest and least intrusive to reinstate
this restriction, which was removed by r187800 (that was me - the
change looks accidental).

Re-bootstrapping / testing on x86_64-unknown-linux-gnu and will
commit afterwards to trunk and to the branch a bit later.

Thanks,
Richard.

2014-02-21  Richard Biener  rguent...@suse.de

PR middle-end/60291
* tree-ssa-live.c (mark_all_vars_used_1): Do not walk
DECL_INITIAL for globals not in the current function context.

Index: gcc/tree-ssa-live.c
===
*** gcc/tree-ssa-live.c (revision 207960)
--- gcc/tree-ssa-live.c (working copy)
*** mark_all_vars_used_1 (tree *tp, int *wal
*** 435,441 
  {
/* When a global var becomes used for the first time also walk its
   initializer (non global ones don't have any).  */
!   if (set_is_used (t) && is_global_var (t))
mark_all_vars_used (DECL_INITIAL (t));
  }
/* remove_unused_scope_block_p requires information about labels
--- 435,442 
  {
/* When a global var becomes used for the first time also walk its
   initializer (non global ones don't have any).  */
!   if (set_is_used (t) && is_global_var (t)
!       && DECL_CONTEXT (t) == current_function_decl)
mark_all_vars_used (DECL_INITIAL (t));
  }
/* remove_unused_scope_block_p requires information about labels


Re: [PATCH] Fix latent bug in replace_uses_by

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Bin.Cheng wrote:

 On Thu, Feb 20, 2014 at 10:51 PM, Richard Biener rguent...@suse.de wrote:
 
  The following fixes an ICE I got when building libjava with a local
  patch.  This causes us to substitute MEM[a, 5] into MEM[_3, 0]
  to MEM[MEM[a, 5], 0] and then asking stmt_ends_bb_p which doesn't
  expect such bogus MEM_REFs.  The MEM_REF is canonicalized by
  calling fold_stmt on it later, but the fix is of course to move
  the marking of altered BBs before doing the actual substitution
  (only then we are sure to catch all previous bb-ending stmts).
 
  I also noticed we don't verify MEM_REFs on LHSs.
 
  Bootstrapped and tested on x86_64-unknown-linux-gnu, applied
  to trunk and branch (it's a regression uncovered by the fix for PR60115).
 
  Richard.
 
  2014-02-20  Richard Biener  rguent...@suse.de
 
  * tree-cfg.c (replace_uses_by): Mark altered BBs before
  doing the substitution.
  (verify_gimple_assign_single): Also verify bare MEM_REFs
  on the lhs.
 
  Index: gcc/tree-cfg.c
  ===
  --- gcc/tree-cfg.c  (revision 207936)
  +++ gcc/tree-cfg.c  (working copy)
  @@ -1677,6 +1677,11 @@ replace_uses_by (tree name, tree val)
 
 FOR_EACH_IMM_USE_STMT (stmt, imm_iter, name)
   {
  +  /* Mark the block if we change the last stmt in it.  */
  +  if (cfgcleanup_altered_bbs
  +      && stmt_ends_bb_p (stmt))
  +    bitmap_set_bit (cfgcleanup_altered_bbs, gimple_bb (stmt)->index);
  +
 FOR_EACH_IMM_USE_ON_STMT (use, imm_iter)
   {
replace_exp (use, val);
  @@ -1701,11 +1706,6 @@ replace_uses_by (tree name, tree val)
gimple orig_stmt = stmt;
size_t i;
 
  - /* Mark the block if we changed the last stmt in it.  */
  - if (cfgcleanup_altered_bbs
  -     && stmt_ends_bb_p (stmt))
  -   bitmap_set_bit (cfgcleanup_altered_bbs, gimple_bb (stmt)->index);
  -
 Hi Richard,
 I also noticed this with local patch, but is it OK just to move above
 code after fold_stmt? In other words, does phi node matter (according
 to comments before cfgcleanup_altered_bbs)?

PHI node doesn't matter but doesn't trigger stmt_ends_bb_p anyway.
It's better to do before the replacement because a stmt that may
have been stmt_ends_bb_p before the replacement might not be
after it (and thus we'd miss a cfgcleanup opportunity to merge
two blocks).
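
The ordering argument can be shown with a small toy model in C.  This is
only a sketch under made-up assumptions: toy_stmt, toy_stmt_ends_bb_p and
toy_replace_uses are hypothetical stand-ins, and "ends its block" is
reduced to a single flag that the substitution clears.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8

/* Hypothetical statement: whether it ends its block depends on whether
   its operand may trap (a stand-in for stmt_ends_bb_p).  */
struct toy_stmt
{
  int bb_index;
  bool operand_may_trap;
};

static bool
toy_stmt_ends_bb_p (const struct toy_stmt *s)
{
  return s->operand_may_trap;
}

/* Replace the operand with a known-safe value; afterwards the statement
   no longer ends its block.  */
static void
toy_replace_operand (struct toy_stmt *s)
{
  s->operand_may_trap = false;
}

/* Mark first, then substitute: the block is recorded while the
   statement still ends it.  */
void
toy_replace_uses (struct toy_stmt *s, bool altered[MAX_BLOCKS])
{
  assert (s->bb_index >= 0 && s->bb_index < MAX_BLOCKS);
  if (toy_stmt_ends_bb_p (s))
    altered[s->bb_index] = true;
  toy_replace_operand (s);
}
```

Had the check run after toy_replace_operand, block 3 in a call such as
toy_replace_uses (&s, altered) with s = { 3, true } would never be
marked, which is the missed cleanup opportunity described above.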

Richard.

 Thanks in advance.
 
 
/* FIXME.  It shouldn't be required to keep TREE_CONSTANT
   on ADDR_EXPRs up-to-date on GIMPLE.  Propagation will
   only change sth from non-invariant to invariant, and only
  @@ -3986,7 +3986,9 @@ verify_gimple_assign_single (gimple stmt
 return true;
   }
 
  -  if (handled_component_p (lhs))
  +  if (handled_component_p (lhs)
  +  || TREE_CODE (lhs) == MEM_REF
  +  || TREE_CODE (lhs) == TARGET_MEM_REF)
   res |= verify_types_in_gimple_reference (lhs, true);
 
 /* Special codes we cannot handle via their class.  */
 
 
 
 

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer


[C++ Patch] PR 60253

2014-02-21 Thread Paolo Carlini

Hi,

unless we have reasons to believe that the diagnostic quality could 
regress in some circumstances, we can easily resolve this ICE on invalid 
regression by always returning error_mark_node after error (thus outside 
SFINAE too).


Tested x86_64-linux.

Thanks,
Paolo.

/
/cp
2014-02-21  Paolo Carlini  paolo.carl...@oracle.com

PR c++/60253
* call.c (convert_arg_to_ellipsis): Return error_mark_node after
error_at.

/testsuite
2014-02-21  Paolo Carlini  paolo.carl...@oracle.com

PR c++/60253
* g++.dg/overload/ellipsis2.C: New.
Index: cp/call.c
===
--- cp/call.c   (revision 207987)
+++ cp/call.c   (working copy)
@@ -6406,8 +6406,7 @@ convert_arg_to_ellipsis (tree arg, tsubst_flags_t
   if (complain & tf_error)
     error_at (loc, "cannot pass objects of non-trivially-copyable "
               "type %q#T through %<...%>", arg_type);
- else
-   return error_mark_node;
+ return error_mark_node;
}
 }
 
Index: testsuite/g++.dg/overload/ellipsis2.C
===
--- testsuite/g++.dg/overload/ellipsis2.C   (revision 0)
+++ testsuite/g++.dg/overload/ellipsis2.C   (working copy)
@@ -0,0 +1,13 @@
+// PR c++/60253
+
+struct A
+{
+  ~A();
+};
+
+struct B
+{
+  B(...);
+};
+
+B b(0, A());  // { dg-error "cannot pass" }


Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Biener wrote:

 On Fri, 21 Feb 2014, Richard Biener wrote:
 
  
  This fixes the slowness of RTL expansion in PR60291 which is caused
  by excessive collisions in mem-attr sharing.  The issue here is
  that sharing attempts happens across functions and we have a _lot_
  of functions in this testcase referencing the same lexically
  equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
  means those get the same hash value.  But they don't compare
  equal because an SSA name _5 from function A is of course not equal
  to one from function B.
  
  The following fixes that by not doing mem-attr sharing across functions
  by clearing the mem-attrs hashtable in rest_of_clean_state.
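
  To make the collision pattern concrete, here is a toy model in C.  It
  is a sketch only: toy_ssa_name, toy_hash, toy_equal and toy_probe_cost
  are invented for this illustration, and the hash function is an
  arbitrary multiplicative one, not what GCC uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical SSA name: a version number that is only unique within
   its owning function.  */
struct toy_ssa_name
{
  unsigned int version;
  int function_id;
};

/* Hash on the version only, so _5 from any function hashes the same.  */
unsigned int
toy_hash (const struct toy_ssa_name *n)
{
  return n->version * 2654435761u;
}

/* Equality is by identity, so names from different functions never match.  */
bool
toy_equal (const struct toy_ssa_name *a, const struct toy_ssa_name *b)
{
  return a == b;
}

/* Count failed comparisons when looking up NAME in a bucket chain
   holding COUNT earlier entries with the same hash.  */
size_t
toy_probe_cost (const struct toy_ssa_name *const *chain, size_t count,
                const struct toy_ssa_name *name)
{
  size_t misses = 0;
  for (size_t i = 0; i < count; i++)
    {
      assert (toy_hash (chain[i]) == toy_hash (name));
      if (toy_equal (chain[i], name))
        break;
      misses++;
    }
  return misses;
}
```

  Every earlier function that used version 5 adds one failed comparison
  to the lookup; clearing the table between functions corresponds to
  probing an empty chain.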
  
  Another fix may be to do what the comment in iterative_hash_expr
  says for SSA names:
  
  case SSA_NAME:
/* We can just compare by pointer.  */
return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
  
  (probably blame me for changing that to hashing the SSA version).
 
 It was lxo.
 
  But I'm not sure that doesn't uncover issues with various hashtables and
  walking them, generating code dependent on the order.  It's IMHO just not
  expected that you compare function-local expressions from different
  functions.
 
 Same speedup result from
 
 Index: gcc/tree.c
 ===
 --- gcc/tree.c  (revision 207960)
 +++ gcc/tree.c  (working copy)
 @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
}
  case SSA_NAME:
/* We can just compare by pointer.  */
 -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
 +  return iterative_hash_hashval_t ((uintptr_t)t >> 3, val);
  case PLACEHOLDER_EXPR:
/* The node itself doesn't matter.  */
return val;
 
 and from
 
 Index: gcc/tree.c
 ===
 --- gcc/tree.c  (revision 207960)
 +++ gcc/tree.c  (working copy)
 @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
}
  case SSA_NAME:
/* We can just compare by pointer.  */
 -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
 +  return iterative_hash_host_wide_int
 + (DECL_UID (cfun->decl),
 +  iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
  case PLACEHOLDER_EXPR:
/* The node itself doesn't matter.  */
return val;
 
 better than hashing pointers but requiring cfun != NULL in this
 function isn't good either.
 
 At this point I'm more comfortable with clearing the hashtable
 than with changing iterative_hash_expr in any way.  It's also
 along the way to get rid of the hash completely.
 
 Oh, btw, the speedup is going from
 
  expand  : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) wall  293891 kB (15%) ggc
 
 to
 
  expand  :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) wall  262544 kB (13%) ggc
 
 at -O0 (less dramatic slowness for -On).
 
  The other thing would be to discard mem-attr sharing altogether,
  but that doesn't seem appropriate at this stage (but it would
  also simplify quite some code).  With only one function in RTL
  at a time that shouldn't be too bad (see several suggestions
  along that line, even with statistics).

Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html

Richard.


[patch] [arm] Fix PR60169 - thumb1 far jump

2014-02-21 Thread Joey Ye
Patch http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01229.html introduced
this ICE:

1. thumb1 estimates whether far_jump is used based on the function's
insn size.
2. During reload, after the stack layout is finalized, reload_as_needed
runs. It can increase the insn size, which changes the far_jump estimate,
which in turn requires saving lr and changing the stack layout again.
Since there is no chance to change the layout at that point, GCC crashes.

Solution:
Do not change estimation result of far_jump if reload_in_progress or
reload_completed is true.
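
As a sketch of that rule (not the arm.c code), the following C fragment
models an estimate that is frozen once reload has started.  The struct
toy_function, its fields and the TOY_FAR_JUMP_LIMIT threshold are all
invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-function state; the field names are illustrative.  */
struct toy_function
{
  bool far_jump_used;      /* Sticky decision made before reload.  */
  bool reload_started;     /* Stand-in for reload_in_progress
                              || reload_completed.  */
  unsigned int insn_size;  /* Estimated function size in bytes.  */
};

/* Made-up threshold beyond which a branch may need a far jump.  */
#define TOY_FAR_JUMP_LIMIT 2048u

bool
toy_far_jump_used_p (struct toy_function *fn)
{
  assert (fn != NULL);

  /* A decision already taken stays in force.  */
  if (fn->far_jump_used)
    return true;

  /* The frame layout is final once reload has started, so do not let a
     grown size estimate flip the answer.  */
  if (fn->reload_started)
    return false;

  if (fn->insn_size > TOY_FAR_JUMP_LIMIT)
    fn->far_jump_used = true;
  return fn->far_jump_used;
}
```

A function that was small before reload keeps answering "no" even if its
size estimate grows during reload, while a function that was already
judged large keeps answering "yes".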

According to Vlad, LRA is unlikely to need a fix:
http://gcc.gnu.org/ml/gcc/2014-02/msg00355.html

ChangeLog:
* config/arm/arm.c (thumb_far_jump_used_p): Don't change
  if reload in progress or completed.

* gcc.target/arm/thumb1-far-jump-3.c: New case.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b562986..2cf362c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -26255,6 +26255,11 @@ thumb_far_jump_used_p (void)
return 0;
 }
 
+  /* We should not change far_jump_used during or after reload, as there is
+ no chance to change stack frame layout.  */
+  if (reload_in_progress || reload_completed)
+return 0;
+
   /* Check to see if the function contains a branch
  insn with the far jump attribute set.  */
   for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
diff --git a/gcc/testsuite/gcc.target/arm/thumb1-far-jump-3.c 
b/gcc/testsuite/gcc.target/arm/thumb1-far-jump-3.c
new file mode 100644
index 000..90559ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb1-far-jump-3.c
@@ -0,0 +1,108 @@
+/* Catch reload ICE on target thumb1 with far jump optimization.
+ * It is also a valid case for non-thumb1 target.  */
+
+/* Add -mno-lra option as it is only reproducible with reload.  It will
+   be removed after reload is completely removed.  */
+/* { dg-options "-mno-lra -fomit-frame-pointer" } */
+/* { dg-do compile } */
+
+#define C  2
+#define A  4
+#define RGB  (C | A)
+#define GRAY (A)
+
+typedef unsigned long uint_32;
+typedef unsigned char byte;
+typedef byte* bytep;
+
+typedef struct ss
+{
+   uint_32 w;
+   uint_32 r;
+   byte c;
+   byte b;
+   byte p;
+} info;
+
+typedef info * infop;
+
+void
+foo(infop info, bytep row)
+{
+   uint_32 iw = info->w;
+   if (info->c == RGB)
+   {
+  if (info->b == 8)
+  {
+ bytep sp = row + info->r;
+ bytep dp = sp;
+ byte save;
+ uint_32 i;
+
+ for (i = 0; i < iw; i++)
+ {
+save = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = save;
+ }
+  }
+
+  else
+  {
+ bytep sp = row + info->r;
+ bytep dp = sp;
+ byte save[2];
+ uint_32 i;
+
+ for (i = 0; i < iw; i++)
+ {
+save[0] = *(--sp);
+save[1] = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = save[0];
+*(--dp) = save[1];
+ }
+  }
+   }
+   else if (info->c == GRAY)
+   {
+  if (info->b == 8)
+  {
+ bytep sp = row + info->r;
+ bytep dp = sp;
+ byte save;
+ uint_32 i;
+
+ for (i = 0; i < iw; i++)
+ {
+save = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = save;
+ }
+  }
+  else
+  {
+ bytep sp = row + info->r;
+ bytep dp = sp;
+ byte save[2];
+ uint_32 i;
+
+ for (i = 0; i < iw; i++)
+ {
+save[0] = *(--sp);
+save[1] = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = *(--sp);
+*(--dp) = save[0];
+*(--dp) = save[1];
+ }
+  }
+   }
+}


RE: [patch] [arm] Fix PR60169 - thumb1 far jump

2014-02-21 Thread Joey Ye
OK to trunk and 4.8?

-Original Message-
From: Joey Ye [mailto:joey...@arm.com] 
Sent: 2014年2月21日 19:32
To: gcc-patches@gcc.gnu.org
Subject: [patch] [arm] Fix PR60169 - thumb1 far jump

Patch http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01229.html introduced
this ICE:

1. thumb1 estimates whether far_jump is used based on the function's
insn size.
2. During reload, after the stack layout is finalized, reload_as_needed
runs. It can increase the insn size, which changes the far_jump estimate,
which in turn requires saving lr and changing the stack layout again.
Since there is no chance to change the layout at that point, GCC crashes.

Solution:
Do not change estimation result of far_jump if reload_in_progress or
reload_completed is true.

According to Vlad, LRA is unlikely to need a fix:
http://gcc.gnu.org/ml/gcc/2014-02/msg00355.html

ChangeLog:
* config/arm/arm.c (thumb_far_jump_used_p): Don't change
  if reload in progress or completed.

* gcc.target/arm/thumb1-far-jump-3.c: New case.





Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Biener wrote:

 On Fri, 21 Feb 2014, Richard Biener wrote:
 
  On Fri, 21 Feb 2014, Richard Biener wrote:
  
   
   This fixes the slowness of RTL expansion in PR60291 which is caused
   by excessive collisions in mem-attr sharing.  The issue here is
   that sharing attempts happens across functions and we have a _lot_
   of functions in this testcase referencing the same lexically
   equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
   means those get the same hash value.  But they don't compare
   equal because an SSA name _5 from function A is of course not equal
   to one from function B.
   
   The following fixes that by not doing mem-attr sharing across functions
   by clearing the mem-attrs hashtable in rest_of_clean_state.
   
   Another fix may be to do what the comment in iterative_hash_expr
   says for SSA names:
   
   case SSA_NAME:
 /* We can just compare by pointer.  */
 return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
   
   (probably blame me for changing that to hashing the SSA version).
  
  It was lxo.
  
   But I'm not sure that doesn't uncover issues with various hashtables and
   walking them, generating code dependent on the order.  It's IMHO just not
   expected that you compare function-local expressions from different
   functions.
  
  Same speedup result from
  
  Index: gcc/tree.c
  ===
  --- gcc/tree.c  (revision 207960)
  +++ gcc/tree.c  (working copy)
  @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
 }
   case SSA_NAME:
 /* We can just compare by pointer.  */
  -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
  +  return iterative_hash_hashval_t ((uintptr_t)t >> 3, val);
   case PLACEHOLDER_EXPR:
 /* The node itself doesn't matter.  */
 return val;
  
  and from
  
  Index: gcc/tree.c
  ===
  --- gcc/tree.c  (revision 207960)
  +++ gcc/tree.c  (working copy)
  @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
 }
   case SSA_NAME:
 /* We can just compare by pointer.  */
  -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
  +  return iterative_hash_host_wide_int
  + (DECL_UID (cfun->decl),
  +  iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
   case PLACEHOLDER_EXPR:
 /* The node itself doesn't matter.  */
 return val;
  
  better than hashing pointers but requiring cfun != NULL in this
  function isn't good either.
  
  At this point I'm more comfortable with clearing the hashtable
  than with changing iterative_hash_expr in any way.  It's also
  along the way to get rid of the hash completely.
  
  Oh, btw, the speedup is going from
  
   expand  : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) 
  wall  293891 kB (15%) ggc
  
  to
  
   expand  :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) 
  wall  262544 kB (13%) ggc
  
  at -O0 (less dramatic slowness for -On).
  
   The other thing would be to discard mem-attr sharing altogether,
   but that doesn't seem appropriate at this stage (but it would
   also simplify quite some code).  With only one function in RTL
   at a time that shouldn't be too bad (see several suggestions
   along that line, even with statistics).
 
 Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html

With the patch below to get some statistics we see that one important
piece of sharing not covered by above measurements is RTX copying(?).

On the testcase for this PR I get at -O1 and without the patch
to clear the hashtable after each function

142489 mem_attrs created (142439 for new, 50 for modification)
1983225 mem_attrs shared (4044 for new, 820241 for modification, 1158940 
by rtx copying)
0 mem_attrs dropped

and with the patch to clear after each function

364411 mem_attrs created (144478 for new, 219933 for modification)
1761303 mem_attrs shared (2005 for new, 600358 for modification, 1158940 
by rtx copying)
0 mem_attrs dropped

while for dwarf2out.c I see without the clearing

24399 mem_attrs created (6929 for new, 17470 for modification)
102676 mem_attrs shared (10878 for new, 29265 for modification, 62533 by 
rtx copying)
16 mem_attrs dropped

which means that completely dropping the sharing would result
in the creation of 6929 + 17807 + 62533(!) vs. 24399 mem-attrs.
That's still not a lot of overhead given that mem-attrs take 40 bytes
(3MB vs. 950kB).  There is also always the possibility to
explicitly ref-count mem-attrs to handle sharing by rtx
copying (at least cse, fwprop, combine, ira and reload copy MEMs,
probably some for no good reason because MEMs are not shared),
thus making mem-attrs unshare-on-modify.
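
A minimal sketch of what such ref-counting with unshare-on-modify could
look like, in plain C.  Everything here is hypothetical: toy_attrs and
its helpers are invented names, and malloc/free stand in for whatever
allocation scheme real mem-attrs would use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted attribute record.  */
struct toy_attrs
{
  unsigned int refs;
  long offset;
  unsigned int align;
};

struct toy_attrs *
toy_attrs_new (long offset, unsigned int align)
{
  struct toy_attrs *a = malloc (sizeof *a);
  assert (a != NULL);
  a->refs = 1;
  a->offset = offset;
  a->align = align;
  return a;
}

/* Copying a MEM just takes another reference.  */
struct toy_attrs *
toy_attrs_share (struct toy_attrs *a)
{
  a->refs++;
  return a;
}

void
toy_attrs_release (struct toy_attrs *a)
{
  if (--a->refs == 0)
    free (a);
}

/* Unshare on modify: return a record that is safe to change.  */
struct toy_attrs *
toy_attrs_set_offset (struct toy_attrs *a, long offset)
{
  if (a->refs > 1)
    {
      /* Someone else still uses A; give the caller a private copy.  */
      struct toy_attrs *copy = toy_attrs_new (offset, a->align);
      a->refs--;
      return copy;
    }
  a->offset = offset;
  return a;
}
```

In this model a copy costs one counter increment, and a new record is
only allocated when a shared record is about to change.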

Richard.

Index: gcc/rtl.c

Re: [PATCH][AArch64] vrnd*_f64 patch for stage-1

2014-02-21 Thread Alex Velenko

On 13/02/14 17:43, Richard Henderson wrote:

On 02/13/2014 03:17 AM, Alex Velenko wrote:

+/* Sets rmode field of FPCR control register to
+   FPROUNDING_ZERO.  */


Comment is wrong, or at least misleading.


+void __inline __attribute__ ((__always_inline__))
+set_rounding_mode (uint32_t mode)
+{
+  uint32_t r;
+
+  /* Read current FPCR.  */
+  asm volatile ("mrs %[r], fpcr" : [r] "=r" (r) : :);
+
+  /* Clear rmode.  */
+  r &= 3 << RMODE_START;


   ~(3 << RMODE_START)


+  /* Calculate desired FPCR.  */
+  r |= mode << RMODE_START;
+
+  /* Write desired FPCR back.  */
+  asm volatile ("msr fpcr, %[r]" : : [r] "r" (r) :);
+}


Fortunately for this testcase, you do always use FPROUNDING_ZERO == 3 when
calling this function, so the bugs are hidden.
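
The masking point can be checked without any inline asm.  The sketch
below works on a plain 32-bit value instead of the real register; the
function names are made up, and RMODE_START is assumed to be 22, matching
the RMode field at FPCR bits [23:22].

```c
#include <assert.h>
#include <stdint.h>

#define RMODE_START 22  /* Assumed: RMode occupies FPCR bits [23:22].  */

/* Return FPCR with its rounding-mode field replaced by MODE (0..3),
   leaving every other bit untouched.  */
uint32_t
fpcr_with_rmode (uint32_t fpcr, uint32_t mode)
{
  assert (mode <= 3);
  fpcr &= ~(UINT32_C (3) << RMODE_START);  /* Clear rmode only.  */
  fpcr |= mode << RMODE_START;             /* Insert the new mode.  */
  return fpcr;
}

/* The variant under review: the mask is not inverted, so it keeps only
   the old rmode bits and drops everything else.  */
uint32_t
fpcr_with_rmode_buggy (uint32_t fpcr, uint32_t mode)
{
  fpcr &= UINT32_C (3) << RMODE_START;
  fpcr |= mode << RMODE_START;
  return fpcr;
}
```

With mode 3 the buggy variant still ends up with both rmode bits set,
which is why the testcase did not notice; with an old rmode of 1 and a
requested mode of 2 it yields 3 instead of 2.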


r~



Hi Richard,
Thank you for pointing those issues out. Here is a respin of the same
patch with the indicated issues fixed. The description of the patch is as
follows:


This patch adds the vrnd*_f64 aarch64 intrinsics, along with a testcase
for them. A complete LE and BE regression run showed no regressions.


Is patch OK for stage-1?

gcc/

2014-02-21  Alex Velenko  alex.vele...@arm.com

* config/aarch64/aarch64-builtins.c (BUILTIN_VDQF_DF): Macro
added.
* config/aarch64/aarch64-simd-builtins.def (frintn): Use added
macro.
* config/aarch64/aarch64-simd.md (frint_pattern): Comment
corrected.
* config/aarch64/aarch64.md (frint_pattern): Likewise.
* config/aarch64/arm_neon.h (vrnd_f64): Added.
(vrnda_f64): Likewise.
(vrndi_f64): Likewise.
(vrndm_f64): Likewise.
(vrndn_f64): Likewise.
(vrndp_f64): Likewise.
(vrndx_f64): Likewise.

gcc/testsuite/

2014-02-21  Alex Velenko  alex.vele...@arm.com

* gcc.target/aarch64/vrnd_f64_1.c: New testcase.


diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index ebab2ce8347a4425977c5cbd0f285c3ff1d9f2f1..7adc5fb96b6473ecde5c4f76973aff68af0ca7d4 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -307,6 +307,8 @@ aarch64_types_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   VAR7 (T, N, MAP, v8qi, v16qi, v4hi, v8hi, v2si, v4si, v2di)
 #define BUILTIN_VDQF(T, N, MAP) \
   VAR3 (T, N, MAP, v2sf, v4sf, v2df)
+#define BUILTIN_VDQF_DF(T, N, MAP) \
+  VAR4 (T, N, MAP, v2sf, v4sf, v2df, df)
 #define BUILTIN_VDQH(T, N, MAP) \
   VAR2 (T, N, MAP, v4hi, v8hi)
 #define BUILTIN_VDQHS(T, N, MAP) \
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index e5f71b479ccfd1a9cbf84aed0f96b49762053f59..09e230c56683a0225f8760472d7137b7bac98297 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -264,7 +264,7 @@
   BUILTIN_VDQF (UNOP, nearbyint, 2)
   BUILTIN_VDQF (UNOP, rint, 2)
   BUILTIN_VDQF (UNOP, round, 2)
-  BUILTIN_VDQF (UNOP, frintn, 2)
+  BUILTIN_VDQF_DF (UNOP, frintn, 2)
 
   /* Implemented by l<fcvt_pattern><su_optab><VQDF:mode><vcvt_target>2.  */
   VAR1 (UNOP, lbtruncv2sf, 2, v2si)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 4dffb59e856aeaafb79007255d3b91a73ef1ef13..0c1d7de5b3f4fb0fa8fa226b81ec690d8112b849 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1427,7 +1427,7 @@
 )
 
 ;; Vector versions of the floating-point frint patterns.
-;; Expands to btrunc, ceil, floor, nearbyint, rint, round.
+;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
 (define_insn "<frint_pattern><mode>2"
   [(set (match_operand:VDQF 0 "register_operand" "=w")
 	(unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")]
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 99a6ac8fcbdcd24a0ea18cc037bef9cf72070281..577aa9fe08bb445e66734bc404e94e13dc1fa65b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3187,7 +3187,7 @@
 ;; ---
 
 ;; frint floating-point round to integral standard patterns.
-;; Expands to btrunc, ceil, floor, nearbyint, rint, round.
+;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
 
 (define_insn "<frint_pattern><mode>2"
   [(set (match_operand:GPF 0 "register_operand" "=w")
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 6af99361b8e265f66026dc506cfc23f044d153b4..797e37ad638648312ef34bcd63c463e5873c30c4 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -22481,6 +22481,12 @@ vrnd_f32 (float32x2_t __a)
   return __builtin_aarch64_btruncv2sf (__a);
 }
 
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vrnd_f64 (float64x1_t __a)
+{
+  return vset_lane_f64 (__builtin_trunc (vget_lane_f64 (__a, 0)), __a, 0);
+}
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vrndq_f32 (float32x4_t __a)
 {
@@ -22501,6 +22507,12 @@ vrnda_f32 (float32x2_t __a)
   return 

Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Sandiford
Richard Biener rguent...@suse.de writes:
 On Fri, 21 Feb 2014, Richard Biener wrote:

 On Fri, 21 Feb 2014, Richard Biener wrote:
 
  On Fri, 21 Feb 2014, Richard Biener wrote:
  
   
   This fixes the slowness of RTL expansion in PR60291 which is caused
   by excessive collisions in mem-attr sharing.  The issue here is
   that sharing attempts happen across functions and we have a _lot_
   of functions in this testcase referencing the same lexically
   equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
   means those get the same hash value.  But they don't compare
   equal because an SSA name _5 from function A is of course not equal
   to one from function B.
   
   The following fixes that by not doing mem-attr sharing across functions
   by clearing the mem-attrs hashtable in rest_of_clean_state.
   
   Another fix may be to do what the comment in iterative_hash_expr
   says for SSA names:
   
   case SSA_NAME:
 /* We can just compare by pointer.  */
 return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
   
   (probably blame me for changing that to hashing the SSA version).
  
  It was lxo.
  
   But I'm not sure that doesn't uncover issues with various hashtables and
   walking them, generating code dependent on the order.  It's IMHO just not
   expected that you compare function-local expressions from different
   functions.
  
  Same speedup result from
  
  Index: gcc/tree.c
  ===
  --- gcc/tree.c  (revision 207960)
  +++ gcc/tree.c  (working copy)
  @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
 }
   case SSA_NAME:
 /* We can just compare by pointer.  */
  -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
  +  return iterative_hash_hashval_t ((uintptr_t)t>>3, val);
   case PLACEHOLDER_EXPR:
 /* The node itself doesn't matter.  */
 return val;
  
  and from
  
  Index: gcc/tree.c
  ===
  --- gcc/tree.c  (revision 207960)
  +++ gcc/tree.c  (working copy)
  @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
 }
   case SSA_NAME:
 /* We can just compare by pointer.  */
  -  return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
  +  return iterative_hash_host_wide_int
  + (DECL_UID (cfun->decl),
  +  iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
   case PLACEHOLDER_EXPR:
 /* The node itself doesn't matter.  */
 return val;
  
  which is better than hashing pointers, but requiring cfun != NULL in this
  function isn't good either.
  
  At this point I'm more comfortable with clearing the hashtable
  than with changing iterative_hash_expr in any way.  It's also
  along the way to get rid of the hash completely.
  
  Oh, btw, the speedup is going from
  
   expand  : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) wall  293891 kB (15%) ggc
  
  to
  
   expand  :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) wall  262544 kB (13%) ggc
  
  at -O0 (less dramatic slowness for -On).
  
   The other thing would be to discard mem-attr sharing altogether,
   but that doesn't seem appropriate at this stage (but it would
   also simplify quite some code).  With only one function in RTL
   at a time that shouldn't be too bad (see several suggestions
   along that line, even with statistics).
 
 Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html

 With the patch below to get some statistics we see that one important
 piece of sharing not covered by above measurements is RTX copying(?).

 On the testcase for this PR I get at -O1 and without the patch
 to clear the hashtable after each function

 142489 mem_attrs created (142439 for new, 50 for modification)
 1983225 mem_attrs shared (4044 for new, 820241 for modification, 1158940 
 by rtx copying)
 0 mem_attrs dropped

 and with the patch to clear after each function

 364411 mem_attrs created (144478 for new, 219933 for modification)
 1761303 mem_attrs shared (2005 for new, 600358 for modification, 1158940 
 by rtx copying)
 0 mem_attrs dropped

 while for dwarf2out.c I see without the clearing

 24399 mem_attrs created (6929 for new, 17470 for modification)
 102676 mem_attrs shared (10878 for new, 29265 for modification, 62533 by 
 rtx copying)
 16 mem_attrs dropped

 which means that completely dropping the sharing would result
 in the creation of 6929 + 17807 + 62533(!) vs. 24399 mem-attrs.
 That's still not a lot of overhead given that mem-attrs take 40 bytes
 (3MB vs. 950kB).  There is also always the possibility to
 explicitly ref-count mem-attrs to handle sharing by rtx
 copying (at least cse, fwprop, combine, ira and reload copy MEMs,
 probably some for no good reason because MEMs are not shared),
 and thus make mem-attrs unshare-on-modify.

In a thread a few 

Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Biener wrote:

 On Fri, 21 Feb 2014, Richard Biener wrote:
 
  Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html
 
 With the patch below to get some statistics we see that one important
 piece of sharing not covered by above measurements is RTX copying(?).
 
 On the testcase for this PR I get at -O1 and without the patch
 to clear the hashtable after each function
 
 142489 mem_attrs created (142439 for new, 50 for modification)
 1983225 mem_attrs shared (4044 for new, 820241 for modification, 1158940 
 by rtx copying)
 0 mem_attrs dropped
 
 and with the patch to clear after each function
 
 364411 mem_attrs created (144478 for new, 219933 for modification)
 1761303 mem_attrs shared (2005 for new, 600358 for modification, 1158940 
 by rtx copying)
 0 mem_attrs dropped
 
 while for dwarf2out.c I see without the clearing
 
 24399 mem_attrs created (6929 for new, 17470 for modification)
 102676 mem_attrs shared (10878 for new, 29265 for modification, 62533 by 
 rtx copying)
 16 mem_attrs dropped

Oh, and more than half of shared-modified are actually not modified
so are false sharing reports (set_mem_attrs (mem, MEM_ATTRS (mem))).

24399 mem_attrs created (6929 for new, 17470 for modification)
85801 mem_attrs shared (10878 for new, 12390 for modification, 62533 by 
rtx copying)
16 mem_attrs dropped

when dropping sharing completely you win creations for modification
but lose shares for new and copy.  Losing the copy case makes
it a loss overall which you can eventually offset by using a ref-counting
scheme (or better by avoiding copying the MEM in the first place,
a MEM is currently 24 bytes while its attrs are 40 bytes).

Richard.

Index: gcc/rtl.c
===
--- gcc/rtl.c   (revision 207938)
+++ gcc/rtl.c   (working copy)
@@ -326,6 +326,8 @@ copy_rtx (rtx orig)
   return copy;
 }
 
+unsigned long mem_attrs_shared_copy;
+
 /* Create a new copy of an rtx.  Only copy just one level.  */
 
 rtx
@@ -333,6 +335,8 @@ shallow_copy_rtx_stat (const_rtx orig ME
 {
   const unsigned int size = rtx_size (orig);
   rtx const copy = ggc_alloc_rtx_def_stat (size PASS_MEM_STAT);
+  if (MEM_P (orig) && MEM_ATTRS (orig))
+mem_attrs_shared_copy++;
   return (rtx) memcpy (copy, orig, size);
 }
 
Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  (revision 207938)
+++ gcc/emit-rtl.c  (working copy)
@@ -290,6 +290,12 @@ mem_attrs_htab_eq (const void *x, const
   return mem_attrs_eq_p ((const mem_attrs *) x, (const mem_attrs *) y);
 }
 
+unsigned long mem_attrs_dropped;
+unsigned long mem_attrs_new;
+unsigned long mem_attrs_new_modified;
+unsigned long mem_attrs_shared;
+unsigned long mem_attrs_shared_modified;
+
 /* Set MEM's memory attributes so that they are the same as ATTRS.  */
 
 static void
@@ -300,6 +306,8 @@ set_mem_attrs (rtx mem, mem_attrs *attrs
   /* If everything is the default, we can just clear the attributes.  */
   if (mem_attrs_eq_p (attrs, mode_mem_attrs[(int) GET_MODE (mem)]))
 {
+  if (MEM_ATTRS (mem))
+   mem_attrs_dropped++;
   MEM_ATTRS (mem) = 0;
   return;
 }
@@ -309,6 +317,20 @@ set_mem_attrs (rtx mem, mem_attrs *attrs
 {
   *slot = ggc_alloc_mem_attrs ();
   memcpy (*slot, attrs, sizeof (mem_attrs));
+  if (MEM_ATTRS (mem))
+   mem_attrs_new_modified++;
+  else
+   mem_attrs_new++;
+}
+  else
+{
+  if (MEM_ATTRS (mem))
+   {
+ if (MEM_ATTRS (mem) != *slot)
+   mem_attrs_shared_modified++;
+   }
+  else
+   mem_attrs_shared++;
 }
 
   MEM_ATTRS (mem) = (mem_attrs *) *slot;
Index: gcc/toplev.c
===
--- gcc/toplev.c(revision 207938)
+++ gcc/toplev.c(working copy)
@@ -1989,6 +2023,26 @@ toplev_main (int argc, char **argv)
   if (!exit_after_options)
 do_compile ();
 
+{
+  extern unsigned long mem_attrs_dropped;
+  extern unsigned long mem_attrs_new;
+  extern unsigned long mem_attrs_new_modified;
+  extern unsigned long mem_attrs_shared;
+  extern unsigned long mem_attrs_shared_modified;
+  extern unsigned long mem_attrs_shared_copy;
+  fprintf (stderr, "%lu mem_attrs created (%lu for new, %lu for modification)\n",
+  mem_attrs_new + mem_attrs_new_modified,
+  mem_attrs_new, mem_attrs_new_modified);
+  fprintf (stderr, "%lu mem_attrs shared (%lu for new, %lu for modification, %lu by rtx copying)\n",
+  mem_attrs_shared + mem_attrs_shared_modified
+  + mem_attrs_shared_copy,
+  mem_attrs_shared, mem_attrs_shared_modified,
+  mem_attrs_shared_copy);
+  fprintf (stderr, "%lu mem_attrs dropped\n", mem_attrs_dropped);
+}
+
   if (warningcount || errorcount || werrorcount)
 print_ignored_options ();
 


How to use GCC to compile glib

2014-02-21 Thread shafiq132
Sir,

   I have a cross compiler and I know how to cross-compile a single file. But
what I actually need is to compile glib, and I do not know how to do that.
Could anyone guide me, or explain in general how to compile a complete
library using gcc?





Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Sandiford wrote:


Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Biener wrote:


C++ PATCH for c++/60167 (reference template parameters)

2014-02-21 Thread Jason Merrill
My patch for 58606 was incomplete; there were other places that needed 
to change to handle dereferencing reference non-type template parameters.


Tested x86_64-pc-linux-gnu, applying to trunk.  I reverted the earlier 
58606 patch on the 4.8 branch.
commit 7b1bb4515ae768ca44e192442d2578ea46c16f96
Author: Jason Merrill ja...@redhat.com
Date:   Thu Feb 20 23:22:21 2014 -0500

	PR c++/60167
	PR c++/60222
	PR c++/58606
	* parser.c (cp_parser_template_argument): Restore dereference.
	* pt.c (template_parm_to_arg): Dereference non-pack expansions too.
	(process_partial_specialization): Handle deref.
	(unify): Likewise.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4673f78..d8ccd2b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -13937,6 +13937,7 @@ cp_parser_template_argument (cp_parser* parser)
 
 	  if (INDIRECT_REF_P (argument))
 	{
+	  /* Strip the dereference temporarily.  */
 	  gcc_assert (REFERENCE_REF_P (argument));
 	  argument = TREE_OPERAND (argument, 0);
 	}
@@ -13975,6 +13976,8 @@ cp_parser_template_argument (cp_parser* parser)
 	  if (address_p)
 		argument = build_x_unary_op (loc, ADDR_EXPR, argument,
 	 tf_warning_or_error);
+	  else
+		argument = convert_from_reference (argument);
 	  return argument;
 	}
 	}
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 6477fce..4cf387a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -3861,6 +3861,8 @@ template_parm_to_arg (tree t)
 	  SET_ARGUMENT_PACK_ARGS (t, vec);
 	  TREE_TYPE (t) = type;
 	}
+  else
+	t = convert_from_reference (t);
 }
   return t;
 }
@@ -4218,10 +4220,12 @@ process_partial_specialization (tree decl)
       if (/* These first two lines are the `non-type' bit.  */
           !TYPE_P (arg)
           && TREE_CODE (arg) != TEMPLATE_DECL
-          /* This next line is the `argument expression is not just a
+          /* This next two lines are the `argument expression is not just a
              simple identifier' condition and also the `specialized
              non-type argument' bit.  */
-          && TREE_CODE (arg) != TEMPLATE_PARM_INDEX)
+          && TREE_CODE (arg) != TEMPLATE_PARM_INDEX
+          && !(REFERENCE_REF_P (arg)
+               && TREE_CODE (TREE_OPERAND (arg, 0)) == TEMPLATE_PARM_INDEX))
         {
           if ((!packed_args && tpd.arg_uses_template_parms[i])
               || (packed_args && uses_template_parms (arg)))
@@ -17893,6 +17897,12 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict,
   /* Unification fails if we hit an error node.  */
   return unify_invalid (explain_p);
 
+case INDIRECT_REF:
+  if (REFERENCE_REF_P (parm))
+	return unify (tparms, targs, TREE_OPERAND (parm, 0), arg,
+		  strict, explain_p);
+  /* FALLTHRU */
+
 default:
   /* An unresolved overload is a nondeduced context.  */
   if (is_overloaded_fn (parm) || type_unknown_p (parm))
diff --git a/gcc/testsuite/g++.dg/template/ref7.C b/gcc/testsuite/g++.dg/template/ref7.C
new file mode 100644
index 000..f6395e2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ref7.C
@@ -0,0 +1,10 @@
+// PR c++/60167
+
+template <int& F>
+struct Foo {
+  typedef int Bar;
+
+  static Bar cache;
+};
+
+template <int& F> typename Foo<F>::Bar Foo<F>::cache;
diff --git a/gcc/testsuite/g++.dg/template/ref8.C b/gcc/testsuite/g++.dg/template/ref8.C
new file mode 100644
index 000..a2fc847
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/ref8.C
@@ -0,0 +1,8 @@
+// PR c++/60222
+
+template<int&> struct A
+{
+  template<typename> struct B;
+
+  template<typename T> struct B<T*> {};
+};


C++ PATCH for c++/60251 (ICE with VLA capture)

2014-02-21 Thread Jason Merrill
is_normal_capture_proxy got confused by the contortions we go through to 
build up a capture proxy for a VLA capture, so it's easier to just check 
for variably modified type.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 1fa864d218992c8a1b9b1fd4fae2205d5572205b
Author: Jason Merrill ja...@redhat.com
Date:   Thu Feb 20 23:35:28 2014 -0500

	PR c++/60251
	* lambda.c (is_normal_capture_proxy): Handle VLA capture.

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 8bb820d..ad993e9d 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -250,6 +250,10 @@ is_normal_capture_proxy (tree decl)
 /* It's not a capture proxy.  */
 return false;
 
+  if (variably_modified_type_p (TREE_TYPE (decl), NULL_TREE))
+/* VLA capture.  */
+return true;
+
   /* It is a capture proxy, is it a normal capture?  */
   tree val = DECL_VALUE_EXPR (decl);
   if (val == error_mark_node)
diff --git a/gcc/testsuite/g++.dg/cpp1y/vla11.C b/gcc/testsuite/g++.dg/cpp1y/vla11.C
new file mode 100644
index 000..c9cdade
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/vla11.C
@@ -0,0 +1,8 @@
+// PR c++/60251
+// { dg-options "-std=c++1y -pedantic-errors" }
+
+void foo(int n)
+{
+  int x[n];
+  [x]() { decltype(x) y; }; // { dg-error "decltype of array of runtime bound" }
+}


C++ PATCH for c++/60250 (ICE with invalid array bound)

2014-02-21 Thread Jason Merrill
A type-dependent expression can have NULL TREE_TYPE, and if we wrap it 
in a NOP_EXPR also with NULL type, that confuses things.  Let's not try 
to do that.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 5564347b2b7b39d92f8f3b8307bc8ed8551e4d91
Author: Jason Merrill ja...@redhat.com
Date:   Thu Feb 20 23:46:00 2014 -0500

	PR c++/60250
	* parser.c (cp_parser_direct_declarator): Don't wrap a
	type-dependent expression in a NOP_EXPR.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d8ccd2b..d6c176f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -17233,7 +17233,8 @@ cp_parser_direct_declarator (cp_parser* parser,
    array bound is not an integer constant);
 		  bounds = error_mark_node;
 		}
-	  else if (processing_template_decl)
+	  else if (processing_template_decl
+		   && !type_dependent_expression_p (bounds))
 		{
 		  /* Remember this wasn't a constant-expression.  */
 		  bounds = build_nop (TREE_TYPE (bounds), bounds);
diff --git a/gcc/testsuite/g++.dg/cpp1y/vla12.C b/gcc/testsuite/g++.dg/cpp1y/vla12.C
new file mode 100644
index 000..df47f26
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/vla12.C
@@ -0,0 +1,7 @@
+// PR c++/60250
+// { dg-options "-std=c++1y -pedantic-errors" }
+
+template<typename> void foo()
+{
+  typedef int T[ ([](){ return 1; }()) ]; // { dg-error "runtime bound" }
+}


Re: [AArch64 01/14] Use generic target, if no other default.

2014-02-21 Thread Kyrill Tkachov

Hi Philipp,

On 18/02/14 21:09, Philipp Tomsich wrote:

The default target should be generic, as Cortex-A53 includes
optional ISA features (CRC and CRYPTO) that are not required for
architectural compliance. The key difference between generic (which
already uses the cortexa53 pipeline model for scheduling) and cortex-a53
is the absence of any optional ISA features in the generic target.
---
  gcc/config/aarch64/aarch64.c | 2 +-
  gcc/config/aarch64/aarch64.h | 4 ++--
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 784bfa3..70dda00 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5244,7 +5244,7 @@ aarch64_override_options (void)
  
/* If the user did not specify a processor, choose the default

   one for them.  This will be the CPU set during configuration using
- --with-cpu, otherwise it is cortex-a53.  */
+ --with-cpu, otherwise it is generic.  */
if (!selected_cpu)
  {
       selected_cpu = all_cores[TARGET_CPU_DEFAULT & 0x3f];
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 13c424c..b66a6b4 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -472,10 +472,10 @@ enum target_cpus
TARGET_CPU_generic
  };
  
-/* If there is no CPU defined at configure, use cortex-a53 as default.  */

+/* If there is no CPU defined at configure, use generic as default.  */
  #ifndef TARGET_CPU_DEFAULT
  #define TARGET_CPU_DEFAULT \
-  (TARGET_CPU_cortexa53 | (AARCH64_CPU_DEFAULT_FLAGS << 6))
+  (TARGET_CPU_generic | (AARCH64_CPU_DEFAULT_FLAGS << 6))
  #endif
  
  /* The processor for which instructions should be scheduled.  */


I don't think this approach will work. The bug we have here is that in 
config.gcc when processing a --with-arch directive it will use the CPU flags of 
the sample cpu given for the architecture in aarch64-arches.def. This will cause 
it to use cortex-a53+fp+simd+crypto+crc when asked to configure for 
--with-arch=armv8-a. Instead it should be using the 4th field of the 
AARCH64_ARCH which specifies the ISA flags implied by the architecture. Then we 
would get cortex-a53+fp+simd.


Also, if no --with-arch or --with-cpu is specified, config.gcc will still 
specify TARGET_CPU_DEFAULT as TARGET_CPU_generic but without encoding the ISA 
flags (AARCH64_FL_FOR_ARCH8 in this case) for it in the upper bits of 
TARGET_CPU_DEFAULT. TARGET_CPU_DEFAULT is therefore always defined, so the 
last hunk in this patch will never be used when configuring.


I'm working on a fix for these issues.

HTH,
Kyrill



C++ PATCH for c++/60252 (ICE with VLA in lambda parameter)

2014-02-21 Thread Jason Merrill
While parsing the template parameter list for a lambda, we've already 
pushed into the closure class but haven't created the op() 
FUNCTION_DECL, so trying to capture 'this' by way of the 'this' pointer 
of op() breaks.  Avoid the ICE by not trying to capture 'this' when 
parsing a parameter list.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 415022d49d1cee84b6d2085e7585e1d801d15732
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 21 00:35:35 2014 -0500

	PR c++/60252
	* lambda.c (maybe_resolve_dummy): Don't try to capture this
	in declaration context.

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index ad993e9d..7fe235b 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -749,7 +749,10 @@ maybe_resolve_dummy (tree object)
   if (type != current_class_type
       && current_class_type
       && LAMBDA_TYPE_P (current_class_type)
-      && DERIVED_FROM_P (type, current_nonlambda_class_type ()))
+      && DERIVED_FROM_P (type, current_nonlambda_class_type ())
+      /* If we get here while parsing the parameter list of a lambda, it
+	 will fail, so don't even try (c++/60252).  */
+      && current_binding_level->kind != sk_function_parms)
 {
   /* In a lambda, need to go through 'this' capture.  */
   tree lam = CLASSTYPE_LAMBDA_EXPR (current_class_type);
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice11.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice11.C
new file mode 100644
index 000..58f0fa3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-ice11.C
@@ -0,0 +1,12 @@
+// PR c++/60252
+// { dg-require-effective-target c++11 }
+
+struct A
+{
+  int i;			// { dg-message "" }
+
+  void foo()
+  {
+    [](){ [](int[i]){}; };	// { dg-error "" }
+  }
+};


C++ PATCH for c++/60248 (ICE with variadic template)

2014-02-21 Thread Jason Merrill
mangle_decl shouldn't try to make a forward-compatibility alias for a 
TYPE_DECL, since they don't have symbols.


Tested x86_64-pc-linux-gnu, applying to trunk, 4.7, 4.8.
commit 8d40d9322f567ba5720ac807168232ae3c5ee0e4
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 21 00:39:25 2014 -0500

	PR c++/60248
	* mangle.c (mangle_decl): Don't make an alias for a TYPE_DECL.

diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 7bb6f4b..251edb1 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -3485,6 +3485,7 @@ mangle_decl (const tree decl)
 
   if (G.need_abi_warning
       /* Don't do this for a fake symbol we aren't going to emit anyway.  */
+      && TREE_CODE (decl) != TYPE_DECL
       && !DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (decl)
       && !DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (decl))
 {
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic149.C b/gcc/testsuite/g++.dg/cpp0x/variadic149.C
new file mode 100644
index 000..a250e7c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic149.C
@@ -0,0 +1,11 @@
+// PR c++/60248
+// { dg-options "-std=c++11 -g -fabi-version=2" }
+
+template<int...> struct A {};
+
+template<> struct A<0>
+{
+  typedef enum { e } B;
+};
+
+A<0> a;


C++ PATCH for c++/60224 (ICE initializing array with PMF)

2014-02-21 Thread Jason Merrill

We shouldn't treat a CONSTRUCTOR as an init-list if it already has a type.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 8e1493a7a31ffdb1e70977c325e7d2f2686b14a7
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 21 00:52:20 2014 -0500

	PR c++/60224
	* decl.c (cp_complete_array_type, maybe_deduce_size_from_array_init):
	Don't get confused by a CONSTRUCTOR that already has a type.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index b7d2d9f..04c4cf5 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4880,7 +4880,7 @@ maybe_deduce_size_from_array_init (tree decl, tree init)
 	 those are not supported in GNU C++, and as the middle-end
 	 will crash if presented with a non-numeric designated
 	 initializer.  */
-  if (initializer && TREE_CODE (initializer) == CONSTRUCTOR)
+  if (initializer && BRACE_ENCLOSED_INITIALIZER_P (initializer))
 	{
 	  vec<constructor_elt, va_gc> *v = CONSTRUCTOR_ELTS (initializer);
 	  constructor_elt *ce;
@@ -7099,6 +7099,11 @@ cp_complete_array_type (tree *ptype, tree initial_value, bool do_default)
   int failure;
   tree type, elt_type;
 
+  /* Don't get confused by a CONSTRUCTOR for some other type.  */
+  if (initial_value && TREE_CODE (initial_value) == CONSTRUCTOR
+      && !BRACE_ENCLOSED_INITIALIZER_P (initial_value))
+return 1;
+
   if (initial_value)
 {
   unsigned HOST_WIDE_INT i;
diff --git a/gcc/testsuite/g++.dg/init/array36.C b/gcc/testsuite/g++.dg/init/array36.C
new file mode 100644
index 000..77e4f90
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/array36.C
@@ -0,0 +1,8 @@
+// PR c++/60224
+
+struct A {};
+
+void foo()
+{
+  bool b[] = (int (A::*)())0;	// { dg-error "" }
+}


C++ PATCH for c++/60219 (ICE with invalid variadics)

2014-02-21 Thread Jason Merrill
In coerce_template_parms, if we try to pack the remaining arguments into 
an argument pack and that fails, we should immediately stop trying to 
process more arguments.


Tested x86_64-pc-linux-gnu, applying to trunk and 4.8.
commit 1555baa24f537d0e724c53845e7ba2881df7a77f
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 21 01:05:42 2014 -0500

	PR c++/60219
	* pt.c (coerce_template_parms): Bail if argument packing fails.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 3e464ff..0729d93 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -6808,6 +6808,8 @@ coerce_template_parms (tree parms,
   /* Store this argument.  */
   if (arg == error_mark_node)
 lost++;
+	  if (lost)
+	break;
   TREE_VEC_ELT (new_inner_args, parm_idx) = arg;
 
 	  /* We are done with all of the arguments.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic150.C b/gcc/testsuite/g++.dg/cpp0x/variadic150.C
new file mode 100644
index 000..6a30efe
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic150.C
@@ -0,0 +1,9 @@
+// PR c++/60219
+// { dg-require-effective-target c++11 }
+
+template<typename..., int> void foo();
+
+void bar()
+{
+  foo<0>;			// { dg-error "" }
+}


C++ PATCH for c++/60216 (ICE with specialization of deleted template)

2014-02-21 Thread Jason Merrill
We need to propagate DECL_DELETED_FN to clones when we get a new 
specialization.


Tested x86_64-pc-linux-gnu, applying to trunk and 4.8.
commit eaf1689e134ff4fb364c0045965b19879bff8f32
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 21 08:47:01 2014 -0500

	PR c++/60216
	* pt.c (register_specialization): Copy DECL_DELETED_FN to clones.
	(check_explicit_specialization): Don't clone.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 0729d93..f07f6e6 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -1440,6 +1440,8 @@ register_specialization (tree spec, tree tmpl, tree args, bool is_friend,
 		= DECL_DECLARED_INLINE_P (fn);
 		  DECL_SOURCE_LOCATION (clone)
 		= DECL_SOURCE_LOCATION (fn);
+		  DECL_DELETED_FN (clone)
+		= DECL_DELETED_FN (fn);
 		}
 	  check_specialization_namespace (tmpl);
 
@@ -2770,15 +2772,16 @@ check_explicit_specialization (tree declarator,
 	   It's just the name of an instantiation.  But, it's not
 	   a request for an instantiation, either.  */
 	SET_DECL_IMPLICIT_INSTANTIATION (decl);
-	  else if (DECL_CONSTRUCTOR_P (decl) || DECL_DESTRUCTOR_P (decl))
-	/* This is indeed a specialization.  In case of constructors
-	   and destructors, we need in-charge and not-in-charge
-	   versions in V3 ABI.  */
-	clone_function_decl (decl, /*update_method_vec_p=*/0);
 
 	  /* Register this specialization so that we can find it
 	 again.  */
 	  decl = register_specialization (decl, gen_tmpl, targs, is_friend, 0);
+
+	  /* A 'structor should already have clones.  */
+	  gcc_assert (decl == error_mark_node
+		  || !(DECL_CONSTRUCTOR_P (decl)
+			   || DECL_DESTRUCTOR_P (decl))
+		  || DECL_CLONED_FUNCTION_P (DECL_CHAIN (decl)));
 	}
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/deleted3.C b/gcc/testsuite/g++.dg/cpp0x/deleted3.C
new file mode 100644
index 000..6783677
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/deleted3.C
@@ -0,0 +1,11 @@
+// PR c++/60216
+// { dg-require-effective-target c++11 }
+
+struct A
+{
+  template<typename T> A(T) = delete;
+};
+
+template<> A::A<int>(int) {}
+
+A a(0);


Re: [PATCH][2/2] Fix expansion slowness of PR60291

2014-02-21 Thread Richard Biener
On Fri, 21 Feb 2014, Richard Biener wrote:

 On Fri, 21 Feb 2014, Richard Biener wrote:
 
  On Fri, 21 Feb 2014, Richard Sandiford wrote:
  
   In a thread a few years ago you talked about the possibility of going
   further and folding the attributes into the MEM itself, so avoiding
   the indirection and separate allocation:
   
 http://thread.gmane.org/gmane.comp.gcc.patches/244464/focus=244538
   
   (and earlier posts in the thread).  Would that still be OK?
   I might have a go if so.
  
  It would work for me.  Micha just brought up the easiest incremental
  change though, which is

...

 I am testing the following (and also consider it appropriate as a
 fix for the regression PR60291).
 
 Ok for trunk/branch(es)?  Now we have many variants to choose from ;)

Jakub requested statistics for a bootstrap for this one.  I get
for r207939 and a --enable-languages=c x86_64 bootstrap
3609924 mem-attrs created overall without the patch and
8268976 with the patch (that's a factor of 2.3 and thus nothing).

Richard.


[PATCH, PR 60266] Fix problem with mixing -O0 and -O2 in propagate_constants_accross_call

2014-02-21 Thread Martin Jambor
Hi,

in propagate_constants_accross_call we expect a thunk to have at least
one parameter and thus an ipa-prop parameter descriptor.  However,
when the callee comes from a CU that was compiled with -O0, there are
no parameter descriptors and we fail an index checking assert.

This patch fixes it by bailing out early if there are no parameter
descriptors because in that case there is nothing to do in that
function anyway.  Bootstrap and testing in progress, OK for trunk if
it passes?
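
The guard described above can be sketched outside GCC. This is only an illustration: CalleeInfo and propagate_across_call are hypothetical stand-ins, not the ipa-cp data structures, and the loop body is a placeholder for the real lattice updates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-function parameter descriptors that
// ipa-prop would build; a callee compiled at -O0 has none.
struct CalleeInfo {
  std::vector<int> param_descriptors;
};

// Sketch of the early bail-out: return false ("nothing changed") before any
// code indexes into the descriptor vector.
bool propagate_across_call(const CalleeInfo &callee, bool through_thunk) {
  const std::size_t parms_count = callee.param_descriptors.size();
  if (parms_count == 0)
    return false;  // nothing to propagate to, and index 0 would be invalid

  // A thunk adjusts the first (0th) parameter, so propagation skips it.
  // Reaching this point guarantees that parameter 0 exists.
  const std::size_t first = through_thunk ? 1 : 0;
  bool changed = false;
  for (std::size_t i = first; i < parms_count; ++i) {
    (void)callee.param_descriptors.at(i);  // bounds-checked access
    changed = true;                        // placeholder for real updates
  }
  return changed;
}
```

With an empty descriptor vector the function returns before the thunk handling, which is the situation the index-checking assert used to trip on.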

Thanks,

Martin


2014-02-21  Martin Jambor  mjam...@suse.cz

PR ipa/60266
* ipa-cp.c (propagate_constants_accross_call): Bail out early if
there are no parameter descriptors.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 7d8bc05..4c9ab12 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1428,6 +1428,8 @@ propagate_constants_accross_call (struct cgraph_edge *cs)
   args = IPA_EDGE_REF (cs);
   args_count = ipa_get_cs_argument_count (args);
   parms_count = ipa_get_param_count (callee_info);
+  if (parms_count == 0)
+return false;
 
   /* If this call goes through a thunk we must not propagate to the first (0th)
  parameter.  However, we might need to uncover a thunk from below a series


Re: [PATCH, ARM] Support ORN for DImode

2014-02-21 Thread Richard Earnshaw
On 19/02/14 10:18, Ian Bolton wrote:
 Hi,
 
 Patterns had previously been added to thumb2.md to support ORN, but only for
 SImode.
 
 This patch adds DImode support, to cover the full 64|64->64 operation and
 the various 32|64->64 operations (see AND:DI variants that use NOT).
 
 The patch comes with its own execution test and looks for correct number of
 ORN instructions in the assembly.
 
 Regressions passed.
 
 OK for stage 1?
 

OK.

Do you not also need a pattern for

(ior:DI (not:DI (reg:DI))
(zero_extend:DI (reg:SI)))

->
   orn (lowpart) + mvn (highpart)

I don't think one works for sign-extension, though.
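
The split suggested above can be checked on plain integers. The sketch below models ORN as rn | ~rm and MVN as ~rm on 32-bit halves; the function names are illustrative and nothing here is taken from thumb2.md.

```cpp
#include <cassert>
#include <cstdint>

// ORN computes rn | ~rm; MVN computes ~rm. Both are modelled on 32-bit halves.
std::uint32_t orn32(std::uint32_t rn, std::uint32_t rm) { return rn | ~rm; }
std::uint32_t mvn32(std::uint32_t rm) { return ~rm; }

// (ior:DI (not:DI a) (zero_extend:DI b)) built from two 32-bit operations.
std::uint64_t ior_not_zext(std::uint64_t a, std::uint32_t b) {
  const std::uint32_t a_lo = static_cast<std::uint32_t>(a);
  const std::uint32_t a_hi = static_cast<std::uint32_t>(a >> 32);
  const std::uint32_t lo = orn32(b, a_lo);  // low half: b | ~a_lo
  const std::uint32_t hi = mvn32(a_hi);     // high half: 0 | ~a_hi
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Reference results computed directly in 64 bits.
std::uint64_t ref_zext(std::uint64_t a, std::uint32_t b) {
  return ~a | static_cast<std::uint64_t>(b);
}
std::uint64_t ref_sext(std::uint64_t a, std::uint32_t b) {
  const std::int64_t extended = static_cast<std::int32_t>(b);
  return ~a | static_cast<std::uint64_t>(extended);
}
```

For zero-extension the high half of the extended operand is always zero, so a lone MVN is enough. With sign-extension the high half depends on the sign bit of b, so the two-instruction split only matches when b is non-negative.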

R.

 
 2014-02-19  Ian Bolton  ian.bol...@arm.com
 
 gcc/
 * config/arm/thumb2.md (*iordi_notdi_di): New pattern.
 (*iordi_notzesidi): New pattern.
 (*iordi_notsesidi_di): New pattern.
 testsuite/
 * gcc.target/arm/iordi_notdi-1.c: New test.
 



[libstdc++-v3] Move shared_mutex to shared_timed_mutex - late C++14 change (n3891)

2014-02-21 Thread Ed Smith-Rowland

These are the patches as applied.

Built and tested x86_64-linux.

2014-02-20  Ed Smith-Rowland  3dw...@verizon.net

Rename shared_mutex to shared_timed_mutex per C++14 acceptance of N3891.
* include/std/shared_mutex: Rename shared_mutex to shared_timed_mutex.
* testsuite/30_threads/shared_lock/locking/2.cc: Ditto.
* testsuite/30_threads/shared_lock/locking/4.cc: Ditto.
* testsuite/30_threads/shared_lock/locking/1.cc: Ditto.
* testsuite/30_threads/shared_lock/locking/3.cc: Ditto.
* testsuite/30_threads/shared_lock/requirements/
explicit_instantiation.cc: Ditto.
* testsuite/30_threads/shared_lock/requirements/typedefs.cc: Ditto.
* testsuite/30_threads/shared_lock/cons/2.cc: Ditto.
* testsuite/30_threads/shared_lock/cons/4.cc: Ditto.
* testsuite/30_threads/shared_lock/cons/1.cc: Ditto.
* testsuite/30_threads/shared_lock/cons/6.cc: Ditto.
* testsuite/30_threads/shared_lock/cons/3.cc: Ditto.
* testsuite/30_threads/shared_lock/cons/5.cc: Ditto.
* testsuite/30_threads/shared_lock/modifiers/2.cc: Ditto.
* testsuite/30_threads/shared_lock/modifiers/1.cc: Ditto.
* testsuite/30_threads/shared_mutex/requirements/
standard_layout.cc: Ditto.
* testsuite/30_threads/shared_mutex/cons/copy_neg.cc: Ditto.
* testsuite/30_threads/shared_mutex/cons/1.cc: Ditto.
* testsuite/30_threads/shared_mutex/cons/assign_neg.cc: Ditto.
* testsuite/30_threads/shared_mutex/try_lock/2.cc: Ditto.
* testsuite/30_threads/shared_mutex/try_lock/1.cc: Ditto.
2014-02-21  Ed Smith-Rowland  3dw...@verizon.net

Rename testsuite directory shared_mutex to shared_timed_mutex
for consistency.
* testsuite/30_threads/shared_mutex: Moved to...
* testsuite/30_threads/shared_timed_mutex: ...here
Index: include/std/shared_mutex
===
--- include/std/shared_mutex(revision 207061)
+++ include/std/shared_mutex(working copy)
@@ -52,8 +52,8 @@
*/
 
 #if defined(_GLIBCXX_HAS_GTHREADS) && defined(_GLIBCXX_USE_C99_STDINT_TR1)
-  /// shared_mutex
-  class shared_mutex
+  /// shared_timed_mutex
+  class shared_timed_mutex
   {
 #if _GTHREAD_USE_MUTEX_TIMEDLOCK
 struct _Mutex : mutex, __timed_mutex_impl<_Mutex>
@@ -84,15 +84,15 @@
 static constexpr unsigned _M_n_readers = ~_S_write_entered;
 
   public:
-shared_mutex() : _M_state(0) {}
+shared_timed_mutex() : _M_state(0) {}
 
-~shared_mutex()
+~shared_timed_mutex()
 {
   _GLIBCXX_DEBUG_ASSERT( _M_state == 0 );
 }
 
-shared_mutex(const shared_mutex&) = delete;
-shared_mutex& operator=(const shared_mutex&) = delete;
+shared_timed_mutex(const shared_timed_mutex&) = delete;
+shared_timed_mutex& operator=(const shared_timed_mutex&) = delete;
 
 // Exclusive ownership
 
Index: testsuite/30_threads/shared_lock/locking/2.cc
===
--- testsuite/30_threads/shared_lock/locking/2.cc   (revision 205961)
+++ testsuite/30_threads/shared_lock/locking/2.cc   (working copy)
@@ -30,7 +30,7 @@
 void test01()
 {
   bool test __attribute__((unused)) = true;
-  typedef std::shared_mutex mutex_type;
+  typedef std::shared_timed_mutex mutex_type;
   typedef std::shared_lock<mutex_type> lock_type;
 
   try
@@ -66,7 +66,7 @@
 void test02()
 {
   bool test __attribute__((unused)) = true;
-  typedef std::shared_mutex mutex_type;
+  typedef std::shared_timed_mutex mutex_type;
   typedef std::shared_lock<mutex_type> lock_type;
 
   try
Index: testsuite/30_threads/shared_lock/locking/4.cc
===
--- testsuite/30_threads/shared_lock/locking/4.cc   (revision 205961)
+++ testsuite/30_threads/shared_lock/locking/4.cc   (working copy)
@@ -31,7 +31,7 @@
 int main()
 {
   bool test __attribute__((unused)) = true;
-  typedef std::shared_mutex mutex_type;
+  typedef std::shared_timed_mutex mutex_type;
   typedef std::shared_lock<mutex_type> lock_type;
   typedef std::chrono::system_clock clock_type;
 
Index: testsuite/30_threads/shared_lock/locking/1.cc
===
--- testsuite/30_threads/shared_lock/locking/1.cc   (revision 205961)
+++ testsuite/30_threads/shared_lock/locking/1.cc   (working copy)
@@ -30,7 +30,7 @@
 int main()
 {
   bool test __attribute__((unused)) = true;
-  typedef std::shared_mutex mutex_type;
+  typedef std::shared_timed_mutex mutex_type;
   typedef std::shared_lock<mutex_type> lock_type;
 
   try
Index: testsuite/30_threads/shared_lock/locking/3.cc
===
--- testsuite/30_threads/shared_lock/locking/3.cc   (revision 205961)
+++ testsuite/30_threads/shared_lock/locking/3.cc   (working copy)
@@ -31,7 +31,7 @@
 int main()
 {
   bool test 

C++ PATCH for c++/60051 (ICE deducing array)

2014-02-21 Thread Jason Merrill
This patch benefits from the discussion of array deduction at last 
week's C++ standardization committee meeting, where we clarified that we 
should only try to deduce the array bound from an initializer-list if 
the array bound is deducible, i.e. if it's a non-type template 
parameter.  We also should avoid crashing on a 0-length init-list which 
would result in an invalid 0-length array.
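
As a small illustration of the rule described above (not part of the patch), the bound is deduced from a braced list only when it is a non-type template parameter. The helper names are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// N is a non-type template parameter, so the array bound is deducible and
// can be taken from the length of a braced initializer list.
template <typename T, std::size_t N>
std::size_t bound_of(const T (&)[N]) {
  return N;
}

std::size_t deduced_bound_example() {
  return bound_of({10, 20, 30});  // deduces T = int, N = 3
}
```

An empty list gives nothing to deduce from, so a call such as bound_of({}) is expected to be rejected rather than produce a 0-length array.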


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 8fc69de2c377470b3ae9a8ebc65b0909d626d6e3
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 21 00:16:52 2014 -0500

	DR 1591
	PR c++/60051
	* pt.c (unify): Only unify if deducible.  Handle 0-length list.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4cf387a..0f576a5 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -17262,14 +17262,16 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict,
    explain_p);
 	}
 
-  if (TREE_CODE (parm) == ARRAY_TYPE)
+  if (TREE_CODE (parm) == ARRAY_TYPE
+	  && deducible_array_bound (TYPE_DOMAIN (parm)))
 	{
 	  /* Also deduce from the length of the initializer list.  */
 	  tree max = size_int (CONSTRUCTOR_NELTS (arg));
 	  tree idx = compute_array_index_type (NULL_TREE, max, tf_none);
-	  if (TYPE_DOMAIN (parm) != NULL_TREE)
-	return unify_array_domain (tparms, targs, TYPE_DOMAIN (parm),
-   idx, explain_p);
+	  if (idx == error_mark_node)
+	return unify_invalid (explain_p);
+	  return unify_array_domain (tparms, targs, TYPE_DOMAIN (parm),
+ idx, explain_p);
 	}
 
   /* If the std::initializer_list<T> deduction worked, replace the
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist80.C b/gcc/testsuite/g++.dg/cpp0x/initlist80.C
new file mode 100644
index 000..7947f1f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist80.C
@@ -0,0 +1,6 @@
+// PR c++/60051
+// { dg-require-effective-target c++11 }
+
+#include <initializer_list>
+
+auto x[2] = {};			// { dg-error "" }


Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-21 Thread Ilya Verbin
2014-02-20 22:27 GMT+04:00 Bernd Schmidt ber...@codesourcery.com:
 There were still a number of things in these patches that did not make sense
 to me and which I've changed. Let me know if there was a good reason for the
 way some of these things were originally done.
  * Functions and variables now go into different tables, otherwise
intermixing between them could be a problem that causes tables to
go out of sync between host and target (imagine one big table being
generated by ptx lto1/mkoffload, and multiple small table fragments
being linked together on the host side).

What do you mean by multiple small table fragments?
The tables from every object file should be joined together while
linking DSO in the same order for both host and target.
If you need to join tables from multiple target images into one big
table, the host tables also should be joined in the same order. In our
case we're obtaining each target table while loading the image to
target device, and merging it with a corresponding host table.
How splitting functions and global vars into 2 tables will help to
avoid intermixing?

  * Is there a reason to call a register function for the host tables?
The way I've set it up, we register a target function/variable table
while also passing a pointer to the __OPENMP_TARGET__ symbol which
holds information about the host side tables.

Suppose there is liba, that depends on libb, that depends on libc.
Also corresponding target image tgtimga depends on tgtimgb, that
depends on tgtimgc. When liba is going to start offloaded function, it
calls GOMP_target with a pointer to its descriptor, which contains a
pointer to tgtimga. But how does GOMP_target know that it should also
load tgtimgb and tgtimgc to target? And where to get their descriptors
from?
That's why we have added host-side DSO registration. In this example
they are loaded on host in the following order: libc, libb, liba. In
the same order they are registered in libgomp, and loaded to target
device during initialization. In the same order the tables received
from target are merged with the host tables from the descriptors.
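
A rough model of the ordering argument above, assuming one table per DSO on each side. OffloadRegistry, register_host_table and merge are hypothetical names for this sketch and are not the libgomp interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: each DSO contributes one table of symbol names on the
// host, and its target image contributes a table of the same length.
using Table = std::vector<std::string>;
using Pairing = std::vector<std::pair<std::string, std::string>>;

class OffloadRegistry {
 public:
  // Called once per DSO as it is loaded on the host (libc, libb, liba, ...).
  void register_host_table(Table host) {
    host_tables_.push_back(std::move(host));
  }

  // Pairs the i-th host entry with the i-th target entry, DSO by DSO.
  // Returns false if the two sides disagree on table or entry counts.
  bool merge(const std::vector<Table> &target_tables, Pairing &out) const {
    if (target_tables.size() != host_tables_.size()) return false;
    for (std::size_t d = 0; d < host_tables_.size(); ++d) {
      if (target_tables[d].size() != host_tables_[d].size()) return false;
      for (std::size_t i = 0; i < host_tables_[d].size(); ++i)
        out.emplace_back(host_tables_[d][i], target_tables[d][i]);
    }
    return true;
  }

 private:
  std::vector<Table> host_tables_;  // kept in registration (load) order
};
```

The pairing is purely positional, which is why both sides have to see the DSOs in the same order.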

 I'm appending those parts of my current patch kit that seem relevant. This
 includes the ptx mkoffload tool and a patch to make a dummy
 GOMP_offload_register function. Most of the others are updated versions of
 patches I've posted before, and two adapted from Michael Zolotukhin's set
 (automatically generated files not included in the diffs for size reasons).
 How does this look?

I will take a closer look at your changes, try to run it, and send
feedback next week.

  -- Ilya


Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.

2014-02-21 Thread Ilya Tocar
  Latest version of AVX512 spec
  http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
  Has a few changes.
 
  1)PREFETCHWT1 instruction now has separate CPUID bit PREFETCHWT1.
  We can either support new CPUID or disable PREFETCHWT1 from generating,
  without removing code, and enable it in 4.9.1/latest version.
  I am not sure that adding new -m flag and related stuff this late
  is a good idea. Should I still add it?
 
 Please submit the patch anyway. We can relax release constraints on
 non-algorithmic patch a bit, weighting in benefits of having gcc
 release that fully conforms to some published specification.

The patch below adds the -mprefetchwt1 flag and the corresponding
TARGET_PREFETCHWT1, and uses them for the prefetchwt1 instruction.
It bootstraps and passes testing.
Ok for trunk?

ChangeLog:

2014-02-21  Ilya Tocar  ilya.to...@intel.com

* common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET),
(OPTION_MASK_ISA_PREFETCHWT1_UNSET): New.
(ix86_handle_option): Handle OPT_mprefetchwt1.
* config/i386/cpuid.h (bit_PREFETCHWT1): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
PREFETCHWT1 CPUID.
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
OPTION_MASK_ISA_PREFETCHWT1.
* config/i386/i386.c (ix86_target_string): Handle mprefetchwt1.
(PTA_PREFETCHWT1): New.
(ix86_option_override_internal): Handle PTA_PREFETCHWT1.
(ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
* config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P):
  New.
* config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1
(*prefetch_avx512pf_mode_: Change into ...
 (*prefetch_prefetchwt1_mode: This.
* config/i386/i386.opt (mprefetchwt1): New.
* config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1.
(_mm_prefetch): Handle intent to write.
* doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Document.

And for tests:

2014-02-22  Ilya Tocar  ilya.to...@intel.com

* gcc.target/i386/avx-1.c: Update __builtin_prefetch.
* gcc.target/i386/prefetchwt1-1.c: New.
* gcc.target/i386/sse-13.c: Update __builtin_prefetch.
* gcc.target/i386/sse-23.c: Ditto. 

---
 gcc/common/config/i386/i386-common.c  | 15 +++
 gcc/config/i386/cpuid.h   |  4 
 gcc/config/i386/driver-i386.c |  7 +--
 gcc/config/i386/i386-c.c  |  2 ++
 gcc/config/i386/i386.c|  6 ++
 gcc/config/i386/i386.h|  2 ++
 gcc/config/i386/i386.md   | 13 ++---
 gcc/config/i386/i386.opt  |  4 
 gcc/config/i386/xmmintrin.h   |  6 --
 gcc/doc/invoke.texi   |  4 +++-
 gcc/testsuite/gcc.target/i386/avx-1.c |  2 +-
 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c | 14 ++
 gcc/testsuite/gcc.target/i386/sse-13.c|  2 +-
 gcc/testsuite/gcc.target/i386/sse-23.c|  2 +-
 14 files changed, 68 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/prefetchwt1-1.c

diff --git a/gcc/common/config/i386/i386-common.c 
b/gcc/common/config/i386/i386-common.c
index b7f9ff6..a6ab555 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -69,6 +69,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_PRFCHW_SET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_SET OPTION_MASK_ISA_RDSEED
 #define OPTION_MASK_ISA_ADX_SET OPTION_MASK_ISA_ADX
+#define OPTION_MASK_ISA_PREFETCHWT1_SET OPTION_MASK_ISA_PREFETCHWT1
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
as -msse4.2.  */
@@ -154,6 +155,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_PRFCHW_UNSET OPTION_MASK_ISA_PRFCHW
 #define OPTION_MASK_ISA_RDSEED_UNSET OPTION_MASK_ISA_RDSEED
 #define OPTION_MASK_ISA_ADX_UNSET OPTION_MASK_ISA_ADX
+#define OPTION_MASK_ISA_PREFETCHWT1_UNSET OPTION_MASK_ISA_PREFETCHWT1
 
 /* SSE4 includes both SSE4.1 and SSE4.2.  -mno-sse4 should the same
as -mno-sse4.1. */
@@ -757,6 +759,19 @@ ix86_handle_option (struct gcc_options *opts,
}
   return true;
 
+case OPT_mprefetchwt1:
+  if (value)
+   {
+ opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PREFETCHWT1_SET;
+ opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PREFETCHWT1_SET;
+   }
+  else
+   {
+ opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_PREFETCHWT1_UNSET;
+ opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_PREFETCHWT1_UNSET;
+   }
+  return true;
+
   /* Comes from final.c -- no real reason to change it.  */
 #define MAX_CODE_ALIGN 16
 
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index c7a53dd..8c323ae 100644
--- 

Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-21 Thread Bernd Schmidt

On 02/21/2014 04:17 PM, Ilya Verbin wrote:

2014-02-20 22:27 GMT+04:00 Bernd Schmidt ber...@codesourcery.com:

There were still a number of things in these patches that did not make sense
to me and which I've changed. Let me know if there was a good reason for the
way some of these things were originally done.
  * Functions and variables now go into different tables, otherwise
intermixing between them could be a problem that causes tables to
go out of sync between host and target (imagine one big table being
generated by ptx lto1/mkoffload, and multiple small table fragments
being linked together on the host side).


What do you mean by multiple small table fragments?


Well, suppose you have file1.o and file2.o compiled for the host with a 
.offload_func_table_section in each, and they get linked together - each 
provides a fragment of the whole table.



The tables from every object file should be joined together while
linking DSO in the same order for both host and target.
If you need to join tables from multiple target images into one big
table, the host tables also should be joined in the same order.


The problem is that ptx does not have a linker, so we cannot exactly 
reproduce what happens on the host side. We have to process all host .o 
files in one single invocation of ptx lto1, and produce a single ptx 
assembly file, with a single function/variable table, from there. Having 
functions and variables separated gives us at least a small chance that 
the order will match that found in the host tables if the host table is 
produced by linking multiple fragments.



Suppose there is liba, that depends on libb, that depends on libc.


What kind of dependencies between liba and libb do you expect to be able 
to support on the target side? References to each other's functions and 
variables?



Bernd



Re: [PATCH] Fix PR c++/60065.

2014-02-21 Thread Jason Merrill

On 02/21/2014 03:19 AM, Adam Butcher wrote:

A class template with an out-of-line generic function definition will
give the same issue I think:

   template <typename T>
   void A<T>::f(auto x) {}  // should inject a new list


Right.  template_class_depth should be useful here.  This is basically 
the same question as whether a particular member function is a primary 
template (member template) or not, but figuring it out in the middle of 
the parameter list complicates things.



Once it's resolved I think it'd be useful to create a new function to
determine this rather than doing the scope walk in a number of places.
Something like 'templ_parm_scope_for_fn_being_declared' --- or hopefully
some more elegant name!


Right.


Why doesn't num_template_parameter_lists work as a predicate here?


It works in the lambda case as it is updated there, but for generic
functions I think the following prevents it:

   cp/parser.c:17063:

   /* Inside the function parameter list, surrounding
  template-parameter-lists do not apply.  */
   saved_num_template_parameter_lists
 = parser->num_template_parameter_lists;
   parser->num_template_parameter_lists = 0;


Hmm, I wonder what that's for?  What breaks when you remove it? :)

Jason



[jit] New API entrypoint: gcc_jit_context_dump_to_file

2014-02-21 Thread David Malcolm
Committed to branch dmalcolm/jit:

Add a new gcc_jit_context_dump_to_file, which dumps a C-like
representation of the context's IR to a given path.

There is also a flag update_locations, which, when true, will set up
gcc_jit_location information throughout the context, pointing at the dump
file as if it were a source file.

I've been using this in conjunction with GCC_JIT_BOOL_OPTION_DEBUGINFO to
step through generated code in the debugger (when trying to debug my port
of GNU Octave's JIT to libgccjit).

gcc/jit/
* libgccjit.h (gcc_jit_context_dump_to_file): New.
* libgccjit.map (gcc_jit_context_dump_to_file): New.
* libgccjit.c (gcc_jit_context_dump_to_file): New.
* libgccjit++.h (gccjit::context::dump_to_file): New.

* internal-api.h (gcc::jit::dump): New class.
(gcc::jit::recording::playback_location): Add a replayer argument,
so that playback locations can be created before playback statements.
(gcc::jit::recording::location::playback_location): Likewise.
(gcc::jit::recording::statement::playback_location): Likewise.
(gcc::jit::recording::context::dump_to_file): New.
(gcc::jit::recording::context::m_structs): New field, for use by
dump_to_file.
(gcc::jit::recording::context::m_functions): Likewise.
(gcc::jit::recording::memento::write_to_dump): New virtual function.
(gcc::jit::recording::field::write_to_dump): New.
(gcc::jit::recording::fields::write_to_dump): New.
(gcc::jit::recording::function::write_to_dump): New.
(gcc::jit::recording::function::m_locals): New field for use by
write_to_dump.
(gcc::jit::recording::function::m_activity): Likewise.
(gcc::jit::recording::local::write_to_dump): New.
(gcc::jit::recording::statement::write_to_dump): New.
(gcc::jit::recording::place_label::write_to_dump): New.

* internal-api.c (gcc::jit::dump::dump): New.
(gcc::jit::dump::~dump): New.
(gcc::jit::dump::write): New.
(gcc::jit::dump::make_location): New.
(gcc::jit::recording::playback_location): Add a replayer argument,
so that playback locations can be created before playback statements.

(gcc::jit::recording::context::context): Initialize new fields.
(gcc::jit::recording::function::function): Likewise.

(gcc::jit::recording::context::new_struct_type): Add struct to the
context's m_structs vector.
(gcc::jit::recording::context::new_function): Add function to the
context's m_functions vector.
(gcc::jit::recording::context::dump_to_file): New.
(gcc::jit::recording::memento::write_to_dump): New.
(gcc::jit::recording::field::write_to_dump): New.
(gcc::jit::recording::fields::write_to_dump): New.
(gcc::jit::recording::function::write_to_dump): New.
(gcc::jit::recording::local::write_to_dump): New.
(gcc::jit::recording::statement::write_to_dump): New.
(gcc::jit::recording::place_label::write_to_dump): New.

(gcc::jit::recording::array_type::replay_into): Pass on replayer
to call to playback_location.
(gcc::jit::recording::field::replay_into): Likewise.
(gcc::jit::recording::struct_::replay_into): Likewise.
(gcc::jit::recording::param::replay_into): Likewise.
(gcc::jit::recording::function::replay_into): Likewise.
(gcc::jit::recording::global::replay_into): Likewise.
(gcc::jit::recording::unary_op::replay_into): Likewise.
(gcc::jit::recording::binary_op::replay_into): Likewise.
(gcc::jit::recording::comparison::replay_into): Likewise.
(gcc::jit::recording::call::replay_into): Likewise.
(gcc::jit::recording::array_access::replay_into): Likewise.
(gcc::jit::recording::access_field_of_lvalue::replay_into): Likewise.
(gcc::jit::recording::access_field_rvalue::replay_into): Likewise.
(gcc::jit::recording::dereference_field_rvalue::replay_into): Likewise.
(gcc::jit::recording::dereference_rvalue::replay_into): Likewise.
(gcc::jit::recording::get_address_of_lvalue::replay_into): Likewise.
(gcc::jit::recording::local::replay_into): Likewise.
(gcc::jit::recording::eval::replay_into): Likewise.
(gcc::jit::recording::assignment::replay_into): Likewise.
(gcc::jit::recording::assignment_op::replay_into): Likewise.
(gcc::jit::recording::comment::replay_into): Likewise.
(gcc::jit::recording::conditional::replay_into): Likewise.
(gcc::jit::recording::place_label::replay_into): Likewise.
(gcc::jit::recording::jump::replay_into): Likewise.
(gcc::jit::recording::return_::replay_into): Likewise.
(gcc::jit::recording::loop::replay_into): Likewise.
(gcc::jit::recording::loop_end::replay_into): Likewise.

(gcc::jit::recording::function::new_local): Add to the function's
 

[Patch, AArch64] Fix shuffle for big-endian.

2014-02-21 Thread Tejas Belagod


Hi,

When a shuffle of more than one input happens, on NEON we end up with a 
'mixed-endian' format in the register list which TBL operates on. We don't make 
this correction in RTL and therefore the shuffle operation gets it incorrect. 
Here is a patch that fixes-up the index table in the selector rtx in RTL to also 
be mixed-endian to reflect what's happening on NEON.
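
The index fix-up can be sketched as a standalone function. This is a reading of the description above, not the aarch64.c code: remap_big_endian_index is a made-up name, nunits is assumed to be a power of two, and treating indices of nunits or more as selecting from the second vector is an assumption of the sketch.

```cpp
#include <cassert>

// Sketch of the selector fix-up for a TBL-style permute on a big-endian
// target. `nunits` is the element count of one input vector (assumed to be a
// power of two); `idx` is one entry of the permutation.
int remap_big_endian_index(int idx, int nunits, bool one_vector_p) {
  assert(nunits > 0 && (nunits & (nunits - 1)) == 0);
  if (!one_vector_p && idx >= nunits) {
    int elt = idx & (nunits - 1);  // offset within the second vector
    elt = nunits - 1 - elt;        // reverse within that vector
    return elt + nunits;           // move back into the upper half
  }
  return nunits - 1 - idx;  // first vector: plain reversal
}
```

Each input vector is reversed on its own while the order of the two vectors is preserved, which is the 'mixed-endian' layout the message refers to.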


As trunk stands, this patch will not be exercised as constant vector permute for 
Big-endian is disabled. I've tested this by locally enabling const vec_perm and 
it fixes some regressions we have on big-endian:


aarch64_be-none-elf:
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer 
-funroll-all-loops -finline-functions
FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer 
-funroll-loops

FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -g
FAIL->PASS: gcc.dg/torture/vector-shuffle1.c  -O0  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2df.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2di.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v2si.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v4si.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c  -O2  execution test
FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c  -O2  execution test
FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test
FAIL->PASS: gcc.dg/vect/vect-114.c execution test
FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test
FAIL->PASS: gcc.dg/vect/vect-15.c execution test

Also regressed on aarch64-none-elf.

OK for stage-1?

Thanks,
Tejas.

2014-02-21  Tejas Belagod  tejas.bela...@arm.com

gcc/
* config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for
big-endian when dealing with more than one input shuffle vector.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ea90311..fd473a3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8128,7 +8128,28 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d)
 return false;

   for (i = 0; i < nelt; ++i)
-rperm[i] = GEN_INT (d->perm[i]);
+{
+  int nunits = GET_MODE_NUNITS (vmode);
+  int elt = d->perm[i];
+
+  /* If two vectors, we end up with a wierd mixed-endian mode on NEON.  */
+  if (BYTES_BIG_ENDIAN)
+   {
+ if (!d->one_vector_p && d->perm[i] >= nunits)
+   {
+ /* Extract the offset.  */
+ elt = d->perm[i] & (nunits - 1);
+ /* Reverse the top half.  */
+ elt = nunits - 1 - elt;
+ /* Offset it by the bottom half.  */
+ elt += nunits;
+   }
+ else
+   elt = nunits - 1 - d->perm[i];
+   }
+
+  rperm[i] = GEN_INT (elt);
+}
   sel = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rperm));
   sel = force_reg (vmode, sel);


Re: [PATCH] Bound number of recursive compute_control_dep_chain calls with a param (PR tree-optimization/56490)

2014-02-21 Thread Xinliang David Li
thanks for the fix!

David

On Fri, Feb 21, 2014 at 12:21 AM, Jakub Jelinek ja...@redhat.com wrote:
 Hi!

 As discussed in the PR, on larger functions we can end up with
 over 3 million of compute_control_dep_chain nested calls from
 a single compute_control_dep_chain call, on that testcase all that
 effort just to get zero or at most one (useless) control dep path.
 The problem is that the function is really unbounded, even with the
 6 element path length limitation (recursion depth) and the limit of 8
 find_pdom calls - everything still iterates on all the successor edges at
 each level.  And, the function is often called on the same basic block
 again and again, even at a particular depth level (e.g. over 20 times
 same bb same depth level).  But the preceding edge list is slightly
 different in each case and in theory it could give different answers.

 Fixed by bounding the total number of nested calls.
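
 A toy model of that kind of bound, for illustration only: bounded_search,
 MAX_ATTEMPTS and MAX_DEPTH are hypothetical names, and the search walks an
 implicit tree rather than a CFG.  The shared counter is checked and bumped
 on entry, in the same order as in the tree-ssa-uninit.c hunk of this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: 1000 mirrors the default the patch gives the
   new param, 6 mirrors the path length limit mentioned in the mail.  */
#define MAX_ATTEMPTS 1000
#define MAX_DEPTH 6

/* Toy recursive search over an implicit tree with FANOUT children per
   node.  It never finds anything, so without the counter it would visit
   every node down to MAX_DEPTH.  NUM_CALLS is shared by all nested calls.  */
static bool
bounded_search (int depth, int fanout, int *num_calls)
{
  assert (num_calls != 0);

  /* Give up once the total budget of nested calls is used up.  */
  if (*num_calls > MAX_ATTEMPTS)
    return false;
  ++*num_calls;

  if (depth >= MAX_DEPTH)
    return false;

  for (int i = 0; i < fanout; i++)
    if (bounded_search (depth + 1, fanout, num_calls))
      return true;

  return false;
}
```

 With a fanout of 10 the unbounded walk would touch over a million nodes;
 with the counter the number of counted calls stops at MAX_ATTEMPTS + 1.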

 Additionally, I've made a couple of cleanups, heap allocating 8 field array
 instead of using an automatic array makes no sense, the chain length is at
 most 6 and thus we can use a stack vector, etc.

 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

 2014-02-21  Jakub Jelinek  ja...@redhat.com

 PR tree-optimization/56490
 * params.def (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS): New param.
 * tree-ssa-uninit.c: Include params.h.
 (compute_control_dep_chain): Add num_calls argument, return false
 if it exceeds PARAM_UNINIT_CONTROL_DEP_ATTEMPTS param, pass
 num_calls to recursive call.
 (find_predicates): Change dep_chain into normal array,
 cur_chain into auto_vec<edge, MAX_CHAIN_LEN + 1>, add num_calls
 variable and adjust compute_control_dep_chain caller.
 (find_def_preds): Likewise.

 --- gcc/params.def.jj   2014-01-09 19:09:47.0 +0100
 +++ gcc/params.def  2014-02-20 19:30:37.467597338 +0100
 @@ -1078,6 +1078,12 @@ DEFPARAM (PARAM_ASAN_USE_AFTER_RETURN,
   "asan-use-after-return",
   "Enable asan builtin functions protection",
   1, 0, 1)
 +
 +DEFPARAM (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS,
 + "uninit-control-dep-attempts",
 + "Maximum number of nested calls to search for control dependencies "
 + "during uninitialized variable analysis",
 + 1000, 1, 0)
  /*

  Local variables:
 --- gcc/tree-ssa-uninit.c.jj2014-02-04 01:35:58.0 +0100
 +++ gcc/tree-ssa-uninit.c   2014-02-20 19:31:14.198385817 +0100
 @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.
  #include "hashtab.h"
  #include "tree-pass.h"
  #include "diagnostic-core.h"
 +#include "params.h"

  /* This implements the pass that does predicate aware warning on uses of
 possibly uninitialized variables. The pass first collects the set of
 @@ -390,8 +391,8 @@ find_control_equiv_block (basic_block bb

  /* Computes the control dependence chains (paths of edges)
 for DEP_BB up to the dominating basic block BB (the head node of a
 -   chain should be dominated by it).  CD_CHAINS is pointer to a
 -   dynamic array holding the result chains. CUR_CD_CHAIN is the current
 +   chain should be dominated by it).  CD_CHAINS is pointer to an
 +   array holding the result chains.  CUR_CD_CHAIN is the current
 chain being computed.  *NUM_CHAINS is total number of chains.  The
 function returns true if the information is successfully computed,
 return false if there is no control dependence or not computed.  */
 @@ -400,7 +401,8 @@ static bool
  compute_control_dep_chain (basic_block bb, basic_block dep_bb,
                            vec<edge> *cd_chains,
                            size_t *num_chains,
 -                          vec<edge> *cur_cd_chain)
 +                          vec<edge> *cur_cd_chain,
 +                          int *num_calls)
  {
edge_iterator ei;
edge e;
 @@ -411,6 +413,10 @@ compute_control_dep_chain (basic_block b
    if (EDGE_COUNT (bb->succs) < 2)
      return false;

 +  if (*num_calls > PARAM_VALUE (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS))
 +    return false;
 +  ++*num_calls;
 +
    /* Could use a set instead.  */
    cur_chain_len = cur_cd_chain->length ();
    if (cur_chain_len > MAX_CHAIN_LEN)
 @@ -450,7 +456,7 @@ compute_control_dep_chain (basic_block b

/* Now check if DEP_BB is indirectly control dependent on BB.  */
if (compute_control_dep_chain (cd_bb, dep_bb, cd_chains,
 - num_chains, cur_cd_chain))
 +num_chains, cur_cd_chain, num_calls))
  {
found_cd_chain = true;
break;
 @@ -595,14 +601,12 @@ find_predicates (pred_chain_union *preds
   basic_block use_bb)
  {
size_t num_chains = 0, i;
 -  vec<edge> *dep_chains = 0;
 -  vec<edge> cur_chain = vNULL;
 +  int num_calls = 0;
 +  vec<edge> dep_chains[MAX_NUM_CHAINS];
 +  auto_vec<edge, MAX_CHAIN_LEN + 1> cur_chain;
bool has_valid_pred = 

Re: [PATCH] Fix PR 60268

2014-02-21 Thread Vladimir Makarov

On 2/21/2014, 2:22 AM, Andrey Belevantsev wrote:

Hello,

While fixing PR 58960 I forgot about single-block regions and placed the
initialization of the new nr_regions_initial variable in the wrong
place. Thus for single-block regions we ended up with nr_regions = 1 and
nr_regions_initial = 0, which effectively turned off sched-pressure
immediately.  This does not matter for the usual scheduling path, but
with -flive-range-shrinkage we broke an assert that sched-pressure is in
the specific mode.

Fixed by placing the initialization properly at the end of
sched_rgn_init and also moving the check for sched_pressure != NONE
outside of the if statement in schedule_region as discussed in the PR
trail with Jakub.

Bootstrapped and tested on x86-64, ok?




Ok.  Thanks, Andrey.


2014-02-21  Andrey Belevantsev  a...@ispras.ru

 PR rtl-optimization/60268
 * sched-rgn.c (haifa_find_rgns): Move the nr_regions_initial init
to ...
 (sched_rgn_init) ... here.
 (schedule_region): Check for SCHED_PRESSURE_NONE earlier.

testsuite/

2014-02-21  Andrey Belevantsev  a...@ispras.ru

 PR rtl-optimization/60268
 * gcc.c-torture/compile/pr60268.c: New test.




Re: [PATCH, PR 60266] Fix problem with mixing -O0 and -O2 in propagate_constants_accross_call

2014-02-21 Thread Jan Hubicka
 Hi,
 
 in propagate_constants_accross_call we expect a thunk to have at least
 one parameter and thus an ipa-prop parameter descriptor.  However,
 when the callee comes from a CU that was compiled with -O0, there are
 no parameter descriptors and we fail an index checking assert.
 
 This patch fixes it by bailing out early if there are no parameter
 descriptors because in that case there is nothing to do in that
 function anyway.  Bootstrap and testing in progress, OK for trunk if
 it passes?
 
 Thanks,
 
 Martin
 
 
 2014-02-21  Martin Jambor  mjam...@suse.cz
 
   PR ipa/60266
   * ipa-cp.c (propagate_constants_accross_call): Bail out early if
   there are no parameter descriptors.

Actually I have had a similar patch in my tree for a few days, since I hit
the problem while building libreoffice.

OK.
Honza
 
 diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
 index 7d8bc05..4c9ab12 100644
 --- a/gcc/ipa-cp.c
 +++ b/gcc/ipa-cp.c
 @@ -1428,6 +1428,8 @@ propagate_constants_accross_call (struct cgraph_edge 
 *cs)
args = IPA_EDGE_REF (cs);
args_count = ipa_get_cs_argument_count (args);
parms_count = ipa_get_param_count (callee_info);
 +  if (parms_count == 0)
 +return false;
  
/* If this call goes through a thunk we must not propagate to the first 
 (0th)
   parameter.  However, we might need to uncover a thunk from below a 
 series


Re: [PATCH][i386][AVX512] Match latest spec. Add CPUID prefetchwt1.

2014-02-21 Thread Uros Bizjak
On Fri, Feb 21, 2014 at 4:25 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
  The latest version of the AVX512 spec
  http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf
  has a few changes.
 
  1) The PREFETCHWT1 instruction now has a separate CPUID bit, PREFETCHWT1.
  We can either support the new CPUID bit, or stop generating PREFETCHWT1
  without removing the code, and enable it in 4.9.1/latest version.
  I am not sure that adding a new -m flag and related stuff this late
  is a good idea. Should I still add it?

 Please submit the patch anyway. We can relax release constraints on a
 non-algorithmic patch a bit, weighing in the benefits of having a gcc
 release that fully conforms to some published specification.

 The patch below adds the -mprefetchwt1 flag and the corresponding
 TARGET_PREFETCHWT1, and uses them for the prefetchwt1 instruction.
 Bootstraps/passes testing.
 Ok for trunk?

 ChangeLog:

 2014-02-21  Ilya Tocar  ilya.to...@intel.com

 * common/config/i386/i386-common.c (OPTION_MASK_ISA_PREFETCHWT1_SET),
 (OPTION_MASK_ISA_PREFETCHWT1_UNSET): New.
 (ix86_handle_option): Handle OPT_mprefetchwt1.
 * config/i386/cpuid.h (bit_PREFETCHWT1): New.
 * config/i386/driver-i386.c (host_detect_local_cpu): Detect
 PREFETCHWT1 CPUID.
 * config/i386/i386-c.c (ix86_target_macros_internal): Handle
 OPTION_MASK_ISA_PREFETCHWT1.
 * config/i386/i386.c (ix86_target_string): Handle mprefetchwt1.
 (PTA_PREFETCHWT1): New.
 (ix86_option_override_internal): Handle PTA_PREFETCHWT1.
 (ix86_valid_target_attribute_inner_p): Handle OPT_mprefetchwt1.
 * config/i386/i386.h (TARGET_PREFETCHWT1), (TARGET_PREFETCHWT1_P):
   New.
 * config/i386/i386.md (prefetch): Check TARGET_PREFETCHWT1
 (*prefetch_avx512pf_<mode>_: Change into ...
  (*prefetch_prefetchwt1_<mode>: This.
 * config/i386/i386.opt (mprefetchwt1): New.
 * config/i386/xmmintrin.h (_mm_hint): Add _MM_HINT_ET1.
 (_mm_prefetch): Handle intent to write.
 * doc/invoke.texi (mprefetchwt1), (mno-prefetchwt1): Document.

 And for tests:

 2014-02-22  Ilya Tocar  ilya.to...@intel.com

 * gcc.target/i386/avx-1.c: Update __builtin_prefetch.
 * gcc.target/i386/prefetchwt1-1.c: New.
 * gcc.target/i386/sse-13.c: Update __builtin_prefetch.
 * gcc.target/i386/sse-23.c: Ditto.

Please also add the new switch to gcc.target/i386/sse-{12,13,14}.c and
g++.dg/other/i386-{2,3} and the new options to
gcc.target/i386/sse-{22,23}.c. Please re-test with the new additions and
repost the patch.

 @@ -17867,8 +17867,8 @@
   supported by SSE counterpart or the SSE prefetch is not available
   (K6 machines).  Otherwise use SSE prefetch as it allows specifying
   of locality.  */
 -  if (TARGET_AVX512PF && write)
 -    operands[2] = const1_rtx;
 +  if (TARGET_PREFETCHWT1 && write)
 +    operands[2] = GEN_INT (2);

you can use const2_rtx here.

Uros.


[PATCH, rs6000] vec_sums must define all result vector elements

2014-02-21 Thread Bill Schmidt
Hi,

The little-endian implementation of vec_sums is incorrect.  I had
misread the specification and thought that the fields not containing the
result value were undefined, but in fact they are defined to contain
zero.  My previous implementation used a vector splat to copy the field
from BE element 3 to LE element 3.  The corrected implementation will
use a vector shift left to move the field and fill the remaining fields
with zeros.
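
To make the intended result concrete, here is a scalar model of vec_sums on
four signed 32-bit elements, written in big-endian element order.  It is a
sketch rather than the back-end code: model_vec_sums is a hypothetical name,
and it assumes the vsumsws semantics of a saturated sum of all four elements
of the first operand plus element 3 of the second, stored in element 3, with
the other elements zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: D[3] receives the saturated sum of
   A[0..3] and B[3]; D[0..2] are defined to be zero.  */
static void
model_vec_sums (const int32_t a[4], const int32_t b[4], int32_t d[4])
{
  assert (a != 0 && b != 0 && d != 0);

  /* Accumulate in 64 bits so the saturation check cannot overflow.  */
  int64_t sum = b[3];
  for (int i = 0; i < 4; i++)
    sum += a[i];

  if (sum > INT32_MAX)
    sum = INT32_MAX;
  if (sum < INT32_MIN)
    sum = INT32_MIN;

  d[0] = d[1] = d[2] = 0;
  d[3] = (int32_t) sum;
}
```

With the inputs from vsums.c, {-7,11,-13,17} and {0,0,0,128}, this gives
{0,0,0,136}, which is the whole vector the updated tests compare against.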

When I fixed this, I discovered I had also missed a use of
gen_altivec_vsumsws, which should now use gen_altivec_vsumsws_direct
instead.  This is fixed in this patch as well.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Bootstrap and regression test on
powerpc64-unknown-linux-gnu is in progress.  If no big-endian
regressions are found, is this ok for trunk?

Thanks,
Bill


gcc:

2014-02-21  Bill Schmidt  wschm...@linux.vnet.ibm.com

* config/rs6000/altivec.md (altivec_vsumsws): Replace second
vspltw with vsldoi.
(reduc_uplus_v16qi): Use gen_altivec_vsumsws_direct instead of
gen_altivec_vsumsws.

gcc/testsuite:

2014-02-21  Bill Schmidt  wschm...@linux.vnet.ibm.com

* gcc.dg/vmx/vsums.c: Check entire result vector.
* gcc.dg/vmx/vsums-be-order.c: Likewise.


Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 207967)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -1651,7 +1651,7 @@
   if (VECTOR_ELT_ORDER_BIG)
     return "vsumsws %0,%1,%2";
   else
-    return "vspltw %3,%2,0\n\tvsumsws %3,%1,%3\n\tvspltw %0,%3,3";
+    return "vspltw %3,%2,0\n\tvsumsws %3,%1,%3\n\tvsldoi %0,%3,%3,12";
 }
   [(set_attr "type" "veccomplex")
    (set (attr "length")
@@ -2483,7 +2539,7 @@
 
   emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
   emit_insn (gen_altivec_vsum4ubs (vtmp1, operands[1], vzero));
-  emit_insn (gen_altivec_vsumsws (dest, vtmp1, vzero));
+  emit_insn (gen_altivec_vsumsws_direct (dest, vtmp1, vzero));
   DONE;
 })
 
Index: gcc/testsuite/gcc.dg/vmx/vsums-be-order.c
===
--- gcc/testsuite/gcc.dg/vmx/vsums-be-order.c   (revision 207967)
+++ gcc/testsuite/gcc.dg/vmx/vsums-be-order.c   (working copy)
@@ -8,12 +8,13 @@ static void test()
 
 #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
   vector signed int vb = {128,0,0,0};
+  vector signed int evd = {136,0,0,0};
 #else
   vector signed int vb = {0,0,0,128};
+  vector signed int evd = {0,0,0,136};
 #endif
 
   vector signed int vd = vec_sums (va, vb);
-  signed int r = vec_extract (vd, 3);
 
-  check (r == 136, "sums");
+  check (vec_all_eq (vd, evd), "sums");
 }
Index: gcc/testsuite/gcc.dg/vmx/vsums.c
===
--- gcc/testsuite/gcc.dg/vmx/vsums.c(revision 207967)
+++ gcc/testsuite/gcc.dg/vmx/vsums.c(working copy)
@@ -4,9 +4,9 @@ static void test()
 {
   vector signed int va = {-7,11,-13,17};
   vector signed int vb = {0,0,0,128};
+  vector signed int evd = {0,0,0,136};
 
   vector signed int vd = vec_sums (va, vb);
-  signed int r = vec_extract (vd, 3);
 
-  check (r == 136, "sums");
+  check (vec_all_eq (vd, evd), "sums");
 }




Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-02-21 Thread Ilya Verbin
2014-02-21 19:41 GMT+04:00 Bernd Schmidt ber...@codesourcery.com:
 The problem is that ptx does not have a linker, so we cannot exactly
 reproduce what happens on the host side. We have to process all host .o
 files in one single invocation of ptx lto1, and produce a single ptx
 assembly file, with a single function/variable table, from there. Having
 functions and variables separated gives us at least a small chance that the
 order will match that found in the host tables if the host table is produced
 by linking multiple fragments.

If ptx lto1 processes all .o files in the order they were passed to
it, the resulting table should be consistent with the table produced
by the host's lto1.

 What kind of dependencies between liba and libb do you expect to be able to
 support on the target side? References to each other's functions and
 variables?

Yes, references to global variables and calls to functions, marked
with omp declare target.


Re: [PATCH, rs6000] vec_sums must define all result vector elements

2014-02-21 Thread David Edelsohn
On Fri, Feb 21, 2014 at 12:56 PM, Bill Schmidt
wschm...@linux.vnet.ibm.com wrote:
 Hi,

 The little-endian implementation of vec_sums is incorrect.  I had
 misread the specification and thought that the fields not containing the
 result value were undefined, but in fact they are defined to contain
 zero.  My previous implementation used a vector splat to copy the field
 from BE element 3 to LE element 3.  The corrected implementation will
 use a vector shift left to move the field and fill the remaining fields
 with zeros.

 When I fixed this, I discovered I had also missed a use of
 gen_altivec_vsumsws, which should now use gen_altivec_vsumsws_direct
 instead.  This is fixed in this patch as well.

 Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
 regressions.  Bootstrap and regression test on
 powerpc64-unknown-linux-gnu is in progress.  If no big-endian
 regressions are found, is this ok for trunk?

Okay.
Thanks, David


[PATCH, testsuite]: Add some missing avx512 options to g++.dg/other/i386-{2,3}.C and gcc.target/i386/sse-{12,13}.c

2014-02-21 Thread Uros Bizjak
Hello!

No additional testsuite failures.

2014-02-21  Uros Bizjak  ubiz...@gmail.com

* g++.dg/other/i386-2.C (dg-options): Add -mavx512pf.
* g++.dg/other/i386-3.C (dg-options): Ditto.
* gcc.target/i386/sse-12.c (dg-options): Add -msha.
* gcc.target/i386/sse-13.c (dg-options): Add -mavx512er, -mavx512cd,
-mavx512pf and -msha.

Tested on x86_64-pc-linux-gnu and committed to mainline SVN.

Uros.
Index: g++.dg/other/i386-2.C
===
--- g++.dg/other/i386-2.C   (revision 208010)
+++ g++.dg/other/i386-2.C   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -msha" } */
+/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
Index: g++.dg/other/i386-3.C
===
--- g++.dg/other/i386-3.C   (revision 208010)
+++ g++.dg/other/i386-3.C   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile { target i?86-*-* x86_64-*-* } } */
-/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -msha" } */
+/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha" } */
 
 /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h,
xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h,
Index: gcc.target/i386/sse-12.c
===
--- gcc.target/i386/sse-12.c(revision 208010)
+++ gcc.target/i386/sse-12.c(working copy)
@@ -3,7 +3,7 @@
popcntintrin.h and mm_malloc.h are usable
with -O -std=c89 -pedantic-errors.  */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512cd -mavx512er -mavx512pf" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha" } */
 
 #include <x86intrin.h>
 
Index: gcc.target/i386/sse-13.c
===
--- gcc.target/i386/sse-13.c(revision 208010)
+++ gcc.target/i386/sse-13.c(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha" } */
 
 #include <mm_malloc.h>
 


Re: [PATCH, rs6000] Add -maltivec=be semantics in LE mode for vec_ld and vec_st

2014-02-21 Thread David Edelsohn
On Thu, Feb 20, 2014 at 2:46 PM, Bill Schmidt
wschm...@linux.vnet.ibm.com wrote:
 Hi,

 For compatibility with the XL compilers, we need to support -maltivec=be
 for vec_ld, vec_ldl, vec_st, and vec_stl.  (A later patch will also
 handle vec_lde and vec_ste.)

 This is a much simpler patch than its size would indicate.  The original
 implementation of these built-ins treated them all as always loading and
 storing V4SI values, relying on subregs to adjust type mismatches.  For
 this work we need to have the true type so that we know how to reverse
 the order of vector elements.  So most of this patch is the busy-work of
 adding new built-in definitions for all the supported types (six types
 for each of the four built-ins).

 The real work is done in altivec.md to call altivec_expand_{lvx,stvx}_be
 for these built-ins when -maltivec=be is selected for a little endian
 target, and in rs6000.c where these functions are defined.  For the
 loads, the usual load insn is generated followed by a permute to reverse
 the order of the vector elements.  For the stores, the usual store insn
 is generated preceded by a permute to reverse the order of the vector
 elements.  A common routine swap_selector_for_mode is used to generate
 the permute control vector for the permute.
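
 A byte-level model of "permute to reverse the order of the vector
 elements" may help here.  It is only a sketch: build_reverse_selector and
 apply_selector are hypothetical names, and the selector is a plain table of
 source byte indices, not the control-vector encoding that
 swap_selector_for_mode builds for the real permute instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: fill SEL so that the ELT_SIZE-byte elements of a 16-byte
   vector come out in reverse order, with the bytes inside each element
   left in place.  */
static void
build_reverse_selector (int elt_size, uint8_t sel[16])
{
  assert (elt_size > 0 && 16 % elt_size == 0);

  int nelts = 16 / elt_size;
  for (int e = 0; e < nelts; e++)
    for (int b = 0; b < elt_size; b++)
      sel[e * elt_size + b] = (uint8_t) ((nelts - 1 - e) * elt_size + b);
}

/* Byte I of DST is taken from byte SEL[I] of SRC.  */
static void
apply_selector (const uint8_t src[16], const uint8_t sel[16],
                uint8_t dst[16])
{
  for (int i = 0; i < 16; i++)
    dst[i] = src[sel[i] & 15];
}
```

 Applying the same selector twice gives back the original vector, which is
 why one routine can serve both the load-then-permute and the
 permute-then-store sequences.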

 There are 16 new tests, 4 for each built-in.  These cover the VMX and
 VSX built-ins for big-endian, little-endian, and little-endian with
 -maltivec=be.

 Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
 regressions.  All the new tests pass in all endian environments.  Is
 this ok for trunk?

 Thanks,
 Bill


 gcc:

 2014-02-20  Bill Schmidt  wschm...@linux.vnet.ibm.com

 * config/rs6000/altivec.md (altivec_lvxl): Rename as
 *altivec_lvxl_<mode>_internal and use VM2 iterator instead of
 V4SI.
 (altivec_lvxl_<mode>): New define_expand incorporating
 -maltivec=be semantics where needed.
 (altivec_lvx): Rename as *altivec_lvx_<mode>_internal.
 (altivec_lvx_<mode>): New define_expand incorporating -maltivec=be
 semantics where needed.
 (altivec_stvx): Rename as *altivec_stvx_<mode>_internal.
 (altivec_stvx_<mode>): New define_expand incorporating
 -maltivec=be semantics where needed.
 (altivec_stvxl): Rename as *altivec_stvxl_<mode>_internal and use
 VM2 iterator instead of V4SI.
 (altivec_stvxl_<mode>): New define_expand incorporating
 -maltivec=be semantics where needed.
 * config/rs6000/rs6000-builtin.def: Add new built-in definitions
 LVXL_V2DF, LVXL_V2DI, LVXL_V4SF, LVXL_V4SI, LVXL_V8HI, LVXL_V16QI,
 LVX_V2DF, LVX_V2DI, LVX_V4SF, LVX_V4SI, LVX_V8HI, LVX_V16QI,
 STVX_V2DF, STVX_V2DI, STVX_V4SF, STVX_V4SI, STVX_V8HI, STVX_V16QI,
 STVXL_V2DF, STVXL_V2DI, STVXL_V4SF, STVXL_V4SI, STVXL_V8HI,
 STVXL_V16QI.
 * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Replace
 ALTIVEC_BUILTIN_LVX with ALTIVEC_BUILTIN_LVX_<MODE> throughout;
 similarly for ALTIVEC_BUILTIN_LVXL, ALTIVEC_BUILTIN_STVX, and
 ALTIVEC_BUILTIN_STVXL.
 * config/rs6000/rs6000-protos.h (altivec_expand_lvx_be): New
 prototype.
 (altivec_expand_stvx_be): Likewise.
 * config/rs6000/rs6000.c (swap_selector_for_mode): New function.
 (altivec_expand_lvx_be): Likewise.
 (altivec_expand_stvx_be): Likewise.
 (altivec_expand_builtin): Add cases for
 ALTIVEC_BUILTIN_STVX_<MODE>, ALTIVEC_BUILTIN_STVXL_<MODE>,
 ALTIVEC_BUILTIN_LVXL_<MODE>, and ALTIVEC_BUILTIN_LVX_<MODE>.
 (altivec_init_builtins): Add definitions for
 __builtin_altivec_lvxl_<mode>, __builtin_altivec_lvx_<mode>,
 __builtin_altivec_stvx_<mode>, and
 __builtin_altivec_stvxl_<mode>.


 gcc/testsuite:

 2014-02-20  Bill Schmidt  wschm...@linux.vnet.ibm.com

 * gcc.dg/vmx/ld.c: New test.
 * gcc.dg/vmx/ld-be-order.c: New test.
 * gcc.dg/vmx/ld-vsx.c: New test.
 * gcc.dg/vmx/ld-vsx-be-order.c: New test.
 * gcc.dg/vmx/ldl.c: New test.
 * gcc.dg/vmx/ldl-be-order.c: New test.
 * gcc.dg/vmx/ldl-vsx.c: New test.
 * gcc.dg/vmx/ldl-vsx-be-order.c: New test.
 * gcc.dg/vmx/st.c: New test.
 * gcc.dg/vmx/st-be-order.c: New test.
 * gcc.dg/vmx/st-vsx.c: New test.
 * gcc.dg/vmx/st-vsx-be-order.c: New test.
 * gcc.dg/vmx/stl.c: New test.
 * gcc.dg/vmx/stl-be-order.c: New test.
 * gcc.dg/vmx/stl-vsx.c: New test.
 * gcc.dg/vmx/stl-vsx-be-order.c: New test.

Okay.
Thanks, David


[GOMP4] gimple_code_is_oacc -> is_gimple_omp_oacc_specifically (was: [PATCH 4/6] [GOMP4] OpenACC 1.0+ support in fortran front-end)

2014-02-21 Thread Thomas Schwinge
Hi!

On Tue, 11 Feb 2014 17:51:15 +0100, I wrote:
 On Fri, 31 Jan 2014 15:16:07 +0400, Ilmir Usmanov i.usma...@samsung.com 
 wrote:
  --- a/gcc/omp-low.c
  +++ b/gcc/omp-low.c
  @@ -1491,6 +1491,18 @@ fixup_child_record_type (omp_context *ctx)
     TREE_TYPE (ctx->receiver_decl) = build_pointer_type (type);
   }
   
  +static bool
  +gimple_code_is_oacc (const_gimple g)
  +{
  +  switch (gimple_code (g))
  +{
  +case GIMPLE_OACC_PARALLEL:
  +  return true;
  +default:
  +  return false;
  +}
  +}
  +
 
 Eventually, this will probably end up next to CASE_GIMPLE_OMP/is_gimple_omp
 in gimple.h (or the latter be reworked to be able to ask for is_omp vs.
 is_oacc vs. is_omp_or_oacc), but it's fine to do that once we actually
 need it in files other than just omp-low.c, and once we support more
 GIMPLE_OACC_* codes.

Ah, well, I'm now in the situation that I need to do such a check in
another file, so I have applied the following to gomp-4_0-branch in
r208013.  I have also renamed the function to
is_gimple_omp_oacc_specifically, building on the existing is_gimple_omp
name.  (Don't worry about the unwieldy name, as all this is to disappear
as the development progresses.)

commit 25aab0dd39a57661e9d7f3a5f405f4647977b9de
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Fri Feb 21 19:26:01 2014 +

gimple_code_is_oacc -> is_gimple_omp_oacc_specifically.

gcc/
* omp-low.c (gimple_code_is_oacc): Move to...
* gimple.h (is_gimple_omp_oacc_specifically): ... here.  Update
users, and also use it in more places where currently we've only
been checking for GIMPLE_OACC_PARALLEL.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208013 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 14d8805..1ce952d 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,10 @@
+2014-02-21  Thomas Schwinge  tho...@codesourcery.com
+
+   * omp-low.c (gimple_code_is_oacc): Move to...
+   * gimple.h (is_gimple_omp_oacc_specifically): ... here.  Update
+   users, and also use it in more places where currently we've only
+   been checking for GIMPLE_OACC_PARALLEL.
+
 2014-02-18  Thomas Schwinge  tho...@codesourcery.com
 
* omp-low.c (diagnose_sb_0, diagnose_sb_1, diagnose_sb_2): Handle
diff --git gcc/gimple.h gcc/gimple.h
index 5b5a0ee..0d250ef 100644
--- gcc/gimple.h
+++ gcc/gimple.h
@@ -5670,6 +5670,25 @@ is_gimple_omp (const_gimple stmt)
 }
 }
 
+/* Return true if STMT is any of the OpenACC types specifically.
+
+   TODO: This function should go away eventually, once all its callers have
+   either been fixed, changed into more specific checks, or verified to not
+   need any special handling for OpenACC.  */
+
+static inline bool
+is_gimple_omp_oacc_specifically (const_gimple stmt)
+{
+  gcc_assert (is_gimple_omp (stmt));
+  switch (gimple_code (stmt))
+{
+case GIMPLE_OACC_PARALLEL:
+  return true;
+default:
+  return false;
+}
+}
+
 
 /* Returns TRUE if statement G is a GIMPLE_NOP.  */
 
diff --git gcc/omp-low.c gcc/omp-low.c
index 110ea63..b975dad 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -863,7 +863,7 @@ use_pointer_for_field (tree decl, omp_context *shared_ctx)
  when we know the value is not accessible from an outer scope.  */
   if (shared_ctx)
 {
-      gcc_assert (gimple_code (shared_ctx->stmt) != GIMPLE_OACC_PARALLEL);
+      gcc_assert (!is_gimple_omp_oacc_specifically (shared_ctx->stmt));
 
   /* ??? Trivially accessible from anywhere.  But why would we even
 be passing an address in this case?  Should we simply assert
@@ -1006,7 +1006,7 @@ build_receiver_ref (tree var, bool by_ref, omp_context 
*ctx)
 static tree
 build_outer_var_ref (tree var, omp_context *ctx)
 {
-  gcc_assert (gimple_code (ctx->stmt) != GIMPLE_OACC_PARALLEL);
+  gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt));
 
   tree x;
 
@@ -1072,7 +1072,7 @@ install_var_field (tree var, bool by_ref, int mask, 
omp_context *ctx)
   gcc_assert ((mask & 2) == 0 || !ctx->sfield_map
              || !splay_tree_lookup (ctx->sfield_map, (splay_tree_key) var));
   gcc_assert ((mask & 3) == 3
-             || gimple_code (ctx->stmt) != GIMPLE_OACC_PARALLEL);
+             || !is_gimple_omp_oacc_specifically (ctx->stmt));
 
   type = TREE_TYPE (var);
   if (mask & 4)
@@ -1491,18 +1491,6 @@ fixup_child_record_type (omp_context *ctx)
   TREE_TYPE (ctx->receiver_decl) = build_pointer_type (type);
 }
 
-static bool
-gimple_code_is_oacc (const_gimple g)
-{
-  switch (gimple_code (g))
-{
-case GIMPLE_OACC_PARALLEL:
-  return true;
-default:
-  return false;
-}
-}
-
 /* Instantiate decls as necessary in CTX to satisfy the data sharing
specified by CLAUSES.  */
 
@@ -1519,7 +1507,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
   switch (OMP_CLAUSE_CODE (c))
{
case OMP_CLAUSE_PRIVATE:
- 

Re: [gomp4 3/6] Initial support for OpenACC memory mapping semantics.

2014-02-21 Thread Thomas Schwinge
Hi!

On Tue, 14 Jan 2014 16:10:05 +0100, I wrote:
 --- gcc/gimplify.c
 +++ gcc/gimplify.c
 @@ -86,7 +92,11 @@ enum omp_region_type
ORT_UNTIED_TASK = 5,
ORT_TEAMS = 8,
ORT_TARGET_DATA = 16,
 -  ORT_TARGET = 32
 +  ORT_TARGET = 32,
 +
 +  /* Flags for ORT_TARGET.  */
 +  /* Default to GOVD_MAP_FORCE for implicit mappings in this region.  */
 +  ORT_TARGET_MAP_FORCE = 64
  };

Continuing on that route, I have now applied the following to
gomp-4_0-branch in r208014:

commit dee2965ae547af0bc90d618e7fa40fbf2f5292b4
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Fri Feb 21 19:45:12 2014 +

Gimplification: New flag ORT_TARGET_OFFLOAD replaces !ORT_TARGET_DATA.

gcc/
* gimplify.c (enum omp_region_type): Make ORT_TARGET_OFFLOAD a
flag for ORT_TARGET, in its negation replacing ORT_TARGET_DATA.
Update all users.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208014 
138bc75d-0d04-0410-961f-82ee72b054a4
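
As an illustration of the flag scheme in this commit (a sketch only: the
SK_-prefixed names and the three helper functions are hypothetical, and only
the numeric values are taken from the enum in the diff below), a "target
data" region is now an ORT_TARGET region without the offload flag:

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the patched enum; only the members used here.  */
enum omp_region_type_sketch
{
  SK_ORT_TARGET = 16,
  SK_ORT_TARGET_OFFLOAD = 32,   /* Prepare this region for offloading.  */
  SK_ORT_TARGET_MAP_FORCE = 64  /* Default to forced implicit mappings.  */
};

/* Build a region type for some kind of target region.  */
static unsigned
make_target_region (bool offload, bool map_force)
{
  unsigned r = SK_ORT_TARGET;
  if (offload)
    r |= SK_ORT_TARGET_OFFLOAD;
  if (map_force)
    r |= SK_ORT_TARGET_MAP_FORCE;
  assert ((r & SK_ORT_TARGET) != 0);
  return r;
}

/* True for a target region that is to be offloaded.  */
static bool
is_offload_region (unsigned region_type)
{
  return (region_type & SK_ORT_TARGET) != 0
         && (region_type & SK_ORT_TARGET_OFFLOAD) != 0;
}

/* True for what used to be ORT_TARGET_DATA: target, but not offloaded.  */
static bool
is_target_data_region (unsigned region_type)
{
  return (region_type & SK_ORT_TARGET) != 0
         && (region_type & SK_ORT_TARGET_OFFLOAD) == 0;
}
```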

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 1ce952d..bf8ec96 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,9 @@
 2014-02-21  Thomas Schwinge  tho...@codesourcery.com
 
+   * gimplify.c (enum omp_region_type): Make ORT_TARGET_OFFLOAD a
+   flag for ORT_TARGET, in its negation replacing ORT_TARGET_DATA.
+   Update all users.
+
* omp-low.c (gimple_code_is_oacc): Move to...
* gimple.h (is_gimple_omp_oacc_specifically): ... here.  Update
users, and also use it in more places where currently we've only
diff --git gcc/gimplify.c gcc/gimplify.c
index 51a1b73..9aa9301c 100644
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -100,10 +100,11 @@ enum omp_region_type
   ORT_TASK = 4,
   ORT_UNTIED_TASK = 5,
   ORT_TEAMS = 8,
-  ORT_TARGET_DATA = 16,
-  ORT_TARGET = 32,
+  ORT_TARGET = 16,
 
   /* Flags for ORT_TARGET.  */
+  /* Prepare this region for offloading.  */
+  ORT_TARGET_OFFLOAD = 32,
   /* Default to GOVD_MAP_FORCE for implicit mappings in this region.  */
   ORT_TARGET_MAP_FORCE = 64
 };
@@ -2202,7 +2203,7 @@ gimplify_arg (tree *arg_p, gimple_seq *pre_p, location_t 
call_location)
   return gimplify_expr (arg_p, pre_p, NULL, test, fb);
 }
 
-/* Don't fold STMT inside ORT_TARGET, because it can break code by adding decl
+/* Don't fold inside offloading regions: it can break code by adding decl
references that weren't in the source.  We'll do it during omplower pass
instead.  */
 
@@ -2211,7 +2212,8 @@ maybe_fold_stmt (gimple_stmt_iterator *gsi)
 {
   struct gimplify_omp_ctx *ctx;
   for (ctx = gimplify_omp_ctxp; ctx; ctx = ctx->outer_context)
-    if (ctx->region_type & ORT_TARGET)
+    if (ctx->region_type & ORT_TARGET
+        && ctx->region_type & ORT_TARGET_OFFLOAD)
   return false;
   return fold_stmt (gsi);
 }
@@ -5388,10 +5390,12 @@ omp_firstprivatize_variable (struct gimplify_omp_ctx 
*ctx, tree decl)
return;
}
   else if (ctx->region_type & ORT_TARGET)
-       omp_add_variable (ctx, decl, GOVD_MAP | GOVD_MAP_TO_ONLY);
+       {
+         if (ctx->region_type & ORT_TARGET_OFFLOAD)
+           omp_add_variable (ctx, decl, GOVD_MAP | GOVD_MAP_TO_ONLY);
+       }
   else if (ctx->region_type != ORT_WORKSHARE
-          && ctx->region_type != ORT_SIMD
-          && ctx->region_type != ORT_TARGET_DATA)
+          && ctx->region_type != ORT_SIMD)
        omp_add_variable (ctx, decl, GOVD_FIRSTPRIVATE);
 
   ctx = ctx->outer_context;
@@ -5580,7 +5584,8 @@ omp_notice_threadprivate_variable (struct 
gimplify_omp_ctx *ctx, tree decl,
   struct gimplify_omp_ctx *octx;
 
   for (octx = ctx; octx; octx = octx->outer_context)
-    if (octx->region_type & ORT_TARGET)
+    if ((octx->region_type & ORT_TARGET)
+        && (octx->region_type & ORT_TARGET_OFFLOAD))
       {
        gcc_assert (!(octx->region_type & ORT_TARGET_MAP_FORCE));
 
@@ -5643,7 +5648,8 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree 
decl, bool in_code)
 }
 
   n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
-  if (ctx->region_type & ORT_TARGET)
+  if ((ctx->region_type & ORT_TARGET)
+      && (ctx->region_type & ORT_TARGET_OFFLOAD))
     {
       unsigned map_force;
       if (ctx->region_type & ORT_TARGET_MAP_FORCE)
@@ -5695,7 +5701,8 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
 
   if (ctx->region_type == ORT_WORKSHARE
  || ctx->region_type == ORT_SIMD
- || ctx->region_type == ORT_TARGET_DATA)
+ || ((ctx->region_type & ORT_TARGET)
+ && !(ctx->region_type & ORT_TARGET_OFFLOAD)))
goto do_outer;
 
   /* ??? Some compiler-generated variables (like SAVE_EXPRs) could be
@@ -5746,7 +5753,7 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
 {
   splay_tree_node n2;
 
- if ((octx->region_type & (ORT_TARGET_DATA | ORT_TARGET)) != 0)
+ if (octx->region_type & ORT_TARGET)
continue;
  n2 = 

[gomp4 1/3] Clarify to/from/map clauses usage in context of GF_OMP_TARGET_KIND_UPDATE.

2014-02-21 Thread Thomas Schwinge
From: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>

gcc/
* omp-low.c (scan_sharing_clauses): Catch unexpected occurrences
of OMP_CLAUSE_TO, OMP_CLAUSE_FROM, OMP_CLAUSE_MAP.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208015 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  3 +++
 gcc/omp-low.c  | 25 +
 2 files changed, 28 insertions(+)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index bf8ec96..bd46f2e 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2014-02-21  Thomas Schwinge  tho...@codesourcery.com
 
+   * omp-low.c (scan_sharing_clauses): Catch unexpected occurrences
+   of OMP_CLAUSE_TO, OMP_CLAUSE_FROM, OMP_CLAUSE_MAP.
+
* gimplify.c (enum omp_region_type): Make ORT_TARGET_OFFLOAD a
flag for ORT_TARGET, in its negation replacing ORT_TARGET_DATA.
Update all users.
diff --git gcc/omp-low.c gcc/omp-low.c
index 9fef4c1..bca4599 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -1630,6 +1630,26 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
case OMP_CLAUSE_FROM:
   gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt));
 case OMP_CLAUSE_MAP:
+ switch (OMP_CLAUSE_CODE (c))
+   {
+   case OMP_CLAUSE_TO:
+   case OMP_CLAUSE_FROM:
+ /* The to and from clauses are only ever seen with OpenMP target
+update constructs.  */
+ gcc_assert (gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
+  && (gimple_omp_target_kind (ctx->stmt)
+ == GF_OMP_TARGET_KIND_UPDATE));
+ break;
+   case OMP_CLAUSE_MAP:
+ /* The map clause is never seen with OpenMP target update
+constructs.  */
+ gcc_assert (gimple_code (ctx->stmt) != GIMPLE_OMP_TARGET
+ || (gimple_omp_target_kind (ctx->stmt)
+ != GF_OMP_TARGET_KIND_UPDATE));
+ break;
+   default:
+ gcc_unreachable ();
+   }
   if (ctx->outer)
 scan_omp_op (&OMP_CLAUSE_SIZE (c), ctx->outer);
  decl = OMP_CLAUSE_DECL (c);
@@ -1799,6 +1819,11 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
  break;
 
 case OMP_CLAUSE_MAP:
+ /* The map clause is never seen with OpenMP target update
+constructs.  */
+ gcc_assert (gimple_code (ctx->stmt) != GIMPLE_OMP_TARGET
+ || (gimple_omp_target_kind (ctx->stmt)
+ != GF_OMP_TARGET_KIND_UPDATE));
   if (!gimple_code_is_oacc (ctx->stmt)
   && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_DATA)
break;
-- 
1.8.1.1



[gomp4 2/3] OpenACC data construct implementation in terms of GF_OMP_TARGET_KIND_OACC_DATA.

2014-02-21 Thread Thomas Schwinge
From: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>

gcc/
* gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_DATA.
(is_gimple_omp_oacc_specifically): Handle it.
* gimple-pretty-print.c (dump_gimple_omp_target): Likewise.
* gimplify.c (gimplify_omp_workshare, gimplify_expr): Likewise.
* omp-low.c (scan_sharing_clauses, scan_omp_target)
(expand_omp_target, lower_omp_target, lower_omp_1): Likewise.
* gimple.def (GIMPLE_OMP_TARGET): Update comment.
* gimple.c (gimple_build_omp_target): Likewise.
(gimple_copy): Catch unimplemented case.
* tree-inline.c (remap_gimple_stmt): Likewise.
* tree-nested.c (convert_nonlocal_reference_stmt)
(convert_local_reference_stmt, convert_gimple_call): Likewise.
* oacc-builtins.def (BUILT_IN_GOACC_DATA_START)
(BUILT_IN_GOACC_DATA_END): New builtins.
libgomp/
* libgomp.map (GOACC_2.0): Add GOACC_data_end, GOACC_data_start.
* libgomp_g.h (GOACC_data_start, GOACC_data_end): New prototypes.
* oacc-parallel.c (GOACC_data_start, GOACC_data_end): New
functions.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208016 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp|  15 ++
 gcc/gimple-pretty-print.c |   3 ++
 gcc/gimple.c  |   4 +-
 gcc/gimple.def|   1 +
 gcc/gimple.h  |   9 
 gcc/gimplify.c|  33 +---
 gcc/oacc-builtins.def |   6 ++-
 gcc/omp-low.c | 132 --
 gcc/tree-inline.c |   1 +
 gcc/tree-nested.c |   3 ++
 libgomp/ChangeLog.gomp|   7 +++
 libgomp/libgomp.map   |   2 +
 libgomp/libgomp_g.h   |   3 ++
 libgomp/oacc-parallel.c   |  34 +++-
 14 files changed, 213 insertions(+), 40 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index bd46f2e..824ec94 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,20 @@
 2014-02-21  Thomas Schwinge  tho...@codesourcery.com
 
+   * gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_DATA.
+   (is_gimple_omp_oacc_specifically): Handle it.
+   * gimple-pretty-print.c (dump_gimple_omp_target): Likewise.
+   * gimplify.c (gimplify_omp_workshare, gimplify_expr): Likewise.
+   * omp-low.c (scan_sharing_clauses, scan_omp_target)
+   (expand_omp_target, lower_omp_target, lower_omp_1): Likewise.
+   * gimple.def (GIMPLE_OMP_TARGET): Update comment.
+   * gimple.c (gimple_build_omp_target): Likewise.
+   (gimple_copy): Catch unimplemented case.
+   * tree-inline.c (remap_gimple_stmt): Likewise.
+   * tree-nested.c (convert_nonlocal_reference_stmt)
+   (convert_local_reference_stmt, convert_gimple_call): Likewise.
+   * oacc-builtins.def (BUILT_IN_GOACC_DATA_START)
+   (BUILT_IN_GOACC_DATA_END): New builtins.
+
* omp-low.c (scan_sharing_clauses): Catch unexpected occurrences
of OMP_CLAUSE_TO, OMP_CLAUSE_FROM, OMP_CLAUSE_MAP.
 
diff --git gcc/gimple-pretty-print.c gcc/gimple-pretty-print.c
index 91a3eb2..ad9369c 100644
--- gcc/gimple-pretty-print.c
+++ gcc/gimple-pretty-print.c
@@ -1289,6 +1289,9 @@ dump_gimple_omp_target (pretty_printer *buffer, gimple gs, int spc, int flags)
 case GF_OMP_TARGET_KIND_UPDATE:
   kind = " update";
   break;
+case GF_OMP_TARGET_KIND_OACC_DATA:
+  kind = " oacc_data";
+  break;
 default:
   gcc_unreachable ();
 }
diff --git gcc/gimple.c gcc/gimple.c
index 2a967aa..30561b1 100644
--- gcc/gimple.c
+++ gcc/gimple.c
@@ -1051,7 +1051,8 @@ gimple_build_omp_single (gimple_seq body, tree clauses)
 /* Build a GIMPLE_OMP_TARGET statement.
 
BODY is the sequence of statements that will be executed.
-   CLAUSES are any of the OMP target construct's clauses.  */
+   KIND is the kind of target region.
+   CLAUSES are any of the construct's clauses.  */
 
 gimple
 gimple_build_omp_target (gimple_seq body, int kind, tree clauses)
@@ -1747,6 +1748,7 @@ gimple_copy (gimple stmt)
case GIMPLE_OMP_TASKGROUP:
case GIMPLE_OMP_ORDERED:
copy_omp_body:
+ gcc_assert (!is_gimple_omp_oacc_specifically (stmt));
  new_seq = gimple_seq_copy (gimple_omp_body (stmt));
  gimple_omp_set_body (copy, new_seq);
  break;
diff --git gcc/gimple.def gcc/gimple.def
index 2b78c06..ce800bd 100644
--- gcc/gimple.def
+++ gcc/gimple.def
@@ -360,6 +360,7 @@ DEFGSCODE(GIMPLE_OMP_SECTIONS_SWITCH, gimple_omp_sections_switch, GSS_BASE)
 DEFGSCODE(GIMPLE_OMP_SINGLE, gimple_omp_single, GSS_OMP_SINGLE_LAYOUT)
 
 /* GIMPLE_OMP_TARGET BODY, CLAUSES, CHILD_FN represents
+   #pragma acc data
#pragma omp target {,data,update}
BODY is the sequence of statements inside the target construct
(NULL for target update).
diff --git gcc/gimple.h gcc/gimple.h
index 0d250ef..b4ee9fa 100644
--- gcc/gimple.h
+++ gcc/gimple.h
@@ 

[gomp4 3/3] OpenACC data construct support in the C front end.

2014-02-21 Thread Thomas Schwinge
From: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>

gcc/c-family/
* c-pragma.c (oacc_pragmas): Add data.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_DATA.
gcc/c/
* c-parser.c (OACC_DATA_CLAUSE_MASK): New macro definition.
(c_parser_oacc_data): New function.
(c_parser_omp_construct): Handle PRAGMA_OACC_DATA.
* c-tree.h (c_finish_oacc_data): New prototype.
* c-typeck.c (c_finish_oacc_data): New function.
gcc/testsuite/
* c-c++-common/goacc-gomp/nesting-fail-1.c: Extend for OpenACC
data construct.
* c-c++-common/goacc/nesting-fail-1.c: Likewise.
* c-c++-common/goacc/parallel-fail-1.c: Rename to...
* c-c++-common/goacc/clauses-fail.c: ... this new file.  Extend
for OpenACC data construct.
* c-c++-common/goacc/data-1.c: New file.
libgomp/
* testsuite/libgomp.oacc-c/data-1.c: New file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208017 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/c-family/ChangeLog.gomp|   5 +
 gcc/c-family/c-pragma.c|   1 +
 gcc/c-family/c-pragma.h|   1 +
 gcc/c/ChangeLog.gomp   |   8 +
 gcc/c/c-parser.c   |  42 +
 gcc/c/c-tree.h |   1 +
 gcc/c/c-typeck.c   |  19 +++
 gcc/testsuite/ChangeLog.gomp   |  10 ++
 .../c-c++-common/goacc-gomp/nesting-fail-1.c   |  92 ++-
 gcc/testsuite/c-c++-common/goacc/clauses-fail.c|   9 ++
 gcc/testsuite/c-c++-common/goacc/data-1.c  |   6 +
 gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c  |  18 ++-
 gcc/testsuite/c-c++-common/goacc/parallel-fail-1.c |   6 -
 libgomp/ChangeLog.gomp |   2 +
 libgomp/testsuite/libgomp.oacc-c/data-1.c  | 170 +
 15 files changed, 380 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/clauses-fail.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/data-1.c
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/parallel-fail-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c/data-1.c

diff --git gcc/c-family/ChangeLog.gomp gcc/c-family/ChangeLog.gomp
index e092d53..3da377f 100644
--- gcc/c-family/ChangeLog.gomp
+++ gcc/c-family/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-02-21  Thomas Schwinge  tho...@codesourcery.com
+
+   * c-pragma.c (oacc_pragmas): Add data.
+   * c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_DATA.
+
 2014-01-28  Thomas Schwinge  tho...@codesourcery.com
 
* c-pragma.h (pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_COPY,
diff --git gcc/c-family/c-pragma.c gcc/c-family/c-pragma.c
index f69486a..08374aa 100644
--- gcc/c-family/c-pragma.c
+++ gcc/c-family/c-pragma.c
@@ -1169,6 +1169,7 @@ static vec<pragma_ns_name> registered_pp_pragmas;
 
 struct omp_pragma_def { const char *name; unsigned int id; };
 static const struct omp_pragma_def oacc_pragmas[] = {
+  { "data", PRAGMA_OACC_DATA },
   { "parallel", PRAGMA_OACC_PARALLEL },
 };
 static const struct omp_pragma_def omp_pragmas[] = {
diff --git gcc/c-family/c-pragma.h gcc/c-family/c-pragma.h
index 1ea5b1d..d092f9f 100644
--- gcc/c-family/c-pragma.h
+++ gcc/c-family/c-pragma.h
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 typedef enum pragma_kind {
   PRAGMA_NONE = 0,
 
+  PRAGMA_OACC_DATA,
   PRAGMA_OACC_PARALLEL,
   PRAGMA_OMP_ATOMIC,
   PRAGMA_OMP_BARRIER,
diff --git gcc/c/ChangeLog.gomp gcc/c/ChangeLog.gomp
index b199957..9b95725 100644
--- gcc/c/ChangeLog.gomp
+++ gcc/c/ChangeLog.gomp
@@ -1,3 +1,11 @@
+2014-02-21  Thomas Schwinge  tho...@codesourcery.com
+
+   * c-parser.c (OACC_DATA_CLAUSE_MASK): New macro definition.
+   (c_parser_oacc_data): New function.
+   (c_parser_omp_construct): Handle PRAGMA_OACC_DATA.
+   * c-tree.h (c_finish_oacc_data): New prototype.
+   * c-typeck.c (c_finish_oacc_data): New function.
+
 2014-02-17  Thomas Schwinge  tho...@codesourcery.com
 
* c-parser.c (c_parser_omp_clause_name): Accept pcopy, pcopyin,
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 7850eab..4643722 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -4776,10 +4776,14 @@ c_parser_label (c_parser *parser)
 
openacc-construct:
  parallel-construct
+ data-construct
 
parallel-construct:
  parallel-directive structured-block
 
+   data-construct:
+ data-directive structured-block
+
OpenMP:
 
statement:
@@ -11362,6 +11366,41 @@ c_parser_omp_structured_block (c_parser *parser)
 }
 
 /* OpenACC 2.0:
+   # pragma acc data oacc-data-clause[optseq] new-line
+ structured-block
+
+   LOC is the location of the #pragma token.
+*/
+
+#define OACC_DATA_CLAUSE_MASK  \
+   ( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_COPY)  

patch to fix PR60298

2014-02-21 Thread Vladimir Makarov

The following patch fixes

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60298

The patch was successfully bootstrapped on x86/x86-64.

Committed as rev. 208023.

2014-02-21  Vladimir Makarov  vmaka...@redhat.com

PR target/60298
* lra-constraints.c (inherit_reload_reg): Use lra_emit_move
instead of emit_move_insn.
Index: lra-constraints.c
===
--- lra-constraints.c   (revision 207787)
+++ lra-constraints.c   (working copy)
@@ -4473,9 +4473,9 @@ inherit_reload_reg (bool def_p, int orig
rclass, inheritance);
   start_sequence ();
   if (def_p)
-emit_move_insn (original_reg, new_reg);
+lra_emit_move (original_reg, new_reg);
   else
-emit_move_insn (new_reg, original_reg);
+lra_emit_move (new_reg, original_reg);
   new_insns = get_insns ();
   end_sequence ();
   if (NEXT_INSN (new_insns) != NULL_RTX)


C++ PATCH for c++/60241 (ICE with specialization of member class template)

2014-02-21 Thread Jason Merrill
We already have the code to reassign instances to the appropriate 
template when we see a specialization of a partial instantiation of a 
member template, but it wasn't firing properly in this case, for two 
reasons:


1) We were attaching the instances to the most general template and then 
looking for them on the partial instantiation.

2) We were only reassigning explicit specializations.

Tested x86_64-pc-linux-gnu, applying to trunk.  It should be appropriate 
for backporting later if it doesn't cause trouble.
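
For readers following along, here is a cleaned-up sketch of the situation described above, modelled on the testcase in the patch. The static_assert checks are my own addition and state what I would expect from a compiler with the fix applied; they are not part of the committed test.

```cpp
#include <type_traits>

template <typename T>
struct x
{
  template <typename U>
  struct y
  {
    typedef T result2;
  };

  // Names x<T>::y<int> without instantiating it.
  typedef y<int> zy;
};

// Member specialization: for x<int>, the member template y is replaced.
// x<int>::y<int> has been looked up (through zy) but not instantiated yet,
// so it has to be reassigned to this new template.
template <>
template <class T>
struct x<int>::y
{
  typedef double result2;
};

// Assumed expectations (not taken from the patch): both spellings name the
// same type, so both should see the member specialization.
static_assert (std::is_same<x<int>::zy::result2, double>::value,
               "zy should pick up the member specialization");
static_assert (std::is_same<x<int>::y<int>::result2, double>::value,
               "y<int> should pick up the member specialization");
// Other instantiations of x keep the primary member template.
static_assert (std::is_same<x<char>::zy::result2, char>::value,
               "x<char> is unaffected");
```

The zy typedef is the interesting part: it refers to the partial instantiation before the specialization is seen, which is the case the reassignment code has to handle.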
commit 667bae7d1bfeea4e881cf6236d8679fc0c11c49e
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 21 13:51:18 2014 -0500

	PR c++/60241
	* pt.c (lookup_template_class_1): Update DECL_TEMPLATE_INSTANTIATIONS
	of the partial instantiation, not the most general template.
	(maybe_process_partial_specialization): Reassign everything on
	that list.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index a394441..91a8840 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -914,11 +914,13 @@ maybe_process_partial_specialization (tree type)
 	   t; t = TREE_CHAIN (t))
 	{
 	  tree inst = TREE_VALUE (t);
-	  if (CLASSTYPE_TEMPLATE_SPECIALIZATION (inst))
+	  if (CLASSTYPE_TEMPLATE_SPECIALIZATION (inst)
+		  || !COMPLETE_OR_OPEN_TYPE_P (inst))
 		{
 		  /* We already have a full specialization of this partial
-		 instantiation.  Reassign it to the new member
-		 specialization template.  */
+		 instantiation, or a full specialization has been
+		 looked up but not instantiated.  Reassign it to the
+		 new member specialization template.  */
 		  spec_entry elt;
 		  spec_entry *entry;
 		  void **slot;
@@ -937,7 +939,7 @@ maybe_process_partial_specialization (tree type)
 		  *entry = elt;
 		  *slot = entry;
 		}
-	  else if (COMPLETE_OR_OPEN_TYPE_P (inst))
+	  else
 		/* But if we've had an implicit instantiation, that's a
 		   problem ([temp.expl.spec]/6).  */
 		error ("specialization %qT after instantiation %qT",
@@ -7596,7 +7598,7 @@ lookup_template_class_1 (tree d1, tree arglist, tree in_decl, tree context,
 	}
 
   /* Let's consider the explicit specialization of a member
- of a class template specialization that is implicitely instantiated,
+ of a class template specialization that is implicitly instantiated,
 	 e.g.:
 	 template<class T>
 	 struct S
@@ -7694,9 +7696,9 @@ lookup_template_class_1 (tree d1, tree arglist, tree in_decl, tree context,
 
   /* Note this use of the partial instantiation so we can check it
 	 later in maybe_process_partial_specialization.  */
-  DECL_TEMPLATE_INSTANTIATIONS (templ)
+  DECL_TEMPLATE_INSTANTIATIONS (found)
 	= tree_cons (arglist, t,
-		 DECL_TEMPLATE_INSTANTIATIONS (templ));
+		 DECL_TEMPLATE_INSTANTIATIONS (found));
 
   if (TREE_CODE (template_type) == ENUMERAL_TYPE && !is_dependent_type
 	  && !DECL_ALIAS_TEMPLATE_P (gen_tmpl))
diff --git a/gcc/testsuite/g++.dg/template/memclass5.C b/gcc/testsuite/g++.dg/template/memclass5.C
new file mode 100644
index 000..eb32f13
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/memclass5.C
@@ -0,0 +1,26 @@
+// PR c++/60241
+
+template <typename T>
+struct x
+{
+template <typename U>
+struct y
+{
+typedef T result2;
+};
+
+typedef y<int> zy;
+};
+
+template<>
+template<class T>
+struct x<int>::y
+{
+typedef double result2;
+};
+
+int main()
+{
+x<int>::zy::result2 xxx;
+x<int>::y<int>::result2 xxx2;
+}


C++ PATCH for c++/59347 (ICE with ill-formed typedef in template)

2014-02-21 Thread Jason Merrill
An earlier patch of mine changed the compiler to retain erroneous 
declarations to provide better error-recovery behavior.  But that's 
causing problems with nested typedefs, so let's not bother in that case.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 85cffc1cc3fe706d61a417cf6a1139f546a458e9
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 21 13:59:45 2014 -0500

	PR c++/59347
	* pt.c (tsubst_decl) [TYPE_DECL]: Don't try to instantiate an
	erroneous typedef.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 91a8840..2dc5f32 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10824,6 +10824,9 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 	tree type = NULL_TREE;
 	bool local_p;
 
+	if (TREE_TYPE (t) == error_mark_node)
+	  RETURN (error_mark_node);
+
 	if (TREE_CODE (t) == TYPE_DECL
 	    && t == TYPE_MAIN_DECL (TREE_TYPE (t)))
 	  {
diff --git a/gcc/testsuite/g++.dg/template/typedef41.C b/gcc/testsuite/g++.dg/template/typedef41.C
new file mode 100644
index 000..dc25518
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/typedef41.C
@@ -0,0 +1,8 @@
+// PR c++/59347
+
+template<int> struct A
+{
+  typedef int ::X;		// { dg-error "" }
+};
+
+A<0> a;


C++ PATCH for c++/60187 (ICE with bare parameter pack in enum-base)

2014-02-21 Thread Jason Merrill

Yet another place where we need to check for bare parameter packs.

Tested x86_64-pc-linux-gnu, applying to trunk and 4.8.
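
As a point of comparison, a minimal sketch of a well-formed enum-base that depends on a template parameter. The names Wrapper and Bad are hypothetical and not from the patch; the ill-formed form from the PR is kept only as a comment.

```cpp
#include <type_traits>

// Well-formed: the enum-base names a single type parameter.
template <typename T>
struct Wrapper
{
  enum E : T {};
};

// Ill-formed (the case this patch now diagnoses instead of crashing on):
// T is a parameter pack and is used without being expanded.
//
//   template <typename... T> struct Bad { enum E : T {}; };

static_assert (std::is_same<std::underlying_type<Wrapper<short>::E>::type,
                            short>::value,
               "the underlying type follows the template argument");
```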
commit 4e02d1498063b3ffa31d3fe35682b0c94667360c
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 21 14:03:36 2014 -0500

	PR c++/60187
	* parser.c (cp_parser_enum_specifier): Call
	check_for_bare_parameter_packs.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 6f19ae2..7bbdf90 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -15376,7 +15376,8 @@ cp_parser_enum_specifier (cp_parser* parser)
 {
   underlying_type = grokdeclarator (NULL, type_specifiers, TYPENAME,
 /*initialized=*/0, NULL);
-  if (underlying_type == error_mark_node)
+  if (underlying_type == error_mark_node
+	  || check_for_bare_parameter_packs (underlying_type))
 underlying_type = NULL_TREE;
 }
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/enum_base2.C b/gcc/testsuite/g++.dg/cpp0x/enum_base2.C
new file mode 100644
index 000..8c6a901
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/enum_base2.C
@@ -0,0 +1,9 @@
+// PR c++/60187
+// { dg-require-effective-target c++11 }
+
+template<typename... T> struct A
+{
+  enum E : T {};		// { dg-error "parameter pack" }
+};
+
+A<int> a;


C++ PATCH for c++/60186 (ICE with constexpr and init-list in template)

2014-02-21 Thread Jason Merrill
My earlier massage_init_elt patch neglected to call 
fold_non_dependent_expr before maybe_constant_init.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit b77241e3be8b3eb4247d07e2f2967cbb585e08bc
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 21 14:37:17 2014 -0500

	PR c++/60186
	* typeck2.c (massage_init_elt): Call fold_non_dependent_expr_sfinae.

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 546b83f..8877286 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1131,7 +1131,10 @@ massage_init_elt (tree type, tree init, tsubst_flags_t complain)
 init = TARGET_EXPR_INITIAL (init);
   /* When we defer constant folding within a statement, we may want to
  defer this folding as well.  */
-  init = maybe_constant_init (init);
+  tree t = fold_non_dependent_expr_sfinae (init, complain);
+  t = maybe_constant_value (t);
+  if (TREE_CONSTANT (t))
+init = t;
   return init;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-initlist7.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-initlist7.C
new file mode 100644
index 000..6fea82f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-initlist7.C
@@ -0,0 +1,7 @@
+// PR c++/60186
+// { dg-require-effective-target c++11 }
+
+template<typename> void foo(int i)
+{
+  constexpr int a[] = { i };	// { dg-error "" }
+}


Re: C++ PATCH for c++/60252 (ICE with VLA in lambda parameter)

2014-02-21 Thread Jason Merrill

On 02/21/2014 09:10 AM, Jason Merrill wrote:

> While parsing the template parameter list for a lambda, we've already
> pushed into the closure class but haven't created the op()
> FUNCTION_DECL, so trying to capture 'this' by way of the 'this' pointer
> of op() breaks.  Avoid the ICE by not trying to capture 'this' when
> parsing a parameter list.


On second thought, I'd rather not depend on the parsing state here, 
since we don't always update current_binding_level during template 
instantiation.  So let's check for the actual problem instead.


Tested x86_64-pc-linux-gnu, applying to trunk.


commit 5ca06118071f28b060b751415d18f8af4968a0a4
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 21 15:06:47 2014 -0500

	PR c++/60252
	* lambda.c (maybe_resolve_dummy): Check lambda_function rather
	than current_binding_level.

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 7fe235b..277dec6 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -749,10 +749,8 @@ maybe_resolve_dummy (tree object)
   if (type != current_class_type
       && current_class_type
       && LAMBDA_TYPE_P (current_class_type)
-      && DERIVED_FROM_P (type, current_nonlambda_class_type ())
-  /* If we get here while parsing the parameter list of a lambda, it
-	 will fail, so don't even try (c++/60252).  */
-      && current_binding_level->kind != sk_function_parms)
+      && lambda_function (current_class_type)
+      && DERIVED_FROM_P (type, current_nonlambda_class_type ()))
 {
   /* In a lambda, need to go through 'this' capture.  */
   tree lam = CLASSTYPE_LAMBDA_EXPR (current_class_type);


C++ PATCH for c++/60185 (ICE with invalid default arg in template)

2014-02-21 Thread Jason Merrill
To avoid problems trying to resolve an invalid use of 'this' before 
diagnosing it later, let's do the same thing we do in 
tsubst_default_argument, namely clear current_class_{ptr,ref}.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit f1051ca23020746350bacff3c499b2a9d1ec0dff
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 21 15:08:28 2014 -0500

	PR c++/60185
	* parser.c (cp_parser_default_argument): Clear
	current_class_ptr/current_class_ref like tsubst_default_argument.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 7bbdf90..47a67c4 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -18633,8 +18633,24 @@ cp_parser_default_argument (cp_parser *parser, bool template_parm_p)
   /* Parse the assignment-expression.  */
   if (template_parm_p)
     push_deferring_access_checks (dk_no_deferred);
+  tree saved_class_ptr = NULL_TREE;
+  tree saved_class_ref = NULL_TREE;
+  /* The "this" pointer is not valid in a default argument.  */
+  if (cfun)
+{
+  saved_class_ptr = current_class_ptr;
+  cp_function_chain->x_current_class_ptr = NULL_TREE;
+  saved_class_ref = current_class_ref;
+  cp_function_chain->x_current_class_ref = NULL_TREE;
+}
   default_argument
     = cp_parser_initializer (parser, &is_direct_init, &non_constant_p);
+  /* Restore the "this" pointer.  */
+  if (cfun)
+{
+  cp_function_chain->x_current_class_ptr = saved_class_ptr;
+  cp_function_chain->x_current_class_ref = saved_class_ref;
+}
   if (BRACE_ENCLOSED_INITIALIZER_P (default_argument))
 maybe_warn_cpp0x (CPP0X_INITIALIZER_LISTS);
   if (template_parm_p)
diff --git a/gcc/testsuite/g++.dg/overload/defarg5.C b/gcc/testsuite/g++.dg/overload/defarg5.C
index 06ea6bf..d022b0c 100644
--- a/gcc/testsuite/g++.dg/overload/defarg5.C
+++ b/gcc/testsuite/g++.dg/overload/defarg5.C
@@ -2,6 +2,6 @@
 
 struct A
 {
-  int i;
-  A() { void foo(int=i); }	// { dg-error this }
+  int i;			// { dg-message  }
+  A() { void foo(int=i); }	// { dg-error  }
 };
diff --git a/gcc/testsuite/g++.dg/template/defarg17.C b/gcc/testsuite/g++.dg/template/defarg17.C
new file mode 100644
index 000..38d68d4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/defarg17.C
@@ -0,0 +1,9 @@
+// PR c++/60185
+
+template<int> struct A
+{
+  int i;			// { dg-message "" }
+  A() { void foo(int=i); }	// { dg-error "" }
+};
+
+A<0> a;


C++ PATCH for c++/60108 (ICE with defaulted virtual in template)

2014-02-21 Thread Jason Merrill
emit_associated_thunks expects DECL_INTERFACE_KNOWN to be set, but we 
weren't setting it in this case (as opposed to the case where the 
destructor is implicitly declared) because it has 
DECL_TEMPLATE_INSTANTIATION set.  Fixed by checking for 
DECL_DEFAULTED_FN as well.


Tested x86_64-pc-linux-gnu, applying to trunk and 4.8.
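
A cleaned-up sketch of the shape of code involved, modelled on the testcase in the patch. Two things are my own assumptions rather than part of the committed test: the out-of-line definition of ~A, added only so the sketch links, and the helper construct_and_destroy_c, which just forces the destructors to be used.

```cpp
#include <type_traits>

template <int>
struct A
{
  virtual ~A ();
};

// Assumption: not in the original testcase, added so the sketch links.
template <int N>
A<N>::~A () {}

template <typename>
struct B : A<0>, A<1>
{
  // Explicitly defaulted destructor of a class template instantiation.
  // It is virtual because the base destructors are, so thunks are needed
  // for the second base.
  ~B () = default;
};

struct C : B<bool>
{
  C () {}
};

// Hypothetical helper: constructing and destroying a C uses ~B<bool>.
inline bool construct_and_destroy_c ()
{
  C c;
  (void) c;
  return true;
}

static_assert (std::has_virtual_destructor<B<bool> >::value,
               "the defaulted destructor is virtual via the bases");
```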
commit 670511e83f8bb5df8dd87bfbd3b8a9625ba9963f
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 21 15:37:45 2014 -0500

	PR c++/60108
	* semantics.c (expand_or_defer_fn_1): Check DECL_DEFAULTED_FN.

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 6f32496..85d6807 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3986,7 +3986,7 @@ expand_or_defer_fn_1 (tree fn)
 	 linkage of all functions, and as that causes writes to
 	 the data mapped in from the PCH file, it's advantageous
 	 to mark the functions at this point.  */
-	  if (!DECL_IMPLICIT_INSTANTIATION (fn))
+	  if (!DECL_IMPLICIT_INSTANTIATION (fn) || DECL_DEFAULTED_FN (fn))
 	{
 	  /* This function must have external linkage, as
 		 otherwise DECL_INTERFACE_KNOWN would have been
diff --git a/gcc/testsuite/g++.dg/cpp0x/defaulted48.C b/gcc/testsuite/g++.dg/cpp0x/defaulted48.C
new file mode 100644
index 000..727afc5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/defaulted48.C
@@ -0,0 +1,17 @@
+// PR c++/60108
+// { dg-require-effective-target c++11 }
+
+template<int> struct A
+{
+  virtual ~A();
+};
+
+template<typename> struct B : A<0>, A<1>
+{
+  ~B() = default;
+};
+
+struct C : B<bool>
+{
+  C() {}
+};


Re: [google gcc-4_8] not split bb for machine dependent builtins

2014-02-21 Thread Xinliang David Li
OK. I expect this will also be submitted to trunk later.

David

On Fri, Feb 21, 2014 at 2:08 PM, Rong Xu x...@google.com wrote:
> Hi,
>
> For builtins without nothrow attributes, we currently split the bb by
> adding a fake edge to func_exit when instrumenting profile counters. While
> that is safe, the resulting control flow and additional counters
> drastically increase the compile time for programs with lots of builtin
> calls.
> This patch suppresses the adding of the fake edges for machine dependent
> builtins.
>
> This is for google branch only.
>
> Tested with SPEC2006, google internal benchmarks and bootstrap.
>
> OK to commit?
>
> Thanks,
>
> -Rong




C++ PATCH for c++/58170 (ICE with alias template)

2014-02-21 Thread Jason Merrill
There's no reason why we wouldn't check for dependent scopes when 
parsing the target of an alias declaration, and indeed not doing so led 
to the ICE here.


The rest of the patch improves the diagnostic for this testcase (and 
some others).


Tested x86_64-pc-linux-gnu, applying to trunk.  Also applying the 
cp_parser_type_name hunk to 4.8.
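
For reference, a sketch of the well-formed spelling of the alias, based on the "This one works" comment in the new testcase. The static_assert checks are my own addition, not part of the patch.

```cpp
#include <type_traits>

template <typename T, typename U>
struct base
{
  template <typename V>
  struct derived;
};

template <typename T, typename U>
template <typename V>
struct base<T, U>::derived : public base<T, V>
{
};

// base<T, U> is a dependent scope here, so the nested name needs both
// 'typename' (it is a type) and 'template' (derived is a member template).
template <typename T, typename U, typename V>
using alias = typename base<T, U>::template derived<V>;

static_assert (std::is_same<alias<int, bool, char>,
                            base<int, bool>::derived<char> >::value,
               "the alias names the member template specialization");
static_assert (std::is_base_of<base<int, char>,
                               alias<int, bool, char> >::value,
               "derived<V> inherits from base<T, V>");
```

Without the two keywords the declaration is ill-formed; with this patch the parser checks the dependent scope and reports that, rather than crashing.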
commit 21f4a8a5550498513e1235239b69aa5bc537687b
Author: Jason Merrill ja...@redhat.com
Date:   Fri Feb 21 16:58:21 2014 -0500

	PR c++/58170
	* parser.c (cp_parser_type_name): Always check dependency.
	(cp_parser_type_specifier_seq): Call
	cp_parser_parse_and_diagnose_invalid_type_name.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 47a67c4..1e98032 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -14763,7 +14763,7 @@ cp_parser_type_name (cp_parser* parser)
 	 instantiation of an alias template...  */
   type_decl = cp_parser_template_id (parser,
 	 /*template_keyword_p=*/false,
-	 /*check_dependency_p=*/false,
+	 /*check_dependency_p=*/true,
 	 none_type,
 	 /*is_declaration=*/false);
   /* Note that this must be an instantiation of an alias template
@@ -18083,7 +18083,16 @@ cp_parser_type_specifier_seq (cp_parser* parser,
 	 type-specifier-seq at all.  */
 	  if (!seen_type_specifier)
 	{
-	  cp_parser_error (parser, "expected type-specifier");
+	  /* Set in_declarator_p to avoid skipping to the semicolon.  */
+	  int in_decl = parser->in_declarator_p;
+	  parser->in_declarator_p = true;
+
+	  if (cp_parser_uncommitted_to_tentative_parse_p (parser)
+		  || !cp_parser_parse_and_diagnose_invalid_type_name (parser))
+		cp_parser_error (parser, "expected type-specifier");
+
+	  parser->in_declarator_p = in_decl;
+
 	  type_specifier_seq-type = error_mark_node;
 	  return;
 	}
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-40.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-40.C
new file mode 100644
index 000..f8bff78
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-40.C
@@ -0,0 +1,33 @@
+// PR c++/58170
+// { dg-require-effective-target c++11 }
+// { dg-prune-output "not declared" }
+// { dg-prune-output "expected" }
+
+template <typename T, typename U>
+struct base {
+  template <typename V>
+  struct derived;
+};
+
+template <typename T, typename U>
+template <typename V>
+struct base<T, U>::derived : public base<T, V> {
+};
+
+// This (wrong?) alias declaration provokes the crash.
+template <typename T, typename U, typename V>
+using alias = base<T, U>::derived<V>; // { dg-error "template|typename" }
+
+// This one works:
+// template <typename T, typename U, typename V>
+// using alias = typename base<T, U>::template derived<V>;
+
+template <typename T>
+void f() {
+  alias<T, bool, char> m{};
+  (void) m;
+}
+
+int main() {
+  f<int>();
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/error8.C b/gcc/testsuite/g++.dg/cpp0x/error8.C
index cc4f877..a992077 100644
--- a/gcc/testsuite/g++.dg/cpp0x/error8.C
+++ b/gcc/testsuite/g++.dg/cpp0x/error8.C
@@ -3,5 +3,5 @@
 
 struct A
 {
-  int* p = new foo; // { dg-error "16:expected type-specifier" }
+  int* p = new foo; // { dg-error "16:foo. does not name a type" }
 };
diff --git a/gcc/testsuite/g++.dg/cpp0x/override4.C b/gcc/testsuite/g++.dg/cpp0x/override4.C
index aec5c2c..695f9a3 100644
--- a/gcc/testsuite/g++.dg/cpp0x/override4.C
+++ b/gcc/testsuite/g++.dg/cpp0x/override4.C
@@ -16,12 +16,12 @@ struct B2
 
 struct B3
 {
-  virtual auto f() -> final void; // { dg-error "expected type-specifier" }
+  virtual auto f() -> final void; // { dg-error "type" }
 };
 
 struct B4
 {
-  virtual auto f() -> final void {} // { dg-error "expected type-specifier" }
+  virtual auto f() -> final void {} // { dg-error "type" }
 };
 
 struct D : B
@@ -36,10 +36,10 @@ struct D2 : B
 
 struct D3 : B
 {
-  virtual auto g() -> override void; // { dg-error "expected type-specifier" }
+  virtual auto g() -> override void; // { dg-error "type" }
 };
 
 struct D4 : B
 {
-  virtual auto g() -> override void {} // { dg-error "expected type-specifier" }
+  virtual auto g() -> override void {} // { dg-error "type" }
 };
diff --git a/gcc/testsuite/g++.dg/ext/underlying_type1.C b/gcc/testsuite/g++.dg/ext/underlying_type1.C
index a8f68d3..999cd9f 100644
--- a/gcc/testsuite/g++.dg/ext/underlying_type1.C
+++ b/gcc/testsuite/g++.dg/ext/underlying_type1.C
@@ -8,7 +8,7 @@ template<typename T>
   { typedef __underlying_type(T) type; }; // { dg-error "not an enumeration" }
 
 __underlying_type(int) i1; // { dg-error "not an enumeration|invalid" }
-__underlying_type(A)   i2; // { dg-error "expected" }
+__underlying_type(A)   i2; // { dg-error "expected|type" }
 __underlying_type(B)   i3; // { dg-error "not an enumeration|invalid" }
 __underlying_type(U)   i4; // { dg-error "not an enumeration|invalid" }
 
diff --git a/gcc/testsuite/g++.dg/parse/crash48.C b/gcc/testsuite/g++.dg/parse/crash48.C
index 4541548..020ddf0 100644
--- a/gcc/testsuite/g++.dg/parse/crash48.C
+++ b/gcc/testsuite/g++.dg/parse/crash48.C
@@ -5,5 +5,5 @@ void
 foo (bool b)
 {