Re: [C,C++] integer constants in attribute arguments

2014-02-05 Thread Andreas Schwab
Marc Glisse marc.gli...@inria.fr writes:

 /usr/local/gcc/gcc-20140204/gcc/testsuite/g++.dg/cpp0x/constexpr-attribute2.C:32:6:
  error: alignment for 'void g()' must be at least 16

 (I don't know why we error out for this, align specifies a minimal
 alignment, if it ends up more aligned, that's fine)

Probably because there is no packed attribute for functions, so the
alignment is always taken to be exact.

 Feel free to replace 8 with 16 in the initialization of size (you can
 commit it as obvious if it works for you).

Are there any targets that may reject an overaligned function decl?

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.


Re: [C,C++] integer constants in attribute arguments

2014-02-05 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 09:28:16AM +0100, Andreas Schwab wrote:
  Feel free to replace 8 with 16 in the initialization of size (you can
  commit it as obvious if it works for you).
 
 Are there any targets that may reject an overaligned function decl?

I think that is very well possible.  Why not just #ifdef that particular
test line to a handful of widely used targets where it is known to work?

Jakub


Re: [C,C++] integer constants in attribute arguments

2014-02-05 Thread Dominique Dhumieres
The test g++.dg/cpp0x/constexpr-attribute2.C fails on darwin
with

FAIL: g++.dg/cpp0x/constexpr-attribute2.C (test for excess errors)
Excess errors:
/opt/gcc/_clean/gcc/testsuite/g++.dg/cpp0x/constexpr-attribute2.C:9:40: error: 
'init_priority' attribute is not supported on this platform
/opt/gcc/_clean/gcc/testsuite/g++.dg/cpp0x/constexpr-attribute2.C:16:46: error: 
'init_priority' attribute is not supported on this platform
/opt/gcc/_clean/gcc/testsuite/g++.dg/cpp0x/constexpr-attribute2.C:23:45: error: 
'init_priority' attribute is not supported on this platform

IMO the test should check that 'init_priority' is supported.

TIA

Dominique


[PATCH][4.10] Move PRE:inhibit_phi_insertion to where it better applies

2014-02-05 Thread Richard Biener

The following patch moves the code that avoids creating loop
carried data dependences in PRE when later vectorizing to a place
where it can use the optimizations PRE performs (in particular
loop invariant motion).  This makes sure it applies for Himeno
when built with -fipa-pta which then allows the critical loop
to be vectorized without any runtime checks for aliasing.

Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for 4.10.

Richard.

2014-02-05  Richard Biener  rguent...@suse.de

PR tree-optimization/60042
* tree-ssa-pre.c (inhibit_phi_insertion): Remove.
(insert_into_preds_of_block): Do not prevent PHI insertion
for REFERENCE exprs here ...
(eliminate_dom_walker::before_dom_children): ... but prevent
their use here under similar conditions when applied to the
IL after PRE optimizations.

Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 207455)
--- gcc/tree-ssa-pre.c  (working copy)
*** create_expression_by_pieces (basic_block
*** 3013,3078 
  }
  
  
- /* Returns true if we want to inhibit the insertions of PHI nodes
-for the given EXPR for basic block BB (a member of a loop).
-We want to do this, when we fear that the induction variable we
-create might inhibit vectorization.  */
- 
- static bool
- inhibit_phi_insertion (basic_block bb, pre_expr expr)
- {
-   vn_reference_t vr = PRE_EXPR_REFERENCE (expr);
-   vecvn_reference_op_s ops = vr-operands;
-   vn_reference_op_t op;
-   unsigned i;
- 
-   /* If we aren't going to vectorize we don't inhibit anything.  */
-   if (!flag_tree_loop_vectorize)
- return false;
- 
-   /* Otherwise we inhibit the insertion when the address of the
-  memory reference is a simple induction variable.  In other
-  cases the vectorizer won't do anything anyway (either it's
-  loop invariant or a complicated expression).  */
-   FOR_EACH_VEC_ELT (ops, i, op)
- {
-   switch (op-opcode)
-   {
-   case CALL_EXPR:
- /* Calls are not a problem.  */
- return false;
- 
-   case ARRAY_REF:
-   case ARRAY_RANGE_REF:
- if (TREE_CODE (op-op0) != SSA_NAME)
-   break;
- /* Fallthru.  */
-   case SSA_NAME:
- {
-   basic_block defbb = gimple_bb (SSA_NAME_DEF_STMT (op-op0));
-   affine_iv iv;
-   /* Default defs are loop invariant.  */
-   if (!defbb)
- break;
-   /* Defined outside this loop, also loop invariant.  */
-   if (!flow_bb_inside_loop_p (bb-loop_father, defbb))
- break;
-   /* If it's a simple induction variable inhibit insertion,
-  the vectorizer might be interested in this one.  */
-   if (simple_iv (bb-loop_father, bb-loop_father,
-  op-op0, iv, true))
- return true;
-   /* No simple IV, vectorizer can't do anything, hence no
-  reason to inhibit the transformation for this operand.  */
-   break;
- }
-   default:
- break;
-   }
- }
-   return false;
- }
- 
  /* Insert the to-be-made-available values of expression EXPRNUM for each
 predecessor, stored in AVAIL, into the predecessors of BLOCK, and
 merge the result with a phi node, given the same value number as
--- 3013,3018 
*** insert_into_preds_of_block (basic_block
*** 3106,3113 
EDGE_PRED (block, 1)-src);
/* Induction variables only have one edge inside the loop.  */
if ((firstinsideloop ^ secondinsideloop)
!  (expr-kind != REFERENCE
! || inhibit_phi_insertion (block, expr)))
{
  if (dump_file  (dump_flags  TDF_DETAILS))
fprintf (dump_file, Skipping insertion of phi for partial 
redundancy: Looks like an induction variable\n);
--- 3046,3052 
EDGE_PRED (block, 1)-src);
/* Induction variables only have one edge inside the loop.  */
if ((firstinsideloop ^ secondinsideloop)
!  expr-kind != REFERENCE)
{
  if (dump_file  (dump_flags  TDF_DETAILS))
fprintf (dump_file, Skipping insertion of phi for partial 
redundancy: Looks like an induction variable\n);
*** eliminate_dom_walker::before_dom_childre
*** 4234,4239 
--- 4173,4228 
  
  gcc_assert (sprime != rhs);
  
+ /* Inhibit the use of an inserted PHI on a loop header when
+the address of the memory reference is a simple induction
+variable.  In other cases the vectorizer won't do anything
+anyway (either it's loop invariant or a complicated
+expression).  */
+ if (flag_tree_loop_vectorize
+  gimple_assign_single_p (stmt)
+  

Re: [C,C++] integer constants in attribute arguments

2014-02-05 Thread Marc Glisse

On Wed, 5 Feb 2014, Jakub Jelinek wrote:


On Wed, Feb 05, 2014 at 09:28:16AM +0100, Andreas Schwab wrote:

Feel free to replace 8 with 16 in the initialization of size (you can
commit it as obvious if it works for you).


Are there any targets that may reject an overaligned function decl?


I think that is very well possible.  Why not just #ifdef that particular
test line to a handful of widely used targets where it is known to work?


With Dominique's message that Darwin doesn't support init_priority, would 
this be ok? That's the most frequently tested platform, some attributes 
are already tested elsewhere, etc.


// { dg-do compile { target x86_64-*linux-gnu } }


--
Marc Glisse


Re: [C,C++] integer constants in attribute arguments

2014-02-05 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 09:58:28AM +0100, Marc Glisse wrote:
 On Wed, 5 Feb 2014, Jakub Jelinek wrote:
 
 On Wed, Feb 05, 2014 at 09:28:16AM +0100, Andreas Schwab wrote:
 Feel free to replace 8 with 16 in the initialization of size (you can
 commit it as obvious if it works for you).
 
 Are there any targets that may reject an overaligned function decl?
 
 I think that is very well possible.  Why not just #ifdef that particular
 test line to a handful of widely used targets where it is known to work?
 
 With Dominique's message that Darwin doesn't support init_priority,
 would this be ok? That's the most frequently tested platform, some
 attributes are already tested elsewhere, etc.
 
 // { dg-do compile { target x86_64-*linux-gnu } }

I'd use i?86-*linux* x86_64-*linux* instead, but otherwise LGTM.  Please
give others a chance to comment on that though.

Jakub


[PATCH] Improve diagnostics for PR60042

2014-02-05 Thread Richard Biener

The following improves dumping / messages for vectorizer runtime
alias checks.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.  I'll consider
this documentation and thus appropriate at this stage.

Richard.

2014-02-05  Richard Biener  rguent...@suse.de

* tree-vect-loop.c (vect_analyze_loop_2): Be more informative
when not vectorizing because of too many alias checks.
* tree-vect-data-refs.c (vect_prune_runtime_alias_test_list):
Add more verboseness, avoid duplicate MSG_MISSED_OPTIMIZATION.

Index: gcc/tree-vect-loop.c
===
*** gcc/tree-vect-loop.c(revision 207497)
--- gcc/tree-vect-loop.c(working copy)
*** vect_analyze_loop_2 (loop_vec_info loop_
*** 1723,1730 
  {
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!too long list of versioning for alias 
!run-time tests.\n);
return false;
  }
  
--- 1723,1732 
  {
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!number of versioning for alias 
!run-time tests exceeds %d 
!(--param vect-max-version-for-alias-checks)\n,
!PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS));
return false;
  }
  
Index: gcc/tree-vect-data-refs.c
===
*** gcc/tree-vect-data-refs.c   (revision 207497)
--- gcc/tree-vect-data-refs.c   (working copy)
*** vect_prune_runtime_alias_test_list (loop
*** 2901,2906 
--- 2923,2946 
   diff - (HOST_WIDE_INT) TREE_INT_CST_LOW (dr_a1-seg_len) 
 min_seg_len_b))
{
+ if (dump_enabled_p ())
+   {
+ dump_printf_loc (MSG_NOTE, vect_location,
+  merging ranges for );
+ dump_generic_expr (MSG_NOTE, TDF_SLIM,
+DR_REF (dr_a1-dr));
+ dump_printf (MSG_NOTE,  , );
+ dump_generic_expr (MSG_NOTE, TDF_SLIM,
+DR_REF (dr_b1-dr));
+ dump_printf (MSG_NOTE,   and );
+ dump_generic_expr (MSG_NOTE, TDF_SLIM,
+DR_REF (dr_a2-dr));
+ dump_printf (MSG_NOTE,  , );
+ dump_generic_expr (MSG_NOTE, TDF_SLIM,
+DR_REF (dr_b2-dr));
+ dump_printf (MSG_NOTE, \n);
+   }
+ 
  dr_a1-seg_len = size_binop (PLUS_EXPR,
   dr_a2-seg_len, size_int (diff));
  comp_alias_ddrs.ordered_remove (i--);
*** vect_prune_runtime_alias_test_list (loop
*** 2908,2925 
}
  }
  
if ((int) comp_alias_ddrs.length () 
PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS))
! {
!   if (dump_enabled_p ())
!   {
! dump_printf_loc (MSG_MISSED_OPTIMIZATION,  vect_location,
!  disable versioning for alias - max number of 
!  generated checks exceeded.\n);
!   }
! 
!   return false;
! }
  
return true;
  }
--- 2948,2959 
}
  }
  
+   dump_printf_loc (MSG_NOTE, vect_location,
+  improved number of alias checks from %d to %d\n,
+  may_alias_ddrs.length (), comp_alias_ddrs.length ());
if ((int) comp_alias_ddrs.length () 
PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS))
! return false;
  
return true;
  }


Re: [C,C++] integer constants in attribute arguments

2014-02-05 Thread Marc Glisse

On Wed, 5 Feb 2014, Jakub Jelinek wrote:


On Wed, Feb 05, 2014 at 09:58:28AM +0100, Marc Glisse wrote:

On Wed, 5 Feb 2014, Jakub Jelinek wrote:


On Wed, Feb 05, 2014 at 09:28:16AM +0100, Andreas Schwab wrote:

Feel free to replace 8 with 16 in the initialization of size (you can
commit it as obvious if it works for you).


Are there any targets that may reject an overaligned function decl?


I think that is very well possible.  Why not just #ifdef that particular
test line to a handful of widely used targets where it is known to work?


With Dominique's message that Darwin doesn't support init_priority,
would this be ok? That's the most frequently tested platform, some
attributes are already tested elsewhere, etc.

// { dg-do compile { target x86_64-*linux-gnu } }


I'd use i?86-*linux* x86_64-*linux* instead, but otherwise LGTM.


I wasn't sure if all libc (not just glibc) used on linux could handle all 
the attributes, but now you've said it, good :-)



Please give others a chance to comment on that though.


I'll give it a day (it can easily be changed again later).

--
Marc Glisse


Re: [C,C++] integer constants in attribute arguments

2014-02-05 Thread Dominique Dhumieres
 I'll give it a day (it can easily be changed again later).

IMO it would be better (more general) to use a dg-require-effective-target
with the corresponding test(s) in target-supports-dg.exp.

Dominique


Re: [C,C++] integer constants in attribute arguments

2014-02-05 Thread Marc Glisse

On Wed, 5 Feb 2014, Dominique Dhumieres wrote:


I'll give it a day (it can easily be changed again later).


IMO it would be better (more general) to use a dg-require-effective-target
with the corresponding test(s) in target-supports-dg.exp.


But which tests? One for over/under alignment of functions, one for 
init_priority, maybe yet one more, all that for a file that is just 
checking that the parser understands the attribute argument is a 
constant...


I agree on the principle, but unless those tests already exist, it doesn't 
seem worth it (well, if you are volunteering to write them, I certainly 
won't object).


--
Marc Glisse


Re: [C,C++] integer constants in attribute arguments

2014-02-05 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 10:43:24AM +0100, Marc Glisse wrote:
 On Wed, 5 Feb 2014, Dominique Dhumieres wrote:
 
 I'll give it a day (it can easily be changed again later).
 
 IMO it would be better (more general) to use a dg-require-effective-target
 with the corresponding test(s) in target-supports-dg.exp.
 
 But which tests? One for over/under alignment of functions, one for
 init_priority, maybe yet one more, all that for a file that is just
 checking that the parser understands the attribute argument is a
 constant...
 
 I agree on the principle, but unless those tests already exist, it
 doesn't seem worth it (well, if you are volunteering to write them,
 I certainly won't object).

Yeah, the tests really don't test anything target specific, but test it
on attributes that have target specific meaning/applicability.
So it is enough to catch it on some most widely tested targets, any
regression in that area will be known the same day when it breaks.

Jakub


Re: PR ipa/59831 (ipa-cp devirt issues)

2014-02-05 Thread Martin Jambor
Hi,

On Wed, Feb 05, 2014 at 12:47:30AM +0100, Jan Hubicka wrote:
   -  if (TREE_CODE (t) != TREE_BINFO)
   +  /* Try to work out BINFO from virtual table pointer value in 
   replacements.  */
   +  if (!t  agg_reps  !ie-indirect_info-by_ref)
  
  At this point you know that !ie-indirect_info-polymorphic is set and
  thus ie-indirect_info-by_ref is always false because it really has
  no meaning (it is only meaningful when agg_contents is set which is
  mutually exclusive with the polymorphic flag).
 
 I was worried here about case where in future we may want to represent call of
 virtual methods from pointers passed by reference (i.e. in some other object).
 We don't do that at the moment, but for that would really need better jump
 functions.
 
 If you preffer, I can remove that check.
 

I think it would be better, yes.  IIRC, We want to re-organize
indirect_info anyway (I vaguely remember we want to split the
overloaded offset field into two but forgot the exact reason why but I
have it written somewhere), I suppose we'll be turning it into a union
(or class hierarchy?) and this would make it slightly awkward.  If we
ever support the pointer-by-reference scenario, quite a few more
places will need to be adjusted anyway.

Thanks,

Martin


Re: [RFC] [PATCH, AARCH64] : Using standard patterns for stack protection.

2014-02-05 Thread Venkataramanan Kumar
Hi Marcus,

 +  ldr\\t%x2, %1\;str\\t%x2, %0\;mov\t%x2,0
 +  [(set_attr length 12)])

 This pattern emits an opaque sequence of instructions that cannot be
 scheduled, is that necessary? Can we not expand individual
 instructions or at least split ?

Almost all the ports emits a template of assembly instructions.
I m not sure why they have to be generated this way.
But usage of these pattern is to clear the register that holds canary
value immediately after its usage.

 -/* { dg-do compile { target i?86-*-* x86_64-*-* rs6000-*-* s390x-*-* } } */
 +/* { dg-do compile { target stack_protection } } */
  /* { dg-options -O2 -fstack-protector-strong } */

 Do we need a new effective target test, why is the existing
 fstack_protector not appropriate?

stack_protector does a run time test. It failed in cross compilation
environment and these are compile only tests.
Also I thought  richard suggested  me to add a new option for this.
ref: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03358.html

regards,
Venkat.

On 4 February 2014 21:39, Marcus Shawcroft marcus.shawcr...@gmail.com wrote:
 Hi Venkat,

 On 22 January 2014 16:57, Venkataramanan Kumar
 venkataramanan.ku...@linaro.org wrote:
 Hi Marcus,

 After we changed the frame growing direction (downwards) in Aarch64,
 the back-end now generates stack smashing set and test based on
 generic code available in GCC.

 But most of the ports (i386, spu, rs6000, s390, sh, sparc, tilepro and
 tilegx) define machine descriptions using standard pattern names
 'stack_protect_set' and 'stack_protect_test'. This is done for both
 TLS model as well as global variable based stack guard model.

 +  
 +  ldr\\t%x2, %1\;str\\t%x2, %0\;mov\t%x2,0
 +  [(set_attr length 12)])

 This pattern emits an opaque sequence of instructions that cannot be
 scheduled, is that necessary? Can we not expand individual
 instructions or at least split ?

 +  ldr\t%x3, %x1\;ldr\t%x0, %x2\;eor\t%x0, %x3, %x0
 +  [(set_attr length 12)])

 Likewise.

 -/* { dg-do compile { target i?86-*-* x86_64-*-* rs6000-*-* s390x-*-* } } */
 +/* { dg-do compile { target stack_protection } } */
  /* { dg-options -O2 -fstack-protector-strong } */

 Do we need a new effective target test, why is the existing
 fstack_protector not appropriate?

 Cheers
 /Marcus


Re: [ARM Documentation] Clarify -mcpu, -mtune, -march

2014-02-05 Thread James Greenhalgh
*ping*

Thanks,
James

On Mon, Jan 27, 2014 at 10:01:51AM +, James Greenhalgh wrote:
 
 Hi,
 
 I've tripped myself over with these three options too many times,
 actually, their behaviour is very simple.
 
 This patch clarifies the language used to describe the options, and
 puts them in a logical order. I'm happy to reword again if this
 is still not clear.
 
 OK?
 
 Thanks,
 James
 
 ---
 gcc/
 
 2014-01-27  James Greenhalgh  james.greenha...@arm.com
 
   PR target/59718
   * doc/invoke.texi (-march=): Clarify documentation for ARM.
   (-mtune=): Likewise.
   (-mcpu=): Likewise.
 

 diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
 index 8c620a5..38a55a0 100644
 --- a/gcc/doc/invoke.texi
 +++ b/gcc/doc/invoke.texi
 @@ -12221,11 +12221,35 @@ option should only be used if you require 
 compatibility with code for
  big-endian ARM processors generated by versions of the compiler prior to
  2.8.  This option is now deprecated.
  
 -@item -mcpu=@var{name}
 -@opindex mcpu
 -This specifies the name of the target ARM processor.  GCC uses this name
 -to determine what kind of instructions it can emit when generating
 -assembly code.  Permissible names are: @samp{arm2}, @samp{arm250},
 +@item -march=@var{name}
 +@opindex march
 +This specifies the name of the target ARM architecture.  GCC uses this
 +name to determine what kind of instructions it can emit when generating
 +assembly code.  This option can be used in conjunction with or instead
 +of the @option{-mcpu=} option.  Permissible names are: @samp{armv2},
 +@samp{armv2a}, @samp{armv3}, @samp{armv3m}, @samp{armv4}, @samp{armv4t},
 +@samp{armv5}, @samp{armv5t}, @samp{armv5e}, @samp{armv5te},
 +@samp{armv6}, @samp{armv6j},
 +@samp{armv6t2}, @samp{armv6z}, @samp{armv6zk}, @samp{armv6-m},
 +@samp{armv7}, @samp{armv7-a}, @samp{armv7-r}, @samp{armv7-m},
 +@samp{armv8-a}, @samp{armv8-a+crc},
 +@samp{iwmmxt}, @samp{iwmmxt2}, @samp{ep9312}.
 +
 +@option{-march=armv8-a+crc} enables code generation for the ARMv8-A
 +architecture together with the optional CRC32 extensions.
 +
 +@option{-march=native} causes the compiler to auto-detect the architecture
 +of the build computer.  At present, this feature is only supported on
 +Linux, and not all architectures are recognized.  If the auto-detect is
 +unsuccessful the option has no effect.
 +
 +@item -mtune=@var{name}
 +@opindex mtune
 +This option specifies the name of the target ARM processor for
 +which GCC should tune the performance of the code.
 +For some ARM implementations better performance can be obtained by using
 +this option.
 +Permissible names are: @samp{arm2}, @samp{arm250},
  @samp{arm3}, @samp{arm6}, @samp{arm60}, @samp{arm600}, @samp{arm610},
  @samp{arm620}, @samp{arm7}, @samp{arm7m}, @samp{arm7d}, @samp{arm7dm},
  @samp{arm7di}, @samp{arm7dmi}, @samp{arm70}, @samp{arm700},
 @@ -12259,26 +12283,6 @@ Additionally, this option can specify that GCC 
 should tune the performance
  of the code for a big.LITTLE system.  Permissible names are:
  @samp{cortex-a15.cortex-a7}, @samp{cortex-a57.cortex-a53}.
  
 -@option{-mcpu=generic-@var{arch}} is also permissible, and is
 -equivalent to @option{-march=@var{arch} -mtune=generic-@var{arch}}.
 -See @option{-mtune} for more information.
 -
 -@option{-mcpu=native} causes the compiler to auto-detect the CPU
 -of the build computer.  At present, this feature is only supported on
 -Linux, and not all architectures are recognized.  If the auto-detect is
 -unsuccessful the option has no effect.
 -
 -@item -mtune=@var{name}
 -@opindex mtune
 -This option is very similar to the @option{-mcpu=} option, except that
 -instead of specifying the actual target processor type, and hence
 -restricting which instructions can be used, it specifies that GCC should
 -tune the performance of the code as if the target were of the type
 -specified in this option, but still choosing the instructions it
 -generates based on the CPU specified by a @option{-mcpu=} option.
 -For some ARM implementations better performance can be obtained by using
 -this option.
 -
  @option{-mtune=generic-@var{arch}} specifies that GCC should tune the
  performance for a blend of processors within architecture @var{arch}.
  The aim is to generate code that run well on the current most popular
 @@ -12291,24 +12295,21 @@ of the build computer.  At present, this feature is 
 only supported on
  Linux, and not all architectures are recognized.  If the auto-detect is
  unsuccessful the option has no effect.
  
 -@item -march=@var{name}
 -@opindex march
 -This specifies the name of the target ARM architecture.  GCC uses this
 -name to determine what kind of instructions it can emit when generating
 -assembly code.  This option can be used in conjunction with or instead
 -of the @option{-mcpu=} option.  Permissible names are: @samp{armv2},
 -@samp{armv2a}, @samp{armv3}, @samp{armv3m}, @samp{armv4}, @samp{armv4t},
 -@samp{armv5}, @samp{armv5t}, @samp{armv5e}, @samp{armv5te},
 -@samp{armv6}, 

RFA: MN10300: Include saved registers in stack usage

2014-02-05 Thread Nick Clifton
Hi Jeff, Hi Alex,

  Sorry - another MN10300 patch - this time for the stack size reported
  with -fstack-usage.  Currently the value includes the stack frame, but
  it does not take into account any registers that might have been
  pushed onto the stack.

  The patch below fixes this problem, but there is one thing that I am
  not sure about - is it OK to use __builtin_popcount() or should I be
  calling some other function ?

  Tested with no regression on an mn10300-elf toolchain.

  OK to apply ?

Cheers
  Nick

gcc/ChangeLog
2014-02-05  Nick Clifton  ni...@redhat.com

* config/mn10300/mn10300.c (mn10300_expand_prologue): Include
saved registers in current function's stack size.

Index: gcc/config/mn10300/mn10300.c
===
--- gcc/config/mn10300/mn10300.c(revision 207498)
+++ gcc/config/mn10300/mn10300.c(working copy)
@@ -745,13 +745,15 @@
 mn10300_expand_prologue (void)
 {
   HOST_WIDE_INT size = mn10300_frame_size ();
+  int mask;
 
+  mask = mn10300_get_live_callee_saved_regs (NULL);
+  /* If we use any of the callee-saved registers, save them now.  */
+  mn10300_gen_multiple_store (mask);
+
   if (flag_stack_usage_info)
-current_function_static_stack_size = size;
+current_function_static_stack_size = size + __builtin_popcount (mask) * 4;
 
-  /* If we use any of the callee-saved registers, save them now.  */
-  mn10300_gen_multiple_store (mn10300_get_live_callee_saved_regs (NULL));
-
   if (TARGET_AM33_2  fp_regs_to_save ())
 {
   int num_regs_to_save = fp_regs_to_save (), i;
@@ -767,6 +769,9 @@
   unsigned int strategy_size = (unsigned)-1, this_strategy_size;
   rtx reg;
 
+  if (flag_stack_usage_info)
+   current_function_static_stack_size += num_regs_to_save * 4;
+
   /* We have several different strategies to save FP registers.
 We can store them using SP offsets, which is beneficial if
 there are just a few registers to save, or we can use `a0' in



[PATCH] Remove lto_global_var_decls, simplify necessary analysis for PR60060

2014-02-05 Thread Richard Biener

When looking at PR60060, outputting the declaration debug info for
a local static variable twice (once via BLOCK_VARS and once via
calling the debug hook on all globals) I noticed that the
rest_of_decl_compilation call in materialize_cgraph always works
on the empty lto_global_var_decls vector (we only populate it
later).  That results in the opportunity to effectively remove it
and in lto_write_globals walk over the defined vars in the
varpool instead.

Eventually the fix for PR60060 is to not push TREE_ASM_WRITTEN
decls there (but I'm not sure of other side-effects of that ...).

Thus, this cleanup first.

LTO bootstrap / regtest running on x86_64-unknown-linux-gnu.

Ok?

Thanks,
Richard.

2014-02-05  Richard Biener  rguent...@suse.de

lto/
* lto.h (lto_global_var_decls): Remove.
* lto-lang.c (lto_init): Do not allocate lto_global_var_decls.
(lto_write_globals): Do nothing in WPA stage, gather globals from
the varpool here ...
* lto.c (lto_main): ... not here.
(materialize_cgraph): Do not call rest_of_decl_compilation
on the empty lto_global_var_decls vector.
(lto_global_var_decls): Remove.

Index: gcc/lto/lto-lang.c
===
*** gcc/lto/lto-lang.c  (revision 207497)
--- gcc/lto/lto-lang.c  (working copy)
*** lto_getdecls (void)
*** 1075,1085 
  static void
  lto_write_globals (void)
  {
!   tree *vec = lto_global_var_decls-address ();
!   int len = lto_global_var_decls-length ();
wrapup_global_declarations (vec, len);
emit_debug_global_declarations (vec, len);
!   vec_free (lto_global_var_decls);
  }
  
  static tree
--- 1075,1094 
  static void
  lto_write_globals (void)
  {
!   if (flag_wpa)
! return;
! 
!   /* Record the global variables.  */
!   vectree lto_global_var_decls = vNULL;
!   varpool_node *vnode;
!   FOR_EACH_DEFINED_VARIABLE (vnode)
! lto_global_var_decls.safe_push (vnode-decl);
! 
!   tree *vec = lto_global_var_decls.address ();
!   int len = lto_global_var_decls.length ();
wrapup_global_declarations (vec, len);
emit_debug_global_declarations (vec, len);
!   lto_global_var_decls.release ();
  }
  
  static tree
*** lto_init (void)
*** 1218,1224 
  #undef NAME_TYPE
  
/* Initialize LTO-specific data structures.  */
-   vec_alloc (lto_global_var_decls, 256);
in_lto_p = true;
  
return true;
--- 1227,1232 
Index: gcc/lto/lto.c
===
*** gcc/lto/lto.c   (revision 207497)
--- gcc/lto/lto.c   (working copy)
*** along with GCC; see the file COPYING3.
*** 50,57 
  #include context.h
  #include pass_manager.h
  
- /* Vector to keep track of external variables we've seen so far.  */
- vectree, va_gc *lto_global_var_decls;
  
  static GTY(()) tree first_personality_decl;
  
--- 50,55 
*** read_cgraph_and_symbols (unsigned nfiles
*** 3009,3017 
  static void
  materialize_cgraph (void)
  {
-   tree decl;
struct cgraph_node *node; 
-   unsigned i;
timevar_id_t lto_timer;
  
if (!quiet_flag)
--- 3007,3013 
*** materialize_cgraph (void)
*** 3043,3052 
current_function_decl = NULL;
set_cfun (NULL);
  
-   /* Inform the middle end about the global variables we have seen.  */
-   FOR_EACH_VEC_ELT (*lto_global_var_decls, i, decl)
- rest_of_decl_compilation (decl, 1, 0);
- 
if (!quiet_flag)
  fprintf (stderr, \n);
  
--- 3039,3044 
*** lto_main (void)
*** 3309,3316 
do_whole_program_analysis ();
else
{
- varpool_node *vnode;
- 
  timevar_start (TV_PHASE_OPT_GEN);
  
  materialize_cgraph ();
--- 3301,3306 
*** lto_main (void)
*** 3330,3339 
 this.  */
  if (flag_lto_report || (flag_wpa  flag_lto_report_wpa))
print_lto_report_1 ();
- 
- /* Record the global variables.  */
- FOR_EACH_DEFINED_VARIABLE (vnode)
-   vec_safe_push (lto_global_var_decls, vnode-decl);
}
  }
  
--- 3320,3325 
Index: gcc/lto/lto.h
===
*** gcc/lto/lto.h   (revision 207497)
--- gcc/lto/lto.h   (working copy)
*** extern tree lto_eh_personality (void);
*** 41,49 
  extern void lto_main (void);
  extern void lto_read_all_file_options (void);
  
- /* In lto-symtab.c  */
- extern GTY(()) vectree, va_gc *lto_global_var_decls;
- 
  /* In lto-elf.c or lto-coff.c  */
  extern lto_file *lto_obj_file_open (const char *filename, bool writable);
  extern void lto_obj_file_close (lto_file *file);
--- 41,46 


Re: [C,C++] integer constants in attribute arguments

2014-02-05 Thread Andreas Schwab
Marc Glisse marc.gli...@inria.fr writes:

 On Wed, 5 Feb 2014, Dominique Dhumieres wrote:

 I'll give it a day (it can easily be changed again later).

 IMO it would be better (more general) to use a dg-require-effective-target
 with the corresponding test(s) in target-supports-dg.exp.

 But which tests? One for over/under alignment of functions, one for
 init_priority, maybe yet one more, all that for a file that is just
 checking that the parser understands the attribute argument is a
 constant...

Wrt. the function decl it should probably just be changed to a variable
decl.  Alignments on function decls are kind of exotic.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.


Re: [ARM Documentation] Clarify -mcpu, -mtune, -march

2014-02-05 Thread Ramana Radhakrishnan

On 01/27/14 10:01, James Greenhalgh wrote:


Hi,

I've tripped myself over with these three options too many times,
actually, their behaviour is very simple.

This patch clarifies the language used to describe the options, and
puts them in a logical order. I'm happy to reword again if this
is still not clear.

OK?

Thanks,
James

---
gcc/

2014-01-27  James Greenhalgh  james.greenha...@arm.com

PR target/59718
* doc/invoke.texi (-march=): Clarify documentation for ARM.
(-mtune=): Likewise.
(-mcpu=): Likewise.




s/=/ in Changelog :)


Where this option is used in conjunction with @option{-march} or 
@option{-mtune}, those options override this option


Also, rewording this sentence in the mcpu description like the AArch64 
version would be better.


i.e.

Where this option is used in conjunction with @option{-march} or 
@option{-mtune}, those options override appropriate parts of this option.


Ok with those changes.

Thanks,
Ramana

--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.



New Dutch PO file for 'gcc' (version 4.9-b20140202)

2014-02-05 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Dutch team of translators.  The file is available at:

http://translationproject.org/latest/gcc/nl.po

(This file, 'gcc-4.9-b20140202.nl.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.
coordina...@translationproject.org



Re: [C,C++] integer constants in attribute arguments

2014-02-05 Thread Marc Glisse

On Wed, 5 Feb 2014, Andreas Schwab wrote:


Marc Glisse marc.gli...@inria.fr writes:


On Wed, 5 Feb 2014, Dominique Dhumieres wrote:


I'll give it a day (it can easily be changed again later).


IMO it would be better (more general) to use a dg-require-effective-target
with the corresponding test(s) in target-supports-dg.exp.


But which tests? One for over/under alignment of functions, one for
init_priority, maybe yet one more, all that for a file that is just
checking that the parser understands the attribute argument is a
constant...


Wrt. the function decl it should probably just be changed to a variable
decl.  Alignments on function decls are kind of exotic.


Good idea. How about this patch then?

--
Marc GlisseIndex: gcc/testsuite/g++.dg/cpp0x/constexpr-attribute2.C
===
--- gcc/testsuite/g++.dg/cpp0x/constexpr-attribute2.C   (revision 207501)
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-attribute2.C   (working copy)
@@ -1,10 +1,11 @@
+// { dg-do compile { target init_priority } }
 // { dg-options -std=gnu++11 }
 
 struct t { t(); };
 
 constexpr int prio = 123;
 constexpr int size = 8;
 constexpr int pos = 1;
 enum A { zero = 0, one, two };
 __attribute__((init_priority(prio))) t a;
 
@@ -22,11 +23,11 @@ enum E2 {
 };
 __attribute__((init_priority(E2_second))) t c;
 
 void* my_calloc(unsigned, unsigned) __attribute__((alloc_size(pos,two)));
 void* my_realloc(void*, unsigned) __attribute__((alloc_size(two)));
 
 typedef char vec __attribute__((vector_size(size)));
 
 void f(char*) __attribute__((nonnull(pos)));
 
-void g() __attribute__((aligned(size)));
+char g __attribute__((aligned(size)));


[PATCH] Don't ICE on empty bbs with no successors in EH opt (PR middle-end/57499)

2014-02-05 Thread Jakub Jelinek
Hi!

As the testcase shows, invalid use of noreturn attribute (which we diagnose)
can result in in empty bb with no successors.  This patch fixes
cleanup_empty_eh not to ICE on these.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-02-04  Jakub Jelinek  ja...@redhat.com

PR middle-end/57499
* tree-eh.c (cleanup_empty_eh): Bail out on totally empty
bb with no successors.

* g++.dg/torture/pr57499.C: New test.

--- gcc/tree-eh.c.jj2014-01-31 22:22:09.0 +0100
+++ gcc/tree-eh.c   2014-02-04 12:35:05.611066852 +0100
@@ -4396,8 +4396,11 @@ cleanup_empty_eh (eh_landing_pad lp)
   /* If the block is totally empty, look for more unsplitting cases.  */
   if (gsi_end_p (gsi))
 {
-  /* For the degenerate case of an infinite loop bail out.  */
-  if (infinite_empty_loop_p (e_out))
+  /* For the degenerate case of an infinite loop bail out.
+If bb has no successors and is totally empty, which can happen e.g.
+because of incorrect noreturn attribute, bail out too.  */
+  if (e_out == NULL
+ || infinite_empty_loop_p (e_out))
return ret;
 
   return ret | cleanup_empty_eh_unsplit (bb, e_out, lp);
--- gcc/testsuite/g++.dg/torture/pr57499.C.jj   2014-02-04 12:52:10.241846755 
+0100
+++ gcc/testsuite/g++.dg/torture/pr57499.C  2014-02-04 12:51:51.0 
+0100
@@ -0,0 +1,14 @@
+// PR middle-end/57499
+// { dg-do compile }
+
+struct S
+{
+  ~S () __attribute__ ((noreturn)) {} // { dg-warning function does return }
+};
+
+void
+foo ()
+{
+  S s;
+  throw 1;
+}

Jakub


[PATCH] Fix ix86_function_regparm with optimize attribute (PR target/60062)

2014-02-05 Thread Jakub Jelinek
Hi!

Using !!optimize to determine if we should switch local ABI to regparm
convention isn't compatible with optimize attribute, as !!optimize is
whether the current function is being optimized, but for the ABI decisions
we actually need the caller and callee to agree on the calling convention.

Fixed by looking at callee's optimize settings all the time.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-02-05  Jakub Jelinek  ja...@redhat.com

PR target/60062
* config/i386/i386.c (ix86_function_regparm): Use optimize
from the callee instead of current function's optimize
to determine if local regparm convention should be used.

* gcc.c-torture/execute/pr60062.c: New test.
* gcc.c-torture/execute/pr60072.c: New test.

--- gcc/config/i386/i386.c.jj   2014-02-04 01:36:00.0 +0100
+++ gcc/config/i386/i386.c  2014-02-05 09:09:29.603827877 +0100
@@ -5608,11 +5608,22 @@ ix86_function_regparm (const_tree type,
   /* Use register calling convention for local functions when possible.  */
   if (decl
TREE_CODE (decl) == FUNCTION_DECL
-   optimize
!(profile_flag  !flag_fentry))
 {
   /* FIXME: remove this CONST_CAST when cgraph.[ch] is constified.  */
   struct cgraph_local_info *i = cgraph_local_info (CONST_CAST_TREE (decl));
+
+  /* Caller and callee must agree on the calling convention, so
+checking here just optimize means that with
+__attribute__((optimize (...))) caller could use regparm convention
+and callee not, or vice versa.  Instead look at whether the callee
+is optimized or not.  */
+  tree opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl);
+  if (opts == NULL_TREE)
+   opts = optimization_default_node;
+  if (!TREE_OPTIMIZATION (opts)-x_optimize)
+   i = NULL;
+
   if (i  i-local  i-can_change_signature)
{
  int local_regparm, globals = 0, regno;
--- gcc/testsuite/gcc.c-torture/execute/pr60062.c.jj2014-02-05 
09:02:13.703085235 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr60062.c   2014-02-05 
09:01:54.0 +0100
@@ -0,0 +1,25 @@
+/* PR target/60062 */
+
+int a;
+
+static void
+foo (const char *p1, int p2)
+{
+  if (__builtin_strcmp (p1, hello) != 0)
+__builtin_abort ();
+}
+
+static void
+bar (const char *p1)
+{
+  if (__builtin_strcmp (p1, hello) != 0)
+__builtin_abort ();
+}
+
+__attribute__((optimize (0))) int
+main ()
+{
+  foo (hello, a);
+  bar (hello);
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr60072.c.jj2013-08-25 
18:20:55.717911035 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr60072.c   2014-02-05 
11:33:07.501748426 +0100
@@ -0,0 +1,16 @@
+/* PR target/60072 */
+
+int c = 1;
+
+__attribute__ ((optimize (1)))
+static int *foo (int *p)
+{
+  return p;
+}
+
+int
+main ()
+{
+  *foo (c) = 2;
+  return c - 2;
+}

Jakub


[PATCH] Make pr59597 test PIC-friendly

2014-02-05 Thread Ian Bolton
PR59597 reinstated some code to cancel unnecessary jump threading,
and brought with it a testcase to check that the cancelling happened.

http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01448.html

With PIC enabled for arm and aarch64, the unnecessary jump threading
already never took place, so there is nothing to cancel, leading the
test case to fail.  My suspicion is that similar issues will happen
for other architectures too.

This patch changes the called function to be static, so that jump
threading and the resulting cancellation happen for PIC variants too.

OK for stage 4 or wait for stage 1?

Cheers,
Ian


2014-02-05  Ian Bolton  ian.bol...@arm.com

testsuite/
* gcc.dg/tree-ssa/pr59597.c: Make called function static so that
expected outcome works for PIC variants too.diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
index 814d299..bc9d730 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
@@ -8,7 +8,8 @@ typedef unsigned int u32;
 
 u32 f[NNN], t[NNN];
 
-u16 Calc_crc8(u8 data, u16 crc )
+static u16
+Calc_crc8 (u8 data, u16 crc)
 {
   u8 i=0,x16=0,carry=0;
   for (i = 0; i  8; i++)
@@ -31,7 +32,9 @@ u16 Calc_crc8(u8 data, u16 crc )
 }
   return crc;
 }
-int main (int argc, char argv[])
+
+int
+main (int argc, char argv[])
 {
   int i, j; u16 crc;
   for (j = 0; j  1000; j++)


[PATCH] One possible fix for the compute_bb_predicate oscillation (PR ipa/60013)

2014-02-05 Thread Jakub Jelinek
Hi!

Here is one possible fix for the PR60013 compute_bb_predicates oscillation,
the PR contains long analysis, but to sum it up, the problem is that
we only allow 8 conjuction operands for the predicates and when we reach 8
and want to add some further one, we just throw some of the disjunctions
on the floor, which makes the dataflow computation not guaranteed to
terminate.

The following patch fixes that by just turning all predicates that would
need more than 8 CNF toplevel operands into always true predicates (meaning
that we conservatively consider the particular basic block to be potentially
always executed independently on the function arguments).

I've bootstrapped/regtested this patch on x86_64-linux and i686-linux,
the (i2 == MAX_CLAUSES) condition was true in 2447 cases while inside
of compute_bb_predicates (on average compute_bb_predicates call that
triggered at least one such conservative bail out would see 3.14 of those)
and 482 times outside of compute_bb_predicate, on 159 unique source files
something triggered.  But, the fact that it triggered doesn't mean at all
the function would be even considered for inlining and even if it would, it
often would not change the inlining decisions.

As I said in the PR, the other possibility I see is just in letting
compute_bb_predicates do the dataflow until it set's the predicates on all
basic blocks, and then perhaps have some counter for each bb how many times
it has changed in the following iterations and if it changes too many times,
just use true_predicate for it or something similar.

Or the following patch could be combined with some MAX_CLAUSES growth (say
from 8 to 16) to make it punt less often.

2014-02-05  Jakub Jelinek  ja...@redhat.com

PR ipa/60013
* ipa-inline-analysis.c (add_clause): Fix comment typos.  If
clause couldn't be added because there are already MAX_CLAUSES
clauses, turn p into a true predicate.
(and_predicates, or_predicates): If add_clause turned the
result into a true_predicate_p, break out of the loop nest.

* gcc.dg/pr60013.c: New test.

--- gcc/ipa-inline-analysis.c.jj2014-02-03 08:53:12.0 +0100
+++ gcc/ipa-inline-analysis.c   2014-02-05 08:01:57.093878954 +0100
@@ -310,7 +310,7 @@ add_clause (conditions conditions, struc
   if (false_predicate_p (p))
 return;
 
-  /* No one should be sily enough to add false into nontrivial clauses.  */
+  /* No one should be silly enough to add false into nontrivial clauses.  */
   gcc_checking_assert (!(clause  (1  predicate_false_condition)));
 
   /* Look where to insert the clause.  At the same time prune out
@@ -372,9 +372,13 @@ add_clause (conditions conditions, struc
 }
 
 
-  /* We run out of variants.  Be conservative in positive direction.  */
+  /* We run out of variants.  Conservatively turn clause into true
+ predicate.  */
   if (i2 == MAX_CLAUSES)
-return;
+{
+  p-clause[0] = 0;
+  return;
+}
   /* Keep clauses in decreasing order. This makes equivalence testing easy.  */
   p-clause[i2 + 1] = 0;
   if (insert_here = 0)
@@ -412,6 +416,8 @@ and_predicates (conditions conditions,
 {
   gcc_checking_assert (i  MAX_CLAUSES);
   add_clause (conditions, out, p2-clause[i]);
+  if (true_predicate_p (out))
+   break;
 }
   return out;
 }
@@ -459,6 +465,8 @@ or_predicates (conditions conditions,
   {
gcc_checking_assert (i  MAX_CLAUSES  j  MAX_CLAUSES);
add_clause (conditions, out, p-clause[i] | p2-clause[j]);
+   if (true_predicate_p (out))
+ return out;
   }
   return out;
 }
@@ -1035,7 +1043,7 @@ inline_node_removal_hook (struct cgraph_
   memset (info, 0, sizeof (inline_summary_t));
 }
 
-/* Remap predicate P of former function to be predicate of duplicated functoin.
+/* Remap predicate P of former function to be predicate of duplicated function.
POSSIBLE_TRUTHS is clause of possible truths in the duplicated node,
INFO is inline summary of the duplicated node.  */
 
--- gcc/testsuite/gcc.dg/pr60013.c.jj   2014-02-05 09:49:51.632142747 +0100
+++ gcc/testsuite/gcc.dg/pr60013.c  2014-02-05 09:49:14.0 +0100
@@ -0,0 +1,47 @@
+/* PR ipa/60013 */
+/* { dg-do compile } */
+/* { dg-options -O2 } */
+
+typedef long int jmp_buf[64];
+extern int _setjmp (jmp_buf) __attribute__ ((__nothrow__));
+struct S { int a, b, c; };
+extern struct S *baz (struct S *);
+static jmp_buf j;
+
+static inline int
+bar (int b, int d)
+{
+  return (b  d)  0;
+}
+
+struct S *
+foo (int a, struct S *b, struct S *c, struct S *d)
+{
+  if (b-a == 0)
+{
+  switch (a)
+   {
+   case 8:
+ return baz (b);
+   case 7:
+ bar (b-c, c-b);
+ return 0;
+   case 6:
+   case 5:
+   case 4:
+ return baz (c);
+   case 3:
+   case 2:
+ return baz (d);
+   }
+  return 0;
+}
+  if (b-a == 1)
+{
+  if (baz (c))
+   return c;
+  else if 

Re: [PATCH] Don't ICE on empty bbs with no successors in EH opt (PR middle-end/57499)

2014-02-05 Thread Richard Biener
On Wed, 5 Feb 2014, Jakub Jelinek wrote:

 Hi!
 
 As the testcase shows, invalid use of noreturn attribute (which we diagnose)
 can result in in empty bb with no successors.  This patch fixes
 cleanup_empty_eh not to ICE on these.
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

 2014-02-04  Jakub Jelinek  ja...@redhat.com
 
   PR middle-end/57499
   * tree-eh.c (cleanup_empty_eh): Bail out on totally empty
   bb with no successors.
 
   * g++.dg/torture/pr57499.C: New test.
 
 --- gcc/tree-eh.c.jj  2014-01-31 22:22:09.0 +0100
 +++ gcc/tree-eh.c 2014-02-04 12:35:05.611066852 +0100
 @@ -4396,8 +4396,11 @@ cleanup_empty_eh (eh_landing_pad lp)
/* If the block is totally empty, look for more unsplitting cases.  */
if (gsi_end_p (gsi))
  {
 -  /* For the degenerate case of an infinite loop bail out.  */
 -  if (infinite_empty_loop_p (e_out))
 +  /* For the degenerate case of an infinite loop bail out.
 +  If bb has no successors and is totally empty, which can happen e.g.
 +  because of incorrect noreturn attribute, bail out too.  */
 +  if (e_out == NULL
 +   || infinite_empty_loop_p (e_out))
   return ret;
  
return ret | cleanup_empty_eh_unsplit (bb, e_out, lp);
 --- gcc/testsuite/g++.dg/torture/pr57499.C.jj 2014-02-04 12:52:10.241846755 
 +0100
 +++ gcc/testsuite/g++.dg/torture/pr57499.C2014-02-04 12:51:51.0 
 +0100
 @@ -0,0 +1,14 @@
 +// PR middle-end/57499
 +// { dg-do compile }
 +
 +struct S
 +{
 +  ~S () __attribute__ ((noreturn)) {} // { dg-warning function does return 
 }
 +};
 +
 +void
 +foo ()
 +{
 +  S s;
 +  throw 1;
 +}
 
   Jakub
 
 

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer


Re: [PATCH] Fix ix86_function_regparm with optimize attribute (PR target/60062)

2014-02-05 Thread Richard Biener
On Wed, 5 Feb 2014, Jakub Jelinek wrote:

 Hi!
 
 Using !!optimize to determine if we should switch local ABI to regparm
 convention isn't compatible with optimize attribute, as !!optimize is
 whether the current function is being optimized, but for the ABI decisions
 we actually need the caller and callee to agree on the calling convention.
 
 Fixed by looking at callee's optimize settings all the time.
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Seeing a 2nd variant of such code I'd rather have an abstraction
for this ... optimize_fn ()?  opt_for_fn (fn, OPT_...)?

Richard.

 2014-02-05  Jakub Jelinek  ja...@redhat.com
 
   PR target/60062
   * config/i386/i386.c (ix86_function_regparm): Use optimize
   from the callee instead of current function's optimize
   to determine if local regparm convention should be used.
 
   * gcc.c-torture/execute/pr60062.c: New test.
   * gcc.c-torture/execute/pr60072.c: New test.
 
 --- gcc/config/i386/i386.c.jj 2014-02-04 01:36:00.0 +0100
 +++ gcc/config/i386/i386.c2014-02-05 09:09:29.603827877 +0100
 @@ -5608,11 +5608,22 @@ ix86_function_regparm (const_tree type,
/* Use register calling convention for local functions when possible.  */
if (decl
 TREE_CODE (decl) == FUNCTION_DECL
 -   optimize
 !(profile_flag  !flag_fentry))
  {
/* FIXME: remove this CONST_CAST when cgraph.[ch] is constified.  */
struct cgraph_local_info *i = cgraph_local_info (CONST_CAST_TREE 
 (decl));
 +
 +  /* Caller and callee must agree on the calling convention, so
 +  checking here just optimize means that with
 +  __attribute__((optimize (...))) caller could use regparm convention
 +  and callee not, or vice versa.  Instead look at whether the callee
 +  is optimized or not.  */
 +  tree opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl);
 +  if (opts == NULL_TREE)
 + opts = optimization_default_node;
 +  if (!TREE_OPTIMIZATION (opts)-x_optimize)
 + i = NULL;
 +
if (i  i-local  i-can_change_signature)
   {
 int local_regparm, globals = 0, regno;
 --- gcc/testsuite/gcc.c-torture/execute/pr60062.c.jj  2014-02-05 
 09:02:13.703085235 +0100
 +++ gcc/testsuite/gcc.c-torture/execute/pr60062.c 2014-02-05 
 09:01:54.0 +0100
 @@ -0,0 +1,25 @@
 +/* PR target/60062 */
 +
 +int a;
 +
 +static void
 +foo (const char *p1, int p2)
 +{
 +  if (__builtin_strcmp (p1, hello) != 0)
 +__builtin_abort ();
 +}
 +
 +static void
 +bar (const char *p1)
 +{
 +  if (__builtin_strcmp (p1, hello) != 0)
 +__builtin_abort ();
 +}
 +
 +__attribute__((optimize (0))) int
 +main ()
 +{
 +  foo (hello, a);
 +  bar (hello);
 +  return 0;
 +}
 --- gcc/testsuite/gcc.c-torture/execute/pr60072.c.jj  2013-08-25 
 18:20:55.717911035 +0200
 +++ gcc/testsuite/gcc.c-torture/execute/pr60072.c 2014-02-05 
 11:33:07.501748426 +0100
 @@ -0,0 +1,16 @@
 +/* PR target/60072 */
 +
 +int c = 1;
 +
 +__attribute__ ((optimize (1)))
 +static int *foo (int *p)
 +{
 +  return p;
 +}
 +
 +int
 +main ()
 +{
 +  *foo (c) = 2;
 +  return c - 2;
 +}
 
   Jakub
 
 

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer


Re: [ARM Documentation] Clarify -mcpu, -mtune, -march

2014-02-05 Thread James Greenhalgh
On Wed, Feb 05, 2014 at 11:02:42AM +, Ramana Radhakrishnan wrote:
 On 01/27/14 10:01, James Greenhalgh wrote:
 
  Hi,
 
  I've tripped myself over with these three options too many times,
  actually, their behaviour is very simple.
 
  This patch clarifies the language used to describe the options, and
  puts them in a logical order. I'm happy to reword again if this
  is still not clear.
 
  OK?
 
  Thanks,
  James
 
  ---
  gcc/
 
  2014-01-27  James Greenhalgh  james.greenha...@arm.com
 
  PR target/59718
  * doc/invoke.texi (-march=): Clarify documentation for ARM.
  (-mtune=): Likewise.
  (-mcpu=): Likewise.
 
 
 
snip
 Ok with those changes.

Thanks,

I've attached what I ended up committing.

As far as I know the behaviour of this flag has always been this way.  So is
this also OK to backport to release branches? 

Thanks,
James

From b72249d400685a3c1a2e7eee5ad21db86006d34c Mon Sep 17 00:00:00 2001
From: James Greenhalgh james.greenha...@arm.com
Date: Wed, 8 Jan 2014 12:26:12 +
Subject: [ARM Documentation] Clarify -mcpu, -mtune, -march
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=1.8.3-rc0

This is a multi-part message in MIME format.
--1.8.3-rc0
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


Hi,

I've tripped myself over with these three options too many times,
actually, their behaviour is very simple.

This patch clarifies the language used to describe the options, and
puts them in a logical order. I'm happy to reword again if this
is still not clear.

OK?

Thanks,
James

---
gcc/

2014-02-05  James Greenhalgh  james.greenha...@arm.com

	PR target/59718
	* doc/invoke.texi (-march=): Clarify documentation for ARM.
	(-mtune=): Likewise.
	(-mcpu=): Likewise.


--1.8.3-rc0
Content-Type: text/x-patch; name=0001-ARM-Documentation-Clarify-mcpu-mtune-march.patch
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename=0001-ARM-Documentation-Clarify-mcpu-mtune-march.patch

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 640c123..e3dc9df 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12221,11 +12221,38 @@ option should only be used if you require compatibility with code for
 big-endian ARM processors generated by versions of the compiler prior to
 2.8.  This option is now deprecated.
 
-@item -mcpu=@var{name}
-@opindex mcpu
-This specifies the name of the target ARM processor.  GCC uses this name
-to determine what kind of instructions it can emit when generating
-assembly code.  Permissible names are: @samp{arm2}, @samp{arm250},
+@item -march=@var{name}
+@opindex march
+This specifies the name of the target ARM architecture.  GCC uses this
+name to determine what kind of instructions it can emit when generating
+assembly code.  This option can be used in conjunction with or instead
+of the @option{-mcpu=} option.  Permissible names are: @samp{armv2},
+@samp{armv2a}, @samp{armv3}, @samp{armv3m}, @samp{armv4}, @samp{armv4t},
+@samp{armv5}, @samp{armv5t}, @samp{armv5e}, @samp{armv5te},
+@samp{armv6}, @samp{armv6j},
+@samp{armv6t2}, @samp{armv6z}, @samp{armv6zk}, @samp{armv6-m},
+@samp{armv7}, @samp{armv7-a}, @samp{armv7-r}, @samp{armv7-m}, @samp{armv7ve},
+@samp{armv8-a}, @samp{armv8-a+crc},
+@samp{iwmmxt}, @samp{iwmmxt2}, @samp{ep9312}.
+
+@option{-march=armv7ve} is the armv7-a architecture with virtualization
+extensions.
+
+@option{-march=armv8-a+crc} enables code generation for the ARMv8-A
+architecture together with the optional CRC32 extensions.
+
+@option{-march=native} causes the compiler to auto-detect the architecture
+of the build computer.  At present, this feature is only supported on
+Linux, and not all architectures are recognized.  If the auto-detect is
+unsuccessful the option has no effect.
+
+@item -mtune=@var{name}
+@opindex mtune
+This option specifies the name of the target ARM processor for
+which GCC should tune the performance of the code.
+For some ARM implementations better performance can be obtained by using
+this option.
+Permissible names are: @samp{arm2}, @samp{arm250},
 @samp{arm3}, @samp{arm6}, @samp{arm60}, @samp{arm600}, @samp{arm610},
 @samp{arm620}, @samp{arm7}, @samp{arm7m}, @samp{arm7d}, @samp{arm7dm},
 @samp{arm7di}, @samp{arm7dmi}, @samp{arm70}, @samp{arm700},
@@ -12259,26 +12286,6 @@ Additionally, this option can specify that GCC should tune the performance
 of the code for a big.LITTLE system.  Permissible names are:
 @samp{cortex-a15.cortex-a7}, @samp{cortex-a57.cortex-a53}.
 
-@option{-mcpu=generic-@var{arch}} is also permissible, and is
-equivalent to @option{-march=@var{arch} -mtune=generic-@var{arch}}.
-See @option{-mtune} for more information.
-
-@option{-mcpu=native} causes the compiler to auto-detect the CPU
-of the build computer.  At present, this feature is only supported on
-Linux, and not all architectures are recognized.  If the auto-detect is
-unsuccessful the option has no effect.
-
-@item 

Re: [ARM Documentation] Clarify -mcpu, -mtune, -march

2014-02-05 Thread Ramana Radhakrishnan



I've attached what I ended up committing.

As far as I know the behaviour of this flag has always been this way.  So is
this also OK to backport to release branches?


Ok by me. It's a documentation fix to make things more explicit.

And s/=/ again in Changelog :) .

Ramana



Thanks,
James




--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.



[RS6000] SDmode and reload

2014-02-05 Thread Alan Modra
This little patch fixes a number of dfp failures:

Fail: c-c++-common/dfp/func-vararg-alternate-d32.c execution test
FAIL: c-c++-common/dfp/func-vararg-alternate-d32.c -std=c++11 execution test
FAIL: c-c++-common/dfp/func-vararg-alternate-d32.c -std=c++98 execution test
FAIL: c-c++-common/dfp/func-vararg-dfp.c execution test
FAIL: c-c++-common/dfp/func-vararg-dfp.c -std=c++11 execution test
FAIL: c-c++-common/dfp/func-vararg-dfp.c -std=c++98 execution test
FAIL: c-c++-common/dfp/func-vararg-mixed.c execution test
FAIL: c-c++-common/dfp/func-vararg-mixed.c -std=c++11 execution test
FAIL: c-c++-common/dfp/func-vararg-mixed.c -std=c++98 execution test
FAIL: decimal/pr54036-1.cc (test for excess errors)
FAIL: gcc.dg/compat/scalar-by-value-dfp c_compat_x_tst.o-c_compat_y_tst.o 
execute 
FAIL: gcc.dg/compat/scalar-return-dfp c_compat_x_tst.o-c_compat_y_tst.o execute 
FAIL: gcc.target/powerpc/ppc32-abi-dfp-1.c execution test
FAIL: g++.dg/compat/decimal/pass-1 cp_compat_x_tst.o-cp_compat_y_tst.o execute 
FAIL: g++.dg/compat/decimal/pass-2 cp_compat_x_tst.o-cp_compat_y_tst.o execute 
FAIL: g++.dg/compat/decimal/pass-3 cp_compat_x_tst.o-cp_compat_y_tst.o execute 
FAIL: g++.dg/compat/decimal/pass-4 cp_compat_x_tst.o-cp_compat_y_tst.o execute 
FAIL: g++.dg/compat/decimal/pass-5 cp_compat_x_tst.o-cp_compat_y_tst.o execute 
FAIL: g++.dg/compat/decimal/pass-6 cp_compat_x_tst.o-cp_compat_y_tst.o execute 
FAIL: g++.dg/compat/decimal/return-1 cp_compat_x_tst.o-cp_compat_y_tst.o 
execute 
FAIL: g++.dg/compat/decimal/return-2 cp_compat_x_tst.o-cp_compat_y_tst.o 
execute 
FAIL: g++.dg/compat/decimal/return-3 cp_compat_x_tst.o-cp_compat_y_tst.o 
execute 
FAIL: g++.dg/compat/decimal/return-4 cp_compat_x_tst.o-cp_compat_y_tst.o 
execute 
FAIL: g++.dg/compat/decimal/return-5 cp_compat_x_tst.o-cp_compat_y_tst.o 
execute 
FAIL: g++.dg/compat/decimal/return-6 cp_compat_x_tst.o-cp_compat_y_tst.o 
execute 

Vlad added rs6000_secondary_memory_needed_mode for lra, and in so
doing broke reload's use of cfun-machine-sdmode_stack_slot in
rs6000_secondary_memory_needed_rtx (mode was no longer SDmode there).
Bootstrapped and regression tested powerpc64-linux, both with and
without -mlra.  OK to commit?

PR target/60032
* config/rs6000/rs6000.c (rs6000_secondary_memory_needed_mode): Only
change SDmode to DDmode when lra_in_progress.

* gcc.target/powerpc/pr60032.c: New.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 207417)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -15581,7 +15581,7 @@
 enum machine_mode
 rs6000_secondary_memory_needed_mode (enum machine_mode mode)
 {
-  if (mode == SDmode)
+  if (lra_in_progress  mode == SDmode)
 return DDmode;
   return mode;
 }
Index: gcc/testsuite/gcc.target/powerpc/pr60032.c
===
--- gcc/testsuite/gcc.target/powerpc/pr60032.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr60032.c  (revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target dfp } */
+/* { dg-options -O2 } */
+
+void foo (void)
+{
+  register float __attribute__ ((mode(SD))) r31 __asm__ (r31);
+  register float __attribute__ ((mode(SD))) fr1 __asm__ (fr1);
+
+  __asm__ (# : =d (fr1));
+  r31 = fr1;
+  __asm__ (# : : r (r31));
+}

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH] Fix loop preserving in loop header copying

2014-02-05 Thread Richard Biener

There's a simple mistake in gimple_duplicate_sese_region that
causes all loop-header copied loops to be thrown away (but soon
re-discovered, hopefully without somebody inbetween messing things up).

Fixed like the following, bootstrapped on x86_64-unknown-linux-gnu,
testing in progress.

Richard.

2014-02-05  Richard Biener  rguent...@suse.de

* tree-cfg.c (gimple_duplicate_sese_region): Fix ordering of
set_loop_copy and initialize_original_copy_tables.

Index: gcc/tree-cfg.c
===
*** gcc/tree-cfg.c  (revision 207497)
--- gcc/tree-cfg.c  (working copy)
*** gimple_duplicate_sese_region (edge entry
*** 5879,5892 
return false;
  }
  
-   set_loop_copy (loop, loop);
- 
/* In case the function is used for loop header copying (which is the 
primary
   use), ensure that EXIT and its copy will be new latch and entry edges.  
*/
if (loop-header == entry-dest)
  {
copying_header = true;
-   set_loop_copy (loop, loop_outer (loop));
  
if (!dominated_by_p (CDI_DOMINATORS, loop-latch, exit-src))
return false;
--- 5905,5915 
*** gimple_duplicate_sese_region (edge entry
*** 5897,5910 
  return false;
  }
  
if (!region_copy)
  {
region_copy = XNEWVEC (basic_block, n_region);
free_region_copy = true;
  }
  
-   initialize_original_copy_tables ();
- 
/* Record blocks outside the region that are dominated by something
   inside.  */
if (update_dominance)
--- 5920,5938 
  return false;
  }
  
+   initialize_original_copy_tables ();
+ 
+   if (copying_header)
+ set_loop_copy (loop, loop_outer (loop));
+   else
+ set_loop_copy (loop, loop);
+ 
if (!region_copy)
  {
region_copy = XNEWVEC (basic_block, n_region);
free_region_copy = true;
  }
  
/* Record blocks outside the region that are dominated by something
   inside.  */
if (update_dominance)


Re: [ARM][PATCH] Vectorizer generates unaligned access when -mno-unaligned-access is enabled

2014-02-05 Thread Yury Gribov

Ramana wrote:

The error is caused by unaligned vector load via two vldr instructions:


So, does this mean that we get unaligned vector loads
when munaligned-access is on, using vldr instructions ?


No, when -munaligned-access is on, the backend will enable movmisalign 
patterns that generate friendly vst1/vld1 instructions (well, at least 
on little-endian ARMs).



Cause for this error is, even when -mno-unaligned-access is enabled,
backend will inform vectorizer that it supports misaligned accesses
via TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT.


Ramana wrote:

Attached patch fixes this. Is this OK for trunk?


How has it been tested ?


I was hoping Linaro people could run their magical cbuild on it...

-Y


Re: [C,C++] integer constants in attribute arguments

2014-02-05 Thread Andreas Schwab
Marc Glisse marc.gli...@inria.fr writes:

 On Wed, 5 Feb 2014, Andreas Schwab wrote:

 Wrt. the function decl it should probably just be changed to a variable
 decl.  Alignments on function decls are kind of exotic.

 Good idea. How about this patch then?

LGTM.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.


Re: [PATCH, rs6000] Implement -maltivec=be for vec_pack, vec_unpackh, vec_unpackl Altivec builtins

2014-02-05 Thread David Edelsohn
On Tue, Feb 4, 2014 at 2:13 PM, Bill Schmidt
wschm...@linux.vnet.ibm.com wrote:

 Yet another -maltivec=be patch, this one for the vector pack/unpack
 builtins.  For the pack operations, we need to reverse the order of the
 inputs for little endian without -maltivec=be.  For the unpack
 operations, we need to replace unpackh with unpackl, and vice versa, for
 little endian without -maltivec=be.

 For both pack and unpack, there are some internal uses that should not
 have their semantics changed from the existing implementation.  For
 these, _direct forms of the insns are provided, as has been done in
 previous patches in this series.

 Four new test cases are added to test these builtins for all applicable
 vector types, with and without -maltivec=be.

 Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
 regressions.  Is this ok for trunk?

 Thanks,
 Bill


 gcc:

 2014-02-04  Bill Schmidt  wschm...@linux.vnet.ibm.com

 * altivec.md (UNSPEC_VPACK_UNS_UNS_MOD_DIRECT): New unspec.
 (UNSPEC_VUNPACK_HI_SIGN_DIRECT): Likewise.
 (UNSPEC_VUNPACK_LO_SIGN_DIRECT): Likewise.
 (mulv8hi3): Use gen_altivec_vpkuwum_direct instead of
 gen_altivec_vpkuwum.
 (altivec_vpkpx): Test for VECTOR_ELT_ORDER_BIG instead of for
 BYTES_BIG_ENDIAN.
 (altivec_vpksVI_charss): Likewise.
 (altivec_vpksVI_charus): Likewise.
 (altivec_vpkuVI_charus): Likewise.
 (altivec_vpkuVI_charum): Likewise.
 (altivec_vpkuVI_charum_direct): New (copy of
 altivec_vpkuVI_charum that still relies on BYTES_BIG_ENDIAN, for
 internal use).
 (altivec_vupkhsVU_char): Emit vupkls* instead of vupkhs* when
 target is little endian and -maltivec=be is not specified.
 (*altivec_vupkhsVU_char_direct): New (copy of
 altivec_vupkhsVU_char that always emits vupkhs*, for internal
 use).
 (altivec_vupklsVU_char): Emit vupkhs* instead of vupkls* when
 target is little endian and -maltivec=be is not specified.
 (*altivec_vupklsVU_char_direct): New (copy of
 altivec_vupklsVU_char that always emits vupkls*, for internal
 use).
 (altivec_vupkhpx): Emit vupklpx instead of vupkhpx when target is
 little endian and -maltivec=be is not specified.
 (altivec_vupklpx): Emit vupkhpx instead of vupklpx when target is
 little endian and -maltivec=be is not specified.

 gcc/testsuite:

 2014-02-04  Bill Schmidt  wschm...@linux.vnet.ibm.com

 * gcc.dg/vmx/pack.c: New.
 * gcc.dg/vmx/pack-be-order.c: New.
 * gcc.dg/vmx/unpack.c: New.
 * gcc.dg/vmx/unpack-be-order.c: New.

Okay.

Thanks, David


Re: [ARM][PATCH] Vectorizer generates unaligned access when -mno-unaligned-access is enabled

2014-02-05 Thread Ramana Radhakrishnan





Ramana wrote:

Attached patch fixes this. Is this OK for trunk?


How has it been tested ?


I was hoping Linaro people could run their magical cbuild on it...


Ok if no regressions.

regards
Ramana



-Y




--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.



[PATCH] FIx PR60076

2014-02-05 Thread Richard Biener

Tested on x86_64-unknown-linux-gnu, applied.

Richard.

2014-02-05  Richard Biener  rguent...@suse.de

PR testsuite/60076
* gcc.dg/vect/pr60012.c: Require vect_extract_even_odd and
avoid using unsigned long long.

Index: gcc/testsuite/gcc.dg/vect/pr60012.c
===
--- gcc/testsuite/gcc.dg/vect/pr60012.c (revision 207504)
+++ gcc/testsuite/gcc.dg/vect/pr60012.c (working copy)
@@ -8,14 +8,14 @@ typedef struct
 } complex16_t;
 
 void
-libvector_AccSquareNorm_ref (unsigned long long  *acc,
+libvector_AccSquareNorm_ref (unsigned int *acc,
 const complex16_t *x, unsigned len)
 {
   unsigned i;
   for (i = 0; i  len; i++)
-acc[i] += ((unsigned long long)((int)x[i].real * x[i].real))
-   + ((unsigned long long)((int)x[i].imag * x[i].imag));
+acc[i] += ((unsigned int)((int)x[i].real * x[i].real))
+   + ((unsigned int)((int)x[i].imag * x[i].imag));
 }
 
-/* { dg-final { scan-tree-dump LOOP VECTORIZED vect } } */
+/* { dg-final { scan-tree-dump LOOP VECTORIZED vect { target { 
vect_extract_even_odd } } } } */
 /* { dg-final { cleanup-tree-dump vect } } */


Re: [PATCH, rs6000] Handle -maltivec=be for vec_sum2s builtin (last -maltivec=be patch)

2014-02-05 Thread David Edelsohn
On Tue, Feb 4, 2014 at 10:15 PM, Bill Schmidt
wschm...@linux.vnet.ibm.com wrote:
 Hi,

 One final patch in the series, this one for vec_sum2s.  This builtin
 requires some additional code generation for the case of little endian
 without -maltivec=be.  Here's an example:

   va = {-10,1,2,3};0x 0003 0002 0001 fff6
   vb = {100,101,102,-103}; 0x ff99 0066 0065 0064
   vc = vec_sum2s (va, vb); 0x ff9e  005c 
   = {0,92,0,-98};

 We need to add -10 + 1 + 101 = 92 and place it in vc[1], and add 2 + 3 +
 -103 and place the result in vc[3], with zeroes in the other two
 elements.  To do this, we first use vsldoi vs,vb,vb,12 to rotate 101
 and -103 into big-endian elements 1 and 3, as required by the vsum2sws
 instruction:

   0x ff99 0066 0065 0064 ff99 0066 0065 0064
    
   vs =  0064 ff99 0066 0065

 Executing vsum2sws vs,va,vs then gives

   vs = 0x  ff9e  005c

 which then must be shifted into position with vsldoi vc,vs,vs,4

   0x  ff9e  005c  ff9e  005c
      
  vc = ff9e  005c 

 which is the desired result.

 In addition to this change, I noticed a redundant test from one of my
 previous patches and simplified it.  (BYTES_BIG_ENDIAN implies
 VECTOR_ELT_ORDER_BIG, so we don't need to test BYTES_BIG_ENDIAN.)

 As usual, new test cases are added to cover the possible cases.  These
 are simpler this time since only vector signed integer is a legal type
 for vec_sum2s.

 Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no
 regressions.  Is this ok for trunk?

 Thanks,
 Bill


 gcc:

 2014-02-04  Bill Schmidt  wschm...@linux.vnet.ibm.com

 * config/rs6000/altivec.md (altivec_vsum2sws): Adjust code
 generation for -maltivec=be.
 (altivec_vsumsws): Simplify redundant test.

 gcc/testsuite:

 2014-02-04  Bill Schmidt  wschm...@linux.vnet.ibm.com

 * gcc.dg/vmx/sum2s.c: New.
 * gcc.dg/vmx/sum2s-be-order.c: New.

Okay.

The multi-instruction sequences really should be emitted as separate
instructions and the scratch only allocated for the LE case.

Thanks, David


Re: [PATCH] Fix ix86_function_regparm with optimize attribute (PR target/60062)

2014-02-05 Thread Jan Hubicka
 On Wed, 5 Feb 2014, Jakub Jelinek wrote:
 
  Hi!
  
  Using !!optimize to determine if we should switch local ABI to regparm
  convention isn't compatible with optimize attribute, as !!optimize is
  whether the current function is being optimized, but for the ABI decisions
  we actually need the caller and callee to agree on the calling convention.
  
  Fixed by looking at callee's optimize settings all the time.
  
  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
 
 Seeing a 2nd variant of such code I'd rather have an abstraction
 for this ... optimize_fn ()?  opt_for_fn (fn, OPT_...)?

Propbable opt_for_fn.  I wanted to implement SSE passing conventions for local
functions but didn't get to it since I was concerned about -fno-sse being
passed to teh callee by target attribute.  Having API to check those would be 
nice.

Honza


Re: [PATCH] Remove lto_global_var_decls, simplify necessary analysis for PR60060

2014-02-05 Thread Jan Hubicka
 
 When looking at PR60060, outputting the declaration debug info for
 a local static variable twice (once via BLOCK_VARS and once via
 calling the debug hook on all globals) I noticed that the
 rest_of_decl_compilation call in materialize_cgraph always works
 on the empty lto_global_var_decls vector (we only populate it
 later).  That results in the opportunity to effectively remove it
 and in lto_write_globals walk over the defined vars in the
 varpool instead.
 
 Eventually the fix for PR60060 is to not push TREE_ASM_WRITTEN
 decls there (but I'm not sure of other side-effects of that ...).
 
 Thus, this cleanup first.
 
 LTO bootstrap / regtest running on x86_64-unknown-linux-gnu.
 
 Ok?

Seems OK to me.  I always had an impression that this machinery exists to
make decls that are revmoed fro symbol table (optimized out) to land the
debug info, but I think this is mostly broken by partitioning anyway, right?
Do we stream them at all?

Honza
 
 Thanks,
 Richard.
 
 2014-02-05  Richard Biener  rguent...@suse.de
 
   lto/
   * lto.h (lto_global_var_decls): Remove.
   * lto-lang.c (lto_init): Do not allocate lto_global_var_decls.
   (lto_write_globals): Do nothing in WPA stage, gather globals from
   the varpool here ...
   * lto.c (lto_main): ... not here.
   (materialize_cgraph): Do not call rest_of_decl_compilation
   on the empty lto_global_var_decls vector.
   (lto_global_var_decls): Remove.
 
 Index: gcc/lto/lto-lang.c
 ===
 *** gcc/lto/lto-lang.c(revision 207497)
 --- gcc/lto/lto-lang.c(working copy)
 *** lto_getdecls (void)
 *** 1075,1085 
   static void
   lto_write_globals (void)
   {
 !   tree *vec = lto_global_var_decls-address ();
 !   int len = lto_global_var_decls-length ();
 wrapup_global_declarations (vec, len);
 emit_debug_global_declarations (vec, len);
 !   vec_free (lto_global_var_decls);
   }
   
   static tree
 --- 1075,1094 
   static void
   lto_write_globals (void)
   {
 !   if (flag_wpa)
 ! return;
 ! 
 !   /* Record the global variables.  */
 !   vectree lto_global_var_decls = vNULL;
 !   varpool_node *vnode;
 !   FOR_EACH_DEFINED_VARIABLE (vnode)
 ! lto_global_var_decls.safe_push (vnode-decl);
 ! 
 !   tree *vec = lto_global_var_decls.address ();
 !   int len = lto_global_var_decls.length ();
 wrapup_global_declarations (vec, len);
 emit_debug_global_declarations (vec, len);
 !   lto_global_var_decls.release ();
   }
   
   static tree
 *** lto_init (void)
 *** 1218,1224 
   #undef NAME_TYPE
   
 /* Initialize LTO-specific data structures.  */
 -   vec_alloc (lto_global_var_decls, 256);
 in_lto_p = true;
   
 return true;
 --- 1227,1232 
 Index: gcc/lto/lto.c
 ===
 *** gcc/lto/lto.c (revision 207497)
 --- gcc/lto/lto.c (working copy)
 *** along with GCC; see the file COPYING3.
 *** 50,57 
   #include context.h
   #include pass_manager.h
   
 - /* Vector to keep track of external variables we've seen so far.  */
 - vectree, va_gc *lto_global_var_decls;
   
   static GTY(()) tree first_personality_decl;
   
 --- 50,55 
 *** read_cgraph_and_symbols (unsigned nfiles
 *** 3009,3017 
   static void
   materialize_cgraph (void)
   {
 -   tree decl;
 struct cgraph_node *node; 
 -   unsigned i;
 timevar_id_t lto_timer;
   
 if (!quiet_flag)
 --- 3007,3013 
 *** materialize_cgraph (void)
 *** 3043,3052 
 current_function_decl = NULL;
 set_cfun (NULL);
   
 -   /* Inform the middle end about the global variables we have seen.  */
 -   FOR_EACH_VEC_ELT (*lto_global_var_decls, i, decl)
 - rest_of_decl_compilation (decl, 1, 0);
 - 
 if (!quiet_flag)
   fprintf (stderr, \n);
   
 --- 3039,3044 
 *** lto_main (void)
 *** 3309,3316 
   do_whole_program_analysis ();
 else
   {
 -   varpool_node *vnode;
 - 
 timevar_start (TV_PHASE_OPT_GEN);
   
 materialize_cgraph ();
 --- 3301,3306 
 *** lto_main (void)
 *** 3330,3339 
this.  */
 if (flag_lto_report || (flag_wpa  flag_lto_report_wpa))
   print_lto_report_1 ();
 - 
 -   /* Record the global variables.  */
 -   FOR_EACH_DEFINED_VARIABLE (vnode)
 - vec_safe_push (lto_global_var_decls, vnode-decl);
   }
   }
   
 --- 3320,3325 
 Index: gcc/lto/lto.h
 ===
 *** gcc/lto/lto.h (revision 207497)
 --- gcc/lto/lto.h (working copy)
 *** extern tree lto_eh_personality (void);
 *** 41,49 
   extern void lto_main (void);
   extern void lto_read_all_file_options (void);
   
 - /* In lto-symtab.c  */
 - extern GTY(()) vectree, va_gc *lto_global_var_decls;
 - 
   /* In lto-elf.c or lto-coff.c  */
   

Re: [PATCH] Remove lto_global_var_decls, simplify necessary analysis for PR60060

2014-02-05 Thread Richard Biener
On Wed, 5 Feb 2014, Jan Hubicka wrote:

  
  When looking at PR60060, outputting the declaration debug info for
  a local static variable twice (once via BLOCK_VARS and once via
  calling the debug hook on all globals) I noticed that the
  rest_of_decl_compilation call in materialize_cgraph always works
  on the empty lto_global_var_decls vector (we only populate it
  later).  That results in the opportunity to effectively remove it
  and in lto_write_globals walk over the defined vars in the
  varpool instead.
  
  Eventually the fix for PR60060 is to not push TREE_ASM_WRITTEN
  decls there (but I'm not sure of other side-effects of that ...).
  
  Thus, this cleanup first.
  
  LTO bootstrap / regtest running on x86_64-unknown-linux-gnu.
  
  Ok?
 
 Seems OK to me.  I always had an impression that this machinery exists to
 make decls that are revmoed fro symbol table (optimized out) to land the
 debug info, but I think this is mostly broken by partitioning anyway, right?
 Do we stream them at all?

No, we don't stream those.  We'd need to stream what the FEs think
are the globals (lang_hooks.getdecls()) and somehow partition them
and ship them somewhere.

Richard.
 
 Honza
  
  Thanks,
  Richard.
  
  2014-02-05  Richard Biener  rguent...@suse.de
  
  lto/
  * lto.h (lto_global_var_decls): Remove.
  * lto-lang.c (lto_init): Do not allocate lto_global_var_decls.
  (lto_write_globals): Do nothing in WPA stage, gather globals from
  the varpool here ...
  * lto.c (lto_main): ... not here.
  (materialize_cgraph): Do not call rest_of_decl_compilation
  on the empty lto_global_var_decls vector.
  (lto_global_var_decls): Remove.
  
  Index: gcc/lto/lto-lang.c
  ===
  *** gcc/lto/lto-lang.c  (revision 207497)
  --- gcc/lto/lto-lang.c  (working copy)
  *** lto_getdecls (void)
  *** 1075,1085 
static void
lto_write_globals (void)
{
  !   tree *vec = lto_global_var_decls-address ();
  !   int len = lto_global_var_decls-length ();
  wrapup_global_declarations (vec, len);
  emit_debug_global_declarations (vec, len);
  !   vec_free (lto_global_var_decls);
}

static tree
  --- 1075,1094 
static void
lto_write_globals (void)
{
  !   if (flag_wpa)
  ! return;
  ! 
  !   /* Record the global variables.  */
  !   vectree lto_global_var_decls = vNULL;
  !   varpool_node *vnode;
  !   FOR_EACH_DEFINED_VARIABLE (vnode)
  ! lto_global_var_decls.safe_push (vnode-decl);
  ! 
  !   tree *vec = lto_global_var_decls.address ();
  !   int len = lto_global_var_decls.length ();
  wrapup_global_declarations (vec, len);
  emit_debug_global_declarations (vec, len);
  !   lto_global_var_decls.release ();
}

static tree
  *** lto_init (void)
  *** 1218,1224 
#undef NAME_TYPE

  /* Initialize LTO-specific data structures.  */
  -   vec_alloc (lto_global_var_decls, 256);
  in_lto_p = true;

  return true;
  --- 1227,1232 
  Index: gcc/lto/lto.c
  ===
  *** gcc/lto/lto.c   (revision 207497)
  --- gcc/lto/lto.c   (working copy)
  *** along with GCC; see the file COPYING3.
  *** 50,57 
#include context.h
#include pass_manager.h

  - /* Vector to keep track of external variables we've seen so far.  */
  - vectree, va_gc *lto_global_var_decls;

static GTY(()) tree first_personality_decl;

  --- 50,55 
  *** read_cgraph_and_symbols (unsigned nfiles
  *** 3009,3017 
static void
materialize_cgraph (void)
{
  -   tree decl;
  struct cgraph_node *node; 
  -   unsigned i;
  timevar_id_t lto_timer;

  if (!quiet_flag)
  --- 3007,3013 
  *** materialize_cgraph (void)
  *** 3043,3052 
  current_function_decl = NULL;
  set_cfun (NULL);

  -   /* Inform the middle end about the global variables we have seen.  */
  -   FOR_EACH_VEC_ELT (*lto_global_var_decls, i, decl)
  - rest_of_decl_compilation (decl, 1, 0);
  - 
  if (!quiet_flag)
fprintf (stderr, \n);

  --- 3039,3044 
  *** lto_main (void)
  *** 3309,3316 
  do_whole_program_analysis ();
  else
  {
  - varpool_node *vnode;
  - 
timevar_start (TV_PHASE_OPT_GEN);

materialize_cgraph ();
  --- 3301,3306 
  *** lto_main (void)
  *** 3330,3339 
   this.  */
if (flag_lto_report || (flag_wpa  flag_lto_report_wpa))
  print_lto_report_1 ();
  - 
  - /* Record the global variables.  */
  - FOR_EACH_DEFINED_VARIABLE (vnode)
  -   vec_safe_push (lto_global_var_decls, vnode-decl);
  }
}

  --- 3320,3325 
  Index: gcc/lto/lto.h
  ===
  *** gcc/lto/lto.h   (revision 207497)
  --- 

Re: PR ipa/59831 (ipa-cp devirt issues)

2014-02-05 Thread Jan Hubicka
  
 
 I think it would be better, yes.  IIRC, We want to re-organize
 indirect_info anyway (I vaguely remember we want to split the
 overloaded offset field into two but forgot the exact reason why but I
 have it written somewhere), I suppose we'll be turning it into a union
 (or class hierarchy?) and this would make it slightly awkward.  If we
 ever support the pointer-by-reference scenario, quite a few more
 places will need to be adjusted anyway.

OK, I will remove the check. I wanted to split polymorphic call context
and its propagation from aggregate analysis.  For example

struct A
{
  struct B b;
  struct C c;
}

when you call method of A.c.foo() its context is not
{outer_type=a,offset=offsetof(c)}, since A is not polymorphic type at all. We
should instead use {outer_type=c,offset=0,not_derived_type}.  In
ipa-devirt there is function get_class_context that gets you from first to
second.

So as briefly discussed last july, I think ipa-prop can simply do two
propagations in parallel where the type one is done via functionality that
willbe exported from ipa-devirt, since the logic should be shared with
a local devirt pass.

There are also cases where the parameter is KNOWN_TYPE but also PASS_THROUGH at
the same type (as tings passed by invisible refernece, or when the function is
constructor and it initialized dynamic type of THIS pointer).

Those are cases where current code is losing information because one is not
supperset of the other.  We want to use KNOWN_TYPE information for 
devirtualization,
but we do not want to forget about PASS_THROUGH for normal constant propagation 
in
case constructor is only called on decl.

Honza
 
 Thanks,
 
 Martin


Re: [RS6000] SDmode and reload

2014-02-05 Thread David Edelsohn
On Wed, Feb 5, 2014 at 8:27 AM, Alan Modra amo...@gmail.com wrote:

 Vlad added rs6000_secondary_memory_needed_mode for lra, and in so
 doing broke reload's use of cfun-machine-sdmode_stack_slot in
 rs6000_secondary_memory_needed_rtx (mode was no longer SDmode there).
 Bootstrapped and regression tested powerpc64-linux, both with and
 without -mlra.  OK to commit?

 PR target/60032
 * config/rs6000/rs6000.c (rs6000_secondary_memory_needed_mode): Only
 change SDmode to DDmode when lra_in_progress.

 * gcc.target/powerpc/pr60032.c: New.

So Vlad added the definition to the rs6000 port, but it only is needed
by LRA and does not interact correctly with classic reload.

Okay.

- David


Re: var-tracking and s390's LM(G)

2014-02-05 Thread Richard Sandiford
Jakub Jelinek ja...@redhat.com writes:
 On Tue, Feb 04, 2014 at 12:21:14PM +, Richard Sandiford wrote:
 Jakub Jelinek ja...@redhat.com writes:
  But then we wouldn't be able to use var-tracking when __builtin_eh_return
  is used, since in that case replacing the (set (reg ...) (mem ...))
  with a (plus ...) would be incorrect -- the value we're loading from the
  stack will have had a variable adjustment applied.  And I know from 
  painful
  experience that being able to debug the unwind code is very useful. :-)
 
  Aren't functions using EH_RETURN typically using frame pointer?
 
 Sorry, forgot to respond to this bit.  On s390, _Unwind_RaiseException and
 _Unwind_ForcedUnwind don't use a frame pointer, at least not on trunk.
 %r11 is used as a general scratch register instead.  E.g.:
 
 2ba8 _Unwind_ForcedUnwind:
 2ba8:   90 6f f0 18 stm %r6,%r15,24(%r15)
 2bac:   0d d0   basr%r13,%r0
 2bae:   60 40 f0 50 std %f4,80(%r15)
 2bb2:   60 60 f0 58 std %f6,88(%r15)
 2bb6:   a7 fa fd f8 ahi %r15,-520
 2bba:   58 c0 d0 9e l   %r12,158(%r13)
 2bbe:   58 10 d0 9a l   %r1,154(%r13)
 2bc2:   18 b2   lr  %r11,%r2
 ...
 2c10:   98 6f f2 20 lm  %r6,%r15,544(%r15)
 2c14:   07 f4   br  %r4

 Looking at pr54693-2.c -O2 -g -m64 -march=z196 (was that the one you meant)
 with your patches, I see:
 (insn/f:TI 122 30 31 4 (parallel [
 (set/f (mem:DI (plus:DI (reg/f:DI 15 %r15)
 (const_int 80 [0x50])) [3  S8 A8])
 (reg:DI 10 %r10))
 (set/f (mem:DI (plus:DI (reg/f:DI 15 %r15)
 (const_int 88 [0x58])) [3  S8 A8])
 (reg:DI 11 %r11))
 (set/f (mem:DI (plus:DI (reg/f:DI 15 %r15)
 (const_int 96 [0x60])) [3  S8 A8])
 (reg:DI 12 %r12))
 (set/f (mem:DI (plus:DI (reg/f:DI 15 %r15)
 (const_int 104 [0x68])) [3  S8 A8])
 (reg:DI 13 %r13))
 (set/f (mem:DI (plus:DI (reg/f:DI 15 %r15)
 (const_int 112 [0x70])) [3  S8 A8])
 (reg:DI 14 %r14))
 (set/f (mem:DI (plus:DI (reg/f:DI 15 %r15)
 (const_int 120 [0x78])) [3  S8 A8])
 (reg/f:DI 15 %r15))
 ]) pr54693-2.c:16 94 {*store_multiple_di}
  (expr_list:REG_DEAD (reg:DI 14 %r14)
 (expr_list:REG_DEAD (reg:DI 13 %r13)
 (expr_list:REG_DEAD (reg:DI 12 %r12)
 (expr_list:REG_DEAD (reg:DI 11 %r11)
 (expr_list:REG_DEAD (reg:DI 10 %r10)
 (nil)))
 ...
 (insn/f 123 31 124 4 (set (reg/f:DI 15 %r15)
 (plus:DI (reg/f:DI 15 %r15)
 (const_int -160 [0xff60]))) pr54693-2.c:16 65 {*la_64}
  (expr_list:REG_FRAME_RELATED_EXPR (set (reg/f:DI 15 %r15)
 (plus:DI (reg/f:DI 15 %r15)
 (const_int -160 [0xff60])))
 (nil)))
 ...
 (insn/f:TI 135 134 136 7 (parallel [
 (set (reg:DI 10 %r10)
 (mem:DI (plus:DI (reg/f:DI 15 %r15)
 (const_int 240 [0xf0])) [3  S8 A8]))
 (set (reg:DI 11 %r11)
 (mem:DI (plus:DI (reg/f:DI 15 %r15)
 (const_int 248 [0xf8])) [3  S8 A8]))
 (set (reg:DI 12 %r12)
 (mem:DI (plus:DI (reg/f:DI 15 %r15)
 (const_int 256 [0x100])) [3  S8 A8]))
 (set (reg:DI 13 %r13)
 (mem:DI (plus:DI (reg/f:DI 15 %r15)
 (const_int 264 [0x108])) [3  S8 A8]))
 (set (reg:DI 14 %r14)
 (mem:DI (plus:DI (reg/f:DI 15 %r15)
 (const_int 272 [0x110])) [3  S8 A8]))
 (set (reg/f:DI 15 %r15)
 (mem:DI (plus:DI (reg/f:DI 15 %r15)
 (const_int 280 [0x118])) [3  S8 A8]))
 ]) pr54693-2.c:25 92 {*load_multiple_di}
  (expr_list:REG_CFA_DEF_CFA (plus:DI (reg/f:DI 15 %r15)
 (const_int 160 [0xa0]))
 (expr_list:REG_CFA_RESTORE (reg/f:DI 15 %r15)
 (expr_list:REG_CFA_RESTORE (reg:DI 14 %r14)
 (expr_list:REG_CFA_RESTORE (reg:DI 13 %r13)
 (expr_list:REG_CFA_RESTORE (reg:DI 12 %r12)
 (expr_list:REG_CFA_RESTORE (reg:DI 11 %r11)
 (expr_list:REG_CFA_RESTORE (reg:DI 10 %r10)
 (nil)
 In a function that doesn't need frame pointer, I'd say this is a serious
 bloat of the unwind info, you are saving/restoring %r15 not because you have
 to, but just that it seems to be cheaper for the target.  So, I'd say you
 shouldn't emit DW_CFA_offset 15, 5 on the insn 122 

Re: Fix bootstrap with -mno-accumulate-outgoing-args

2014-02-05 Thread Richard Henderson
On 02/04/2014 09:35 AM, Jan Hubicka wrote:
 On 02/04/2014 08:48 AM, Jan Hubicka wrote:
 How things are supposed to work in this case? So perhaps we need scheduler 
 to
 understand and move around the ARGS_SIZE note?

 I believe we do need to have the scheduler move the notes around.
 
 If we need notes on non-stack adjusting insns as you seem to show in your 
 testcase,
 I guess it is the only way around.  Still combine stack adjust may be smart 
 enough
 to not produce redundant note in the case of go's ICE.
 
 I am not terribly familiar with the code, will you look into it or shall I 
 give it a try?

I had a brief look at find_modifiable_mems, which seems to be the only kind of
schedule-time dependency breaking that we do.  I started writing some new code
that would handle REG_ARGS_SIZE, but then considered that wasn't terribly
appropriate for stage4.

So I've left off with just the dependency links between the insns.  We'd need
these for the schedule-time adjustments anyway, and with them in place the
scheduler does not re-order the insns in a way that causes problems.

Testing for x86_64 has finished; i686 is nearly done.  I suppose the only
question I have is if anyone disagrees that OUTPUT is incorrect as the
dependency type.


r~



Re: Fix bootstrap with -mno-accumulate-outgoing-args

2014-02-05 Thread Richard Henderson
This time with the patch...

On 02/05/2014 07:40 AM, Richard Henderson wrote:
 On 02/04/2014 09:35 AM, Jan Hubicka wrote:
 On 02/04/2014 08:48 AM, Jan Hubicka wrote:
 How things are supposed to work in this case? So perhaps we need scheduler 
 to
 understand and move around the ARGS_SIZE note?

 I believe we do need to have the scheduler move the notes around.

 If we need notes on non-stack adjusting insns as you seem to show in your 
 testcase,
 I guess it is the only way around.  Still combine stack adjust may be smart 
 enough
 to not produce redundant note in the case of go's ICE.

 I am not terribly familiar with the code, will you look into it or shall I 
 give it a try?
 
 I had a brief look at find_modifiable_mems, which seems to be the only kind of
 schedule-time dependency breaking that we do.  I started writing some new code
 that would handle REG_ARGS_SIZE, but then considered that wasn't terribly
 appropriate for stage4.
 
 So I've left off with just the dependency links between the insns.  We'd need
 these for the schedule-time adjustments anyway, and with them in place the
 scheduler does not re-order the insns in a way that causes problems.
 
 Testing for x86_64 has finished; i686 is nearly done.  I suppose the only
 question I have is if anyone disagrees that OUTPUT is incorrect as the
 dependency type.
 
 
 r~
 

* combine-stack-adj.c: Revert r206943.
* sched-int.h (struct deps_desc): Add last_args_size.
* sched-deps.c (init_deps): Initialize it.
(sched_analyze_insn): Add OUTPUT dependencies between insns that
contain REG_ARGS_SIZE notes.


diff --git a/gcc/combine-stack-adj.c b/gcc/combine-stack-adj.c
index c591c60..69fd5ea 100644
--- a/gcc/combine-stack-adj.c
+++ b/gcc/combine-stack-adj.c
@@ -567,7 +567,6 @@ combine_stack_adjustments_for_block (basic_block bb)
   try_apply_stack_adjustment (insn, reflist, 0,
 -last_sp_adjust))
{
- rtx note;
  if (last2_sp_set)
maybe_move_args_size_note (last2_sp_set, last_sp_set, false);
  else
@@ -577,11 +576,6 @@ combine_stack_adjustments_for_block (basic_block bb)
  reflist = NULL;
  last_sp_set = NULL_RTX;
  last_sp_adjust = 0;
- /* We no longer adjust stack size.  Whoever adjusted it earlier
-hopefully got the note right.  */
- note = find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX);
- if (note)
-   remove_note (insn, note);
  continue;
}
}
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index 7efc937..efc4223 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -3470,6 +3470,15 @@ sched_analyze_insn (struct deps_desc *deps, rtx x, rtx 
insn)
 change_spec_dep_to_hard (sd_it);
 }
 }
+
+  /* We do not yet have code to adjust REG_ARGS_SIZE, therefore we must
+ honor their original ordering.  */
+  if (find_reg_note (insn, REG_ARGS_SIZE, NULL))
+{
+  if (deps-last_args_size)
+   add_dependence (insn, deps-last_args_size, REG_DEP_OUTPUT);
+  deps-last_args_size = insn;
+}
 }
 
 /* Return TRUE if INSN might not always return normally (e.g. call exit,
@@ -3876,6 +3885,7 @@ init_deps (struct deps_desc *deps, bool lazy_reg_last)
   deps-sched_before_next_jump = 0;
   deps-in_post_call_group_p = not_post_call;
   deps-last_debug_insn = 0;
+  deps-last_args_size = 0;
   deps-last_reg_pending_barrier = NOT_A_BARRIER;
   deps-readonly = 0;
 }
diff --git a/gcc/sched-int.h b/gcc/sched-int.h
index 3b1106f..2cec624 100644
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@@ -539,6 +539,9 @@ struct deps_desc
   /* The last debug insn we've seen.  */
   rtx last_debug_insn;
 
+  /* The last insn bearing REG_ARGS_SIZE that we've seen.  */
+  rtx last_args_size;
+
   /* The maximum register number for the following arrays.  Before reload
  this is max_reg_num; after reload it is FIRST_PSEUDO_REGISTER.  */
   int max_reg;


Re: Fix bootstrap with -mno-accumulate-outgoing-args

2014-02-05 Thread Jan Hubicka
  I had a brief look at find_modifiable_mems, which seems to be the only kind 
  of
  schedule-time dependency breaking that we do.  I started writing some new 
  code
  that would handle REG_ARGS_SIZE, but then considered that wasn't terribly
  appropriate for stage4.
  
  So I've left off with just the dependency links between the insns.  We'd 
  need
  these for the schedule-time adjustments anyway, and with them in place the
  scheduler does not re-order the insns in a way that causes problems.
  
  Testing for x86_64 has finished; i686 is nearly done.  I suppose the only
  question I have is if anyone disagrees that OUTPUT is incorrect as the
  dependency type.

From i386  perspective, this seems like resonable compromise for now.
I did some re-testing of scheduler recently for generic.
Call sequences is one of few places where scheduling matters (in addition to
tight loops), but I do not think we will lose measurable % of perofmrance by
disabling scheduling here.

Thanks!
Honza
  
  
  r~
  
 

   * combine-stack-adj.c: Revert r206943.
   * sched-int.h (struct deps_desc): Add last_args_size.
   * sched-deps.c (init_deps): Initialize it.
   (sched_analyze_insn): Add OUTPUT dependencies between insns that
   contain REG_ARGS_SIZE notes.
 
 
 diff --git a/gcc/combine-stack-adj.c b/gcc/combine-stack-adj.c
 index c591c60..69fd5ea 100644
 --- a/gcc/combine-stack-adj.c
 +++ b/gcc/combine-stack-adj.c
 @@ -567,7 +567,6 @@ combine_stack_adjustments_for_block (basic_block bb)
  try_apply_stack_adjustment (insn, reflist, 0,
-last_sp_adjust))
   {
 -   rtx note;
 if (last2_sp_set)
   maybe_move_args_size_note (last2_sp_set, last_sp_set, false);
 else
 @@ -577,11 +576,6 @@ combine_stack_adjustments_for_block (basic_block bb)
 reflist = NULL;
 last_sp_set = NULL_RTX;
 last_sp_adjust = 0;
 -   /* We no longer adjust stack size.  Whoever adjusted it earlier
 -  hopefully got the note right.  */
 -   note = find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX);
 -   if (note)
 - remove_note (insn, note);
 continue;
   }
   }
 diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
 index 7efc937..efc4223 100644
 --- a/gcc/sched-deps.c
 +++ b/gcc/sched-deps.c
 @@ -3470,6 +3470,15 @@ sched_analyze_insn (struct deps_desc *deps, rtx x, rtx 
 insn)
  change_spec_dep_to_hard (sd_it);
  }
  }
 +
 +  /* We do not yet have code to adjust REG_ARGS_SIZE, therefore we must
 + honor their original ordering.  */
 +  if (find_reg_note (insn, REG_ARGS_SIZE, NULL))
 +{
 +  if (deps-last_args_size)
 + add_dependence (insn, deps-last_args_size, REG_DEP_OUTPUT);
 +  deps-last_args_size = insn;
 +}
  }
  
  /* Return TRUE if INSN might not always return normally (e.g. call exit,
 @@ -3876,6 +3885,7 @@ init_deps (struct deps_desc *deps, bool lazy_reg_last)
deps-sched_before_next_jump = 0;
deps-in_post_call_group_p = not_post_call;
deps-last_debug_insn = 0;
 +  deps-last_args_size = 0;
deps-last_reg_pending_barrier = NOT_A_BARRIER;
deps-readonly = 0;
  }
 diff --git a/gcc/sched-int.h b/gcc/sched-int.h
 index 3b1106f..2cec624 100644
 --- a/gcc/sched-int.h
 +++ b/gcc/sched-int.h
 @@ -539,6 +539,9 @@ struct deps_desc
/* The last debug insn we've seen.  */
rtx last_debug_insn;
  
 +  /* The last insn bearing REG_ARGS_SIZE that we've seen.  */
 +  rtx last_args_size;
 +
/* The maximum register number for the following arrays.  Before reload
   this is max_reg_num; after reload it is FIRST_PSEUDO_REGISTER.  */
int max_reg;



Re: [PATCH] Remove lto_global_var_decls, simplify necessary analysis for PR60060

2014-02-05 Thread Jan Hubicka
  
  Seems OK to me.  I always had an impression that this machinery exists to
  make decls that are revmoed fro symbol table (optimized out) to land the
  debug info, but I think this is mostly broken by partitioning anyway, right?
  Do we stream them at all?
 
 No, we don't stream those.  We'd need to stream what the FEs think
 are the globals (lang_hooks.getdecls()) and somehow partition them
 and ship them somewhere.

My old idea was to simply output debug for dead stuff from WPA and add it asone
of object files to linking later.
But indeed, this is for later time.

Honza


Re: var-tracking and s390's LM(G)

2014-02-05 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 03:32:22PM +, Richard Sandiford wrote:
  In a function that doesn't need frame pointer, I'd say this is a serious
  bloat of the unwind info, you are saving/restoring %r15 not because you have
  to, but just that it seems to be cheaper for the target.  So, I'd say you
  shouldn't emit DW_CFA_offset 15, 5 on the insn 122 in the prologue and
  DW_CFA_restore 15 in the epilogue (twice in this case), that just waste
  of .eh_frame/.debug_frame space.  I'd say you should represent in
  this case the *store_multiple_di as REG_FRAME_RELATED_EXPR
  with all but the last set in the PARALLEL, and on the *load_multiple_di
  remove REG_CFA_RESTORE (r15) note and change REG_CFA_DEF_CFA note to
  REG_CFA_ADJUST_CFA note which would say that stack pointer has been
  incremented by 160 bytes (epilogue expansion knows that value).
 
 I think at the moment we're relying on the DW_CFA_offset 15s to
 provide correct %r15 values on eh_return.  Every non-leaf function
 saves the stack pointer and the unwind routines track the %r15 save
 slots like they would track any call-saved register.  In other words,
 the final eh_return %r15 comes directly from a save slot, without any
 CFA-specific handling.  If some frames omit the CFI for the %r15 save
 then we'll just keep the save slot from the previous (inner) frame instead.

CCing Richard Henderson into the discussion.

 The new unwind routines would handle 4.9+ and pre-4.9 frames, but pre-4.9
 unwind routines wouldn't handle 4.9+ frames, so would that require a version
 bump on the unwind symbols?

No need to version anything IMHO.  We've never done that for any port
where we've started to emit something the old unwinder wouldn't cope with,
it is user's responsibility to use new enough libgcc_s.

BTW, what is the reason why %r15 is saved/restored from the stack at all say
for simple:
void
foo (void)
{
  int a = 6;
  asm volatile (# %0 : +m (a));
}

Is that some ABI matter where it effectively always uses requires to use
some kind of frame pointer (which the saved previous stack pointer is)?
It can surely make debugging without unwind info easier, ditto backtraces,
on the other side it must have a runtime cost.

Jakub


[testsuite] Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86

2014-02-05 Thread Rainer Orth
gcc.target/i386/avx512f-vrndscaless-2.c currently FAILs on Solaris 9/x86
with gas:

FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors)
Excess errors:
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.
c:21:14: warning: incompatible implicit declaration of built-in function 'floorf
' [enabled by default]
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.
c:24:14: warning: incompatible implicit declaration of built-in function 'ceilf'
 [enabled by default]

The platform lacks C99 support, but this can easily be avoided by using
the builtins instead.  The following patch does just that; tested
with the appropriate runtest invocation on i386-pc-solaris2.9 and
x86_64-unknown-linux-gnu.

Ok for mainline?

Rainer


2014-02-05  Rainer Orth  r...@cebitec.uni-bielefeld.de

* gcc.target/i386/avx512f-vrndscaless-2.c (compute_rndscaless):
Use __builtin_floorf, __builtin_ceilf.

# HG changeset patch
# Parent c8a18ca98263f4a2ca4e3e723f9d2b4596b67207
Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86

diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c
--- a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c
@@ -18,10 +18,10 @@ compute_rndscaless (float *s1, float *s2
   switch (rc)
 {
 case _MM_FROUND_FLOOR:
-  r[0] = floorf (s2[0] * pow (2, m)) / pow (2, m);
+  r[0] = __builtin_floorf (s2[0] * pow (2, m)) / pow (2, m);
   break;
 case _MM_FROUND_CEIL:
-  r[0] = ceilf (s2[0] * pow (2, m)) / pow (2, m);
+  r[0] = __builtin_ceilf (s2[0] * pow (2, m)) / pow (2, m);
   break;
 default:
   abort ();

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [testsuite] Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86

2014-02-05 Thread Uros Bizjak
On Wed, Feb 5, 2014 at 4:50 PM, Rainer Orth r...@cebitec.uni-bielefeld.de 
wrote:
 gcc.target/i386/avx512f-vrndscaless-2.c currently FAILs on Solaris 9/x86
 with gas:

 FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors)
 Excess errors:
 /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.
 c:21:14: warning: incompatible implicit declaration of built-in function 
 'floorf
 ' [enabled by default]
 /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.
 c:24:14: warning: incompatible implicit declaration of built-in function 
 'ceilf'
  [enabled by default]

 The platform lacks C99 support, but this can easily be avoided by using
 the builtins instead.  The following patch does just that; tested
 with the appropriate runtest invocation on i386-pc-solaris2.9 and
 x86_64-unknown-linux-gnu.

Let's solve this in the way sse4_1-floorf-vec.c solves it and simply add

extern float floorf (float);

after math.h include.

Does this work for you?

Uros.


Re: [testsuite] Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86

2014-02-05 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 05:09:31PM +0100, Uros Bizjak wrote:
 On Wed, Feb 5, 2014 at 4:50 PM, Rainer Orth r...@cebitec.uni-bielefeld.de 
 wrote:
  gcc.target/i386/avx512f-vrndscaless-2.c currently FAILs on Solaris 9/x86
  with gas:
 
  FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors)
  Excess errors:
  /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.
  c:21:14: warning: incompatible implicit declaration of built-in function 
  'floorf
  ' [enabled by default]
  /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.
  c:24:14: warning: incompatible implicit declaration of built-in function 
  'ceilf'
   [enabled by default]
 
  The platform lacks C99 support, but this can easily be avoided by using
  the builtins instead.  The following patch does just that; tested
  with the appropriate runtest invocation on i386-pc-solaris2.9 and
  x86_64-unknown-linux-gnu.
 
 Let's solve this in the way sse4_1-floorf-vec.c solves it and simply add
 
 extern float floorf (float);

Won't that break if math.h defines floorf as a macro?
You'd at least need then
extern float (floorf) (float);

Jakub


Re: [testsuite] Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86

2014-02-05 Thread Uros Bizjak
On Wed, Feb 5, 2014 at 5:11 PM, Jakub Jelinek ja...@redhat.com wrote:

  gcc.target/i386/avx512f-vrndscaless-2.c currently FAILs on Solaris 9/x86
  with gas:
 
  FAIL: gcc.target/i386/avx512f-vrndscaless-2.c (test for excess errors)
  Excess errors:
  /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.
  c:21:14: warning: incompatible implicit declaration of built-in function 
  'floorf
  ' [enabled by default]
  /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.
  c:24:14: warning: incompatible implicit declaration of built-in function 
  'ceilf'
   [enabled by default]
 
  The platform lacks C99 support, but this can easily be avoided by using
  the builtins instead.  The following patch does just that; tested
  with the appropriate runtest invocation on i386-pc-solaris2.9 and
  x86_64-unknown-linux-gnu.

 Let's solve this in the way sse4_1-floorf-vec.c solves it and simply add

 extern float floorf (float);

 Won't that break if math.h defines floorf as a macro?
 You'd at least need then
 extern float (floorf) (float);

Looks like using builtins should be the correct way, so the original
patch is OK.

Rainer, can you please add the same cure to sse4_1-floor*.vec tests?

Thanks,
Uros.


Re: [C++ Patch/RFC] PR 60047

2014-02-05 Thread Paolo Carlini

Hi,

On 02/04/2014 12:51 PM, Paolo Carlini wrote:

Hi,

thus I tried to have a look to this issue, while experiencing some 
weird problems with the debugger, which slowed me down a lot. I have 
been able to figure out that we don't seem to actively set constexpr_p 
to true anywhere in implicitly_declare_fn (besides the unrelated case 
of inheriting constructors), thus I'm wondering if it would make sense 
to simply do the below?!? (and revert my recent tweak)
I spent a little more time on this. Outside the case of inheriting 
constructors and the special case of expected_trivial default 
constructors (which are handled in a special conditional), we are 
normally assigning true to constexpr_p only in one place, that is close 
to the beginning of synthesized_method_walk:


  /* If that user-written default constructor would satisfy the
 requirements of a constexpr constructor (7.1.5), the
 implicitly-defined default constructor is constexpr.  */
  if (constexpr_p)
*constexpr_p = ctor_p;

then, for the testcase at issue it becomes false in the place which we 
already discussed for the ICE:


  if (vec_safe_is_empty (vbases))
/* No virtual bases to worry about.  */;
  else if (!assign_p)
{
  if (constexpr_p)
*constexpr_p = false;

Then it doesn't gets a chance to return true. Thus, given the more 
accurate analysis, I think that we either do what I quickly proposed 
yesterday, or alternately, the only option I can see, we change 
process_subob_fn to set it to true in some cases (but I note that it 
does never set to true trivial_p).


Thanks,
Paolo.


Re: [testsuite] Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86

2014-02-05 Thread Rainer Orth
Hi Uros,

 Let's solve this in the way sse4_1-floorf-vec.c solves it and simply add

 extern float floorf (float);

 after math.h include.

 Does this work for you?

it does, but only because the floorf call is optimized away at -O2.
While the Solaris 9 libm contains __floorf, there's no floorf, neither
in libm nor in headers.  The builtin route avoids those problems.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [testsuite] Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86

2014-02-05 Thread Rainer Orth
Uros Bizjak ubiz...@gmail.com writes:

 Looks like using builtins should be the correct way, so the original
 patch is OK.

 Rainer, can you please add the same cure to sse4_1-floor*.vec tests?

Sure, will do.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [testsuite] Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86

2014-02-05 Thread Rainer Orth
Rainer Orth r...@cebitec.uni-bielefeld.de writes:

 Uros Bizjak ubiz...@gmail.com writes:

 Looks like using builtins should be the correct way, so the original
 patch is OK.

 Rainer, can you please add the same cure to sse4_1-floor*.vec tests?

 Sure, will do.

Here's what I committed after retesting on i386-pc-solaris2.9,
i386-pc-solaris2.11 and x86_64-unknown-linux-gnu.

Rainer


2014-02-05  Rainer Orth  r...@cebitec.uni-bielefeld.de

* gcc.target/i386/avx512f-vrndscaless-2.c (compute_rndscaless):
Use __builtin_floorf, __builtin_ceilf.
* gcc.target/i386/sse4_1-floorf-sfix-vec.c (floorf): Remove
declaration.
(TEST): Use __builtin_floorf.
* gcc.target/i386/sse4_1-floorf-vec.c: Likewise.

# HG changeset patch
# Parent c8a18ca98263f4a2ca4e3e723f9d2b4596b67207
Fix gcc.target/i386/avx512f-vrndscaless-2.c on Solaris 9/x86

diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c
--- a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c
@@ -18,10 +18,10 @@ compute_rndscaless (float *s1, float *s2
   switch (rc)
 {
 case _MM_FROUND_FLOOR:
-  r[0] = floorf (s2[0] * pow (2, m)) / pow (2, m);
+  r[0] = __builtin_floorf (s2[0] * pow (2, m)) / pow (2, m);
   break;
 case _MM_FROUND_CEIL:
-  r[0] = ceilf (s2[0] * pow (2, m)) / pow (2, m);
+  r[0] = __builtin_ceilf (s2[0] * pow (2, m)) / pow (2, m);
   break;
 default:
   abort ();
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-floorf-sfix-vec.c b/gcc/testsuite/gcc.target/i386/sse4_1-floorf-sfix-vec.c
--- a/gcc/testsuite/gcc.target/i386/sse4_1-floorf-sfix-vec.c
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-floorf-sfix-vec.c
@@ -15,8 +15,6 @@
 
 #include math.h
 
-extern float floorf (float);
-
 #define NUM 64
 
 static void
@@ -53,10 +51,10 @@ TEST (void)
   init_src (a);
 
   for (i = 0; i  NUM; i++)
-r[i] = (int) floorf (a[i]);
+r[i] = (int) __builtin_floorf (a[i]);
 
   /* check results:  */
   for (i = 0; i  NUM; i++)
-if (r[i] != (int) floorf (a[i]))
+if (r[i] != (int) __builtin_floorf (a[i]))
   abort();
 }
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-floorf-vec.c b/gcc/testsuite/gcc.target/i386/sse4_1-floorf-vec.c
--- a/gcc/testsuite/gcc.target/i386/sse4_1-floorf-vec.c
+++ b/gcc/testsuite/gcc.target/i386/sse4_1-floorf-vec.c
@@ -15,8 +15,6 @@
 
 #include math.h
 
-extern float floorf (float);
-
 #define NUM 64
 
 static void
@@ -53,10 +51,10 @@ TEST (void)
   init_src (a);
 
   for (i = 0; i  NUM; i++)
-r[i] = floorf (a[i]);
+r[i] = __builtin_floorf (a[i]);
 
   /* check results:  */
   for (i = 0; i  NUM; i++)
-if (r[i] != floorf (a[i]))
+if (r[i] != __builtin_floorf (a[i]))
   abort();
 }

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: regression test issue

2014-02-05 Thread Paolo Carlini

Hi,

On 02/05/2014 06:29 AM, Iyer, Balaji V wrote:

Hello Everyone,
The following two Cilk Plus tests is timing out at -O1 in my x86_64 box 
(-O2, -O3  and -O0 works fine). These tests were working fine till revision 
r207047. Can someone please look at this? It looks like a middle-end/back-end 
issue.

WARNING: program timed out.
FAIL: g++.dg/cilk-plus/CK/catch_exc.cc  -O1 -fcilkplus execution test
WARNING: program timed out.
FAIL: c-c++-common/cilk-plus/CK/spawner_inline.c  -O1 execution test
Thanks. I'm wondering if we could open an high priority Bugzilla about 
that and for the time being avoid running the test at -O1, it is slowing 
down the testsuite and using a lot of cpu. Also, maybe you could figure 
out which specific change caused the regression and ping the appropriate 
people...


Thanks again,
Paolo.


RE: regression test issue

2014-02-05 Thread Iyer, Balaji V


 -Original Message-
 From: Paolo Carlini [mailto:paolo.carl...@oracle.com]
 Sent: Wednesday, February 5, 2014 11:53 AM
 To: Iyer, Balaji V; gcc-patches@gcc.gnu.org
 Subject: Re: regression test issue
 
 Hi,
 
 On 02/05/2014 06:29 AM, Iyer, Balaji V wrote:
  Hello Everyone,
  The following two Cilk Plus tests is timing out at -O1 in my x86_64 box
 (-O2, -O3  and -O0 works fine). These tests were working fine till revision
 r207047. Can someone please look at this? It looks like a middle-end/back-
 end issue.
 
  WARNING: program timed out.
  FAIL: g++.dg/cilk-plus/CK/catch_exc.cc  -O1 -fcilkplus execution test
  WARNING: program timed out.
  FAIL: c-c++-common/cilk-plus/CK/spawner_inline.c  -O1 execution test
 Thanks. I'm wondering if we could open an high priority Bugzilla about that
 and for the time being avoid running the test at -O1, it is slowing down the
 testsuite and using a lot of cpu. Also, maybe you could figure out which
 specific change caused the regression and ping the appropriate people...

Hi Paolo,   
OK. Here is a patch that will do so:

Index: gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp
===
--- gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp(revision 207437)
+++ gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp(working copy)
@@ -64,12 +64,12 @@
 dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]]  -O3 
-ftree-vectorize -fcilkplus -g  

 if { [check_libcilkrts_available] } {
-dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  
-O1 -fcilkplus  
+#dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  
-O1 -fcilkplus  
 dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  
-O3 -fcilkplus  
 dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  
-g -fcilkplus  
 dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  
-g -O2 -fcilkplus  

-dg-runtest [lsort [glob -nocomplain 
$srcdir/c-c++-common/cilk-plus/CK/*.c]]  -O1  
+#dg-runtest [lsort [glob -nocomplain 
$srcdir/c-c++-common/cilk-plus/CK/*.c]]  -O1  
 dg-runtest [lsort [glob -nocomplain 
$srcdir/c-c++-common/cilk-plus/CK/*.c]]  -O3  
 dg-runtest [lsort [glob -nocomplain 
$srcdir/c-c++-common/cilk-plus/CK/*.c]]  -g  
 dg-runtest [lsort [glob -nocomplain 
$srcdir/c-c++-common/cilk-plus/CK/*.c]]  -g -O2  
===

Here is the ChangeLog entry:

2014-02-05  Balaji V. Iyer  balaji.v.i...@intel.com

* g++.dg/cilk-plus/cilk-plus.exp: Commented out -O1 Cilk Keywords
test for the time-being.


Note: I am purposely commenting the line out and not deleting them because I 
want to get them to run in -O1 some time, and deleting them might cause us 
(atleast me :-). ) to forget about it.

I will go ahead and file a bugzilla report.

Thanks,

Balaji V. Iyer.


 
 Thanks again,
 Paolo.


Re: var-tracking and s390's LM(G)

2014-02-05 Thread Richard Henderson
On 02/05/2014 07:58 AM, Jakub Jelinek wrote:
 On Wed, Feb 05, 2014 at 03:32:22PM +, Richard Sandiford wrote:
 I think at the moment we're relying on the DW_CFA_offset 15s to
 provide correct %r15 values on eh_return.  Every non-leaf function
 saves the stack pointer and the unwind routines track the %r15 save
 slots like they would track any call-saved register.  In other words,
 the final eh_return %r15 comes directly from a save slot, without any
 CFA-specific handling.  If some frames omit the CFI for the %r15 save
 then we'll just keep the save slot from the previous (inner) frame instead.
 
 CCing Richard Henderson into the discussion.

If the sp is saved for a given frame, we'll use that instead of the CFA when
unwinding to a previous frame.  That should be unaffected by whether or not sp
is saved for any particular frame.

There appears to be code in uw_install_context_1 to handle a mix of sp-saving
and non-sp-saving frames.  Given that s390 is pretty much the only target that
uses these paths, I suppose it could be broken.  But if it is, it would surely
be better to fix than just say it doesn't work.

 BTW, what is the reason why %r15 is saved/restored from the stack at all say
 for simple:
 void
 foo (void)
 {
   int a = 6;
   asm volatile (# %0 : +m (a));
 }
 
 Is that some ABI matter where it effectively always uses requires to use
 some kind of frame pointer (which the saved previous stack pointer is)?
 It can surely make debugging without unwind info easier, ditto backtraces,
 on the other side it must have a runtime cost.

As far as I recall, the return address is in r14 so the ABI also saves r15 for
all non-leaf functions simply because it is practically free to do so with
store-multiple.  This creates the familiar ra/fp unwinding chain.  Given how
cheap it is to maintain this chain on s390, I see no particular call to break 
it.

As for why r15 gets saved for this leaf example... I have no idea.
Seems like a bug to me, frankly.


r~


RE: [PATCH RFC] MIPS add support for MIPS SIMD ARCHITECTURE V1.07

2014-02-05 Thread Graham Stott
Hi Richard,

Attached is an updated patch for  feedback so  MSA support to MIPS backend can 
be added when open again for next stage1.

It is unfinished in that some comments from your review of the initial patch 
have yet to be addressed.

The diff is against svn 207500 

Graham




msa.svn.207500.diffs.gz
Description: msa.svn.207500.diffs.gz


Re: RFA: MN10300: Include saved registers in stack usage

2014-02-05 Thread Joseph S. Myers
On Wed, 5 Feb 2014, Jeff Law wrote:

 However, it's pretty easy to avoid the headaches and just provide a popcount
 routine.   I don't think this code is at all performance critical (once per
 function being compiled), so a simple popcount should be sufficient.  No need
 for lookup tables IMHO.

hwint.[ch] provide popcount_hwi, if HOST_WIDE_INT is a suitable type.

(Personally I think there are lots of bits of standard integer 
manipulation functionality, such as popcount, that are common in processor 
instructions and compiler intrinsics, but could do with having bindings in 
ISO C to provide a standard way to access that functionality.  Such 
bindings might take the form of library functions / macros that compilers 
/ library headers can then optimize as appropriate; that seems most in the 
ISO C style rather than __builtin_*.  Even outside of ISO C, I suspect 
there's quite a bit in common between target-specific builtins for 
different targets where a target-independent builtin would be better - and 
for that matter, target-specific builtins where some form of standard 
binding already exists but isn't supported in GCC, or where the 
target-specific builtin duplicates something generic in GNU C.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFA: MN10300: Include saved registers in stack usage

2014-02-05 Thread Jeff Law

On 02/05/14 03:44, Nick Clifton wrote:

Hi Jeff, Hi Alex,

   Sorry - another MN10300 patch - this time for the stack size reported
   with -fstack-usage.  Currently the value includes the stack frame, but
   it does not take into account any registers that might have been
   pushed onto the stack.

   The patch below fixes this problem, but there is one thing that I am
   not sure about - is it OK to use __builtin_popcount() or should I be
   calling some other function ?
According to our coding conventions, the ability to build with something 
other than gcc is still desirable.  You could argue that you're unlikely 
to be bootstrapping on a mn103 with something other than GCC and if 
you're building a cross, you could start by first building gcc native.


However, it's pretty easy to avoid the headaches and just provide a 
popcount routine.   I don't think this code is at all performance 
critical (once per function being compiled), so a simple popcount should 
be sufficient.  No need for lookup tables IMHO.


jeff



RE: regression test issue

2014-02-05 Thread Iyer, Balaji V
Sorry, I forgot to put [PATCH] in the subject line. Is the patch below OK to 
install?

Thanks,

Balaji V. Iyer.

 -Original Message-
 From: Iyer, Balaji V
 Sent: Wednesday, February 5, 2014 12:02 PM
 To: 'Paolo Carlini'; gcc-patches@gcc.gnu.org
 Subject: RE: regression test issue
 
 
 
  -Original Message-
  From: Paolo Carlini [mailto:paolo.carl...@oracle.com]
  Sent: Wednesday, February 5, 2014 11:53 AM
  To: Iyer, Balaji V; gcc-patches@gcc.gnu.org
  Subject: Re: regression test issue
 
  Hi,
 
  On 02/05/2014 06:29 AM, Iyer, Balaji V wrote:
   Hello Everyone,
 The following two Cilk Plus tests is timing out at -O1 in my x86_64
   box
  (-O2, -O3  and -O0 works fine). These tests were working fine till
  revision r207047. Can someone please look at this? It looks like a
  middle-end/back- end issue.
  
   WARNING: program timed out.
   FAIL: g++.dg/cilk-plus/CK/catch_exc.cc  -O1 -fcilkplus execution
   test
   WARNING: program timed out.
   FAIL: c-c++-common/cilk-plus/CK/spawner_inline.c  -O1 execution test
  Thanks. I'm wondering if we could open an high priority Bugzilla about
  that and for the time being avoid running the test at -O1, it is
  slowing down the testsuite and using a lot of cpu. Also, maybe you
  could figure out which specific change caused the regression and ping the
 appropriate people...
 
 Hi Paolo,
   OK. Here is a patch that will do so:
 
 Index: gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp
 ==
 =
 --- gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp(revision 207437)
 +++ gcc/testsuite/g++.dg/cilk-plus/cilk-plus.exp(working copy)
 @@ -64,12 +64,12 @@
  dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/AN/*.cc]]  -O3 
 -
 ftree-vectorize -fcilkplus -g  
 
  if { [check_libcilkrts_available] } {
 -dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  
 -
 O1 -fcilkplus  
 +#dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]] 
  -
 O1 -fcilkplus  
  dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  
 -O3
 -fcilkplus  
  dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  
 -g -
 fcilkplus  
  dg-runtest [lsort [glob -nocomplain $srcdir/g++.dg/cilk-plus/CK/*.cc]]  
 -g -
 O2 -fcilkplus  
 
 -dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-
 plus/CK/*.c]]  -O1  
 +#dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-
 plus/CK/*.c]]  -O1  
  dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-
 plus/CK/*.c]]  -O3  
  dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-
 plus/CK/*.c]]  -g  
  dg-runtest [lsort [glob -nocomplain $srcdir/c-c++-common/cilk-
 plus/CK/*.c]]  -g -O2  
 ==
 =
 
 Here is the ChangeLog entry:
 
 2014-02-05  Balaji V. Iyer  balaji.v.i...@intel.com
 
 * g++.dg/cilk-plus/cilk-plus.exp: Commented out -O1 Cilk Keywords
 test for the time-being.
 
 
 Note: I am purposely commenting the line out and not deleting them
 because I want to get them to run in -O1 some time, and deleting them
 might cause us (atleast me :-). ) to forget about it.
 
 I will go ahead and file a bugzilla report.
 
 Thanks,
 
 Balaji V. Iyer.
 
 
 
  Thanks again,
  Paolo.


Re: regression test issue

2014-02-05 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 07:08:32PM +, Iyer, Balaji V wrote:
 Sorry, I forgot to put [PATCH] in the subject line. Is the patch below OK to 
 install?

No.  If you want to skip a particular test, you should use dg-skip-if for
that, see http://gcc.gnu.org/onlinedocs/gccint/Directives.html

Jakub


RE: regression test issue

2014-02-05 Thread Iyer, Balaji V


 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Wednesday, February 5, 2014 2:25 PM
 To: Iyer, Balaji V
 Cc: Paolo Carlini; gcc-patches@gcc.gnu.org
 Subject: Re: regression test issue
 
 On Wed, Feb 05, 2014 at 07:08:32PM +, Iyer, Balaji V wrote:
  Sorry, I forgot to put [PATCH] in the subject line. Is the patch below OK to
 install?
 
 No.  If you want to skip a particular test, you should use dg-skip-if for 
 that,
 see http://gcc.gnu.org/onlinedocs/gccint/Directives.html

Ok. Here is the fixed patch:

Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 207437)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2014-02-05  Balaji V. Iyer  balaji.v.i...@intel.com
+
+   * g++.dg/cilk-plus/CK/catch_exc.cc: Disable test for -O1
+   * c-c++-common/cilk-plus/CK/spawner_inline.c: Likewise.
+
 2014-02-03  Marc Glisse  marc.gli...@inria.fr

PR c++/53017
Index: gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc
===
--- gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc  (revision 207437)
+++ gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc  (working copy)
@@ -1,6 +1,7 @@
 /* { dg-options -fcilkplus } */
 /* { dg-do run { target i?86-*-* x86_64-*-* arm*-*-* } } */
 /* { dg-options -fcilkplus -lcilkrts { target { i?86-*-* x86_64-*-* arm*-*-* 
} } } */
+/* { dg-skip-if  { *-*-* } { -O1 } {  } } */

 #include assert.h
 #include unistd.h
Index: gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c
===
--- gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(revision 
207437)
+++ gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(working copy)
@@ -1,6 +1,7 @@
 /* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-options -fcilkplus } */
 /* { dg-additional-options -lcilkrts { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-skip-if  { *-*-* } { -O1 } {  } } */

OK to install?

Thanks,
Balaji V. Iyer.

 
   Jakub


Re: regression test issue

2014-02-05 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 07:38:11PM +, Iyer, Balaji V wrote:
 +2014-02-05  Balaji V. Iyer  balaji.v.i...@intel.com
 +
 +   * g++.dg/cilk-plus/CK/catch_exc.cc: Disable test for -O1
 +   * c-c++-common/cilk-plus/CK/spawner_inline.c: Likewise.

Ok.

 --- gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc  (revision 207437)
 +++ gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc  (working copy)
 @@ -1,6 +1,7 @@
  /* { dg-options -fcilkplus } */
  /* { dg-do run { target i?86-*-* x86_64-*-* arm*-*-* } } */
  /* { dg-options -fcilkplus -lcilkrts { target { i?86-*-* x86_64-*-* 
 arm*-*-* } } } */
 +/* { dg-skip-if  { *-*-* } { -O1 } {  } } */
 
  #include assert.h
  #include unistd.h
 Index: gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c
 ===
 --- gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(revision 
 207437)
 +++ gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(working copy)
 @@ -1,6 +1,7 @@
  /* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
  /* { dg-options -fcilkplus } */
  /* { dg-additional-options -lcilkrts { target { i?86-*-* x86_64-*-* } } } 
 */
 +/* { dg-skip-if  { *-*-* } { -O1 } {  } } */
 
 OK to install?

Jakub


[PATCH] Fix ix86_function_regparm with optimize attribute (PR target/60062, take 2)

2014-02-05 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 02:20:05PM +0100, Richard Biener wrote:
  Using !!optimize to determine if we should switch local ABI to regparm
  convention isn't compatible with optimize attribute, as !!optimize is
  whether the current function is being optimized, but for the ABI decisions
  we actually need the caller and callee to agree on the calling convention.
  
  Fixed by looking at callee's optimize settings all the time.
  
  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
 
 Seeing a 2nd variant of such code I'd rather have an abstraction
 for this ... optimize_fn ()?  opt_for_fn (fn, OPT_...)?

Here is an updated version of the patch, unfortunately this one regressed in
quite a lot of tests.  The problem is that by using opt_for_fn there
it also sets the failure reason for all functions at -O0 that don't have
the optimize attribute at all.  Now copy_forbidden is used by the inliner,
where we have to handle always_inline functions, other -O0 functions we likely
don't want to inline, be it because the cmd line options are -O0 or because
the function we are considering for inlining has optimize (0), and
for versioning, where I'd say we want to never version the -O0 functions.

So, where do we want to do that instead?  E.g. should it be e.g. in
tree_versionable_function_p directly and let the inliner (if it doesn't do
already) also treat optimize(0) functions that aren't always_inline as
noinline?

I guess even without this patch my PR60026 fix broke inlining of
__attribute__((always_inline, optimize (0))) functions.

2014-02-05  Jakub Jelinek  ja...@redhat.com

PR target/60062
* tree.h (opts_for_fn): New inline function.
(opt_for_fn): Define.
* config/i386/i386.c (ix86_function_regparm): Use
opt_for_fn (decl, optimize) instead of optimize.
* tree-inline.c (copy_forbidden): Use opt_for_fn.

* gcc.c-torture/execute/pr60062.c: New test.
* gcc.c-torture/execute/pr60072.c: New test.

--- gcc/tree.h.jj   2014-01-20 19:16:29.0 +0100
+++ gcc/tree.h  2014-02-05 15:54:04.681904492 +0100
@@ -4470,6 +4470,20 @@ may_be_aliased (const_tree var)
  || TREE_ADDRESSABLE (var)));
 }
 
+/* Return pointer to optimization flags of FNDECL.  */
+static inline struct cl_optimization *
+opts_for_fn (const_tree fndecl)
+{
+  tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl);
+  if (fn_opts == NULL_TREE)
+fn_opts = optimization_default_node;
+  return TREE_OPTIMIZATION (fn_opts);
+}
+
+/* opt flag for function FNDECL, e.g. opts_for_fn (fndecl, optimize) is
+   the optimization level of function fndecl.  */
+#define opt_for_fn(fndecl, opt) (opts_for_fn (fndecl)-x_##opt)
+
 /* For anonymous aggregate types, we need some sort of name to
hold on to.  In practice, this should not appear, but it should
not be harmful if it does.  */
--- gcc/config/i386/i386.c.jj   2014-02-05 10:37:36.166167116 +0100
+++ gcc/config/i386/i386.c  2014-02-05 15:56:36.779146394 +0100
@@ -5608,7 +5608,12 @@ ix86_function_regparm (const_tree type,
   /* Use register calling convention for local functions when possible.  */
   if (decl
TREE_CODE (decl) == FUNCTION_DECL
-   optimize
+  /* Caller and callee must agree on the calling convention, so
+checking here just optimize means that with
+__attribute__((optimize (...))) caller could use regparm convention
+and callee not, or vice versa.  Instead look at whether the callee
+is optimized or not.  */
+   opt_for_fn (decl, optimize)
!(profile_flag  !flag_fentry))
 {
   /* FIXME: remove this CONST_CAST when cgraph.[ch] is constified.  */
--- gcc/tree-inline.c.jj2014-02-04 14:03:31.0 +0100
+++ gcc/tree-inline.c   2014-02-05 15:55:34.747448968 +0100
@@ -3315,16 +3315,10 @@ copy_forbidden (struct function *fun, tr
goto fail;
   }
 
-  tree fs_opts;
-  fs_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fun-decl);
-  if (fs_opts)
+  if (!opt_for_fn (fun-decl, optimize))
 {
-  struct cl_optimization *os = TREE_OPTIMIZATION (fs_opts);
-  if (!os-x_optimize)
-   {
- reason = G_(function %q+F compiled without optimizations);
- goto fail;
-   }
+  reason = G_(function %q+F compiled without optimizations);
+  goto fail;
 }
 
  fail:
--- gcc/testsuite/gcc.c-torture/execute/pr60062.c.jj2014-02-05 
15:55:48.639388697 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr60062.c   2014-02-05 
15:55:48.639388697 +0100
@@ -0,0 +1,25 @@
+/* PR target/60062 */
+
+int a;
+
+static void
+foo (const char *p1, int p2)
+{
+  if (__builtin_strcmp (p1, hello) != 0)
+__builtin_abort ();
+}
+
+static void
+bar (const char *p1)
+{
+  if (__builtin_strcmp (p1, hello) != 0)
+__builtin_abort ();
+}
+
+__attribute__((optimize (0))) int
+main ()
+{
+  foo (hello, a);
+  bar (hello);
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr60072.c.jj2014-02-05 

RE: regression test issue

2014-02-05 Thread Iyer, Balaji V


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
 Sent: Wednesday, February 5, 2014 2:43 PM
 To: Iyer, Balaji V
 Cc: Paolo Carlini; gcc-patches@gcc.gnu.org
 Subject: Re: regression test issue
 
 On Wed, Feb 05, 2014 at 07:38:11PM +, Iyer, Balaji V wrote:
  +2014-02-05  Balaji V. Iyer  balaji.v.i...@intel.com
  +
  +   * g++.dg/cilk-plus/CK/catch_exc.cc: Disable test for -O1
  +   * c-c++-common/cilk-plus/CK/spawner_inline.c: Likewise.
 
 Ok.
 

Committed.

-Balaji V. Iyer.


  --- gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc  (revision 207437)
  +++ gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc  (working copy)
  @@ -1,6 +1,7 @@
   /* { dg-options -fcilkplus } */
   /* { dg-do run { target i?86-*-* x86_64-*-* arm*-*-* } } */
   /* { dg-options -fcilkplus -lcilkrts { target { i?86-*-* x86_64-*-* 
  arm*-*-* }
 } } */
  +/* { dg-skip-if  { *-*-* } { -O1 } {  } } */
 
   #include assert.h
   #include unistd.h
  Index: gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c
 
 ==
 =
  --- gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(revision
 207437)
  +++ gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(working
 copy)
  @@ -1,6 +1,7 @@
   /* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
   /* { dg-options -fcilkplus } */
   /* { dg-additional-options -lcilkrts { target { i?86-*-* x86_64-*-* } } 
  } */
  +/* { dg-skip-if  { *-*-* } { -O1 } {  } } */
 
  OK to install?
 
   Jakub


Re: var-tracking and s390's LM(G)

2014-02-05 Thread Richard Sandiford
Richard Henderson r...@redhat.com writes:
 On 02/05/2014 07:58 AM, Jakub Jelinek wrote:
 On Wed, Feb 05, 2014 at 03:32:22PM +, Richard Sandiford wrote:
 I think at the moment we're relying on the DW_CFA_offset 15s to
 provide correct %r15 values on eh_return.  Every non-leaf function
 saves the stack pointer and the unwind routines track the %r15 save
 slots like they would track any call-saved register.  In other words,
 the final eh_return %r15 comes directly from a save slot, without any
 CFA-specific handling.  If some frames omit the CFI for the %r15 save
 then we'll just keep the save slot from the previous (inner) frame instead.
 
 CCing Richard Henderson into the discussion.

 If the sp is saved for a given frame, we'll use that instead of the CFA when
 unwinding to a previous frame.  That should be unaffected by whether or not sp
 is saved for any particular frame.

 There appears to be code in uw_install_context_1 to handle a mix of sp-saving
 and non-sp-saving frames.  Given that s390 is pretty much the only target that
 uses these paths, I suppose it could be broken.  But if it is, it would surely
 be better to fix than just say it doesn't work.

OK.  When I was looking at it earlier today, the reason it didn't work
seemed to be that if %r15 wasn't saved for a particular frame, the context
would inherit the %r15 save slot for the previous frame, just like when
some functions use and save a particular call-saved register and some don't.
So we would only get to uw_install_context_1 without a %r15 slot if none
of the frames had one.

In other words, the reason seemed to be that the _Unwind_SetGRPtr in:

#ifdef EH_RETURN_STACKADJ_RTX
  /* Special handling here: Many machines do not use a frame pointer,
 and track the CFA only through offsets from the stack pointer from
 one frame to the next.  In this case, the stack pointer is never
 stored, so it has no saved address in the context.  What we do
 have is the CFA from the previous stack frame.

 In very special situations (such as unwind info for signal return),
 there may be location expressions that use the stack pointer as well.

 Do this conditionally for one frame.  This allows the unwind info
 for one frame to save a copy of the stack pointer from the previous
 frame, and be able to use much easier CFA mechanisms to do it.
 Always zap the saved stack pointer value for the next frame; carrying
 the value over from one frame to another doesn't make sense.  */

  _Unwind_SpTmp tmp_sp;

  if (!_Unwind_GetGRPtr (orig_context, __builtin_dwarf_sp_column ()))
_Unwind_SetSpColumn (orig_context, context-cfa, tmp_sp);
  _Unwind_SetGRPtr (context, __builtin_dwarf_sp_column (), NULL);
#endif

was conditional on EH_RETURN_STACKADJ_RTX being defined.  It looked at
first glance like things would work if that call was moved outside,
so that we never inherit save slots for the stack pointer.  Does that
make sense?  I can give it a go tomorrow if so.

Thanks,
Richard


[PATCH] Remove unreachable return stmt (PR c/53123)

2014-02-05 Thread Marek Polacek
This is not a regression, on the other hand it's so obvious that
it could go in nevertheless.  Ok?

2014-02-05  Marek Polacek  pola...@redhat.com

PR c/53123
c-family/
* c-omp.c (c_finish_omp_atomic): Remove unreachable return
statement.

diff --git gcc/c-family/c-omp.c gcc/c-family/c-omp.c
index 4ce51e4..dd0a45d 100644
--- gcc/c-family/c-omp.c
+++ gcc/c-family/c-omp.c
@@ -183,7 +183,6 @@ c_finish_omp_atomic (location_t loc, enum tree_code code,
   OMP_ATOMIC_SEQ_CST (x) = seq_cst;
   return build_modify_expr (loc, v, NULL_TREE, NOP_EXPR,
loc, x, NULL_TREE);
-  return x;
 }
 
   /* There are lots of warnings, errors, and conversions that need to happen

Marek


Re: [PATCH] Remove unreachable return stmt (PR c/53123)

2014-02-05 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 09:58:32PM +0100, Marek Polacek wrote:
 This is not a regression, on the other hand it's so obvious that
 it could go in nevertheless.  Ok?
 
 2014-02-05  Marek Polacek  pola...@redhat.com
 
   PR c/53123
 c-family/
   * c-omp.c (c_finish_omp_atomic): Remove unreachable return
   statement.

Ok, thanks.
 --- gcc/c-family/c-omp.c
 +++ gcc/c-family/c-omp.c
 @@ -183,7 +183,6 @@ c_finish_omp_atomic (location_t loc, enum tree_code code,
OMP_ATOMIC_SEQ_CST (x) = seq_cst;
return build_modify_expr (loc, v, NULL_TREE, NOP_EXPR,
   loc, x, NULL_TREE);
 -  return x;
  }
  
/* There are lots of warnings, errors, and conversions that need to happen

Jakub


Re: regression test issue

2014-02-05 Thread Rainer Orth
Iyer, Balaji V balaji.v.i...@intel.com writes:

 Index: gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc
 ===
 --- gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc  (revision 207437)
 +++ gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc  (working copy)
 @@ -1,6 +1,7 @@
  /* { dg-options -fcilkplus } */
  /* { dg-do run { target i?86-*-* x86_64-*-* arm*-*-* } } */
  /* { dg-options -fcilkplus -lcilkrts { target { i?86-*-* x86_64-*-* 
 arm*-*-* } } } */
 +/* { dg-skip-if  { *-*-* } { -O1 } {  } } */

  #include assert.h
  #include unistd.h
 Index: gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c
 ===
 --- gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(revision 
 207437)
 +++ gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(working copy)
 @@ -1,6 +1,7 @@
  /* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
  /* { dg-options -fcilkplus } */
  /* { dg-additional-options -lcilkrts { target { i?86-*-* x86_64-*-* } } } 
 */
 +/* { dg-skip-if  { *-*-* } { -O1 } {  } } */

You should put a comment (e.g. a PR reference) in the comment field to
explain why you're skipping this test.  Most likely, the last arg to
dg-skip-if ({  }) is optional; if so, please omit it.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: var-tracking and s390's LM(G)

2014-02-05 Thread Ulrich Weigand
Richard Sandiford wrote:

 In other words, the reason seemed to be that the _Unwind_SetGRPtr in:
 
 #ifdef EH_RETURN_STACKADJ_RTX
   /* Special handling here: Many machines do not use a frame pointer,
  and track the CFA only through offsets from the stack pointer from
  one frame to the next.  In this case, the stack pointer is never
  stored, so it has no saved address in the context.  What we do
  have is the CFA from the previous stack frame.
 
  In very special situations (such as unwind info for signal return),
  there may be location expressions that use the stack pointer as well.
 
  Do this conditionally for one frame.  This allows the unwind info
  for one frame to save a copy of the stack pointer from the previous
  frame, and be able to use much easier CFA mechanisms to do it.
  Always zap the saved stack pointer value for the next frame; carrying
  the value over from one frame to another doesn't make sense.  */
 
   _Unwind_SpTmp tmp_sp;
 
   if (!_Unwind_GetGRPtr (orig_context, __builtin_dwarf_sp_column ()))
 _Unwind_SetSpColumn (orig_context, context-cfa, tmp_sp);
   _Unwind_SetGRPtr (context, __builtin_dwarf_sp_column (), NULL);
 #endif
 
 was conditional on EH_RETURN_STACKADJ_RTX being defined.  It looked at
 first glance like things would work if that call was moved outside,
 so that we never inherit save slots for the stack pointer.  Does that
 make sense?  I can give it a go tomorrow if so.

I haven't looked into this in detail right now, but I recall that I had
to specifically add that #ifdef a long time ago since unwinding wouldn't
work correctly on s390 otherwise:
http://gcc.gnu.org/ml/gcc-patches/2003-05/msg00904.html

And the background of the bug here:
http://gcc.gnu.org/ml/gcc/2003-05/msg00536.html

Actually, now I think the problem originally described there is still
valid: on s390 the CFA is *not* equal to the value at function entry,
but biased by 96/160 bytes.  So setting the SP to the CFA is wrong ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [C++ Patch/RFC] PR 60047

2014-02-05 Thread Jason Merrill

On 02/05/2014 11:19 AM, Paolo Carlini wrote:

   if (vec_safe_is_empty (vbases))
 /* No virtual bases to worry about.  */;
   else if (!assign_p)
 {
   if (constexpr_p)
 *constexpr_p = false;


*constexpr_p should be false for a constructor of a class with virtual 
bases, according to the standard (7.1.5p4):


The definition of a constexpr constructor shall satisfy the following 
constraints:

— the class shall not have any virtual base classes;
...

So the assert in implicit_declare_fn is wrong.  I guess I would fix it 
by checking CLASSTYPE_VBASECLASSES.


Jason



Re: var-tracking and s390's LM(G)

2014-02-05 Thread Ulrich Weigand
I wrote and forgot to proof-read:

 Actually, now I think the problem originally described there is still
 valid: on s390 the CFA is *not* equal to the value at function entry,

... CFA is not equal to the *SP* value at function entry ...

 but biased by 96/160 bytes.  So setting the SP to the CFA is wrong ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



RE: regression test issue

2014-02-05 Thread Iyer, Balaji V


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Rainer Orth
 Sent: Wednesday, February 5, 2014 4:22 PM
 To: Iyer, Balaji V
 Cc: Jakub Jelinek; Paolo Carlini; gcc-patches@gcc.gnu.org
 Subject: Re: regression test issue
 
 Iyer, Balaji V balaji.v.i...@intel.com writes:
 
  Index: gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc
 
 ==
 =
  --- gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc  (revision 207437)
  +++ gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc  (working copy)
  @@ -1,6 +1,7 @@
   /* { dg-options -fcilkplus } */
   /* { dg-do run { target i?86-*-* x86_64-*-* arm*-*-* } } */
   /* { dg-options -fcilkplus -lcilkrts { target { i?86-*-* x86_64-*-*
  arm*-*-* } } } */
  +/* { dg-skip-if  { *-*-* } { -O1 } {  } } */
 
   #include assert.h
   #include unistd.h
  Index: gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c
 
 ==
 =
  --- gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(revision
 207437)
  +++ gcc/testsuite/c-c++-common/cilk-plus/CK/spawner_inline.c(working
 copy)
  @@ -1,6 +1,7 @@
   /* { dg-do run  { target { i?86-*-* x86_64-*-* } } } */
   /* { dg-options -fcilkplus } */
   /* { dg-additional-options -lcilkrts { target { i?86-*-* x86_64-*-*
  } } } */
  +/* { dg-skip-if  { *-*-* } { -O1 } {  } } */
 
 You should put a comment (e.g. a PR reference) in the comment field to
 explain why you're skipping this test.  Most likely, the last arg to 
 dg-skip-if ({
  }) is optional; if so, please omit it.

There isn't a PR reference for this. I can add a comment in the testcase. Would 
that be enough?

I will also remove the last {} from the tests.

Thanks,

Balaji V. Iyer.
 
 Thanks.
 Rainer
 
 --
 -
 Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [i386] Replace builtins with vector extensions

2014-02-05 Thread Marc Glisse

Hello,

I was wondering if the new #pragma target in *mmintrin.h make this 
approach more acceptable for 4.10?


http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00374.html

On Sun, 7 Apr 2013, Marc Glisse wrote:


Hello,

the attached patch is very incomplete (it passes bootstrap+testsuite on 
x86_64-linux-gnu), but it raises a number of questions that I'd like to 
settle before continuing.


* Is there any chance of a patch in this direction being accepted?

* May I remove the builtins (from i386.c and the doc) when they become 
unused?


* Do we want to keep the casts even when they don't seem strictly necessary? 
For instance for _mm_add_ps, we can write:

return __A + __B;
or:
return (__m128) ((__v4sf)__A + (__v4sf)__B);
Note that for _mm_add_epi8 for instance we do need the casts.

* For integer operations like _mm_add_epi16 I should probably use the 
unsigned typedefs to make it clear overflow is well defined? (the patch still 
has the signed version)


* Any better name than __v4su for the unsigned version of __v4si?

* Other comments?


2013-04-07  Marc Glisse  marc.gli...@inria.fr

* emmintrin.h (__v2du, __v4su, __v8hu): New typedefs.
(_mm_add_pd, _mm_sub_pd, _mm_mul_pd, _mm_div_pd,
_mm_cmpeq_pd, _mm_cmplt_pd, _mm_cmple_pd, _mm_cmpgt_pd, _mm_cmpge_pd,
_mm_cmpneq_pd, _mm_add_epi8, _mm_add_epi16, _mm_add_epi32,
_mm_add_epi64, _mm_slli_epi16, _mm_slli_epi32, _mm_slli_epi64,
_mm_srai_epi16, _mm_srai_epi32, _mm_srli_epi16, _mm_srli_epi32,
_mm_srli_epi64): Replace builtins with vector extensions.
* xmmintrin.h (_mm_add_ps, _mm_sub_ps, _mm_mul_ps, _mm_div_ps,
_mm_cmpeq_ps, _mm_cmplt_ps, _mm_cmple_ps, _mm_cmpgt_ps, _mm_cmpge_ps,
_mm_cmpneq_ps): Likewise.


--
Marc Glisse


Re: regression test issue

2014-02-05 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 09:48:12PM +, Iyer, Balaji V wrote:
  You should put a comment (e.g. a PR reference) in the comment field to
  explain why you're skipping this test.  Most likely, the last arg to 
  dg-skip-if ({
   }) is optional; if so, please omit it.
 
 There isn't a PR reference for this. I can add a comment in the testcase. 
 Would that be enough?
 
 I will also remove the last {} from the tests.

Please keep it as is, I have an untested fix for the issue already and will
be testing it soon, so I'd prefer not to have to back out too many changes.

Jakub


Re: RFA: MN10300: Include saved registers in stack usage

2014-02-05 Thread Jeff Law

On 02/05/14 11:26, Joseph S. Myers wrote:

On Wed, 5 Feb 2014, Jeff Law wrote:


However, it's pretty easy to avoid the headaches and just provide a popcount
routine.   I don't think this code is at all performance critical (once per
function being compiled), so a simple popcount should be sufficient.  No need
for lookup tables IMHO.


hwint.[ch] provide popcount_hwi, if HOST_WIDE_INT is a suitable type.

Even better.



(Personally I think there are lots of bits of standard integer
manipulation functionality, such as popcount, that are common in processor
instructions and compiler intrinsics, but could do with having bindings in
ISO C to provide a standard way to access that functionality.

Agreed.


Jeff


Re: var-tracking and s390's LM(G)

2014-02-05 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 10:26:16PM +0100, Ulrich Weigand wrote:
 Actually, now I think the problem originally described there is still
 valid: on s390 the CFA is *not* equal to the value at function entry,
 but biased by 96/160 bytes.  So setting the SP to the CFA is wrong ...

Such biasing happens on most of the targets, e.g. x86_64 has upon function
entry CFA set to %rsp + 8, i?86 to %esp + 4.

Jakub


Re: Is testing libgomp outside of the build tree supported?

2014-02-05 Thread Matthias Klose
Am 04.02.2014 03:14, schrieb Mike Stump:
 On Feb 3, 2014, at 3:52 PM, Joseph S. Myers jos...@codesourcery.com wrote:
 On Mon, 3 Feb 2014, Andrew Pinski wrote:
 On Mon, Feb 3, 2014 at 11:14 AM, Diego Novillo dnovi...@google.com wrote:
 On Mon, Feb 3, 2014 at 2:11 PM, Ian Lance Taylor i...@google.com wrote:

 If the presence of the build
 tree makes writing some tests significantly simpler, I think that is
 OK.

 I would like to discourage that.  Testing an already installed GCC for
 which no build tree exists is a very useful feature.

 I agree.  Here at Cavium, we use the packaged up toolchain that comes
 from a RPM and test it so we are testing exactly what we ship out to
 our customers.

 Similarly at Mentor.
 
 And the maintainer of the test suite thinks that supporting the people that 
 ship gcc to large numbers of people who help ensure the quality of gcc is 
 useful.  :-)  It is nice to hear from people that this type of testing is 
 useful; thanks.

could somebody please shed some light on how this is done?  It's nice that
everybody has this kind of testing, but the only bit in the gcc sources itself
seems to be a bit bit-rot and incomplete (contrib/test_installed).

thanks, Matthias



Re: var-tracking and s390's LM(G)

2014-02-05 Thread Ulrich Weigand
Jakub Jelinek wrote:
 On Wed, Feb 05, 2014 at 10:26:16PM +0100, Ulrich Weigand wrote:
  Actually, now I think the problem originally described there is still
  valid: on s390 the CFA is *not* equal to the value at function entry,
  but biased by 96/160 bytes.  So setting the SP to the CFA is wrong ...
 
 Such biasing happens on most of the targets, e.g. x86_64 has upon function
 entry CFA set to %rsp + 8, i?86 to %esp + 4.

Ah, but that's a different bias.  In those cases it is still correct
to *unwind* SP to the CFA, since that's the value SP needs to have
back in the caller immediately after the call site.

On s390, the call/return instructions do not modify the SP so the
SP at function entry is equal to the SP in the caller after return,
but neither of these is equal to the CFA.  Instead, SP points to
96/160 bytes below the CFA ...   Therefore you cannot simply
unwind SP to the CFA.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] Make pr59597 test PIC-friendly

2014-02-05 Thread Jeff Law

On 02/05/14 05:32, Ian Bolton wrote:


2014-02-05  Ian Bolton  ian.bol...@arm.com

testsuite/
 * gcc.dg/tree-ssa/pr59597.c: Make called function static so that
 expected outcome works for PIC variants too.

This is fine for stage4.  Please install.

Thanks,
Jeff



Re: Is testing libgomp outside of the build tree supported?

2014-02-05 Thread Jeff Law

On 02/05/14 15:10, Matthias Klose wrote:

Am 04.02.2014 03:14, schrieb Mike Stump:

On Feb 3, 2014, at 3:52 PM, Joseph S. Myers jos...@codesourcery.com wrote:

On Mon, 3 Feb 2014, Andrew Pinski wrote:

On Mon, Feb 3, 2014 at 11:14 AM, Diego Novillo dnovi...@google.com wrote:

On Mon, Feb 3, 2014 at 2:11 PM, Ian Lance Taylor i...@google.com wrote:


If the presence of the build
tree makes writing some tests significantly simpler, I think that is
OK.


I would like to discourage that.  Testing an already installed GCC for
which no build tree exists is a very useful feature.


I agree.  Here at Cavium, we use the packaged up toolchain that comes
from a RPM and test it so we are testing exactly what we ship out to
our customers.


Similarly at Mentor.


And the maintainer of the test suite thinks that supporting the people that 
ship gcc to large numbers of people who help ensure the quality of gcc is 
useful.  :-)  It is nice to hear from people that this type of testing is 
useful; thanks.


could somebody please shed some light on how this is done?  It's nice that
everybody has this kind of testing, but the only bit in the gcc sources itself
seems to be a bit bit-rot and incomplete (contrib/test_installed).
I suspect most folks have a site.exp they drop somewhere and explicitly 
call runtest --tool gcc 


I know we do it internally in our QE team, but I never asked them about 
the current mechanics of how it's done.


Jeff


[PATCH] C++ thunk section names

2014-02-05 Thread Sriraman Tallam
Hi,

  I would like this patch reviewed and considered for commit when
Stage 1 is active again.

Patch Description:

A C++ thunk's section name is set to be the same as the original function's
section name for which the thunk was created in order to place the two
together.  This is done in cp/method.c in function use_thunk.
However, with function reordering turned on, the original function's section
name can change to something like .text.hot.orginal or
.text.unlikely.original in function default_function_section in varasm.c
based on the node count of that function.  The thunk function's section name
is not updated to be the same as the original here and also is not always
correct to do it as the original function can be hotter than the thunk.

I have created a patch to not name the thunk function's section to be the same
as the original function when function reordering is enabled.

Thanks
Sri
A C++ thunk's section name is set to be the same as the original function's
section name for which the thunk was created in order to place the two
together.  This is done in cp/method.c in function use_thunk.
However, with function reordering turned on, the original function's section
name can change to something like .text.hot.orginal or 
.text.unlikely.original in function default_function_section in varasm.c
based on the node count of that function.  The thunk function's section name 
is not updated to be the same as the original here and also is not always
correct to do it as the original function can be hotter than the thunk.

I have created a patch to not name the thunk function's section to be the same
as the original function when function reordering is enabled. 


Index: testsuite/g++.dg/thunk_section_name.C
===
--- testsuite/g++.dg/thunk_section_name.C   (revision 0)
+++ testsuite/g++.dg/thunk_section_name.C   (revision 0)
@@ -0,0 +1,30 @@
+/* { dg-require-named-sections  } */
+/* { dg-do compile } */
+/* { dg-options -O2 -fno-reorder-blocks-and-partition -ffunction-sections } 
*/
+
+class base_class_1
+{
+public:
+  virtual void vfn () {}
+};
+
+class base_class_2
+{
+public:
+  virtual void vfn () {}
+};
+
+class need_thunk_class : public base_class_1, public base_class_2
+{
+public:
+  virtual void vfn () {} 
+};
+
+int main (int argc, char *argv[])
+{
+  base_class_1 *c = new need_thunk_class ();
+  c-vfn();
+  return 0;
+}
+
+/* { dg-final { scan-assembler \.text\._ZThn8_N16need_thunk_class3vfnEv } } 
*/
Index: cp/method.c
===
--- cp/method.c (revision 207517)
+++ cp/method.c (working copy)
@@ -362,8 +362,10 @@ use_thunk (tree thunk_fndecl, bool emit_p)
{
  resolve_unique_section (thunk_fndecl, 0, flag_function_sections);
 
- /* Output the thunk into the same section as function.  */
- DECL_SECTION_NAME (thunk_fndecl) = DECL_SECTION_NAME (function);
+ /* Output the thunk into the same section as function if function 
reordering
+is not switched on.  */
+ if (!flag_reorder_functions)
+   DECL_SECTION_NAME (thunk_fndecl) = DECL_SECTION_NAME (function);
}
 }
 


Re: [PATCH] Documentation for dump and optinfo output

2014-02-05 Thread Sharad Singhai
I am really sorry about the delay. I couldn't exactly reproduce the
warning which you described, but I found a place where two nodes were
out of order in optinfo.texi. Could you please apply the following
patch and see if the problem goes away? If it works for you, I will
commit the doc fixes.

Also I would appreciate the exact command which produces these warnings.

Thanks,
Sharad

2014-02-05  Sharad Singhai  sing...@google.com

* doc/optinfo.texi: Fix order of nodes.

Index: doc/optinfo.texi
===
--- doc/optinfo.texi (revision 207525)
+++ doc/optinfo.texi (working copy)
@@ -12,8 +12,8 @@ information about various compiler transformations
 @menu
 * Dump setup:: Setup of optimization dumps.
 * Optimization groups::Groups made up of optimization passes.
+* Dump files and streams:: Dump output file names and streams.
 * Dump output verbosity::  How much information to dump.
-* Dump files and streams:: Dump output file names and streams.
 * Dump types:: Various types of dump functions.
 * Dump examples::  Sample usage.
 @end menu

Thanks,
Sharad


On Fri, Jan 24, 2014 at 2:20 AM, Thomas Schwinge
tho...@codesourcery.com wrote:
 Hi!

 On Thu, 19 Dec 2013 01:44:08 -0800, Sharad Singhai sing...@google.com wrote:
 I ran 'make doc' and browsed the info output for basic sanity. Okay for 
 trunk?

   * Makefile.in: Add optinfo.texi.
   * doc/optinfo.texi: New documentation for optimization info.
   * doc/passes.texi: Add new node.

 Thanks for contributing to the documentation!

 During build, I'm seeing the following warnings; would be great if you
 could address these:

 ../../source/gcc/doc/optinfo.texi:45: warning: node next `Optimization 
 groups' in menu `Dump output verbosity' and in sectioning `Dump files and 
 streams' differ
 ../../source/gcc/doc/optinfo.texi:77: warning: node next `Dump files and 
 streams' in menu `Dump types' and in sectioning `Dump output verbosity' differ
 ../../source/gcc/doc/optinfo.texi:77: warning: node prev `Dump files and 
 streams' in menu `Dump output verbosity' and in sectioning `Optimization 
 groups' differ
 ../../source/gcc/doc/optinfo.texi:104: warning: node next `Dump output 
 verbosity' in menu `Dump files and streams' and in sectioning `Dump types' 
 differ
 ../../source/gcc/doc/optinfo.texi:104: warning: node prev `Dump output 
 verbosity' in menu `Optimization groups' and in sectioning `Dump files and 
 streams' differ
 ../../source/gcc/doc/optinfo.texi:137: warning: node prev `Dump types' in 
 menu `Dump files and streams' and in sectioning `Dump output verbosity' differ


 Grüße,
  Thomas


Re: Is testing libgomp outside of the build tree supported?

2014-02-05 Thread Joseph S. Myers
On Wed, 5 Feb 2014, Jeff Law wrote:

 I suspect most folks have a site.exp they drop somewhere and explicitly call
 runtest --tool gcc 

* Create site.exp (based on what GCC's makefiles do for build-tree 
testing).  Note that in some cases you may need different contents for 
different testsuite, especially runtime libraries.

* Set up PATH so the installed compilers are found, LD_LIBRARY_PATH so the 
installed shared libraries are found by the dynamic linker (or otherwise 
set up your board file appropriately for running newly compiled programs).

* runtest --srcdir /some/where --tool whatever

If you compare results of this for build-tree and installed testing, 
fixing differences is very useful.  (There are some known bugs for such 
differences; at least, bugs 23867, 25320, 58867.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: var-tracking and s390's LM(G)

2014-02-05 Thread Richard Henderson
On 02/05/2014 02:30 PM, Ulrich Weigand wrote:
 Jakub Jelinek wrote:
 On Wed, Feb 05, 2014 at 10:26:16PM +0100, Ulrich Weigand wrote:
 Actually, now I think the problem originally described there is still
 valid: on s390 the CFA is *not* equal to the value at function entry,
 but biased by 96/160 bytes.  So setting the SP to the CFA is wrong ...

 Such biasing happens on most of the targets, e.g. x86_64 has upon function
 entry CFA set to %rsp + 8, i?86 to %esp + 4.
 
 Ah, but that's a different bias.  In those cases it is still correct
 to *unwind* SP to the CFA, since that's the value SP needs to have
 back in the caller immediately after the call site.
 
 On s390, the call/return instructions do not modify the SP so the
 SP at function entry is equal to the SP in the caller after return,
 but neither of these is equal to the CFA.  Instead, SP points to
 96/160 bytes below the CFA ...   Therefore you cannot simply
 unwind SP to the CFA.

I was about to reply the same to Jakub, Ulrich beat me to it.

Basically, the CFA has been defined incorrectly for s390, but it hasn't
mattered since they record the save of the SP.  But the result is that you
can't stop recording the save of SP without also adjusting how the CFA is 
defined.

Which _can_ be done, in a way that's fully compatible with all of the existing
unwinding.  But certainly shouldn't be attempted at this stage of development.


r~



New Vietnamese PO file for 'cpplib' (version 4.9-b20140202)

2014-02-05 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Vietnamese team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/vi.po

(This file, 'cpplib-4.9-b20140202.vi.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.
coordina...@translationproject.org



Contents of PO file 'cpplib-4.9-b20140202.vi.po'

2014-02-05 Thread Translation Project Robot


cpplib-4.9-b20140202.vi.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.
coordina...@translationproject.org


New Vietnamese PO file for 'gcc' (version 4.9-b20140202)

2014-02-05 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Vietnamese team of translators.  The file is available at:

http://translationproject.org/latest/gcc/vi.po

(This file, 'gcc-4.9-b20140202.vi.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.
coordina...@translationproject.org



Re: [PATCH] One possible fix for the compute_bb_predicate oscillation (PR ipa/60013)

2014-02-05 Thread Jan Hubicka

Hi,
this is variant of patch I am testing (thanks Jakub for the hard analysis).
The dataflow was supposed to be monotonous by keeping the rule that oldvalue
imply new value at each step.  This is broken by the approximation done by
and/or operations.

Instead of dropping to true, it seems easier to simply or the result before
updating.  This operation should be no-op until we hit the clause limit.

Bootstrapping/regtesting x86_64-linux, will commit if it passes.

Honza

* ipa-inline-analysis.c (compute_bb_predicates): Ensure monotonicity
of the dataflow.

gcc.dg/pr60013.c: New testcase.
Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c   (revision 207514)
+++ ipa-inline-analysis.c   (working copy)
@@ -310,7 +310,7 @@ add_clause (conditions conditions, struc
   if (false_predicate_p (p))
 return;
 
-  /* No one should be sily enough to add false into nontrivial clauses.  */
+  /* No one should be silly enough to add false into nontrivial clauses.  */
   gcc_checking_assert (!(clause  (1  predicate_false_condition)));
 
   /* Look where to insert the clause.  At the same time prune out
@@ -1035,7 +1035,7 @@ inline_node_removal_hook (struct cgraph_
   memset (info, 0, sizeof (inline_summary_t));
 }
 
-/* Remap predicate P of former function to be predicate of duplicated functoin.
+/* Remap predicate P of former function to be predicate of duplicated function.
POSSIBLE_TRUTHS is clause of possible truths in the duplicated node,
INFO is inline summary of the duplicated node.  */
 
@@ -1887,8 +1887,15 @@ compute_bb_predicates (struct cgraph_nod
}
  else if (!predicates_equal_p (p, (struct predicate *) bb-aux))
{
- done = false;
- *((struct predicate *) bb-aux) = p;
+ /* This OR operation is needed to ensure monotonous data flow
+in the case we hit the limit on number of clauses and the
+and/or operations above give approximate answers.  */
+ p = or_predicates (summary-conds, p, (struct predicate 
*)bb-aux);
+ if (!predicates_equal_p (p, (struct predicate *) bb-aux))
+   {
+ done = false;
+ *((struct predicate *) bb-aux) = p;
+   }
}
}
}
Index: testsuite/gcc.dg/pr60013.c
===
--- testsuite/gcc.dg/pr60013.c  (revision 0)
+++ testsuite/gcc.dg/pr60013.c  (revision 0)
@@ -0,0 +1,47 @@
+/* PR ipa/60013 */
+/* { dg-do compile } */
+/* { dg-options -O2 } */
+
+typedef long int jmp_buf[64];
+extern int _setjmp (jmp_buf) __attribute__ ((__nothrow__));
+struct S { int a, b, c; };
+extern struct S *baz (struct S *);
+static jmp_buf j;
+
+static inline int
+bar (int b, int d)
+{
+  return (b  d)  0;
+}
+
+struct S *
+foo (int a, struct S *b, struct S *c, struct S *d)
+{
+  if (b-a == 0)
+{
+  switch (a)
+   {
+   case 8:
+ return baz (b);
+   case 7:
+ bar (b-c, c-b);
+ return 0;
+   case 6:
+   case 5:
+   case 4:
+ return baz (c);
+   case 3:
+   case 2:
+ return baz (d);
+   }
+  return 0;
+}
+  if (b-a == 1)
+{
+  if (baz (c))
+   return c;
+  else if (_setjmp (j))
+   baz (b);
+}
+  return 0;
+}


[PATCH] Fix ix86_function_regparm with optimize attribute (PR target/60062, take 3)

2014-02-05 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 08:42:27PM +0100, Jakub Jelinek wrote:
 So, where do we want to do that instead?  E.g. should it be e.g. in
 tree_versionable_function_p directly and let the inliner (if it doesn't do
 already) also treat optimize(0) functions that aren't always_inline as
 noinline?

So, another attempt to put the  opt_for_fn (fndecl, optimize) into
tree_versionable_function_p also failed, because e.g. for TM (but also for
SIMD clones) we need to clone -O0 functions.  So, can I fix PR600{6,7}2
without touching tree-inline.c and fix PR60026 again separately somehow else
as follow-up?  Bootstrapped/regtested on x86_64-linux and i686-linux.

2014-02-06  Jakub Jelinek  ja...@redhat.com

PR target/60062
* tree.h (opts_for_fn): New inline function.
(opt_for_fn): Define.
* config/i386/i386.c (ix86_function_regparm): Use
opt_for_fn (decl, optimize) instead of optimize.

* gcc.c-torture/execute/pr60062.c: New test.
* gcc.c-torture/execute/pr60072.c: New test.

--- gcc/tree.h.jj   2014-01-20 19:16:29.0 +0100
+++ gcc/tree.h  2014-02-05 15:54:04.681904492 +0100
@@ -4470,6 +4470,20 @@ may_be_aliased (const_tree var)
  || TREE_ADDRESSABLE (var)));
 }
 
+/* Return pointer to optimization flags of FNDECL.  */
+static inline struct cl_optimization *
+opts_for_fn (const_tree fndecl)
+{
+  tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (fndecl);
+  if (fn_opts == NULL_TREE)
+fn_opts = optimization_default_node;
+  return TREE_OPTIMIZATION (fn_opts);
+}
+
+/* opt flag for function FNDECL, e.g. opts_for_fn (fndecl, optimize) is
+   the optimization level of function fndecl.  */
+#define opt_for_fn(fndecl, opt) (opts_for_fn (fndecl)-x_##opt)
+
 /* For anonymous aggregate types, we need some sort of name to
hold on to.  In practice, this should not appear, but it should
not be harmful if it does.  */
--- gcc/config/i386/i386.c.jj   2014-02-05 10:37:36.166167116 +0100
+++ gcc/config/i386/i386.c  2014-02-05 15:56:36.779146394 +0100
@@ -5608,7 +5608,12 @@ ix86_function_regparm (const_tree type,
   /* Use register calling convention for local functions when possible.  */
   if (decl
TREE_CODE (decl) == FUNCTION_DECL
-   optimize
+  /* Caller and callee must agree on the calling convention, so
+checking here just optimize means that with
+__attribute__((optimize (...))) caller could use regparm convention
+and callee not, or vice versa.  Instead look at whether the callee
+is optimized or not.  */
+   opt_for_fn (decl, optimize)
!(profile_flag  !flag_fentry))
 {
   /* FIXME: remove this CONST_CAST when cgraph.[ch] is constified.  */
--- gcc/testsuite/gcc.c-torture/execute/pr60062.c.jj2014-02-05 
15:55:48.639388697 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr60062.c   2014-02-05 
15:55:48.639388697 +0100
@@ -0,0 +1,25 @@
+/* PR target/60062 */
+
+int a;
+
+static void
+foo (const char *p1, int p2)
+{
+  if (__builtin_strcmp (p1, hello) != 0)
+__builtin_abort ();
+}
+
+static void
+bar (const char *p1)
+{
+  if (__builtin_strcmp (p1, hello) != 0)
+__builtin_abort ();
+}
+
+__attribute__((optimize (0))) int
+main ()
+{
+  foo (hello, a);
+  bar (hello);
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr60072.c.jj2014-02-05 
15:55:48.640388641 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr60072.c   2014-02-05 
15:55:48.640388641 +0100
@@ -0,0 +1,16 @@
+/* PR target/60072 */
+
+int c = 1;
+
+__attribute__ ((optimize (1)))
+static int *foo (int *p)
+{
+  return p;
+}
+
+int
+main ()
+{
+  *foo (c) = 2;
+  return c - 2;
+}


Jakub


[PATCH] Fix two missed Makefile.in dependencies

2014-02-05 Thread Jakub Jelinek
Hi!

I have noticed that with the .deps introduction for gcc/ we've lost
these two dependencies, which autodependency creation isn't aware of,
as the Makefile runs cat on those files and passes the content through
-D option.

Ok for trunk?

2014-02-06  Jakub Jelinek  ja...@redhat.com

* Makefile.in (prefix.o, cppbuiltin.o): Depend on $(BASEVER).

--- gcc/Makefile.in.jj  2014-01-28 14:03:49.0 +0100
+++ gcc/Makefile.in 2014-02-05 13:09:26.871019299 +0100
@@ -1925,6 +1925,7 @@ default-c.o: config/default-c.c
 # Files used by all variants of C and some other languages.
 
 CFLAGS-prefix.o += -DPREFIX=\$(prefix)\ -DBASEVER=$(BASEVER_s)
+prefix.o: $(BASEVER)
 
 # Language-independent files.
 
@@ -2540,6 +2541,7 @@ PREPROCESSOR_DEFINES = \
   @TARGET_SYSTEM_ROOT_DEFINE@
 
 CFLAGS-cppbuiltin.o += $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s)
+cppbuiltin.o: $(BASEVER)
 
 CFLAGS-cppdefault.o += $(PREPROCESSOR_DEFINES)
 

Jakub


[PATCH] Fix linemap_location_before_p with adhoc locs (PR preprocessor/56824)

2014-02-05 Thread Jakub Jelinek
Hi!

linemap_location_before_p (which is implemented as a call to
linemap_compare_locations) broke with the addition of IS_ADHOC_LOC,
because it doesn't look through those and thus can happily compare some
0x8... number to normal locus and expect it to work.  Most of the other
line-map.c routines handle IS_ADHOC_LOC already, just not this function.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

The rest is just formatting cleanup, seeing opening [ at the end of
lines was just too much for me.

2014-02-06  Jakub Jelinek  ja...@redhat.com

PR preprocessor/56824
* line-map.c (get_combined_adhoc_loc, linemap_get_expansion_line,
linemap_get_expansion_filename, linemap_location_in_system_header_p,
linemap_location_from_macro_expansion_p,
linemap_macro_loc_to_spelling_point, linemap_macro_loc_to_def_point,
linemap_macro_loc_to_exp_point, linemap_expand_location): Fix
formatting.
(linemap_compare_locations): Look through adhoc locations for both
l0 and l1.

* gcc.dg/pr56824.c: New test.

--- libcpp/line-map.c.jj2014-01-23 10:53:07.0 +0100
+++ libcpp/line-map.c   2014-02-05 15:07:41.558748559 +0100
@@ -106,8 +106,8 @@ get_combined_adhoc_loc (struct line_maps
   linemap_assert (data);
 
   if (IS_ADHOC_LOC (locus))
-locus =
-   set-location_adhoc_data_map.data[locus  MAX_SOURCE_LOCATION].locus;
+locus
+  = set-location_adhoc_data_map.data[locus  MAX_SOURCE_LOCATION].locus;
   if (locus == 0  data == NULL)
 return 0;
   lb.locus = locus;
@@ -141,8 +141,8 @@ get_combined_adhoc_loc (struct line_maps
}
   *slot = set-location_adhoc_data_map.data
  + set-location_adhoc_data_map.curr_loc;
-  set-location_adhoc_data_map.data[
- set-location_adhoc_data_map.curr_loc++] = lb;
+  
set-location_adhoc_data_map.data[set-location_adhoc_data_map.curr_loc++]
+   = lb;
 }
   return ((*slot) - set-location_adhoc_data_map.data) | 0x8000;
 }
@@ -833,8 +833,8 @@ linemap_get_expansion_line (struct line_
   const struct line_map *map = NULL;
 
   if (IS_ADHOC_LOC (location))
-location = set-location_adhoc_data_map.data[
-   location  MAX_SOURCE_LOCATION].locus;
+location = set-location_adhoc_data_map.data[location
+ MAX_SOURCE_LOCATION].locus;
 
   if (location  RESERVED_LOCATION_COUNT)
 return 0;
@@ -861,8 +861,8 @@ linemap_get_expansion_filename (struct l
   const struct line_map *map = NULL;
 
   if (IS_ADHOC_LOC (location))
-location = set-location_adhoc_data_map.data[
-   location  MAX_SOURCE_LOCATION].locus;
+location = set-location_adhoc_data_map.data[location
+ MAX_SOURCE_LOCATION].locus;
 
   if (location  RESERVED_LOCATION_COUNT)
 return NULL;
@@ -899,8 +899,8 @@ linemap_location_in_system_header_p (str
   const struct line_map *map = NULL;
 
   if (IS_ADHOC_LOC (location))
-location = set-location_adhoc_data_map.data[
-   location  MAX_SOURCE_LOCATION].locus;
+location = set-location_adhoc_data_map.data[location
+ MAX_SOURCE_LOCATION].locus;
 
   if (location  RESERVED_LOCATION_COUNT)
 return false;
@@ -942,8 +942,8 @@ linemap_location_from_macro_expansion_p
 source_location location)
 {
   if (IS_ADHOC_LOC (location))
-location = set-location_adhoc_data_map.data[
-   location  MAX_SOURCE_LOCATION].locus;
+location = set-location_adhoc_data_map.data[location
+ MAX_SOURCE_LOCATION].locus;
 
   linemap_assert (location = MAX_SOURCE_LOCATION
   (set-highest_location
@@ -1024,6 +1024,11 @@ linemap_compare_locations (struct line_m
   bool pre_virtual_p, post_virtual_p;
   source_location l0 = pre, l1 = post;
 
+  if (IS_ADHOC_LOC (l0))
+l0 = set-location_adhoc_data_map.data[l0  MAX_SOURCE_LOCATION].locus;
+  if (IS_ADHOC_LOC (l1))
+l1 = set-location_adhoc_data_map.data[l1  MAX_SOURCE_LOCATION].locus;
+
   if (l0 == l1)
 return 0;
 
@@ -1086,8 +1091,8 @@ linemap_macro_loc_to_spelling_point (str
   struct line_map *map;
 
   if (IS_ADHOC_LOC (location))
-location = set-location_adhoc_data_map.data[
-   location  MAX_SOURCE_LOCATION].locus;
+location = set-location_adhoc_data_map.data[location
+ MAX_SOURCE_LOCATION].locus;
 
   linemap_assert (set  location = RESERVED_LOCATION_COUNT);
 
@@ -1124,8 +1129,8 @@ linemap_macro_loc_to_def_point (struct l
   struct line_map *map;
 
   if (IS_ADHOC_LOC (location))
-location = set-location_adhoc_data_map.data[
-   location  MAX_SOURCE_LOCATION].locus;
+location = set-location_adhoc_data_map.data[location
+ MAX_SOURCE_LOCATION].locus;
 
   linemap_assert (set  location = 

[PATCH] Fix various issues in vect_analyze_data_refs (PR middle-end/59150)

2014-02-05 Thread Jakub Jelinek
Hi!

This patch fixes a double free on simd-lane access if
get_vectype_for_scalar_type fails (the new dr is already in datarefs[i]
and if we free_data_ref it, it will be free_data_ref'ed again at the end of
the failed vectorization), fixes handling of clobbers (plugs a memory leak
due to missed free_data_ref and more importantly, if we goto again, dr
would be kept at the old dr and thus we'd basically remove all the remaining
drs in a loop) and plugs a memory leak with simd_lane_access.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

PS, just looking at the patch now again, in the light of PR59594
the reordering of datarefs before goto again looks also wrong, we IMHO have
to perform a vector ordered remove in that case, ok to handle it as a
follow-up?

2014-02-06  Jakub Jelinek  ja...@redhat.com

PR middle-end/59150
* tree-vect-data-refs.c (vect_analyze_data_refs): For clobbers, call
free_data_ref on the dr first, and before goto again also set dr
to the next dr.  For simd_lane_access, free old datarefs[i] before
overwriting it.  For get_vectype_for_scalar_type failure, don't
free_data_ref if simd_lane_access.

--- gcc/tree-vect-data-refs.c.jj2014-02-05 15:28:10.0 +0100
+++ gcc/tree-vect-data-refs.c   2014-02-05 18:30:53.323659288 +0100
@@ -3297,12 +3297,13 @@ again:
  clobber stmts during vectorization.  */
   if (gimple_clobber_p (stmt))
{
+ free_data_ref (dr);
  if (i == datarefs.length () - 1)
{
  datarefs.pop ();
  break;
}
- datarefs[i] = datarefs.pop ();
+ datarefs[i] = dr = datarefs.pop ();
  goto again;
}
 
@@ -3643,13 +3644,14 @@ again:
   if (simd_lane_access)
{
  STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) = true;
+ free_data_ref (datarefs[i]);
  datarefs[i] = dr;
}
 
   /* Set vectype for STMT.  */
   scalar_type = TREE_TYPE (DR_REF (dr));
-  STMT_VINFO_VECTYPE (stmt_info) =
-get_vectype_for_scalar_type (scalar_type);
+  STMT_VINFO_VECTYPE (stmt_info)
+   = get_vectype_for_scalar_type (scalar_type);
   if (!STMT_VINFO_VECTYPE (stmt_info))
 {
   if (dump_enabled_p ())
@@ -3669,7 +3671,8 @@ again:
  if (gather || simd_lane_access)
{
  STMT_VINFO_DATA_REF (stmt_info) = NULL;
- free_data_ref (dr);
+ if (gather)
+   free_data_ref (dr);
}
  return false;
 }

Jakub