Re: [ping] couple of fixes
On 19/10/2012 19:01, Eric Botcazou wrote: PR bootstrap/54820 (stage #1 bootstrap failure) http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01093.html This one is okay, thanks. Paolo
Re: [PATCH GCC]Fix test case failure reported in PR54989
On Mon, Oct 22, 2012 at 11:00:08AM +0800, Bin Cheng wrote: The test case gcc/testsuite/gcc.dg/hoist-register-pressure.c fails on x86_64-apple-darwin because it uses more registers than x86_64-linux. This can be fixed by simplifying the case to use fewer registers. Tested on x86_64-apple-darwin/x86_64-linux, is it OK? I'd say it is better to do the scan-rtl-dump only on nonpic targets; that way it won't be done on darwin or for testing with --target_board=unix/-fpic, where it would fail too. You can add the test with smaller register pressure as a new test (hoist-register-pressure2.c). Jakub
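For reference, Jakub's suggestion maps to a target selector on the final scan directive. A sketch of what the restricted test could look like -- the regexp and the -fdump option below are illustrative placeholders, not copied from the real hoist-register-pressure.c:

```c
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-rtl-hoist" } */

/* Only scan the dump on nonpic targets: with -fPIC (and on darwin,
   where PIC is the default) the PIC register raises register
   pressure and the expected hoisting does not happen.  */
/* { dg-final { scan-rtl-dump "hoisting" "hoist" { target nonpic } } } */
```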
Re: Ping: [RFA:] Fix frame-pointer-clobbering in builtins.c:expand_builtin_setjmp_receiver
On Sun, 21 Oct 2012, Hans-Peter Nilsson wrote: CC:ing middle-end maintainers this time. I was a bit surprised when Eric Botcazou wrote in his review, quoted below, that he's not one of you. Maybe approve that too? If Eric is fine with the patch it is ok. Yes, he is not a middle-end maintainer but an RTL optimizer reviewer. Thanks, Richard. On Mon, 15 Oct 2012, Hans-Peter Nilsson wrote: On Fri, 12 Oct 2012, Eric Botcazou wrote: (insn 168 49 51 3 (set (reg/f:DI 253 $253) (plus:DI (reg/f:DI 253 $253) (const_int 24 [0x18]))) /tmp/mmiximp2/gcc/gcc/testsuite/gcc.c-torture/execute/built-in-setjmp.c:21 -1 (nil)) (insn 51 168 52 3 (clobber (reg/f:DI 253 $253)) ... Note that insn 168 is deleted, which seems a logical optimization. The bug is to emit the clobber, not that the restoring insn is removed. Had that worked in the past for MMIX? Yes, for svn revision 106027 (20051030), 4.1.0-era (!) http://gcc.gnu.org/ml/gcc-testresults/2005-10/msg01340.html where the test must have passed, as gcc.c-torture/execute/built-in-setjmp.c is at least four years older than that. If so, what changed recently? By these days I didn't mean recent, just not eons ago. :) I see in a gcc-test-results posting from Mike Stein (whom I'd like to thank for test-results postings over the years) matching FAILs for svn revision 126095 (20070628), 4.3.0-era: http://gcc.gnu.org/ml/gcc-testresults/2007-06/msg01287.html. Sorry, I have nothing in between those reports, my bad. Though I see no point narrowing down the failing revision further here IMO; as mentioned, the bug is not that the restoring insn is removed. Agreed. However, I'd suggest rescuing the comment for the ELIMINABLE_REGS block from expand_nl_goto_receiver, as it still sounds valid to me. Oops, my bad; I see I removed all the good comments. Fixed. * stmt.c (expand_nl_goto_receiver): Remove almost-copy of expand_builtin_setjmp_receiver. (expand_label): Adjust, call expand_builtin_setjmp_receiver with NULL for the label parameter.
* builtins.c (expand_builtin_setjmp_receiver): Don't clobber the frame-pointer. Adjust comments. [HAVE_builtin_setjmp_receiver]: Emit builtin_setjmp_receiver only if LABEL is non-NULL. I cannot formally approve, but this looks good to me modulo: + If RECEIVER_LABEL is NULL, instead the port-specific parts of a + nonlocal goto handler are emitted. */ The port-specific parts wording is a bit confusing I think. I'd just write: If RECEIVER_LABEL is NULL, instead construct a nonlocal goto handler. Sure. Thanks for the review. Updated patch below. As nothing was changed from the previous post but comments as per the review (mostly moving / reviving, fixing one grammo), already covered by the changelog quoted above, the previous testing is still valid. Ok for trunk, approvers? Index: gcc/builtins.c === --- gcc/builtins.c (revision 192353) +++ gcc/builtins.c (working copy) @@ -885,14 +885,15 @@ expand_builtin_setjmp_setup (rtx buf_add } /* Construct the trailing part of a __builtin_setjmp call. This is - also called directly by the SJLJ exception handling code. */ + also called directly by the SJLJ exception handling code. + If RECEIVER_LABEL is NULL, instead construct a nonlocal goto handler. */ void expand_builtin_setjmp_receiver (rtx receiver_label ATTRIBUTE_UNUSED) { rtx chain; - /* Clobber the FP when we get here, so we have to make sure it's + /* Mark the FP as used when we get here, so we have to make sure it's marked as used by this function. */ emit_use (hard_frame_pointer_rtx); @@ -907,17 +908,28 @@ expand_builtin_setjmp_receiver (rtx rece #ifdef HAVE_nonlocal_goto if (! HAVE_nonlocal_goto) #endif -{ - emit_move_insn (virtual_stack_vars_rtx, hard_frame_pointer_rtx); - /* This might change the hard frame pointer in ways that aren't -apparent to early optimization passes, so force a clobber. */ - emit_clobber (hard_frame_pointer_rtx); -} +/* First adjust our frame pointer to its actual value.
It was + previously set to the start of the virtual area corresponding to + the stacked variables when we branched here and now needs to be + adjusted to the actual hardware fp value. + + Assignments to virtual registers are converted by + instantiate_virtual_regs into the corresponding assignment + to the underlying register (fp in this case) that makes + the original assignment true. + So the following insn will actually be decrementing fp by + STARTING_FRAME_OFFSET. */ +emit_move_insn (virtual_stack_vars_rtx,
Re: Minimize downward code motion during reassociation
On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassociation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c (assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs.
*/ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe->op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe->op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i < len; i++) +{ Likewise you call this for each call to rewrite_expr_tree, so it seems to me this is quadratic in the number of ops in the op vector. Why make this all so complicated? It seems to me that we should fixup stmt order only after the whole ops vector has been materialized. + operand_entry_t oe = VEC_index (operand_entry_t, ops, i); + gimple def_stmt; + basic_block def_bb; + /* Ignore constants and operands with default definitions.
*/ + if (TREE_CODE (oe->op) != SSA_NAME + || SSA_NAME_IS_DEFAULT_DEF (oe->op)) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe->op); + def_bb = gimple_bb (def_stmt); + if (def_bb != insert_bb + && !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb)) +{ + insert_bb = def_bb; + insert_stmt = def_stmt; +} + else if (def_bb == insert_bb + && gimple_uid (insert_stmt) < gimple_uid (def_stmt)) +insert_stmt = def_stmt; +} + if (insert_stmt == stmt) +return; + gsistmt = gsi_for_stmt (stmt); + /* If GSI_STMT is a phi node, then do not insert just after that statement. Instead, find the first non-label gimple statement in BB and insert before that. */ + if (gimple_code (insert_stmt) == GIMPLE_PHI) +{ + gsi_insert = gsi_after_labels (insert_bb); + gsi_move_before (&gsistmt, &gsi_insert); +} + /* Statements marked for throw can not be in the middle of a basic block. So + we can not insert a statement (not marked for throw) immediately after. */ + else if (lookup_stmt_eh_lp (insert_stmt) > 0 that's already performed by stmt_can_throw_internal + && stmt_can_throw_internal (insert_stmt)) But all this should be a non-issue as re-assoc should never assign an ops vector entry for such stmts (but it could have leafs defined by such stmts). If you only ever move definitions
Re: [PATCH] Fix dumps for IPA passes
On Sat, Oct 20, 2012 at 3:24 AM, Sharad Singhai sing...@google.com wrote: As suggested in http://gcc.gnu.org/ml/gcc/2012-10/msg00285.html, I have updated the attached patch to rename 'dump_enabled_phase' to 'dump_enabled_phase_p'. The 'dump_enabled_p ()' doesn't take any argument and can be used as a predicate for the dump calls. Once this patch gets in, the plan is to update the existing calls (in vectorizer passes) of the form if (dump_kind_p (flags)) dump_printf(flags, ...) to if (dump_enabled_p ()) dump_printf(flags, ...) Bootstrapped and tested on x86_64 and didn't observe any new test failures. Okay for trunk? Ok. Thanks, Richard. Thanks, Sharad 2012-10-19 Sharad Singhai sing...@google.com * dumpfile.c (dump_phase_enabled_p): Renamed dump_enabled_p. Update all callers. (dump_enabled_p): A new function to check if any of the dump files is available. (dump_kind_p): Remove check for current_function_decl. Add check for dumpfile and alt_dump_file. * dumpfile.h: Add declaration of dump_enabled_p. Index: dumpfile.c === --- dumpfile.c (revision 192623) +++ dumpfile.c (working copy) @@ -35,7 +35,7 @@ static int alt_flags;/* current op static FILE *alt_dump_file = NULL; static void dump_loc (int, FILE *, source_location); -static int dump_enabled_p (int); +static int dump_phase_enabled_p (int); static FILE *dump_open_alternate_stream (struct dump_file_info *); /* Table of tree dump switches. 
This must be consistent with the @@ -380,7 +380,7 @@ dump_start (int phase, int *flag_ptr) char *name; struct dump_file_info *dfi; FILE *stream; - if (phase == TDI_none || !dump_enabled_p (phase)) + if (phase == TDI_none || !dump_phase_enabled_p (phase)) return 0; dfi = get_dump_file_info (phase); @@ -461,7 +461,7 @@ dump_begin (int phase, int *flag_ptr) struct dump_file_info *dfi; FILE *stream; - if (phase == TDI_none || !dump_enabled_p (phase)) + if (phase == TDI_none || !dump_phase_enabled_p (phase)) return NULL; name = get_dump_file_name (phase); @@ -493,8 +493,8 @@ dump_begin (int phase, int *flag_ptr) If PHASE is TDI_tree_all, return nonzero if any dump is enabled for any phase. */ -int -dump_enabled_p (int phase) +static int +dump_phase_enabled_p (int phase) { if (phase == TDI_tree_all) { @@ -514,6 +514,14 @@ dump_begin (int phase, int *flag_ptr) } } +/* Return true if any of the dumps are enabled, false otherwise. */ + +inline bool +dump_enabled_p (void) +{ + return (dump_file || alt_dump_file); +} + /* Returns nonzero if tree dump PHASE has been initialized. */ int @@ -834,9 +842,8 @@ opt_info_switch_p (const char *arg) bool dump_kind_p (int msg_type) { - if (!current_function_decl) -return 0; - return ((msg_type & pflags) || (msg_type & alt_flags)); + return (dump_file && (msg_type & pflags)) +|| (alt_dump_file && (msg_type & alt_flags)); } /* Print basic block on the dump streams. */ Index: dumpfile.h === --- dumpfile.h (revision 192623) +++ dumpfile.h (working copy) @@ -121,6 +121,7 @@ extern int dump_switch_p (const char *); extern int opt_info_switch_p (const char *); extern const char *dump_flag_name (int); extern bool dump_kind_p (int); +extern inline bool dump_enabled_p (void); extern void dump_printf (int, const char *, ...) ATTRIBUTE_PRINTF_2; extern void dump_printf_loc (int, source_location, const char *, ...) ATTRIBUTE_PRINTF_3;
Re: Fix array bound niter estimate (PR middle-end/54937)
On Fri, 19 Oct 2012, Jan Hubicka wrote: On Fri, 19 Oct 2012, Jan Hubicka wrote: Hi, this patch fixes an off-by-one error in the testcase attached. The problem is that the dominance based test used by record_estimate to check whether the given statement must be executed at the last iteration of the loop is wrong, ignoring the side effects of other statements that may terminate the program. It also does not work for multiple exits as exercised by the cunroll-2.c testcase. This patch takes the simple approach of computing the set of all statements that must be executed in the last iteration, the first time record_estimate is executed this way. The set is computed conservatively, walking the header BB and its single successors (possibly diving into nested loops), stopping on the first BB with multiple exits. A better result can be computed by 1) estimating what loops are known to be finite, 2) inserting fake edges for all infinite loops and all statements with side effects that may terminate the execution, 3) using the post dominance info. would using post-dom info even work? That only says that _if_ the dominated stmt executed then it came through the dominator. It doesn't deal with functions that may not return. With fake edges inserted it will. We do have code for that used in profiling that also needs this stronger definition of CFG. Huh, but then we will need to split blocks. I don't think that's viable. What about the conservative variant of simply else delta = double_int_one; I think it would be a bad idea: it makes us completely unroll one iteration too many, which bloats code for no benefit. No optimization cancels the path in the CFG because of the undefined effect and thus the code will be output (unless someone smarter, like VRP, cleans up later, but it is more an exception than the rule.) ? I don't like all the code you add, nor the use of ->aux. Neither do I, really, but what are the alternatives?
See above ;) My first implementation simply checked that the stmt is in the loop header and walked up to the beginning of basic blocks looking for side effects. Then I became worried about the possibility of gigantic basic blocks with many array stores within the loop, so I decided to record the reachable statements instead of repeating the walk. Loop count estimation is recursive (i.e. it dives into inner loops), thus I ended up using AUX. I can for sure put this separately or add an extra reference argument passed over the whole call stack, but there are quite many functions that can lead to record_estimate. (I have nothing against that alternative, however, if AUX looks ugly) I am worried about passes trying to use AUX. We should at least document that it is for internal use only. i_bound += delta; Another alternative would be to not use i_bound for the strong upper bound but only the estimate (thus conservatively use i_bound + 1 for the upper bound if !is_exit). We can not derive a realistic estimate based on this: the loop may exit much earlier. We can only lower the estimate if it is already there and greater than this bound. This can probably happen with profile feedback and I can implement it later, I do not think it is terribly important though. Honza -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: [PATCH, ARM] Subregs of VFP registers in big-endian mode
On 20/10/12 12:38, Julian Brown wrote: Hi, Quite a few tests fail for big-endian multilibs which use VFP instructions at present. One reason for many of these is glaringly obvious once you notice it: for D registers interpreted as two S registers, the lower-numbered register is always the less-significant part of the value, and the higher-numbered register the more-significant -- regardless of the endianness the processor is running in. However, for big-endian mode, when DFmode values are represented in memory (or indeed core registers), the opposite is true. So, a subreg expression such as the following will work fine on core registers (or e.g. pseudos assigned to stack slots): (subreg:SI (reg:DF) 0) but, when applied to a VFP register Dn, it should be resolved to the hard register S(n*2+1). At present though, it resolves to S(n*2) -- i.e. the wrong half of the value (for WORDS_BIG_ENDIAN, such a subreg should be the most-significant part of the value). For the relatively few cases where DFmode values are interpreted as a pair of (integer) words, this means that wrong code is generated. My feeling is that implementing a proper solution to this problem is probably impractical -- the closest existing macros to control behaviour aren't sufficient for this case: * FLOAT_WORDS_BIG_ENDIAN only refers to memory layout, which is correct as is it. * REG_WORDS_BIG_ENDIAN controls whether values are stored in big-endian order in registers, but refers to *all* registers. We only want to change the behaviour for the VFP registers. Defining a new macro FLOAT_REG_WORDS_BIG_ENDIAN wouldn't do, because the behaviour would differ depending on the hard register under observation: that seems like too much to ask of generic machinery in the middle-end. So, the attached patch just avoids the problem, by pretending that greater-than-word-size values in VFP registers, in big-endian mode, are opaque and cannot be subreg'ed. 
In practice, for at least the test case I looked at, this isn't as much of a pessimisation as you might expect -- the value in question might already be stored in core registers (e.g. for function arguments with -mfloat-abi=softfp), so can be retrieved directly from those rather than via memory. This is the testsuite delta for current FSF mainline, with multilibs adjusted to build for little/big-endian, and using options -mbig-endian -mfloat-abi=softfp -mfpu=vfpv3 for testing: FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O1 execution test FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 execution test FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O3 -fomit-frame-pointer execution test FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -O3 -g execution test FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C -Os execution test FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/ieee/copysign1.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/ieee/mzero6.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr35456.c execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O1 FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O2 FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O2 -flto -fno-use-linker-plugin -flto-partition=none FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c
execution, -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O3 -fomit-frame-pointer FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -O3 -g FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -Og -g FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution, -Os FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/compat/scalar-by-value-3 c_compat_x_tst.o-c_compat_y_tst.o execute FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O1 execution test FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 execution test FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
Re: Fix array bound niter estimate (PR middle-end/54937)
On Fri, 19 Oct 2012, Jan Hubicka wrote: What about the conservative variant of simply else delta = double_int_one; I think it would be a bad idea: it makes us completely unroll one iteration too many, which bloats code for no benefit. No optimization cancels the path in the CFG because of the undefined effect and thus the code will be output (unless someone smarter, like VRP, cleans up later, but it is more an exception than the rule.) OK, on deeper thought I guess I can add double_int_one always at that spot, and once we are done with everything I can walk nb_iter_bound for all statements known to not be executed on the last iteration and record them to a pointer set. Finally I can walk from the header in DFS, stopping on loop exits, side effects and those statements. If I visit no loop exit or side effect, I know I can lower the iteration count by 1 (in estimate_numbers_of_iterations_loop). This will give an accurate answer and requires just a little extra bookkeeping. I will give this a try. Here is the updated patch. It solves the testcase and gives better estimates than before. Here are obvious improvements: record_estimate can put all statements into the list, not only those that dominate the loop latch, and maybe_lower_iteration_bound can track the lowest estimate it finds on its walk. This will need a bit more work and I am thus sending the bugfix separately, because I think it should go to 4.7, too. Honza * tree-ssa-loop-niter.c (record_estimate): Remove confused dominators check. (maybe_lower_iteration_bound): New function. (estimate_numbers_of_iterations_loop): Use it.
Index: tree-ssa-loop-niter.c === --- tree-ssa-loop-niter.c (revision 192537) +++ tree-ssa-loop-niter.c (working copy) @@ -2535,7 +2541,6 @@ record_estimate (struct loop *loop, tree gimple at_stmt, bool is_exit, bool realistic, bool upper) { double_int delta; - edge exit; if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -2570,14 +2577,10 @@ record_estimate (struct loop *loop, tree } /* Update the number of iteration estimates according to the bound. - If at_stmt is an exit or dominates the single exit from the loop, - then the loop latch is executed at most BOUND times, otherwise - it can be executed BOUND + 1 times. */ - exit = single_exit (loop); - if (is_exit - || (exit != NULL - && dominated_by_p (CDI_DOMINATORS, - exit->src, gimple_bb (at_stmt + If at_stmt is an exit then the loop latch is executed at most BOUND times, + otherwise it can be executed BOUND + 1 times. We will lower the estimate + later if such statement must be executed on the last iteration. */ + if (is_exit) delta = double_int_zero; else delta = double_int_one; @@ -2953,6 +2956,87 @@ gcov_type_to_double_int (gcov_type val) return ret; } +/* See if every path crossing the loop goes through a statement that is known + to not execute at the last iteration. In that case we can decrease the iteration + count by 1. */ + +static void +maybe_lower_iteration_bound (struct loop *loop) +{ + pointer_set_t *not_executed_last_iteration = pointer_set_create (); + pointer_set_t *visited; + struct nb_iter_bound *elt; + bool found = false; + VEC (basic_block, heap) *queue = NULL; + + for (elt = loop->bounds; elt; elt = elt->next) +{ + if (!elt->is_exit + && elt->bound.ult (loop->nb_iterations_upper_bound)) + { + found = true; + pointer_set_insert (not_executed_last_iteration, elt->stmt); + } +} So you are looking for all stmts a bound was derived from. + if (!found) +{ + pointer_set_destroy (not_executed_last_iteration); create this on-demand in the above loop?
+ return; +} + visited = pointer_set_create (); + VEC_safe_push (basic_block, heap, queue, loop->header); + pointer_set_insert (visited, loop->header); pointer-set for BB visited? In most other places we use a [s]bitmap with block numbers. + found = false; + + while (VEC_length (basic_block, queue) && !found) looks like a do-while loop should be possible with a !VEC_empty () guard at the end. +{ + basic_block bb = VEC_pop (basic_block, queue); + gimple_stmt_iterator gsi; + bool stmt_found = false; + + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple stmt = gsi_stmt (gsi); + if (pointer_set_contains (not_executed_last_iteration, stmt)) + { + stmt_found = true; we found one. + break; + } + if (gimple_has_side_effects (stmt)) + { + found = true; we found sth else? + break; + } + } + if (!stmt_found && !found) + { if we found
[Ada] Fix ICE on loop with modular iteration variable
This is a regression at -O present on mainline and 4.7 branch. The compiler inadvertently uses a non-base type for the base type of a modular iteration variable on 32-bit architectures. Tested on x86_64-suse-linux, applied on the mainline and 4.7 branch. 2012-10-22 Eric Botcazou ebotca...@adacore.com * gcc-interface/trans.c (Loop_Statement_to_gnu): Use gnat_type_for_size directly to obtain an unsigned version of the base type. 2012-10-22 Eric Botcazou ebotca...@adacore.com * gnat.dg/modular4.adb: New test. * gnat.dg/modular4_pkg.ads: New helper. -- Eric Botcazou Index: gcc-interface/trans.c === --- gcc-interface/trans.c (revision 192648) +++ gcc-interface/trans.c (working copy) @@ -2431,7 +2431,8 @@ Loop_Statement_to_gnu (Node_Id gnat_node { if (TYPE_PRECISION (gnu_base_type) < TYPE_PRECISION (size_type_node)) - gnu_base_type = gnat_unsigned_type (gnu_base_type); + gnu_base_type + = gnat_type_for_size (TYPE_PRECISION (gnu_base_type), 1); else gnu_base_type = size_type_node; -- { dg-do compile } -- { dg-options -O } with Modular4_Pkg; use Modular4_Pkg; procedure Modular4 is begin for I in Zero .. F mod 8 loop raise Program_Error; end loop; end; package Modular4_Pkg is type Word is mod 2**48; Zero : constant Word := 0; function F return Word; end Modular4_Pkg;
Re: [Ada] Do not generate special PARM_DECL in LTO mode
On Mon, Oct 22, 2012 at 10:04 AM, Eric Botcazou ebotca...@adacore.com wrote: We generate a special PARM_DECL for Out parameters passed by copy at -O0, but it doesn't play nice with LTO so this patch removes it when LTO is enabled. Tested on x86_64-suse-linux, applied on the mainline and 4.7 branch. Shouldn't it be simply the abstract origin for the VAR_DECL? Or be not 'lowered' here but be a 'proper' PARM_DECL with DECL_VALUE_EXPR? That said, how is debug info emitted in the optimize case? No objection to the patch as-is, but guarding sth with flag_generate_lto always makes me suspicious ;) Richard. 2012-10-22 Eric Botcazou ebotca...@adacore.com * gcc-interface/decl.c (gnat_to_gnu_entity) E_Out_Parameter: Do not generate the special PARM_DECL for an Out parameter in LTO mode. -- Eric Botcazou
[Ada] Fix ICE on new limited_with use in Ada 2012
Ada 2012 has extended the use of limited_with, and incomplete types coming from a limited context may now appear in parameter and result profiles. This of course introduces more circularities, especially in -gnatct mode. Tested on x86_64-suse-linux, applied on the mainline. 2012-10-22 Eric Botcazou ebotca...@adacore.com * gcc-interface/decl.c (gnat_to_gnu_entity) E_Subprogram_Type: In type annotation mode, break circularities introduced by AI05-0151. 2012-10-22 Eric Botcazou ebotca...@adacore.com * gnat.dg/specs/limited_with4.ads: New test. * gnat.dg/specs/limited_with4_pkg.ads: New helper. -- Eric Botcazou Index: gcc-interface/decl.c === --- gcc-interface/decl.c (revision 192667) +++ gcc-interface/decl.c (working copy) @@ -4142,7 +4142,18 @@ gnat_to_gnu_entity (Entity_Id gnat_entit gnu_return_type = void_type_node; else { - gnu_return_type = gnat_to_gnu_type (gnat_return_type); + /* Ada 2012 (AI05-0151): Incomplete types coming from a limited + context may now appear in parameter and result profiles. If + we are only annotating types, break circularities here. */ + if (type_annotate_only + && IN (Ekind (gnat_return_type), Incomplete_Kind) + && From_With_Type (gnat_return_type) + && In_Extended_Main_Code_Unit + (Non_Limited_View (gnat_return_type)) + && !present_gnu_tree (Non_Limited_View (gnat_return_type))) + gnu_return_type = ptr_void_type_node; + else + gnu_return_type = gnat_to_gnu_type (gnat_return_type); /* If this function returns by reference, make the actual return type the pointer type and make a note of that.
*/ @@ -4238,11 +4249,30 @@ gnat_to_gnu_entity (Entity_Id gnat_entit Present (gnat_param); gnat_param = Next_Formal_With_Extras (gnat_param), parmnum++) { + Entity_Id gnat_param_type = Etype (gnat_param); tree gnu_param_name = get_entity_name (gnat_param); - tree gnu_param_type = gnat_to_gnu_type (Etype (gnat_param)); - tree gnu_param, gnu_field; - bool copy_in_copy_out = false; + tree gnu_param_type, gnu_param, gnu_field; Mechanism_Type mech = Mechanism (gnat_param); + bool copy_in_copy_out = false, fake_param_type; + + /* Ada 2012 (AI05-0151): Incomplete types coming from a limited + context may now appear in parameter and result profiles. If + we are only annotating types, break circularities here. */ + if (type_annotate_only + && IN (Ekind (gnat_param_type), Incomplete_Kind) + && From_With_Type (Etype (gnat_param_type)) + && In_Extended_Main_Code_Unit + (Non_Limited_View (gnat_param_type)) + && !present_gnu_tree (Non_Limited_View (gnat_param_type))) + { + gnu_param_type = ptr_void_type_node; + fake_param_type = true; + } + else + { + gnu_param_type = gnat_to_gnu_type (gnat_param_type); + fake_param_type = false; + } /* Builtins are expanded inline and there is no real call sequence involved. So the type expected by the underlying expander is @@ -4280,10 +4310,28 @@ gnat_to_gnu_entity (Entity_Id gnat_entit mech = Default; } - gnu_param - = gnat_to_gnu_param (gnat_param, mech, gnat_entity, - Has_Foreign_Convention (gnat_entity), - copy_in_copy_out); + /* Do not call gnat_to_gnu_param for a fake parameter type since + it will try to use the real type again. */ + if (fake_param_type) + { + if (Ekind (gnat_param) == E_Out_Parameter) + gnu_param = NULL_TREE; + else + { + gnu_param + = create_param_decl (gnu_param_name, gnu_param_type, + false); + Set_Mechanism (gnat_param, + mech == Default ?
By_Copy : mech); + if (Ekind (gnat_param) == E_In_Out_Parameter) + copy_in_copy_out = true; + } + } + else + gnu_param + = gnat_to_gnu_param (gnat_param, mech, gnat_entity, + Has_Foreign_Convention (gnat_entity), + copy_in_copy_out); /* We are returned either a PARM_DECL or a type if no parameter needs to be passed; in either case, adjust the type. */ -- { dg-do compile } -- { dg-options -gnat12 -gnatct } with Ada.Containers.Vectors; with Limited_With4_Pkg; package Limited_With4 is type Object is tagged private; type Object_Ref is access all Object; type Class_Ref is access all Object'Class; package Vec is new Ada.Containers.Vectors (Positive, Limited_With4_Pkg.Object_Ref, Limited_With4_Pkg."="); subtype Vector is Vec.Vector; private type Object is tagged record V : Vector; end record; end Limited_With4; -- { dg-do compile } -- { dg-options -gnat12 -gnatct } limited with Limited_With4; package Limited_With4_Pkg is type Object is tagged null record; type Object_Ref is access all Object; type Class_Ref is access all Object'Class; function Func return Limited_With4.Class_Ref;
[Ada] Adjust rest_of_record_type_compilation to sizetype change
The function does a bit of pattern matching to emit the special encoding for variable-sized record types in the debug info and it needs to be adjusted to the sizetype change. Tested on x86_64-suse-linux, applied on the mainline.

2012-10-22  Eric Botcazou  ebotca...@adacore.com

        * gcc-interface/utils.c (rest_of_record_type_compilation): Simplify
        and robustify pattern matching code for masking operations.

-- Eric Botcazou

Index: gcc-interface/utils.c
===
--- gcc-interface/utils.c	(revision 192648)
+++ gcc-interface/utils.c	(working copy)
@@ -1731,19 +1731,23 @@ rest_of_record_type_compilation (tree re
 	      tree offset = TREE_OPERAND (curpos, 0);
 	      align = tree_low_cst (TREE_OPERAND (curpos, 1), 1);
 
-	      /* An offset which is a bitwise AND with a negative power of 2
-		 means an alignment corresponding to this power of 2.  Note
-		 that, as sizetype is sign-extended but nonetheless unsigned,
-		 we don't directly use tree_int_cst_sgn.  */
+	      /* An offset which is a bitwise AND with a mask increases the
+		 alignment according to the number of trailing zeros.  */
 	      offset = remove_conversions (offset, true);
 	      if (TREE_CODE (offset) == BIT_AND_EXPR
-		  && host_integerp (TREE_OPERAND (offset, 1), 0)
-		  && TREE_INT_CST_HIGH (TREE_OPERAND (offset, 1)) < 0)
+		  && TREE_CODE (TREE_OPERAND (offset, 1)) == INTEGER_CST)
 		{
-		  unsigned int pow
-		    = - tree_low_cst (TREE_OPERAND (offset, 1), 0);
-		  if (exact_log2 (pow) > 0)
-		    align *= pow;
+		  unsigned HOST_WIDE_INT mask
+		    = TREE_INT_CST_LOW (TREE_OPERAND (offset, 1));
+		  unsigned int i;
+
+		  for (i = 0; i < HOST_BITS_PER_WIDE_INT; i++)
+		    {
+		      if (mask & 1)
+			break;
+		      mask >>= 1;
+		      align *= 2;
+		    }
 		}
 
 	      pos = compute_related_constant (curpos,
[Ada] Plug small hole in handling of volatile components
This pertains only to small arrays, for which we fail to take into account a pragma Volatile on the component type or a pragma Volatile_Component. Tested on x86_64-suse-linux, applied on the mainline and 4.7 branch.

2012-10-22  Eric Botcazou  ebotca...@adacore.com

        * gcc-interface/decl.c (gnat_to_gnu_entity) <E_Array_Type>: Force
        BLKmode on the type if it is passed by reference.
        <E_Array_Subtype>: Likewise.
        <E_Record_Type>: Guard the call to Is_By_Reference_Type predicate.
        <E_Record_Subtype>: Likewise.

-- Eric Botcazou

Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 192671)
+++ gcc-interface/decl.c	(working copy)
@@ -2248,6 +2248,12 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	    TYPE_MULTI_ARRAY_P (tem) = (index > 0);
 	    if (array_type_has_nonaliased_component (tem, gnat_entity))
 	      TYPE_NONALIASED_COMPONENT (tem) = 1;
+
+	    /* If it is passed by reference, force BLKmode to ensure that
+	       objects of this type will always be put in memory.  */
+	    if (TYPE_MODE (tem) != BLKmode
+		&& Is_By_Reference_Type (gnat_entity))
+	      SET_TYPE_MODE (tem, BLKmode);
 	  }
 
 	/* If an alignment is specified, use it if valid.  But ignore it
@@ -2588,6 +2594,11 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	    TYPE_MULTI_ARRAY_P (gnu_type) = (index > 0);
 	    if (array_type_has_nonaliased_component (gnu_type, gnat_entity))
 	      TYPE_NONALIASED_COMPONENT (gnu_type) = 1;
+
+	    /* See the E_Array_Type case for the rationale.  */
+	    if (TYPE_MODE (gnu_type) != BLKmode
+		&& Is_By_Reference_Type (gnat_entity))
+	      SET_TYPE_MODE (gnu_type, BLKmode);
 	  }
 
 	/* Attach the TYPE_STUB_DECL in case we have a parallel type.  */
@@ -3161,7 +3172,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	/* If it is passed by reference, force BLKmode to ensure that
 	   objects of this type will always be put in memory.  */
-	if (Is_By_Reference_Type (gnat_entity))
+	if (TYPE_MODE (gnu_type) != BLKmode
+	    && Is_By_Reference_Type (gnat_entity))
 	  SET_TYPE_MODE (gnu_type, BLKmode);
 
 	/* We used to remove the associations of the discriminants and _Parent
@@ -3527,12 +3539,12 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	   modify it below.  */
 	finish_record_type (gnu_type, nreverse (gnu_field_list), 2, false);
+	compute_record_mode (gnu_type);
 
 	/* See the E_Record_Type case for the rationale.  */
-	if (Is_By_Reference_Type (gnat_entity))
+	if (TYPE_MODE (gnu_type) != BLKmode
+	    && Is_By_Reference_Type (gnat_entity))
 	  SET_TYPE_MODE (gnu_type, BLKmode);
-	else
-	  compute_record_mode (gnu_type);
 
 	TYPE_VOLATILE (gnu_type) = Treat_As_Volatile (gnat_entity);
Re: [Ada] Do not generate special PARM_DECL in LTO mode
Shouldn't it be simply the abstract origin for the VAR_DECL? Or be not 'lowered' here but be a 'proper' PARM_DECL with DECL_VALUE_EXPR? That said, how is debug info emitted in the optimize case? This is a PARM_DECL with DECL_VALUE_EXPR set to the VAR_DECL emitted in the outermost function scope. It doesn't survive with optimization enabled so we don't bother generating it in this case. -- Eric Botcazou
[AARCH64-4.7] Merge from upstream gcc-4_7-branch r192597
Hi, I have just merged upstream gcc-4_7-branch on the aarch64-4.7-branch up to r192597. Thanks Sofiane
[PATCH,ARM] Fix PR55019 Incorrectly use live argument register to save high register in thumb1 prologue
Hi, Attached patch intends to fix bug 55019 which is exposed on 4.7 branch. Although this bug can't be reproduced on trunk, I think this fix is still useful to make trunk more robust. Tested with trunk regression test on cortex-m0 and cortex-m3, no regression found. Also tested with various benchmark like Dhrystone/coremark/eembc_v1 on cortex-m0, no regression on performance and code size. Is it ok to go upstream and 4.7 branch? BR, Terry gcc/ChangeLog 2012-10-22 Terry Guo terry@arm.com PR target/55019 * config/arm/arm.c (thumb1_expand_prologue): Don't push high regs with live argument regs. gcc/testsuite/ChangeLog 2012-10-22 Terry Guo terry@arm.com PR target/55019 * gcc.target/arm/pr55019.c: New. thumb1-argument-register-issue.patch Description: Binary data
[AARCH64] Merge from upstream trunk r192598
Hi, I have merged upstream trunk into ARM/aarch64-branch, up to r192598. Thanks Sofiane
Re: [PATCH] PowerPC VLE port
On 10/19/2012 02:52 PM, David Edelsohn wrote: How do you want to move forward with the VLE patch? Can you localize more of the changes? David, I have been distracted by other tasks. I expect to revisit VLE this week. However, I won't be able to invest much more time on VLE. I'll look at what else I can do. -- Jim Lemke Mentor Graphics / CodeSourcery Orillia Ontario, +1-613-963-1073
Re: Fix array bound niter estimate (PR middle-end/54937)
+static void
+maybe_lower_iteration_bound (struct loop *loop)
+{
+  pointer_set_t *not_executed_last_iteration = pointer_set_create ();
+  pointer_set_t *visited;
+  struct nb_iter_bound *elt;
+  bool found = false;
+  VEC (basic_block, heap) *queue = NULL;
+
+  for (elt = loop->bounds; elt; elt = elt->next)
+    {
+      if (!elt->is_exit
+	  && elt->bound.ult (loop->nb_iterations_upper_bound))
+	{
+	  found = true;
+	  pointer_set_insert (not_executed_last_iteration, elt->stmt);
+	}
+    }

So you are looking for all stmts a bound was derived from.

Yes, with bound smaller than the current estimate.

+  if (!found)
+    {
+      pointer_set_destroy (not_executed_last_iteration);

create this on-demand in the above loop?

Will do.

+      return;
+    }
+  visited = pointer_set_create ();
+  VEC_safe_push (basic_block, heap, queue, loop->header);
+  pointer_set_insert (visited, loop->header);

pointer-set for BB visited? In most other places we use a [s]bitmap with block numbers.

Will switch to bitmap, though I think it is mostly because this tradition was invented before pointer-set. bitmap has a linear walk in it, pointer-set should scale better.

if we didn't find an exit we reduce count. double_int_one looks magic here, but with the assertion that each queued 'stmt_found' upper bound was less than loop->nb_iterations_upper_bound, subtracting one is certainly conservative. But why not use the maximum estimate from all stmts_found?

Because it is always nb_iterations_upper_bound-1, see the logic in record_estimate. I plan to change this - basically we can change record_estimate to record all statements, not only those dominating the exit, and do Dijkstra's algorithm in this walk looking for the largest upper bound we can reach the loopback with. But as I wrote in the email, I would like to do this incrementally - fix the bug first (possibly for 4.7, too - the bug is there, I am not sure if it can lead to wrong code) and change this next.
It means some further changes throughout niter.c, but little challenge to implement Dijkstra with a double-int based queue. Thus, please add some comments, use a bitmap for visited and rename variables to be more descriptive. Will do, and need to analyze the bounds fortran failures :) Thanks, Honza
Ping [Patch] Fix PR52945
Could someone commit the patch at http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00758.html ? TIA Dominique
[PATCH] Fix PR55011
This fixes PR55011. It seems nothing checks for invalid lattice transitions in VRP, so the following adds that; since we now can produce a lot more UNDEFINED than before, not doing so triggers issues. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-10-22  Richard Biener  rguent...@suse.de

        PR tree-optimization/55011
        * tree-vrp.c (update_value_range): For invalid lattice transitions
        drop to VARYING.

        * gcc.dg/torture/pr55011.c: New testcase.

Index: gcc/tree-vrp.c
===
*** gcc/tree-vrp.c	(revision 192671)
--- gcc/tree-vrp.c	(working copy)
*** update_value_range (const_tree var, valu
*** 819,826 ****
 	     || !vrp_bitmap_equal_p (old_vr->equiv, new_vr->equiv);
 
   if (is_new)
!     set_value_range (old_vr, new_vr->type, new_vr->min, new_vr->max,
!		      new_vr->equiv);
 
   BITMAP_FREE (new_vr->equiv);
--- 819,837 ----
 	     || !vrp_bitmap_equal_p (old_vr->equiv, new_vr->equiv);
 
   if (is_new)
!     {
!       /* Do not allow transitions up the lattice.  The following
!	   is slightly more awkward than just new_vr->type < old_vr->type
!	   because VR_RANGE and VR_ANTI_RANGE need to be considered
!	   the same.  We may not have is_new when transitioning to
!	   UNDEFINED or from VARYING.  */
!       if (new_vr->type == VR_UNDEFINED
!	    || old_vr->type == VR_VARYING)
!	  set_value_range_to_varying (old_vr);
!       else
!	  set_value_range (old_vr, new_vr->type, new_vr->min, new_vr->max,
!			   new_vr->equiv);
!     }
 
   BITMAP_FREE (new_vr->equiv);

Index: gcc/testsuite/gcc.dg/torture/pr55011.c
===
*** gcc/testsuite/gcc.dg/torture/pr55011.c	(revision 0)
--- gcc/testsuite/gcc.dg/torture/pr55011.c	(working copy)
*** 0 ****
--- 1,22 ----
+ /* { dg-do compile } */
+ 
+ char a;
+ 
+ void f(void)
+ {
+   char b = 2;
+ 
+   for(;;)
+   {
+     unsigned short s = 1, *p = &s, *i;
+ 
+     for(*i = 0; *i < 4; ++*i)
+       if(a | (*p /= (b += !!a)) >= 63739)
+         return;
+ 
+     if(!s)
+       a = 0;
+ 
+     for(;;);
+   }
+ }
Re: [PATCH,ARM] Fix PR55019 Incorrectly use live argument register to save high register in thumb1 prologue
On 22/10/12 12:50, Terry Guo wrote: Hi, Attached patch intends to fix bug 55019 which is exposed on 4.7 branch. Although this bug can't be reproduced on trunk, I think this fix is still useful to make trunk more robust. Tested with trunk regression test on cortex-m0 and cortex-m3, no regression found. Also tested with various benchmark like Dhrystone/coremark/eembc_v1 on cortex-m0, no regression on performance and code size. Is it ok to go upstream and 4.7 branch? BR, Terry gcc/ChangeLog 2012-10-22 Terry Guo terry@arm.com PR target/55019 * config/arm/arm.c (thumb1_expand_prologue): Don't push high regs with live argument regs. gcc/testsuite/ChangeLog 2012-10-22 Terry Guo terry@arm.com PR target/55019 * gcc.target/arm/pr55019.c: New. The test isn't thumb1 specific. In fact, it isn't even ARM specific. So I think it should be moved to gcc.dg. Otherwise OK for trunk and 4.7. R.
[PATCH, i386]: Fix length attribute calculation for LEA and addr32 addresses, some improvements
Hello!

We don't need to check for REG_P on base and index, we are sure that non-null RTXes are registers only. Also, we should determine the mode of RTXes in addr32 calculation from original RTXes.

2012-10-22  Uros Bizjak  ubiz...@gmail.com

        * config/i386/i386.c (memory_address_length): Assert that non-null
        base or index RTXes are registers.  Do not check for REG RTXes.
        Determine addr32 prefix from original base and index RTXes.
        Simplify code.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.

Index: config/i386/i386.c
===
--- config/i386/i386.c	(revision 192664)
+++ config/i386/i386.c	(working copy)
@@ -23764,7 +23764,7 @@ memory_address_length (rtx addr, bool lea)
 {
   struct ix86_address parts;
   rtx base, index, disp;
-  int len = 0;
+  int len;
   int ok;
 
   if (GET_CODE (addr) == PRE_DEC
@@ -23776,15 +23776,26 @@ memory_address_length (rtx addr, bool lea)
   ok = ix86_decompose_address (addr, parts);
   gcc_assert (ok);
 
-  if (parts.base && GET_CODE (parts.base) == SUBREG)
-    parts.base = SUBREG_REG (parts.base);
-  if (parts.index && GET_CODE (parts.index) == SUBREG)
-    parts.index = SUBREG_REG (parts.index);
+  len = (parts.seg == SEG_DEFAULT) ? 0 : 1;
+
+  /* If this is not LEA instruction, add the length of addr32 prefix.  */
+  if (TARGET_64BIT && !lea
+      && ((parts.base && GET_MODE (parts.base) == SImode)
+	  || (parts.index && GET_MODE (parts.index) == SImode)))
+    len++;
+
   base = parts.base;
   index = parts.index;
   disp = parts.disp;
 
+  if (base && GET_CODE (base) == SUBREG)
+    base = SUBREG_REG (base);
+  if (index && GET_CODE (index) == SUBREG)
+    index = SUBREG_REG (index);
+
+  gcc_assert (base == NULL_RTX || REG_P (base));
+  gcc_assert (index == NULL_RTX || REG_P (index));
+
   /* Rule of thumb:
        - esp as the base always wants an index,
        - ebp as the base always wants a displacement,
@@ -23797,14 +23808,13 @@ memory_address_length (rtx addr, bool lea)
       /* esp (for its index) and ebp (for its displacement) need
	  the two-byte modrm form.  Similarly for r12 and r13 in 64-bit
	  code.  */
-      if (REG_P (base)
-	  && (base == arg_pointer_rtx
-	      || base == frame_pointer_rtx
-	      || REGNO (base) == SP_REG
-	      || REGNO (base) == BP_REG
-	      || REGNO (base) == R12_REG
-	      || REGNO (base) == R13_REG))
-	len = 1;
+      if (base == arg_pointer_rtx
+	  || base == frame_pointer_rtx
+	  || REGNO (base) == SP_REG
+	  || REGNO (base) == BP_REG
+	  || REGNO (base) == R12_REG
+	  || REGNO (base) == R13_REG)
+	len++;
     }
 
   /* Direct Addressing.  In 64-bit mode mod 00 r/m 5
@@ -23814,7 +23824,7 @@ memory_address_length (rtx addr, bool lea)
      by UNSPEC.  */
   else if (disp && !base && !index)
     {
-      len = 4;
+      len += 4;
      if (TARGET_64BIT)
	{
	  rtx symbol = disp;
@@ -23832,7 +23842,7 @@ memory_address_length (rtx addr, bool lea)
	      || (XINT (symbol, 1) != UNSPEC_GOTPCREL
		  && XINT (symbol, 1) != UNSPEC_PCREL
		  && XINT (symbol, 1) != UNSPEC_GOTNTPOFF)))
-	    len += 1;
+	    len++;
	}
     }
   else
     {
       /* Find the length of the displacement constant.  */
       if (disp)
	{
	  if (base && satisfies_constraint_K (disp))
-	    len = 1;
+	    len += 1;
	  else
-	    len = 4;
+	    len += 4;
	}
       /* ebp always wants a displacement.  Similarly r13.  */
-      else if (base && REG_P (base)
-	       && (REGNO (base) == BP_REG || REGNO (base) == R13_REG))
-	len = 1;
+      else if (base && (REGNO (base) == BP_REG || REGNO (base) == R13_REG))
+	len++;
 
       /* An index requires the two-byte modrm form */
       if (index
	  /* ...like esp (or r12), which always wants an index.  */
	  || base == arg_pointer_rtx
	  || base == frame_pointer_rtx
-	  || (base && REG_P (base)
-	      && (REGNO (base) == SP_REG || REGNO (base) == R12_REG)))
-	len += 1;
+	  || (base && (REGNO (base) == SP_REG || REGNO (base) == R12_REG)))
+	len++;
     }
 
-  switch (parts.seg)
-    {
-    case SEG_FS:
-    case SEG_GS:
-      len += 1;
-      break;
-    default:
-      break;
-    }
-
-  /* If this is not LEA instruction, add the length of addr32 prefix.  */
-  if (TARGET_64BIT && !lea
-      && ((base && GET_MODE (base) == SImode)
-	  || (index && GET_MODE (index) == SImode)))
-    len += 1;
-
   return len;
 }
[PATCH] Fix PR55021
Somehow bogus truncations slipped through in my LTO overflowed INTEGER_CST streaming patch. Oops. Committed as obvious. Richard. 2012-10-22 Richard Biener rguent...@suse.de PR lto/55021 * tree-streamer-in.c (unpack_ts_int_cst_value_fields): Remove bogus truncations. Index: gcc/tree-streamer-in.c === --- gcc/tree-streamer-in.c (revision 192688) +++ gcc/tree-streamer-in.c (working copy) @@ -146,8 +146,8 @@ unpack_ts_base_value_fields (struct bitp static void unpack_ts_int_cst_value_fields (struct bitpack_d *bp, tree expr) { - TREE_INT_CST_LOW (expr) = (unsigned) bp_unpack_var_len_unsigned (bp); - TREE_INT_CST_HIGH (expr) = (unsigned) bp_unpack_var_len_int (bp); + TREE_INT_CST_LOW (expr) = bp_unpack_var_len_unsigned (bp); + TREE_INT_CST_HIGH (expr) = bp_unpack_var_len_int (bp); }
Remove def operands cache, try 2
Hi, On Tue, 11 Sep 2012, Michael Matz wrote: the operands cache is ugly. This patch removes it at least for the def operands, saving three pointers for roughly each normal statement (the pointer in gsbase, and two pointers from def_optype_d). This is relatively easy to do, because all statements except ASMs have at most one def (and one vdef), which themself aren't pointed to by something else, unlike the use operands which have more structure for the SSA web. Performance wise the patch is a slight improvement (1% for some C++ testcases, but relatively noisy, but at least not slower), bootstrap time is unaffected. As the iterator is a bit larger code size increases by 1 promille. The patch is regstrapped on x86_64-linux. If it's approved I'll adjust the WORD count markers in gimple.h, I left it out in this submission as it's just verbose noise in comments. So, 2nd try after some internal feedback. This version changes the operand order of asms to also have the defs at the beginning, which makes the iterators slightly nicer, and joins some more fields of the iterator, though not all that we could merge. Again, if approved I'll adjust the word count markers. Regstrapping on x86_64-linux in progress, speed similar as before. Okay for trunk? Ciao, Michael. -- * tree-ssa-operands.h (struct def_optype_d, def_optype_p): Remove. (ssa_operands.free_defs): Remove. (DEF_OP_PTR, DEF_OP): Remove. (struct ssa_operand_iterator_d): Remove 'defs', add 'flags' members, rename 'phi_stmt' to 'stmt', 'phi_i' to 'i' and 'num_phi' to 'numops'. * gimple.h (gimple_statement_with_ops.def_ops): Remove. (gimple_def_ops, gimple_set_def_ops): Remove. (gimple_vdef_op): Don't take const gimple, adjust. (gimple_asm_input_op, gimple_asm_input_op_ptr, gimple_asm_set_input_op, gimple_asm_output_op, gimple_asm_output_op_ptr, gimple_asm_set_output_op): Adjust asserts, and rewrite to move def operands to front. 
(gimple_asm_clobber_op, gimple_asm_set_clobber_op, gimple_asm_label_op, gimple_asm_set_label_op): Correct asserts. * tree-ssa-operands.c (build_defs): Remove. (init_ssa_operands): Don't initialize it. (fini_ssa_operands): Don't free it. (cleanup_build_arrays): Don't truncate it. (finalize_ssa_stmt_operands): Don't assert on it. (alloc_def, add_def_op, append_def): Remove. (finalize_ssa_defs): Remove building of def_ops list. (finalize_ssa_uses): Don't mark for SSA renaming here, ... (add_stmt_operand): ... but here, don't call append_def. (get_indirect_ref_operands): Remove recurse_on_base argument. (get_expr_operands): Adjust call to get_indirect_ref_operands. (verify_ssa_operands): Don't check def operands. (free_stmt_operands): Don't free def operands. * gimple.c (gimple_copy): Don't clear def operands. * tree-flow-inline.h (op_iter_next_use): Adjust to explicitely handle def operand. (op_iter_next_tree, op_iter_next_def): Ditto. (clear_and_done_ssa_iter): Clear new fields. (op_iter_init): Adjust to setup new iterator structure. (op_iter_init_phiuse): Adjust. Index: tree-ssa-operands.h === --- tree-ssa-operands.h.orig2012-09-24 15:24:52.0 +0200 +++ tree-ssa-operands.h 2012-10-22 15:12:30.0 +0200 @@ -34,14 +34,6 @@ typedef ssa_use_operand_t *use_operand_p #define NULL_USE_OPERAND_P ((use_operand_p)NULL) #define NULL_DEF_OPERAND_P ((def_operand_p)NULL) -/* This represents the DEF operands of a stmt. */ -struct def_optype_d -{ - struct def_optype_d *next; - tree *def_ptr; -}; -typedef struct def_optype_d *def_optype_p; - /* This represents the USE operands of a stmt. 
*/
struct use_optype_d
{
@@ -68,7 +60,6 @@ struct GTY(()) ssa_operands {
   bool ops_active;
 
-  struct def_optype_d * GTY ((skip ())) free_defs;
   struct use_optype_d * GTY ((skip ())) free_uses;
 };
 
@@ -82,9 +73,6 @@ struct GTY(()) ssa_operands {
 #define USE_OP_PTR(OP)		(&((OP)->use_ptr))
 #define USE_OP(OP)		(USE_FROM_PTR (USE_OP_PTR (OP)))
 
-#define DEF_OP_PTR(OP)		((OP)->def_ptr)
-#define DEF_OP(OP)		(DEF_FROM_PTR (DEF_OP_PTR (OP)))
-
 #define PHI_RESULT_PTR(PHI)	gimple_phi_result_ptr (PHI)
 #define PHI_RESULT(PHI)		DEF_FROM_PTR (PHI_RESULT_PTR (PHI))
 #define SET_PHI_RESULT(PHI, V)	SET_DEF (PHI_RESULT_PTR (PHI), (V))
@@ -133,13 +121,13 @@ enum ssa_op_iter_type {
 typedef struct ssa_operand_iterator_d
 {
-  bool done;
   enum ssa_op_iter_type iter_type;
-  def_optype_p defs;
+  bool done;
+  int flags;
+  unsigned i;
+  unsigned numops;
   use_optype_p uses;
-  int phi_i;
-  int num_phi;
-  gimple phi_stmt;
+  gimple stmt;
 } ssa_op_iter;

/* These flags are used to
[PATCH, ARM] arm_return_in_msb needs to handle TImode.
Hi, I observed the following failure on arm big-endian: FAIL: tmpdir-g++.dg-struct-layout-1/t024 cp_compat_x_tst.o compile, (internal compiler error) The compiler is configured as: armeb-montavista-linux-gnueabi-gcc -v Using built-in specs. COLLECT_GCC=./armeb-tools/bin/armeb-montavista-linux-gnueabi-gcc COLLECT_LTO_WRAPPER=/home/manjunath/NCDtools/mips/toolchain/armeb-tools/bin/../libexec/gcc/armeb-montavista-linux-gnueabi/4.7.0/lto-wrapper Target: armeb-montavista-linux-gnueabi Configured with: /home/manjunath/NCDtools/mips/toolchain/scripts/../src/configure --disable-fixed-point --without-ppl --without-python --disable-werror --enable-checking --with-sysroot --with-local-prefix=/home/manjunath/NCDtools/mips/toolchain/scripts/../armeb-tools/armeb-montavista-linux-gnueabi/sys-root --disable-sim --enable-symvers=gnu --enable-__cxa_atexit --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-tune=cortex-a9 --target=armeb-montavista-linux-gnueabi --enable-languages=c,c++ --prefix=/home/manjunath/NCDtools/mips/toolchain/scripts/../armeb-tools Thread model: posix gcc version 4.7.0 () Debugging shows that ITmode is not handled by arm_return_in_msb debug snip ... {{{ void test2001() void test2002() void test2003() void test2004() void test2005() Breakpoint 1, shift_return_value (mode=TImode, left_p=0 '\000', value=0x7033b380) at /home/manjunath/NCDtools/mips/toolchain/scripts/../src/gcc/calls.c:2127 2127 gcc_assert (REG_P (value) HARD_REGISTER_P (value)); (gdb) p mode $1 = TImode (gdb) p left_p $2 = 0 '\000' (gdb) p debug_rtx(value) (parallel:TI [ (expr_list:REG_DEP_TRUE (reg:DI 63 s0) (const_int 0 [0])) (expr_list:REG_DEP_TRUE (reg:DI 65 s2) (const_int 8 [0x8])) ]) $3 = void }}} I have attached the patch which fixes the above problem, kindly review the patch and accept it for mainline. Regards, Manjunath S Matti. TImode_fix.patch Description: TImode_fix.patch
Re: [PATCH] Fix PR55011
Hi, On Mon, 22 Oct 2012, Richard Biener wrote: This fixes PR55011, it seems nothing checks for invalid lattice transitions in VRP, That makes sense, because the individual parts of VRP that produce new ranges are supposed to not generate invalid transitions. So if anything such checking should be an assert and the causes be fixed. so the following adds that It's a work around ... since we now can produce a lot more UNDEFINED than before ... for this. We should never produce UNDEFINED when the input wasn't UNDEFINED already. not doing so triggers issues. Hmm? Ciao, Michael.
Re: [PATCH, ARM] arm_return_in_msb needs to handle TImode.
On 22/10/12 15:14, Matti, Manjunath wrote: Hi, I observed the following failure on arm big-endian: FAIL: tmpdir-g++.dg-struct-layout-1/t024 cp_compat_x_tst.o compile, (internal compiler error) The compiler is configured as: armeb-montavista-linux-gnueabi-gcc -v Using built-in specs. COLLECT_GCC=./armeb-tools/bin/armeb-montavista-linux-gnueabi-gcc COLLECT_LTO_WRAPPER=/home/manjunath/NCDtools/mips/toolchain/armeb-tools/bin/../libexec/gcc/armeb-montavista-linux-gnueabi/4.7.0/lto-wrapper Target: armeb-montavista-linux-gnueabi Configured with: /home/manjunath/NCDtools/mips/toolchain/scripts/../src/configure --disable-fixed-point --without-ppl --without-python --disable-werror --enable-checking --with-sysroot --with-local-prefix=/home/manjunath/NCDtools/mips/toolchain/scripts/../armeb-tools/armeb-montavista-linux-gnueabi/sys-root --disable-sim --enable-symvers=gnu --enable-__cxa_atexit --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-tune=cortex-a9 --target=armeb-montavista-linux-gnueabi --enable-languages=c,c++ --prefix=/home/manjunath/NCDtools/mips/toolchain/scripts/../armeb-tools Thread model: posix gcc version 4.7.0 () Debugging shows that ITmode is not handled by arm_return_in_msb debug snip ... {{{ void test2001() void test2002() void test2003() void test2004() void test2005() Breakpoint 1, shift_return_value (mode=TImode, left_p=0 '\000', value=0x7033b380) at /home/manjunath/NCDtools/mips/toolchain/scripts/../src/gcc/calls.c:2127 2127 gcc_assert (REG_P (value) HARD_REGISTER_P (value)); (gdb) p mode $1 = TImode (gdb) p left_p $2 = 0 '\000' (gdb) p debug_rtx(value) (parallel:TI [ (expr_list:REG_DEP_TRUE (reg:DI 63 s0) (const_int 0 [0])) (expr_list:REG_DEP_TRUE (reg:DI 65 s2) (const_int 8 [0x8])) ]) $3 = void }}} I have attached the patch which fixes the above problem, kindly review the patch and accept it for mainline. Regards, Manjunath S Matti. That doesn't look right. The test is far too specific.
Even if this is the right place for the fix (and I'm yet to be convinced that it is), you should be testing that the size of mode is less than some limit, not that it's not a specific mode. R.
Re: [PATCH] Fix PR55011
On Mon, 22 Oct 2012, Michael Matz wrote: Hi, On Mon, 22 Oct 2012, Richard Biener wrote: This fixes PR55011, it seems nothing checks for invalid lattice transitions in VRP, That makes sense, because the individual parts of VRP that produce new ranges are supposed to not generate invalid transitions. So if anything such checking should be an assert and the causes be fixed. No, the checking should be done in update_value_range which copies the new VR over to the lattice. The job of that function is also to detect lattice changes. so the following adds that It's a work around ... No. since we now can produce a lot more UNDEFINED than before ... for this. We should never produce UNDEFINED when the input wasn't UNDEFINED already. Why? We shouldn't update the lattice this way, yes, but that is what the patch ensures. The workers only compute a new value-range for a stmt based on input value ranges. not doing so triggers issues. Hmm? It oscillates and thus never finishes. Richard.
Re: [Patch, Fortran] PR 54997: -Wunused-function gives false warnings for procedures passed as actual argument
Minor update to the patch: It now also sets TREE_USED for entry masters in order to avoid bogus warnings for procedures with ENTRY (cf. comment 6 in the PR, which like comment 0 is a 4.8 regression). Still regtests cleanly. Ok? Cheers, Janus 2012/10/21 Janus Weil ja...@gcc.gnu.org: Hi all, here is another patch to silence some more of the bogus warnings about unused functions that gfortran is currently throwing (cf. also the previous patch for PR 54224). It fixes the usage of the 'referenced' attribute, which should only be given to procedures which are actually 'used' (called/referenced). Then TREE_USED is set according to this attribute, which in turn silences the warning in the middle-end. The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk? Cheers, Janus 2012-10-21 Janus Weil ja...@gcc.gnu.org PR fortran/54997 * decl.c (match_procedure_decl): Don't set 'referenced' attribute for PROCEDURE declarations. * parse.c (gfc_fixup_sibling_symbols,parse_contained): Don't set 'referenced' attribute for all contained procedures. * trans-decl.c (gfc_get_symbol_decl): Allow for unreferenced procedures. (build_function_decl): Set TREE_USED for referenced procedures. 2012-10-21 Janus Weil ja...@gcc.gnu.org PR fortran/54997 * gfortran.dg/warn_unused_function_2.f90: New. warn_unused_function_2.f90 Description: Binary data pr54997_v2.diff Description: Binary data
Re: [PATCH] Fix PR55011
Hi, On Mon, 22 Oct 2012, Richard Biener wrote: On Mon, 22 Oct 2012, Michael Matz wrote: Hi, On Mon, 22 Oct 2012, Richard Biener wrote: This fixes PR55011, it seems nothing checks for invalid lattice transitions in VRP, That makes sense, because the individual parts of VRP that produce new ranges are supposed to not generate invalid transitions. So if anything such checking should be an assert and the causes be fixed. No, the checking should be done in update_value_range Exactly. And that's the routine you're changing, but you aren't adding checking, you silently fix invalid transitions. What I tried to say is that the one calling update_value_range with new_vr being UNDEFINED is wrong, and update_value_range shouldn't fix it, but assert, so that this wrong caller may be fixed. which copies the new VR over to the lattice. The job of that function is also to detect lattice changes. Sure, but not to fix invalid input. so the following adds that It's a work around ... No. since we now can produce a lot more UNDEFINED than before ... for this. We should never produce UNDEFINED when the input wasn't UNDEFINED already. Why? Because doing so _always_ means an invalid lattice transition. UNDEFINED is TOP, anything not UNDEFINED is not TOP. So going from something to UNDEFINED is always going upward the lattice and hence in the wrong direction. We shouldn't update the lattice this way, yes, but that is what the patch ensures. An assert ensures. A work around works around a problem. I say that the problem is in those routines that produced the new UNDEFINED range in the first place, and it's not update_value_range's job to fix that after the fact. The workers only compute a new value-range for a stmt based on input value ranges. And if they produce UNDEFINED when the input wasn't so, then _that's_ where the bug is. not doing so triggers issues. Hmm? It oscillates and thus never finishes. I'm not sure I understand. 
You claim that the workers have to produce UNDEFINED from non-UNDEFINED in some cases, otherwise we oscillate? That sounds strange. Or do you mean that we oscillate without your patch to update_value_range? That I believe, it's the natural result of going a lattice the wrong way, but I say that update_value_range is not the place to silently fix invalid transitions. Ciao, Michael.
Re: Fix array bound niter estimate (PR middle-end/54937)
Hi, here is updated patch with the comments. The fortran failures turned out to be funny interaction in between this patch and my other change that hoped that loop closed SSA is closed on VOPs, but it is not. Regtested x86_64-linux, bootstrap in progress, OK? Honza * tree-ssa-loop-niter.c (record_estimate): Do not try to lower the bound of non-is_exit statements. (maybe_lower_iteration_bound): Do it here. (estimate_numbers_of_iterations_loop): Call it. * gcc.c-torture/execute/pr54937.c: New testcase. * gcc.dg/tree-ssa/cunroll-2.c: Update. Index: tree-ssa-loop-niter.c === --- tree-ssa-loop-niter.c (revision 192632) +++ tree-ssa-loop-niter.c (working copy) @@ -2535,7 +2541,6 @@ record_estimate (struct loop *loop, tree gimple at_stmt, bool is_exit, bool realistic, bool upper) { double_int delta; - edge exit; if (dump_file (dump_flags TDF_DETAILS)) { @@ -2570,14 +2577,10 @@ record_estimate (struct loop *loop, tree } /* Update the number of iteration estimates according to the bound. - If at_stmt is an exit or dominates the single exit from the loop, - then the loop latch is executed at most BOUND times, otherwise - it can be executed BOUND + 1 times. */ - exit = single_exit (loop); - if (is_exit - || (exit != NULL - dominated_by_p (CDI_DOMINATORS, -exit-src, gimple_bb (at_stmt + If at_stmt is an exit then the loop latch is executed at most BOUND times, + otherwise it can be executed BOUND + 1 times. We will lower the estimate + later if such statement must be executed on last iteration */ + if (is_exit) delta = double_int_zero; else delta = double_int_one; @@ -2953,6 +2956,110 @@ gcov_type_to_double_int (gcov_type val) return ret; } +/* See if every path cross the loop goes through a statement that is known + to not execute at the last iteration. In that case we can decrese iteration + count by 1. 
*/ + +static void +maybe_lower_iteration_bound (struct loop *loop) +{ + struct pointer_set_t *not_executed_last_iteration = NULL; + struct nb_iter_bound *elt; + bool found_exit = false; + VEC (basic_block, heap) *queue = NULL; + bitmap visited; + + /* Collect all statements with interesting (i.e. lower than + nb_iterations_upper_bound) bound on them. + + TODO: Due to the way record_estimate chooses estimates to store, the bounds + will be always nb_iterations_upper_bound-1. We can change this to record + also statements not dominating the loop latch and update the walk below + to the shortest path algorithm. */ + for (elt = loop->bounds; elt; elt = elt->next) +{ + if (!elt->is_exit + && elt->bound.ult (loop->nb_iterations_upper_bound)) + { + if (!not_executed_last_iteration) + not_executed_last_iteration = pointer_set_create (); + pointer_set_insert (not_executed_last_iteration, elt->stmt); + } +} + if (!not_executed_last_iteration) +return; + + /* Start DFS walk in the loop header and see if we can reach the + loop latch or any of the exits (including statements with side + effects that may terminate the loop otherwise) without visiting + any of the statements known to have undefined effect on the last + iteration. */ + VEC_safe_push (basic_block, heap, queue, loop->header); + visited = BITMAP_ALLOC (NULL); + bitmap_set_bit (visited, loop->header->index); + found_exit = false; + + do +{ + basic_block bb = VEC_pop (basic_block, queue); + gimple_stmt_iterator gsi; + bool stmt_found = false; + + /* Loop for possible exits and statements bounding the execution. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple stmt = gsi_stmt (gsi); + if (pointer_set_contains (not_executed_last_iteration, stmt)) + { + stmt_found = true; + break; + } + if (gimple_has_side_effects (stmt)) + { + found_exit = true; + break; + } + } + if (found_exit) + break; + + /* If no bounding statement is found, continue the walk.
*/ + if (!stmt_found) + { + edge e; + edge_iterator ei; + + FOR_EACH_EDGE (e, ei, bb->succs) + { + if (loop_exit_edge_p (loop, e) + || e == loop_latch_edge (loop)) + { + found_exit = true; + break; + } + if (bitmap_set_bit (visited, e->dest->index)) + VEC_safe_push (basic_block, heap, queue, e->dest); + } + } +} + while (VEC_length (basic_block, queue) && !found_exit); + + /* If every path through the loop reaches a bounding statement before exit, + then we know
Minor record_upper_bound tweak
Hi, with profile feedback we may mis-update the profile and start to believe that loops iterate more times than they do. This patch makes at least nb_iterations_estimate no greater than nb_iterations_upper_bound. This makes the unrolling/peeling/unswitching heuristics behave more consistently. Bootstrapped/regtested x86_64-linux, OK? Honza * tree-ssa-loop-niter.c (record_niter_bound): Be sure that realistic estimate is not bigger than upper bound. Index: tree-ssa-loop-niter.c === --- tree-ssa-loop-niter.c (revision 192632) +++ tree-ssa-loop-niter.c (working copy) @@ -2506,13 +2506,20 @@ record_niter_bound (struct loop *loop, d { loop->any_upper_bound = true; loop->nb_iterations_upper_bound = i_bound; + if (loop->any_estimate + && i_bound.ult (loop->nb_iterations_estimate)) +loop->nb_iterations_estimate = i_bound; } if (realistic && (!loop->any_estimate || i_bound.ult (loop->nb_iterations_estimate))) { loop->any_estimate = true; - loop->nb_iterations_estimate = i_bound; + if (loop->nb_iterations_upper_bound.ult (i_bound) + && loop->any_upper_bound) +loop->nb_iterations_estimate = loop->nb_iterations_upper_bound; + else +loop->nb_iterations_estimate = i_bound; } /* If an upper bound is smaller than the realistic estimate of the
Loop closed SSA loop update
Hi, this patch updates tree_unroll_loops_completely to update loop closed SSA. When unlooping the loop some basic blocks may move out of the other loops, and that makes it necessary to check their uses and add PHIs. Fortunately update_loop_close_ssa already supports local updates and thus this can be done quite cheaply by recording the blocks in fix_bb_placements and passing it along. I tried the patch with TODO_update_ssa_no_phi but that causes a weird bug in 3 fortran testcases because VOPs seem to not be in the loop closed form. We can track this incrementally I suppose. Bootstrapped/regtested x86_64-linux, OK? Honza PR middle-end/54967 * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Take loop_closed_ssa_invalidated parameter; pass it along. (canonicalize_loop_induction_variables): Update loop closed SSA. (tree_unroll_loops_completely): Likewise. * cfgloop.h (unloop): Update prototype. * cfgloopmanip.c (fix_bb_placements): Record BBs updated into optional bitmap. (unloop): Update to pass along loop_closed_ssa_invalidated. * gfortran.dg/pr54967.f90: New testcase. Index: tree-ssa-loop-ivcanon.c === --- tree-ssa-loop-ivcanon.c (revision 192632) +++ tree-ssa-loop-ivcanon.c (working copy) @@ -390,13 +390,16 @@ loop_edge_to_cancel (struct loop *loop) EXIT is the exit of the loop that should be eliminated. IRRED_INVALIDATED is used to bookkeep if information about irreducible regions may become invalid as a result - of the transformation. */ + of the transformation. + LOOP_CLOSED_SSA_INVALIDATED is used to bookkeep the case + when we need to go into loop closed SSA form. */ static bool try_unroll_loop_completely (struct loop *loop, edge exit, tree niter, enum unroll_level ul, - bool *irred_invalidated) + bool *irred_invalidated, + bitmap loop_closed_ssa_invalidated) { unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns; gimple cond; @@ -562,7 +565,7 @@ try_unroll_loop_completely (struct loop locus = latch_edge->goto_locus; /* Unloop destroys the latch edge.
*/ - unloop (loop, irred_invalidated); + unloop (loop, irred_invalidated, loop_closed_ssa_invalidated); /* Create new basic block for the latch edge destination and wire it in. */ @@ -615,7 +618,8 @@ static bool canonicalize_loop_induction_variables (struct loop *loop, bool create_iv, enum unroll_level ul, bool try_eval, - bool *irred_invalidated) + bool *irred_invalidated, + bitmap loop_closed_ssa_invalidated) { edge exit = NULL; tree niter; @@ -663,7 +667,8 @@ canonicalize_loop_induction_variables (s (int)max_loop_iterations_int (loop)); } - if (try_unroll_loop_completely (loop, exit, niter, ul, irred_invalidated)) + if (try_unroll_loop_completely (loop, exit, niter, ul, irred_invalidated, + loop_closed_ssa_invalidated)) return true; if (create_iv @@ -683,13 +688,15 @@ canonicalize_induction_variables (void) struct loop *loop; bool changed = false; bool irred_invalidated = false; + bitmap loop_closed_ssa_invalidated = BITMAP_ALLOC (NULL); FOR_EACH_LOOP (li, loop, 0) { changed |= canonicalize_loop_induction_variables (loop, true, UL_SINGLE_ITER, true, - &irred_invalidated); + &irred_invalidated, + loop_closed_ssa_invalidated); } gcc_assert (!need_ssa_update_p (cfun)); @@ -701,6 +708,13 @@ canonicalize_induction_variables (void) evaluation could reveal new information. */ scev_reset (); + if (!bitmap_empty_p (loop_closed_ssa_invalidated)) +{ + gcc_checking_assert (loops_state_satisfies_p (LOOP_CLOSED_SSA)); + rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa); +} + BITMAP_FREE (loop_closed_ssa_invalidated); + if (changed) return TODO_cleanup_cfg; return 0; @@ -794,11 +808,15 @@ tree_unroll_loops_completely (bool may_i bool changed; enum unroll_level ul; int iteration = 0; + bool irred_invalidated = false; do { - bool irred_invalidated = false; changed = false; + bitmap loop_closed_ssa_invalidated = NULL; + + if (loops_state_satisfies_p (LOOP_CLOSED_SSA)) + loop_closed_ssa_invalidated = BITMAP_ALLOC (NULL);
Patch ping
Hi! http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01538.html - PR54844 with lots of dups, C++ FE ICE with sizeof in template http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01700.html - PR54970 small DW_OP_GNU_implicit_pointer improvements - the dwarf2out.c and tree-sra.c bits of the patch already acked, but cfgexpand.c and var-tracking.c bits are not Jakub
Re: [PATCH] Intrinsics for fxsave[,64], xsave[,64], xsaveopt[,64]
On Mon, Oct 22, 2012 at 5:25 PM, Alexander Ivchenko aivch...@gmail.com wrote: Please take a look at the updated patch. There is, thanks to Uros, a changed expander and asm patterns. Considering H.J.'s comments: 1) Yes, I added the new option -mxsaveopt 2) No. The FXSAVE and FXRSTOR instructions are not considered part of the SSE instruction group. 3) Done. 4) Fixed. 5) I'm not sure, there was already BIT_FXSAVE in cpuid.h, that had been used in /libgcc/config/i386/crtfastmath.c. I didn't change that. Maybe it would be enough to change the option name from -mfxsave to -mfxsr? 6) Not sure about the list of all processors that support those features. I added it to those I know support them. 7) Done. Restore-type insns do not store to memory, but read memory, so they should be defined like: [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "m")] UNSPECV_FXRSTOR)] Where save-type insn should look like: [(set (match_operand:BLK 0 "memory_operand" "=m") (unspec_volatile:BLK [(const_int 0)] UNSPECV_FXSAVE))] When they also read additional registers: [(unspec_volatile:BLK [(match_operand:BLK 0 "memory_operand" "m") (match_operand:SI 1 "register_operand" "a") (match_operand:SI 2 "register_operand" "d")] UNSPECV_XRSTOR)] and [(set (match_operand:BLK 0 "memory_operand" "=m") (unspec_volatile:BLK [(match_operand:SI 1 "register_operand" "a") (match_operand:SI 2 "register_operand" "d")] UNSPECV_XSAVE))] (And in a similar way the 32-bit patterns with a DImode operand.) I missed this detail in my previous review. BTW: BLKmode is a bit unusual, so I hope these patterns work as expected. Also, please do not use "mem" and "mask" in the headers; use "__P" and "__M" for pointer and mask, as is the case in other headers. Uros.
Re: Minimize downward code motion during reassociation
On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassociation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c (assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs.
*/ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe->op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe->op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). The reason I call the above multiple times is that gsi_move_before might get called between two calls to the above. For instance, after rewrite_expr_tree is called once, the following sequence of calls could happen: reassociate_bb -> linearize_expr_tree -> linearize_expr -> gsi_move_before. So it is not sufficient to call renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to use renumber_gimple_stmt_uids_in_blocks instead of assign_uids_in_relevant_bbs? +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i < len; i++) +{ Likewise you call this for each call to rewrite_expr_tree, so it seems to me this is quadratic in the number of ops in the op vector. The call to ensure_ops_are_available inside rewrite_expr_tree is guarded by if (!moved) and I am setting moved = true there to ensure that ensure_ops_are_available inside is called once per reassociation of a expression tree.
Why make this all so complicated? It seems to me that we should fixup stmt order only after the whole ops vector has been materialized. + operand_entry_t oe = VEC_index (operand_entry_t, ops, i); + gimple def_stmt; + basic_block def_bb; + /* Ignore constants and operands with default definitions. */ + if (TREE_CODE (oe->op) != SSA_NAME + || SSA_NAME_IS_DEFAULT_DEF (oe->op)) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe->op); + def_bb = gimple_bb (def_stmt); + if (def_bb != insert_bb + && !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb)) +{ + insert_bb = def_bb; + insert_stmt = def_stmt; +} + else if (def_bb == insert_bb + && gimple_uid (insert_stmt) < gimple_uid (def_stmt)) +insert_stmt = def_stmt; +} + if (insert_stmt == stmt) +return; + gsistmt = gsi_for_stmt (stmt); + /* If GSI_STMT is a phi node, then do not insert just
Re: Fix bugs introduced by switch-case profile propagation
Ping. On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman era...@google.com wrote: Hi, This patch fixes bugs introduced by my previous patch to propagate profiles during switch expansion. Bootstrap and profiledbootstrap successful on x86_64. Confirmed that it fixes the crashes reported in PR middle-end/54957. OK for trunk? - Easwaran 2012-10-17 Easwaran Raman era...@google.com PR target/54938 PR middle-end/54957 * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note only if it doesn't already exist. * except.c (sjlj_emit_function_enter): Remove unused variable. * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. (emit_case_dispatch_table): Handle the case where STMT_BB is NULL. (expand_sjlj_dispatch_table): Pass BB containing before_case to emit_case_dispatch_table. Index: gcc/optabs.c === --- gcc/optabs.c (revision 192488) +++ gcc/optabs.c (working copy) @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m && profile_status != PROFILE_ABSENT && insn && JUMP_P (insn) - && any_condjump_p (insn)) -{ - gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0)); - add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); -} + && any_condjump_p (insn) + && !find_reg_note (insn, REG_BR_PROB, 0)) +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); } /* Generate code to compare X with Y so that the condition codes are Index: gcc/except.c === --- gcc/except.c (revision 192488) +++ gcc/except.c (working copy) @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label) if (dispatch_label) { #ifdef DONT_USE_BUILTIN_SETJMP - rtx x, last; + rtx x; x = emit_library_call_value (setjmp_libfunc, NULL_RTX, LCT_RETURNS_TWICE, TYPE_MODE (integer_type_node), 1, plus_constant (Pmode, XEXP (fc, 0), Index: gcc/stmt.c === --- gcc/stmt.c (revision 192488) +++ gcc/stmt.c (working copy) @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb) edge e; edge_iterator ei; int prob_sum = 0; + if (!bb) +return 0; FOR_EACH_EDGE(e, ei, bb->succs) prob_sum += e->probability; return prob_sum; @@ 
-1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in rtx fallback_label = label_rtx (case_list->code_label); rtx table_label = gen_label_rtx (); bool has_gaps = false; - edge default_edge = EDGE_SUCC(stmt_bb, 0); - int default_prob = default_edge->probability; + edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL; + int default_prob = default_edge ? default_edge->probability : 0; int base = get_outgoing_edge_probs (stmt_bb); bool try_with_tablejump = false; @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in default_prob = 0; } - default_edge->probability = default_prob; + if (default_edge) +default_edge->probability = default_prob; /* We have altered the probability of the default edge. So the probabilities of all other edges need to be adjusted so that it sums up to @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index, emit_case_dispatch_table (index_expr, index_type, case_list, default_label, - minval, maxval, range, NULL); + minval, maxval, range, +BLOCK_FOR_INSN (before_case)); emit_label (default_label); free_alloc_pool (case_node_pool); }
[PATCH] Fix CSE RTL sharing ICE (PR rtl-optimization/55010)
Hi! On the following testcase we have IF_THEN_ELSE in insn notes, and when folding it, folded_arg1 is a subreg from earlier CC setter, as the other argument has equiv constant, simplify_relational_operation is called on it to simplify it and we end up with invalid RTL sharing of the subreg in between the CC setter insn and the insn with the REG_EQ* note. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-10-22 Jakub Jelinek ja...@redhat.com PR rtl-optimization/55010 * cse.c (fold_rtx): Call copy_rtx on folded_arg{0,1} before passing it to simplify_relational_operation. * gcc.dg/pr55010.c: New test. --- gcc/cse.c.jj 2012-10-16 13:15:45.0 +0200 +++ gcc/cse.c 2012-10-22 10:44:34.100033945 +0200 @@ -3461,8 +3461,8 @@ fold_rtx (rtx x, rtx insn) } { - rtx op0 = const_arg0 ? const_arg0 : folded_arg0; - rtx op1 = const_arg1 ? const_arg1 : folded_arg1; + rtx op0 = const_arg0 ? const_arg0 : copy_rtx (folded_arg0); + rtx op1 = const_arg1 ? const_arg1 : copy_rtx (folded_arg1); new_rtx = simplify_relational_operation (code, mode, mode_arg0, op0, op1); } break; --- gcc/testsuite/gcc.dg/pr55010.c.jj 2012-10-22 10:47:47.289857369 +0200 +++ gcc/testsuite/gcc.dg/pr55010.c 2012-10-22 10:47:33.0 +0200 @@ -0,0 +1,13 @@ +/* PR rtl-optimization/55010 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-additional-options "-march=i686" { target { { i?86-*-* x86_64-*-* } && ia32 } } } */ + +long long int a; +unsigned long long int b; + +void +foo (void) +{ + a = (a 0) / ((a -= b) ? b = ((b = a) || 0) : 0); +} Jakub
[C++ PATCH] Fix cplus_decl_attributes (PR c++/54988)
Hi! cplus_decl_attributes assumes that if attributes is NULL, there is nothing to do in decl_attributes, unfortunately that call can add implicit attributes based on currently active pragmas, at least for FUNCTION_DECLs. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-10-22 Jakub Jelinek ja...@redhat.com PR c++/54988 * decl2.c (cplus_decl_attributes): Don't return early if attributes is NULL. * c-c++-common/pr54988.c: New test. --- gcc/cp/decl2.c.jj 2012-10-08 21:37:27.0 +0200 +++ gcc/cp/decl2.c 2012-10-22 12:43:04.994700609 +0200 @@ -1309,8 +1309,7 @@ void cplus_decl_attributes (tree *decl, tree attributes, int flags) { if (*decl == NULL_TREE || *decl == void_type_node - || *decl == error_mark_node - || attributes == NULL_TREE) + || *decl == error_mark_node) return; if (processing_template_decl) @@ -1319,8 +1318,6 @@ cplus_decl_attributes (tree *decl, tree return; save_template_attributes (attributes, decl); - if (attributes == NULL_TREE) - return; } cp_check_const_attributes (attributes); --- gcc/testsuite/c-c++-common/pr54988.c.jj 2012-10-22 12:50:56.332853880 +0200 +++ gcc/testsuite/c-c++-common/pr54988.c 2012-10-22 12:50:04.0 +0200 @@ -0,0 +1,20 @@ +/* PR c++/54988 */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */ + +#if defined(__i386__) || defined(__x86_64__) +#pragma GCC target ("fpmath=sse") +#endif + +static inline __attribute__ ((always_inline)) int +foo (int x) +{ + return x; +} + +int +bar (int x) +{ + return foo (x); +} Jakub
[PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
Hi! On the following testcase we have two endless loops before cddce2: Sender_signal (int Connect) { int State; unsigned int occurrence; <bb 2>: if (Connect_6(D) != 0) goto <bb 8>; else goto <bb 7>; <bb 3>: # occurrence_8 = PHI <0(7), occurrence_12(4)> occurrence_12 = occurrence_8 + 1; __builtin_printf ("Sender_Signal occurrence %u\n", occurrence_12); <bb 4>: goto <bb 3>; <bb 5>: <bb 6>: goto <bb 5>; <bb 7>: goto <bb 3>; <bb 8>: goto <bb 5>; } The problem is the two empty bbs on the path from the conditional at the end of bb2 and the endless loops (i.e. bb7 and bb8). In presence of infinite loops dominance.c adds fake edges to exit pretty arbitrarily (it uses FOR_EACH_BB_REVERSE and for unconnected bbs computes post-dominance and adds fake edges to exit), so with the above testcase both bb7 and bb8 have the exit block as immediate post-dominator, so find_control_dependence stops at those bb's when starting from the 2->7 resp. 2->8 edges. bb7/bb8 don't have a control stmt at the end, so mark_last_stmt_necessary doesn't mark any stmt as necessary in them and thus the if (Connect_6(D) != 0) GIMPLE_COND is never marked as necessary and the whole endless loop with printfs in it is removed. The following patch fixes it by detecting such problematic blocks and recursing on them in mark_control_dependence_edges_necessary. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2012-10-22 Jakub Jelinek ja...@redhat.com PR tree-optimization/55018 * tree-ssa-dce.c (mark_last_stmt_necessary): Return bool whether mark_stmt_necessary was called. (mark_control_dependence_edges_necessary): Recurse on cd_bb if mark_last_stmt_necessary hasn't marked a control stmt, cd_bb has exit block as immediate dominator and a single succ edge. * gcc.dg/torture/pr55018.c: New test. --- gcc/tree-ssa-dce.c.jj 2012-08-15 10:55:33.0 +0200 +++ gcc/tree-ssa-dce.c 2012-10-22 16:50:03.011497546 +0200 @@ -381,7 +381,7 @@ mark_stmt_if_obviously_necessary (gimple /* Mark the last statement of BB as necessary.
*/ -static void +static bool mark_last_stmt_necessary (basic_block bb) { gimple stmt = last_stmt (bb); @@ -391,7 +391,11 @@ mark_last_stmt_necessary (basic_block bb /* We actually mark the statement only if it is a control statement. */ if (stmt && is_ctrl_stmt (stmt)) -mark_stmt_necessary (stmt, true); +{ + mark_stmt_necessary (stmt, true); + return true; +} + return false; } @@ -423,8 +427,18 @@ mark_control_dependent_edges_necessary ( continue; } - if (!TEST_BIT (last_stmt_necessary, cd_bb->index)) - mark_last_stmt_necessary (cd_bb); + if (!TEST_BIT (last_stmt_necessary, cd_bb->index) + && !mark_last_stmt_necessary (cd_bb)) + { + /* In presence of infinite loops, some bbs on a path +to an infinite loop might not end with a control stmt, +but due to a fake edge to exit stop find_control_dependence. +Recurse for those. */ + if (get_immediate_dominator (CDI_POST_DOMINATORS, cd_bb) + == EXIT_BLOCK_PTR + && single_succ_p (cd_bb)) + mark_control_dependent_edges_necessary (cd_bb, el, false); + } } if (!skipped) --- gcc/testsuite/gcc.dg/torture/pr55018.c.jj 2012-10-22 16:53:56.623083723 +0200 +++ gcc/testsuite/gcc.dg/torture/pr55018.c 2012-10-22 16:54:21.278934668 +0200 @@ -0,0 +1,22 @@ +/* PR tree-optimization/55018 */ +/* { dg-do compile } */ +/* { dg-options "-fdump-tree-optimized" } */ + +void +foo (int x) +{ + unsigned int a = 0; + int b = 3; + if (x) +b = 0; +lab: + if (x) +goto lab; + a++; + if (b != 2) +__builtin_printf ("%u", a); + goto lab; +} + +/* { dg-final { scan-tree-dump "printf" "optimized" } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */ Jakub
Re: Fix PR 53701
On 16.10.2012 11:50, Andrey Belevantsev wrote: The below is the port of this patch to 4.7, took longer than expected but still. Will commit after retesting on x86-64 (testing on ia64 is already fine) and with the fix for PR 53975. Now the same patch is also committed to 4.6 after more wait and testing. Andrey
Re: [PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
On Mon, Oct 22, 2012 at 9:35 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! On the following testcase we have two endless loops before cddce2: Sender_signal (int Connect) { int State; unsigned int occurrence; <bb 2>: if (Connect_6(D) != 0) goto <bb 8>; else goto <bb 7>; <bb 3>: # occurrence_8 = PHI <0(7), occurrence_12(4)> occurrence_12 = occurrence_8 + 1; __builtin_printf ("Sender_Signal occurrence %u\n", occurrence_12); <bb 4>: goto <bb 3>; <bb 5>: <bb 6>: goto <bb 5>; <bb 7>: goto <bb 3>; <bb 8>: goto <bb 5>; } The problem is the two empty bbs on the path from the conditional at the end of bb2 and the endless loops (i.e. bb7 and bb8). In presence of infinite loops dominance.c adds fake edges to exit pretty arbitrarily (it uses FOR_EACH_BB_REVERSE and for unconnected bbs computes post-dominance and adds fake edges to exit), so with the above testcase both bb7 and bb8 have the exit block as immediate post-dominator, so find_control_dependence stops at those bb's when starting from the 2->7 resp. 2->8 edges. bb7/bb8 don't have a control stmt at the end, so mark_last_stmt_necessary doesn't mark any stmt as necessary in them and thus the if (Connect_6(D) != 0) GIMPLE_COND is never marked as necessary and the whole endless loop with printfs in it is removed. I'm not sure I'm following this alright, but AFAICT bb7 and bb8 are control-dependent on the if in bb2. To preserve the infinite-loop semantics the control parent of the infinite loop must be inherently preserved (because empty infinite loops can't mark any feeding statements). So shouldn't the code in find_obviously_necessary_stmts that handles infinite loops mark the last statement of control parents necessary? Ciao! Steven
Re: [PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
On Mon, Oct 22, 2012 at 09:48:16PM +0200, Steven Bosscher wrote: On Mon, Oct 22, 2012 at 9:35 PM, Jakub Jelinek ja...@redhat.com wrote: On the following testcase we have two endless loops before cddce2: Sender_signal (int Connect) { int State; unsigned int occurrence; <bb 2>: if (Connect_6(D) != 0) goto <bb 8>; else goto <bb 7>; <bb 3>: # occurrence_8 = PHI <0(7), occurrence_12(4)> occurrence_12 = occurrence_8 + 1; __builtin_printf ("Sender_Signal occurrence %u\n", occurrence_12); <bb 4>: goto <bb 3>; <bb 5>: <bb 6>: goto <bb 5>; <bb 7>: goto <bb 3>; <bb 8>: goto <bb 5>; } The problem is the two empty bbs on the path from the conditional at the end of bb2 and the endless loops (i.e. bb7 and bb8). In presence of infinite loops dominance.c adds fake edges to exit pretty arbitrarily (it uses FOR_EACH_BB_REVERSE and for unconnected bbs computes post-dominance and adds fake edges to exit), so with the above testcase both bb7 and bb8 have the exit block as immediate post-dominator, so find_control_dependence stops at those bb's when starting from the 2->7 resp. 2->8 edges. bb7/bb8 don't have a control stmt at the end, so mark_last_stmt_necessary doesn't mark any stmt as necessary in them and thus the if (Connect_6(D) != 0) GIMPLE_COND is never marked as necessary and the whole endless loop with printfs in it is removed. I'm not sure I'm following this alright, but AFAICT bb7 and bb8 are control-dependent on the if in bb2. To preserve the infinite-loop semantics the control parent of the infinite loop must be inherently preserved (because empty infinite loops can't mark any feeding statements). So shouldn't the code in find_obviously_necessary_stmts that handles infinite loops mark the last statement of control parents necessary?
If bb7 and bb8 aren't there and bb2 branches directly to bb3 and bb5, then things work correctly, find_control_dependence then says that the 2->3 edge is control parent of bb3 and bb4 (bb3's immediate post-dominator is bb4, bb4 is immediately post-dominated through a fake edge by exit) and similarly the 2->5 edge is control parent of bb5 and bb6. Then find_obviously_necessary_stmts does: FOR_EACH_LOOP (li, loop, 0) if (!finite_loop_p (loop)) { if (dump_file) fprintf (dump_file, "can not prove finiteness of loop %i\n", loop->num); mark_control_dependent_edges_necessary (loop->latch, el, false); } and that marks the control stmt in bb2 as necessary, because edge 2->3 is in the bb3 and bb4 bitmap and edge 2->5 is in the bb5 and bb6 control dependence bitmap. The problem with bb7/bb8 is that because they have fake edges to exit too, find_control_dependence stops at them, thus 2->7 is considered control parent of bb7 and 2->8 control parent of bb8, and 7->3 is considered control parent of bb3 and bb4 and 8->5 of bb5 and bb6. Thus, mark_control_dependent_edges_necessary called on say the bb4 latch calls mark_last_stmt_necessary on bb7, but, there is no last stmt in that bb, nothing to mark necessary and it silently stops there. What my patch does is change it so that in that case it doesn't stop there, but recurses. Jakub
Re: unordered set design modification
Attached patch applied. 2012-10-22 François Dumont fdum...@gcc.gnu.org * include/bits/unordered_set.h (unordered_set): Prefer aggregation to inheritance with _Hashtable. (unordered_multiset): Likewise. * include/debug/unordered_set (operator==): Adapt. * include/profile/unordered_set (operator==): Adapt. I will now take care of unordered_map and unordered_multimap. François On 10/22/2012 12:21 AM, Jonathan Wakely wrote: On 21 October 2012 20:43, François Dumont wrote: On 10/21/2012 06:21 PM, Jonathan Wakely wrote: On 20 October 2012 22:07, François Dumont wrote: Hi Following remarks in PR 53067 regarding design of unordered containers Which remarks specifically? My understanding was that Paolo's suggestion to redesign things was to avoid public inheritance, which we now do anyway. here is a patch to prefer aggregation to inheritance with _Hashtable. I hope it is what you had in mind Jonathan. If so I will do the same for unordered_[multi]map. Are you referring to my comments in the hashtable local iterator thread last December? Because IIRC my concern was about deriving from the user-supplied Hash and Pred types and this new patch doesn't alter that. What is the advantage of this new patch? (Apologies if I'm forgetting some other suggestion of mine.) I think my concerns about deriving from user-supplied types are addressed by using the EBO helper (which prevents deriving from types with virtual functions, as the vptr makes the class non-empty) and by using private inheritance. This patch is coming from this remark: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52942#c4 You should be careful when you do remarks, they can have a strong impact ;-) Ah yes, that comment. As hinted at there, I was concerned about inheriting virtual functions, but that's avoided by the EBO helper. And I still think that using std::tuple would have avoided all the issues with inheritance and kept the advantages of the EBO. That would be too big a redesign now though. 
I fully agree with this remark just because for me encapsulation is a very important concept and aggregation offers better encapsulation than inheritance. This way unordered containers will expose only Standard methods. It doesn't fix any known issue at the moment even if this clean design would have avoid the 53067 issue. It doesn't expose any non-standard members now that we use private inheritance. I do think composition is better than inheritance, but I'm concerned about more churn to that code, it would be nice if it settled down soon! But since we still need to exploit the EBO for the node allocator, I guess the code still needs to change anyway, so I'm ok with your patch. Using the EBO for empty allocators reduces sizeof(unordered_set<int>) from 64 to 56, although it obviously changes the layout of the class in an incompatible way. It's unfortunate the allocator is the first member. Some comments on the comments: + * @param __n Minimal initial number of bucket. Should be buckets + * @param __x An %unordere_set of identical element and allocator unordered_set + * The newly-created %unordered_set contains the exact contents of @a x. Should be __x not x. This comment won't always be true once we add C++11 allocator support, but we can fix the comment when that happens. + * All the elements of @a __x are copied, but unlike the copy + * constructor, the allocator object is not copied. This might not be true either, depending on the allocator. + * This function fills a %unordered_set with copies of the elements in an not a + /// Returns the allocator object with which the %unordered_set was + /// constructed. In C++11 allocators can be replaced after construction. + * Insertion requires atmortized constant time. amortized (in several places) + * This function only makes sense for unordered_multisets; for + * unordered_set the result will either be 0 (not present) or 1 + * (present).
I don't like these "only makes sense" comments, but I realise they're just copied from std::set so nevermind. + * @brief Returns the number of element in a given bucket. elements The same issues occur in the unordered_multiset comments. Unless Paolo has any other comments about the patch then it's OK with the comment fixes. Thanks! Index: include/bits/unordered_set.h === --- include/bits/unordered_set.h (revision 192694) +++ include/bits/unordered_set.h (working copy) @@ -91,41 +91,624 @@ class _Pred = std::equal_to<_Value>, class _Alloc = std::allocator<_Value> class unordered_set -: public __uset_hashtable<_Value, _Hash, _Pred, _Alloc> { - typedef __uset_hashtable<_Value, _Hash, _Pred, _Alloc> _Base; + typedef __uset_hashtable<_Value, _Hash, _Pred, _Alloc> _Hashtable; + _Hashtable _M_h; public: - typedef
Re: [MIPS] Implement static stack checking
Eric Botcazou ebotca...@adacore.com writes: This implements static stack checking for MIPS, i.e. checking of the static part of the frame in the prologue when -fstack-check is specified. This is very similar to the PowerPC and SPARC implementations and makes it possible to pass the full ACATS testsuite with -fstack-check. Tested on mips64el-linux-gnu (n32/32/64), OK for the mainline? The Ada bits I'll leave to you. :-) The config/mips stuff looks good, but a couple of nits: +(define_insn "probe_stack_range<P:mode>" + [(set (match_operand:P 0 "register_operand" "=r") + (unspec_volatile:P [(match_operand:P 1 "register_operand" "0") + (match_operand:P 2 "register_operand" "r")] + UNSPEC_PROBE_STACK_RANGE))] + "" + "* return mips_output_probe_stack_range (operands[0], operands[2]);" + [(set_attr "type" "unknown") + (set_attr "can_delay" "no") + (set_attr "mode" "<MODE>")]) Please use "d" rather than "r" in these constraints. Please use: { return mips_output_probe_stack_range (operands[0], operands[2]); } for the output line. +/* Emit code to probe a range of stack addresses from FIRST to FIRST+SIZE, + inclusive. These are offsets from the current stack pointer. */ + +static void +mips_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size) +{ This function doesn't work with MIPS16 mode. Maybe just: if (TARGET_MIPS16) sorry ("MIPS16 stack probes"); (We can't test TARGET_MIPS16 in something like STACK_CHECK_STATIC_BUILTIN because MIPS16ness is a per-function property.) + /* See if we have a constant small number of probes to generate. If so, + that's the easy case. */ + if (first + size <= 32768) +{ + HOST_WIDE_INT i; + + /* Probe at FIRST + N * PROBE_INTERVAL for values of N from 1 until + it exceeds SIZE. If only one probe is needed, this will not + generate any code. Then probe at FIRST + SIZE. 
*/ + for (i = PROBE_INTERVAL; i < size; i += PROBE_INTERVAL) +emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx, + -(first + i))); + + emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx, +-(first + size))); +} + + /* Otherwise, do the same as above, but in a loop. Note that we must be + extra careful with variables wrapping around because we might be at + the very top (or the very bottom) of the address space and we have + to be able to handle this case properly; in particular, we use an + equality test for the loop condition. */ + else +{ + HOST_WIDE_INT rounded_size; + rtx r3 = gen_rtx_REG (Pmode, GP_REG_FIRST + 3); + rtx r12 = gen_rtx_REG (Pmode, GP_REG_FIRST + 12); Please use MIPS_PROLOGUE_TEMP for r3 (and probably rename r3). I suppose GP_REG_FIRST + 12 should be MIPS_PROLOGUE_TEMP2, probably as: #define MIPS_PROLOGUE_TEMP2_REGNUM \ (TARGET_MIPS16 ? gcc_unreachable () \ : cfun->machine->interrupt_handler_p ? K1_REG_NUM : GP_REG_FIRST + 12) #define MIPS_PROLOGUE_TEMP2(MODE) \ gen_rtx_REG (MODE, MIPS_PROLOGUE_TEMP2_REGNUM) and update the block comment above the MIPS_PROLOGUE_TEMP_REGNUM definition. + /* Sanity check for the addressing mode we're going to use. */ + gcc_assert (first <= 32768); + + + /* Step 1: round SIZE to the previous multiple of the interval. */ + + rounded_size = size & -PROBE_INTERVAL; + + + /* Step 2: compute initial and final value of the loop counter. */ + + /* TEST_ADDR = SP + FIRST. */ + emit_insn (gen_rtx_SET (VOIDmode, r3, + plus_constant (Pmode, stack_pointer_rtx, + -first))); + + /* LAST_ADDR = SP + FIRST + ROUNDED_SIZE. 
*/ + if (rounded_size > 32768) + { + emit_move_insn (r12, GEN_INT (rounded_size)); + emit_insn (gen_rtx_SET (VOIDmode, r12, + gen_rtx_MINUS (Pmode, r3, r12))); + } + else + emit_insn (gen_rtx_SET (VOIDmode, r12, + plus_constant (Pmode, r3, -rounded_size))); + + + /* Step 3: the loop + + while (TEST_ADDR != LAST_ADDR) + { + TEST_ADDR = TEST_ADDR + PROBE_INTERVAL + probe at TEST_ADDR + } + + probes at FIRST + N * PROBE_INTERVAL for values of N from 1 + until it is equal to ROUNDED_SIZE. */ + + if (TARGET_64BIT && TARGET_LONG64) + emit_insn (gen_probe_stack_rangedi (r3, r3, r12)); + else + emit_insn (gen_probe_stack_rangesi (r3, r3, r12)); + + + /* Step 4: probe at FIRST + SIZE if we cannot assert at compile-time + that SIZE is equal to ROUNDED_SIZE. */ + + if (size != rounded_size) + emit_stack_probe (plus_constant (Pmode, r12, rounded_size - size)); I Might Be Wrong, but it looks like this won't probe at FIRST + SIZE in the case where SIZE ==
Re: Constant-fold vector comparisons
On Mon, 15 Oct 2012, Richard Biener wrote: On Fri, Oct 12, 2012 at 4:07 PM, Marc Glisse marc.gli...@inria.fr wrote: On Sat, 29 Sep 2012, Marc Glisse wrote: 1) it handles constant folding of vector comparisons, 2) it fixes another place where vectors are not expected Here is a new version of this patch. In a first try, I got bitten by the operator priorities in a<b?c:d, which g++ doesn't warn about. 2012-10-12 Marc Glisse marc.gli...@inria.fr gcc/ * tree-ssa-forwprop.c (forward_propagate_into_cond): Handle vectors. * fold-const.c (fold_relational_const): Handle VECTOR_CST. gcc/testsuite/ * gcc.dg/tree-ssa/foldconst-6.c: New testcase. Here is a new version, with the same ChangeLog plus * doc/generic.texi (VEC_COND_EXPR): Document current policy. Which means I'd prefer if you simply condition the existing ~ and ^ handling on COND_EXPR. Done. - if (integer_onep (tmp)) + if ((gimple_assign_rhs_code (stmt) == VEC_COND_EXPR) + ? integer_all_onesp (tmp) : integer_onep (tmp)) and cache gimple_assign_rhs_code as a 'code' variable at the beginning of the function. Done. + if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST) +{ + int count = VECTOR_CST_NELTS (op0); + tree *elts = XALLOCAVEC (tree, count); + gcc_assert (TREE_CODE (type) == VECTOR_TYPE); A better check would be that VECTOR_CST_NELTS of type is the same as that of op0. I wasn't sure which check you meant, so I added both possibilities. I am fine with removing either or both, actually. Ok with these changes. A few too many changes, I prefer to re-post, in case. On Tue, 16 Oct 2012, Richard Biener wrote: I liked your idea of the signed boolean vector, as a way to express that we know some vector can only have values 0 and -1, but I am not sure how to use it. 
Ah no, I didn't mean to suggest that ;) Maybe you didn't, but I still took the idea from your words ;-) Thus, as we defined true to -1 and false to 0 we cannot, unless relaxing what VEC_COND_EXPR treats as true or false, optimize any of ~ or ^ -1 away. It seems to me that what prevents us from optimizing is if we want to keep the door open for a future relaxation of what VEC_COND_EXPR accepts as its first argument. Which means: produce only -1 and 0, but don't assume we are only reading -1 and 0 (unless we have a reason to know it, for instance because it is the result of a comparison), and don't assume any specific interpretation on those other values. Not sure how much that limits possible optimizations. I'm not sure either - I'd rather leave the possibility open until we see a compelling reason to go either way (read: a testcase where it matters in practice). Ok, I implemented the safe way. My current opinion is that we should go with a VEC_COND_EXPR that only accepts 0 and -1 (it is easy to pass a LT_EXPR or NE_EXPR as first argument if that is what one wants), but it can wait. 
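The 0/-1 truth-value convention under discussion can be observed directly with GCC's generic vector extension (a stand-alone illustration, not part of the patch; it assumes a compiler recent enough to support vector comparisons):

```c
#include <assert.h>

/* GCC generic vectors: two longs.  */
typedef long vec __attribute__ ((vector_size (2 * sizeof (long))));

/* A lane-wise comparison yields -1 (all bits set) for true lanes and 0
   for false lanes.  That is why ~mask can safely be folded when mask is
   known to come from a comparison, but not for an arbitrary vector.  */
vec less_mask (vec a, vec b)
{
  return a < b;
}
```

With `a = {-2, 666}` and `b = {3, 2}` (the values from the foldconst-6.c testcase), the first lane compares true and the second false.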
-- Marc Glisse Index: gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c === --- gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c (revision 0) @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-ccp1" } */ + +typedef long vec __attribute__ ((vector_size (2 * sizeof(long)))); + +vec f () +{ + vec a = { -2, 666 }; + vec b = { 3, 2 }; + return a < b; +} + +/* { dg-final { scan-tree-dump-not "666" "ccp1" } } */ +/* { dg-final { cleanup-tree-dump "ccp1" } } */ Property changes on: gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c ___ Added: svn:keywords + Author Date Id Revision URL Added: svn:eol-style + native Index: gcc/fold-const.c === --- gcc/fold-const.c (revision 192695) +++ gcc/fold-const.c (working copy) @@ -16123,20 +16123,45 @@ fold_relational_const (enum tree_code co TREE_IMAGPART (op0), TREE_IMAGPART (op1)); if (code == EQ_EXPR) return fold_build2 (TRUTH_ANDIF_EXPR, type, rcond, icond); else if (code == NE_EXPR) return fold_build2 (TRUTH_ORIF_EXPR, type, rcond, icond); else return NULL_TREE; } + if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST) +{ + unsigned count = VECTOR_CST_NELTS (op0); + tree *elts = XALLOCAVEC (tree, count); + gcc_assert (VECTOR_CST_NELTS (op1) == count); + gcc_assert (TYPE_VECTOR_SUBPARTS (type) == count); + + for (unsigned i = 0; i < count; i++) + { + tree elem_type = TREE_TYPE (type); + tree elem0 = VECTOR_CST_ELT (op0, i); + tree elem1 = VECTOR_CST_ELT (op1, i); + + tree tem = fold_relational_const (code, elem_type, + elem0,
Re: [PATCH][RFC] Re-organize how we stream trees in LTO
On 10/16/12, Diego Novillo dnovi...@google.com wrote: On 2012-10-16 10:43 , Richard Biener wrote: Diego - is PTH still live? Thus, do I need to bother about inventing things in a way that can be hook-ized? We will eventually revive PPH. But not in the short term. I think it will come back when/if we start implementing C++ modules. Jason, Lawrence, is that something that you see coming for the next standard? There are some people working on it, though not very publically. Many folks would like to see modules in the next full standard, probably circa 2017. It is likely that the design point for standard modules will differ from PPH, and so I don't think that the current PPH implementation should serve as a constraint on other work. I suspect that the front end will need to distance itself from 'tree' and have its own streamable IL. So, the hooks may not be something we need to keep long term. Emitting the trees in SCC groups should not affect the C++ streamer too much. It already is doing its own strategy of emitting tree headers so it can do declaration and type merging. As long as the trees can be fully materialized from the SCC groups, it should be fine. -- Lawrence Crowl
Re: unordered set design modification
On 22 October 2012 20:59, François Dumont wrote: Attached patch applied. 2012-10-22 François Dumont fdum...@gcc.gnu.org * include/bits/unordered_set.h (unordered_set): Prefer aggregation to inheritance with _Hashtable. (unordered_multiset): Likewise. * include/debug/unordered_set (operator==): Adapt. * include/profile/unordered_set (operator==): Adapt. + //@{ Do these comments work correctly? I think it needs to be ///@{ for Doxygen to recognise it. I will now take care of unordered_map and unordered_multimap. Thanks. It occurs to me now that the copy and move operations could be defaulted, since all they do is forward to the member, which is both copyable and movable.
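The aggregation shape being applied, and why the copy/move operations could now be defaulted, can be sketched as follows (an illustration of the pattern only — `my_set` and its member are stand-ins, not the actual libstdc++ `_Hashtable` code):

```cpp
#include <cstddef>
#include <unordered_set>

template<typename _Value>
class my_set
{
  // The table is a private data member rather than a base class, so the
  // wrapper exposes only the standard interface.
  std::unordered_set<_Value> _M_h;   // stand-in for __uset_hashtable<...>

public:
  my_set() = default;
  // Every special member just forwards to the member, which is itself
  // copyable and movable, so all of them can be defaulted.
  my_set(const my_set&) = default;
  my_set(my_set&&) = default;
  my_set& operator=(const my_set&) = default;
  my_set& operator=(my_set&&) = default;

  bool insert(const _Value& __v) { return _M_h.insert(__v).second; }
  std::size_t size() const { return _M_h.size(); }

  // operator== has to be adapted to reach through to the member, which
  // is what the debug/profile mode changes in the patch do.
  friend bool operator==(const my_set& __a, const my_set& __b)
  { return __a._M_h == __b._M_h; }
};
```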
Re: [PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
On Mon, Oct 22, 2012 at 10:27:52PM +0200, Steven Bosscher wrote: I understand what your patch does, but I don't understand why it is correct. Why are there fake edges from bb7 and bb8 to exit when both are reverse-reachable from exit via the infinite loops? The infinite loops should be connected to exit, and bb7 and bb8 should be found by the DFS from the really dead ends in the cfg. See what dominance.c does: if (saw_unconnected) { FOR_EACH_BB_REVERSE (b) { if (di->dfs_order[b->index]) continue; bitmap_set_bit (di->fake_exit_edge, b->index); di->dfs_order[b->index] = di->dfsnum; di->dfs_to_bb[di->dfsnum] = b; di->dfs_parent[di->dfsnum] = di->dfs_order[last_basic_block]; di->dfsnum++; calc_dfs_tree_nonrec (di, b, reverse); } } bb7/bb8 (i.e. all bbs that are always in the end followed by infinite loops) as well as all the bbs on the infinite loops are processed the above way, they have no real path to exit, so aren't processed on the first iteration, they aren't processed even after adding fake edges from zero successor bbs. calc_dfs_tree then picks pretty much random bbs (one with highest index), adds fake edge to it, walks it, then goes on with other bbs that are still unconnected. dominance.c doesn't use cfgloop.h (can it? Isn't it used before loops are computed, perhaps after loops destroyed, etc.), so there is no guarantee that loop->latch of endless loop will have the fake edge added and no other bb before it. As 7 and 8 are bigger than 4 or 6, the above loop starts with bb 8, finds that its predecessor has already been searched and stops there, similarly for 7, then goes on with 6 with another fake edge to exit. Jakub
Re: [PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
On Mon, Oct 22, 2012 at 10:39 PM, Jakub Jelinek ja...@redhat.com wrote: dominance.c doesn't use cfgloop.h (can it? Isn't it used before loops are computed, perhaps after loops destroyed, etc.), so there is no guarantee that loop->latch of endless loop will have the fake edge added and no other bb before it. As 7 and 8 are bigger than 4 or 6, the above loop starts with bb 8, finds that its predecessor has already been searched and stops there, similarly for 7, then goes on with 6 with another fake edge to exit. At least it looks like some of the cfganal DFS code could be used in dominance.c. I will have a look. A hack like the following should result in no fake edges for bb7 and bb8. Ciao! Steven Index: dominance.c === --- dominance.c (revision 192517) +++ dominance.c (working copy) @@ -353,12 +353,15 @@ pretend that there is an edge to the exit block. In the second case, we wind up with a forest. We need to process all noreturn blocks before we know if we've got any infinite loops. */ - + int *revcfg_postorder = XNEWVEC (int, n_basic_blocks); + int n = inverted_post_order_compute (revcfg_postorder); + unsigned int i = (unsigned) n; basic_block b; bool saw_unconnected = false; - FOR_EACH_BB_REVERSE (b) + while (i) { + basic_block b = revcfg_postorder[--i]; if (EDGE_COUNT (b->succs) > 0) { if (di->dfs_order[b->index] == 0) @@ -375,8 +378,10 @@ if (saw_unconnected) { - FOR_EACH_BB_REVERSE (b) + i = n; + while (i) { + basic_block b = revcfg_postorder[--i]; if (di->dfs_order[b->index]) continue; bitmap_set_bit (di->fake_exit_edge, b->index);
Re: [PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
On Mon, Oct 22, 2012 at 10:51:43PM +0200, Steven Bosscher wrote: On Mon, Oct 22, 2012 at 10:39 PM, Jakub Jelinek ja...@redhat.com wrote: dominance.c doesn't use cfgloop.h (can it? Isn't it used before loops are computed, perhaps after loops destroyed, etc.), so there is no guarantee that loop-latch of endless loop will have the fake edge added and no other bb before it. As 7 and 8 are bigger than 4 or 6, the above loop starts with bb 8, finds that its predecessor has already been searched and stops there, similarly for 7, then goes on with 6 with another fake edge to exit. At least it looks like some of the cfganal DFS code could be used in dominance.c. I will have a look. A hack like the following should result in no fake edges for bb7 and bb8. Wouldn't it be way cheaper to just export dfs_find_deadend from cfganal.c and call it in calc_dfs_tree on each unconnected bb? I.e. (untested with the exception of the testcase): 2012-10-22 Jakub Jelinek ja...@redhat.com PR tree-optimization/55018 * cfganal.c (dfs_find_deadend): No longer static. * basic-block.h (dfs_find_deadend): New prototype. * dominance.c (calc_dfs_tree): If saw_unconnected, traverse from dfs_find_deadend of unconnected b instead of b directly. * gcc.dg/torture/pr55018.c: New test. --- gcc/cfganal.c.jj2012-08-14 08:45:00.0 +0200 +++ gcc/cfganal.c 2012-10-22 23:04:29.620117666 +0200 @@ -593,7 +593,7 @@ post_order_compute (int *post_order, boo that all blocks in the region are reachable by starting an inverted traversal from the returned block. 
*/ -static basic_block +basic_block dfs_find_deadend (basic_block bb) { sbitmap visited = sbitmap_alloc (last_basic_block); --- gcc/basic-block.h.jj 2012-10-17 17:18:21.0 +0200 +++ gcc/basic-block.h 2012-10-17 17:18:21.0 +0200 @@ -787,6 +787,7 @@ extern void remove_fake_exit_edges (void extern void add_noreturn_fake_exit_edges (void); extern void connect_infinite_loops_to_exit (void); extern int post_order_compute (int *, bool, bool); +extern basic_block dfs_find_deadend (basic_block); extern int inverted_post_order_compute (int *); extern int pre_and_rev_post_order_compute (int *, int *, bool); extern int dfs_enumerate_from (basic_block, int, --- gcc/dominance.c.jj 2012-08-15 10:55:26.0 +0200 +++ gcc/dominance.c 2012-10-22 23:07:00.941220792 +0200 @@ -377,14 +377,18 @@ calc_dfs_tree (struct dom_info *di, bool { FOR_EACH_BB_REVERSE (b) { + basic_block b2; if (di->dfs_order[b->index]) continue; - bitmap_set_bit (di->fake_exit_edge, b->index); - di->dfs_order[b->index] = di->dfsnum; - di->dfs_to_bb[di->dfsnum] = b; + b2 = dfs_find_deadend (b); + gcc_checking_assert (di->dfs_order[b2->index] == 0); + bitmap_set_bit (di->fake_exit_edge, b2->index); + di->dfs_order[b2->index] = di->dfsnum; + di->dfs_to_bb[di->dfsnum] = b2; di->dfs_parent[di->dfsnum] = di->dfs_order[last_basic_block]; di->dfsnum++; - calc_dfs_tree_nonrec (di, b, reverse); + calc_dfs_tree_nonrec (di, b2, reverse); + gcc_checking_assert (di->dfs_order[b->index]); } } } --- gcc/testsuite/gcc.dg/torture/pr55018.c.jj 2012-10-22 16:53:56.623083723 +0200 +++ gcc/testsuite/gcc.dg/torture/pr55018.c 2012-10-22 16:54:21.278934668 +0200 @@ -0,0 +1,22 @@ +/* PR tree-optimization/55018 */ +/* { dg-do compile } */ +/* { dg-options "-fdump-tree-optimized" } */ + +void +foo (int x) +{ + unsigned int a = 0; + int b = 3; + if (x) +b = 0; +lab: + if (x) +goto lab; + a++; + if (b != 2) +__builtin_printf ("%u", a); + goto lab; +} + +/* { dg-final { scan-tree-dump "printf" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Jakub
Re: [PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
On Mon, Oct 22, 2012 at 11:09 PM, Jakub Jelinek ja...@redhat.com wrote: Wouldn't it be way cheaper to just export dfs_find_deadend from cfganal.c and call it in calc_dfs_tree on each unconnected bb? I.e. (untested with the exception of the testcase): Better yet, I have a patch in testing now to use cfganal's machinery to compute the DFS forest. Hold on a bit, I'll post it ASAP (probably Wednesday) if that's early enough for you. (Oh, and feel free to assign the PR to me ;-) Ciao! Steven
Re: [MIPS] Implement static stack checking
This function doesn't work with MIPS16 mode. Maybe just: if (TARGET_MIPS16) sorry (MIPS16 stack probes); (We can't test TARGET_MIPS16 in something like STACK_CHECK_STATIC_BUILTIN because MIPS16ness is a per-function property.) I put if (TARGET_MIPS16) sorry (-fstack-check=specific not implemented for MIPS16); Please use MIPS_PROLOGUE_TEMP for r3 (and probably rename r3). I suppose GP_REG_FIRST + 12 should be MIPS_PROLOGUE_TEMP2, probably as: #define MIPS_PROLOGUE_TEMP2_REGNUM \ (TARGET_MIPS16 ? gcc_unreachable () \ cfun-machine-interrupt_handler_p ? K1_REG_NUM : GP_REG_FIRST + 12) #define MIPS_PROLOGUE_TEMP2(MODE) \ gen_rtx_REG (MODE, MIPS_PROLOGUE_TEMP2_REGNUM) and update the block comment above the MIPS_PROLOGUE_TEMP_REGNUM definition. Done. I Might Be Wrong, but it looks like this won't probe at FIRST + SIZE in the case where SIZE == ROUNDED_SIZE, because the loop exits on that value without probing it. Should the last line be unconditional, or does the loop need to be a do-while instead? (I suppose the latter, so that there isn't a hole bigger than PROBE_INTERVAL in the SIZE != ROUNDED_SIZE case?) The loop probes at FIRST + N * PROBE_INTERVAL for values of N from 1 until it is equal to ROUNDED_SIZE, inclusive, so FIRST + SIZE is always probed. This only works in noreorder mode. If there's an asm in the function, or something else that forces reorder mode (e.g. a -mfix-* option), the addition won't be put in the delay slot. %(%beq\t%0,%1, and daddiu\t%0,%0,%1%%) should work. (Note that our MIPS asm output doesn't have a space before the delay slot; there's a blank line after it instead. That's all handled by output_asm_insn though.) Thanks for the incantation! OK with those changes, thanks. I'll retest with the changes tomorrow. Thanks for the review. -- Eric Botcazou
Re: wide int patch #6: Replacement of hwi extraction from int-csts.
On 10/19/12, Richard Biener richard.guent...@gmail.com wrote: The existing tree_low_cst function performs checking, so tree_to_hwi should as well. I don't think mismatch of signedness of the variable assigned to with the sign we use for hwi extraction is any good. C++ isn't type-safe here for the return value but if we'd use a reference as return slot we could make it so ... (in exchange for quite some ugliness IMNSHO): void tree_to_shwi (const_tree tree, HOST_WIDE_INT &hwi); vs. void tree_to_uhwi (const_tree tree, unsigned HOST_WIDE_INT &hwi); maybe more natural would be void hwi_from_tree (HOST_WIDE_INT &hwi, const_tree tree); void hwi_from_tree (unsigned HOST_WIDE_INT &hwi, const_tree tree); let the C++ bikeshedding begin! (the point is to do appropriate checking for a conversion of (INTEGER_CST) tree to HOST_WIDE_INT vs. unsigned HOST_WIDE_INT) We could add conversion operators to achieve the effect. However, we probably don't want to do so until we can make them explicit. Unfortunately, explicit conversion operators are not available until C++11. No, I don't want you to do the above transform with this patch ;) -- Lawrence Crowl
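The "reference as return slot" idea can be illustrated outside of GCC (hypothetical helper `hwi_from_value` standing in for the proposed `hwi_from_tree`; plain `long` stands in for HOST_WIDE_INT):

```cpp
#include <cassert>

// Overload resolution picks the extraction matching the destination's
// signedness, so the checking can differ per overload -- something a
// single HOST_WIDE_INT-returning function cannot express type-safely.
static void hwi_from_value (long &out, long v)
{
  out = v;
}

static void hwi_from_value (unsigned long &out, long v)
{
  assert (v >= 0);   // "appropriate checking" chosen by the destination type
  out = (unsigned long) v;
}
```

The caller just writes `hwi_from_value (dest, v)` and the signedness of `dest` selects the right check, which is the type-safety Richard is after.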
Re: [MIPS] Implement static stack checking
Doh! But in that case, rather than:

1:	beq	r1,r2,2f
	addiu	r1,r1,interval
	b	1b
	sw	$0,0(r1)
2:

why not just:

1:	addiu	r1,r1,interval
	bne	r1,r2,1b
	sw	$0,0(r1)

? The latter will always probe once, the former won't, if ROUNDED_SIZE == 0. -- Eric Botcazou
Re: [MIPS] Implement static stack checking
Eric Botcazou ebotca...@adacore.com writes: Doh! But in that case, rather than:

1:	beq	r1,r2,2f
	addiu	r1,r1,interval
	b	1b
	sw	$0,0(r1)
2:

why not just:

1:	addiu	r1,r1,interval
	bne	r1,r2,1b
	sw	$0,0(r1)

? The latter will always probe once, the former won't, if ROUNDED_SIZE == 0. But why do we want the loop at all if the rounded size is zero? It's a compile-time constant after all. Richard
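The behavioural difference between the two loop shapes can be simulated in plain C (probe counts only; `interval` models PROBE_INTERVAL and the register arithmetic is elided):

```c
/* "beq at the top" form: test before probing.  With rounded_size == 0
   this probes zero times.  */
static int probes_test_first (long rounded_size, long interval)
{
  int n = 0;
  for (long addr = 0; addr != rounded_size; )
    {
      addr += interval;
      n++;                  /* sw in the delay slot: one probe per pass */
    }
  return n;
}

/* "bne at the bottom" form: the body always runs once, so it always
   probes at least once -- Eric's point.  (With rounded_size == 0 this
   shape must simply not be emitted, as Richard notes, since the counter
   would walk past the end address.)  */
static int probes_probe_first (long rounded_size, long interval)
{
  int n = 0;
  long addr = 0;
  do
    {
      addr += interval;
      n++;
    }
  while (addr != rounded_size);
  return n;
}
```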
[PATCH, committed] Fix PR55008
In straight-line strength reduction, a candidate expression of the form (type1)x + (type2)x, where type1 and type2 are compatible, results in two interpretations of the candidate with different result types. Because the types are compatible, the first interpretation can appear to be a legal basis for the second, resulting in an invalid replacement. The obvious solution is to keep a statement from serving as its own basis. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions, committed as obvious. Thanks, Bill -- Bill Schmidt, Ph.D. IBM Advance Toolchain for PowerLinux IBM Linux Technology Center wschm...@linux.vnet.ibm.com wschm...@us.ibm.com gcc: 2012-10-22 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/55008 * gimple-ssa-strength-reduction.c (find_basis_for_candidate): Don't allow a candidate to be a basis for itself under another interpretation. gcc/testsuite: 2012-10-22 Bill Schmidt wschm...@linux.vnet.ibm.com PR tree-optimization/55008 * gcc.dg/tree-ssa/pr55008.c: New test. Index: gcc/testsuite/gcc.dg/tree-ssa/pr55008.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr55008.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr55008.c (revision 0) @@ -0,0 +1,17 @@ +/* This used to fail to compile; see PR55008. */ +/* { dg-do compile } */ +/* { dg-options "-O2 -w" } */ + +typedef unsigned long long T; + +void f(void) +{ +int a, *p; + +T b = 6309343725; + +if(*p ? (b = 1) : 0) +if(b - (a = b /= 0) ? : (a + b)) +while(1); +} + Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 192691) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -366,6 +366,7 @@ find_basis_for_candidate (slsr_cand_t c) slsr_cand_t one_basis = chain->cand; if (one_basis->kind != c->kind + || one_basis->cand_stmt == c->cand_stmt || !operand_equal_p (one_basis->stride, c->stride, 0) || !types_compatible_p (one_basis->cand_type, c->cand_type) || !dominated_by_p (CDI_DOMINATORS,
Re: [MIPS] Implement static stack checking
Sorry, one more thing (obviously a bad night) Eric Botcazou ebotca...@adacore.com writes: + if (TARGET_64BIT && TARGET_LONG64) + emit_insn (gen_probe_stack_rangedi (r3, r3, r12)); + else + emit_insn (gen_probe_stack_rangesi (r3, r3, r12)); Please use: emit_insn (PMODE_INSN (gen_probe_stack_range, (r3, r3, r12))); for this. The patterns will need to be _<P:mode> rather than just <P:mode>. Richard
Re: [RFA:] Fix frame-pointer-clobbering in builtins.c:expand_builtin_setjmp_receiver
This patch (r192676) is probably causing FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/memmove-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/memset-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/stpcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/stpncpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strcat-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strncat-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strncpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -Os on i?86 (see http://gcc.gnu.org/ml/gcc-testresults/2012-10/msg02350.html ). TIA Dominique
[C++ Patch] PR 54922
Hi, today I spent quite a bit of time on this reject-legal issue filed by Daniel, having to do with constexpr constructors and anonymous union members: I didn't want to make the loop much more complex but we have to handle correctly multiple anonymous unions too and of course produce correct diagnostics in all cases (eg, together with multiple members initialization diagnostics too). I figured out the below. Tested x86_64-linux, as usual. Thanks, Paolo. /cp 2012-10-22 Paolo Carlini paolo.carl...@oracle.com PR c++/54922 * semantics.c (cx_check_missing_mem_inits): Handle anonymous union members. /testsuite 2012-10-22 Paolo Carlini paolo.carl...@oracle.com PR c++/54922 * g++.dg/cpp0x/constexpr-union4.C: New. Index: testsuite/g++.dg/cpp0x/constexpr-union4.C === --- testsuite/g++.dg/cpp0x/constexpr-union4.C (revision 0) +++ testsuite/g++.dg/cpp0x/constexpr-union4.C (working copy) @@ -0,0 +1,13 @@ +// PR c++/54922 +// { dg-do compile { target c++11 } } + +class nullable_int +{ + bool init_; + union { +unsigned char for_value_init; +int value_; + }; +public: + constexpr nullable_int() : init_(false), for_value_init() {} +}; Index: cp/semantics.c === --- cp/semantics.c (revision 192692) +++ cp/semantics.c (working copy) @@ -6139,17 +6139,23 @@ cx_check_missing_mem_inits (tree fun, tree body, b for (i = 0; i <= nelts; ++i) { tree index; + tree anon_union_init_type = NULL_TREE; if (i == nelts) index = NULL_TREE; else { index = CONSTRUCTOR_ELT (body, i)->index; + /* Handle anonymous union members. */ + if (TREE_CODE (index) == COMPONENT_REF + && ANON_UNION_TYPE_P (TREE_TYPE (TREE_OPERAND (index, 0)))) + anon_union_init_type = TREE_TYPE (TREE_OPERAND (index, 0)); /* Skip base and vtable inits. 
*/ - if (TREE_CODE (index) != FIELD_DECL - || DECL_ARTIFICIAL (index)) + else if (TREE_CODE (index) != FIELD_DECL + || DECL_ARTIFICIAL (index)) continue; } - for (; field != index; field = DECL_CHAIN (field)) + for (; field != index && TREE_TYPE (field) != anon_union_init_type; + field = DECL_CHAIN (field)) { tree ftype; if (TREE_CODE (field) != FIELD_DECL
Re: [PATCH] Fix CDDCE miscompilation (PR tree-optimization/55018)
On Mon, Oct 22, 2012 at 11:09 PM, Jakub Jelinek ja...@redhat.com wrote: Wouldn't it be way cheaper to just export dfs_find_deadend from cfganal.c and call it in calc_dfs_tree on each unconnected bb? I.e. (untested with the exception of the testcase): FWIW, dfs_find_deadend looks broken to me for this usage case. It could return a self-loop block with more than one successor. For a pre-order search like dominance.c needs, you'd have to look as deep as possible, something like this: Index: cfganal.c === --- cfganal.c (revision 192696) +++ cfganal.c (working copy) @@ -598,18 +598,26 @@ dfs_find_deadend (basic_block bb) { sbitmap visited = sbitmap_alloc (last_basic_block); sbitmap_zero (visited); + basic_block next_bb = NULL; + edge_iterator ei; + edge e; for (;;) { SET_BIT (visited, bb->index); - if (EDGE_COUNT (bb->succs) == 0 - || TEST_BIT (visited, EDGE_SUCC (bb, 0)->dest->index)) + /* Look for any not yet visited successors. +If all successors have been visited then +this is the dead end we're looking for. */ + FOR_EACH_EDGE (e, ei, bb->succs) + if (! TEST_BIT (visited, e->dest->index)) + break; + if (e == NULL) { sbitmap_free (visited); return bb; } - bb = EDGE_SUCC (bb, 0)->dest; + bb = e->dest; } gcc_unreachable (); (And the (EDGE_COUNT(bb->succs) == 0) is unnecessary for inverted_post_order_compute because it already puts all such blocks on the initial work list :-) Ciao! Steven
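Steven's point, that the search must descend through *any* unvisited successor rather than always taking successor 0, can be reproduced on a tiny CFG (a stand-alone sketch, not the cfganal.c data structures):

```c
#include <stdbool.h>

#define N 4

/* succ[b] lists the successors of block b, terminated by -1.  Block 1
   lists its self-loop first, which is exactly the case that trips up a
   successor-0-only search.  */
static const int succ[N][3] = {
  {1, -1, -1},    /* 0 -> 1 */
  {1, 2, -1},     /* 1 -> 1 (self loop first!), 1 -> 2 */
  {3, -1, -1},    /* 2 -> 3 */
  {-1, -1, -1},   /* 3: the real dead end */
};

/* Follow any not-yet-visited successor until none is left.  Checking
   only successor 0 would stop at the self-looping block 1 even though
   block 1 still has an unvisited way out.  */
static int find_deadend (int bb)
{
  bool visited[N] = { false };
  for (;;)
    {
      visited[bb] = true;
      int next = -1;
      for (int i = 0; succ[bb][i] >= 0; i++)
        if (!visited[succ[bb][i]])
          {
            next = succ[bb][i];
            break;
          }
      if (next < 0)
        return bb;      /* all successors visited: a dead end */
      bb = next;
    }
}
```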
Re: [RFA:] Fix frame-pointer-clobbering in builtins.c:expand_builtin_setjmp_receiver
On Tue, 23 Oct 2012, Dominique Dhumieres wrote: This patch (r192676) is probably causing FAIL: gcc.c-torture/execute/builtins/memcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/memmove-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/mempcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/memset-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/stpcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/stpncpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strcat-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strcpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strncat-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/strncpy-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -Os FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -Os on i?86 (see http://gcc.gnu.org/ml/gcc-testresults/2012-10/msg02350.html ). Confirmed, now PR55030. I'll revert that patch pending further investigation. Feel free to open PR's whenever something like this happens. brgds, H-P
[lra] patch to fix several testsuite failures
The following patch fixes several new testsuite failures. Committed as rev. 192657. 2012-10-22 Vladimir Makarov vmaka...@redhat.com * lra-constraints.c (inherit_reload_reg): Print bb numbers too. (need_for_split_p): Don't split eliminable registers. (fix_bb_live_info): Don't use EXECUTE_IF_AND_IN_BITMAP. Index: lra-constraints.c === --- lra-constraints.c (revision 192689) +++ lra-constraints.c (working copy) @@ -3939,8 +3939,8 @@ inherit_reload_reg (bool def_p, int orig /* We now have a new usage insn for original regno. */ setup_next_usage_insn (original_regno, new_insns, reloads_num, false); if (lra_dump_file != NULL) -fprintf (lra_dump_file, "Original reg change %d->%d:\n", -original_regno, REGNO (new_reg)); +fprintf (lra_dump_file, "Original reg change %d->%d (bb%d):\n", +original_regno, REGNO (new_reg), BLOCK_FOR_INSN (insn)->index); lra_reg_info[REGNO (new_reg)].restore_regno = original_regno; bitmap_set_bit (check_only_regs, REGNO (new_reg)); bitmap_set_bit (check_only_regs, original_regno); @@ -3969,8 +3969,10 @@ inherit_reload_reg (bool def_p, int orig lra_update_insn_regno_info (usage_insn); if (lra_dump_file != NULL) { - fprintf (lra_dump_file, "Inheritance reuse change %d->%d:\n", - original_regno, REGNO (new_reg)); + fprintf (lra_dump_file, + "Inheritance reuse change %d->%d (bb%d):\n", + original_regno, REGNO (new_reg), + BLOCK_FOR_INSN (usage_insn)->index); debug_rtl_slim (lra_dump_file, usage_insn, usage_insn, -1, 0); } @@ -4015,6 +4017,13 @@ need_for_split_p (HARD_REG_SET potential lra_assert (hard_regno >= 0); return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno) + /* Don't split eliminable hard registers, otherwise we can + split hard registers like hard frame pointer, which + lives on BB start/end according to DF-infrastructure, + when there is a pseudo assigned to the register and + living in the same BB. */ + && (regno >= FIRST_PSEUDO_REGISTER + || ! TEST_HARD_REG_BIT (eliminable_regset, hard_regno)) && ! 
TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno) /* We need at least 2 reloads to make pseudo splitting profitable. We should provide hard regno splitting in @@ -4284,7 +4293,7 @@ update_ebb_live_info (rtx head, rtx tail edge e; edge_iterator ei; - last_bb = BLOCK_FOR_INSN (tail); + last_bb = BLOCK_FOR_INSN (tail); prev_bb = NULL; for (curr_insn = tail; curr_insn != PREV_INSN (head); @@ -4492,7 +4501,7 @@ inherit_in_ebb (rtx head, rtx tail) after_p = (! JUMP_P (last_insn) (! CALL_P (last_insn) || (find_reg_note (last_insn, - REG_NORETURN, NULL) == NULL_RTX + REG_NORETURN, NULL_RTX) == NULL_RTX ! SIBLING_CALL_P (last_insn; REG_SET_TO_HARD_REG_SET (live_hard_regs, df_get_live_out (curr_bb)); IOR_HARD_REG_SET (live_hard_regs, eliminable_regset); @@ -4800,7 +4809,6 @@ lra_inheritance (void) edge e; timevar_push (TV_LRA_INHERITANCE); - lra_inheritance_iter++; if (lra_dump_file != NULL) fprintf (lra_dump_file, \n** Inheritance #%d: **\n\n, @@ -4867,11 +4875,9 @@ fix_bb_live_info (bitmap live, bitmap re unsigned int regno; bitmap_iterator bi; - EXECUTE_IF_AND_IN_BITMAP (removed_pseudos, live, 0, regno, bi) -{ - bitmap_clear_bit (live, regno); + EXECUTE_IF_SET_IN_BITMAP (removed_pseudos, 0, regno, bi) +if (bitmap_clear_bit (live, regno)) bitmap_set_bit (live, lra_reg_info[regno].restore_regno); -} } /* Return regno of the (subreg of) REG. Otherwise, return a negative
[PATCH v3] Add support for sparc compare-and-branch
Differences from v2:

1) If another control transfer comes right after a cbcond, we take an enormous performance penalty, some 20 cycles or more.  The documentation specifically warns about this, so emit a nop when we encounter this scenario.

2) Add a heuristic to avoid using cbcond if we know at RTL emit time that we're going to compare against a constant that does not fit in the tiny 5-bit signed immediate field.

3) Use cbcond for unconditional jumps too.

Regstrapped on sparc-unknown-linux-gnu w/--with-cpu=niagara4.

Eric and Rainer, I think that functionally this patch is fully ready to go into the tree, except for the Solaris aspects, which I do not have the means to work on.  Have either of you made any progress in this area?

Thanks!

gcc/

2012-10-12  David S. Miller  da...@davemloft.net

	* configure.ac: Add check for assembler SPARC4 instruction support.
	* configure: Rebuild.
	* config.in: Add HAVE_AS_SPARC4 section.
	* config/sparc/sparc.opt (mcbcond): New option.
	* doc/invoke.texi: Document it.
	* config/sparc/constraints.md: New constraint 'A' for 5-bit
	signed immediates.
	* doc/md.texi: Document it.
	* config/sparc/predicates.md (arith5_operand): New predicate.
	* config/sparc/sparc.c (dump_target_flag_bits): Handle MASK_CBCOND.
	(sparc_option_override): Likewise.
	(emit_cbcond_insn): New function.
	(emit_conditional_branch_insn): Call it.
	(emit_cbcond_nop): New function.
	(output_ubranch): Use cbcond, remove label arg.
	(output_cbcond): New function.
	* config/sparc/sparc-protos.h (output_ubranch): Update.
	(output_cbcond): Declare it.
	(emit_cbcond_nop): Likewise.
	* config/sparc/sparc.md (type attribute): New types 'cbcond'
	and 'uncond_cbcond'.
	(emit_cbcond_nop): New attribute.
	(length attribute): Handle cbcond and uncond_cbcond.
	(in_call_delay attribute): Reject cbcond and uncond_cbcond.
	(in_branch_delay attribute): Likewise.
	(in_uncond_branch_delay attribute): Likewise.
	(in_annul_branch_delay attribute): Likewise.
	(*cbcond_sp32, *cbcond_sp64): New insn patterns.
	(jump): Rewrite into an expander.
	(*jump_ubranch, *jump_cbcond): New patterns.
	* config/sparc/niagara4.md: Match 'cbcond' and 'uncond_cbcond'
	in 'n4_cti'.
	* config/sparc/sparc.h (AS_NIAGARA4_FLAG): New macro, use it
	when target default is niagara4.
	(SPARC_SIMM5_P): Define.
	* config/sparc/sol2.h (AS_SPARC64_FLAG): Adjust.
	(AS_SPARC32_FLAG): Define.
	(ASM_CPU32_DEFAULT_SPEC, ASM_CPU64_DEFAULT_SPEC): Use
	AS_NIAGARA4_FLAG as needed.

diff --git a/gcc/config.in b/gcc/config.in
index b13805d..791d14a 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -266,6 +266,12 @@
 #endif

+/* Define if your assembler supports SPARC4 instructions.  */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_SPARC4
+#endif
+
+
 /* Define if your assembler supports fprnd.  */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_FPRND
diff --git a/gcc/config/sparc/constraints.md b/gcc/config/sparc/constraints.md
index 472490f..8862ea1 100644
--- a/gcc/config/sparc/constraints.md
+++ b/gcc/config/sparc/constraints.md
@@ -18,7 +18,7 @@
 ;; http://www.gnu.org/licenses/.
 ;;; Unused letters:
-;;;    AB
+;;;     B
 ;;;    ajklq  tuv xyz
@@ -62,6 +62,11 @@
 ;; Integer constant constraints

+(define_constraint "A"
+ "Signed 5-bit integer constant"
+ (and (match_code "const_int")
+      (match_test "SPARC_SIMM5_P (ival)")))
+
 (define_constraint "H"
  "Valid operand of double arithmetic operation"
  (and (match_code "const_double")
diff --git a/gcc/config/sparc/niagara4.md b/gcc/config/sparc/niagara4.md
index 272c8ff..61ca801 100644
--- a/gcc/config/sparc/niagara4.md
+++ b/gcc/config/sparc/niagara4.md
@@ -56,7 +56,7 @@
 (define_insn_reservation "n4_cti" 2
   (and (eq_attr "cpu" "niagara4")
-       (eq_attr "type" "branch,call,sibcall,call_no_delay_slot,uncond_branch,return"))
+       (eq_attr "type" "cbcond,uncond_cbcond,branch,call,sibcall,call_no_delay_slot,uncond_branch,return"))
   "n4_slot1, nothing")

 (define_insn_reservation "n4_fp" 11
diff --git a/gcc/config/sparc/predicates.md b/gcc/config/sparc/predicates.md
index 326524b..b64e109 100644
--- a/gcc/config/sparc/predicates.md
+++ b/gcc/config/sparc/predicates.md
@@ -391,6 +391,14 @@
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "uns_small_int_operand")))

+;; Return true if OP is a register, or is a CONST_INT that can fit in a
+;; signed 5-bit immediate field.  This is an acceptable second operand for
+;; the cbcond instructions.
+(define_predicate "arith5_operand"
+  (ior (match_operand 0 "register_operand")
+       (and (match_code "const_int")
+            (match_test "SPARC_SIMM5_P (INTVAL (op))"))))
+
 ;; Predicates for miscellaneous instructions.

diff --git a/gcc/config/sparc/sol2.h
gcc 4.7 libgo patch committed: Set libgo version number
PR 54918 points out that libgo is not using version numbers as it should.  At present none of libgo in 4.6, 4.7 and mainline are compatible with each other.  This patch to the 4.7 branch sets the version number for libgo there.  Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.  Committed to 4.7 branch.

Ian

Index: configure.ac
===================================================================
--- configure.ac	(revision 191576)
+++ configure.ac	(working copy)
@@ -11,7 +11,7 @@ AC_INIT(package-unused, version-unused,,
 AC_CONFIG_SRCDIR(Makefile.am)
 AC_CONFIG_HEADER(config.h)

-libtool_VERSION=1:0:0
+libtool_VERSION=2:1:0
 AC_SUBST(libtool_VERSION)

 AM_ENABLE_MULTILIB(, ..)
Index: Makefile.am
===================================================================
--- Makefile.am	(revision 192024)
+++ Makefile.am	(working copy)
@@ -1753,7 +1753,8 @@ libgo_go_objs = \

 libgo_la_SOURCES = $(runtime_files)

-libgo_la_LDFLAGS = $(PTHREAD_CFLAGS) $(AM_LDFLAGS)
+libgo_la_LDFLAGS = \
+	-version-info $(libtool_VERSION) $(PTHREAD_CFLAGS) $(AM_LDFLAGS)

 libgo_la_LIBADD = \
	$(libgo_go_objs) $(LIBFFI) $(PTHREAD_LIBS) $(MATH_LIBS) $(NET_LIBS)
Rebased gccgo branch on trunk
For the last few months the gccgo branch has been based on the 4.7 branch. I just rebased it to be on trunk. I did this by removing the branch (revision 192707) and creating a new copy of it based on trunk (committed as revision 192708). Ian