Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-14 Thread Paolo Bonzini
Il 13/10/2012 00:25, Steven Bosscher ha scritto:
 On Fri, Oct 12, 2012 at 11:16 PM, Jan Hubicka hubi...@ucw.cz wrote:
 On Fri, Oct 12, 2012 at 10:44 PM, Jan Hubicka hubi...@ucw.cz wrote:
  1) computing liveness with REG_EQUAL included prior to RD, which means a lot
 of shuffling of REG_DEAD notes

 I was already working on a patch for this. I'll send it here later tonight.

 Great, thanks!  This is probably the most sensible approach, even if we will need to
 recompute liveness before/after the webizer.
 
 I don't think we have to touch the liveness sets. We can compute an
 extra set of registers live only for REG_EQUAL/REG_EQUIV notes.
 Attached is what I had in mind. Untested, etc. it's late (and the
 Yankees are playing) so I'll get back to properly testing this
 tomorrow.

Can we just simulate liveness for web, and drop REG_EQUAL/REG_EQUIV
notes that refer to a dead pseudo?

Paolo
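Paolo's suggestion can be pictured as a backward walk over each basic block: keep the set of live pseudos, never let REG_EQUAL note operands enter that set, and drop any note that mentions a pseudo which is not live just before its insn. The sketch below is only a toy model of that idea under those assumptions, not GCC's DF or RTL API: pseudos are bits in a 64-bit mask, and `toy_insn` and `drop_dead_equal_notes` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (hypothetical): pseudo N is bit N of the mask.  */
typedef uint64_t regset_t;

struct toy_insn
{
  regset_t defs;       /* Pseudos written by the insn's pattern.  */
  regset_t uses;       /* Pseudos read by the insn's pattern.  */
  regset_t note_uses;  /* Pseudos mentioned by its REG_EQUAL note.  */
  bool has_note;       /* Whether the insn still carries the note.  */
};

/* Walk the block backwards from LIVE_OUT, simulating liveness from the
   patterns only, and drop every REG_EQUAL note that mentions a pseudo
   which is dead at that insn.  Returns the number of notes dropped.  */
static size_t
drop_dead_equal_notes (struct toy_insn *insns, size_t n, regset_t live_out)
{
  regset_t live = live_out;
  size_t dropped = 0;

  for (size_t i = n; i-- > 0;)
    {
      struct toy_insn *insn = &insns[i];

      /* Live set just before this insn: kill the defs, add the uses.
         Note operands are deliberately not added.  */
      live = (live & ~insn->defs) | insn->uses;

      /* Keep the note only if every pseudo it mentions is live here.  */
      if (insn->has_note && (insn->note_uses & ~live) != 0)
        {
          insn->has_note = false;
          dropped++;
        }
    }
  return dropped;
}
```

Because note operands never enter the live set, a note that is the only remaining reference to a pseudo is discarded instead of keeping that pseudo alive.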



Re: [i386] scalar ops that preserve the high part of a vector

2012-10-14 Thread Uros Bizjak
On Sat, Oct 13, 2012 at 10:52 AM, Marc Glisse marc.gli...@inria.fr wrote:
 Hello,

 this patch provides an alternate pattern to let combine recognize scalar
 operations that preserve the high part of a vector. If the strategy is all
 right, I could do the same for more operations (mul, div, ...). Something
 similar is also possible for V4SF (different pattern though), but probably
 not as useful.

But, we _do_ have vec_merge pattern that describes the operation.
Adding another one to each operation just to satisfy combine is IMO
not correct approach. I'd rather see generic RTX simplification that
simplifies your proposed pattern to vec_merge pattern. Also, as you
mention in PR54855, Comment #5, the approach is too fragile...

Uros.


Re: [patch] PR54919 - fix variable expansion in RTL loop unrolling

2012-10-14 Thread Eric Botcazou
 Today appears to be RTL loop optimizer patch day, because here's
 another patch...
 
 The problem here is that variable expansion does not update REG_EQUAL
 notes when it performs replacement of the renamed register.
 
 I fixed this by using validate_replace_rtx_group(). There is already
 code in analyze_insn_to_expand_var() to make sure that the
 to-be-replaced register is only used to accumulate into, so I think
 that using validate_replace_rtx_group is safe. Could use a 2nd pair of
 eyes to make sure, though.

At least the comments in there make it clear that's indeed the intent.

 Tested with a bootstrapped compiler. Test coverage isn't great,
 because variable expansion is not enabled by default.

OK, thanks (if you also add the testcase to gcc.dg with the special options).

-- 
Eric Botcazou
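For readers unfamiliar with the transformation: variable expansion (-fvariable-expansion-in-unroller) lets the RTL unroller give each unrolled copy of an accumulation its own accumulator and combine the partial results after the loop, which breaks the serial dependence on a single register. The C below is only a source-level analogy for an unroll factor of 2, not GCC code, and the function names are made up. Roughly, in RTL the second copy's accumulator becomes a fresh register, and that renaming is the replacement whose REG_EQUAL notes the patch now updates as well.

```c
#include <stddef.h>

/* Original loop: every iteration adds into the same accumulator.  */
static long
sum_single_acc (const long *a, size_t n)
{
  long acc = 0;
  for (size_t i = 0; i < n; i++)
    acc += a[i];
  return acc;
}

/* After unrolling by 2 with variable expansion: each copy of the body
   accumulates into its own variable, and the partial sums are combined
   once the loop exits.  A leftover element (odd N) goes to ACC0.  */
static long
sum_expanded_acc (const long *a, size_t n)
{
  long acc0 = 0, acc1 = 0;
  size_t i = 0;

  for (; i + 1 < n; i += 2)
    {
      acc0 += a[i];
      acc1 += a[i + 1];
    }
  if (i < n)
    acc0 += a[i];
  return acc0 + acc1;
}
```

Integers are used so both versions give identical results; for floating-point accumulators the reassociation changes rounding, so it is only valid under relaxed floating-point semantics.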


Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc

2012-10-14 Thread Iain Buclaw
On 5 October 2012 11:35, Richard Guenther richard.guent...@gmail.com wrote:
 On Fri, Oct 5, 2012 at 12:07 PM, Iain Buclaw ibuc...@ubuntu.com wrote:
 On 5 October 2012 01:06, Joseph S. Myers jos...@codesourcery.com wrote:
 On Thu, 4 Oct 2012, Iain Buclaw wrote:

 The only patches to gcc proper are documentation-related and adding
 the D frontend / libphobos to configure and make files.  I would have
 thought that these would typically only be included with the actual
 front-end?

 Looking back at my previous review comments, I suggested that you might
 need to split up c-common.[ch] so that certain parts of attribute handling
 could be shared with D, because duplicate code copied from elsewhere in
 GCC was not an appropriate implementation approach.  Have you then
 eliminated the duplicate code in some other way that does not involve
 splitting up those files so code can be shared?


 Ah, no; thanks for reminding me of this.

 The code duplicated from c-common.[ch] consists of the handlers for C
 __attributes__; however, gdc doesn't use all of them, because some just
 don't have a fitting place, e.g. gnu_inline and artificial.

 Would the best approach be to move all handle_* functions and any
 helper functions into a new source file that can be shared between
 frontends, and define two new frontend hooks,
 LANG_HOOK_ATTRIBUTE_TABLE and LANG_HOOK_FORMAT_ATTRIBUTE_TABLE ?

 Btw, the LTO frontend also has most of the stuff duplicated ... (see
 lto/lto-lang.c).
 Not sure why ...

 Richard.


It looks like LTO's frontend duplicates the relevant handlers in
order to support the attributes used for GCC builtins (const, pure,
nothrow, transaction_pure, etc.).  Probably only these handlers
should move to a common frontend location, with the rest kept as part of
c-family.


Regards,
-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
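The split discussed above amounts to putting handlers such as const, pure and nothrow in a table every front end can reach, while each front end keeps its own table for language-specific attributes. A minimal sketch of that lookup order follows; `attr_spec`, the handler functions and `lookup_attr` are hypothetical simplifications, not GCC's real `struct attribute_spec` or its lang-hook interface, and the handlers just return IDs so callers can tell them apart.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified attribute table entry.  */
struct attr_spec
{
  const char *name;
  int (*handler) (void);
};

/* Handlers every front end needs for builtins (shared).  */
static int handle_const (void) { return 1; }
static int handle_pure (void) { return 2; }
static int handle_nothrow (void) { return 3; }

static const struct attr_spec common_attrs[] = {
  { "const", handle_const },
  { "pure", handle_pure },
  { "nothrow", handle_nothrow },
  { NULL, NULL }
};

/* A C-family-only handler that would stay in c-family.  */
static int handle_gnu_inline (void) { return 4; }

static const struct attr_spec c_family_attrs[] = {
  { "gnu_inline", handle_gnu_inline },
  { NULL, NULL }
};

/* A front end with no language-specific attributes of its own.  */
static const struct attr_spec empty_attrs[] = {
  { NULL, NULL }
};

/* Search the front end's own table first, then the shared one.  */
static const struct attr_spec *
lookup_attr (const struct attr_spec *lang_attrs, const char *name)
{
  const struct attr_spec *tables[2] = { lang_attrs, common_attrs };

  for (int t = 0; t < 2; t++)
    for (const struct attr_spec *s = tables[t]; s->name != NULL; s++)
      if (strcmp (s->name, name) == 0)
        return s;
  return NULL;
}
```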


Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)

2012-10-14 Thread Steven Bosscher
On Sat, Oct 13, 2012 at 11:12 PM, Vladimir Makarov vmaka...@redhat.com wrote:
 Ok for the idea.  If we have a problem later, we could fix it.  I'll look at
 the next version of the patch when you send it to give you the final
 approval.

Great, thanks!

Here is the updated patch, tested in the same way as the previous version.

Ciao!
Steven


ira-speedup-3.diff
Description: Binary data


Re: encoding all aliases options in .opt files

2012-10-14 Thread Andreas Schwab
Manuel López-Ibáñez lopeziba...@gmail.com writes:

 aux-infoFILE /* we could accept this to be compatible with some
 options like -B */

Concatenated option arguments (without separators like '=' or '-')
should only ever be used for single character options.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.
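Andreas's rule can be illustrated with a toy parser: a single-letter option such as -B may take its argument glued on (-Bdir), whereas a longer option name must be followed by '=' or by a separate argument, so -aux-infoFILE is rejected. The option list and `split_option` below are hypothetical; GCC's real option handling is generated from the .opt files and works differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Split ARG (the text after the leading '-') into an option NAME and
   VALUE, using the KNOWN option names.  VALUE is NULL when the argument
   must come separately.  Returns false if ARG is not acceptable.  */
static bool
split_option (const char *arg, const char *const *known, size_t n_known,
              const char **name, const char **value)
{
  for (size_t i = 0; i < n_known; i++)
    {
      size_t len = strlen (known[i]);

      if (strncmp (arg, known[i], len) != 0)
        continue;

      *name = known[i];
      if (arg[len] == '\0')
        *value = NULL;             /* "-aux-info FILE": separate argument.  */
      else if (arg[len] == '=')
        *value = arg + len + 1;    /* "-aux-info=FILE".  */
      else if (len == 1)
        *value = arg + 1;          /* "-Bdir": single letters may glue.  */
      else
        continue;                  /* "-aux-infoFILE": rejected.  */
      return true;
    }
  return false;
}
```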


Re: [i386] scalar ops that preserve the high part of a vector

2012-10-14 Thread Marc Glisse

On Sun, 14 Oct 2012, Uros Bizjak wrote:


On Sat, Oct 13, 2012 at 10:52 AM, Marc Glisse marc.gli...@inria.fr wrote:

Hello,

this patch provides an alternate pattern to let combine recognize scalar
operations that preserve the high part of a vector. If the strategy is all
right, I could do the same for more operations (mul, div, ...). Something
similar is also possible for V4SF (different pattern though), but probably
not as useful.


But, we _do_ have vec_merge pattern that describes the operation.
Adding another one to each operation just to satisfy combine is IMO
not correct approach.


At some point I wondered about _replacing_ the existing pattern, so there 
would only be one ;-)


The vec_merge pattern takes as argument 2 vectors instead of a vector and 
a scalar, and describes the operation as a vector operation where we drop 
half of the result, instead of a scalar operation where we re-add the top 
half of the vector. I don't know if that's the most convenient choice. 
Adding code in simplify-rtx to replace vec_merge with vec_concat / 
vec_select might be easier than the other way around.



If the middle-end somehow gave us:
(plus X (vec_concat Y 0))
it would seem a bit strange to add an optimization that turns it into:
(vec_merge (plus X (subreg:V2DF Y)) X 1)
but then producing:
(vec_concat (plus (vec_select X 0) Y) (vec_select X 1))
would be strange as well.
(ignoring the signed zero issues here)
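To make the three RTL forms above concrete, here is a plain-C model of their element-wise semantics on a two-element vector of doubles, standing in for V2DF. In RTL, (vec_merge A B M) takes element i from A when bit i of the mask M is set and from B otherwise, and vec_concat glues two scalars into a vector. The type `v2df` and the helper names are hypothetical. (subreg:V2DF Y) is modelled with an arbitrary 0.0 in its high lane, which does not matter because the merge discards that lane.

```c
#include <stdbool.h>

typedef struct { double e[2]; } v2df;

/* (vec_merge A B MASK): element I from A if bit I of MASK is set,
   otherwise from B.  */
static v2df
vec_merge2 (v2df a, v2df b, unsigned mask)
{
  v2df r;
  for (int i = 0; i < 2; i++)
    r.e[i] = (mask >> i) & 1 ? a.e[i] : b.e[i];
  return r;
}

/* (vec_concat LO HI) for two scalars.  */
static v2df
vec_concat2 (double lo, double hi)
{
  v2df r = { { lo, hi } };
  return r;
}

/* Form 0: (plus X (vec_concat Y 0)).  */
static v2df
form_plus_concat (v2df x, double y)
{
  v2df rhs = vec_concat2 (y, 0.0);
  v2df r = { { x.e[0] + rhs.e[0], x.e[1] + rhs.e[1] } };
  return r;
}

/* Form 1: (vec_merge (plus X (subreg:V2DF Y)) X 1).  */
static v2df
form_vec_merge (v2df x, double y)
{
  v2df y_vec = { { y, 0.0 } };  /* High lane arbitrary; discarded below.  */
  v2df sum = { { x.e[0] + y_vec.e[0], x.e[1] + y_vec.e[1] } };
  return vec_merge2 (sum, x, 1);
}

/* Form 2: (vec_concat (plus (vec_select X 0) Y) (vec_select X 1)).  */
static v2df
form_vec_concat (v2df x, double y)
{
  return vec_concat2 (x.e[0] + y, x.e[1]);
}

static bool
v2df_equal (v2df a, v2df b)
{
  return a.e[0] == b.e[0] && a.e[1] == b.e[1];
}
```

The model also shows the signed-zero caveat: with X = {1.0, -0.0}, form 0 yields +0.0 in the high lane (IEEE round-to-nearest gives -0.0 + 0.0 = +0.0), while forms 1 and 2 keep -0.0.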


I'd rather see generic RTX simplification that
simplifies your proposed pattern to vec_merge pattern.


Ok, I'll see what I can do.

Also, as you mention in PR54855, Comment #5, the approach is too 
fragile...


I am not sure I can make the RTX simplification much less fragile... 
Whenever I see (vec_concat X (vec_select Y 1)), I would have to check 
whether X is some (possibly large) tree of scalar computations involving 
Y[0], move it all to vec_merge computations, and fix other users of some 
of those scalars to now use S[0]. Seems too hard, I would stop at 
single-operation X that is used only once. Besides, the gain is larger in 
proportion when there is a single operation :-)


Thank you for your comments,

--
Marc Glisse


Re: encoding all aliases options in .opt files

2012-10-14 Thread Manuel López-Ibáñez
On 14 October 2012 13:38, Andreas Schwab sch...@linux-m68k.org wrote:
 Manuel López-Ibáñez lopeziba...@gmail.com writes:

 aux-infoFILE /* we could accept this to be compatible with some
 options like -B */

 Concatenated option arguments (without separators like '=' or '-')
 should only ever be used for single character options.

We could make that rule explicit in the options-handling machinery.

Cheers,

Manuel.


[PATCH, alpha]: Remove empty predicates and/or constraints from .md files

2012-10-14 Thread Uros Bizjak
Hello!

2012-10-14  Uros Bizjak  ubiz...@gmail.com

* config/alpha/alpha.md: Remove empty predicates and/or constraints.
* config/alpha/sync.md: Ditto.

Tested on alphaev68-pc-linux-gnu, committed to mainline SVN.

Uros.


a.diff.txt.gz
Description: GNU Zip compressed data


[C++ testcase] PR 52643

2012-10-14 Thread Paolo Carlini

Hi,

testcase added, issue closed as fixed. Tested x86_64-linux.

Thanks,
Paolo.


2012-10-14  Paolo Carlini  paolo.carl...@oracle.com

PR c++/52643
* g++.dg/opt/pr52643.C: New.
Index: g++.dg/opt/pr52643.C
===
--- g++.dg/opt/pr52643.C(revision 0)
+++ g++.dg/opt/pr52643.C(working copy)
@@ -0,0 +1,64 @@
+// PR c++/52643
+// { dg-options "-O" }
+
+template<class T> class already_AddRefd {};
+
+template<class T>
+class ObjRef
+{
+public:
+  ObjRef() {}
+
+  ObjRef(const already_AddRefd<T> aar) {}
+
+  ~ObjRef()
+  {
+    T* mPtr;
+    mPtr->release_ref();
+  }
+
+  operator T* () const
+  {
+    return __null;
+  }
+
+  template<class U>
+  void operator= (const already_AddRefd<U> newAssign) {}
+};
+
+class MyRetClass {
+public:
+  void release_ref();
+};
+
+class MyClass
+{
+  void appendChild();
+  void getTripleOutOfByPredicate();
+  already_AddRefd<MyRetClass> getNextTriple();
+};
+
+void
+MyClass::getTripleOutOfByPredicate()
+{
+  ObjRef<MyRetClass> t (getNextTriple());
+
+  if (t == __null)
+    throw MyRetClass();
+}
+
+void
+MyClass::appendChild()
+{
+  while (1)
+  {
+    try
+    {
+      ObjRef<MyRetClass> t (getNextTriple());
+      continue;
+    }
+    catch (MyRetClass)
+    {
+    }
+  }
+}


Re: [patch] PR54919 - fix variable expansion in RTL loop unrolling

2012-10-14 Thread Jan Hubicka
 Hello,
 
 Today appears to be RTL loop optimizer patch day, because here's
 another patch...
 
 The problem here is that variable expansion does not update REG_EQUAL
 notes when it performs replacement of the renamed register.

Hehe, or rather REG_EQUAL patch day :)
It makes me wonder how much of the REG_EQUAL machinery we still make good use of.
 
 I fixed this by using validate_replace_rtx_group(). There is already
 code in analyze_insn_to_expand_var() to make sure that the
 to-be-replaced register is only used to accumulate into, so I think
 that using validate_replace_rtx_group is safe. Could use a 2nd pair of
 eyes to make sure, though.
 
 Tested with a bootstrapped compiler. Test coverage isn't great,
 because variable expansion is not enabled by default.

Are there particular reasons not to enable it?  It seems like a useful
optimization.

Honza


Re: Propagate profile counts during switch expansion

2012-10-14 Thread Jan Hubicka
Hi,

Index: optabs.c
===
--- optabs.c(revision 191879)
+++ optabs.c(working copy)
@@ -4249,7 +4249,7 @@ prepare_operand (enum insn_code icode, rtx x, int
we can do the branch.  */
 
 static void
-emit_cmp_and_jump_insn_1 (rtx test, enum machine_mode mode, rtx label)
+emit_cmp_and_jump_insn_1 (rtx test, enum machine_mode mode, rtx label, int prob)
 {
   enum machine_mode optab_mode;
   enum mode_class mclass;
@@ -4261,7 +4261,16 @@ static void
 
   gcc_assert (icode != CODE_FOR_nothing);
   gcc_assert (insn_operand_matches (icode, 0, test));
-  emit_jump_insn (GEN_FCN (icode) (test, XEXP (test, 0), XEXP (test, 1), label));
+  rtx insn = emit_insn (
+  GEN_FCN (icode) (test, XEXP (test, 0), XEXP (test, 1), label));

I don't think we have switched to the style of mixing declarations and code
yet, so please put the declaration first.

I think you want to keep emit_jump_insn.  Also do nothing when profile_status
== PROFILE_ABSENT.

Index: cfgbuild.c
===
--- cfgbuild.c  (revision 191879)
+++ cfgbuild.c  (working copy)
@@ -559,8 +559,11 @@ compute_outgoing_frequencies (basic_block b)
 f->count = b->count - e->count;
  return;
}
+  else
+{
+  guess_outgoing_edge_probabilities (b);
+}

Add a comment here saying that we rely on multiway BBs already having sane
probabilities.  You still want to do the guessing when the outgoing edges are
EH; there can be many of those as well.
Index: expr.h
===
--- expr.h  (revision 191879)
+++ expr.h  (working copy)
@@ -190,7 +190,7 @@ extern int have_sub2_insn (rtx, rtx);
 /* Emit a pair of rtl insns to compare two rtx's and to jump
to a label if the comparison is true.  */
 extern void emit_cmp_and_jump_insns (rtx, rtx, enum rtx_code, rtx,
-enum machine_mode, int, rtx);
+enum machine_mode, int, rtx, int prob=-1);

Hmm, this is probably the first appearance of this C++ construct. I suppose it is OK.
 
+static inline void
+reset_out_edges_aux (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE(e, ei, bb->succs)
+    e->aux = (void *)0;
+}
+static inline void
+compute_cases_per_edge (gimple stmt)
+{
+  basic_block bb = gimple_bb (stmt);
+  reset_out_edges_aux (bb);
+  int ncases = gimple_switch_num_labels (stmt);
+  for (int i = ncases - 1; i >= 1; --i)
+    {
+      tree elt = gimple_switch_label (stmt, i);
+      tree lab = CASE_LABEL (elt);
+      basic_block case_bb = label_to_block_fn (cfun, lab);
+      edge case_edge = find_edge (bb, case_bb);
+      case_edge->aux = (void *)((long)(case_edge->aux) + 1);
+    }
+}

Comments and newlines per coding standard.

With these changes, the patch is OK.

Thanks,
Honza
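The helper in the patch counts how many case labels reach each outgoing edge of the switch block. One plausible use of those counts, sketched here purely as an assumption, is to split an edge's probability evenly among the cases that share it, so that each comparison emitted during switch expansion can carry its own probability. REG_BR_PROB_BASE is GCC's fixed-point probability scale (10000); `toy_edge`, `count_cases_per_edge` and `case_probability` are hypothetical.

```c
#include <stddef.h>

#define REG_BR_PROB_BASE 10000  /* GCC's probability scale.  */

struct toy_edge
{
  int probability;  /* Out of REG_BR_PROB_BASE.  */
  int ncases;       /* Number of case labels jumping along this edge.  */
};

/* CASE_TARGETS[I] is the index of the edge taken by case label I.
   Like compute_cases_per_edge: reset every counter, then count.  */
static void
count_cases_per_edge (struct toy_edge *edges, size_t nedges,
                      const size_t *case_targets, size_t ncases)
{
  for (size_t e = 0; e < nedges; e++)
    edges[e].ncases = 0;
  for (size_t i = 0; i < ncases; i++)
    edges[case_targets[i]].ncases++;
}

/* Probability of one case label going to edge TARGET: that edge's
   probability shared evenly among all of its case labels.  */
static int
case_probability (const struct toy_edge *edges, size_t target)
{
  const struct toy_edge *e = &edges[target];
  return e->ncases ? e->probability / e->ncases : 0;
}
```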


Re: Use conditional casting with symtab_node

2012-10-14 Thread Diego Novillo
On Fri, Oct 12, 2012 at 4:22 AM, Richard Biener
richard.guent...@gmail.com wrote:

 I also think that instead of

   if (cgraph_node *q = p->cast_to <cgraph_node *> ())

 we want

   if ((q = cast_to <cgraph_node *> (p)))

 I see absolutely no good reason to make cast_to a member, given
 that the language has static_cast, const_cast and stuff.  cast_to
 would simply be our equivalent to dynamic_cast within our OO model.

 Then I'd call it *_cast instead of cast_*, so, why not gcc_cast <> ?
 Or dyn_cast <> ().  That way

   if ((q = dyn_cast <function *> (p)))

This looks fine to me.


Diego.
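The agreed dyn_cast <T> (p) form behaves like dynamic_cast within GCC's own type hierarchy: it checks the node's kind and returns a null pointer when the cast does not apply, which is what makes `if ((q = dyn_cast <...> (p)))` read naturally. To stay in C like the rest of the code here, the sketch models the idea with a kind tag and a checked-cast function; `toy_symtab_node`, `toy_cgraph_node` and `dyn_cast_cgraph` are hypothetical, not the real symtab types.

```c
#include <stddef.h>

enum toy_symtab_kind { TOY_FUNCTION, TOY_VARIABLE };

/* Base "class": every node starts with its kind.  */
struct toy_symtab_node
{
  enum toy_symtab_kind kind;
};

/* Derived "class": the base is the first member, so a pointer to the
   derived struct converts to and from a pointer to the base.  */
struct toy_cgraph_node
{
  struct toy_symtab_node base;
  int ncalls;
};

/* Checked downcast: return P as a function node, or NULL if it is not one.  */
static struct toy_cgraph_node *
dyn_cast_cgraph (struct toy_symtab_node *p)
{
  if (p != NULL && p->kind == TOY_FUNCTION)
    return (struct toy_cgraph_node *) p;
  return NULL;
}
```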


Fix estimated number of iterations for loops with multiple exits

2012-10-14 Thread Jan Hubicka
Hi,
the update of the RTL optimizers to use SCEV's loop bounds makes them
unexpectedly active.
One of the reasons is an invalid estimate.  For the loop

int *a;
int t()
{
  int i;
  for (i=0;i<100;i++)
    if (a[i])
      return 1;
  return 0;
}

we get a realistic estimate of 99 for the number of iterations.  This is quite
wrong.  We could, however, still predict the following loop:

int t2()
{
  int i;
  for (i=0;i<300;i++)
    if (a[i])
      abort ();
  return 0;
}

This patch implements that by making estimate_numbers_of_iterations_loop
save the realistic estimate only when all other exits out of the loop are
unlikely (i.e. EH, or predicted by NORETURN or a similarly strong heuristic).
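The exit-selection rule can be modelled outside GCC: skip EH and abnormal-call exits, skip exits that a static profile predicts with a probability below 5 (out of REG_BR_PROB_BASE, 10000) and that have no measured count, and return the remaining exit only if there is exactly one. The threshold of 5 and the EH/abnormal-call test come from the patch below; the `toy_exit` struct and `single_likely_exit_index` are hypothetical stand-ins for GCC's edges and for single_likely_exit.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_exit
{
  bool eh_or_abnormal_call;  /* EDGE_EH or EDGE_ABNORMAL_CALL set.  */
  int probability;           /* Out of REG_BR_PROB_BASE (10000).  */
  long count;                /* Profile count; 0 if none was measured.  */
};

/* Return the index of the single likely exit, or -1 if there is none
   or more than one.  HAVE_PROFILE stands for
   profile_status != PROFILE_ABSENT.  */
static int
single_likely_exit_index (const struct toy_exit *exits, size_t n,
                          bool have_profile)
{
  int found = -1;

  if (n == 1)
    return 0;  /* A loop with a single exit: that exit is the answer.  */
  for (size_t i = 0; i < n; i++)
    {
      if (exits[i].eh_or_abnormal_call)
        continue;
      /* Paths to noreturn calls get such low static probabilities
         that they are ruled out here.  */
      if (have_profile && exits[i].probability < 5 && exits[i].count == 0)
        continue;
      if (found >= 0)
        return -1;  /* A second likely exit: give up.  */
      found = (int) i;
    }
  return found;
}
```

With made-up probabilities, t () has two ordinary exits and gets no likely exit, while in t2 () the exit to abort () is filtered out and the loop's normal exit is selected.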

Bootstrapped/regtested x86_64-linux, committed.

Honza

* tree-ssa-loop-niter.c (estimate_numbers_of_iterations_loop): Do not
predict loops with multiple exits realistically.
* cfgloopanal.c (single_likely_exit): New function.

* gcc.dg/unroll_5.c: New testcase.
Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c   (revision 192432)
+++ tree-ssa-loop-niter.c   (working copy)
@@ -2965,6 +2965,7 @@ estimate_numbers_of_iterations_loop (str
   struct tree_niter_desc niter_desc;
   edge ex;
   double_int bound;
+  edge likely_exit;
 
   /* Give up if we already have tried to compute an estimation.  */
   if (loop->estimate_state != EST_NOT_COMPUTED)
@@ -2975,6 +2976,7 @@ estimate_numbers_of_iterations_loop (str
   loop->any_estimate = false;
 
   exits = get_loop_exit_edges (loop);
+  likely_exit = single_likely_exit (loop);
   FOR_EACH_VEC_ELT (edge, exits, i, ex)
 {
   if (!number_of_iterations_exit (loop, ex, &niter_desc, false))
@@ -2988,7 +2990,7 @@ estimate_numbers_of_iterations_loop (str
niter);
   record_estimate (loop, niter, niter_desc.max,
   last_stmt (ex->src),
-  true, true, true);
+  true, ex == likely_exit, true);
 }
   VEC_free (edge, heap, exits);
 
Index: cfgloopanal.c
===
--- cfgloopanal.c   (revision 192432)
+++ cfgloopanal.c   (working copy)
@@ -446,3 +446,40 @@ mark_loop_exit_edges (void)
 }
 }
 
+/* Return exit edge if loop has only one exit that is likely
+   to be executed on runtime (i.e. it is not EH or leading
+   to noreturn call).  */
+
+edge
+single_likely_exit (struct loop *loop)
+{
+  edge found = single_exit (loop);
+  VEC (edge, heap) *exits;
+  unsigned i;
+  edge ex;
+
+  if (found)
+    return found;
+  exits = get_loop_exit_edges (loop);
+  FOR_EACH_VEC_ELT (edge, exits, i, ex)
+    {
+      if (ex->flags & (EDGE_EH | EDGE_ABNORMAL_CALL))
+        continue;
+      /* The constant of 5 is set in a way so noreturn calls are
+         ruled out by this test.  The static branch prediction algorithm
+         will not assign such a low probability to conditionals for usual
+         reasons.  */
+      if (profile_status != PROFILE_ABSENT
+          && ex->probability < 5 && !ex->count)
+        continue;
+      if (!found)
+        found = ex;
+      else
+        {
+          VEC_free (edge, heap, exits);
+          return NULL;
+        }
+    }
+  VEC_free (edge, heap, exits);
+  return found;
+}
Index: testsuite/gcc.dg/unroll_5.c
===
--- testsuite/gcc.dg/unroll_5.c (revision 0)
+++ testsuite/gcc.dg/unroll_5.c (revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-rtl-loop2_unroll -funroll-loops" } */
+void abort (void);
+int *a;
+int t()
+{
+  int i;
+  for (i=0;i<100;i++)
+    if (a[i])
+      return 1;
+  return 0;
+}
+int t2()
+{
+  int i;
+  for (i=0;i<300;i++)
+    if (a[i])
+      abort ();
+  return 0;
+}
+/* { dg-final { scan-rtl-dump-times "upper bound: 99" 1 "loop2_unroll" } } */
+/* { dg-final { scan-rtl-dump-not "realistic bound: 99" "loop2_unroll" } } */
+/* { dg-final { scan-rtl-dump-times "upper bound: 299" 1 "loop2_unroll" } } */
+/* { dg-final { scan-rtl-dump-times "realistic bound: 299" 1 "loop2_unroll" } } */
+/* { dg-final { cleanup-rtl-dump "loop2_unroll" } } */


LangEnabledBy with arguments

2012-10-14 Thread Manuel López-Ibáñez
Bootstrapped and regression tested on x86_64-linux-gnu. The additional
testcase was not failing before, but tests for something that the
current testsuite does not.

OK?

2012-10-14  Manuel López-Ibáñez  m...@gcc.gnu.org

PR c/53063
PR c/40989
gcc/
* optc-gen.awk: Handle new form of LangEnabledBy.
* opts.c (set_Wstrict_aliasing): Declare here. Make static.
* common.opt (Wstrict-aliasing=,Wstrict-overflow=): Do not use Init.
* doc/options.texi (LangEnabledBy): Document new form.
* flags.h (set_Wstrict_aliasing): Do not declare.
c-family/
* c.opt (Wstrict-aliasing=,Wstrict-overflow=): Use LangEnabledBy.
* c-opts.c (c_common_handle_option): Do not set them here. Add
comment.
(c_common_post_options): Likewise.
testsuite/
* gcc.dg/Wstrict-overflow-24.c: New.


lang-enabled-by-with-args2.diff
Description: Binary data


Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)

2012-10-14 Thread Vladimir Makarov

On 12-10-14 6:16 AM, Steven Bosscher wrote:

On Sat, Oct 13, 2012 at 11:12 PM, Vladimir Makarov vmaka...@redhat.com wrote:

Ok for the idea.  If we have a problem later, we could fix it.  I'll look at
the next version of the patch when you send it to give you the final
approval.

Great, thanks!

Here is the updated patch, tested in the same way as the previous version.



Thanks, Steven.  IRA part is ok for me to commit.


Re: [lra] patch from Richard Sandiford's review of lra-assigns.c

2012-10-14 Thread Vladimir Makarov

On 12-10-12 11:00 AM, Richard Sandiford wrote:

Vladimir Makarov vmaka...@redhat.com writes:

The following patch implements most of Richard's proposals for the LRA
files lra-spills.c and lra-coalesce.c.

The patch was successfully bootstrapped on x86/x86-64.

Committed as rev. 192389.

Thanks for the updates.  Looks good to me.  Just one comment though:


@@ -125,7 +136,7 @@ process_copy_to_form_thread (int regno1,
   last = regno_assign_info[last].next)
regno_assign_info[last].first = regno1_first;
regno_assign_info[last].next = regno_assign_info[regno1_first].next;
-  regno_assign_info[regno1_first].first = regno2_first;
+  regno_assign_info[regno1_first].next = regno2_first;
regno_assign_info[regno1_first].freq
+= regno_assign_info[regno2_first].freq;
  }

I still think this is missing a:

regno_assign_info[last].first = regno1_first;



Thanks, Richard.  I fixed it in today's patch.



[lra] new hint * interpretation.

2012-10-14 Thread Vladimir Makarov

The following patch adds a new interpretation of hint * for LRA.

2012-10-14  Vladimir Makarov  vmaka...@redhat.com

* doc/md.texi: Add new interpretation of hint * for LRA.


Committed as rev. 192436.

Index: doc/md.texi
===
--- doc/md.texi (revision 192325)
+++ doc/md.texi (working copy)
@@ -1,5 +1,5 @@
 @c Copyright (C) 1988, 1989, 1992, 1993, 1994, 1996, 1998, 1999, 2000, 2001,
-@c 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
+@c 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012
 @c Free Software Foundation, Inc.
 @c This is part of the GCC manual.
 @c For copying conditions, see the file gcc.texi.
@@ -1606,7 +1606,9 @@
 @item *
 Says that the following character should be ignored when choosing
 register preferences.  @samp{*} has no effect on the meaning of the
-constraint as a constraint, and no effect on reloading.
+constraint as a constraint, and no effect on reloading.  For LRA
+@samp{*} additionally disparages slightly the alternative if the
+following character matches the operand.
 
 @ifset INTERNALS
 Here is an example: the 68000 has an instruction to sign-extend a


Re: [lra] patch to fix GCC crash on a SPEC2006 test

2012-10-14 Thread Vladimir Makarov

On 12-10-13 11:37 AM, Peter Bergner wrote:

On Thu, 2012-10-11 at 23:53 -0400, Vladimir Makarov wrote:

Is the following comment better?

Presence of any pseudo in CALL_INSN_FUNCTION_USAGE does not affect value
of insn_bitmap of the corresponding lra_reg_info.  That is because we
don't need to reload pseudos in CALL_INSN_FUNCTION_USAGEs.  So if we
process only insns in the insn_bitmap of given pseudo here, we can miss
the pseudo in some CALL_INSN_FUNCTION_USAGEs.

Sure, that's better.  Thanks.

Ok.  Fixed.








Re: [SH] PR 34777 - Add test case

2012-10-14 Thread Oleg Endo
On Wed, 2012-10-10 at 07:46 +0900, Kaz Kojima wrote:
 Oleg Endo oleg.e...@t-online.de wrote:
  Uhm, yes, I forgot to add the -fschedule-insns and -mprefergot options.
  Regarding the -Os option, I think it's better to test this one at
  multiple optimization levels, just in case.  I've looked through
  gcc.c-torture/compile and found some target specific test cases there,
  so I thought it would be OK to do the same :)
  Some targets also have their own torture subdir.  If it's better, I
  could also create gcc.target/sh/torture.
 
 Maybe.  For this specific test, I thought that -Os -fschedule-insns
 -fPIC -mprefergot would be enough because empirically these options
 will give high R0 register pressure which had caused that PR.
 

Sorry for the delayed reply.
The attached patch adds gcc.target/sh/torture and puts the test there.
The torture subdir might be also useful in the future.
Tested on rev 192417 with
make -k check-gcc RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml}"

OK?

Cheers,
Oleg

testsuite/ChangeLog:

PR target/34777
* gcc.target/sh/torture/sh-torture.exp: New.
* gcc.target/sh/torture/pr34777.c: New.
Index: gcc/testsuite/gcc.target/sh/torture/pr34777.c
===
--- gcc/testsuite/gcc.target/sh/torture/pr34777.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/torture/pr34777.c	(revision 0)
@@ -0,0 +1,30 @@
+/* { dg-do compile { target sh*-*-* } } */
+/* { dg-additional-options "-fschedule-insns -fPIC -mprefergot" }  */
+/* { dg-skip-if "" { sh*-*-* } { "-m5*" } { "" } }  */
+
+static __inline __attribute__ ((__always_inline__)) void *
+_dl_mmap (void * start, int length, int prot, int flags, int fd,
+	  int offset)
+{
+  register long __sc3 __asm__ ("r3") = 90;
+  register long __sc4 __asm__ ("r4") = (long) start;
+  register long __sc5 __asm__ ("r5") = (long) length;
+  register long __sc6 __asm__ ("r6") = (long) prot;
+  register long __sc7 __asm__ ("r7") = (long) flags;
+  register long __sc0 __asm__ ("r0") = (long) fd;
+  register long __sc1 __asm__ ("r1") = (long) offset;
+  __asm__ __volatile__ ("trapa	%1"
+			: "=z" (__sc0)
+			: "i" (0x10 + 6), "0" (__sc0), "r" (__sc4),
+			  "r" (__sc5), "r" (__sc6), "r" (__sc7),
+			  "r" (__sc3), "r" (__sc1)
+			: "memory" );
+}
+
+extern int _dl_pagesize;
+void
+_dl_dprintf(int fd, const char *fmt, ...)
+{
+  static char *buf;
+  buf = _dl_mmap ((void *) 0, _dl_pagesize, 0x1 | 0x2, 0x02 | 0x20, -1, 0);
+}
Index: gcc/testsuite/gcc.target/sh/torture/sh-torture.exp
===
--- gcc/testsuite/gcc.target/sh/torture/sh-torture.exp	(revision 0)
+++ gcc/testsuite/gcc.target/sh/torture/sh-torture.exp	(revision 0)
@@ -0,0 +1,41 @@
+#   Copyright (C) 2012 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `gcc-dg.exp' driver, looping over
+# optimization options.
+
+# Exit immediately if this isn't a SH target.
+if { ![istarget sh*-*-*] } then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] $DEFAULT_CFLAGS
+
+# All done.
+dg-finish


[C++ testcase] PR 53581

2012-10-14 Thread Paolo Carlini

Hi,

testcase added, issue closed as fixed. Tested x86_64-linux.

Thanks,
Paolo.

/
2012-10-14  Paolo Carlini  paolo.carl...@oracle.com

PR c++/53581
* g++.dg/template/crash113.C: New.
Index: g++.dg/template/crash113.C
===
--- g++.dg/template/crash113.C  (revision 0)
+++ g++.dg/template/crash113.C  (working copy)
@@ -0,0 +1,50 @@
+// PR c++/53581
+
+template<class A, int M, int N>
+class Child;
+
+template<class A, int M, int N>
+class Base
+{
+public:
+  Child<A, M, N> operator-(const Base<A, M, N> m) const
+  {
+    Child<A, M, N> diff;
+    return diff;
+  }
+
+  A test() const
+  {
+    return 0;
+  }
+
+private:
+  A values[M * N];
+};
+
+template<class A, int N>
+class Ops
+{
+public:
+  virtual ~Ops() {}
+
+  bool bar() const
+  {
+    Child<A, N, N> mat;
+    return (*static_cast<const Child<A, N, N>*>(this) - mat).test();
+  }
+};
+
+
+template<class A, int N>
+class Child<A, N, N> : public Base<A, N, N>, public Ops<A, N> {};
+
+class ImageWarp
+{
+  bool bar() const
+  {
+    return foo.bar();
+  }
+
+  Child<float, 3, 3> foo;
+};


Re: [patch] PR54919 - fix variable expansion in RTL loop unrolling

2012-10-14 Thread Steven Bosscher
On Sun, Oct 14, 2012 at 4:18 PM, Jan Hubicka wrote:
 Tested with a bootstrapped compiler. Test coverage isn't great,
 because variable expansion is not enabled by default.

 Are there particular reasons not to enable it?  It seems like a useful
 optimization.

I don't know of any reason not to enable it, but I have no access to
fancy benchmarks to see what happens if the option is enabled.
Wouldn't hurt to throw this at SPEC2k6 or something like that, just to
see what happens.

Ciao!
Steven


Re: [patch] PR54919 - fix variable expansion in RTL loop unrolling

2012-10-14 Thread Steven Bosscher
On Sun, Oct 14, 2012 at 11:11 AM, Eric Botcazou wrote:
 OK, thanks (if you also add the testcase to gcc.dg with the special options).

Thanks, committed as trunk r192439.

Ciao!
Steven


Re: [PR38711] Use DF_LIVE in IRA if it available (for -O2 and higher)

2012-10-14 Thread Steven Bosscher
On Sun, Oct 14, 2012 at 7:19 PM, Vladimir Makarov wrote:
 Thanks, Steven.  IRA part is ok for me to commit.

Thanks, I've committed this as trunk r192440. I'm aware I'm on the
hook for fixing any fall-out :-)

Ciao!
Steven


Tidy store_bit_field_1 & co.

2012-10-14 Thread Richard Sandiford
insv, extv and extzv have an unusual interface: the structure operand is
supposed to have word_mode if stored in registers or byte_mode if stored
in memory.  Andrew's patch to try different insv modes:

   http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00126.html

prompted me to try making the patterns more like other optabs.

The use of word and byte units for registers and memory respectively
is pretty deeply engrained into the current expand routines, even in the
parts that don't deal directly with the .md patterns.  E.g. the bitnum
parameter to store_bit_field always counts from the leftmost bit of OP0,
but store_bit_field_1 internally converts it to a trio of unit,
offset (number of whole units) and bitpos (position within a unit).
The latter two are then also used in the interface to store_fixed_bit_field,
with the unit being implicit.  store_split_bit_field uses the original
bitnum-style parameter instead.

This patch makes the code use the original bitnum throughout,
and only separate into units where locally useful.
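The two encodings contrasted above are related by simple division: with a unit of BITS_PER_UNIT bits for memory or BITS_PER_WORD bits for registers, offset is the number of whole units before the field and bitpos is the position within that unit. A small sketch of the conversion, assuming 8-bit bytes and 64-bit words; `split_bitnum` is a hypothetical helper, not a function in expmed.c, and it ignores the big-endian bit-numbering adjustments the real code has to make.

```c
#include <stdbool.h>

#define TOY_BITS_PER_UNIT 8   /* Assumed byte size.  */
#define TOY_BITS_PER_WORD 64  /* Assumed word size.  */

struct unit_pos
{
  unsigned long unit;    /* Unit size in bits.  */
  unsigned long offset;  /* Whole units before the field.  */
  unsigned long bitpos;  /* Bit position within that unit.  */
};

/* Convert BITNUM into the (unit, offset, bitpos) trio: memory operands
   are measured in bytes, register operands in words.  */
static struct unit_pos
split_bitnum (unsigned long bitnum, bool is_mem)
{
  struct unit_pos p;

  p.unit = is_mem ? TOY_BITS_PER_UNIT : TOY_BITS_PER_WORD;
  p.offset = bitnum / p.unit;
  p.bitpos = bitnum % p.unit;
  return p;
}
```

Since bitnum == offset * unit + bitpos always holds, keeping the single bitnum and splitting only where a unit is actually needed loses no information.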

Also, if the field spans two words of a register OP0, store_bit_field_1
reduces OP0 to just the first word.   It then makes sure that we fall
through to store_fixed_bit_field, which in turn calls store_split_bit_field,
which knows that OP0 is only partial.  I think this is dangerous:
it's the only time that store_bit_field_1 trims OP0 to cover only
part of the field, and so adds another special case for the rest
of the function to handle and ignore.  It also makes the interface
to store_fixed_bit_field more complicated.

The patch instead makes store_bit_field_1 call store_split_bit_field
directly where appropriate.
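The condition that now sends a register field straight to store_split_bit_field is whether the field crosses a word boundary. Expressed on bitnum alone, it can be sketched as follows; `field_spans_words` is a hypothetical helper and the 64-bit word size is an assumption.

```c
#include <stdbool.h>

#define TOY_BITS_PER_WORD 64  /* Assumed word size.  */

/* True if the BITSIZE-bit field starting at BITNUM in a register
   operand crosses a word boundary, i.e. its first and last bits live
   in different words.  Assumes BITSIZE >= 1.  */
static bool
field_spans_words (unsigned long bitnum, unsigned long bitsize)
{
  return bitnum / TOY_BITS_PER_WORD
         != (bitnum + bitsize - 1) / TOY_BITS_PER_WORD;
}
```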

diffstat for this patch and the one I'm about to post says:

 expmed.c |  640 +--
 1 file changed, 261 insertions(+), 379 deletions(-)

so I'd like to submit them as clean ups regardless of whether
I ever get around to the main patterns change.

The patch is probably quite hard to review, sorry.  I've made the changelog
a bit more detailed than usual in order to list the individual points.

Tested on x86_64-linux-gnu, powerpc64-linux-gnu, mipsisa64-elf (both -EL
and -EB) and mipsisa32-elf (also both -EL and -EB).  OK to install?

Richard


gcc/
* expmed.c (store_bit_field_1): Remove unit, offset, bitpos and
byte_offset from the outermost scope.  Express conditions in terms
of bitnum rather than offset, bitpos and byte_offset.  Split the
plain move cases into two, one for memory accesses and one for
register accesses.  Allow simplify_gen_subreg to fail rather
than calling validate_subreg.  Move the handling of multiword
OP0s after the code that coerces VALUE to an integer mode.
Use simplify_gen_subreg for this case and assert that it succeeds.
If the field still spans several words, pass it directly to
store_split_bit_field.  Assume after that point that both sources
and register targets fit within a word.  Replace x-prefixed
variables with non-prefixed forms.  Compute the bitpos for insv
register operands directly in the chosen unit size, rather than
going through an intermediate BITS_PER_WORD unit size.
Update the call to store_fixed_bit_field.
(store_fixed_bit_field): Replace the bitpos and offset parameters
with a single bitnum parameter, of the same form as store_bit_field.
Assume that OP0 contains the full field.  Simplify the memory offset
calculation.  Assert that the processed OP0 has an integral mode.
(store_split_bit_field): Update the call to store_fixed_bit_field.

Index: gcc/expmed.c
===
--- gcc/expmed.c	2012-10-13 19:46:00.862780569 +0100
+++ gcc/expmed.c	2012-10-14 11:41:48.692695324 +0100
@@ -49,7 +49,6 @@ static void store_fixed_bit_field (rtx,
   unsigned HOST_WIDE_INT,
   unsigned HOST_WIDE_INT,
   unsigned HOST_WIDE_INT,
-  unsigned HOST_WIDE_INT,
   rtx);
 static void store_split_bit_field (rtx, unsigned HOST_WIDE_INT,
   unsigned HOST_WIDE_INT,
@@ -409,15 +408,9 @@ store_bit_field_1 (rtx str_rtx, unsigned
   enum machine_mode fieldmode,
   rtx value, bool fallback_p)
 {
-  unsigned int unit
-= (MEM_P (str_rtx)) ? BITS_PER_UNIT : BITS_PER_WORD;
-  unsigned HOST_WIDE_INT offset, bitpos;
   rtx op0 = str_rtx;
-  int byte_offset;
   rtx orig_value;
 
-  enum machine_mode op_mode = mode_for_extraction (EP_insv, 3);
-
   while (GET_CODE (op0) == SUBREG)
 {
   /* The following line once was done only if WORDS_BIG_ENDIAN,
@@ -427,8 +420,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 always get higher addresses.  */
   int 

Re: Make try_unroll_loop_completely to use loop bounds recorded

2012-10-14 Thread Jan Hubicka
Hi,
here is an updated patch.  The idea of splitting the loopback edge did not fly.  We
then remove the edge in cfgcleanup prior to demolishing the loop, and we lose
track of which basic blocks need updating because we can no longer get the loop
body.

The good news, however, is that I do not need the changed loop depth walking.
The infinite recursion I was running into before has disappeared.  I guess it
was another bug that I later fixed properly.

I also looked into what unroll and loop_depth are doing, and they use the
IRREDUCIBLE flags only to set the irred_invalidated flag.  Also, the use of the
IRREDUCIBLE flag within the unrolling itself (to locate the last exit of the
loop) is safe WRT the updates we do, so we only need to recompute the flags
once, after all the changes are done.  This solves the quadratic time issue.

The pass also works when canonicalization is done on all loops, not just
innermost ones, but I would like to enable this separately from this change.

I also updated Java and Fortran for the builtin_unreachable macro.  Those are
the only front ends constructing builtin_expect, which is also used internally.
I also noticed that the builtin is missing the CONST flag (it is a looping
const function, which can be declared by combining const and noreturn), but I
will do that incrementally.
I am honestly not sure what Ada and Go do here to get around duplicating
all this mess, but they don't seem to handle other similar cases either.

The patch now adds a regression on a Fortran testcase that simplifies to:
! { dg-do run }
! Program to check corner cases for DO statements.
program do_1
  implicit none
  integer i, j

  ! limit=HUGE(i), step 1
  j = 0
  do i = HUGE(i) - 10, HUGE(i), 1
j = j + 1
  end do
  if (j .ne. 11) call abort

end program

Here the loop iterates up to INT_MAX and compiles as:
  <bb 3>:
  # i_8 = PHI <2147483637(2), i_9(3)>
  # j_6 = PHI <0(2), j_7(3)>
  # DEBUG j => j_6
  # DEBUG i => i_8
  j_7 = j_6 + 1;
  # DEBUG j => j_7
  i_9 = i_8 + 1;
  # DEBUG i => i_9
  if (i_8 == 2147483647)
    goto <bb 4>;
  else
    goto <bb 3>;

Now we try to estimate the number of iterations as:
Statement i_9 = i_8 + 1;
 is executed at most 9 (bounded by 9) + 1 times in loop 1.
Loop 1 iterates at most 9 times.

This is one iteration fewer than it ought to be.  The problem is that the result
of i_9 = i_8 + 1 is undefined on the last iteration, but the program is still
valid because the value is not used (it is used only by the PHI for i_8).  So
this seems like another semi-latent bug in tree-ssa-niter.  Any ideas what to do
here?  I think we need to prove that the value is used in something that
matters, i.e. the loop exit test or a memory access, and only bound the number
of executions of statements using them.

The patch will also need updating in the
 gcc.target/i386/l_fma_* 
testcases.  The reason is that we peel the vectorized prologues/epilogues, which
was in fact the motivation for this whole patch.  The testcases count the number
of instructions appearing in them and need compensation for the different cost
models of the patch, so I plan to do it for the final version only.

Bootstrapped/regtested x86_64-linux (modulo the regressions above) and also
tested with -O3 bootstrap that passes with -Wno-error.

Honza

* gcc.dg/tree-ssa/cunroll-1.c: New testcase.
* gcc.dg/tree-ssa/cunroll-2.c: New testcase.
* gcc.dg/tree-ssa/cunroll-3.c: New testcase.
* gcc.dg/tree-ssa/cunroll-4.c: New testcase.
* gcc.dg/tree-ssa/cunroll-5.c: New testcase.

* cfgloopmanip.c (unloop): Export.
* tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Estimate
also with unknown exit conditional.
(try_unroll_loop_completely): Use max_loop_iterations_int to unroll
also loops with low upper bound; handle unlooping of the last loop
even when exit conditional is not known; unloop loop that do not loop
even if they are not innermost.
(canonicalize_loop_induction_variables): Record niter bounds known;
try unrolling even if number of iterations is not known;
(canonicalize_induction_variables): Handle updating of irreducible loops
(tree_unroll_loops_completely): Likewise.
* cfgloop.h (unloop): Declare.

* f95-lang.c (gfc_init_builtin_functions): Build __builtin_unreachable.
Index: java/builtins.c
===
*** java/builtins.c (revision 192432)
--- java/builtins.c (working copy)
*** VMSupportsCS8_builtin (tree method_retur
*** 453,458 
--- 453,460 
  
  #define BUILTIN_NOTHROW 1
  #define BUILTIN_CONST 2
+ #define BUILTIN_NORETURN 4
+ #define BUILTIN_LEAF 4
  /* Define a single builtin.  */
  static void
  define_builtin (enum built_in_function val,
*** define_builtin (enum built_in_function v
*** 475,480 
--- 477,487 
  TREE_NOTHROW (decl) = 1;
    if (flags & BUILTIN_CONST)
      TREE_READONLY (decl) = 1;
+   if (flags & BUILTIN_NORETURN)
+     TREE_THIS_VOLATILE (decl) = 1;
+   if (flags  

Tidy extract_bit_field_1 & co.

2012-10-14 Thread Richard Sandiford
Partnering the store_bit_field_1 patch that I just posted, this patch
tidies up the extract_bit_field code in the same way.

There is one deliberate behavioural change here.  The old code had a
single check for cases where the extraction could be done as a simple
move.  It started:

  if (((bitsize >= BITS_PER_WORD && bitsize == GET_MODE_BITSIZE (mode)
        && bitpos % BITS_PER_WORD == 0)
       || (mode1 != BLKmode
           /* ??? The big endian test here is wrong.  This is correct
              if the value is in a register, and if mode_for_size is not
              the same mode as op0.  This causes us to get unnecessarily
              inefficient code from the Thumb port when -mbig-endian.  */
           && (BYTES_BIG_ENDIAN
               ? bitpos + bitsize == BITS_PER_WORD
               : bitpos == 0)))

The BYTES_BIG_ENDIAN check didn't make sense for memory operands though,
because bitpos was based on byte units in that case.  That might well be
what the comment was complaining about; I'm not sure.

Also, I made the MODE1 computation take failures of mode_for_size
into account.

Tested on x86_64-linux-gnu, powerpc64-linux-gnu, mipsisa64-elf (both -EL
and -EB) and mipsisa32-elf (also both -EL and -EB).  OK to install?

Richard

gcc/
* expmed.c (store_split_bit_field): Update the calls to
extract_fixed_bit_field.  In the big-endian case, always
use the mode of OP0 to count the number of significant bits.
(extract_bit_field_1): Remove unit, offset, bitpos and
byte_offset from the outermost scope.  Express conditions in terms
of bitnum rather than offset, bitpos and byte_offset.  Move the
computation of MODE1 to the block that needs it.  Use MODE unless
the TMODE-based mode_for_size calculation succeeds.  Split the
plain move cases into two, one for memory accesses and one for
register accesses.  Generalize the memory case, freeing it from
the old register-based endian checks.  Move the INT_MODE calculation
above the code that needs it.  Use simplify_gen_subreg to handle
multiword OP0s.  If the field still spans several words, pass it
directly to extract_split_bit_field.  Assume after that point
that both targets and register sources fit within a word.
Replace x-prefixed variables with non-prefixed forms.
Compute the bitpos for ext(z)v register operands directly in the
chosen unit size, rather than going through an intermediate
BITS_PER_WORD unit size.  Simplify the containment check
used when forcing OP0 into a register.  Update the call to
extract_fixed_bit_field.
(extract_fixed_bit_field): Replace the bitpos and offset parameters
with a single bitnum parameter, of the same form as extract_bit_field.
Assume that OP0 contains the full field.  Simplify the memory offset
calculation and containment check for volatile bitfields.  Make the
offset explicit when volatile bitfields force a misaligned access.
Remove WARNED and fix long lines.  Assert that the processed OP0
has an integral mode.
(store_split_bit_field): Update the call to store_fixed_bit_field.

Index: gcc/expmed.c
===
--- gcc/expmed.c	2012-10-14 11:44:27.359686486 +0100
+++ gcc/expmed.c	2012-10-14 11:44:41.770685683 +0100
@@ -57,7 +57,6 @@ static void store_split_bit_field (rtx,
   rtx);
 static rtx extract_fixed_bit_field (enum machine_mode, rtx,
unsigned HOST_WIDE_INT,
-   unsigned HOST_WIDE_INT,
unsigned HOST_WIDE_INT, rtx, int, bool);
 static rtx mask_rtx (enum machine_mode, int, int, int);
 static rtx lshift_value (enum machine_mode, rtx, int, int);
@@ -1114,28 +1113,21 @@ store_split_bit_field (rtx op0, unsigned
 
   if (BYTES_BIG_ENDIAN)
{
- int total_bits;
-
- /* We must do an endian conversion exactly the same way as it is
-done in extract_bit_field, so that the two calls to
-extract_fixed_bit_field will have comparable arguments.  */
- if (!MEM_P (value) || GET_MODE (value) == BLKmode)
-   total_bits = BITS_PER_WORD;
- else
-   total_bits = GET_MODE_BITSIZE (GET_MODE (value));
-
  /* Fetch successively less significant portions.  */
  if (CONST_INT_P (value))
	part = GEN_INT (((unsigned HOST_WIDE_INT) (INTVAL (value))
			 >> (bitsize - bitsdone - thissize))
			& (((HOST_WIDE_INT) 1 << thissize) - 1));
  else
-   /* The args are chosen so that the last part includes the
-  lsb.  Give extract_bit_field the value it needs (with
-  endianness compensation) to fetch the piece we want.  */
-   part = 

[patch] Back-port ifcvt.c changes from PR54146

2012-10-14 Thread Steven Bosscher
Hello,

This patch is a back-port of one of the scalability improvements I
made to perform, well, maybe not well but at least not so poorly on
the test case of PR54146, which has an extremely large function.

The problem in ifcvt.c has two parts. The first is that clearing
several arrays of size max_reg_num () for every basic block slowed
things down. The second part is that this memory was being allocated with
alloca, so that a sufficiently large function could blow out the
stack.

The latter problem was now also found by a user trying to compile a
sensible and well-known piece of software (see
http://gcc.gnu.org/ml/gcc/2012-10/msg00202.html). This code compiles
with older GCC releases, so this problem is a regression. To fix the
problem in GCC 4.7, I'd like to propose this back-port.

Bootstrapped and tested with release and default development checking on
x86_64-unknown-linux-gnu and on powerpc64-unknown-linux-gnu. The patch
has also already spent more than two months on the trunk now without
problems. OK for the GCC 4.7 release branch? Maybe also for the GCC
4.6 branch after testing?

Ciao!
Steven


PR54146_ifcvt_47.diff
Description: Binary data


[SH] Document function attributes

2012-10-14 Thread Oleg Endo
Hello,

The attached patch adds documentation for SH specific function
attributes which haven't been documented yet.
Tested with 'make info dvi pdf'.
OK?

Cheers,
Oleg

gcc/ChangeLog:

* config/sh/sh.c: Update function attribute comments.
* doc/extend.texi (function_vector): Rephrase SH2A specific part.
(nosave_low_regs, renesas, trapa_handler): Document SH specific attributes.
(sp_switch, trap_exit): Add to index.
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 192417)
+++ gcc/config/sh/sh.c	(working copy)
@@ -9451,30 +9451,42 @@
   return;
 }
 
-/* Supported attributes:
+/*--
+/* Target specific attributes
+  Supported attributes are:
 
-   interrupt_handler -- specifies this function is an interrupt handler.
+   * interrupt_handler
+	Specifies this function is an interrupt handler.
 
-   trapa_handler - like above, but don't save all registers.
+   * trapa_handler
+	Like interrupt_handler, but don't save all registers.
 
-   sp_switch -- specifies an alternate stack for an interrupt handler
-   to run on.
+   * sp_switch
+	Specifies an alternate stack for an interrupt handler to run on.
 
-   trap_exit -- use a trapa to exit an interrupt function instead of
-   an rte instruction.
+   * trap_exit
+	Use a trapa to exit an interrupt function instead of rte.
 
-   nosave_low_regs - don't save r0..r7 in an interrupt handler.
- This is useful on the SH3 and upwards,
- which has a separate set of low regs for User and Supervisor modes.
- This should only be used for the lowest level of interrupts.  Higher levels
- of interrupts must save the registers in case they themselves are
- interrupted.
+   * nosave_low_regs
+	Don't save r0..r7 in an interrupt handler function.
+	This is useful on SH3* and SH4*, which have a separate set of low
+	regs for user and privileged modes.
+	This is mainly to be used for non-reentrant interrupt handlers (i.e.
+	those that run with interrupts disabled and thus can't be
+	interrupted thenselves).
 
-   renesas -- use Renesas calling/layout conventions (functions and
-   structures).
+   * renesas
+	Use Renesas calling/layout conventions (functions and structures).
 
-   resbank -- In case of an ISR, use a register bank to save registers
-   R0-R14, MACH, MACL, GBR and PR.  This is useful only on SH2A targets.
+   * resbank
+	In case of an interrupt handler function, use a register bank to
+	save registers R0-R14, MACH, MACL, GBR and PR.
+	This is available only on SH2A targets.
+
+   * function_vector
+	Declares a function to be called using the TBR relative addressing
+	mode.  Takes an argument that specifies the slot number in the table
+	where this function can be looked up by the JSR/N @@(disp8,TBR) insn.
 */
 
 /* Handle a 'resbank' attribute.  */
Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi	(revision 192417)
+++ gcc/doc/extend.texi	(working copy)
@@ -2682,17 +2682,16 @@
 the function vector has a limited size (maximum 128 entries on the H8/300
 and 64 entries on the H8/300H and H8S) and shares space with the interrupt vector.
 
-In SH2A target, this attribute declares a function to be called using the
+On SH2A targets, this attribute declares a function to be called using the
 TBR relative addressing mode.  The argument to this attribute is the entry
 number of the same function in a vector table containing all the TBR
-relative addressable functions.  For the successful jump, register TBR
-should contain the start address of this TBR relative vector table.
-In the startup routine of the user application, user needs to care of this
-TBR register initialization.  The TBR relative vector table can have at
-max 256 function entries.  The jumps to these functions will be generated
-using a SH2A specific, non delayed branch instruction JSR/N @@(disp8,TBR).
-You must use GAS and GLD from GNU binutils version 2.7 or later for
-this attribute to work correctly.
+relative addressable functions.  For correct operation the TBR must be setup
+accordingly to point to the start of the vector table before any functions with
+this attribute are invoked.  Usually a good place to do the initialization is
+the startup routine.  The TBR relative vector table can have at max 256 function
+entries.  The jumps to these functions will be generated using a SH2A specific,
+non delayed branch instruction JSR/N @@(disp8,TBR).  You must use GAS and GLD
+from GNU binutils version 2.7 or later for this attribute to work correctly.
 
 Please refer the example of M16C target, to see the use of this
 attribute while declaring a function,
@@ -3251,6 +3250,13 @@
 take function pointer arguments.  The @code{nothrow} attribute is not
 implemented in GCC versions earlier than 3.3.
 
+@item nosave_low_regs
+@cindex 

[patch] Fix PR rtl-optimization/54870

2012-10-14 Thread Eric Botcazou
Hi,

This is the execution failure of gfortran.dg/array_constructor_4.f90 in 64-bit
mode on SPARC/Solaris at -O3.  The dse2 dump for the reduced testcase reads:

dse: local deletions = 0, global deletions = 1, spill deletions = 0
starting the processing of deferred insns
deleting insn with uid = 25.
ending the processing of deferred insns

but the memory location stored to:

(insn 25 27 154 2 (set (mem/c:SI (plus:DI (reg/f:DI 30 %fp)
(const_int 2039 [0x7f7])) [6 A.1+16 S4 A64])
(reg:SI 1 %g1 [136])) array_constructor_4.f90:4 61 {*movsi_insn}
 (nil))

is read by a subsequent call to memcpy.

It turns out that this memcpy call is generated for an aggregate assignment:

  MEM[(c_char * {ref-all})&i] = MEM[(c_char * {ref-all})&A.17];

Note the A.1 in the store and the A.17 in the load. A.1 and A.17 are aggregate
variables sharing the same stack slot.  A.17 is correctly marked as addressable
because of the call to memcpy, but A.1 isn't since its address isn't taken, 
and DSE can optimize away (since 4.7) stores if their MEM_EXPR doesn't escape.

The store is reaching the load because an intermediate store into A.17:

(insn 78 76 82 6 (set (mem/c:SI (plus:DI (reg/f:DI 30 %fp)
(const_int 2039 [0x7f7])) [6 A.17+16 S4 A64])
(reg:SI 1 %g1 [136])) array_constructor_4.f90:14 61 {*movsi_insn}
 (nil))

has been deleted by postreload as no-op (because redundant), thus making A.1
partially escape without marking it as addressable.

The attached patch uses cfun->gimple_df->escaped.vars to plug the hole: when
mark_addressable is called during RTL expansion and the decl is partitioned,
all the variables in the partition are added to the bitmap.  Then can_escape
is changed to additionally test cfun->gimple_df->escaped.vars.

Tested on x86-64/Linux and SPARC64/Solaris, OK for mainline and 4.7 branch?


2012-10-14  Eric Botcazou  ebotca...@adacore.com

PR rtl-optimization/54870
* dse.c (can_escape): Test cfun->gimple_df->escaped.vars as well.
* gimplify.c (mark_addressable): If this is a partition decl, add
all the variables in the partition to cfun->gimple_df->escaped.vars.


-- 
Eric Botcazou

Index: dse.c
===
--- dse.c	(revision 192353)
+++ dse.c	(working copy)
@@ -990,6 +990,7 @@ delete_dead_store_insn (insn_info_t insn
 }
 
 /* Check if EXPR can possibly escape the current function scope.  */
+
 static bool
 can_escape (tree expr)
 {
@@ -998,7 +999,10 @@ can_escape (tree expr)
 return true;
   base = get_base_address (expr);
   if (DECL_P (base)
-      && !may_be_aliased (base))
+      && !may_be_aliased (base)
+      && !(cfun->gimple_df->escaped.vars
+	   && bitmap_bit_p (cfun->gimple_df->escaped.vars,
+			    DECL_PT_UID (base))))
 return false;
   return true;
 }
Index: gimplify.c
===
--- gimplify.c	(revision 192353)
+++ gimplify.c	(working copy)
@@ -116,6 +116,26 @@ mark_addressable (tree x)
       && TREE_CODE (x) != RESULT_DECL)
 return;
   TREE_ADDRESSABLE (x) = 1;
+
+  /* If this is a partitioned decl, we need to mark all the variables in the
+ partition as escaped.  This is needed because a store into one of them
+ can be replaced with a store into another, and this may not change the
+ outcome of the escape analysis for DSE to work properly.  */
+  if (TREE_CODE (x) == VAR_DECL
+      && !TREE_STATIC (x)
+      && cfun->gimple_df != NULL
+      && cfun->gimple_df->decls_to_pointers != NULL)
+    {
+      void *namep
+	= pointer_map_contains (cfun->gimple_df->decls_to_pointers, x);
+      if (namep)
+	{
+	  struct ptr_info_def *pi = get_ptr_info (*(tree *)namep);
+	  if (cfun->gimple_df->escaped.vars == NULL)
+	    cfun->gimple_df->escaped.vars = BITMAP_GGC_ALLOC ();
+	  bitmap_ior_into (cfun->gimple_df->escaped.vars, pi->pt.vars);
+	}
+}
 }
 
 /* Return a hash value for a formal temporary table entry.  */

Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-14 Thread Steven Bosscher
On Sun, Oct 14, 2012 at 9:02 AM, Paolo Bonzini wrote:
 Can we just simulate liveness for web, and drop REG_EQUAL/REG_EQUIV
 notes that refer to a dead pseudo?

I don't think we want to do that. A REG_EQUAL/REG_EQUIV note can use a
pseudo that isn't live and still be valid. Consider a simple example
like this:

a = b + 3
// b dies here
c = a {REG_EQUAL b+3}

The REG_EQUAL note is valid and may help optimization. Removing it
just because b is dead at that point would be unnecessarily
pessimistic.

I also don't want to compute DF_LR taking EQ_USES into account as real
uses for liveness, because that involves recomputing and enlarging the
DF_LR sets (all of them, both globally and locally) before LR&RD and
after LR&RD. That's why I implemented the quick-and-dirty liveness
computation for the notes: It's non-intrusive on DF_LR and it's cheap.

Ciao!
Steven


Committed, MMIX: fix INCOMING_REGNO / OUTGOING_REGNO for return-value

2012-10-14 Thread Hans-Peter Nilsson
Back then, I must've missed that INCOMING_REGNO / OUTGOING_REGNO are
used to map return-value-register/s too.  Fixes:
FAIL: gcc.dg/builtin-apply4.c execution test
...
FAIL: gcc.dg/builtin-return-1.c execution test
...
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c  -O0  execution test
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c  -O1  execution test
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c  -O2  execution test
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c  -O3 -fomit-frame-pointer  execution test
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c  -O3 -g  execution test
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c  -Os  execution test
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/stackalign/builtin-apply-4.c  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c  -O0  execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c  -O1  execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c  -O2  execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c  -O3 -fomit-frame-pointer  execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c  -O3 -g  execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c  -Os  execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/stackalign/builtin-return-1.c  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test

Committed.

* config/mmix/mmix.c (mmix_opposite_regno): Handle the
return-value register too.

--- gcc/config/mmix/mmix.c.prev 2012-10-09 02:00:51.0 +0200
+++ gcc/config/mmix/mmix.c  2012-10-14 00:59:54.0 +0200
@@ -392,15 +392,33 @@ mmix_conditional_register_usage (void)

 /* INCOMING_REGNO and OUTGOING_REGNO worker function.
Those two macros must only be applied to function argument
-   registers.  FIXME: for their current use in gcc, it'd be better
-   with an explicit specific additional FUNCTION_INCOMING_ARG_REGNO_P
-   a'la TARGET_FUNCTION_ARG / TARGET_FUNCTION_INCOMING_ARG instead of
+   registers and the function return value register for the opposite
+   use.  FIXME: for their current use in gcc, it'd be better with an
+   explicit specific additional FUNCTION_INCOMING_ARG_REGNO_P a'la
+   TARGET_FUNCTION_ARG / TARGET_FUNCTION_INCOMING_ARG instead of
forcing the target to commit to a fixed mapping and for any
-   unspecified register use.  */
+   unspecified register use.  Particularly when thinking about the
+   return-value, it is better to imagine INCOMING_REGNO and
+   OUTGOING_REGNO as named CALLEE_TO_CALLER_REGNO and INNER_REGNO as
+   named CALLER_TO_CALLEE_REGNO because the direction.  The incoming
+   and outgoing is from the perspective of the parameter-registers,
+   but the same macro is (must be, lacking an alternative like
+   suggested above) used to map the return-value-register from the
+   same perspective.  To make directions even more confusing, the macro
+   MMIX_OUTGOING_RETURN_VALUE_REGNUM holds the number of the register
+   in which to return a value, i.e. INCOMING_REGNO for the return-value-
+   register as received from a called function; the return-value on the
+   way out.  */

 int
 mmix_opposite_regno (int regno, int incoming)
 {
+  if (incoming && regno == MMIX_OUTGOING_RETURN_VALUE_REGNUM)
+    return MMIX_RETURN_VALUE_REGNUM;
+
+  if (!incoming && regno == MMIX_RETURN_VALUE_REGNUM)
+    return MMIX_OUTGOING_RETURN_VALUE_REGNUM;
+
   if (!mmix_function_arg_regno_p (regno, incoming))
 return regno;

brgds, H-P


Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-14 Thread Eric Botcazou
 I don't think we want to do that. A REG_EQUAL/REG_EQUIV note can use a
 pseudo that isn't live and still be valid. Consider a simple example
 like this:
 
 a = b + 3
 // b dies here
 c = a {REG_EQUAL b+3}
 
 The REG_EQUAL note is valid and may help optimization. Removing it
 just because b is dead at that point would be unnecessarily
 pessimistic.

But if you have a REG_DEAD note for b on the first insn, then you cannot 
rematerialize the REG_EQUAL note after it, otherwise bad things can happen.

See PR rtl-optimization/51505 for an example.

-- 
Eric Botcazou


Re: Fix twolf -funroll-loops -O3 miscompilation (a semi-latent web.c bug)

2012-10-14 Thread Steven Bosscher
On Sun, Oct 14, 2012 at 11:25 PM, Eric Botcazou wrote:
 I don't think we want to do that. A REG_EQUAL/REG_EQUIV note can use a
 pseudo that isn't live and still be valid. Consider a simple example
 like this:

 a = b + 3
 // b dies here
 c = a {REG_EQUAL b+3}

 The REG_EQUAL note is valid and may help optimization. Removing it
 just because b is dead at that point would be unnecessarily
 pessimistic.

 But if you have a REG_DEAD note for b on the first insn, then you cannot
 rematerialize the REG_EQUAL note after it, otherwise bad things can happen.

 See PR rtl-optimization/51505 for an example.

That's not the case here. The register is only dead because the
webizer renamed one of its live ranges but forgets to rename the
EQ_NOTE use.

Ciao!
Steven


Re: PR fortran/51727: make module files reproducible, question on C++ in gcc

2012-10-14 Thread Janne Blomqvist
On Sat, Oct 13, 2012 at 4:26 PM, Tobias Schlüter
tobias.schlue...@physik.uni-muenchen.de wrote:

 Hi,

 first a question also to non-gfortraners: if I want to use std::map, where
 do I #include <map>?  In system.h?

 Now to the patch-specific part: in this PR, module files are produced with
 random changes because the order in which symbols are written can depend on
 the memory layout.  This patch fixes this by recording which symbols need to
 be written and then processing them in order.  The patch doesn't make the
 more involved effort of putting all symbols into the module in an easily
 predicted order, instead it only makes sure that the order remains fixed
 across the compiler invocations.  The reason why the former is difficult is
 that during the process of writing a symbol, it can turn out that other
 symbols will have to be written as well (say, because they appear in array
 specifications).  Since the module-writing code determines which symbols to
 output while actually writing the file, recording all the symbols that need
 to be written before writing to the file would mean a lot of surgery.

 I'm putting forward two patches.  One uses a C++ map to very concisely build
 up and handle the ordered list of symbols.  This has three problems:
 1) gfortran maintainers may not want C++isms (even though in this case
it's very localized, and in my opinion very transparent), and
 2) it can't be backported to old release branches which are still
compiled as C.  Joost expressed interested in a backport.
 3) I don't know where to #include <map> (see above)
 Therefore I also propose a patch where I added the necessary ~50 lines of
 boilerplate code and added the necessary traversal function to use
 gfortran's GFC_BBT to maintain the ordered tree of symbols.

 Both patches pass the testsuite and Joost confirms that they fix the problem
 with CP2K.  I also verified with a few examples that they both produce
 identical .mod files as they should.

 Is the C++ patch, modified to do the #include correctly, ok for the trunk?
 If not, the C-only patch?  Can I put the C-only patch on the release
 branches?  And which?

Hi,

I'm pleasantly surprised that you managed to fix this PR with so little code!

- Personally, I'd prefer the C++ version; The C++ standard library is
widely used and documented and using it in favour of rolling our own
is IMHO a good idea.

- I'd be wary wrt backporting; in my experience the module.c code is
somewhat fragile and easily causes regressions. In any case, AFAICS PR
51727 is not a regression.

- I think one could go a step further and get rid of the BBT stuff in
pointer_info, replacing it with two file-level maps

std::map<void*, pointer_info*> pmap; // Or could be std::unordered_map if available
std::map<int, pointer_info*> imap;

So when writing a module, use pmap similar to how pointer_info BBT is
used now, and then use imap to get a consistent order per your patch.
When reading, lookup/create mostly via imap, creating a pmap entry
also when creating a new imap entry; this avoids having to do a
brute-force search when looking up via pointer when reading (see
find_pointer2()).

(This 3rd point is mostly an idea for further work, and is not meant
as a requirement for accepting the patch)

Ok for trunk, although wait for a few days in case there is a storm of
protest on the C vs. C++ issue from other gfortran maintainers.


-- 
Janne Blomqvist


Re: PR fortran/51727: make module files reproducible, question on C++ in gcc

2012-10-14 Thread Jakub Jelinek
On Mon, Oct 15, 2012 at 12:35:27AM +0300, Janne Blomqvist wrote:
 On Sat, Oct 13, 2012 at 4:26 PM, Tobias Schlüter
  I'm putting forward two patches.  One uses a C++ map to very concisely build
  up and handle the ordered list of symbols.  This has three problems:
  1) gfortran maintainers may not want C++isms (even though in this case
 it's very localized, and in my opinion very transparent), and

Even if you prefer C++isms, why don't you go for hash-table.h?
std::map at least with the default allocator will just crash the compiler
if malloc returns NULL (remember that we build with -fno-exceptions),
while when you use hash-table.h (or hashtab.h) you get proper OOM diagnostics.

Jakub


Re: [testsuite] gcc.target/arm/div64-unwinding.c: xfail for linux

2012-10-14 Thread Michael Hope
On 10 October 2012 22:57, Richard Earnshaw rearn...@arm.com wrote:
 On 10/10/12 03:11, Janis Johnson wrote:

 On 10/09/2012 07:39 AM, Richard Earnshaw wrote:

 On 27/09/12 01:02, Janis Johnson wrote:

 Test gcc.target/arm/div64-unwinding.c is known to fail for GNU/Linux
 targets, as described in PR54732.  This patch adds an XFAIL.

 Tested on arm-none-eabi and arm-none-linux-gnueabi, checked in on trunk.

 Janis


 gcc-20120926-5


 2012-09-26  Janis Johnson  jani...@codesourcery.com

 * gcc.target/arm/div64-unwinding.c: XFAIL for GNU/Linux.

 Index: gcc.target/arm/div64-unwinding.c
 ===
 --- gcc.target/arm/div64-unwinding.c(revision 191765)
 +++ gcc.target/arm/div64-unwinding.c(working copy)
 @@ -1,6 +1,7 @@
/* Performing a 64-bit division should not pull in the unwinder.  */

 -/* { dg-do run } */
 +/* The test is expected to fail for GNU/Linux; see PR54723.  */
 +/* { dg-do run { xfail *-*-linux* } } */
 /* { dg-options "-O0" } */
 
 #include <stdlib.h>


 I don't like this.  To me, XFAIL means there's a bug here, but we're
 not too worried about it.  The behaviour on linux targets is correct,
 so this test should either PASS or be skipped.


 Richard,

 The impression I got from Julian is there's a bug here, but we're not
 too worried about it.  If you think it should be skipped instead then
 I'll gladly change the test.

 Janis



 I don't believe there's a bug here.   The ARM EABI defines __aeabi_idiv0 as
 a hook that will be called if division by zero occurs.  While the default
 implementation simply raises SIGFPE on linux, it is perfectly possible to
 provide your own definition of this hook and then throw() a C++ exception.
 In order to do that you'd need unwind information in the divdi
 implementation ([u]divsi tailcalls the hook).

 Technically you could argue the same for bare metal, but in that case the
 arguments against the code bloat outweigh this very small corner case and
 users wanting this will have to rebuild their support code.

 On linux, I think the presence of the unwind information is correct, since
 the code bloat problem is very much a secondary concern.

 So yes, please could you make the test be skipped on linux.

Julian's patch turns off the unwinding information for all ARM systems
including Linux.  The test currently fails as something else (glibc?)
ends up pulling in the unwinder.

-- Michael
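Richard's argument turns on the division helper handing the zero-divisor case to a replaceable hook. The following C sketch models that arrangement portably; it is not ARM EABI code. On a real EABI target the hook is the runtime symbol __aeabi_idiv0 (or its 64-bit counterpart); here it is a plain function pointer, and passing the dividend to the hook is an assumption of the model rather than the EABI signature. The names `model_divdi3`, `div0_hook` and `saturating_div0_hook` are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Portable model only: the hook is a function pointer standing in for
   __aeabi_idiv0, and its argument (the dividend) is an assumption. */
typedef long long (*div0_hook_fn)(long long numerator);

/* Default behaviour in this model: report and return 0.
   (On Linux the real default raises SIGFPE instead.) */
static long long default_div0_hook(long long numerator)
{
    (void) numerator;
    fputs("division by zero\n", stderr);
    return 0;
}

static div0_hook_fn div0_hook = default_div0_hook;

/* Model of a 64-bit signed divide helper: it does not handle a zero
   divisor itself but delegates to whatever hook is installed. */
static long long model_divdi3(long long a, long long b)
{
    if (b == 0)
        return div0_hook(a);
    return a / b;
}

/* A user-supplied replacement hook that saturates instead of trapping.
   In C++ a replacement could throw instead; that is the case in which
   the divide helper itself needs unwind information. */
static long long saturating_div0_hook(long long numerator)
{
    if (numerator == 0)
        return 0;
    return numerator < 0 ? LLONG_MIN : LLONG_MAX;
}
```

Installing a different hook changes the zero-divisor behaviour without touching the helper, which is why the helper cannot know in advance whether an exception will pass through it.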


[lra] merged with trunk @192442

2012-10-14 Thread Vladimir Makarov

 LRA branch was merged with trunk @192442.  Committed as rev. 192446.


Re: [PATCH] Fix gcov handling directories with periods

2012-10-14 Thread Ian Lance Taylor
On Sat, Oct 13, 2012 at 1:11 PM, Andreas Schwab sch...@linux-m68k.org wrote:
 Ian Lance Taylor i...@google.com writes:

 Suppose you drop this into include/libiberty.h:

 #ifdef __cplusplus
 inline char *lbasename(char *s) { return const_cast<char*>(lbasename (s)); }
 #endif

 That doesn't work:

 ../../gcc/libcpp/../include/libiberty.h: In function ‘char* lbasename(char*)’:
 ../../gcc/libcpp/../include/libiberty.h:123:31: error: declaration of C function ‘char* lbasename(char*)’ conflicts with
 ../../gcc/libcpp/../include/libiberty.h:121:20: error: previous declaration ‘const char* lbasename(const char*)’ here

Hmmm, of course.

OK, your patch with CONST_CAST is OK.

Thanks.

Ian
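The conflict above arises because lbasename has C linkage, so C++ cannot add a char * overload of it; the approach that was approved instead casts away const at the call site. A minimal C sketch of that pattern follows. `sketch_lbasename` and `SKETCH_CONST_CAST` are hypothetical stand-ins: the sketch only treats '/' as a separator, whereas the real lbasename may handle other host-specific separators, and GCC's own CONST_CAST is defined differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for libiberty's lbasename: returns a pointer
   to the final path component of NAME ('/' separators only). */
static const char *sketch_lbasename(const char *name)
{
    const char *base = name;
    for (; *name != '\0'; name++)
        if (*name == '/')
            base = name + 1;
    return base;
}

/* Hypothetical macro in the spirit of CONST_CAST: it makes the const
   removal explicit and easy to grep for at each call site. */
#define SKETCH_CONST_CAST(TYPE, X) ((TYPE) (X))

/* A caller that owns a writable buffer and needs a writable pointer
   into it.  Since no char * overload can exist, the cast lives here. */
static char *basename_in_place(char *path)
{
    return SKETCH_CONST_CAST(char *, sketch_lbasename(path));
}
```

Casting away const is safe here only because the buffer passed in was writable to begin with; the returned pointer always points into it.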


Ping^2: RFA: Process '*' in '@'-output-template alternatives

2012-10-14 Thread Joern Rennecke

The following patch is still awaiting review:

2011-09-19  Jorn Rennecke  joern.renne...@arc.com

* genoutput.c (process_template): Process '*' in '@' alternatives.
* doc/md.texi (node Output Statement): Provide example for the above.

http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01422.html


Ping: RFA: Improve doloop_begin support

2012-10-14 Thread Joern Rennecke

2012-09-26  Jorn Rennecke  joern.renne...@arc.com

* loop-doloop.c (doloop_modify): Pass doloop_end pattern to
gen_doloop_begin.
* loop-doloop.c (doloop_optimize): Pass flag to indicate if loop is
entered at top to gen_doloop_end.
* config/arm/thumb2.md (doloop_end): Accept extra operand.
* config/bfin/bfin.md (doloop_end): Likewise.
* config/c6x/c6x.md (doloop_end): Likewise.
* config/ia64/ia64.md (doloop_end): Likewise.
* config/mep/mep.md (doloop_begin, doloop_end): Likewise.
* config/rs6000/rs6000.md (doloop_end): Likewise.
* config/s390/s390.md (doloop_end): Likewise.
* config/sh/sh.md (doloop_end): Likewise.
* config/spu/spu.md (doloop_end): Likewise.
* config/tilegx/tilegx.md (doloop_end): Likewise.
* config/tilepro/tilepro.md (doloop_end): Likewise.
* doc/md.texi (doloop_end): Document new operand.

http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01807.html


Ping: RFA: Fix OP_INOUT handling of web.c:union_match_dups

2012-10-14 Thread Joern Rennecke

2012-10-02  Joern Rennecke  joern.renne...@embecosm.com

* web.c (union_match_dups): Properly handle OP_INOUT match_dups.

http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00189.html


Ping: RFA: add lock_length attribute to break branch-shortening cycles

2012-10-14 Thread Joern Rennecke

2012-10-04  Joern Rennecke  joern.renne...@embecosm.com

* final.c (get_attr_length_1): Use direct recursion rather than
calling get_attr_length.
(get_attr_lock_length): New function.
(INSN_VARIABLE_LENGTH_P): Define.
(shorten_branches): Take HAVE_ATTR_lock_length into account.
Don't overwrite non-delay slot insn lengths with the lengths of
delay slot insns with same uid.
* genattrtab.c (lock_length_str): New variable.
(make_length_attrs): New parameter base.
(main): Initialize lock_length_str.
Generate lock_lengths attributes.
* genattr.c (gen_attr): Emit declarations for lock_length attribute
related functions.
* doc/md.texi (node Insn Lengths): Document lock_length attribute.

http://gcc.gnu.org/ml/gcc-patches/2012-10/msg00383.html
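The ChangeLog above does not spell out how lock_length breaks branch-shortening cycles. The sketch below assumes the general idea of locking lengths so that they can only grow, which is one standard way to make an iterative length computation reach a fixed point; it is not final.c's algorithm. The struct, the encoding sizes and the branch range are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define SHORT_LEN 2     /* hypothetical short-branch size in bytes */
#define LONG_LEN 4      /* hypothetical long-branch size in bytes */
#define SHORT_RANGE 8   /* hypothetical reach of a short branch */

struct insn {
    int is_branch;  /* nonzero for a branch */
    int target;     /* index of the target insn (branches only) */
    int length;     /* current length in bytes */
};

/* Assign each insn an address from the current lengths. */
static void compute_addresses(const struct insn *insns, int n, int *addr)
{
    int a = 0;
    for (int i = 0; i < n; i++) {
        addr[i] = a;
        a += insns[i].length;
    }
}

/* Recompute branch lengths until nothing changes.  Lengths are only
   ever increased (locked once grown), so each branch changes at most
   once and the loop must terminate.  Returns the number of passes,
   or -1 if the scratch allocation fails. */
static int shorten_branches_monotone(struct insn *insns, int n)
{
    if (n <= 0)
        return 0;
    int *addr = malloc(sizeof *addr * (size_t) n);
    if (addr == NULL)
        return -1;
    int passes = 0;
    int changed;
    do {
        changed = 0;
        passes++;
        compute_addresses(insns, n, addr);
        for (int i = 0; i < n; i++) {
            if (!insns[i].is_branch)
                continue;
            int dist = addr[insns[i].target] - addr[i];
            int wanted = abs(dist) <= SHORT_RANGE ? SHORT_LEN : LONG_LEN;
            /* Never shrink: a length that was allowed to go back down
               could flip back and forth on later passes. */
            if (wanted > insns[i].length) {
                insns[i].length = wanted;
                changed = 1;
            }
        }
    } while (changed);
    free(addr);
    return passes;
}
```

In this toy model distances only grow as lengths grow, so it never oscillates even without the lock; in real ports, effects such as alignment can make recomputed lengths non-monotone, and that is where a rule of this kind matters.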