Re: [4.9] PR 62146
I've attached the revised patch. Bootstrapped and no test regressions on x86_64/linux with the 4.9 branch. Ok for the 4.9 branch? While the bug doesn't show up in trunk, it seems obvious that this should go to trunk as well. Is it ok for trunk if tests pass? Btw, is g++.dg/opt the right directory for the test to go in?

- ChangeLog

2014-09-04  Easwaran Raman  <era...@google.com>

	PR rtl-optimization/62146
	* ifcvt.c (dead_or_predicable): Make removal of REG_EQUAL note
	of hoisted instruction unconditional.

2014-09-04  Easwaran Raman  <era...@google.com>

	PR rtl-optimization/62146
	* testsuite/g++.dg/opt/pr62146.C: New.

On Wed, Sep 3, 2014 at 11:36 AM, Jeff Law <l...@redhat.com> wrote:

On 09/02/14 12:52, Easwaran Raman wrote:

It turns out that the REG_EQUAL note is removed on a hoisted instruction (relevant code is in dead_or_predicable in ifcvt.c) if the source of the move instruction is not a function invariant. In this case, the source is a function invariant (a constant) and so that doesn't kick in. I don't understand why this exemption for function invariants is there, and the original thread in https://gcc.gnu.org/ml/gcc/2005-05/msg01710.html doesn't explain it either. Should I just remove the REG_EQUAL notes of all hoisted instructions, or are there cases where it is safe to leave the note?

I suspect it was left in to keep as many REG_EQUAL notes as possible, as the notes can help later optimizations. But even those equivalences are not necessarily safe. I'm pretty sure the right thing to do here is just drop the note regardless of whether or not it's an invariant.

jeff

Index: testsuite/g++.dg/opt/pr62146.C
===================================================================
--- testsuite/g++.dg/opt/pr62146.C	(revision 0)
+++ testsuite/g++.dg/opt/pr62146.C	(revision 0)
@@ -0,0 +1,51 @@
+/* PR rtl-optimization/62146 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+class F
+{
+public:
+  virtual ~ F ();
+};
+template <class CL> class G : public F
+{
+  int *member_;
+public:
+  G (int *b) : member_ (0)
+  {
+  }
+};
+
+class D
+{
+public:
+  template <class CL> void RegisterNonTagCallback (int, void (CL::*p3) ())
+  {
+    InternalRegisterNonTag (p3 ? new G<CL> (0) : 0);
+  }
+  void InternalRegisterNonTag (F *);
+};
+
+void fn1 ();
+class C1
+{
+  void foo ();
+  class TokenType
+  {
+  public:
+    void AddToken ()
+    {
+    }
+  };
+  C1::TokenType bar_t;
+};
+D a;
+void C1::foo ()
+{
+  if (&bar_t)
+    fn1 ();
+  for (int i = 0; i < sizeof 0; ++i)
+    a.RegisterNonTagCallback (0, &TokenType::AddToken);
+}
+
+/* { dg-final { scan-assembler-not "mov.*_ZN2C19TokenType8AddTokenEv, .\\\(" } } */

Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 214472)
+++ ifcvt.c	(working copy)
@@ -4387,17 +4387,14 @@ dead_or_predicable (basic_block test_bb, basic_blo
       insn = head;
       do
 	{
-	  rtx note, set;
+	  rtx note;

 	  if (! INSN_P (insn))
 	    continue;

 	  note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
 	  if (! note)
 	    continue;
-	  set = single_set (insn);
-	  if (!set || !function_invariant_p (SET_SRC (set))
-	      || !function_invariant_p (XEXP (note, 0)))
-	    remove_note (insn, note);
+	  remove_note (insn, note);
 	}
       while (insn != end && (insn = NEXT_INSN (insn)));

   /* PR46315: when moving insns above a conditional branch, the REG_EQUAL
Re: [4.9] PR 62146
It turns out that the REG_EQUAL note is removed on a hoisted instruction (relevant code is in dead_or_predicable in ifcvt.c) if the source of the move instruction is not a function invariant. In this case, the source is a function invariant (a constant) and so that doesn't kick in. I don't understand why this exemption for function invariants is there, and the original thread in https://gcc.gnu.org/ml/gcc/2005-05/msg01710.html doesn't explain it either. Should I just remove the REG_EQUAL notes of all hoisted instructions, or are there cases where it is safe to leave the note?

Thanks,
Easwaran

On Fri, Aug 29, 2014 at 1:06 PM, Jeff Law <l...@redhat.com> wrote:

On 08/25/14 16:42, Easwaran Raman wrote:

This patch deletes the REG_EQUAL note when a source register is replaced by a constant in an assignment. This is to prevent spurious equivalences between the constant and the expression in the REG_EQUAL note. In the bug reported in PR 62146, an assignment in one branch (which is actually dead) of an IF statement has a REG_EQUAL note equating a register with an expression. Conditional copy propagation replaces the register with 0. The instruction is subsequently hoisted above the branch, and then the value 0 is equated with the expression in the REG_EQUAL note. Is this ok for the 4.9 branch if all tests pass? This patch looks applicable to trunk as well, but I don't have a test case to reproduce the issue in trunk.

Something doesn't feel right with this patch. It seems to me the real problem is when we hoist the insn with the note. If the equivalence implied by the note is no longer valid at the insn's new location, then the note needs to be removed. Now determining if the note is no longer valid at the new location may prove difficult ;-) You'd probably have to know why the note was created, how it was changed, etc. So I suspect the right thing to do is just remove REG_EQUAL notes on any insns we hoist in this manner.

Jeff
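To make the failure mode concrete, here is a schematic, source-level sketch of the scenario (a hypothetical example with made-up names; the actual reduced testcase is the pr62146.C file in the patch):

// Hypothetical sketch of PR 62146; the comments narrate the RTL-level steps.
extern int g (int);

int f (int x, bool dead)
{
  int r = 0;
  if (dead)        // Never taken at runtime.
    r = x;         // RTL: (set rR rX) carrying a note REG_EQUAL: g(x),
                   // i.e. "rR equals g(x) here" -- valid only on this path.
  // 1. Conditional copy propagation learns rX == 0 on the dead path and
  //    rewrites the insn to (set rR 0), keeping the REG_EQUAL note.
  // 2. dead_or_predicable then hoists (set rR 0) above the branch, so the
  //    note now claims 0 == g(x) unconditionally -- the spurious
  //    equivalence that the patches in this thread prevent by dropping
  //    the note.
  return r;
}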
Re: [4.9] PR 62146
On Mon, Aug 25, 2014 at 3:42 PM, Easwaran Raman <era...@google.com> wrote:

This patch deletes the REG_EQUAL note when a source register is replaced by a constant in an assignment. This is to prevent spurious equivalences between the constant and the expression in the REG_EQUAL note. In the bug reported in PR 62146, an assignment in one branch (which is actually dead) of an IF statement has a REG_EQUAL note equating a register with an expression. Conditional copy propagation replaces the register with 0. The instruction is subsequently hoisted above the branch, and then the value 0 is equated with the expression in the REG_EQUAL note. Is this ok for the 4.9 branch if all tests pass? This patch looks applicable to trunk as well, but I don't have a test case to reproduce the issue in trunk.

ChangeLog:

2014-08-25  Easwaran Raman  <era...@google.com>

	PR rtl-optimization/62146
	* cprop.c (try_replace_reg): Remove REG_EQUAL note when
	a constant is propagated into the src of an assignment.

testsuite/ChangeLog:

2014-08-25  Easwaran Raman  <era...@google.com>

	PR rtl-optimization/62146
	* testsuite/g++.dg/opt/pr62146.C: New.
[4.9] PR 62146
This patch deletes the REG_EQUAL note when a source register is replaced by a constant in an assignment. This is to prevent spurious equivalences between the constant and the expression in the REG_EQUAL note. In the bug reported in PR 62146, an assignment in one branch (which is actually dead) of an IF statement has a REG_EQUAL note equating a register with an expression. Conditional copy propagation replaces the register with 0. The instruction is subsequently hoisted above the branch, and then the value 0 is equated with the expression in the REG_EQUAL note. Is this ok for the 4.9 branch if all tests pass? This patch looks applicable to trunk as well, but I don't have a test case to reproduce the issue in trunk.

ChangeLog:

2014-08-25  Easwaran Raman  <era...@google.com>

	PR rtl-optimization/62146
	* cprop.c (try_replace_reg): Remove REG_EQUAL note when
	a constant is propagated into the src of an assignment.

testsuite/ChangeLog:

2014-08-25  Easwaran Raman  <era...@google.com>

	PR rtl-optimization/62146
	* testsuite/g++.dg/opt/pr62146.C: New.

Index: gcc/testsuite/g++.dg/opt/pr62146.C
===================================================================
--- gcc/testsuite/g++.dg/opt/pr62146.C	(revision 0)
+++ gcc/testsuite/g++.dg/opt/pr62146.C	(revision 0)
@@ -0,0 +1,51 @@
+/* PR rtl-optimization/62146 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+class F
+{
+public:
+  virtual ~ F ();
+};
+template <class CL> class G : public F
+{
+  int *member_;
+public:
+  G (int *b) : member_ (0)
+  {
+  }
+};
+
+class D
+{
+public:
+  template <class CL> void RegisterNonTagCallback (int, void (CL::*p3) ())
+  {
+    InternalRegisterNonTag (p3 ? new G<CL> (0) : 0);
+  }
+  void InternalRegisterNonTag (F *);
+};
+
+void fn1 ();
+class C1
+{
+  void foo ();
+  class TokenType
+  {
+  public:
+    void AddToken ()
+    {
+    }
+  };
+  C1::TokenType bar_t;
+};
+D a;
+void C1::foo ()
+{
+  if (&bar_t)
+    fn1 ();
+  for (int i = 0; i < sizeof 0; ++i)
+    a.RegisterNonTagCallback (0, &TokenType::AddToken);
+}
+
+/* { dg-final { scan-assembler-not "mov.*_ZN2C19TokenType8AddTokenEv, .\\\(" } } */

Index: gcc/cprop.c
===================================================================
--- gcc/cprop.c	(revision 214472)
+++ gcc/cprop.c	(working copy)
@@ -790,8 +790,11 @@ try_replace_reg (rtx from, rtx to, rtx insn)
   /* REG_EQUAL may get simplified into register.
      We don't allow that. Remove that note. This code ought
      not to happen, because previous code ought to synthesize
-     reg-reg move, but be on the safe side.  */
-  if (note && REG_NOTE_KIND (note) == REG_EQUAL && REG_P (XEXP (note, 0)))
+     reg-reg move, but be on the safe side.  The REG_EQUAL note is
+     also removed when the source is a constant.  */
+  if (note && REG_NOTE_KIND (note) == REG_EQUAL
+      && (REG_P (XEXP (note, 0))
+	  || (set && CONSTANT_P (SET_SRC (set)))))
     remove_note (insn, note);

   return success;
Re: Fix for PR c++/59031
Ping.

On Mon, Nov 11, 2013 at 9:46 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Nov 7, 2013 at 11:14 AM, Easwaran Raman <era...@google.com> wrote:

Before r193504, if a method cannot be overridden, LOOKUP_NONVIRTUAL is set and the call is direct. The changes at r193504 (to fix PR c++/11750) caused a regression to this behavior. This patch attempts to fix that. Bootstraps and no test regressions on x86_64/linux. Is this a correct fix, and is this ok for trunk?

Thanks,
Easwaran

2013-11-07  Easwaran Raman  <era...@google.com>

	PR c++/59031
	* call.c (build_new_method_call_1): Compare function context
	with BASELINK_BINFO type rather than instance type before
	marking the call with LOOKUP_NONVIRTUAL.

testsuite/ChangeLog:

2013-11-07  Easwaran Raman  <era...@google.com>

	PR c++/59031
	* g++.dg/inherit/virtual11.C: New test.
Re: Fix for PR c++/59031
Ping.

On Thu, Nov 7, 2013 at 11:14 AM, Easwaran Raman <era...@google.com> wrote:

Before r193504, if a method cannot be overridden, LOOKUP_NONVIRTUAL is set and the call is direct. The changes at r193504 (to fix PR c++/11750) caused a regression to this behavior. This patch attempts to fix that. Bootstraps and no test regressions on x86_64/linux. Is this a correct fix, and is this ok for trunk?

Thanks,
Easwaran

2013-11-07  Easwaran Raman  <era...@google.com>

	PR c++/59031
	* call.c (build_new_method_call_1): Compare function context
	with BASELINK_BINFO type rather than instance type before
	marking the call with LOOKUP_NONVIRTUAL.

testsuite/ChangeLog:

2013-11-07  Easwaran Raman  <era...@google.com>

	PR c++/59031
	* g++.dg/inherit/virtual11.C: New test.
Fix for PR c++/59031
Before r193504, if a method cannot be overridden, LOOKUP_NONVIRTUAL is set and the call is direct. The changes at r193504 (to fix PR c++/11750) caused a regression to this behavior. This patch attempts to fix that. Bootstraps and no test regressions on x86_64/linux. Is this a correct fix, and is this ok for trunk?

Thanks,
Easwaran

2013-11-07  Easwaran Raman  <era...@google.com>

	PR c++/59031
	* call.c (build_new_method_call_1): Compare function context
	with BASELINK_BINFO type rather than instance type before
	marking the call with LOOKUP_NONVIRTUAL.

testsuite/ChangeLog:

2013-11-07  Easwaran Raman  <era...@google.com>

	PR c++/59031
	* g++.dg/inherit/virtual11.C: New test.

Index: testsuite/g++.dg/inherit/virtual11.C
===================================================================
--- testsuite/g++.dg/inherit/virtual11.C	(revision 0)
+++ testsuite/g++.dg/inherit/virtual11.C	(revision 0)
@@ -0,0 +1,17 @@
+// PR c++/59031
+// { dg-do compile }
+// { dg-options "-fdump-tree-gimple" }
+class B {
+ public:
+  virtual int add (int a, int b) { return a + b; }
+};
+
+class D : public B {
+};
+
+int foo (int a, int b) {
+  D d;
+  return d.add (a, b);
+}
+// { dg-final { scan-tree-dump-not "OBJ_TYPE_REF" "gimple" } }
+// { dg-final { cleanup-tree-dump "gimple" } }

Index: cp/call.c
===================================================================
--- cp/call.c	(revision 204466)
+++ cp/call.c	(working copy)
@@ -7511,7 +7511,7 @@ build_new_method_call_1 (tree instance, tree fns,
   struct z_candidate *candidates = 0, *cand;
   tree explicit_targs = NULL_TREE;
   tree basetype = NULL_TREE;
-  tree access_binfo;
+  tree access_binfo, binfo;
   tree optype;
   tree first_mem_arg = NULL_TREE;
   tree name;
@@ -7550,6 +7550,7 @@ build_new_method_call_1 (tree instance, tree fns,
   if (!conversion_path)
     conversion_path = BASELINK_BINFO (fns);
   access_binfo = BASELINK_ACCESS_BINFO (fns);
+  binfo = BASELINK_BINFO (fns);
   optype = BASELINK_OPTYPE (fns);
   fns = BASELINK_FUNCTIONS (fns);
   if (TREE_CODE (fns) == TEMPLATE_ID_EXPR)
@@ -7802,7 +7803,7 @@ build_new_method_call_1 (tree instance, tree fns,
	 do not want to perform a non-virtual call.  */
      if (DECL_VINDEX (fn) && ! (flags & LOOKUP_NONVIRTUAL)
	  && same_type_ignoring_top_level_qualifiers_p
-	     (DECL_CONTEXT (fn), TREE_TYPE (instance))
+	     (DECL_CONTEXT (fn), TREE_TYPE (binfo))
	  && resolves_to_fixed_type_p (instance, 0))
	flags |= LOOKUP_NONVIRTUAL;
      if (explicit_targs)
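For contrast with the virtual11.C test above, here is a hypothetical sketch of a case where the devirtualization must not fire, because resolves_to_fixed_type_p cannot prove the dynamic type (names are made up, not from the patch):

// Hypothetical contrast case: the reference may bind to any derived
// object, so build_new_method_call_1 must NOT set LOOKUP_NONVIRTUAL
// and the call stays an OBJ_TYPE_REF in the gimple dump.
class B2 {
 public:
  virtual int add (int a, int b) { return a + b; }
};

int bar (B2 &b, int x, int y)
{
  return b.add (x, y);  // Must dispatch virtually.
}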
Re: Add a param to decide stack slot sharing at -O0
On Wed, Oct 9, 2013 at 4:11 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Tue, Oct 8, 2013 at 11:04 PM, Easwaran Raman <era...@google.com> wrote:

In cfgexpand.c, variables in non-overlapping lexical scopes are assigned the same stack locations at -O1 and above. At -O0, this is attempted only if the size of the stack objects is above a threshold (32). The rationale is that at -O0 more variables are going to be on the stack, and the O(n^2) stack slot sharing algorithm will increase the compilation time. This patch replaces the constant with a param which is set to 32 by default. We ran into a case where the presence of the always_inline attribute triggered -Wframe-larger-than warnings at -O0 but not at -O2, since the different inlined copies share the stack. We are ok with a slight increase in compilation time to get smaller stack frames even at -O0, and this patch would allow us to do that easily. Bootstraps on x86_64/linux. Is this ok for trunk?

Ok with

+DEFPARAM (PARAM_MIN_SIZE_FOR_STACK_SHARING,
+	  "min-size-for-stack-sharing",
+	  "Attempt to share stack slots among variables in different lexical blocks "
+	  "at O0 only if their sizes exceed this value",
+	  32, 0, 0)

changed to

	  "The minimum size of variables taking part in stack slot sharing "
	  "when not optimizing",
	  32, 0, 0)

And with adding documentation for that param in doc/invoke.texi. Btw, I'm not sure the sharing algorithm is still quadratic - can you investigate on that?

partition_stack_vars is still quadratic in the worst case (all live ranges overlap), but the expensive part is likely to be the add_scope_conflicts function.

- Easwaran

Thanks,
Richard.

Thanks,
Easwaran

2013-10-08  Easwaran Raman  <era...@google.com>

	* params.def (PARAM_MIN_SIZE_FOR_STACK_SHARING): New param...
	* cfgexpand.c (defer_stack_allocation): ...use here.
Add a param to decide stack slot sharing at -O0
In cfgexpand.c, variables in non-overlapping lexical scopes are assigned the same stack locations at -O1 and above. At -O0, this is attempted only if the size of the stack objects is above a threshold (32). The rationale is that at -O0 more variables are going to be on the stack, and the O(n^2) stack slot sharing algorithm will increase the compilation time. This patch replaces the constant with a param which is set to 32 by default. We ran into a case where the presence of the always_inline attribute triggered -Wframe-larger-than warnings at -O0 but not at -O2, since the different inlined copies share the stack. We are ok with a slight increase in compilation time to get smaller stack frames even at -O0, and this patch would allow us to do that easily. Bootstraps on x86_64/linux. Is this ok for trunk?

Thanks,
Easwaran

2013-10-08  Easwaran Raman  <era...@google.com>

	* params.def (PARAM_MIN_SIZE_FOR_STACK_SHARING): New param...
	* cfgexpand.c (defer_stack_allocation): ...use here.

cfgexpand.patch
Description: Binary data
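As a concrete illustration of the sharing this param controls (a hypothetical example, not from the patch): the two arrays below are never live at the same time, so cfgexpand can give them one slot, and at -O0 it only tries to when the objects exceed the threshold.

extern void g (char *);

void f (int which)
{
  if (which)
    {
      char a[64];  // Above the 32-byte threshold: shareable even at -O0.
      g (a);
    }
  else
    {
      char b[64];  // Disjoint scope from 'a': can reuse its stack slot,
                   // halving the frame size seen by -Wframe-larger-than.
      g (b);
    }
}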
Don't drop attributes on template member functions
This attempts to address one issue reported in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33911. Attributes at the end of a declaration of template member functions are discarded in cp_parser_init_declarator. The suggested workaround is to place the attributes into the declaration-specifiers. The code in cp_parser_init_declarator actually parses the attributes at the end, but only passes the prefix_attributes (from the decl-specifier-seq) to grokfield(). Appending the parsed attributes to prefix_attributes seems to work, but I don't know if this is sufficient in all cases. Bootstraps and all tests pass with the patch on x86_64/linux. Is this ok for trunk?

2013-10-02  Easwaran Raman  <era...@google.com>

	PR c++/33911
	* parser.c (cp_parser_init_declarator): Do not drop attributes
	of template member functions.

2013-10-02  Easwaran Raman  <era...@google.com>

	PR c++/33911
	* g++.dg/ext/attribute47.C: New.

parser_attribute.patch
Description: Binary data
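For illustration, the kind of declaration affected looks like this (a hypothetical example; the actual test added by the patch is attribute47.C):

struct S
{
  // Trailing attribute: parsed but discarded by cp_parser_init_declarator
  // before this patch, so ignoring f's result went undiagnosed.
  template <typename T> int f (T) __attribute__ ((warn_unused_result));

  // Suggested workaround: put the attribute in the decl-specifiers.
  template <typename T> __attribute__ ((warn_unused_result)) int g (T);
};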
[Google] Adjust comdat-sharing-probability param for -Os
This patch increases comdat-sharing-probability to 80 under -Os. This reduces the amount of inlining and helps internal benchmarks. Unfortunately, it causes a slight regression on SPEC 2006. Ok for google branches if all tests pass?

- Easwaran

comdat_sharing.patch
Description: Binary data
Re: [Google] Adjust comdat-sharing-probability param for -Os
On Fri, Sep 27, 2013 at 4:08 PM, Xinliang David Li <davi...@google.com> wrote:

  d.growth -= (info->size
	       * (100 - PARAM_VALUE (PARAM_COMDAT_SHARING_PROBABILITY))
	       + 50) / 100;

What is the purpose of '50' here?

Rounding after division by 100.

The patch is fine for the Google branch. Other tunings to think about -- I think the sharing probability should not be a fixed value, but should depend on the function's characteristics, such as size, number of callsites, etc. For instance, for small leaf comdat functions, the sharing probability will be small.

David

On Fri, Sep 27, 2013 at 2:57 PM, Easwaran Raman <era...@google.com> wrote:

This patch increases comdat-sharing-probability to 80 under -Os. This reduces the amount of inlining and helps internal benchmarks. Unfortunately, it causes a slight regression on SPEC 2006. Ok for google branches if all tests pass?

- Easwaran
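To spell out the rounding with made-up numbers: with info->size of 8 and the new sharing probability of 80, the discount is (8 * (100 - 80) + 50) / 100 = 210 / 100 = 2, i.e. 1.6 rounded to nearest, where plain truncation would give 1. A self-contained check:

#include <cassert>

int main ()
{
  const int size = 8, sharing_probability = 80;  // made-up values
  int discount = (size * (100 - sharing_probability) + 50) / 100;
  assert (discount == 2);  // round-to-nearest; truncation would yield 1
  return 0;
}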
Re: Fix PR middle-end/57393
Ping.

On Mon, Sep 16, 2013 at 4:42 PM, Easwaran Raman <era...@google.com> wrote:

There are two separate root causes for the problem reported in PR 57393, and this patch attempts to fix both. The first is due to newly created stmts that have the default UID of 0 being compared with statements with valid UIDs, leading to broken dependences. As discussed in an earlier thread, I check the UIDs before using them and ensure stmts have a valid UID. In the worst case, this reassigns UIDs for the entire BB containing the stmts in question. The second is due to debug stmts being out of sync with the IR after reassociation. I think the right fix is to create debug temps before the actual reassociation to decouple them from the SSA variables involved in reassociation. This bootstraps on x86_64 and I am running the tests. Ok for trunk?

Thanks,
Easwaran

2013-09-16  Easwaran Raman  <era...@google.com>

	PR middle-end/57393
	* tree-ssa-reassoc.c (get_stmt_uid_with_default): Remove.
	(build_and_add_sum): Do not set UID of newly created statements.
	(ensure_valid_uid): New function.
	(find_insert_point): Call it here before UIDs are compared.
	(insert_stmt_after): Do not reset debug statements.
	(regenerate_debug_stmts): New function.
	(reassociate_bb): Use it here.

testsuite/ChangeLog:

2013-09-16  Easwaran Raman  <era...@google.com>

	PR middle-end/57393
	* gcc.dg/tree-ssa/reassoc-32.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-33.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-34.c: New testcase.
Fix PR middle-end/57393
There are two separate root causes for the problem reported in PR 57393, and this patch attempts to fix both. The first is due to newly created stmts that have the default UID of 0 being compared with statements with valid UIDs, leading to broken dependences. As discussed in an earlier thread, I check the UIDs before using them and ensure stmts have a valid UID. In the worst case, this reassigns UIDs for the entire BB containing the stmts in question. The second is due to debug stmts being out of sync with the IR after reassociation. I think the right fix is to create debug temps before the actual reassociation to decouple them from the SSA variables involved in reassociation. This bootstraps on x86_64 and I am running the tests. Ok for trunk?

Thanks,
Easwaran

2013-09-16  Easwaran Raman  <era...@google.com>

	PR middle-end/57393
	* tree-ssa-reassoc.c (get_stmt_uid_with_default): Remove.
	(build_and_add_sum): Do not set UID of newly created statements.
	(ensure_valid_uid): New function.
	(find_insert_point): Call it here before UIDs are compared.
	(insert_stmt_after): Do not reset debug statements.
	(regenerate_debug_stmts): New function.
	(reassociate_bb): Use it here.

testsuite/ChangeLog:

2013-09-16  Easwaran Raman  <era...@google.com>

	PR middle-end/57393
	* gcc.dg/tree-ssa/reassoc-32.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-33.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-34.c: New testcase.

pr57393.patch
Description: Binary data
Re: Fix PR middle-end/57370
On Fri, Sep 6, 2013 at 12:05 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Thu, Sep 5, 2013 at 9:23 PM, Easwaran Raman <era...@google.com> wrote:

On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman <era...@google.com> wrote:

I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and
	reset of debug statements that cause inconsistent IR.

Missing ChangeLog entry for the insert_stmt_after hunk, which I do not like at all. The other hunks are ok, but we need to work harder to preserve debug stmts - simply removing all is not going to fly.

Richard.

I looked into the problem related to the debug stmts in this failing test case in detail. Initially, the code sequence looks like this:

  s$n_13 = MEM[(struct S *)s];
  # DEBUG s$n => s$n_13
  _2 = (double) s$n_13;
  _4 = _2 * a_3(D);
  _6 = (double) i_5(D);
  _7 = _4 * _6;
  _9 = (double) j_8(D);
  _10 = _7 * _9;
  bar (_10);
  # DEBUG D#2 => (double) k_12(D)
  # DEBUG D#1 => _7 * D#2
  # DEBUG t$n => (int) D#1

After reassociation:

  s$n_13 = MEM[(struct S *)s];
  # DEBUG s$n => s$n_13
  _2 = (double) s$n_13;
  _6 = (double) i_5(D);
  # DEBUG D#3 => _4 * _6
  _9 = (double) j_8(D);
  _4 = _9 * _2;
  _7 = _4 * a_3(D);
  _10 = _7 * _6;
  bar (_10);
  # DEBUG D#2 => (double) k_12(D)
  # DEBUG D#1 => D#3 * D#2
  # DEBUG t$n => (int) D#1

In the above, "# DEBUG D#3 => _4 * _6" appears above the statement that defines _4. But even if I move the def of D#3 below the statement defining _4, it is not sufficient. Before reassociation, t$n refers to the expression (s$n_13 * a_3(D) * i_5(D) * k_12(D)), but after reassociation it would refer to (j_8(D) * s$n_13 * i_5(D) * k_12(D)). It seems the correct fix is to discard the debug temps whose RHS contains a variable that is involved in reassociation and then reconstruct them. Is that the expected fix here?

The value of the DEBUG expression changes because the value that _4 computes changes - that is a no-no that may not happen. We cannot re-use _4 (without previously releasing it and allocating it newly) with a new value. Releasing _4 should have fixed up the debug stmts. So - can you verify whether we are indeed just replacing the RHS of _4 = _2 * a_3(D) with _9 * _2, without changing the SSA name of that expression?

I have confirmed that the SSA name of the LHS remains the same. The reassociation code simply calls gimple_assign_set_rhs2 to modify RHS2 and then calls update_stmt.

- Easwaran

Another reason why re-using the same LHS for another value is wrong is that annotated information like points-to sets, alignment, and soon value-range information will be wrong.

Thanks,
Richard.

- Easwaran

testsuite/ChangeLog:

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	PR tree-optimization/57393
	PR tree-optimization/58011
	* gfortran.dg/reassoc_12.f90: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.

On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman <era...@google.com> wrote:

A newly generated statement in the build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong UID (of an earlier phi statement), which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a
	phi node for a non-phi gimple statement.

testsuite/ChangeLog:

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* gfortran.dg/reassoc_12.f90: New testcase.

Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                 grad_deriv, npoints, sx)
+        IMPLICIT REAL*8 (t)
+        INTEGER, PARAMETER :: dp=8
+        REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                 e_ndrho_ndrho_rho
+        DO ii=1,npoints
+          IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+            t1425 = t233 * t557
+            t1429 = beta * t225
Re: Fix PR middle-end/57370
On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman <era...@google.com> wrote:

I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and
	reset of debug statements that cause inconsistent IR.

Missing ChangeLog entry for the insert_stmt_after hunk, which I do not like at all. The other hunks are ok, but we need to work harder to preserve debug stmts - simply removing all is not going to fly.

Richard.

I looked into the problem related to the debug stmts in this failing test case in detail. Initially, the code sequence looks like this:

  s$n_13 = MEM[(struct S *)s];
  # DEBUG s$n => s$n_13
  _2 = (double) s$n_13;
  _4 = _2 * a_3(D);
  _6 = (double) i_5(D);
  _7 = _4 * _6;
  _9 = (double) j_8(D);
  _10 = _7 * _9;
  bar (_10);
  # DEBUG D#2 => (double) k_12(D)
  # DEBUG D#1 => _7 * D#2
  # DEBUG t$n => (int) D#1

After reassociation:

  s$n_13 = MEM[(struct S *)s];
  # DEBUG s$n => s$n_13
  _2 = (double) s$n_13;
  _6 = (double) i_5(D);
  # DEBUG D#3 => _4 * _6
  _9 = (double) j_8(D);
  _4 = _9 * _2;
  _7 = _4 * a_3(D);
  _10 = _7 * _6;
  bar (_10);
  # DEBUG D#2 => (double) k_12(D)
  # DEBUG D#1 => D#3 * D#2
  # DEBUG t$n => (int) D#1

In the above, "# DEBUG D#3 => _4 * _6" appears above the statement that defines _4. But even if I move the def of D#3 below the statement defining _4, it is not sufficient. Before reassociation, t$n refers to the expression (s$n_13 * a_3(D) * i_5(D) * k_12(D)), but after reassociation it would refer to (j_8(D) * s$n_13 * i_5(D) * k_12(D)). It seems the correct fix is to discard the debug temps whose RHS contains a variable that is involved in reassociation and then reconstruct them. Is that the expected fix here?

- Easwaran

testsuite/ChangeLog:

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	PR tree-optimization/57393
	PR tree-optimization/58011
	* gfortran.dg/reassoc_12.f90: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.

On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman <era...@google.com> wrote:

A newly generated statement in the build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong UID (of an earlier phi statement), which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a
	phi node for a non-phi gimple statement.

testsuite/ChangeLog:

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* gfortran.dg/reassoc_12.f90: New testcase.

Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                 grad_deriv, npoints, sx)
+        IMPLICIT REAL*8 (t)
+        INTEGER, PARAMETER :: dp=8
+        REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                 e_ndrho_ndrho_rho
+        DO ii=1,npoints
+          IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+            t1425 = t233 * t557
+            t1429 = beta * t225
+            t1622 = t327 * t1621
+            t1626 = t327 * t1625
+            t1632 = t327 * t1631
+            t1685 = t105 * t1684
+            t2057 = t1636 + t8 * (t2635 + t3288)
+          END IF
+          IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+            t5469 = t5440 - t5443 - t5446 - t5449 - t5451 - t5454 - t5456 + t5459 - t5462 + t5466 - t5468
+            t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
+            t5489 = 0.16e2_dp * t1429 * t1658
+            t5531 = 0.160e2_dp * t112 * t1626
+            t5533 = 0.160e2_dp * t112 * t1632
+            t5537 = 0.160e2_dp * t112 * t1622
+            t5541 = t5472 - t5478 - t5523 + t5525 + t5531 + t5533 + t5535 + t5537 + t5540
+            t5565 = t112 * t1685
+            t5575 = t5545 - t5548 + t5551 + t5553 - t5558 + t5560 - t5562 + t5564 - 0.80e1_dp * t5565 + t5568 + t5572 + t5574
+            t5611 = t5579 - t5585 + t5590 - t5595 + t5597 - t5602 + t5604 + t5607 + t5610
+            t5613 = t5469 + t5541 + t5575 + t5611
+            t6223 = t6189 - 0.36e0_dp * t83 * t84 * t5613 + t6222
+            t6227 = - t8 * (t5305 + t6223)
+            e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii
Re: Fix PR middle-end/57370
Submitted the patch (r202262) without the insert_stmt_after hunk. Also fixed nits pointed out by Jakub.

- Easwaran

On Mon, Sep 2, 2013 at 2:31 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Fri, Aug 30, 2013 at 6:13 PM, Easwaran Raman <era...@google.com> wrote:

On Fri, Aug 30, 2013 at 1:25 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Fri, Aug 30, 2013 at 2:30 AM, Easwaran Raman <era...@google.com> wrote:

Richard, do you want me to commit everything but the insert_stmt_after hunk now?

Yes.

There are more similar failures reported in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached the updated patch there. Shall I send that for review? Apart from the debug statement issue, almost all the bugs are due to dependence violations because certain newly inserted statements do not have the right UID. Instead of trying to catch all of them, would it be better if I check whether the stmt has a proper UID (non-zero if it is not the first stmt) and assign a sensible value at the point where it is used (find_insert_point and appears_later_in_bb) instead of where the stmt is created? I think that would be less brittle.

In the end all this placement stuff should be reverted and done as post-processing after reassoc is finished. You can remember the inserted stmts that are candidates for moving (just set a pass-local flag on them) and assign UIDs for the whole function after all stmts are inserted.

The problem is we need sane UID values as reassoc is happening - not after it is finished. As it processes each stmt in reassoc_bb, it might generate new stmts which might have to be compared in find_insert_point or appears_later_in_bb.

But if you no longer find_insert_point during reassoc but instead do a scheduling pass after it finished, you won't need sane UIDs during reassoc.

Richard.

- Easwaran

Richard.

- Easwaran

On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman <era...@google.com> wrote:

I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and
	reset of debug statements that cause inconsistent IR.

Missing ChangeLog entry for the insert_stmt_after hunk, which I do not like at all. The other hunks are ok, but we need to work harder to preserve debug stmts - simply removing all is not going to fly.

Richard.

testsuite/ChangeLog:

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	PR tree-optimization/57393
	PR tree-optimization/58011
	* gfortran.dg/reassoc_12.f90: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.

On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman <era...@google.com> wrote:

A newly generated statement in the build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong UID (of an earlier phi statement), which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a
	phi node for a non-phi gimple statement.

testsuite/ChangeLog:

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* gfortran.dg/reassoc_12.f90: New testcase.

Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                 grad_deriv, npoints, sx)
+        IMPLICIT REAL*8 (t)
+        INTEGER, PARAMETER :: dp=8
+        REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                 e_ndrho_ndrho_rho
+        DO ii=1,npoints
+          IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+            t1425 = t233 * t557
+            t1429 = beta * t225
+            t1622 = t327 * t1621
+            t1626 = t327 * t1625
+            t1632 = t327 * t1631
+            t1685 = t105 * t1684
+            t2057 = t1636 + t8 * (t2635 + t3288)
+          END IF
+          IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+            t5469 = t5440 - t5443 - t5446
Re: Fix PR middle-end/57370
On Fri, Aug 30, 2013 at 1:25 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Fri, Aug 30, 2013 at 2:30 AM, Easwaran Raman <era...@google.com> wrote:

Richard, do you want me to commit everything but the insert_stmt_after hunk now?

Yes.

There are more similar failures reported in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached the updated patch there. Shall I send that for review? Apart from the debug statement issue, almost all the bugs are due to dependence violations because certain newly inserted statements do not have the right UID. Instead of trying to catch all of them, would it be better if I check whether the stmt has a proper UID (non-zero if it is not the first stmt) and assign a sensible value at the point where it is used (find_insert_point and appears_later_in_bb) instead of where the stmt is created? I think that would be less brittle.

In the end all this placement stuff should be reverted and done as post-processing after reassoc is finished. You can remember the inserted stmts that are candidates for moving (just set a pass-local flag on them) and assign UIDs for the whole function after all stmts are inserted.

The problem is we need sane UID values as reassoc is happening - not after it is finished. As it processes each stmt in reassoc_bb, it might generate new stmts which might have to be compared in find_insert_point or appears_later_in_bb.

- Easwaran

Richard.

- Easwaran

On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman <era...@google.com> wrote:

I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and
	reset of debug statements that cause inconsistent IR.

Missing ChangeLog entry for the insert_stmt_after hunk, which I do not like at all. The other hunks are ok, but we need to work harder to preserve debug stmts - simply removing all is not going to fly.

Richard.

testsuite/ChangeLog:

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	PR tree-optimization/57393
	PR tree-optimization/58011
	* gfortran.dg/reassoc_12.f90: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.

On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman <era...@google.com> wrote:

A newly generated statement in the build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong UID (of an earlier phi statement), which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a
	phi node for a non-phi gimple statement.

testsuite/ChangeLog:

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* gfortran.dg/reassoc_12.f90: New testcase.

Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                 grad_deriv, npoints, sx)
+        IMPLICIT REAL*8 (t)
+        INTEGER, PARAMETER :: dp=8
+        REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                 e_ndrho_ndrho_rho
+        DO ii=1,npoints
+          IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+            t1425 = t233 * t557
+            t1429 = beta * t225
+            t1622 = t327 * t1621
+            t1626 = t327 * t1625
+            t1632 = t327 * t1631
+            t1685 = t105 * t1684
+            t2057 = t1636 + t8 * (t2635 + t3288)
+          END IF
+          IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+            t5469 = t5440 - t5443 - t5446 - t5449 - t5451 - t5454 - t5456 + t5459 - t5462 + t5466 - t5468
+            t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
+            t5489 = 0.16e2_dp * t1429 * t1658
+            t5531 = 0.160e2_dp * t112 * t1626
+            t5533 = 0.160e2_dp * t112 * t1632
+            t5537 = 0.160e2_dp * t112 * t1622
+            t5541 = t5472 - t5478 - t5523 + t5525
Re: Fix PR middle-end/57370
On Fri, Aug 30, 2013 at 9:26 AM, Jakub Jelinek <ja...@redhat.com> wrote:

On Fri, Aug 30, 2013 at 09:13:34AM -0700, Easwaran Raman wrote:

There are more similar failures reported in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached the updated patch there. Shall I send that for review? Apart from the debug statement issue, almost all the bugs are due to dependence violations because certain newly inserted statements do not have the right UID. Instead of trying to catch all of them, would it be better if I check whether the stmt has a proper UID (non-zero if it is not the first stmt) and assign a sensible value at the point where it is used (find_insert_point and appears_later_in_bb) instead of where the stmt is created? I think that would be less brittle.

In the end all this placement stuff should be reverted and done as post-processing after reassoc is finished. You can remember the inserted stmts that are candidates for moving (just set a pass-local flag on them) and assign UIDs for the whole function after all stmts are inserted.

The problem is we need sane UID values as reassoc is happening - not after it is finished. As it processes each stmt in reassoc_bb, it might generate new stmts which might have to be compared in find_insert_point or appears_later_in_bb.

A new stmt will be created with UID 0, and in case there is stmt movement you could zero the UID during movement. Then you can just special-case UID zero when you are looking at UIDs. As on trunk/4.8 gsi_for_stmt is O(1), you can easily walk a couple of previous and/or later stmts and look for the first non-zero UID around it, and treat it as if the UID was the previous non-zero UID + 0.5 or the next non-zero UID - 0.5. And if you see too many consecutive statements with UID 0 (more than some constant, 32?), just reassign UIDs to the whole bb. Or you could initially assign UIDs with some gaps, then keep filling those gaps, and once you fill them, renumber again with new gaps.

Jakub

Yes, this is pretty much what I was proposing. The current implementation doesn't rely on UIDs being unique - they only have to be monotonically non-decreasing. So, when a UID of 0 is encountered, go up till a non-zero UID is found, then go down and assign this non-zero UID. This effectively implies that each UID-0 stmt is visited at most twice. I don't think we need to check whether we see more than a certain number of UID-0 stmts and redo the entire BB.

- Easwaran
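A rough sketch of the repair scheme being discussed (not the committed fix; it assumes the GCC 4.8-era gimple iterator API and elides the whole-BB renumbering fallback):

/* Give a reassoc-created statement STMT a usable UID by borrowing the
   nearest preceding non-zero UID; UIDs here only need to be monotonically
   non-decreasing, not unique.  */

static void
ensure_valid_uid_sketch (gimple stmt)
{
  if (gimple_uid (stmt) != 0)
    return;
  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);  /* O(1) on trunk/4.8.  */
  for (gsi_prev (&gsi); !gsi_end_p (gsi); gsi_prev (&gsi))
    if (gimple_uid (gsi_stmt (gsi)) != 0)
      {
        gimple_set_uid (stmt, gimple_uid (gsi_stmt (gsi)));
        return;
      }
  /* No earlier stmt with a valid UID: fall back to renumbering the BB.  */
}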
Re: [Google] Refine hot caller heuristic
On Tue, Aug 20, 2013 at 9:35 PM, Xinliang David Li <davi...@google.com> wrote:

Do you need to guard the jump function access with a check of ipa_node_params_vector.exists ()?

I believe it is not necessary since, for example, ipa_analyze_node calls ipa_check_create_node_params, which calls create. But I see it is used everywhere in ipa-inline-analysis.c, so I have added a check and conservatively return false.

Ideally, useful_cold_callee should be folded into the inline hints estimation. Question about the heuristic: why filter out PASS_THROUGH parameter cases completely? Passing the 'this' parameter can in many cases result in good PRE opportunities. Why not skip just the unknown type?

The rationale is that it is useful to inline bar into foo in the snippet below:

void foo ()
{
  A a;
  bar (a);
  ...
}

Capturing this requires the UNKNOWN and KNOWN_TYPE jump functions. I have changed the check accordingly.

I have attached the new patch.

- Easwaran

David

On Tue, Aug 20, 2013 at 12:26 PM, Easwaran Raman <era...@google.com> wrote:

The current hot caller heuristic simply promotes edges whose caller is hot. This patch does the following:

* Turn it off for applications with a large footprint, since the size increase hurts them.
* Be more selective by considering arguments to the callee when the heuristic is enabled.

This performs well on internal benchmarks. Ok for google/4_8 branch if all tests pass?

- Easwaran

hot_caller.patch
Description: Binary data
Re: [Google] Refine hot caller heuristic
On Wed, Aug 21, 2013 at 6:47 AM, Teresa Johnson <tejohn...@google.com> wrote:

+/* Knob to control hot-caller heuristic.  0 means it is turned off, 1 means
+   it is always applied, and 2 means it is applied only if the footprint is
+   smaller than PARAM_HOT_CALLER_CODESIZE_THRESHOLD.  */
 DEFPARAM (PARAM_INLINE_HOT_CALLER,
	  "inline-hot-caller",
	  "Consider cold callsites for inlining if caller contains hot code",
+	  2, 0, 2)
+
+/* The maximum code size estimate under which hot caller heuristic is
+   applied.  */
+DEFPARAM (PARAM_HOT_CALLER_CODESIZE_THRESHOLD,
+	  "hot-caller-codesize-threshold",
+	  "Maximum profile-based code size footprint estimate for "
+	  "hot caller heuristic",
+	  1, 0, 0)

Out of curiosity, how sensitive is performance to the value of this parameter? I.e. is there a clear cutoff for the codes that benefit from disabling this inlining vs. those that benefit from enabling it? Also, have you tried spec2006? I remember that the codesize of the gcc benchmark was above the larger 15000 threshold I use for tuning down unrolling/peeling, and I needed to refine my heuristics to identify profitable loops to unroll/peel even in the case of large codesize. I'm not sure if there are more benchmarks that will be above the smaller 10K threshold.

The cutoff works well for internal benchmarks, but there is a wide range in which to set the value. I ran spec2006 and the gcc benchmark does well. However, there are two C++ benchmarks (which fall below the cutoff) that degrade if I exclude PASS_THROUGH. I'll investigate that later.

+
+DEFPARAM (PARAM_INLINE_USEFUL_COLD_CALLEE,
+	  "inline-useful-cold-callee",
+	  "Consider cold callsites for inlining if caller contains hot code",
+	  1, 0, 1)

The description of this param is wrong (it is the same as the description of PARAM_INLINE_HOT_CALLER). It should probably be something like "Only consider cold callsites for inlining if analysis finds optimization opportunities".

Thanks. Fixed.

 /* Limit of iterations of early inliner.  This basically bounds number of

Index: gcc/ipa-inline.c
===================================================================
--- gcc/ipa-inline.c	(revision 201768)
+++ gcc/ipa-inline.c	(working copy)
@@ -528,12 +528,60 @@ big_speedup_p (struct cgraph_edge *e)
   return false;
 }

+/* Returns true if callee of edge E is considered useful to inline
+   even if it is cold.  A callee is considered useful if there is at
+   least one argument of pointer type that is not a pass-through.  */

Can you expand this comment a bit to add why such arguments indicate useful inlining?

Added comments.

- Easwaran

Thanks,
Teresa

+
+static inline bool
+useful_cold_callee (struct cgraph_edge *e)
+{
+  gimple call = e->call_stmt;
+  int n, arg_num = gimple_call_num_args (call);
+  struct ipa_edge_args *args = IPA_EDGE_REF (e);
+
+  for (n = 0; n < arg_num; n++)
+    {
+      tree arg = gimple_call_arg (call, n);
+      if (POINTER_TYPE_P (TREE_TYPE (arg)))
+	{
+	  struct ipa_jump_func *jfunc = ipa_get_ith_jump_func (args, n);
+	  if (jfunc->type != IPA_JF_PASS_THROUGH)
+	    return true;
+	}
+    }
+  return false;
+}
+
+/* Returns true if hot caller heuristic should be used.  */
+
+static inline bool
+enable_hot_caller_heuristic (void)
+{
+  gcov_working_set_t *ws = NULL;
+  int size_threshold = PARAM_VALUE (PARAM_HOT_CALLER_CODESIZE_THRESHOLD);
+  int num_counters = 0;
+  int param_inline_hot_caller = PARAM_VALUE (PARAM_INLINE_HOT_CALLER);
+
+  if (param_inline_hot_caller == 0)
+    return false;
+  else if (param_inline_hot_caller == 1)
+    return true;
+
+  ws = find_working_set (PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE));
+  if (!ws)
+    return false;
+  num_counters = ws->num_counters;
+  return num_counters <= size_threshold;
+}

 /* Returns true if an edge or its caller are hot enough to
    be considered for inlining.  */

 static bool
 edge_hot_enough_p (struct cgraph_edge *edge)
 {
+  static bool use_hot_caller_heuristic = enable_hot_caller_heuristic ();
   if (cgraph_maybe_hot_edge_p (edge))
     return true;

@@ -543,9 +591,17 @@ edge_hot_enough_p (struct cgraph_edge *edge)
   if (flag_auto_profile && edge->callee->count == 0
       && edge->callee->max_bb_count > 0)
     return false;
-  if (PARAM_VALUE (PARAM_INLINE_HOT_CALLER)
-      && maybe_hot_count_p (NULL, edge->caller->max_bb_count))
-    return true;
+  if (use_hot_caller_heuristic)
+    {
+      struct cgraph_node *where = edge->caller;
+      if (maybe_hot_count_p (NULL, where->max_bb_count))
+	{
+	  if (PARAM_VALUE (PARAM_INLINE_USEFUL_COLD_CALLEE))
+	    return useful_cold_callee (edge);
+	  else
+	    return true;
+	}
+    }
   return false;
 }

On Tue, Aug 20, 2013 at 12:26 PM, Easwaran Raman <era...@google.com> wrote:

The current hot caller heuristic simply promotes edges whose
Re: Fix PR middle-end/57370
Richard, do you want me to commit everything but the insert_stmt_after hunk now?

There are more similar failures reported in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached the updated patch there. Shall I send that for review? Apart from the debug statement issue, almost all the bugs are due to dependence violations because certain newly inserted statements do not have the right UID. Instead of trying to catch all of them, would it be better if I check whether the stmt has a proper UID (non-zero if it is not the first stmt) and assign a sensible value at the point where it is used (find_insert_point and appears_later_in_bb) instead of where the stmt is created? I think that would be less brittle.

- Easwaran

On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman <era...@google.com> wrote:

I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and
	reset of debug statements that cause inconsistent IR.

Missing ChangeLog entry for the insert_stmt_after hunk, which I do not like at all. The other hunks are ok, but we need to work harder to preserve debug stmts - simply removing all is not going to fly.

Richard.

testsuite/ChangeLog:

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	PR tree-optimization/57393
	PR tree-optimization/58011
	* gfortran.dg/reassoc_12.f90: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.

On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman <era...@google.com> wrote:

A newly generated statement in the build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong UID (of an earlier phi statement), which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a
	phi node for a non-phi gimple statement.

testsuite/ChangeLog:

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* gfortran.dg/reassoc_12.f90: New testcase.

Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                 grad_deriv, npoints, sx)
+        IMPLICIT REAL*8 (t)
+        INTEGER, PARAMETER :: dp=8
+        REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                 e_ndrho_ndrho_rho
+        DO ii=1,npoints
+          IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+            t1425 = t233 * t557
+            t1429 = beta * t225
+            t1622 = t327 * t1621
+            t1626 = t327 * t1625
+            t1632 = t327 * t1631
+            t1685 = t105 * t1684
+            t2057 = t1636 + t8 * (t2635 + t3288)
+          END IF
+          IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+            t5469 = t5440 - t5443 - t5446 - t5449 - t5451 - t5454 - t5456 + t5459 - t5462 + t5466 - t5468
+            t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
+            t5489 = 0.16e2_dp * t1429 * t1658
+            t5531 = 0.160e2_dp * t112 * t1626
+            t5533 = 0.160e2_dp * t112 * t1632
+            t5537 = 0.160e2_dp * t112 * t1622
+            t5541 = t5472 - t5478 - t5523 + t5525 + t5531 + t5533 + t5535 + t5537 + t5540
+            t5565 = t112 * t1685
+            t5575 = t5545 - t5548 + t5551 + t5553 - t5558 + t5560 - t5562 + t5564 - 0.80e1_dp * t5565 + t5568 + t5572 + t5574
+            t5611 = t5579 - t5585 + t5590 - t5595 + t5597 - t5602 + t5604 + t5607 + t5610
+            t5613 = t5469 + t5541 + t5575 + t5611
+            t6223 = t6189 - 0.36e0_dp * t83 * t84 * t5613 + t6222
+            t6227 = - t8 * (t5305 + t6223)
+            e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii
[Google] Refine hot caller heuristic
The current hot caller heuristic simply promotes edges whose caller is hot. This patch does the following:

* Turn it off for applications with a large footprint, since the size increase hurts them.
* Be more selective by considering arguments to the callee when the heuristic is enabled.

This performs well on internal benchmarks. Ok for google/4_8 branch if all tests pass?

- Easwaran

hot_caller.patch
Description: Binary data
Re: Fix PR middle-end/57370
Ping.

On Wed, Jul 31, 2013 at 4:34 PM, Easwaran Raman <era...@google.com> wrote:

I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and
	reset of debug statements that cause inconsistent IR.

testsuite/ChangeLog:

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	PR tree-optimization/57393
	PR tree-optimization/58011
	* gfortran.dg/reassoc_12.f90: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.

On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman <era...@google.com> wrote:

A newly generated statement in the build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong UID (of an earlier phi statement), which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a
	phi node for a non-phi gimple statement.

testsuite/ChangeLog:

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* gfortran.dg/reassoc_12.f90: New testcase.

Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                 grad_deriv, npoints, sx)
+        IMPLICIT REAL*8 (t)
+        INTEGER, PARAMETER :: dp=8
+        REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                 e_ndrho_ndrho_rho
+        DO ii=1,npoints
+          IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+            t1425 = t233 * t557
+            t1429 = beta * t225
+            t1622 = t327 * t1621
+            t1626 = t327 * t1625
+            t1632 = t327 * t1631
+            t1685 = t105 * t1684
+            t2057 = t1636 + t8 * (t2635 + t3288)
+          END IF
+          IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+            t5469 = t5440 - t5443 - t5446 - t5449 - t5451 - t5454 - t5456 + t5459 - t5462 + t5466 - t5468
+            t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
+            t5489 = 0.16e2_dp * t1429 * t1658
+            t5531 = 0.160e2_dp * t112 * t1626
+            t5533 = 0.160e2_dp * t112 * t1632
+            t5537 = 0.160e2_dp * t112 * t1622
+            t5541 = t5472 - t5478 - t5523 + t5525 + t5531 + t5533 + t5535 + t5537 + t5540
+            t5565 = t112 * t1685
+            t5575 = t5545 - t5548 + t5551 + t5553 - t5558 + t5560 - t5562 + t5564 - 0.80e1_dp * t5565 + t5568 + t5572 + t5574
+            t5611 = t5579 - t5585 + t5590 - t5595 + t5597 - t5602 + t5604 + t5607 + t5610
+            t5613 = t5469 + t5541 + t5575 + t5611
+            t6223 = t6189 - 0.36e0_dp * t83 * t84 * t5613 + t6222
+            t6227 = - t8 * (t5305 + t6223)
+            e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + t6227 * sx
+            t6352 = t5440 - t5443 - t5446 - t5449 - t5451 - t5454 + 0.40e1_dp * t102 * t327 * t2057 * t557 - t5456 + t5459 - t5462 + t5466 - t5468
+            t6363 = t5480 - t5489 + 0.96e2_dp * t1054 * t640 * t3679
+            t6367 = t5472 - t5474 - t5478 - t5523 + t5525 + t5531 + t5533 + t5535 + t5537 - 0.20e1_dp * t102 * t105 * t6363 + t5540
+            t6370 = t5545 - t5548 + t5551 + t5553 - t5558 + t5560 - t5562 + t5564 - 0.40e1_dp * t5565 + t5568 + t5572 + t5574
+            t6373 = t5579 - t5585 + t5590 - t5595 + t5597 - t5602 + t5604 + t5607 + t5610
+            t6375 = t6352 + t6367 + t6370 + t6373
+            t6380 = - 0.36e0_dp * t83 * t84 * t6375 + t5701
+            t6669 = -t4704 - t8 * (t6344 + t6380 + t6665)
+            e_ndrho_ndrho_ndrho(ii
Re: Fix PR middle-end/57370
I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and
	reset of debug statements that cause inconsistent IR.

testsuite/ChangeLog:

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	PR tree-optimization/57393
	PR tree-optimization/58011
	* gfortran.dg/reassoc_12.f90: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.

On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman <era...@google.com> wrote:

A newly generated statement in the build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong UID (of an earlier phi statement), which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a
	phi node for a non-phi gimple statement.

testsuite/ChangeLog:

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* gfortran.dg/reassoc_12.f90: New testcase.

Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                 grad_deriv, npoints, sx)
+        IMPLICIT REAL*8 (t)
+        INTEGER, PARAMETER :: dp=8
+        REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                 e_ndrho_ndrho_rho
+        DO ii=1,npoints
+          IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+            t1425 = t233 * t557
+            t1429 = beta * t225
+            t1622 = t327 * t1621
+            t1626 = t327 * t1625
+            t1632 = t327 * t1631
+            t1685 = t105 * t1684
+            t2057 = t1636 + t8 * (t2635 + t3288)
+          END IF
+          IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+            t5469 = t5440 - t5443 - t5446 - t5449 - t5451 - t5454 - t5456 + t5459 - t5462 + t5466 - t5468
+            t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
+            t5489 = 0.16e2_dp * t1429 * t1658
+            t5531 = 0.160e2_dp * t112 * t1626
+            t5533 = 0.160e2_dp * t112 * t1632
+            t5537 = 0.160e2_dp * t112 * t1622
+            t5541 = t5472 - t5478 - t5523 + t5525 + t5531 + t5533 + t5535 + t5537 + t5540
+            t5565 = t112 * t1685
+            t5575 = t5545 - t5548 + t5551 + t5553 - t5558 + t5560 - t5562 + t5564 - 0.80e1_dp * t5565 + t5568 + t5572 + t5574
+            t5611 = t5579 - t5585 + t5590 - t5595 + t5597 - t5602 + t5604 + t5607 + t5610
+            t5613 = t5469 + t5541 + t5575 + t5611
+            t6223 = t6189 - 0.36e0_dp * t83 * t84 * t5613 + t6222
+            t6227 = - t8 * (t5305 + t6223)
+            e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + t6227 * sx
+            t6352 = t5440 - t5443 - t5446 - t5449 - t5451 - t5454 + 0.40e1_dp * t102 * t327 * t2057 * t557 - t5456 + t5459 - t5462 + t5466 - t5468
+            t6363 = t5480 - t5489 + 0.96e2_dp * t1054 * t640 * t3679
+            t6367 = t5472 - t5474 - t5478 - t5523 + t5525 + t5531 + t5533 + t5535 + t5537 - 0.20e1_dp * t102 * t105 * t6363 + t5540
+            t6370 = t5545 - t5548 + t5551 + t5553 - t5558 + t5560 - t5562 + t5564 - 0.40e1_dp * t5565 + t5568 + t5572 + t5574
+            t6373 = t5579 - t5585 + t5590 - t5595 + t5597 - t5602 + t5604 + t5607 + t5610
+            t6375 = t6352 + t6367 + t6370 + t6373
+            t6380 = - 0.36e0_dp * t83 * t84 * t6375 + t5701
+            t6669 = -t4704 - t8 * (t6344 + t6380 + t6665)
+            e_ndrho_ndrho_ndrho(ii
Re: Fix PR middle-end/57370
Ping. On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman era...@google.com wrote: A newly generated statement in build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong uid (of an earlier phi statement) which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk? Thanks, Easwaran 2013-06-27 Easwaran Raman era...@google.com PR middle-end/57370 * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi node for a non-phi gimple statement. testsuite/ChangeLog: 2013-06-27 Easwaran Raman era...@google.com PR middle-end/57370 * gfortran.dg/reassoc_12.f90: New testcase. Index: gcc/testsuite/gfortran.dg/reassoc_12.f90 === --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) @@ -0,0 +1,74 @@ +! { dg-do compile } +! { dg-options -O2 -ffast-math } +! PR middle-end/57370 + + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, + grad_deriv,npoints, sx) +IMPLICIT REAL*8 (t) +INTEGER, PARAMETER :: dp=8 +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, + e_ndrho_ndrho_rho + DO ii=1,npoints + IF( grad_deriv = 2 .OR. grad_deriv == -2 ) THEN +t1425 = t233 * t557 +t1429 = beta * t225 +t1622 = t327 * t1621 +t1626 = t327 * t1625 +t1632 = t327 * t1631 +t1685 = t105 * t1684 +t2057 = t1636 + t8 * (t2635 + t3288) + END IF + IF( grad_deriv = 3 .OR. grad_deriv == -3 ) THEN +t5469 = t5440 - t5443 - t5446 - t5449 - +t5451 - t5454 - t5456 + t5459 - +t5462 + t5466 - t5468 +t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425 +t5489 = 0.16e2_dp * t1429 * t1658 +t5531 = 0.160e2_dp * t112 * t1626 +t5533 = 0.160e2_dp * t112 * t1632 +t5537 = 0.160e2_dp * t112 * t1622 +t5541 = t5472 - t5478 - t5523 + t5525 + +t5531 + t5533 + t5535 + t5537 + +t5540 +t5565 = t112 * t1685 +t5575 = t5545 - t5548 + t5551 + t5553 - +t5558 + t5560 - t5562 + t5564 - +0.80e1_dp * t5565 + t5568 + t5572 + +t5574 +t5611 = t5579 - t5585 + t5590 - t5595 + +t5597 - t5602 + t5604 + t5607 + +t5610 +t5613 = t5469 + t5541 + t5575 + t5611 +t6223 = t6189 - +0.36e0_dp * t83 * t84 * t5613 + +t6222 +t6227 = - t8 * (t5305 + t6223) +e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + + t6227 * sx +t6352 = t5440 - t5443 - t5446 - t5449 - +t5451 - t5454 + +0.40e1_dp * t102 * t327 * t2057 * t557 - +t5456 + t5459 - t5462 + t5466 - +t5468 +t6363 = t5480 - t5489 + +0.96e2_dp * t1054 * t640 * t3679 +t6367 = t5472 - t5474 - t5478 - t5523 + +t5525 + t5531 + t5533 + t5535 + +t5537 - 0.20e1_dp * t102 * t105 * t6363 + +t5540 +t6370 = t5545 - t5548 + t5551 + t5553 - +t5558 + t5560 - t5562 + t5564 - +0.40e1_dp * t5565 + +t5568 + t5572 + t5574 +t6373 = t5579 - t5585 + t5590 - t5595 + +t5597 - t5602 + t5604 + t5607 + +t5610 +t6375 = t6352 + t6367 + t6370 + t6373 +t6380 = - 0.36e0_dp * t83 * t84 * t6375 + t5701 +t6669 = -t4704 - t8 * (t6344 + t6380 + t6665) +e_ndrho_ndrho_ndrho(ii) = e_ndrho_ndrho_ndrho(ii) + + t6669 * sx + END IF + END DO + END SUBROUTINE xb88_lr_adiabatic_lda_calc + Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c(revision 200429) +++ gcc/tree-ssa-reassoc.c(working copy) @@ -1207,7 +1207,7 @@ build_and_add_sum (tree type, tree op1, tree op2, if (gimple_code (op1def) == GIMPLE_PHI) { gsi = gsi_after_labels (gimple_bb (op1def)); - gimple_set_uid (sum, gimple_uid (op1def)); + gimple_set_uid (sum, gimple_uid (gsi_stmt (gsi))); gsi_insert_before (gsi, sum, GSI_NEW_STMT); } else
Fix PR middle-end/57370
A newly generated statement in build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong uid (of an earlier phi statement) which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk? Thanks, Easwaran 2013-06-27 Easwaran Raman era...@google.com PR middle-end/57370 * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi node for a non-phi gimple statement. testsuite/ChangeLog: 2013-06-27 Easwaran Raman era...@google.com PR middle-end/57370 * gfortran.dg/reassoc_12.f90: New testcase. Index: gcc/testsuite/gfortran.dg/reassoc_12.f90 === --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) @@ -0,0 +1,74 @@ +! { dg-do compile } +! { dg-options -O2 -ffast-math } +! PR middle-end/57370 + + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, + grad_deriv,npoints, sx) +IMPLICIT REAL*8 (t) +INTEGER, PARAMETER :: dp=8 +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, + e_ndrho_ndrho_rho + DO ii=1,npoints + IF( grad_deriv = 2 .OR. grad_deriv == -2 ) THEN +t1425 = t233 * t557 +t1429 = beta * t225 +t1622 = t327 * t1621 +t1626 = t327 * t1625 +t1632 = t327 * t1631 +t1685 = t105 * t1684 +t2057 = t1636 + t8 * (t2635 + t3288) + END IF + IF( grad_deriv = 3 .OR. grad_deriv == -3 ) THEN +t5469 = t5440 - t5443 - t5446 - t5449 - +t5451 - t5454 - t5456 + t5459 - +t5462 + t5466 - t5468 +t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425 +t5489 = 0.16e2_dp * t1429 * t1658 +t5531 = 0.160e2_dp * t112 * t1626 +t5533 = 0.160e2_dp * t112 * t1632 +t5537 = 0.160e2_dp * t112 * t1622 +t5541 = t5472 - t5478 - t5523 + t5525 + +t5531 + t5533 + t5535 + t5537 + +t5540 +t5565 = t112 * t1685 +t5575 = t5545 - t5548 + t5551 + t5553 - +t5558 + t5560 - t5562 + t5564 - +0.80e1_dp * t5565 + t5568 + t5572 + +t5574 +t5611 = t5579 - t5585 + t5590 - t5595 + +t5597 - t5602 + t5604 + t5607 + +t5610 +t5613 = t5469 + t5541 + t5575 + t5611 +t6223 = t6189 - +0.36e0_dp * t83 * t84 * t5613 + +t6222 +t6227 = - t8 * (t5305 + t6223) +e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + + t6227 * sx +t6352 = t5440 - t5443 - t5446 - t5449 - +t5451 - t5454 + +0.40e1_dp * t102 * t327 * t2057 * t557 - +t5456 + t5459 - t5462 + t5466 - +t5468 +t6363 = t5480 - t5489 + +0.96e2_dp * t1054 * t640 * t3679 +t6367 = t5472 - t5474 - t5478 - t5523 + +t5525 + t5531 + t5533 + t5535 + +t5537 - 0.20e1_dp * t102 * t105 * t6363 + +t5540 +t6370 = t5545 - t5548 + t5551 + t5553 - +t5558 + t5560 - t5562 + t5564 - +0.40e1_dp * t5565 + +t5568 + t5572 + t5574 +t6373 = t5579 - t5585 + t5590 - t5595 + +t5597 - t5602 + t5604 + t5607 + +t5610 +t6375 = t6352 + t6367 + t6370 + t6373 +t6380 = - 0.36e0_dp * t83 * t84 * t6375 + t5701 +t6669 = -t4704 - t8 * (t6344 + t6380 + t6665) +e_ndrho_ndrho_ndrho(ii) = e_ndrho_ndrho_ndrho(ii) + + t6669 * sx + END IF + END DO + END SUBROUTINE xb88_lr_adiabatic_lda_calc + Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c(revision 200429) +++ gcc/tree-ssa-reassoc.c(working copy) @@ -1207,7 +1207,7 @@ build_and_add_sum (tree type, tree op1, tree op2, if (gimple_code (op1def) == GIMPLE_PHI) { gsi = gsi_after_labels (gimple_bb (op1def)); - gimple_set_uid (sum, gimple_uid (op1def)); + gimple_set_uid (sum, gimple_uid (gsi_stmt (gsi))); gsi_insert_before (gsi, sum, GSI_NEW_STMT); } else
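Since this thread quotes the same patch several times, it may help to distill the UID invariant the fix restores. The following is a standalone toy model, not GCC code — every name in it is invented. It shows how giving the newly built statement a PHI's UID makes a UID-based "is the definition already above this use?" test disagree with where the statement was actually inserted (the position of the block's first non-PHI statement):

#include <stdio.h>

/* The pass's dominance-within-a-block test: smaller UID means earlier.  */
static int available_before (unsigned def_uid, unsigned use_uid)
{
  return def_uid < use_uid;
}

int main (void)
{
  unsigned phi_uid = 0;        /* PHI at the top of the block           */
  unsigned use_uid = 3;        /* statement that will consume the sum   */
  unsigned first_stmt_uid = 5; /* insertion point of the new statement  */

  /* Buggy choice: the PHI's UID claims the sum is above the use, even
     though it was physically placed at the uid-5 position -- a
     use-before-def once later code trusts the comparison.  */
  printf ("buggy uid (PHI's):        available=%d (wrong: placed at 5)\n",
          available_before (phi_uid, use_uid));

  /* Fixed choice: the UID of the statement at the insertion point.  */
  printf ("fixed uid (insert point): available=%d\n",
          available_before (first_stmt_uid, use_uid));
  return 0;
}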
[google gcc-4_8] Change size accounting during inlining in lipo mode
In LIPO mode, this patch updates the overall unit size only when the eventual function into which the callee is inlined is in the primary module. This avoids the situation where the module growth budget is used up by inlines into auxiliary-module functions that never get inlined into a primary-module function.

Ok for google/gcc-4_8 branch?

Thanks, Easwaran

lipo_inline.patch
Description: Binary data
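Because the attachment is binary, the shape of the check being described can only be sketched. The toy below is purely illustrative — struct fn, eventual_host and account_growth are all made-up names, not the google/gcc-4_8 LIPO code:

#include <stdbool.h>

struct fn
{
  bool in_primary_module;   /* does this function live in the primary module? */
  struct fn *inlined_into;  /* NULL if the function survives as a real body   */
};

/* Walk to the function that ultimately hosts the inlined body.  */
static struct fn *
eventual_host (struct fn *f)
{
  while (f->inlined_into)
    f = f->inlined_into;
  return f;
}

/* Charge the unit growth budget only for inlines whose eventual host
   is in the primary module, per the description above.  */
static void
account_growth (struct fn *caller, int growth, int *overall_size)
{
  if (eventual_host (caller)->in_primary_module)
    *overall_size += growth;
}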
Re: PR tree-optimization/57337
On Tue, May 28, 2013 at 4:11 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, May 27, 2013 at 10:20 AM, Richard Biener richard.guent...@gmail.com wrote: On Sun, May 26, 2013 at 5:53 AM, Easwaran Raman era...@google.com wrote: On Sat, May 25, 2013 at 4:46 AM, Richard Biener richard.guent...@gmail.com wrote: Easwaran Raman era...@google.com wrote:

In that case, if my insert_stmt immediately follows dep_stmt and both have the same UID, not_dominated_by would return true and I will end up updating insert_stmt to dep_stmt, which is wrong.

But there should be a safe default answer for equal uids. Unless we are asking different questions in different places. Thus, I do not like the stmt walking but rather have a safe fallback.

I am lost here. I don't see how we could avoid doing the stmt walking to resolve the equal uid case. How do we ensure that not_dominated_by (a, b) returns true and not_dominated_by (b, a) returns false if A and B have the same UID and A appears before B, without doing the statement walk? And I don't see why the statement walk is bad. It is not likely that there is a long sequence of statements with the same UID.

Sure, but if you are always asking a question like "is placing X before Y ok?" then you can conservatively answer no and code should handle that ok. If you are asking questions both ways then of course no conservative answer is possible. Both current uses of not_dominated_by are of the same kind: if the placement is not ok then either the insert point needs adjustment or the debug stmt reset.

Ok, thinking about this more I come to the conclusion that a safe default for equal UIDs is not possible, as we may not be able to order two dep_stmts. I still dislike the walking, but to get the quite frequent regressions fixed the patch is ok as-is. Please install.

Thanks, Richard.

Thanks. Submitted the patch at r199385.

- Easwaran
Fix PR 57442
I made a thinko when I asserted gcc_unreachable outside the main loop of appears_later_in_bb in my previous fix to PR 57337. It is quite possible for STMT1 to be followed by one or more statements with the same UID till the end of the BB without encountering STMT2. The right fix is to return STMT1 outside the loop. Ok for trunk?

Thanks, Easwaran

2013-05-28 Easwaran Raman era...@google.com
	PR tree-optimization/57442
	* tree-ssa-reassoc.c (appears_later_in_bb): Return correct value when control exits the main loop.

2013-05-28 Easwaran Raman era...@google.com
	PR tree-optimization/57442
	* gcc.dg/tree-ssa/reassoc-30.c: New testcase.

Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c (revision 0)
@@ -0,0 +1,13 @@
+/* PR tree-optimization/57442 */
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+short a;
+unsigned b;
+long c;
+int d;
+
+void f(void)
+{
+  b = a ? : (a = b) - c + (d - (b + b));
+}
+
Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c (revision 199385)
+++ gcc/tree-ssa-reassoc.c (working copy)
@@ -2888,7 +2888,7 @@ appears_later_in_bb (gimple stmt1, gimple stmt2)
       else if (stmt == stmt2)
         return stmt2;
     }
-  gcc_unreachable ();
+  return stmt1;
 }
 
 /* Find the statement after which STMT must be moved so that the
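A standalone toy rerun of the scenario described above (not GCC internals; arrays stand in for the statement sequence) shows why falling out of the loop must mean STMT1 is the later statement:

#include <stdio.h>

/* Mirrors appears_later_in_bb: walk forward from s1 toward the end of
   the block; uids[] plays the role of gimple_uid.  */
static int appears_later (const unsigned *uids, int n, int s1, int s2)
{
  for (int i = s1 + 1; i < n; i++)
    {
      if (uids[i] != uids[s1])
        return s1;          /* different UID seen first: s1 is later  */
      if (i == s2)
        return s2;          /* met s2 while walking: s2 is later      */
    }
  return s1;                /* end of block: the old gcc_unreachable  */
}

int main (void)
{
  /* Block: uid 3, then a run of uid 7 up to the end; s2=1 precedes
     s1=2, so the walk from s1 never meets s2 or a different UID.  */
  const unsigned uids[] = { 3, 7, 7, 7 };
  printf ("later statement: index %d\n", appears_later (uids, 4, 2, 1));
  return 0;
}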
Re: PR tree-optimization/57337
On Sat, May 25, 2013 at 4:46 AM, Richard Biener richard.guent...@gmail.com wrote: Easwaran Raman era...@google.com wrote: In that case, if my insert_stmt immediately follows dep_stmt and both have the same UID, not_dominated_by would return true and I will end up updating insert_stmt to dep_stmt which is wrong. But there should be a safe default answer for Equal uids. Unless we are asking different questions in different places. Thus, I do not like the stmt walking but rather have a safe fallback. I am lost here. I don't see how we could avoid doing the stmt walking to resolve the equal uid case. How to ensure that not_dominated_by (a, b) returns true and not_dominated_by (b, a) returns false if A and B have the same UID and A appears before B without doing the statement walk. And, I don't see why the statement walk is bad. It is not likely that there is a long sequence of statements with the same UID. Thanks, Easwaran Richard. - Easwaran On Fri, May 24, 2013 at 1:07 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, May 23, 2013 at 7:26 PM, Easwaran Raman era...@google.com wrote: This addresses the case where UID alone is not sufficient to figure out which statement appears earlier in a BB. Bootstraps and no test regressions in x86_64 on linux. Ok for trunk? Why not simply conservatively use gimple_uid (a) = gimple_uid (b) in not_dominated_by? Richard. Thanks, Easwaran 2013-05-23 Easwaran Raman era...@google.com PR tree-optimization/57337 * tree-ssa-reassoc.c (appears_later_in_bb): New function. (find_insert_point): Correctly identify the insertion point when two statements with the same UID is compared. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 199211) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2866,6 +2866,31 @@ not_dominated_by (gimple a, gimple b) } +/* Among STMT1 and STMT2, return the statement that appears later. Both + statements are in same BB and have the same UID. */ + +static gimple +appears_later_in_bb (gimple stmt1, gimple stmt2) +{ + unsigned uid = gimple_uid (stmt1); + gimple_stmt_iterator gsi = gsi_for_stmt (stmt1); + gsi_next (gsi); + if (gsi_end_p (gsi)) +return stmt1; + for (; !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + + /* If STMT has a different UID than STMT1 and we haven't seen + STMT2 during traversal, we know STMT1 appears later. */ + if (gimple_uid (stmt) != uid) +return stmt1; + else if (stmt == stmt2) +return stmt2; +} + gcc_unreachable (); +} + /* Find the statement after which STMT must be moved so that the dependency from DEP_STMT to STMT is maintained. */ @@ -2875,7 +2900,11 @@ find_insert_point (gimple stmt, gimple dep_stmt) gimple insert_stmt = stmt; if (dep_stmt == NULL) return stmt; - if (not_dominated_by (insert_stmt, dep_stmt)) + if (gimple_uid (insert_stmt) == gimple_uid (dep_stmt) + gimple_bb (insert_stmt) == gimple_bb (dep_stmt) + insert_stmt != dep_stmt) +insert_stmt = appears_later_in_bb (insert_stmt, dep_stmt); + else if (not_dominated_by (insert_stmt, dep_stmt)) insert_stmt = dep_stmt; return insert_stmt; }
Re: PR tree-optimization/57337
In that case, if my insert_stmt immediately follows dep_stmt and both have the same UID, not_dominated_by would return true and I will end up updating insert_stmt to dep_stmt which is wrong. - Easwaran On Fri, May 24, 2013 at 1:07 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, May 23, 2013 at 7:26 PM, Easwaran Raman era...@google.com wrote: This addresses the case where UID alone is not sufficient to figure out which statement appears earlier in a BB. Bootstraps and no test regressions in x86_64 on linux. Ok for trunk? Why not simply conservatively use gimple_uid (a) = gimple_uid (b) in not_dominated_by? Richard. Thanks, Easwaran 2013-05-23 Easwaran Raman era...@google.com PR tree-optimization/57337 * tree-ssa-reassoc.c (appears_later_in_bb): New function. (find_insert_point): Correctly identify the insertion point when two statements with the same UID is compared. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 199211) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2866,6 +2866,31 @@ not_dominated_by (gimple a, gimple b) } +/* Among STMT1 and STMT2, return the statement that appears later. Both + statements are in same BB and have the same UID. */ + +static gimple +appears_later_in_bb (gimple stmt1, gimple stmt2) +{ + unsigned uid = gimple_uid (stmt1); + gimple_stmt_iterator gsi = gsi_for_stmt (stmt1); + gsi_next (gsi); + if (gsi_end_p (gsi)) +return stmt1; + for (; !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + + /* If STMT has a different UID than STMT1 and we haven't seen + STMT2 during traversal, we know STMT1 appears later. */ + if (gimple_uid (stmt) != uid) +return stmt1; + else if (stmt == stmt2) +return stmt2; +} + gcc_unreachable (); +} + /* Find the statement after which STMT must be moved so that the dependency from DEP_STMT to STMT is maintained. */ @@ -2875,7 +2900,11 @@ find_insert_point (gimple stmt, gimple dep_stmt) gimple insert_stmt = stmt; if (dep_stmt == NULL) return stmt; - if (not_dominated_by (insert_stmt, dep_stmt)) + if (gimple_uid (insert_stmt) == gimple_uid (dep_stmt) + gimple_bb (insert_stmt) == gimple_bb (dep_stmt) + insert_stmt != dep_stmt) +insert_stmt = appears_later_in_bb (insert_stmt, dep_stmt); + else if (not_dominated_by (insert_stmt, dep_stmt)) insert_stmt = dep_stmt; return insert_stmt; }
PR tree-optimization/57337
This addresses the case where the UID alone is not sufficient to figure out which statement appears earlier in a BB. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks, Easwaran

2013-05-23 Easwaran Raman era...@google.com
	PR tree-optimization/57337
	* tree-ssa-reassoc.c (appears_later_in_bb): New function.
	(find_insert_point): Correctly identify the insertion point when two statements with the same UID are compared.

Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c (revision 199211)
+++ gcc/tree-ssa-reassoc.c (working copy)
@@ -2866,6 +2866,31 @@ not_dominated_by (gimple a, gimple b)
 
 }
 
+/* Among STMT1 and STMT2, return the statement that appears later.  Both
+   statements are in the same BB and have the same UID.  */
+
+static gimple
+appears_later_in_bb (gimple stmt1, gimple stmt2)
+{
+  unsigned uid = gimple_uid (stmt1);
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt1);
+  gsi_next (&gsi);
+  if (gsi_end_p (gsi))
+    return stmt1;
+  for (; !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple stmt = gsi_stmt (gsi);
+
+      /* If STMT has a different UID than STMT1 and we haven't seen
+         STMT2 during traversal, we know STMT1 appears later.  */
+      if (gimple_uid (stmt) != uid)
+        return stmt1;
+      else if (stmt == stmt2)
+        return stmt2;
+    }
+  gcc_unreachable ();
+}
+
 /* Find the statement after which STMT must be moved so that the
    dependency from DEP_STMT to STMT is maintained.  */
 
@@ -2875,7 +2900,11 @@ find_insert_point (gimple stmt, gimple dep_stmt)
   gimple insert_stmt = stmt;
   if (dep_stmt == NULL)
     return stmt;
-  if (not_dominated_by (insert_stmt, dep_stmt))
+  if (gimple_uid (insert_stmt) == gimple_uid (dep_stmt)
+      && gimple_bb (insert_stmt) == gimple_bb (dep_stmt)
+      && insert_stmt != dep_stmt)
+    insert_stmt = appears_later_in_bb (insert_stmt, dep_stmt);
+  else if (not_dominated_by (insert_stmt, dep_stmt))
     insert_stmt = dep_stmt;
   return insert_stmt;
 }
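Condensed, the ordering decision this patch teaches find_insert_point looks roughly like the sketch below. It uses the GCC-internal API of the period (gimple, gimple_bb, gimple_uid, dominated_by_p) plus appears_later_in_bb from the patch above; it is a simplification for exposition, not compilable on its own:

/* Return whichever of A and B appears later in execution order.  */
static gimple
later_of_the_two (gimple a, gimple b)
{
  basic_block ba = gimple_bb (a);
  basic_block bb = gimple_bb (b);

  if (ba == bb)
    {
      /* Same block: UIDs decide, except for the tie this patch now
         resolves by walking the statement list.  */
      if (gimple_uid (a) == gimple_uid (b))
        return appears_later_in_bb (a, b);
      return gimple_uid (a) < gimple_uid (b) ? b : a;
    }

  /* Different blocks: fall back to the dominator tree; if A's block is
     dominated by B's block, A executes after B.  */
  return dominated_by_p (CDI_DOMINATORS, ba, bb) ? a : b;
}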
Re: Fix PR tree-optimization/57322
If you are suggesting using inc_gimple_stmt_max_uid(cfun) for all statements, that won't work because we want to use the UIDs to determine dominance within a BB. If your suggestion is to use that instead of 1 when BB == NULL, that would work (even though setting it to 1 is sufficient).

Thanks, Easwaran

On Mon, May 20, 2013 at 4:43 AM, Steven Bosscher stevenb@gmail.com wrote: On Mon, May 20, 2013 at 1:41 AM, Easwaran Raman era...@google.com wrote:

	PR tree-optimization/57322
	* tree-ssa-reassoc.c (build_and_add_sum): If a BB is empty, set the UID of the statement added to the BB to be 1.

Maybe gimple_set_uid (stmt, inc_gimple_stmt_max_uid (cfun));? That would keep the UIDs unique except for the copied UIDs.

Ciao! Steven
Fix PR tree-optimization/57322
The UID of a newly generated statement in build_and_add_sum is set to that of an adjacent statement in the BB. This is a problem in one case where the BB is empty. This fix sets the UID to be 1 if the BB is empty. Bootstraps and no test regressions on x86_64. OK for trunk?

Thanks, Easwaran

---
2013-05-19 Easwaran Raman era...@google.com
	PR tree-optimization/57322
	* tree-ssa-reassoc.c (build_and_add_sum): If a BB is empty, set the UID of the statement added to the BB to be 1.

Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c (revision 199048)
+++ gcc/tree-ssa-reassoc.c (working copy)
@@ -1165,8 +1165,12 @@ build_and_add_sum (tree type, tree op1, tree op2,
   if ((!op1def || gimple_nop_p (op1def))
       && (!op2def || gimple_nop_p (op2def)))
     {
+      gimple first_stmt;
+      unsigned uid;
       gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR));
-      gimple_set_uid (sum, gimple_uid (gsi_stmt (gsi)));
+      first_stmt = gsi_stmt (gsi);
+      uid = first_stmt ? gimple_uid (first_stmt) : 1;
+      gimple_set_uid (sum, uid);
       gsi_insert_before (&gsi, sum, GSI_NEW_STMT);
     }
   else if ((!op1def || gimple_nop_p (op1def))
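For comparison, the two options discussed in this subthread, side by side; sketched against the same API as the patch (gsi_after_labels, gsi_stmt, gimple_uid, inc_gimple_stmt_max_uid) and not meant to compile standalone:

gimple_stmt_iterator gsi
  = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR));
gimple first_stmt = gsi_stmt (gsi);   /* NULL when the block is empty  */

/* (a) The fix above: any constant is safe, because an empty block has
   no other statement whose UID this one could be compared against.  */
unsigned uid = first_stmt ? gimple_uid (first_stmt) : 1;

/* (b) Steven's suggestion, which additionally keeps UIDs unique across
   the function except for the deliberately copied ones:

   unsigned uid = first_stmt ? gimple_uid (first_stmt)
                             : inc_gimple_stmt_max_uid (cfun);  */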
Re: Minimize downward code motion during reassociation
On Thu, Apr 25, 2013 at 4:42 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Dec 7, 2012 at 9:01 PM, Easwaran Raman era...@google.com wrote: It seems I need to reset the debug uses of a statement before moving the statement itself. The attached patch starts from the leaf to root of the tree to be reassociated and places them at the point where their dependences will be met after reassociation. This bootstraps and I am running the tests. Ok if there are no test failures? + if ((dep_bb != insert_bb +!dominated_by_p (CDI_DOMINATORS, insert_bb, dep_bb)) + || (dep_bb == insert_bb + gimple_uid (insert_stmt) gimple_uid (dep_stmt))) +{ + insert_stmt = dep_stmt; +} superfluous {} Removed. + /* If there are any debug uses of LHS, reset them. */ + FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs) +{ + if (is_gimple_debug (use_stmt)) +{ + gimple_debug_bind_reset_value (use_stmt); that's only needed for debug stmts that are not dominated by insert_point, no? You are right. I have added the check. + /* Statements marked for throw can not be in the middle of a basic block. So + we can not insert a statement (not marked for throw) immediately after. */ + else if (stmt_can_throw_internal (insert_point)) +{ use stmt_ends_bb_p (insert_point) instead + edge succ_edge = find_fallthru_edge (insert_bb-succs); + insert_bb = succ_edge-dest; + /* Insert STMT at the beginning of the successor basic block. */ + gsi_insert = gsi_after_labels (insert_bb); + gsi_move_before (gsistmt, gsi_insert); are you sure this is a proper insert location? How would you even arrive in this situation given find_insert_point dominance checks? Let's say the last statement in a BB which can throw feeds into a statement that is to be reassociated. Then the insert point would be this statement. Instead, you could always insert at the beginning of its (lone) successor. This is independent of the the find_insert_point checks. Otherwise the patch looks ok - can you add one or two testcases please? I have added a test case and submitted at r199048. Thanks, Easwaran Thanks, Richard. Thanks, Easwaran 2012-12-07 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(find_insert_point): New function. (insert_stmt_after): Likewise. (get_def_stmt): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. (build_and_add_sum): Assign UIDs for new statements. (linearize_expr): Likewise. (do_reassoc): Renumber gimple statement UIDs. On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman era...@google.com wrote: I am unable to figure out the right way to handle the debug statements. What I tried was to find debug statements that use the SSA name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE) and then moved them as well at the right place. Thus, if I have to move t1 = a + b down (after the definition of 'd'), I also moved all debug statements that use t1 after the new position of t1. That still caused use-before-def problems in ssa_verify. I noticed that the debug statements got modified behind the scenes causing these issues. Any hints on what is the right way to handle the debug statements would be very helpful. I think you cannot (and should not) move debug statements. Instead you have to invalidate them. Otherwise you'll introduce confusion as debug info cannot handle overlapping live ranges. 
But maybe Alex can clarify. Richard.
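The rule Easwaran describes for throwing statements above — a statement marked for throw must end its block, so the moved statement goes to the start of the lone successor instead — can be written out in isolation as below, using stmt_ends_bb_p, find_fallthru_edge and gsi_after_labels from the period's API. A simplified sketch, not the committed code:

/* Return an iterator at which a statement may be placed so that it
   executes immediately after INSERT_POINT.  */
static gimple_stmt_iterator
insertion_point_after (gimple insert_point)
{
  gimple_stmt_iterator gsi;

  if (stmt_ends_bb_p (insert_point))
    {
      /* A block-ending (e.g. throwing) statement cannot be followed
         within its own block; step into the fallthru successor.  */
      edge fallthru
        = find_fallthru_edge (gimple_bb (insert_point)->succs);
      return gsi_after_labels (fallthru->dest);
    }
  gsi = gsi_for_stmt (insert_point);
  gsi_next (&gsi);
  return gsi;
}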
Print column information in dump_loc
This patch dumps the column number as part of dump_loc, making the output similar to inform(). This allows these messages to be pattern matched by dg-message. Bootstraps with this change. Ok for trunk?

- Easwaran

2013-05-14 Easwaran Raman era...@google.com
	* dumpfile.c (dump_loc): Print column number.

Index: gcc/dumpfile.c
===
--- gcc/dumpfile.c (revision 198852)
+++ gcc/dumpfile.c (working copy)
@@ -261,12 +261,13 @@ dump_loc (int dump_kind, FILE *dfile, source_locat
   if (dump_kind)
     {
       if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
-	fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
-		 LOCATION_LINE (loc));
+	fprintf (dfile, "\n%s:%d:%d: note: ", LOCATION_FILE (loc),
+		 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
       else if (current_function_decl)
-	fprintf (dfile, "\n%s:%d: note: ",
+	fprintf (dfile, "\n%s:%d:%d: note: ",
 		 DECL_SOURCE_FILE (current_function_decl),
-		 DECL_SOURCE_LINE (current_function_decl));
+		 DECL_SOURCE_LINE (current_function_decl),
+		 DECL_SOURCE_COLUMN (current_function_decl));
     }
 }
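With the patch, a dump note carries a column as well as a line, e.g. a line such as (file name and message invented for illustration):

  foo.c:12:9: note: loop vectorized

and a testcase could then pin the location down with a dg-message directive; the regexp here is hypothetical:

/* { dg-message "12:9: note: loop vectorized" "" { target *-*-* } 0 } */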
Re: Minimize downward code motion during reassociation
I want to resend this patch for consideration. I applied the patch to trunk and confirmed that it bootstraps and doesn't cause test regressions. Is this ok for trunk? Thanks, Easwaran On Fri, Dec 7, 2012 at 12:01 PM, Easwaran Raman era...@google.com wrote: It seems I need to reset the debug uses of a statement before moving the statement itself. The attached patch starts from the leaf to root of the tree to be reassociated and places them at the point where their dependences will be met after reassociation. This bootstraps and I am running the tests. Ok if there are no test failures? Thanks, Easwaran 2012-12-07 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(find_insert_point): New function. (insert_stmt_after): Likewise. (get_def_stmt): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. (build_and_add_sum): Assign UIDs for new statements. (linearize_expr): Likewise. (do_reassoc): Renumber gimple statement UIDs. On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman era...@google.com wrote: I am unable to figure out the right way to handle the debug statements. What I tried was to find debug statements that use the SSA name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE) and then moved them as well at the right place. Thus, if I have to move t1 = a + b down (after the definition of 'd'), I also moved all debug statements that use t1 after the new position of t1. That still caused use-before-def problems in ssa_verify. I noticed that the debug statements got modified behind the scenes causing these issues. Any hints on what is the right way to handle the debug statements would be very helpful. I think you cannot (and should not) move debug statements. Instead you have to invalidate them. Otherwise you'll introduce confusion as debug info cannot handle overlapping live ranges. But maybe Alex can clarify. Richard.
Re: Minimize downward code motion during reassociation
It seems I need to reset the debug uses of a statement before moving the statement itself. The attached patch starts from the leaf to root of the tree to be reassociated and places them at the point where their dependences will be met after reassociation. This bootstraps and I am running the tests. Ok if there are no test failures? Thanks, Easwaran 2012-12-07 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(find_insert_point): New function. (insert_stmt_after): Likewise. (get_def_stmt): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. (build_and_add_sum): Assign UIDs for new statements. (linearize_expr): Likewise. (do_reassoc): Renumber gimple statement UIDs. On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman era...@google.com wrote: I am unable to figure out the right way to handle the debug statements. What I tried was to find debug statements that use the SSA name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE) and then moved them as well at the right place. Thus, if I have to move t1 = a + b down (after the definition of 'd'), I also moved all debug statements that use t1 after the new position of t1. That still caused use-before-def problems in ssa_verify. I noticed that the debug statements got modified behind the scenes causing these issues. Any hints on what is the right way to handle the debug statements would be very helpful. I think you cannot (and should not) move debug statements. Instead you have to invalidate them. Otherwise you'll introduce confusion as debug info cannot handle overlapping live ranges. But maybe Alex can clarify. Richard. reassoc_revised.diff Description: Binary data
Re: Minimize downward code motion during reassociation
I am unable to figure out the right way to handle the debug statements. What I tried was to find debug statements that use the SSA name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE) and then moved them as well at the right place. Thus, if I have to move t1 = a + b down (after the definition of 'd'), I also moved all debug statements that use t1 after the new position of t1. That still caused use-before-def problems in ssa_verify. I noticed that the debug statements got modified behind the scenes causing these issues. Any hints on what is the right way to handle the debug statements would be very helpful. Thanks, Easwaran On Sun, Nov 4, 2012 at 9:10 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Oct 31, 2012 at 8:16 PM, Easwaran Raman era...@google.com wrote: On Wed, Oct 31, 2012 at 4:36 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman era...@google.com wrote: On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman era...@google.com wrote: On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassocation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. */ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe-op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). 
The reason I call the above multiple times is that gsi_move_before might get called between two calls to the above. For instance, after rewrite_expr_tree is called once, the following sequence of calls could happen: reassociate_bb - linearize_expr_tree - linearize_expr - gsi_move_before. So it is not sufficient to call renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to use renumber_gimple_stmt_uids_in_blocks instead of assign_uids_in_relevant_bbs? It's sufficient to call it once if you conservatively update UIDs at stmt move / insert time (assign the same UID as the stmt before/after). I was worried that that would make the code brittle, but looking at the places where a new statement is added or moved, I think it is fine. I have made the change in the new patch. Thanks. +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb
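The "conservatively update UIDs at stmt move / insert time" agreement reached above amounts to something like the following sketch (GCC-internal types of the period; simplified, and assuming the destination iterator points at a real, non-end statement):

/* Move STMT so that it executes just before the statement at *DEST,
   keeping intra-block UID comparisons conservatively correct without
   renumbering the whole function again.  */
static void
move_stmt_keeping_uid_order (gimple stmt, gimple_stmt_iterator *dest)
{
  gimple_stmt_iterator from = gsi_for_stmt (stmt);

  /* Reuse the UID of the statement we are placed before; ties are then
     resolved conservatively (or by a short walk) by the callers.  */
  gimple_set_uid (stmt, gimple_uid (gsi_stmt (*dest)));
  gsi_move_before (&from, dest);
}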
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
I'd like to get a small patch to tree reassociation ( http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01761.html ) in.

Thanks, Easwaran

On Mon, Oct 29, 2012 at 10:56 AM, Jakub Jelinek ja...@redhat.com wrote:

Status
======

I'd like to close the stage 1 phase of GCC 4.8 development on Monday, November 5th. If you still have patches for new features you'd like to see in GCC 4.8, please post them for review soon. Patches posted before the freeze, but reviewed shortly after the freeze, may still go in; further changes should be just bugfixes and documentation fixes.

Quality Data
============

Priority   #    Change from Last Report
--------  ---   -----------------------
P1         23             + 23
P2         77             +  8
P3         85             + 84
--------  ---   -----------------------
Total     185             +115

Previous Report
===============

http://gcc.gnu.org/ml/gcc/2012-03/msg00011.html

The next report will be sent by me again, announcing end of stage 1.
Re: Minimize downward code motion during reassociation
Ping. - Easwaran On Wed, Oct 31, 2012 at 12:16 PM, Easwaran Raman era...@google.com wrote: On Wed, Oct 31, 2012 at 4:36 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman era...@google.com wrote: On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman era...@google.com wrote: On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassocation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. */ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe-op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). The reason I call the above multiple times is that gsi_move_before might get called between two calls to the above. For instance, after rewrite_expr_tree is called once, the following sequence of calls could happen: reassociate_bb - linearize_expr_tree - linearize_expr - gsi_move_before. So it is not sufficient to call renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to use renumber_gimple_stmt_uids_in_blocks instead of assign_uids_in_relevant_bbs? It's sufficient to call it once if you conservatively update UIDs at stmt move / insert time (assign the same UID as the stmt before/after). I was worried that that would make the code brittle, but looking at the places where a new statement is added or moved, I think it is fine. 
I have made the change in the new patch. Thanks. +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i len; i++) +{ Likewise you call this for each call to rewrite_expr_tree, so it seems to me this is quadratic in the number of ops in the op vector. The call to ensure_ops_are_available inside rewrite_expr_tree is guarded by if (!moved) and I am setting moved = true there to ensure that ensure_ops_are_available inside is called once per reassociation of a expression tree. It would be a lot easier to understand this function if you call it once after rewrite_expr_tree finished. I think you meant before rewrite_expr_tree. One thing to note here is this function is called only when we first see
Re: Fix bugs introduced by switch-case profile propagation
On Fri, Oct 26, 2012 at 8:05 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka hubi...@ucw.cz wrote: Ping. On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman era...@google.com wrote: Hi, This patch fixes bugs introduced by my previous patch to propagate profiles during switch expansion. Bootstrap and profiledbootstrap successful on x86_64. Confirmed that it fixes the crashes reported in PR middle-end/54957. OK for trunk? - Easwaran

2012-10-17 Easwaran Raman era...@google.com
	PR target/54938
	PR middle-end/54957
	* optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note only if it doesn't already exist.
	* except.c (sjlj_emit_function_enter): Remove unused variable.
	* stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL.

Seems fine, but under what conditions do you get NULL here?

Wasn't sure if this is an OK for the patch or if I need to address anything else.

Actually I think you should make except.c set current_bb when expanding the switch instead.

In the current code, in sjlj_emit_dispatch_table (except.c), a sequence of instructions including the dispatch table jump is first generated and emitted to a BB before the first of the jump table targets. I think you are asking me to first emit the instructions before the indirect jump into a new bb, so that BLOCK_FOR_INSN (before_case) in stmt.c is non-NULL. This would need some refactoring inside except.c, but more importantly this BB won't have the correct control flow, right? So, while I can avoid the check for BB being NULL in get_outgoing_edge_probs, I still need to guard against default_edge being NULL, and default_prob will still be 0. Would it be ok if I remove the check for NULL inside get_outgoing_edge_probs and instead check it when I initialize BASE?

Thanks, Easwaran

OK with this change. Honza
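For reference, the adjustment the stmt.c code alludes to — after peeling default_prob off, "the probabilities of all other edges need to be adjusted so that it sums up to" the intended total — boils down to a proportional rescale. A sketch with the CFG API (edge, edge_iterator, FOR_EACH_EDGE), simplified from the patch above, where base is the pre-adjustment sum from get_outgoing_edge_probs:

/* Rescale the outgoing probabilities of BB so they sum (up to integer
   rounding) to NEW_TOTAL instead of BASE.  */
static void
rescale_out_edges (basic_block bb, int base, int new_total)
{
  edge e;
  edge_iterator ei;

  if (!bb || base == 0)
    return;
  FOR_EACH_EDGE (e, ei, bb->succs)
    e->probability = e->probability * new_total / base;
}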
Re: Minimize downward code motion during reassociation
On Wed, Oct 31, 2012 at 4:36 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman era...@google.com wrote: On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman era...@google.com wrote: On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassocation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. */ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe-op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). The reason I call the above multiple times is that gsi_move_before might get called between two calls to the above. For instance, after rewrite_expr_tree is called once, the following sequence of calls could happen: reassociate_bb - linearize_expr_tree - linearize_expr - gsi_move_before. So it is not sufficient to call renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to use renumber_gimple_stmt_uids_in_blocks instead of assign_uids_in_relevant_bbs? It's sufficient to call it once if you conservatively update UIDs at stmt move / insert time (assign the same UID as the stmt before/after). I was worried that that would make the code brittle, but looking at the places where a new statement is added or moved, I think it is fine. I have made the change in the new patch. Thanks. 
+/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i len; i++) +{ Likewise you call this for each call to rewrite_expr_tree, so it seems to me this is quadratic in the number of ops in the op vector. The call to ensure_ops_are_available inside rewrite_expr_tree is guarded by if (!moved) and I am setting moved = true there to ensure that ensure_ops_are_available inside is called once per reassociation of a expression tree. It would be a lot easier to understand this function if you call it once after rewrite_expr_tree finished. I think you meant before rewrite_expr_tree. One thing to note here is this function is called only when we first see a statement whose operand changes after reassociation. That's the reason the moving
Re: Fix bugs introduced by switch-case profile propagation
Hi, On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka hubi...@ucw.cz wrote: Ping. On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman era...@google.com wrote: Hi, This patch fixes bugs introduced by my previous patch to propagate profiles during switch expansion. Bootstrap and profiledbootstrap successful on x86_64. Confirmed that it fixes the crashes reported in PR middle-end/54957. OK for trunk? - Easwaran 2012-10-17 Easwaran Raman era...@google.com PR target/54938 PR middle-end/54957 * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note only if it doesn't already exist. * except.c (sjlj_emit_function_enter): Remove unused variable. * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. Seems fine, but under what conditions you get NULL here? Wasn't sure if this is an OK for the patch or if I need to address anything else. Thanks, Easwaran Honza (emit_case_dispatch_table): Handle the case where STMT_BB is NULL. (expand_sjlj_dispatch_table): Pass BB containing before_case to emit_case_dispatch_table. Index: gcc/optabs.c === --- gcc/optabs.c (revision 192488) +++ gcc/optabs.c (working copy) @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m profile_status != PROFILE_ABSENT insn JUMP_P (insn) - any_condjump_p (insn)) -{ - gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0)); - add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); -} + any_condjump_p (insn) + !find_reg_note (insn, REG_BR_PROB, 0)) +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); } /* Generate code to compare X with Y so that the condition codes are Index: gcc/except.c === --- gcc/except.c (revision 192488) +++ gcc/except.c (working copy) @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label) if (dispatch_label) { #ifdef DONT_USE_BUILTIN_SETJMP - rtx x, last; + rtx x; x = emit_library_call_value (setjmp_libfunc, NULL_RTX, LCT_RETURNS_TWICE, TYPE_MODE (integer_type_node), 1, plus_constant (Pmode, XEXP (fc, 0), Index: gcc/stmt.c === --- gcc/stmt.c (revision 192488) +++ gcc/stmt.c (working copy) @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb) edge e; edge_iterator ei; int prob_sum = 0; + if (!bb) +return 0; FOR_EACH_EDGE(e, ei, bb-succs) prob_sum += e-probability; return prob_sum; @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in rtx fallback_label = label_rtx (case_list-code_label); rtx table_label = gen_label_rtx (); bool has_gaps = false; - edge default_edge = EDGE_SUCC(stmt_bb, 0); - int default_prob = default_edge-probability; + edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL; + int default_prob = default_edge ? default_edge-probability : 0; int base = get_outgoing_edge_probs (stmt_bb); bool try_with_tablejump = false; @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in default_prob = 0; } - default_edge-probability = default_prob; + if (default_edge) +default_edge-probability = default_prob; /* We have altered the probability of the default edge. So the probabilities of all other edges need to be adjusted so that it sums up to @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index, emit_case_dispatch_table (index_expr, index_type, case_list, default_label, - minval, maxval, range, NULL); + minval, maxval, range, +BLOCK_FOR_INSN (before_case)); emit_label (default_label); free_alloc_pool (case_node_pool); }
Re: Fix bugs introduced by switch-case profile propagation
On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka hubi...@ucw.cz wrote: Ping. On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman era...@google.com wrote: Hi, This patch fixes bugs introduced by my previous patch to propagate profiles during switch expansion. Bootstrap and profiledbootstrap successful on x86_64. Confirmed that it fixes the crashes reported in PR middle-end/54957. OK for trunk? - Easwaran 2012-10-17 Easwaran Raman era...@google.com PR target/54938 PR middle-end/54957 * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note only if it doesn't already exist. * except.c (sjlj_emit_function_enter): Remove unused variable. * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. Seems fine, but under what conditions you get NULL here? When expand_sjlj_dispatch_table calls emit_case_dispatch_table, stmt_bb is NULL. - Easwaran Honza (emit_case_dispatch_table): Handle the case where STMT_BB is NULL. (expand_sjlj_dispatch_table): Pass BB containing before_case to emit_case_dispatch_table. Index: gcc/optabs.c === --- gcc/optabs.c (revision 192488) +++ gcc/optabs.c (working copy) @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m profile_status != PROFILE_ABSENT insn JUMP_P (insn) - any_condjump_p (insn)) -{ - gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0)); - add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); -} + any_condjump_p (insn) + !find_reg_note (insn, REG_BR_PROB, 0)) +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); } /* Generate code to compare X with Y so that the condition codes are Index: gcc/except.c === --- gcc/except.c (revision 192488) +++ gcc/except.c (working copy) @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label) if (dispatch_label) { #ifdef DONT_USE_BUILTIN_SETJMP - rtx x, last; + rtx x; x = emit_library_call_value (setjmp_libfunc, NULL_RTX, LCT_RETURNS_TWICE, TYPE_MODE (integer_type_node), 1, plus_constant (Pmode, XEXP (fc, 0), Index: gcc/stmt.c === --- gcc/stmt.c (revision 192488) +++ gcc/stmt.c (working copy) @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb) edge e; edge_iterator ei; int prob_sum = 0; + if (!bb) +return 0; FOR_EACH_EDGE(e, ei, bb-succs) prob_sum += e-probability; return prob_sum; @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in rtx fallback_label = label_rtx (case_list-code_label); rtx table_label = gen_label_rtx (); bool has_gaps = false; - edge default_edge = EDGE_SUCC(stmt_bb, 0); - int default_prob = default_edge-probability; + edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL; + int default_prob = default_edge ? default_edge-probability : 0; int base = get_outgoing_edge_probs (stmt_bb); bool try_with_tablejump = false; @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in default_prob = 0; } - default_edge-probability = default_prob; + if (default_edge) +default_edge-probability = default_prob; /* We have altered the probability of the default edge. So the probabilities of all other edges need to be adjusted so that it sums up to @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index, emit_case_dispatch_table (index_expr, index_type, case_list, default_label, - minval, maxval, range, NULL); + minval, maxval, range, +BLOCK_FOR_INSN (before_case)); emit_label (default_label); free_alloc_pool (case_node_pool); }
Re: Minimize downward code motion during reassociation
On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman era...@google.com wrote: On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassocation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. */ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe-op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). The reason I call the above multiple times is that gsi_move_before might get called between two calls to the above. For instance, after rewrite_expr_tree is called once, the following sequence of calls could happen: reassociate_bb - linearize_expr_tree - linearize_expr - gsi_move_before. So it is not sufficient to call renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to use renumber_gimple_stmt_uids_in_blocks instead of assign_uids_in_relevant_bbs? It's sufficient to call it once if you conservatively update UIDs at stmt move / insert time (assign the same UID as the stmt before/after). I was worried that that would make the code brittle, but looking at the places where a new statement is added or moved, I think it is fine. I have made the change in the new patch. +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. 
*/ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i len; i++) +{ Likewise you call this for each call to rewrite_expr_tree, so it seems to me this is quadratic in the number of ops in the op vector. The call to ensure_ops_are_available inside rewrite_expr_tree is guarded by if (!moved) and I am setting moved = true there to ensure that ensure_ops_are_available inside is called once per reassociation of a expression tree. It would be a lot easier to understand this function if you call it once after rewrite_expr_tree finished. I think you meant before rewrite_expr_tree. One thing to note here is this function is called only when we first see a statement whose operand changes after reassociation. That's the reason the moving of statements happen inside rewrite_expr_tree. Why make this all so complicated? It seems to me that we should fixup stmt order only after the whole ops vector has been materialized
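The UID bookkeeping agreed on above boils down to: number every statement once, then give a moved or inserted statement the UID of its new neighbor, so UIDs stay monotonically non-decreasing within the block and "uid (a) < uid (b)" still safely means A is not after B. A standalone model of that invariant (illustrative structures, not the GIMPLE API):

#include <cassert>
#include <cstddef>
#include <vector>

struct stmt { int uid; };

/* Number all statements of a block once, like renumber_gimple_stmt_uids.  */
static void
renumber (std::vector<stmt> &bb)
{
  for (size_t i = 0; i < bb.size (); i++)
    bb[i].uid = (int) i;
}

/* Move bb[from] to just before bb[to] (to < from), copying the UID of the
   statement it now precedes -- the conservative update suggested for
   gsi_move_before.  */
static void
move_before (std::vector<stmt> &bb, size_t from, size_t to)
{
  stmt s = bb[from];
  s.uid = bb[to].uid;   /* tie with the new neighbor */
  bb.erase (bb.begin () + from);
  bb.insert (bb.begin () + to, s);
}

int
main ()
{
  std::vector<stmt> bb (5);
  renumber (bb);
  move_before (bb, 4, 1);
  for (size_t i = 0; i + 1 < bb.size (); i++)
    assert (bb[i].uid <= bb[i + 1].uid);   /* ordering invariant survives */
  return 0;
}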
Re: Minimize downward code motion during reassociation
On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassocation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. */ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe-op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). The reason I call the above multiple times is that gsi_move_before might get called between two calls to the above. For instance, after rewrite_expr_tree is called once, the following sequence of calls could happen: reassociate_bb - linearize_expr_tree - linearize_expr - gsi_move_before. So it is not sufficient to call renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to use renumber_gimple_stmt_uids_in_blocks instead of assign_uids_in_relevant_bbs? +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i len; i++) +{ Likewise you call this for each call to rewrite_expr_tree, so it seems to me this is quadratic in the number of ops in the op vector. 
The call to ensure_ops_are_available inside rewrite_expr_tree is guarded by if (!moved) and I am setting moved = true there to ensure that ensure_ops_are_available inside is called once per reassociation of an expression tree. Why make this all so complicated? It seems to me that we should fixup stmt order only after the whole ops vector has been materialized. + operand_entry_t oe = VEC_index (operand_entry_t, ops, i); + gimple def_stmt; + basic_block def_bb; + /* Ignore constants and operands with default definitions. */ + if (TREE_CODE (oe->op) != SSA_NAME + || SSA_NAME_IS_DEFAULT_DEF (oe->op)) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe->op); + def_bb = gimple_bb (def_stmt); + if (def_bb != insert_bb + && !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb)) +{ + insert_bb = def_bb; + insert_stmt = def_stmt; +} + else if (def_bb == insert_bb + && gimple_uid (insert_stmt) < gimple_uid (def_stmt)) +insert_stmt = def_stmt; +} + if (insert_stmt == stmt) +return; + gsistmt = gsi_for_stmt (stmt); + /* If GSI_STMT is a phi node, then do not insert just
Re: Fix bugs introduced by switch-case profile propagation
Ping. On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman era...@google.com wrote: Hi, This patch fixes bugs introduced by my previous patch to propagate profiles during switch expansion. Bootstrap and profiledbootstrap successful on x86_64. Confirmed that it fixes the crashes reported in PR middle-end/54957. OK for trunk? - Easwaran 2012-10-17 Easwaran Raman era...@google.com PR target/54938 PR middle-end/54957 * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note only if it doesn't already exist. * except.c (sjlj_emit_function_enter): Remove unused variable. * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. (emit_case_dispatch_table): Handle the case where STMT_BB is NULL. (expand_sjlj_dispatch_table): Pass BB containing before_case to emit_case_dispatch_table. Index: gcc/optabs.c === --- gcc/optabs.c (revision 192488) +++ gcc/optabs.c (working copy) @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m profile_status != PROFILE_ABSENT insn JUMP_P (insn) - any_condjump_p (insn)) -{ - gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0)); - add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); -} + any_condjump_p (insn) + !find_reg_note (insn, REG_BR_PROB, 0)) +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); } /* Generate code to compare X with Y so that the condition codes are Index: gcc/except.c === --- gcc/except.c (revision 192488) +++ gcc/except.c (working copy) @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label) if (dispatch_label) { #ifdef DONT_USE_BUILTIN_SETJMP - rtx x, last; + rtx x; x = emit_library_call_value (setjmp_libfunc, NULL_RTX, LCT_RETURNS_TWICE, TYPE_MODE (integer_type_node), 1, plus_constant (Pmode, XEXP (fc, 0), Index: gcc/stmt.c === --- gcc/stmt.c (revision 192488) +++ gcc/stmt.c (working copy) @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb) edge e; edge_iterator ei; int prob_sum = 0; + if (!bb) +return 0; FOR_EACH_EDGE(e, ei, bb-succs) prob_sum += e-probability; return prob_sum; @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in rtx fallback_label = label_rtx (case_list-code_label); rtx table_label = gen_label_rtx (); bool has_gaps = false; - edge default_edge = EDGE_SUCC(stmt_bb, 0); - int default_prob = default_edge-probability; + edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL; + int default_prob = default_edge ? default_edge-probability : 0; int base = get_outgoing_edge_probs (stmt_bb); bool try_with_tablejump = false; @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in default_prob = 0; } - default_edge-probability = default_prob; + if (default_edge) +default_edge-probability = default_prob; /* We have altered the probability of the default edge. So the probabilities of all other edges need to be adjusted so that it sums up to @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index, emit_case_dispatch_table (index_expr, index_type, case_list, default_label, - minval, maxval, range, NULL); + minval, maxval, range, +BLOCK_FOR_INSN (before_case)); emit_label (default_label); free_alloc_pool (case_node_pool); }
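The optabs.c change above replaces an assert with an idempotence check: when the expander reaches the same jump twice, the REG_BR_PROB note must simply not be added again rather than tripping gcc_assert. A tiny standalone model of the fixed behavior (the map and names stand in for RTL notes and are illustrative only):

#include <cassert>
#include <map>

enum reg_note_kind { REG_BR_PROB };
typedef std::map<reg_note_kind, int> insn_notes;

/* Attach the branch-probability note only when absent, instead of
   asserting absence.  */
static void
add_br_prob_once (insn_notes &notes, int prob)
{
  if (notes.find (REG_BR_PROB) == notes.end ())
    notes[REG_BR_PROB] = prob;
}

int
main ()
{
  insn_notes n;
  add_br_prob_once (n, 9000);
  add_br_prob_once (n, 100);    /* second attempt is silently ignored */
  assert (n[REG_BR_PROB] == 9000);
  return 0;
}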
Re: Minimize downward code motion during reassociation
On Fri, Oct 19, 2012 at 5:13 PM, Xinliang David Li davi...@google.com wrote: On Thu, Oct 18, 2012 at 3:36 PM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassocation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. */ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe-op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} + +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) format issue? +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i len; i++) +{ + operand_entry_t oe = VEC_index (operand_entry_t, ops, i); + gimple def_stmt; + basic_block def_bb; + /* Ignore constants and operands with default definitons. */ + if (TREE_CODE (oe-op) != SSA_NAME + || SSA_NAME_IS_DEFAULT_DEF (oe-op)) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + def_bb = gimple_bb (def_stmt); + if (def_bb != insert_bb + !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb)) +{ + insert_bb = def_bb; + insert_stmt = def_stmt; +} + else if (def_bb == insert_bb +gimple_uid (insert_stmt) gimple_uid (def_stmt)) +insert_stmt = def_stmt; +} + if (insert_stmt == stmt) +return; + gsistmt = gsi_for_stmt (stmt); + /* If GSI_STMT is a phi node, then do not insert just after that statement. GSI_STMT -- INSERT_STMT? Ok. + Instead, find the first non-label gimple statement in BB and insert before + that. 
*/ + if (gimple_code (insert_stmt) == GIMPLE_PHI) +{ + gsi_insert = gsi_after_labels (insert_bb); + gsi_move_before (gsistmt, gsi_insert); +} + /* Statements marked for throw can not be in the middle of a basic block. So + we can not insert a statement (not marked for throw) immediately after. */ + else if (lookup_stmt_eh_lp (insert_stmt) > 0 + && stmt_can_throw_internal (insert_stmt)) +{ + edge e, succ_edge = NULL; + edge_iterator ei; + + /* There should be exactly one normal edge out of the basic block. */ + FOR_EACH_EDGE (e, ei, insert_bb->succs) +{ + if (!(e->flags & EDGE_COMPLEX)) +{ + gcc_assert (succ_edge == NULL); + succ_edge = e; +} +} + /* Insert STMT at the beginning of the successor basic block. */ + insert_bb = succ_edge->dest; + gsi_insert = gsi_after_labels (insert_bb
Minimize downward code motion during reassociation
Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassocation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. */ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe-op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} + +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i len; i++) +{ + operand_entry_t oe = VEC_index (operand_entry_t, ops, i); + gimple def_stmt; + basic_block def_bb; + /* Ignore constants and operands with default definitons. */ + if (TREE_CODE (oe-op) != SSA_NAME + || SSA_NAME_IS_DEFAULT_DEF (oe-op)) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + def_bb = gimple_bb (def_stmt); + if (def_bb != insert_bb + !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb)) +{ + insert_bb = def_bb; + insert_stmt = def_stmt; +} + else if (def_bb == insert_bb +gimple_uid (insert_stmt) gimple_uid (def_stmt)) +insert_stmt = def_stmt; +} + if (insert_stmt == stmt) +return; + gsistmt = gsi_for_stmt (stmt); + /* If GSI_STMT is a phi node, then do not insert just after that statement. + Instead, find the first non-label gimple statement in BB and insert before + that. */ + if (gimple_code (insert_stmt) == GIMPLE_PHI) +{ + gsi_insert = gsi_after_labels (insert_bb); + gsi_move_before (gsistmt, gsi_insert); +} + /* Statements marked for throw can not be in the middle of a basic block. 
So + we can not insert a statement (not marked for throw) immediately after. */ + else if (lookup_stmt_eh_lp (insert_stmt) 0 +stmt_can_throw_internal (insert_stmt)) +{ + edge e, succ_edge = NULL; + edge_iterator ei; + + /* There should be exactly one normal edge out of the basic block. */ + FOR_EACH_EDGE (e, ei, insert_bb-succs) +{ + if (!(e-flags EDGE_COMPLEX)) +{ + gcc_assert (succ_edge == NULL); + succ_edge = e; +} +} + /* Insert STMT at the beginning of the successor basic block. */ + insert_bb = succ_edge-dest; + gsi_insert = gsi_after_labels (insert_bb); + gsi_move_before (gsistmt, gsi_insert); +} + else +{ + gsi_insert = gsi_for_stmt (insert_stmt); + gsi_move_after (gsistmt, gsi_insert); +} + +} + /* Recursively rewrite our linearized statements so that the operators match those in OPS[OPINDEX], putting the computation in rank order. */ @@ -2262,11 +2384,6 @@ rewrite_expr_tree
Fix bugs introduced by switch-case profile propagation
Hi, This patch fixes bugs introduced by my previous patch to propagate profiles during switch expansion. Bootstrap and profiledbootstrap successful on x86_64. Confirmed that it fixes the crashes reported in PR middle-end/54957. OK for trunk? - Easwaran 2012-10-17 Easwaran Raman era...@google.com PR target/54938 PR middle-end/54957 * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note only if it doesn't already exist. * except.c (sjlj_emit_function_enter): Remove unused variable. * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. (emit_case_dispatch_table): Handle the case where STMT_BB is NULL. (expand_sjlj_dispatch_table): Pass BB containing before_case to emit_case_dispatch_table. Index: gcc/optabs.c === --- gcc/optabs.c (revision 192488) +++ gcc/optabs.c (working copy) @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m profile_status != PROFILE_ABSENT insn JUMP_P (insn) - any_condjump_p (insn)) -{ - gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0)); - add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); -} + any_condjump_p (insn) + !find_reg_note (insn, REG_BR_PROB, 0)) +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); } /* Generate code to compare X with Y so that the condition codes are Index: gcc/except.c === --- gcc/except.c (revision 192488) +++ gcc/except.c (working copy) @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label) if (dispatch_label) { #ifdef DONT_USE_BUILTIN_SETJMP - rtx x, last; + rtx x; x = emit_library_call_value (setjmp_libfunc, NULL_RTX, LCT_RETURNS_TWICE, TYPE_MODE (integer_type_node), 1, plus_constant (Pmode, XEXP (fc, 0), Index: gcc/stmt.c === --- gcc/stmt.c (revision 192488) +++ gcc/stmt.c (working copy) @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb) edge e; edge_iterator ei; int prob_sum = 0; + if (!bb) +return 0; FOR_EACH_EDGE(e, ei, bb-succs) prob_sum += e-probability; return prob_sum; @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in rtx fallback_label = label_rtx (case_list-code_label); rtx table_label = gen_label_rtx (); bool has_gaps = false; - edge default_edge = EDGE_SUCC(stmt_bb, 0); - int default_prob = default_edge-probability; + edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL; + int default_prob = default_edge ? default_edge-probability : 0; int base = get_outgoing_edge_probs (stmt_bb); bool try_with_tablejump = false; @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in default_prob = 0; } - default_edge-probability = default_prob; + if (default_edge) +default_edge-probability = default_prob; /* We have altered the probability of the default edge. So the probabilities of all other edges need to be adjusted so that it sums up to @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index, emit_case_dispatch_table (index_expr, index_type, case_list, default_label, - minval, maxval, range, NULL); + minval, maxval, range, +BLOCK_FOR_INSN (before_case)); emit_label (default_label); free_alloc_pool (case_node_pool); }
Re: Propagate profile counts during switch expansion
On Sun, Oct 14, 2012 at 8:09 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, Index: optabs.c === --- optabs.c(revision 191879) +++ optabs.c(working copy) @@ -4249,7 +4249,7 @@ prepare_operand (enum insn_code icode, rtx x, int we can do the branch. */ static void -emit_cmp_and_jump_insn_1 (rtx test, enum machine_mode mode, rtx label) +emit_cmp_and_jump_insn_1 (rtx test, enum machine_mode mode, rtx label, int prob) { enum machine_mode optab_mode; enum mode_class mclass; @@ -4261,7 +4261,16 @@ static void gcc_assert (icode != CODE_FOR_nothing); gcc_assert (insn_operand_matches (icode, 0, test)); - emit_jump_insn (GEN_FCN (icode) (test, XEXP (test, 0), XEXP (test, 1), label)); + rtx insn = emit_insn ( + GEN_FCN (icode) (test, XEXP (test, 0), XEXP (test, 1), label)); I think we did not change to style of mixing declaration and code yet. So please put declaration ahead. Ok. I think you want to keep emit_jump_insn. Also do nothing when profile_status == PROFILE_ABSENT. Why should this be dependent on profile_status? The PROB passed could also come from static prediction right. Index: cfgbuild.c === --- cfgbuild.c (revision 191879) +++ cfgbuild.c (working copy) @@ -559,8 +559,11 @@ compute_outgoing_frequencies (basic_block b) f-count = b-count - e-count; return; } + else +{ + guess_outgoing_edge_probabilities (b); +} Add comment here that we rely on multiway BBs having sane probabilities already. You still want to do guessing when the edges out are EH. Those also can be many. I think this should work: - if (single_succ_p (b)) + else if (single_succ_p (b)) { e = single_succ_edge (b); e-probability = REG_BR_PROB_BASE; e-count = b-count; return; } - guess_outgoing_edge_probabilities (b); + else +{ + /* We rely on BBs with more than two successors to have sane probabilities + and do not guess them here. For BBs terminated by switch statements + expanded to jump-table jump, we have done the right thing during + expansion. For EH edges, we still guess the probabilities here. */ + bool complex_edge = false; + FOR_EACH_EDGE (e, ei, b-succs) +if (e-flags EDGE_COMPLEX) + { +complex_edge = true; +break; + } + if (complex_edge) +guess_outgoing_edge_probabilities (b); +} + Index: expr.h === --- expr.h (revision 191879) +++ expr.h (working copy) @@ -190,7 +190,7 @@ extern int have_sub2_insn (rtx, rtx); /* Emit a pair of rtl insns to compare two rtx's and to jump to a label if the comparison is true. */ extern void emit_cmp_and_jump_insns (rtx, rtx, enum rtx_code, rtx, -enum machine_mode, int, rtx); +enum machine_mode, int, rtx, int prob=-1); Hmm, probably first appreance of this C++ construct. I suppose it is OK. http://gcc.gnu.org/codingconventions.html#Default says it is ok for POD values. +static inline void +reset_out_edges_aux (basic_block bb) +{ + edge e; + edge_iterator ei; + FOR_EACH_EDGE(e, ei, bb-succs) +e-aux = (void *)0; +} +static inline void +compute_cases_per_edge (gimple stmt) +{ + basic_block bb = gimple_bb (stmt); + reset_out_edges_aux (bb); + int ncases = gimple_switch_num_labels (stmt); + for (int i = ncases - 1; i = 1; --i) +{ + tree elt = gimple_switch_label (stmt, i); + tree lab = CASE_LABEL (elt); + basic_block case_bb = label_to_block_fn (cfun, lab); + edge case_edge = find_edge (bb, case_bb); + case_edge-aux = (void *)((long)(case_edge-aux) + 1); +} +} Comments and newlines per coding standard. Ok. With the these changes, the patch is OK Thanks, Easwaran Thanks, Honza
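compute_cases_per_edge above relies on the fact that several case labels of one switch can funnel into the same successor edge; the patch accumulates a per-edge label count in e->aux. A standalone model of that counting (a map stands in for the aux pointer; the data is made up):

#include <cstdio>
#include <map>

int
main ()
{
  /* case index -> target basic block; blocks 1 and 3 each receive
     several case labels.  */
  int case_target[] = { 1, 1, 2, 3, 3, 3 };
  std::map<int, int> cases_per_edge;
  for (int t : case_target)
    cases_per_edge[t]++;          /* what the patch keeps in e->aux */
  for (std::map<int, int>::iterator p = cases_per_edge.begin ();
       p != cases_per_edge.end (); ++p)
    std::printf ("edge to bb %d carries %d case labels\n",
                 p->first, p->second);
  return 0;
}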
Re: Propagate profile counts during switch expansion
Ping. On Mon, Oct 8, 2012 at 5:46 PM, Easwaran Raman era...@google.com wrote: I have attached a revised patch. The updated ChangeLog is given below and I have responded to your comments inline: 2012-10-08 Easwaran Raman era...@google.com * optabs.c (emit_cmp_and_jump_insn_1): Add a new parameter to specificy the probability of taking the jump. (emit_cmp_and_jump_insns): Likewise. (expand_compare_and_swap_loop): Make the jump predicted not taken. * dojump.c (do_compare_rtx_and_jump): Remove the code attaching REG_BR_PROB note and pass probability to emit_cmp_and_jump_insns. * cfgbuild.c (compute_outgoing_frequencies): Do not guess outgoing probabilities for branches with more than two successors. * expr.c (emit_block_move_via_loop): Predict the loop backedge loop to be highly taken. (try_casesi): Pass the probability of jumping to the default label. (try_tablejump): Likewise. (do_tablejump): Likewise. * expr.h (try_tablejump): Add a new parameter. (try_casesi): Likewise. (emit_cmp_and_jump_insns): Add probability as default parameter with a default value of -1. * except.c (sjlj_emit_function_enter): Pass probability to emit_cmp_and_jump_insns. * stmt.c (case_node): Add new fields PROB and SUBTREE_PROB. (do_jump_if_equal): Pass probability for REG_BR_PROB note. (add_case_node): Pass estimated probability of jumping to the case label. (emit_case_decision_tree): Pass default_prob to emit_case_nodes. (get_outgoing_edge_probs): New function. (conditional_probability): Likewise. (reset_out_edges_aux): Likewise. (compute_cases_per_edge): Likewise. (emit_case_dispatch_table): Update probabilities of edges coming out of the switch statement. (expand_case): Compute and propagate default edge probability to emit_case_dispatch_table. (expand_sjlj_dispatch_table): Update calls to add_case_node and emit_case_dispatch_table. (balance_case_nodes): Update subtree_prob values. (emit_case_nodes): Compute edge probabilities and add pass them to emit_cmp_and_jump_insns. gcc/testsuite/ChangeLog: 2012-10-02 Easwaran Raman era...@google.com * gcc.dg/tree-prof/switch-case-1.c: New test case. * gcc.dg/tree-prof/switch-case-2.c: New test case. On Thu, Oct 4, 2012 at 6:19 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, This patch propagates the profile counts during RTL expansion. In many cases, there is no way to determine the exact count of an edge generated during the expansion. So this patch uses some simple heuristics to estimate the edge counts but ensures that the counts of the basic blocks corresponding to the cases are (nearly the) same as at the gimple level. Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for trunk? Index: gcc/expr.c === --- gcc/expr.c (revision 191879) +++ gcc/expr.c (working copy) @@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum machin #ifdef PUSH_ROUNDING static void emit_single_push_insn (enum machine_mode, rtx, tree); #endif -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int); static rtx const_vector_from_tree (tree); static void write_complex_part (rtx, rtx, bool); @@ -10894,7 +10894,7 @@ try_casesi (tree index_type, tree index_expr, tree static void do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, - rtx default_label) + rtx default_label, int default_probability) Please document default_probability. Done. { rtx temp, vector; @@ -10910,9 +10910,17 @@ do_tablejump (rtx index, enum machine_mode mode, r the maximum value of the range. 
*/ if (default_label) -emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, - default_label); +{ + emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, + default_label); + if (default_probability != -1) +{ + rtx jump_insn = get_last_insn(); + add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (default_probability)); +} +} dojump already does this kind of logic, but it is bit more cureful: emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, if_true_label); if (prob != -1 profile_status != PROFILE_ABSENT) { for (last = NEXT_INSN (last); last NEXT_INSN (last); last = NEXT_INSN (last)) if (JUMP_P (last)) break; if (last JUMP_P (last) ! NEXT_INSN (last) any_condjump_p (last)) { gcc_assert (!find_reg_note (last, REG_BR_PROB, 0)); add_reg_note (last, REG_BR_PROB, GEN_INT (prob)); } } What about making emit_cmp_and_jump_insns taking the probability argument and moving the code above inside
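The dojump.c idiom quoted in the review -- emit the comparison, then walk the freshly emitted insns to find the conditional jump and attach the REG_BR_PROB note to it -- is exactly the logic Honza asks to fold into emit_cmp_and_jump_insns itself. A standalone model of that walk (the insn type is illustrative, not RTL):

#include <cassert>
#include <cstddef>
#include <vector>

struct insn { bool is_jump; int br_prob; };   /* -1 means "no note" */

/* Scan the insns emitted since FIRST_NEW, find the jump, and attach the
   probability note once -- the pattern dojump.c uses after expanding a
   comparison.  */
static void
tag_condjump (std::vector<insn> &seq, size_t first_new, int prob)
{
  for (size_t i = first_new; i < seq.size (); i++)
    if (seq[i].is_jump)
      {
        assert (seq[i].br_prob == -1);   /* no note attached yet */
        seq[i].br_prob = prob;
        return;
      }
}

int
main ()
{
  std::vector<insn> seq = { { false, -1 }, { false, -1 }, { true, -1 } };
  tag_condjump (seq, 0, 9900);
  assert (seq[2].br_prob == 9900);
  return 0;
}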
Re: Move statements upwards after reassociation
On Fri, Oct 12, 2012 at 1:45 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 12, 2012 at 3:09 AM, Easwaran Raman era...@google.com wrote: Thanks for the comments. As David wrote, the intent of the patch is not to do a general purpose scheduling, but to compensate for the possible live range lengthening introduced by reassociation. On Thu, Oct 11, 2012 at 6:16 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Oct 11, 2012 at 3:52 AM, Easwaran Raman era...@google.com wrote: +/* Move STMT up within its BB until it can not be moved any further. */ + +static void move_stmt_upwards (gimple stmt) +{ + gimple_stmt_iterator gsi, gsistmt; + tree rhs1, rhs2; + gimple rhs1def = NULL, rhs2def = NULL; + rhs1 = gimple_assign_rhs1 (stmt); + rhs2 = gimple_assign_rhs2 (stmt); + gcc_assert (rhs1); Please no such senseless asserts. The following line will segfault anyway if rhs1 is NULL and with a debugger an assert doesn't add any useful information. Ok. + if (TREE_CODE (rhs1) == SSA_NAME) +rhs1def = SSA_NAME_DEF_STMT (rhs1); + else if (TREE_CODE (rhs1) != REAL_CST +TREE_CODE (rhs1) != INTEGER_CST) +return; + if (rhs2) You may not access gimple_assign_rhs2 if it is not there. So you have to check whether you have an unary, binary or ternary (yes) operation. gimple_assign_rhs2 returns NULL_TREE if it the RHS of an assignment has less than two operands. Regarding the check for ternary operation, I believe it is not necessary. A statement is considered for reassociation only if the RHS class is GIMPLE_BINARY_RHS. Subsequently, for rhs1 and rhs2, it checks if the def statements also have the same code and so it seems to me that a statement with a ternary operator in the RHS will never be considered in rewrite_expr_tree. +{ + if (TREE_CODE (rhs2) == SSA_NAME) +rhs2def = SSA_NAME_DEF_STMT (rhs2); + else if (TREE_CODE (rhs1) != REAL_CST +TREE_CODE (rhs1) != INTEGER_CST) +return; +} + gsi = gsi_for_stmt (stmt); + gsistmt = gsi; + gsi_prev (gsi); + for (; !gsi_end_p (gsi); gsi_prev (gsi)) This doesn't make much sense. You can move a stmt to the nearest common post-dominator. Assuming you only want to handle placing it after rhs1def or after rhs2def(?) you don't need any loop, just two dominator queries and an insertion after one of the definition stmts. Within a BB isn't that still O(size of BB)? Please document the fact that both stmts are in the same BB. Ok. And no, it isn't, it is O (size of BB ^ 2). You don't need a loop. operand rank should reflect the dominance relation inside the BB. The rank of phi definitions would mess this up. If that doesn't work simply assign UIDs to the stmts first. Ok. But this code should consider BBs. For reassociation to look across BBs, the code should look something like this: L1 : a_2 = a_1 + 10 jc L3 L2: a_3 = a_2 + 20 - L1 should dominate L2 (otherwise there will be a phi node at L2 and the reassociation of a_3 will not consider the definition of a_2) - There are no other uses of a_2 other than the one in L3. After reassociation, the stmt defining a_2 would be moved to L2. In that case, the downward code motion of a_2 = a_1 + 10 to L2 is beneficial (one less instruction if the branch is taken). It is not obvious to me that moving it to L1 (or whereever a_1 is defined) is beneficial. In this case it doesn't matter whether a1 lives through a3 or if a2 does. But moving the stmt is not necessary, so why not simply avoid it. I used that example to show that the current downward motion has a useful side effect and this patch preserves it. 
Yes, in this example the downward motion can be avoided but in general it may not be possible. I agree with you that there is unnecessary code motion in many cases. You cannot undo it with your patch anyway. And I don't see why more optimal placement cannot be done during rewrite_expr_tree itself. I started with that idea, but my current approach looks simpler. Simpler, but it's a hack. So, the only place we move stmts in rewrite_expr_tree is here: if (!moved) { gimple_stmt_iterator gsinow, gsirhs1; gimple stmt1 = stmt, stmt2; unsigned int count; gsinow = gsi_for_stmt (stmt); count = VEC_length (operand_entry_t, ops) - opindex - 2; while (count-- != 0) { stmt2 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt1)); gsirhs1 = gsi_for_stmt (stmt2); gsi_move_before (gsirhs1, gsinow); that moving is done unconditionally, even if stmt2 already dominates stmt1. Why not improve that instead? stmt2 does dominate stmt1. The issue is it unconditionally moves stmt2 downwards even if the statement that defines ops[op_index+1] (the new RHS of stmt2 after reassociation) already dominates stmt2
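The live-range concern driving this thread is easiest to see in source form. A compilable illustration (not taken from the thread): after reassociation, the adds that combine t1..t4 can all be sunk to the bottom of the loop body, keeping all four temporaries live at once -- exactly the spill pressure the patch tries to avoid by re-placing each statement right after its operands' definitions:

/* Four products defined early, combined late.  If reassociation sinks the
   whole add chain to the bottom, t1..t4 stay simultaneously live across
   the body of the loop.  */
double
dot4 (const double *a, const double *b, int n)
{
  double s = 0.0;
  for (int i = 0; i + 4 <= n; i += 4)
    {
      double t1 = a[i] * b[i];
      double t2 = a[i + 1] * b[i + 1];
      double t3 = a[i + 2] * b[i + 2];
      double t4 = a[i + 3] * b[i + 3];
      /* ... imagine register-hungry code here ... */
      s += ((t1 + t2) + t3) + t4;
    }
  return s;
}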
[google] Move delete with size to its own file (issue6655052)
This patch moves the two argument delete operator into its own file. When a program provides its own definition of operator delete (void *), but not operator delete (void *, size_t), we could end up linking two files that define _ZdlPv resulting in a linker error. Bootstraps with no test regressions on a x86_64 machine running linux. Ok for google/4_7 and google/main branches? Google ref b/6982747 2012-10-11 Easwaran Raman era...@google.com * libsupc++/Makefile.am: Add del_opsz.cc to sources. * libsupc++/Makefile.in: Regenerated. * libsupc++/del_opsz.cc: New file. * libsupc++/del_op.cc(operator delete(void*,std::size_t)): Remove. Index: libstdc++-v3/libsupc++/Makefile.in === --- libstdc++-v3/libsupc++/Makefile.in (revision 192373) +++ libstdc++-v3/libsupc++/Makefile.in (working copy) @@ -92,8 +92,8 @@ LTLIBRARIES = $(noinst_LTLIBRARIES) $(toolexeclib_ libsupc___la_LIBADD = am__objects_1 = array_type_info.lo atexit_arm.lo bad_alloc.lo \ bad_cast.lo bad_typeid.lo class_type_info.lo del_op.lo \ - del_opnt.lo del_opv.lo del_opvnt.lo dyncast.lo eh_alloc.lo \ - eh_arm.lo eh_aux_runtime.lo eh_call.lo eh_catch.lo \ + del_opsz.lo del_opnt.lo del_opv.lo del_opvnt.lo dyncast.lo \ + eh_alloc.lo eh_arm.lo eh_aux_runtime.lo eh_call.lo eh_catch.lo \ eh_exception.lo eh_globals.lo eh_personality.lo eh_ptr.lo \ eh_term_handler.lo eh_terminate.lo eh_tm.lo eh_throw.lo \ eh_type.lo eh_unex_handler.lo enum_type_info.lo \ @@ -362,6 +362,7 @@ sources = \ bad_typeid.cc \ class_type_info.cc \ del_op.cc \ + del_opsz.cc \ del_opnt.cc \ del_opv.cc \ del_opvnt.cc \ Index: libstdc++-v3/libsupc++/del_op.cc === --- libstdc++-v3/libsupc++/del_op.cc(revision 192373) +++ libstdc++-v3/libsupc++/del_op.cc(working copy) @@ -47,11 +47,3 @@ operator delete(void* ptr) _GLIBCXX_USE_NOEXCEPT if (ptr) std::free(ptr); } - -_GLIBCXX_WEAK_DEFINITION void -operator delete(void* ptr, -std::size_t bytes __attribute__((__unused__))) throw () -{ - if (ptr) -std::free(ptr); -} Index: libstdc++-v3/libsupc++/del_opsz.cc === --- libstdc++-v3/libsupc++/del_opsz.cc (revision 0) +++ libstdc++-v3/libsupc++/del_opsz.cc (revision 0) @@ -0,0 +1,50 @@ +// Boilerplate support routines for -*- C++ -*- dynamic memory management. + +// Copyright (C) 2012 +// Free Software Foundation +// +// This file is part of GCC. +// +// GCC is free software; you can redistribute it and/or modify +// it under the terms of the GNU General Public License as published by +// the Free Software Foundation; either version 3, or (at your option) +// any later version. +// +// GCC is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. +// +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// http://www.gnu.org/licenses/. + +#include bits/c++config.h + +#if !_GLIBCXX_HOSTED +// A freestanding C runtime may not provide free -- but there is no +// other reasonable way to implement operator delete. 
+namespace std +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + extern "C" void free(void*); +_GLIBCXX_END_NAMESPACE_VERSION +} // namespace +#else +# include <cstdlib> +#endif + +#include <new> + +_GLIBCXX_WEAK_DEFINITION void +operator delete(void* ptr, +std::size_t bytes __attribute__((__unused__))) throw () +{ + if (ptr) +std::free(ptr); +} Index: libstdc++-v3/libsupc++/Makefile.am === --- libstdc++-v3/libsupc++/Makefile.am (revision 192373) +++ libstdc++-v3/libsupc++/Makefile.am (working copy) @@ -53,6 +53,7 @@ sources = \ bad_typeid.cc \ class_type_info.cc \ del_op.cc \ + del_opsz.cc \ del_opnt.cc \ del_opv.cc \ del_opvnt.cc \ -- This patch is available for review at http://codereview.appspot.com/6655052
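The linker scenario motivating the split, as a compilable example (the malloc-backed allocator is illustrative): a program that replaces only the unsized operator delete must not be forced to link a second definition of _ZdlPv just because the library's sized form lived in the same object file:

#include <cstddef>
#include <cstdlib>
#include <new>

void *operator new (std::size_t n)
{
  if (void *p = std::malloc (n ? n : 1))
    return p;
  throw std::bad_alloc ();
}

void operator delete (void *p) throw ()   /* replaces _ZdlPv */
{
  std::free (p);
}

/* operator delete (void*, std::size_t) is intentionally not replaced;
   before the split, pulling the sized form out of a static libsupc++
   could drag in del_op.o and collide on _ZdlPv.  */

int
main ()
{
  int *ip = new int (42);
  delete ip;
  return 0;
}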
Re: Move statements upwards after reassociation
Thanks for the comments. As David wrote, the intent of the patch is not to do a general purpose scheduling, but to compensate for the possible live range lengthening introduced by reassociation. On Thu, Oct 11, 2012 at 6:16 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Oct 11, 2012 at 3:52 AM, Easwaran Raman era...@google.com wrote: +/* Move STMT up within its BB until it can not be moved any further. */ + +static void move_stmt_upwards (gimple stmt) +{ + gimple_stmt_iterator gsi, gsistmt; + tree rhs1, rhs2; + gimple rhs1def = NULL, rhs2def = NULL; + rhs1 = gimple_assign_rhs1 (stmt); + rhs2 = gimple_assign_rhs2 (stmt); + gcc_assert (rhs1); Please no such senseless asserts. The following line will segfault anyway if rhs1 is NULL and with a debugger an assert doesn't add any useful information. Ok. + if (TREE_CODE (rhs1) == SSA_NAME) +rhs1def = SSA_NAME_DEF_STMT (rhs1); + else if (TREE_CODE (rhs1) != REAL_CST +TREE_CODE (rhs1) != INTEGER_CST) +return; + if (rhs2) You may not access gimple_assign_rhs2 if it is not there. So you have to check whether you have an unary, binary or ternary (yes) operation. gimple_assign_rhs2 returns NULL_TREE if it the RHS of an assignment has less than two operands. Regarding the check for ternary operation, I believe it is not necessary. A statement is considered for reassociation only if the RHS class is GIMPLE_BINARY_RHS. Subsequently, for rhs1 and rhs2, it checks if the def statements also have the same code and so it seems to me that a statement with a ternary operator in the RHS will never be considered in rewrite_expr_tree. +{ + if (TREE_CODE (rhs2) == SSA_NAME) +rhs2def = SSA_NAME_DEF_STMT (rhs2); + else if (TREE_CODE (rhs1) != REAL_CST +TREE_CODE (rhs1) != INTEGER_CST) +return; +} + gsi = gsi_for_stmt (stmt); + gsistmt = gsi; + gsi_prev (gsi); + for (; !gsi_end_p (gsi); gsi_prev (gsi)) This doesn't make much sense. You can move a stmt to the nearest common post-dominator. Assuming you only want to handle placing it after rhs1def or after rhs2def(?) you don't need any loop, just two dominator queries and an insertion after one of the definition stmts. Within a BB isn't that still O(size of BB)? But this code should consider BBs. For reassociation to look across BBs, the code should look something like this: L1 : a_2 = a_1 + 10 jc L3 L2: a_3 = a_2 + 20 - L1 should dominate L2 (otherwise there will be a phi node at L2 and the reassociation of a_3 will not consider the definition of a_2) - There are no other uses of a_2 other than the one in L3. After reassociation, the stmt defining a_2 would be moved to L2. In that case, the downward code motion of a_2 = a_1 + 10 to L2 is beneficial (one less instruction if the branch is taken). It is not obvious to me that moving it to L1 (or whereever a_1 is defined) is beneficial. And I don't see why more optimal placement cannot be done during rewrite_expr_tree itself. I started with that idea, but my current approach looks more simpler. Thanks, Easwaran NB, the whole reassoc code needs a re-write to avoid the excessive stmt modifications when it does nothing. So I'd very much rather avoid adding anything to reassoc until that rewrite happened. Richard. +{ + gimple curr_stmt = gsi_stmt (gsi); + if (curr_stmt == rhs1def || curr_stmt == rhs2def) { +gsi_move_after (gsistmt, gsi); +return; + } +} + +} + /* Recursively rewrite our linearized statements so that the operators match those in OPS[OPINDEX], putting the computation in rank order. 
*/ static void rewrite_expr_tree (gimple stmt, unsigned int opindex, - VEC(operand_entry_t, heap) * ops, bool moved) + VEC(operand_entry_t, heap) * ops, bool moved, + VEC(gimple, heap) **stmts_to_move) { tree rhs1 = gimple_assign_rhs1 (stmt); tree rhs2 = gimple_assign_rhs2 (stmt); @@ -2299,6 +2337,7 @@ rewrite_expr_tree (gimple stmt, unsigned int opind print_gimple_stmt (dump_file, stmt, 0, 0); } } + VEC_safe_push (gimple, heap, *stmts_to_move, stmt); return; } @@ -2346,7 +2385,9 @@ rewrite_expr_tree (gimple stmt, unsigned int opind } /* Recurse on the LHS of the binary operator, which is guaranteed to be the non-leaf side. */ - rewrite_expr_tree (SSA_NAME_DEF_STMT (rhs1), opindex + 1, ops, moved); + rewrite_expr_tree (SSA_NAME_DEF_STMT (rhs1), opindex + 1, ops, moved, + stmts_to_move); + VEC_safe_push (gimple, heap, *stmts_to_move, stmt); } /* Find out how many cycles we need to compute statements chain. @@ -3427,6 +3468,9 @@ reassociate_bb (basic_block bb) { gimple_stmt_iterator gsi; basic_block son; + VEC(gimple, heap) *stmts_to_move = NULL; + gimple stmt; + int i; for (gsi = gsi_last_bb (bb
Move statements upwards after reassociation
Hi, In the expression reassociation pass, statements might get moved downwards to ensure that dependences are not violated after reassociation. This can increase the live range and, in a tight loop, result in spills. This patch simply does a code motion of those statements involved in reassociation and pushes them upwards in the BB as far as possible without violating dependences. Bootstraps and no tests regressions on a x86_64 machine running linux. Ok for trunk? - Easwaran --- 2012-10-10 Easwaran Raman era...@google.com * tree-ssa-reassoc.c (move_stmt_upwards): New function. (rewrite_expr_tree): Record statements to be moved. (reassociate_bb): Move statements affected by reassociation as early as possible. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 191879) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,13 +2250,51 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Move STMT up within its BB until it can not be moved any further. */ + +static void move_stmt_upwards (gimple stmt) +{ + gimple_stmt_iterator gsi, gsistmt; + tree rhs1, rhs2; + gimple rhs1def = NULL, rhs2def = NULL; + rhs1 = gimple_assign_rhs1 (stmt); + rhs2 = gimple_assign_rhs2 (stmt); + gcc_assert (rhs1); + if (TREE_CODE (rhs1) == SSA_NAME) +rhs1def = SSA_NAME_DEF_STMT (rhs1); + else if (TREE_CODE (rhs1) != REAL_CST +TREE_CODE (rhs1) != INTEGER_CST) +return; + if (rhs2) +{ + if (TREE_CODE (rhs2) == SSA_NAME) +rhs2def = SSA_NAME_DEF_STMT (rhs2); + else if (TREE_CODE (rhs1) != REAL_CST +TREE_CODE (rhs1) != INTEGER_CST) +return; +} + gsi = gsi_for_stmt (stmt); + gsistmt = gsi; + gsi_prev (gsi); + for (; !gsi_end_p (gsi); gsi_prev (gsi)) +{ + gimple curr_stmt = gsi_stmt (gsi); + if (curr_stmt == rhs1def || curr_stmt == rhs2def) { +gsi_move_after (gsistmt, gsi); +return; + } +} + +} + /* Recursively rewrite our linearized statements so that the operators match those in OPS[OPINDEX], putting the computation in rank order. */ static void rewrite_expr_tree (gimple stmt, unsigned int opindex, - VEC(operand_entry_t, heap) * ops, bool moved) + VEC(operand_entry_t, heap) * ops, bool moved, + VEC(gimple, heap) **stmts_to_move) { tree rhs1 = gimple_assign_rhs1 (stmt); tree rhs2 = gimple_assign_rhs2 (stmt); @@ -2299,6 +2337,7 @@ rewrite_expr_tree (gimple stmt, unsigned int opind print_gimple_stmt (dump_file, stmt, 0, 0); } } + VEC_safe_push (gimple, heap, *stmts_to_move, stmt); return; } @@ -2346,7 +2385,9 @@ rewrite_expr_tree (gimple stmt, unsigned int opind } /* Recurse on the LHS of the binary operator, which is guaranteed to be the non-leaf side. */ - rewrite_expr_tree (SSA_NAME_DEF_STMT (rhs1), opindex + 1, ops, moved); + rewrite_expr_tree (SSA_NAME_DEF_STMT (rhs1), opindex + 1, ops, moved, + stmts_to_move); + VEC_safe_push (gimple, heap, *stmts_to_move, stmt); } /* Find out how many cycles we need to compute statements chain. 
@@ -3427,6 +3468,9 @@ reassociate_bb (basic_block bb) { gimple_stmt_iterator gsi; basic_block son; + VEC(gimple, heap) *stmts_to_move = NULL; + gimple stmt; + int i; for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (gsi)) { @@ -3542,7 +3586,7 @@ reassociate_bb (basic_block bb) VEC_length (operand_entry_t, ops) 3) rewrite_expr_tree_parallel (stmt, width, ops); else -rewrite_expr_tree (stmt, 0, ops, false); +rewrite_expr_tree (stmt, 0, ops, false, stmts_to_move); /* If we combined some repeated factors into a __builtin_powi call, multiply that result by the @@ -3560,6 +3604,7 @@ reassociate_bb (basic_block bb) target_ssa); gimple_set_location (mul_stmt, gimple_location (stmt)); gsi_insert_after (gsi, mul_stmt, GSI_NEW_STMT); + VEC_safe_push (gimple, heap, stmts_to_move, mul_stmt); } } @@ -3567,6 +3612,11 @@ reassociate_bb (basic_block bb) } } } + + FOR_EACH_VEC_ELT (gimple, stmts_to_move, i, stmt) +move_stmt_upwards (stmt); + VEC_free (gimple, heap, stmts_to_move); + for (son = first_dom_son (CDI_POST_DOMINATORS, bb); son; son = next_dom_son (CDI_POST_DOMINATORS, son))
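The core of ensure_ops_are_available in the patch above is an insertion-point computation: among the defining statements of all remaining operands, find the latest one (by dominance across blocks, or by UID within a block) and move the reassociated statement just past it. A standalone model of the within-block case (illustrative, not the GIMPLE API):

#include <algorithm>
#include <cassert>

/* The statement only needs to sit below the last of its operands'
   definitions; everything above that point is already available.  */
static int
latest_def_uid (const int *def_uid, int nops)
{
  return *std::max_element (def_uid, def_uid + nops);
}

int
main ()
{
  int def_uid[3] = { 4, 17, 9 };   /* UIDs of the operands' defs */
  int stmt_uid = 12;               /* current position of the use */
  int place_after = latest_def_uid (def_uid, 3);
  /* One operand (uid 17) is defined below the use, so the statement
     must move down to just after it -- and no further.  */
  assert (place_after == 17 && place_after > stmt_uid);
  return 0;
}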
Re: Propagate profile counts during switch expansion
Hi Honza, I am addressing some of the questions you raise here. Will send an updated patch later. On Thu, Oct 4, 2012 at 6:19 AM, Jan Hubicka hubi...@ucw.cz wrote: @@ -560,7 +577,6 @@ compute_outgoing_frequencies (basic_block b) return; } } - if (single_succ_p (b)) { e = single_succ_edge (b); @@ -568,7 +584,10 @@ compute_outgoing_frequencies (basic_block b) e-count = b-count; return; } - guess_outgoing_edge_probabilities (b); + else if (!gen_probabilities_from_existing_counts (b-succs)){ +/* All outgoing edges of B have zero count. Guess probabilities. */ +guess_outgoing_edge_probabilities (b); + } Hmm, I do not quite follow logic here. basic block B is one of many basic blocks that the original BB was split from. It is possible that B may have some of original edges, but there may be new ones. How you can guess the outgoing probabilitie shere. Do you have an example? The code reaches here only when the BB has more than two successor. I assume that this happens only when the BB is terminated by a switch statement. During conversion to RTL, the outgoing edges of this BB and their counts ar untouched. So I am just using those counts to compute the edge probabilities. Also gen_probabilities_from_existing_counts could probably also work based on original edge frequencies. Just to be clear I understand it right, you want me to use the frequency instead of count since the frequency is derived from the counts in profile use builds? + + default_edge-count = default_count; + if (count) +{ + edge e; + edge_iterator ei; + FOR_EACH_EDGE (e, ei, stmt_bb-succs) +e-probability = e-count * REG_BR_PROB_BASE / count; +} Hmm, this updates origina BB containing the switch statement? The out_edges of the original BB. Of course, modulo roundoff errors, this should hold. I wonder where did you got the diferences and why do you need this? Originally, I obtained e-probability as e-count * REG_BR_PROB_BASE / bb-count. During profile bootstrap, I noticed BBs whose sum of outgoing edge counts exceeded the BB's count and hence I normalized with the sum of edge counts. You are going to output new control flow and find_many_sub_basic_blocks will recompute all the counts/frequencies inside anyway? Sorry, I am lost here. I am just updating the probability of stmt_bb-succs right? @@ -2430,7 +2523,11 @@ emit_case_nodes (rtx index, case_node_ptr node, rt unsignedp), GT, NULL_RTX, mode, unsignedp, label_rtx (node-right-code_label)); - emit_case_nodes (index, node-left, default_label, index_type); + probability = case_probability (node-right-count, + subtree_count + default_count); + add_prob_note_to_last_insn (probability); + emit_case_nodes (index, node-left, default_label, default_count, + index_type); Hmm, here you seem to be distributing the probabilities of the counts reached. What happens in the case when the edge probability needs to be distributed across nodes of the decision tree. I.e. in t(int a) { switch (a) { case 100: case 200: case 300: case 400: case 500: case 600: case 700: case 800: case 900: t(); case 101: case 202: case 303: case 404: case 505: case 606: case 707: case 808: case 909: q(); } } Ok, that's a bug in my patch. In the above example, there are two outgoing edges from the BB containing switch but many case nodes. I would end up multiplying the counts by the number of cases that lead to a label. If I divide the counts of each case_node by the number of case nodes that jump to the same label, will that be reasonable? 
In the above case, if t() is called 900 times, I will assume that each of the 9 cases contributes a count of 100. Thanks, Easwaran
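The convention just proposed, as arithmetic: when K case labels share one outgoing edge, each case node is credited count/K (a sketch; rounding of non-divisible counts is left open in the thread):

#include <cassert>

int
main ()
{
  long edge_count = 900;    /* executions reaching t() in the example */
  int cases_on_edge = 9;    /* case labels funnelling into that edge  */
  long per_case = edge_count / cases_on_edge;
  assert (per_case == 100);
  return 0;
}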
Propagate profile counts during switch expansion
Hi, This patch propagates the profile counts during RTL expansion. In many cases, there is no way to determine the exact count of an edge generated during the expansion. So this patch uses some simple heuristics to estimate the edge counts but ensures that the counts of the basic blocks corresponding to the cases are (nearly the) same as at the gimple level. Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for trunk? - Easwaran -- 2012-10-02 Easwaran Raman era...@google.com * cfgbuild.c (gen_probabilities_from_existing_counts): New function. (compute_outgoing_frequencies): If at least one successor of a BB has non-zero profile count, use the counts to compute probabilities. * expr.c (do_tablejump): Add a REG_BR_PROB note on the jump to default label. (try_tablejump): Add a parameter to specify the probability of jumping to the default label. * expr.h (try_tablejump): Add a new parameter. * stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT. (do_jump_if_equal): Pass probability for REG_BR_PROB note. (add_case_node): Pass execution count of the case node and use it to initialize COUNT field. (emit_case_decision_tree): Pass default_count to emit_case_nodes. (get_outgoing_edge_counts): New function. (add_prob_note_to_last_insn): Likewise. (case_probability): New macro. (emit_case_dispatch_table): Compute probability of jumping to default label and apply note to the jump. (expand_case): Compute and propagate default edge count to emit_case_dispatch_table. (expand_sjlj_dispatch_table): Update calls to add_case_node and emit_case_dispatch_table. (balance_case_nodes): Update subtree_counts. (emit_case_nodes): Compute edge probabilities and add note. gcc/testsuite/ChangeLog: 2012-10-02 Easwaran Raman era...@google.com * gcc.dg/tree-prof/switch-case-1.c: New test case. * gcc.dg/tree-prof/switch-case-2.c: New test case. 
Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c === --- gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0) @@ -0,0 +1,40 @@ +/* { dg-options -O2 -fdump-rtl-expand-all } */ +int g; + +__attribute__((noinline)) void foo (int n) +{ + switch (n) +{ +case 1: + g++; break; +case 2: + g += 2; break; +case 3: + g += 1; break; +case 4: + g += 3; break; +case 5: + g += 4; break; +case 6: + g += 5; break; +case 7: + g += 6; break; +case 8: + g += 7; break; +case 9: + g += 8; break; +default: + g += 8; break; + } +} + +int main () +{ + int i; + for (i = 0; i 1; i++) + foo ((i * i) % 5); + return 0; +} +/* { dg-final-use { scan-rtl-dump-times ;; basic block\[^\\n\]*count 4000 2 expand} } */ +/* { dg-final-use { scan-rtl-dump-times ;; basic block\[^\\n\]*count 2000 1 expand} } */ +/* { dg-final-use { cleanup-rtl-dump expand } } */ Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c === --- gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0) @@ -0,0 +1,40 @@ +/* { dg-options -O2 -fdump-rtl-expand-all } */ +int g; + +__attribute__((noinline)) void foo (int n) +{ + switch (n) +{ +case 99: + g += 2; break; +case 1: + g++; break; +case 100: + g += 1; break; +case 4: + g += 3; break; +case 5: + g += 4; break; +case 6: + g += 5; break; +case 7: + g += 6; break; +case 8: + g += 7; break; +case 9: + g += 8; break; +default: + g += 8; break; + } +} + +int main () +{ + int i; + for (i = 0; i 1; i++) + foo ((i * i) % 5); + return 0; +} +/* { dg-final-use { scan-rtl-dump-times ;; basic block\[^\\n\]*count 4000 2 expand} } */ +/* { dg-final-use { scan-rtl-dump-times ;; basic block\[^\\n\]*count 2000 1 expand} } */ +/* { dg-final-use { cleanup-rtl-dump expand } } */ Index: gcc/expr.c === --- gcc/expr.c (revision 191879) +++ gcc/expr.c (working copy) @@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum machin #ifdef PUSH_ROUNDING static void emit_single_push_insn (enum machine_mode, rtx, tree); #endif -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int); static rtx const_vector_from_tree (tree); static void write_complex_part (rtx, rtx, bool); @@ -10894,7 +10894,7 @@ try_casesi (tree index_type, tree index_expr, tree static void do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, - rtx default_label) + rtx default_label, int default_probability) { rtx temp, vector; @@ -10910,9 +10910,17 @@ do_tablejump (rtx index, enum machine_mode mode, r the maximum value of the range
Change the ordering of cdce pass
The conditional dead call elimination pass shrink-wraps certain dead calls to math functions. It doesn't handle a case like this: D.142420_139 = powD.549 (D.142421_138, D.142419_132); fooD.120935.barD.113815 = D.142420_139; # foo.bar is dead here. The dead store gets cleaned up by DCE, leaving only the pow call, which can then be shrink-wrapped by cdce. So it seems reasonable to do this reordering. Bootstraps on x86_64/linux with no test regressions. OK for trunk? - Easwaran -- 2012-06-14 Easwaran Raman era...@google.com * gcc/passes.c (init_optimization_passes): Remove pass_call_cdce from its current position and insert after pass_dce. Index: gcc/passes.c === --- gcc/passes.c (revision 188535) +++ gcc/passes.c (working copy) @@ -1374,7 +1374,6 @@ init_optimization_passes (void) NEXT_PASS (pass_complete_unrolli); NEXT_PASS (pass_ccp); NEXT_PASS (pass_forwprop); - NEXT_PASS (pass_call_cdce); /* pass_build_alias is a dummy pass that ensures that we execute TODO_rebuild_alias at this point. Re-building alias information also rewrites no longer addressed @@ -1387,6 +1386,7 @@ init_optimization_passes (void) NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_vrp); NEXT_PASS (pass_dce); + NEXT_PASS (pass_call_cdce); NEXT_PASS (pass_cselim); NEXT_PASS (pass_tree_ifcombine); NEXT_PASS (pass_phiopt);
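For context, what cdce's shrink-wrapping looks like once DCE has exposed the dead call (the guard below is illustrative; the pass derives the actual range test from the function's domain and overflow behavior):

#include <math.h>

double x_, y_;

void
dead_pow_before (void)
{
  pow (x_, y_);   /* result unused: a candidate for cdce */
}

void
dead_pow_after (void)
{
  /* Shrink-wrapped form, conceptually: the call survives only on inputs
     that may set errno, so the common case skips it entirely.  */
  if (__builtin_expect (x_ <= 0.0 || y_ > 256.0, 0))
    pow (x_, y_);
}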
Re: Change the ordering of cdce pass
ChangeLog entry has a gcc/ prefix that shouldn't be there. Here is the revised entry: 2012-06-14 Easwaran Raman era...@google.com * passes.c (init_optimization_passes): Remove pass_call_cdce from its current position and insert after pass_dce. On Thu, Jun 14, 2012 at 6:38 PM, Easwaran Raman era...@google.com wrote: The conditional dead call elimination pass shrink wraps certain dead calls to math functions. It doesn't handle case like this: D.142420_139 = powD.549 (D.142421_138, D.142419_132); fooD.120935.barD.113815 = D.142420_139; # foo.bar is dead here. This code gets cleaned up by DCE and leaves only pow, which can then be shrink-wrapped by cdce. So it seems reasonable to do this reordering. Bootstraps on x86_64 on linux with no test regression. OK for trunk? - Easwaran -- 2012-06-14 Easwaran Raman era...@google.com * gcc/passes.c (init_optimization_passes): Remove pass_call_cdce from its current position and insert after pass_dce. Index: gcc/passes.c === --- gcc/passes.c (revision 188535) +++ gcc/passes.c (working copy) @@ -1374,7 +1374,6 @@ init_optimization_passes (void) NEXT_PASS (pass_complete_unrolli); NEXT_PASS (pass_ccp); NEXT_PASS (pass_forwprop); - NEXT_PASS (pass_call_cdce); /* pass_build_alias is a dummy pass that ensures that we execute TODO_rebuild_alias at this point. Re-building alias information also rewrites no longer addressed @@ -1387,6 +1386,7 @@ init_optimization_passes (void) NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_vrp); NEXT_PASS (pass_dce); + NEXT_PASS (pass_call_cdce); NEXT_PASS (pass_cselim); NEXT_PASS (pass_tree_ifcombine); NEXT_PASS (pass_phiopt);
Re: Propagate profile counts after switch case expansion (issue5896043)
Ping.

On Mon, Apr 9, 2012 at 2:33 PM, Easwaran Raman era...@google.com wrote:

On Sun, Mar 25, 2012 at 9:40 PM, Easwaran Raman era...@google.com wrote:

On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka hubi...@ucw.cz wrote:

  This patch propagates the execution count of the case labels of a switch-case statement after its expansion. Bootstraps and all tests pass. OK for trunk?

  Hi, while this is a reasonable thing to do, I believe it would make most sense to do switch lowering at the gimple level: i.e. reorganize the existing code to produce gimple statements, and change the expansion code to produce a tablejump from every gimple switch statement that was left in the code. This would be both cleaner and would allow the gimple optimizers to improve the generated code. Incrementally it would also allow us to improve switch expansion, which is quite baroque and does not really produce very good code in some common cases. If you would be interested in working on this (i.e. reorganizing the expansion code to produce gimple), I would be very happy. If not, I can review the profile updating part for mainline, since in any case this is a desirable thing to do.

I am planning to explore improvements to switch expansion (peeling some cases and using jump tables for the rest, for example) and I think the reorganization you suggest is the right way to do such improvements. But until I can spend time on it and get it done, I would like this patch to get reviewed and checked in.

Thanks,
Easwaran

Ping.
Re: Propagate profile counts after switch case expansion (issue5896043)
On Sun, Mar 25, 2012 at 9:40 PM, Easwaran Raman era...@google.com wrote:

On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka hubi...@ucw.cz wrote:

  This patch propagates the execution count of the case labels of a switch-case statement after its expansion. Bootstraps and all tests pass. OK for trunk?

  Hi, while this is a reasonable thing to do, I believe it would make most sense to do switch lowering at the gimple level: i.e. reorganize the existing code to produce gimple statements, and change the expansion code to produce a tablejump from every gimple switch statement that was left in the code. This would be both cleaner and would allow the gimple optimizers to improve the generated code. Incrementally it would also allow us to improve switch expansion, which is quite baroque and does not really produce very good code in some common cases. If you would be interested in working on this (i.e. reorganizing the expansion code to produce gimple), I would be very happy. If not, I can review the profile updating part for mainline, since in any case this is a desirable thing to do.

I am planning to explore improvements to switch expansion (peeling some cases and using jump tables for the rest, for example) and I think the reorganization you suggest is the right way to do such improvements. But until I can spend time on it and get it done, I would like this patch to get reviewed and checked in.

Thanks,
Easwaran

Ping.
Re: [google]Add support for sampled profile collection (issue4438083)
I want to revive this patch for mainline and have some questions on Honza's comments.

On Fri, Apr 29, 2011 at 1:48 PM, Jan Hubicka hubi...@ucw.cz wrote:

  A known limitation is that value profiling is not yet sampled, but it does not seem to cause problems.

  For the performance alone, we probably don't need to care that much, given the fact that we value profile only relatively expensive operations. But if we want to have the turn-off/on feature, then I guess we need to guard everything. It is not much of a pain to add the code generating conditionals everywhere, after all.

If we sample value profiling instrumentation as well, does it make sense to use a single counter and rate for all instrumentations? If not, does the additional complexity (and flags) justify the benefit of uniformity?

  +/* Insert STMT_IF around given sequence of consecutive statements in the
  +   same basic block starting with STMT_START, ending with STMT_END.  */
  +
  +static void
  +insert_if_then (gimple stmt_start, gimple stmt_end, gimple stmt_if)
  +{
  +  gimple_stmt_iterator gsi;
  +  basic_block bb_original, bb_before_if, bb_after_if;
  +  edge e_if_taken, e_then_join;
  +
  +  gsi = gsi_for_stmt (stmt_start);
  +  gsi_insert_before (&gsi, stmt_if, GSI_SAME_STMT);
  +  bb_original = gsi_bb (gsi);
  +  e_if_taken = split_block (bb_original, stmt_if);
  +  e_if_taken->flags &= ~EDGE_FALLTHRU;
  +  e_if_taken->flags |= EDGE_TRUE_VALUE;
  +  e_then_join = split_block (e_if_taken->dest, stmt_end);
  +  bb_before_if = e_if_taken->src;
  +  bb_after_if = e_then_join->dest;

  On mainline, where we do profile estimation before profiling instrumentation, you really want to update the profile for performance here.

I am not sure I understand this.

  +  make_edge (bb_before_if, bb_after_if, EDGE_FALSE_VALUE);
  +}
  +
  +/* Transform:
  +
  +   ORIGINAL CODE
  +
  +   Into:
  +
  +   __gcov_sample_counter++;
  +   if (__gcov_sample_counter >= __gcov_sampling_rate)
  +     {
  +       __gcov_sample_counter = 0;
  +       ORIGINAL CODE
  +     }

  Hmm, I think the probability that an internal loop of the program will interfere with the sampling rate is relatively high, but I see it is a bit hard to do something about it. Can we think of some very basic randomization of sampling_rate?

The predominant use case we have for this technique for server workloads is as follows: Have a high value for __gcov_sampling_rate (that should probably be named __gcov_sampling_interval) during server start-up, so that it reduces the overhead as well as ensures that the start-up period doesn't pollute the rest of the counters. After start-up, change the sampling interval to a small value (in many cases, to 1) using the external interface in libgcov.c. In this use case, randomization doesn't make sense during start-up (since we want to skip profile collection) or the steady phase (since the interval is 1 or a small number). If you want to have randomization support, do you have any suggestions for how to make it work with low sampling intervals?

Thanks,
Easwaran

  Honza
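Spelling the quoted transformation out as stand-alone source may help; this is a sketch of what one instrumented counter update looks like after sampling is applied (the pass emits the equivalent GIMPLE; arc_counter is a placeholder for one edge counter, and the __gcov_* declarations follow the names used above):

  extern __thread unsigned int __gcov_sample_counter;
  extern unsigned int __gcov_sampling_rate;
  extern unsigned long long arc_counter;   // stand-in for one arc counter

  void instrumented_increment (void)
  {
    __gcov_sample_counter++;
    if (__gcov_sample_counter >= __gcov_sampling_rate)
      {
        __gcov_sample_counter = 0;
        arc_counter++;                     // ORIGINAL CODE: the counter update
      }
  }

With this shape, a single global counter/rate pair means every instrumented site shares one sampling phase, which is what the question about per-instrumentation counters is probing.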
Re: Propagate profile counts after switch case expansion (issue5896043)
On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka hubi...@ucw.cz wrote:

  This patch propagates the execution count of the case labels of a switch-case statement after its expansion. Bootstraps and all tests pass. OK for trunk?

  Hi, while this is a reasonable thing to do, I believe it would make most sense to do switch lowering at the gimple level: i.e. reorganize the existing code to produce gimple statements, and change the expansion code to produce a tablejump from every gimple switch statement that was left in the code. This would be both cleaner and would allow the gimple optimizers to improve the generated code. Incrementally it would also allow us to improve switch expansion, which is quite baroque and does not really produce very good code in some common cases. If you would be interested in working on this (i.e. reorganizing the expansion code to produce gimple), I would be very happy. If not, I can review the profile updating part for mainline, since in any case this is a desirable thing to do.

I am planning to explore improvements to switch expansion (peeling some cases and using jump tables for the rest, for example) and I think the reorganization you suggest is the right way to do such improvements. But until I can spend time on it and get it done, I would like this patch to get reviewed and checked in.

Thanks,
Easwaran

  Honza
Propagate profile counts after switch case expansion (issue5896043)
This patch propagates the execution count of the case labels of a switch-case statement after its expansion. Bootstraps and all tests pass. OK for trunk?

2012-03-23  Easwaran Raman  era...@google.com

	* cfgbuild.c (non_zero_profile_counts): New function.
	(compute_outgoing_frequencies): If at least one successor of a
	BB has non-zero profile count, retain the counts.
	* expr.c (do_tablejump): Add a REG_BR_PROB note on the jump to
	the default label.
	(try_tablejump): Add a parameter to specify the probability
	of jumping to the default label.
	* expr.h (try_tablejump): Add a new parameter.
	* stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT.
	(add_case_node): Pass execution count of the case node and use
	it to initialize COUNT field.
	(case_probability): New macro.
	(expand_case): Propagate execution counts to generated branches
	using REG_BR_PROB notes.
	(emit_case_nodes): Likewise.
	(do_jump_if_equal): Pass probability for REG_BR_PROB note.
	(compute_subtree_counts): New function to compute
	SUBTREE_COUNT fields of case nodes.
	(add_prob_note_to_last_insn): Add a REG_BR_PROB note with the
	given probability to the last generated instruction.

gcc/testsuite/ChangeLog:

2012-03-23  Easwaran Raman  era...@google.com

	* gcc.dg/tree-prof/switch-case-1.c: New test case.
	* gcc.dg/tree-prof/switch-case-2.c: New test case.

diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c
index 692fea8..d75fbda 100644
--- a/gcc/cfgbuild.c
+++ b/gcc/cfgbuild.c
@@ -534,6 +534,21 @@ find_bb_boundaries (basic_block bb)
       purge_dead_tablejump_edges (bb, table);
 }

+/* Check if there is at least one edge in EDGES with a non-zero count
+   field.  */
+
+static bool
+non_zero_profile_counts (VEC(edge,gc) *edges)
+{
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, edges)
+    {
+      if (e->count > 0)
+        return true;
+    }
+  return false;
+}
+
 /* Assume that frequency of basic block B is known.  Compute frequencies
    and probabilities of outgoing edges.  */
@@ -569,6 +584,10 @@ compute_outgoing_frequencies (basic_block b)
       e->count = b->count;
       return;
     }
+  else if (non_zero_profile_counts (b->succs))
+    {
+      /* Profile counts already set, but REG_NOTE missing.  Retain the counts.  */
+      return;
+    }
   guess_outgoing_edge_probabilities (b);
   if (b->count)
     FOR_EACH_EDGE (e, ei, b->succs)
diff --git a/gcc/expr.c b/gcc/expr.c
index f9de908..fb8eef9 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -156,7 +156,7 @@ static rtx do_store_flag (sepops, rtx, enum machine_mode);
 #ifdef PUSH_ROUNDING
 static void emit_single_push_insn (enum machine_mode, rtx, tree);
 #endif
-static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
+static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
 static rtx const_vector_from_tree (tree);
 static void write_complex_part (rtx, rtx, bool);
@@ -10694,11 +10694,13 @@ try_casesi (tree index_type, tree index_expr, tree minval, tree range,
    TABLE_LABEL is a CODE_LABEL rtx for the table itself.

    DEFAULT_LABEL is a CODE_LABEL rtx to jump to if the
-   index value is out of range.  */
+   index value is out of range.
+   DEFAULT_PROBABILITY is the probability of jumping to the
+   DEFAULT_LABEL.  */

 static void
 do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
-              rtx default_label)
+              rtx default_label, int default_probability)
 {
   rtx temp, vector;
@@ -10714,8 +10716,16 @@ do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
      the maximum value of the range.  */
   if (default_label)
-    emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
-                             default_label);
+    {
+      emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
+                               default_label);
+      if (default_probability != -1)
+        {
+          rtx jump_insn = get_last_insn();
+          add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (default_probability));
+        }
+    }
+
   /* If index is in range, it must fit in Pmode.
      Convert to Pmode so we can index with it.  */
@@ -10758,7 +10768,7 @@ do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
 int
 try_tablejump (tree index_type, tree index_expr, tree minval, tree range,
-               rtx table_label, rtx default_label)
+               rtx table_label, rtx default_label, int default_probability)
 {
   rtx index;
@@ -10776,7 +10786,7 @@ try_tablejump (tree index_type, tree index_expr, tree minval, tree range,
                                      TYPE_MODE (TREE_TYPE (range)),
                                      expand_normal (range),
                                      TYPE_UNSIGNED (TREE_TYPE (range))),
-                table_label, default_label);
+                table_label, default_label
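A short worked example of the note value the new code attaches (REG_BR_PROB values are scaled by GCC's REG_BR_PROB_BASE, which is 10000; the counts are the ones the switch-case-1.c test collects, where the default label is reached 2000 times out of 10000 executions):

  #define REG_BR_PROB_BASE 10000     // GCC's fixed-point probability scale

  int main ()
  {
    int default_count = 2000;        // executions taking the default label
    int total_count = 10000;         // executions of the switch
    int note = default_count * REG_BR_PROB_BASE / total_count;
    return note == 2000 ? 0 : 1;     // note value 2000, i.e. a 20% branch
  }

This is exactly the value GEN_INT (default_probability) wraps for the REG_BR_PROB note on the range-check jump above.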
Re: Propagate profile counts after switch case expansion (issue5896043)
Some more background on this patch: Right now, while the execution counts of the different case labels of a switch statement are obtained during profile collection, they are not propagated to RTL. Instead, counts are regenerated at the RTL level using static heuristics that tend to weigh branches equally, which can cause poor optimization of hot code. This patch ensures that the counts collected during profile collection are correctly propagated, allowing hot code to be better optimized by the RTL optimizations. Patch tested on x86_64.

- Easwaran

On Fri, Mar 23, 2012 at 10:43 AM, Easwaran Raman era...@google.com wrote:

  This patch propagates the execution count of the case labels of a switch-case statement after its expansion. Bootstraps and all tests pass. OK for trunk?

  2012-03-23  Easwaran Raman  era...@google.com

	* cfgbuild.c (non_zero_profile_counts): New function.
	(compute_outgoing_frequencies): If at least one successor of a
	BB has non-zero profile count, retain the counts.
	* expr.c (do_tablejump): Add a REG_BR_PROB note on the jump to
	the default label.
	(try_tablejump): Add a parameter to specify the probability
	of jumping to the default label.
	* expr.h (try_tablejump): Add a new parameter.
	* stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT.
	(add_case_node): Pass execution count of the case node and use
	it to initialize COUNT field.
	(case_probability): New macro.
	(expand_case): Propagate execution counts to generated branches
	using REG_BR_PROB notes.
	(emit_case_nodes): Likewise.
	(do_jump_if_equal): Pass probability for REG_BR_PROB note.
	(compute_subtree_counts): New function to compute
	SUBTREE_COUNT fields of case nodes.
	(add_prob_note_to_last_insn): Add a REG_BR_PROB note with the
	given probability to the last generated instruction.

  gcc/testsuite/ChangeLog:

  2012-03-23  Easwaran Raman  era...@google.com

	* gcc.dg/tree-prof/switch-case-1.c: New test case.
	* gcc.dg/tree-prof/switch-case-2.c: New test case.

  diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c
  index 692fea8..d75fbda 100644
  --- a/gcc/cfgbuild.c
  +++ b/gcc/cfgbuild.c
  @@ -534,6 +534,21 @@ find_bb_boundaries (basic_block bb)
         purge_dead_tablejump_edges (bb, table);
   }

  +/* Check if there is at least one edge in EDGES with a non-zero count
  +   field.  */
  +
  +static bool
  +non_zero_profile_counts (VEC(edge,gc) *edges)
  +{
  +  edge e;
  +  edge_iterator ei;
  +  FOR_EACH_EDGE (e, ei, edges)
  +    {
  +      if (e->count > 0)
  +        return true;
  +    }
  +  return false;
  +}
  +
   /* Assume that frequency of basic block B is known.  Compute frequencies
      and probabilities of outgoing edges.  */
  @@ -569,6 +584,10 @@ compute_outgoing_frequencies (basic_block b)
         e->count = b->count;
         return;
       }
  +  else if (non_zero_profile_counts (b->succs))
  +    {
  +      /* Profile counts already set, but REG_NOTE missing.  Retain the counts.  */
  +      return;
  +    }
     guess_outgoing_edge_probabilities (b);
     if (b->count)
       FOR_EACH_EDGE (e, ei, b->succs)
  diff --git a/gcc/expr.c b/gcc/expr.c
  index f9de908..fb8eef9 100644
  --- a/gcc/expr.c
  +++ b/gcc/expr.c
  @@ -156,7 +156,7 @@ static rtx do_store_flag (sepops, rtx, enum machine_mode);
   #ifdef PUSH_ROUNDING
   static void emit_single_push_insn (enum machine_mode, rtx, tree);
   #endif
  -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
  +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
   static rtx const_vector_from_tree (tree);
   static void write_complex_part (rtx, rtx, bool);
  @@ -10694,11 +10694,13 @@ try_casesi (tree index_type, tree index_expr, tree minval, tree range,
     TABLE_LABEL is a CODE_LABEL rtx for the table itself.

     DEFAULT_LABEL is a CODE_LABEL rtx to jump to if the
  -  index value is out of range.  */
  +  index value is out of range.
  +  DEFAULT_PROBABILITY is the probability of jumping to the
  +  DEFAULT_LABEL.  */

   static void
   do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
  -              rtx default_label)
  +              rtx default_label, int default_probability)
   {
     rtx temp, vector;
  @@ -10714,8 +10716,16 @@ do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
       the maximum value of the range.  */
     if (default_label)
  -    emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
  -                             default_label);
  +    {
  +      emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
  +                               default_label);
  +      if (default_probability != -1)
  +        {
  +          rtx jump_insn = get_last_insn();
  +          add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (default_probability));
  +        }
  +    }
  +
     /* If index is in range, it must fit in Pmode.
        Convert to Pmode so we can index with it.  */
  @@ -10758,7 +10768,7
Re: Propagate profile counts after switch case expansion (issue5896043)
On Fri, Mar 23, 2012 at 3:29 PM, Andi Kleen a...@firstfloor.org wrote:

  Easwaran Raman era...@google.com writes:

    Some more background on this patch: Right now, while the execution counts of the different case labels of a switch statement are obtained during profile collection, they are not propagated to RTL. Instead, counts are regenerated at the RTL level using static heuristics that tend to weigh branches equally, which can cause poor optimization of hot code. This patch ensures that the counts collected during profile collection are correctly propagated, allowing hot code to be better optimized by the RTL optimizations. Patch tested on x86_64.

  I think your patch doesn't use the probabilities to weight the decision tree for the non-tablejump case, right? I looked at this some time ago, but the patch always had problems.

Do you mean using the weights to decide the shape of the binary tree (similar to the COST_TABLE heuristic)? I am planning to send a separate patch for that. This one just makes sure that the profile counts are propagated correctly. So you will still have a situation where a branch corresponding to an infrequently executed case dominates a frequently executed case, but the BB of the cases gets the right profile weight.

- Easwaran

  -Andi
  --
  a...@linux.intel.com -- Speaking for myself only
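An illustration of the distinction being discussed, with hypothetical case values {1, 50, 100} where 100 is the hot case; the functions below only sketch the shape of the comparison trees, not GCC's actual output:

  int balanced (int n)    // shape of an unweighted balanced tree
  {
    if (n < 50)
      return n == 1 ? 101 : 0;
    if (n > 50)
      return n == 100 ? 103 : 0;
    return 102;           // n == 50: hot case 100 always pays two tests
  }

  int peeled (int n)      // shape a profile-weighted expansion could produce
  {
    if (n == 100)         // hot case peeled off and tested first
      return 103;
    if (n == 1)
      return 101;
    if (n == 50)
      return 102;
    return 0;
  }

The patch under review leaves the tree shaped like balanced() but makes the edge counts on its branches correct; reshaping it toward peeled() is the follow-on work mentioned above.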
[google] Use delete with size parameter in STL deallocate (issue5794070)
This patch makes -fsized-delete define a macro __GXX_DELETE_WITH_SIZE__. If this macro is defined, the deallocate function in new_allocator.h invokes the two-parameter delete operator with the size of the object. Tested by bootstrapping the compiler and ensuring no tests are broken. Also verified that compiling a C++ program that uses std::vector with -fsized-delete invokes the two-parameter version of operator delete. OK for google/main and google/4_6 branches?

c-family/ChangeLog.google-main:

2012-03-12  Easwaran Raman  era...@google.com

	* c-cppbuiltin.c (c_cpp_builtins): Define __GXX_DELETE_WITH_SIZE__
	for C++ files compiled with -fsized-delete.

libstdc++-v3/ChangeLog.google-main:

2012-03-12  Easwaran Raman  era...@google.com

	* include/ext/new_allocator.h (deallocate): Call delete with size
	if __GXX_DELETE_WITH_SIZE__ is defined.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index ccf57fd..3c9a96b 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -970,6 +970,8 @@ c_cpp_builtins (cpp_reader *pfile)
      format.  */
   if (ENABLE_DECIMAL_FLOAT && ENABLE_DECIMAL_BID_FORMAT)
     cpp_define (pfile, "__DECIMAL_BID_FORMAT__");
+  if (c_dialect_cxx () && flag_sized_delete)
+    cpp_define (pfile, "__GXX_DELETE_WITH_SIZE__");
 }

 /* Pass an object-like macro.  If it doesn't lie in the user's
diff --git a/libstdc++-v3/include/ext/new_allocator.h b/libstdc++-v3/include/ext/new_allocator.h
index 0c82bd0..c972c1e 100644
--- a/libstdc++-v3/include/ext/new_allocator.h
+++ b/libstdc++-v3/include/ext/new_allocator.h
@@ -94,10 +94,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
	return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp)));
       }

+#ifdef __GXX_DELETE_WITH_SIZE__
+      // __p is not permitted to be a null pointer.
+      void
+      deallocate(pointer __p, size_type __t)
+      { ::operator delete(__p, __t * sizeof(_Tp)); }
+#else
       // __p is not permitted to be a null pointer.
       void
       deallocate(pointer __p, size_type)
       { ::operator delete(__p); }
+#endif

       size_type
       max_size() const _GLIBCXX_USE_NOEXCEPT

--
This patch is available for review at http://codereview.appspot.com/5794070
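A minimal usage sketch of the flag's effect, assuming the semantics described above (under a stock compiler this program still builds, but the two-argument operator is only reached for ordinary deletes when -fsized-delete is in force):

  #include <cstdio>
  #include <cstdlib>
  #include <new>

  void operator delete (void *p, std::size_t n) throw ()
  {
    std::printf ("sized delete of %lu bytes\n", (unsigned long) n);
    std::free (p);
  }

  int main ()
  {
    int *p = new int (7);
    delete p;   // with -fsized-delete: operator delete (p, sizeof (int))
    return 0;
  }

The void* exception mentioned in the ChangeLog matters here: for a void* operand the static type gives no size, so the one-argument form must still be used.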
Re: [google] Propagate profile information to RTL level during switch expansion
On Tue, Jan 31, 2012 at 10:16 PM, Xinliang David Li davi...@google.com wrote:

  Ok for google branch with the minor changes below.

  thanks, David

  +#define case_probability(x, y) ((y) ? ((x) * REG_BR_PROB_BASE / (y)) : -1)
  +

  Using upper case for the macro?

From http://gcc.gnu.org/codingconventions.html, "Macro names should be in ALL_CAPS when it's important to be aware that it's a macro (e.g. accessors and simple predicates), but in lowercase (e.g., size_int) where the macro is a wrapper for efficiency that should be considered as a function". In this case, case_probability is a wrapper for efficiency, and hence I decided to go with the lower-case name.

  +  count -= default_count;
  +  default_count = 0;

  Why reset it to 0?

If there are no gaps, then the indirect jump can't jump to the default label. Later I set default_edge->count = default_count; and this value should be 0 if there are no gaps.

I have addressed the rest of your comments.

-Easwaran

  +
  +static void
  +add_prob_note_to_last_insn (int probability)
  +{

  Missing documentation.
[google] Propagate profile information to RTL level during switch expansion
This patch propagates profile information to the RTL level when expanding switch statements using a jump table or a binary tree of branches. Ok for google/gcc-4_6 branch? I would like the patch to be considered for trunk when stage 1 opens again.

-Easwaran

2012-01-31  Easwaran Raman  era...@google.com

	* expr.c (do_tablejump): Add default_probability as the sixth
	parameter and use it to generate a REG_BR_PROB note.
	(try_tablejump): Add default_probability as a parameter.
	* cfgbuild.c (non_zero_profile_counts): New function.
	(compute_outgoing_frequencies): If edges have profile counts set,
	don't replace them with guessed values.
	* expr.h (try_tablejump): Add additional parameter to the
	declaration.
	* stmt.c (tree-flow.h): Include.
	(case_node): Add new fields count and subtree_count.
	(add_case_node): Pass count as a parameter and initialize the
	count field of case_node.
	(case_probability): New macro.
	(expand_case): Propagate profile information from the tree level
	to RTL during switch case expansion.
	(balance_case_nodes): Compute subtree_count for case nodes.
	(emit_case_nodes): Set branch probabilities for generated
	branches.

testsuite/ChangeLog.google-4_6:

2012-01-31  Easwaran Raman  era...@google.com

	* tree-prof/switch-case-1.c: New test.
	* tree-prof/switch-case-2.c: New test.

Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c
===
--- gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0)
@@ -0,0 +1,40 @@
+/* { dg-options "-O2 -fdump-rtl-expand-blocks -fdump-tree-optimized-blocks" } */
+int g;
+
+__attribute__((noinline)) void foo (int n)
+{
+  switch (n)
+    {
+    case 1:
+      g++; break;
+    case 2:
+      g += 2; break;
+    case 3:
+      g += 1; break;
+    case 4:
+      g += 3; break;
+    case 5:
+      g += 4; break;
+    case 6:
+      g += 5; break;
+    case 7:
+      g += 6; break;
+    case 8:
+      g += 7; break;
+    case 9:
+      g += 8; break;
+    default:
+      g += 8; break;
+    }
+}
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 10000; i++)
+    foo ((i * i) % 5);
+  return 0;
+}
+/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:4000" 4 "expand"} } */
+/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:2000" 1 "expand"} } */
+/* { dg-final-use { cleanup-rtl-dump "expand" } } */
Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c
===
--- gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0)
@@ -0,0 +1,40 @@
+/* { dg-options "-O2 -fdump-rtl-expand-blocks" } */
+int g;
+
+__attribute__((noinline)) void foo (int n)
+{
+  switch (n)
+    {
+    case 99:
+      g += 2; break;
+    case 1:
+      g++; break;
+    case 100:
+      g += 1; break;
+    case 4:
+      g += 3; break;
+    case 5:
+      g += 4; break;
+    case 6:
+      g += 5; break;
+    case 7:
+      g += 6; break;
+    case 8:
+      g += 7; break;
+    case 9:
+      g += 8; break;
+    default:
+      g += 8; break;
+    }
+}
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 10000; i++)
+    foo ((i * i) % 5);
+  return 0;
+}
+/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:4000" 4 "expand"} } */
+/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:2000" 1 "expand"} } */
+/* { dg-final-use { cleanup-rtl-dump "expand" } } */
Index: gcc/expr.c
===
--- gcc/expr.c (revision 183262)
+++ gcc/expr.c (working copy)
@@ -155,7 +155,7 @@ static rtx do_store_flag (sepops, rtx, enum machin
 #ifdef PUSH_ROUNDING
 static void emit_single_push_insn (enum machine_mode, rtx, tree);
 #endif
-static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
+static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
 static rtx const_vector_from_tree (tree);
 static void write_complex_part (rtx, rtx, bool);
@@ -10245,7 +10245,7 @@ try_casesi (tree index_type, tree index_expr, tree
 static void
 do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
-              rtx default_label)
+              rtx default_label, int default_probability)
 {
   rtx temp, vector;
@@ -10261,9 +10261,17 @@ do_tablejump (rtx index, enum machine_mode mode, r
      the maximum value of the range.  */
   if (default_label)
-    emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
-                             default_label);
+    {
+      emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
+                               default_label);
+      if (default_probability != -1)
+        {
+          rtx jump_insn = get_last_insn();
+          add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (default_probability));
+        }
+    }
+
   /* If index
Re: [google] Add support for delete operator that takes the size of the object as a parameter
Based on Paolo's comments I am dropping the changes to baseline_symbols.txt. As for the minor version, I am bumping it up to 18. Assuming that this patch will be able to go into gcc 4.8 (with minor version 18), I want to keep it at the same version in google/main and google/gcc-4_6. Bootstraps and no test regressions. OK for google/main and google/gcc-4_6 branches?

2011-12-17  Easwaran Raman  era...@google.com

	* common.opt (fsized-delete): New option.

cp/ChangeLog.google-main:

2011-12-17  Easwaran Raman  era...@google.com

	* decl.c (cxx_init_decl_processing): Specify a function that takes
	a void* and size_t for DELETE_EXPR.
	* call.c (build_op_delete_call): If fsized-delete is used, use the
	variant that takes size_t as the second parameter except when
	deleting a pointer of type void *.

testsuite/ChangeLog.google-main:

2011-12-17  Easwaran Raman  era...@google.com

	* g++.dg/other/sized-delete-1.C: New test.

libstdc++-v3/ChangeLog.google-main:

2011-12-17  Easwaran Raman  era...@google.com

	* libsupc++/del_op.cc (delete): Define a new variant
	void operator delete(void*, std::size_t).
	* libsupc++/new (delete): Declare
	void operator delete(void*, std::size_t) throw();
	* testsuite/util/testsuite_abi.cc (check_version): Add new known
	version GLIBCXX_3.4.18.
	* config/abi/pre/gnu.ver: Add new symbol _ZdlPv[jmy] at version
	GLIBCXX_3.4.18.

On Wed, Dec 14, 2011 at 3:31 AM, Paolo Carlini paolo.carl...@oracle.com wrote:

  Hi,

  On Mon, Dec 12, 2011 at 12:41 PM, Paolo Carlini paolo.carl...@oracle.com wrote:

    On 12/12/2011 09:37 PM, Easwaran Raman wrote:

      Thanks for the comments Paolo. I have attached the new patch. I have bumped the version to 3.4.18

    You shouldn't: 4.7 is not out yet, thus there is no reason to increase the minor version beyond the current 17.

  Ok, I then don't understand your comment "Note that backporting the patch as-is to 4_6-branch would be very wrong in terms of ABI (in mainline we already have a 3.4.17)." My original patch added the new symbol in version 3.4.17. Since we don't want to add the symbol to 3.4.16 (if we have a situation where the new runtime is not available when running a program compiled with -fsized-delete) and you said I shouldn't be using 3.4.17, I assumed I had to bump up the version.

  Note this is going to be an academic discussion, because it's too late for 4.7 anyway, and I'm not even sure it's conforming to add overloads like this for operators. Anyway. For the 4.6 branch at this point the situation would be very difficult. We could have a small time window before 4.7 is out to move **all** its new symbols vs 4.6, currently in 3.4.17, to a new 3.4.18, and then we could bump 4.6 to 3.4.17. You see, after a minor version is delivered you cannot add anything to it, and you also have the general requirement that the minor version is the minor version: irrespective of whether we are talking about gcc 4.6 or gcc 4.7, a specific minor version number defines which symbols are exported. I hope the issues are more clear now. You must always be very careful.

    and used _ZdlPv[jmy] in gnu.ver. I have also added the symbol to baseline_symbols.txt of other targets.

    You should not; just read again what I wrote. And you don't have to believe me: just browse the libstdc++ ChangeLogs and see if somebody ever does that when the linker map is touched.

  Sorry, I again misunderstood this as well (and still don't have a good picture). Is the part which adds _ZdlPv[jmy] in gnu.ver ok?

  Looks fine, yes. But I didn't really check the letters.
  General rule of thumb when fiddling with linker scripts: double check what you are doing with a multilib system, like x86_64; at least you can catch errors when you are mistakenly not accounting for the 32-bit version of the symbols. In the case at issue, size_t can boil down to unsigned int, unsigned long, or unsigned long long; make sure the pattern includes all three.

  I added that by mimicking the symbol _Znw[jmy] found in the same file. From the log, it looks like the baseline_symbols.txt files seem to be generated, but I am not sure how that is to be done. For example, r145437 says a bunch of these baseline_symbols.txt are regenerated, but I don't see any other change from which this might be generated.

  General rule of thumb: normally, if you aren't a release manager or a maintainer, **never** regenerate the baseline_symbols files. Just add symbols to the linker script; that is normally fine and doesn't break anything per se. Adding is fine (subtracting/changing is not, of course).

  Paolo.

Index: libstdc++-v3/libsupc++/del_op.cc
===
--- libstdc++-v3/libsupc++/del_op.cc (revision 182251)
+++ libstdc++-v3/libsupc++/del_op.cc (working copy)
@@ -46,3 +46,11 @@ operator delete(void* ptr) throw ()
   if (ptr)
     std::free(ptr);
 }
+
+_GLIBCXX_WEAK_DEFINITION void
+operator delete(void* ptr,
+		std::size_t bytes __attribute__((__unused__))) throw ()
+{
+  if (ptr)
+    std::free(ptr);
+}
Index: libstdc++-v3/libsupc++/new
===
--- libstdc++-v3/libsupc++/new (revision 182251)
+++ libstdc++-v3/libsupc++/new (working copy)
@@ -93,6 +93,7 @@ namespace std
 void* operator new(std::size_t) throw (std::bad_alloc);
 void* operator new[](std::size_t) throw (std::bad_alloc);
 void operator delete(void*) throw();
+void operator delete(void*, std::size_t) throw();
 void operator delete[](void*) throw();
 void* operator new(std::size_t, const std::nothrow_t&) throw();
 void* operator new[](std::size_t, const std::nothrow_t&) throw();
Index: libstdc++-v3/testsuite/util/testsuite_abi.cc
===
--- libstdc++-v3/testsuite/util/testsuite_abi.cc (revision 182251)
+++ libstdc++-v3/testsuite/util/testsuite_abi.cc (working copy)
@@ -195,6 +195,7 @@ check_version(symbol& test, bool added)
   known_versions.push_back("GLIBCXX_3.4.15");
   known_versions.push_back
Re: [google] Add support for delete operator that takes the size of the object as a parameter
On Mon, Dec 12, 2011 at 1:52 PM, Easwaran Raman era...@google.com wrote: On Mon, Dec 12, 2011 at 12:41 PM, Paolo Carlini paolo.carl...@oracle.com wrote: On 12/12/2011 09:37 PM, Easwaran Raman wrote: Thanks for the comments Paolo. I have attached the new patch. I have bumped the version to 3.4.18 You shouldn't: 4.7 is not out yet, thus no reason to increase the minor version beyond the current 17. Ok, I then don't understand your comment Note that backporting the patch as-is to 4_6-branch would be very wrong in terms of ABI (in mainline we already have a 3.4.17). My original patch added the new symbol in version 3.4.17. Since we don't want to add the symbol to 3.4.16 (if we have a situation where the new runtime is not available when running a program compiled with -fsized-delete) and you said I shouldn't be using 3.4.17, I assumed I had to bump up the version. and used _ZdlPv[jmy] in gnu.ver. I have also added the symbol to baseline_symbols.txt of other targets. You should not, just read again what I wrote. And you don't have to believe me: just browse the libstdc++ ChangeLogs and see if somebody ever does that when the linker map is touched. Sorry, I again misunderstood this as well (and still don't have a good picture). Is the part which adds _ZdlPv[jmy] in gnu.ver ok? I added that by mimicking the symbol _Znw[jmy] found in the same file. From the log, it looks like the baseline_symbols.txt seems to be generated, but I am not sure how that is to be done. For example, r145437 says a bunch of these baseline_symbols.txt are regenerated, but I don't see any other change from which this might be generated. Thanks, Easwaran It looks like running 'make new-abi-baseline' under TARGET/libstdc++-v3/testsuite generates the baseline file. Should I check in that? What about other targets? Thanks, Easwaran Paolo.
Re: [google] Add support for delete operator that takes the size of the object as a parameter
Thanks for the comments Paolo. I have attached the new patch. I have bumped the version to 3.4.18 and used _ZdlPv[jmy] in gnu.ver. I have also added the symbol to baseline_symbols.txt of other targets. -Easwaran 2011-12-11 Easwaran Raman era...@google.com * common.opt (fsized-delete): New option. cp/ChangeLog.google-4_6: 2011-12-11 Easwaran Raman era...@google.com * decl.c (cxx_init_decl_processing): Specify a function that takes a void* and size_t for DELETE_EXPR. * call.c (build_op_delete_call): If fsized-delete is used, use the variant that takes size_t as the second parameter except when deleteting a pointer of type void *. testsuite/ChangeLog.google-4_6: 2011-12-11 Easwaran Raman era...@google.com * g++.dg/other/sized-delete-1.C: New test. libstdc++-v3/ChangeLog.google-4_6: 2011-12-11 Easwaran Raman era...@google.com * libsupc++/del_op.cc (delete): Define a new variant void operator delete(void*, std::size_t). * libsupc++/new (delete): Declare void operator delete(void*, std::size_t) throw(); * testsuite/util/testsuite_abi.cc (check_version): Add new known version GLIBCXX_3.4.18. * config/abi/post/i386-linux-gnu/baseline_symbols.txt: Add new symbol _ZdlPvm@@GLIBCXX_3.4.18. * config/abi/post/sparc-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/mips64-linux-gnu/64/baseline_symbols.txt: Likewise. * config/abi/post/mips64-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/alpha-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/i486-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/s390-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/powerpc64-linux-gnu/32/baseline_symbols.txt: Likewise. * config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/mips-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt: Likewise. * config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt: Likewise. * config/abi/post/ia64-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.10/amd64/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.10/sparcv9/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.10/baseline_symbols.txt: Likewise. * config/abi/post/powerpc-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/hppa-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/s390x-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.8/sparcv9/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.8/baseline_symbols.txt: Likewise. * config/abi/pre/gnu.ver: Add new symbol _ZdlPv[jmy] at version GLIBCXX_3.4.18. On Mon, Dec 12, 2011 at 5:25 AM, Paolo Carlini paolo.carl...@oracle.com wrote: On 12/12/2011 02:21 PM, Diego Novillo wrote: Ah, right. I missed the ABI implications. For possible inclusion in mainline too, things don't seem completely ok: nothing should be added to the baseline and very likely the export should be adjusted to accommodate for different size_t on the various targets, by using [] in the pattern. See, eg, the existing operator new[](size_t). Paolo. 
Index: libstdc++-v3/libsupc++/del_op.cc
===
--- libstdc++-v3/libsupc++/del_op.cc (revision 182251)
+++ libstdc++-v3/libsupc++/del_op.cc (working copy)
@@ -46,3 +46,11 @@ operator delete(void* ptr) throw ()
   if (ptr)
     std::free(ptr);
 }
+
+_GLIBCXX_WEAK_DEFINITION void
+operator delete(void* ptr,
+		std::size_t bytes __attribute__((__unused__))) throw ()
+{
+  if (ptr)
+    std::free(ptr);
+}
Index: libstdc++-v3/libsupc++/new
===
--- libstdc++-v3/libsupc++/new (revision 182251)
+++ libstdc++-v3/libsupc++/new (working copy)
@@ -93,6 +93,7 @@ namespace std
 void* operator new(std::size_t) throw (std::bad_alloc);
 void* operator new[](std::size_t) throw (std::bad_alloc);
 void operator delete(void*) throw();
+void operator delete(void*, std::size_t) throw();
 void operator delete[](void*) throw();
 void* operator new(std::size_t, const std::nothrow_t&) throw();
 void* operator new[](std::size_t, const std::nothrow_t&) throw();
Index: libstdc++-v3/testsuite/util/testsuite_abi.cc
===
--- libstdc++-v3/testsuite/util/testsuite_abi.cc (revision 182251)
+++ libstdc++-v3/testsuite/util/testsuite_abi.cc (working copy)
@@ -195,6 +195,7 @@ check_version(symbol& test, bool added)
   known_versions.push_back("GLIBCXX_3.4.15");
   known_versions.push_back
Re: [google] Add support for delete operator that takes the size of the object as a parameter
On Mon, Dec 12, 2011 at 12:41 PM, Paolo Carlini paolo.carl...@oracle.com wrote: On 12/12/2011 09:37 PM, Easwaran Raman wrote: Thanks for the comments Paolo. I have attached the new patch. I have bumped the version to 3.4.18 You shouldn't: 4.7 is not out yet, thus no reason to increase the minor version beyond the current 17. Ok, I then don't understand your comment Note that backporting the patch as-is to 4_6-branch would be very wrong in terms of ABI (in mainline we already have a 3.4.17). My original patch added the new symbol in version 3.4.17. Since we don't want to add the symbol to 3.4.16 (if we have a situation where the new runtime is not available when running a program compiled with -fsized-delete) and you said I shouldn't be using 3.4.17, I assumed I had to bump up the version. and used _ZdlPv[jmy] in gnu.ver. I have also added the symbol to baseline_symbols.txt of other targets. You should not, just read again what I wrote. And you don't have to believe me: just browse the libstdc++ ChangeLogs and see if somebody ever does that when the linker map is touched. Sorry, I again misunderstood this as well (and still don't have a good picture). Is the part which adds _ZdlPv[jmy] in gnu.ver ok? I added that by mimicking the symbol _Znw[jmy] found in the same file. From the log, it looks like the baseline_symbols.txt seems to be generated, but I am not sure how that is to be done. For example, r145437 says a bunch of these baseline_symbols.txt are regenerated, but I don't see any other change from which this might be generated. Thanks, Easwaran Paolo.
[google] Add support for delete operator that takes the size of the object as a parameter
In the tcmalloc memory allocator (http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html), deallocation involves a costly lookup to get the size of the deleted object. The size is required to find the right free list to release the object back to. By passing the size of the object to the delete call, this lookup can be avoided. This patch adds support for operator delete(void*, size_t), guarded by a -fsized-delete flag. It also adds a default implementation of delete(void*, size_t) that ignores the size parameter. Bootstraps and no test regressions. OK for google/gcc-4_6 branch?

---
2011-12-11  Easwaran Raman  era...@google.com

	* common.opt (fsized-delete): New option.

cp/ChangeLog.google-4_6:

2011-12-11  Easwaran Raman  era...@google.com

	* decl.c (cxx_init_decl_processing): Specify a function that takes
	a void* and size_t for DELETE_EXPR.
	* call.c (build_op_delete_call): If fsized-delete is used, use the
	variant that takes size_t as the second parameter except when
	deleting a pointer of type void *.

testsuite/ChangeLog.google-4_6:

2011-12-11  Easwaran Raman  era...@google.com

	* g++.dg/other/sized-delete-1.C: New test.

libstdc++-v3/ChangeLog.google-4_6:

2011-12-11  Easwaran Raman  era...@google.com

	* libsupc++/del_op.cc (delete): Define a new variant
	void operator delete(void*, std::size_t).
	* libsupc++/new (delete): Declare
	void operator delete(void*, std::size_t) throw();
	* testsuite/util/testsuite_abi.cc (check_version): Add new known
	version GLIBCXX_3.4.17.
	* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Add new
	symbol _ZdlPvm@@GLIBCXX_3.4.17.
	* config/abi/pre/gnu.ver: Add new symbol _ZdlPvm at version
	GLIBCXX_3.4.17.

Index: libstdc++-v3/libsupc++/del_op.cc
===
--- libstdc++-v3/libsupc++/del_op.cc (revision 182213)
+++ libstdc++-v3/libsupc++/del_op.cc (working copy)
@@ -46,3 +46,11 @@ operator delete(void* ptr) throw ()
   if (ptr)
     std::free(ptr);
 }
+
+_GLIBCXX_WEAK_DEFINITION void
+operator delete(void* ptr,
+		std::size_t bytes __attribute__((__unused__))) throw ()
+{
+  if (ptr)
+    std::free(ptr);
+}
Index: libstdc++-v3/libsupc++/new
===
--- libstdc++-v3/libsupc++/new (revision 182213)
+++ libstdc++-v3/libsupc++/new (working copy)
@@ -93,6 +93,7 @@ namespace std
 void* operator new(std::size_t) throw (std::bad_alloc);
 void* operator new[](std::size_t) throw (std::bad_alloc);
 void operator delete(void*) throw();
+void operator delete(void*, std::size_t) throw();
 void operator delete[](void*) throw();
 void* operator new(std::size_t, const std::nothrow_t&) throw();
 void* operator new[](std::size_t, const std::nothrow_t&) throw();
Index: libstdc++-v3/testsuite/util/testsuite_abi.cc
===
--- libstdc++-v3/testsuite/util/testsuite_abi.cc (revision 182213)
+++ libstdc++-v3/testsuite/util/testsuite_abi.cc (working copy)
@@ -194,6 +194,7 @@ check_version(symbol& test, bool added)
   known_versions.push_back("GLIBCXX_3.4.14");
   known_versions.push_back("GLIBCXX_3.4.15");
   known_versions.push_back("GLIBCXX_3.4.16");
+  known_versions.push_back("GLIBCXX_3.4.17");
   known_versions.push_back("GLIBCXX_LDBL_3.4");
   known_versions.push_back("GLIBCXX_LDBL_3.4.7");
   known_versions.push_back("GLIBCXX_LDBL_3.4.10");
@@ -560,4 +561,3 @@ demangle(const std::string& mangled)
   }
   return name;
 }
-
Index: libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt
===
--- libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt (revision 182213)
+++ libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt (working copy)
@@ -2422,6 +2422,7 @@
 FUNC:_ZTv0_n24_NSt9strstreamD1Ev@@GLIBCXX_3.4
 FUNC:_ZdaPv@@GLIBCXX_3.4
 FUNC:_ZdaPvRKSt9nothrow_t@@GLIBCXX_3.4
 FUNC:_ZdlPv@@GLIBCXX_3.4
+FUNC:_ZdlPvm@@GLIBCXX_3.4.17
 FUNC:_ZdlPvRKSt9nothrow_t@@GLIBCXX_3.4
 FUNC:_Znam@@GLIBCXX_3.4
 FUNC:_ZnamRKSt9nothrow_t@@GLIBCXX_3.4
Index: libstdc++-v3/config/abi/pre/gnu.ver
===
--- libstdc++-v3/config/abi/pre/gnu.ver (revision 182213)
+++ libstdc++-v3/config/abi/pre/gnu.ver (working copy)
@@ -1279,6 +1279,13 @@ GLIBCXX_3.4.16 {

 } GLIBCXX_3.4.15;

+GLIBCXX_3.4.17 {
+
+  # operator delete(void*, unsigned long)
+  _ZdlPvm;
+
+} GLIBCXX_3.4.16;
+
 # Symbols in the support library (libsupc++) have their own tag.
 CXXABI_1.3 {

Index: gcc/testsuite/g++.dg/other/sized-delete-1.C
===
--- gcc/testsuite/g++.dg/other/sized-delete-1.C (revision 0)
+++ gcc/testsuite/g++.dg/other/sized-delete-1.C (revision
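To see why the size parameter pays off, here is a hypothetical, heavily simplified size-class free-list sketch (not tcmalloc's code, and the 16-byte class width is an assumption): with the size supplied, picking the free list is pure arithmetic; without it, the allocator must first map the pointer back to its span to recover the size.

  #include <cstddef>

  struct FreeList { void *head; };
  static FreeList lists[65];                 // classes for n <= 1024 bytes

  static inline unsigned int size_class (std::size_t n)
  {
    return (unsigned int) ((n + 15) / 16);   // 16-byte size classes
  }

  void deallocate_sized (void *p, std::size_t n)
  {
    FreeList &l = lists[size_class (n)];     // no pointer-to-size lookup
    *static_cast<void **> (p) = l.head;      // push block onto its list
    l.head = p;
  }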
[google] Fix bugs in sampled profile collection
This fixes two issues with sampled profile collection. It delays the cleanup of instrumentation_to_be_sampled until after all callgraph nodes have been instrumented, and prevents gcov_sample_counter_decl and gcov_sampling_rate_decl from being garbage collected. Ok for google/gcc-4_6 and google/main branches?

-Easwaran

2011-09-30  Easwaran Raman  era...@google.com

	* tree-profile.c (gcov_sample_counter_decl): Add GTY marker.
	(gcov_sampling_rate_decl): Likewise.
	(add_sampling_to_edge_counters): Do not free
	instrumentation_to_be_sampled.
	(cleanup_instrumentation_sampling): New function.
	(tree_profiling): Call cleanup_instrumentation_sampling at the end.

testsuite/ChangeLog.google-4_6:

2011-09-30  Easwaran Raman  era...@google.com

	* gcc.dg/sample-profile-generate-1.c: New test.

Index: gcc/testsuite/gcc.dg/sample-profile-generate-1.c
===
--- gcc/testsuite/gcc.dg/sample-profile-generate-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/sample-profile-generate-1.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fprofile-generate -fprofile-generate-sampling" } */
+
+void foobar(int);
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      foobar(i);
+    }
+}
+
+void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      foobar(i);
+    }
+}
+
+/* { dg-final { cleanup-coverage-files } } */
Index: tree-profile.c
===
--- tree-profile.c	(revision 178897)
+++ tree-profile.c	(working copy)
@@ -163,10 +163,10 @@ init_ic_make_global_vars (void)
 static struct pointer_set_t *instrumentation_to_be_sampled = NULL;

 /* extern __thread gcov_unsigned_t __gcov_sample_counter  */
-static tree gcov_sample_counter_decl = NULL_TREE;
+static GTY(()) tree gcov_sample_counter_decl = NULL_TREE;

 /* extern gcov_unsigned_t __gcov_sampling_rate  */
-static tree gcov_sampling_rate_decl = NULL_TREE;
+static GTY(()) tree gcov_sampling_rate_decl = NULL_TREE;

 /* forward declaration.  */
 void gimple_init_instrumentation_sampling (void);
@@ -281,9 +281,13 @@ add_sampling_to_edge_counters (void)
           break;
         }
     }
+}

+static void
+cleanup_instrumentation_sampling (void)
+{
   /* Free the bitmap.  */
-  if (instrumentation_to_be_sampled)
+  if (flag_profile_generate_sampling && instrumentation_to_be_sampled)
     {
       pointer_set_destroy (instrumentation_to_be_sampled);
       instrumentation_to_be_sampled = NULL;
@@ -1452,6 +1456,7 @@ tree_profiling (void)
     }

   del_node_map();
+  cleanup_instrumentation_sampling();
   return 0;
 }
Re: [google] Linker plugin to do function reordering using callgraph edge profiles (issue5124041)
+static inline hashval_t
+edge_hash_function (unsigned int id1, unsigned int id2)
+{
+  /* If the number of functions is less than 1000, this gives a unique value
+     for every function id combination.  */
+  const int MULTIPLIER = 1000;
+  return id1 * MULTIPLIER + id2;

Change this to (id1 << 16) | id2.

+  if (edge_map == NULL)
+    {
+      edge_map = htab_create (25, edge_map_htab_hash_descriptor,
+			      edge_map_htab_eq_descriptor, NULL);

Use a larger size and don't hardcode the value.

+  if (function_map == NULL)
+    {
+      function_map = htab_create (25, function_map_htab_hash_descriptor,
+				  function_map_htab_eq_descriptor, NULL);

Use a larger size and don't hardcode the value.

+  caller_node = get_function_node (caller);
+  /* Real nodes are nodes that correspond to .text sections found.  These will
+     definitely be sorted.  */
+  set_as_real_node (caller_node);

This is redundant, as set_node_type sets the type correctly.

+  if (*slot == NULL)
+    {
+      reset_functions (edge, new_node, kept_node);
+      *slot = edge;
+      add_edge_to_node (new_node, edge);

edge will still be in old_node's edge_list.

+static void set_node_type (Node *n)
+{
+  void **slot;
+  char *name = n->name;
+  slot = htab_find_slot_with_hash (section_map, name, htab_hash_string (name),
+				   INSERT);

Use htab_find_with_hash or pass NO_INSERT.

+  /* Allocate section_map.  */
+  if (section_map == NULL)
+    {
+      section_map = htab_create (10, section_map_htab_hash_descriptor,
+				 section_map_htab_eq_descriptor, NULL);

With -ffunction-sections, this should be roughly equal to the size of function_map.

+static void
+write_out_node (FILE *fp, char *name,
+		void **handles, unsigned int *shndx, int position)
+{
+  void **slot;
+  slot = htab_find_slot_with_hash (section_map, name, htab_hash_string (name),
+				   INSERT);

Use htab_find_with_hash or pass NO_INSERT.

Index: function_reordering_plugin/function_reordering_plugin.c
===
--- function_reordering_plugin/function_reordering_plugin.c (revision 0)
+++ function_reordering_plugin/function_reordering_plugin.c (revision 0)

+      parse_callgraph_section_contents (section_contents, (unsigned int)length);
+    }
+  else if (name != NULL)
+    free (name);

name should be freed in the bodies of the then and else-if branches as well.

-Easwaran

On Mon, Sep 26, 2011 at 5:22 PM, Sriraman Tallam tmsri...@google.com wrote:

  *Resending as plain text*

  I am attaching a patch of the updated files. This patch was meant for the google gcc-4_6 branch. Sorry for not mentioning this upfront in my original mail.

  Thanks,
  -Sri.

On Mon, Sep 26, 2011 at 9:53 AM, Sriraman Tallam tmsri...@google.com wrote:

On Mon, Sep 26, 2011 at 9:22 AM, Nathan Froyd nfr...@mozilla.com wrote:

  On 9/23/2011 6:03 PM, Sriraman Tallam wrote:

    This patch adds a new linker plugin to re-order functions.

  This is great stuff. We were experimenting with using the coverage files to generate an ordering for --section-ordering-file, but this might be even better; we will have to experiment with it.

  A couple of comments on the code itself:

  Index: function_reordering_plugin/callgraph.h

  +inline static Edge_list *
  +make_edge_list (Edge *e)
  +{
  +  Edge_list *list = (Edge_list *)malloc (sizeof (Edge_list));

  If you are going to use libiberty via hashtab.h, you might as well make use of the *ALLOC family of macros to make this and other allocations a little neater.

  +/* Represents an edge in the call graph.  */
  +struct __edge__

  I think the usual convention is to call this edge_d or something similar, avoiding the profusion of underscores.

  +void
  +map_section_name_to_index (char *section_name, void *handle, int shndx)
  +{
  +  void **slot;
  +  char *function_name;
  +
  +  if (is_prefix_of (".text.hot.", section_name))
  +    function_name = section_name + 10 /* strlen (".text.hot.") */;
  +  else if (is_prefix_of (".text.unlikely.", section_name))
  +    function_name = section_name + 15 /* strlen (".text.unlikely.") */;
  +  else if (is_prefix_of (".text.cold.", section_name))
  +    function_name = section_name + 11 /* strlen (".text.cold.") */;
  +  else if (is_prefix_of (".text.startup.", section_name))
  +    function_name = section_name + 14 /* strlen (".text.startup.") */;
  +  else
  +    function_name = section_name + 6 /* strlen (".text.") */;

  You don't handle plain .text; this is assuming -ffunction-sections, right? Can this limitation be easily removed?

  You need to compile with -ffunction-sections or the individual sections cannot be re-ordered. That is why I assumed a .text.* prefix before every function section. Did you have something else in mind?

  Thanks for your other comments. I will make those changes.

  -Sri.

  I think it is customary to put the comment after the semicolon. Might just be easier to
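For reference, the hash suggested in the review above, written out (hashval_t is typedef'd here for self-containment; in the plugin it comes from libiberty's hashtab.h). The pair is a unique key as long as function ids fit in 16 bits, and beyond that it degrades into an ordinary, still-usable hash:

  typedef unsigned int hashval_t;   // as in libiberty's hashtab.h

  static inline hashval_t
  edge_hash_function (unsigned int id1, unsigned int id2)
  {
    return (id1 << 16) | id2;       // caller id in the high half, callee in the low
  }

Compared with the id1 * 1000 + id2 version, this removes the multiply and lifts the 1000-function limit on uniqueness.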
Re: [google] Linker plugin to do function reordering using callgraph edge profiles (issue5124041)
OK for google/gcc-4_6 and google/main branches. -Easwaran On Tue, Sep 27, 2011 at 4:07 PM, Sriraman Tallam tmsri...@google.com wrote: Made all the changes. Attaching new patch of updated files. On Tue, Sep 27, 2011 at 11:26 AM, Easwaran Raman era...@google.com wrote: +static inline hashval_t +edge_hash_function (unsigned int id1, unsigned int id2) +{ + /* If the number of functions is less than 1000, this gives a unique value + for every function id combination. */ + const int MULTIPLIER = 1000; + return id1* MULTIPLIER + id2; Change to id1 16 | id2 + if (edge_map == NULL) + { + edge_map = htab_create (25, edge_map_htab_hash_descriptor, + edge_map_htab_eq_descriptor , NULL); Use a larger size and don't hardcode the value. + if (function_map == NULL) + { + function_map = htab_create (25, function_map_htab_hash_descriptor, + function_map_htab_eq_descriptor , NULL); Use a larger size and don't hardcode the value. + caller_node = get_function_node (caller); + /* Real nodes are nodes that correspond to .text sections found. These will + definitely be sorted. */ + set_as_real_node (caller_node); This is redundant as set_node_type sets the type correctly. + if (*slot == NULL) + { + reset_functions (edge, new_node, kept_node); + *slot = edge; + add_edge_to_node (new_node, edge); edge will still be in old_node's edge_list I do not have to explicitly delete this as the merged node's edge_list will never be traversed except when cleaning up. +static void set_node_type (Node *n) +{ + void **slot; + char *name = n-name; + slot = htab_find_slot_with_hash (section_map, name, htab_hash_string (name), + INSERT); Use htab_find_with_hash or pass NOINSERT. + /* Allocate section_map. */ + if (section_map == NULL) + { + section_map = htab_create (10, section_map_htab_hash_descriptor, + section_map_htab_eq_descriptor , NULL); With -ffunction-sections, this should be roughly equal to size of function_map. +static void +write_out_node (FILE *fp, char *name, + void **handles, unsigned int *shndx, int position) +{ + void **slot; + slot = htab_find_slot_with_hash (section_map, name, htab_hash_string (name), + INSERT); Use htab_find_with_hash or pass NOINSERT. Index: function_reordering_plugin/function_reordering_plugin.c === --- function_reordering_plugin/function_reordering_plugin.c (revision 0) +++ function_reordering_plugin/function_reordering_plugin.c (revision 0) + parse_callgraph_section_contents (section_contents, (unsigned int)length); + } + else if (name != NULL) + free (name); name should be freed in the body of then and else-if as well. -Easwaran On Mon, Sep 26, 2011 at 5:22 PM, Sriraman Tallam tmsri...@google.com wrote: *Resending as plain text* I am attaching a patch of the updated files. This patch was meant for the google gcc-4_6 branch. Sorry for not mentioning this upfront in my original mail. Thanks, -Sri. On Mon, Sep 26, 2011 at 9:53 AM, Sriraman Tallam tmsri...@google.com wrote: On Mon, Sep 26, 2011 at 9:22 AM, Nathan Froyd nfr...@mozilla.com wrote: On 9/23/2011 6:03 PM, Sriraman Tallam wrote: This patch adds a new linker plugin to re-order functions. This is great stuff. We were experimenting with using the coverage files to generate an ordering for --section-ordering-file, but this might be even better, will have to experiment with it. 
A couple of comments on the code itself: Index: function_reordering_plugin/callgraph.h +inline static Edge_list * +make_edge_list (Edge *e) +{ + Edge_list *list = (Edge_list *)malloc (sizeof (Edge_list)); If you are going to use libiberty via hashtab.h, you might as well make use of the *ALLOC family of macros to make this and other allocations a little neater. +/*Represents an edge in the call graph. */ +struct __edge__ I think the usual convention is to call this edge_d or something similar, avoiding the profusion of underscores. +void +map_section_name_to_index (char *section_name, void *handle, int shndx) +{ + void **slot; + char *function_name; + + if (is_prefix_of (.text.hot., section_name)) + function_name = section_name + 10 /* strlen (.text.hot.) */; + else if (is_prefix_of (.text.unlikely., section_name)) + function_name = section_name + 15 /* strlen (.text.unlikely.) */; + else if (is_prefix_of (.text.cold., section_name)) + function_name = section_name + 11 /* strlen (.text.cold
[google] Backport r175063, r175082 and r175384 from trunk to google/gcc-4_6 branch.
Committed.
Re: Mark variables addressable if they are copied using libcall in RTL expander
On Thu, Jun 23, 2011 at 3:22 AM, Eric Botcazou ebotca...@adacore.com wrote:

  So, what's the patch(es) that need approval now?

  Original expr.c patch for PR rtl-optimization/49429 + adjusted and augmented calls.c patch for PR target/49454. Everything is in this thread. Easwaran, would you mind posting a consolidated patch?

  --
  Eric Botcazou

Here is the revised patch. Bootstraps and all tests pass on x86_64-unknown-linux. OK for trunk?

2011-06-23  Easwaran Raman  era...@google.com

	PR rtl-optimization/49429
	PR target/49454
	* expr.c (emit_block_move_hints): Mark MEM_EXPR(x) and
	MEM_EXPR(y) addressable if emit_block_move_via_libcall is
	used to copy y into x.
	* calls.c (initialize_argument_information): Mark
	an argument addressable if it is passed by invisible reference.
	(emit_library_call_value_1): Mark MEM_EXPR (val) addressable
	if it is passed by reference.

Index: gcc/expr.c
===
--- gcc/expr.c (revision 175346)
+++ gcc/expr.c (working copy)
@@ -1181,8 +1181,19 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enu
   else if (may_use_call
	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
-    retval = emit_block_move_via_libcall (x, y, size,
-					  method == BLOCK_OP_TAILCALL);
+    {
+      /* Since x and y are passed to a libcall, mark the corresponding
+         tree EXPR as addressable.  */
+      tree y_expr = MEM_EXPR (y);
+      tree x_expr = MEM_EXPR (x);
+      if (y_expr)
+        mark_addressable (y_expr);
+      if (x_expr)
+        mark_addressable (x_expr);
+      retval = emit_block_move_via_libcall (x, y, size,
+                                            method == BLOCK_OP_TAILCALL);
+    }
+
   else
     emit_block_move_via_loop (x, y, size, align);
Index: gcc/calls.c
===
--- gcc/calls.c (revision 175346)
+++ gcc/calls.c (working copy)
@@ -1084,6 +1084,8 @@ initialize_argument_information (int num_actuals,
	      && TREE_CODE (base) != SSA_NAME
	      && (!DECL_P (base) || MEM_P (DECL_RTL (base))))
	    {
+	      mark_addressable (args[i].tree_value);
+
	      /* We can't use sibcalls if a callee-copied argument is
		 stored in the current function's frame.  */
	      if (!call_from_thunk_p && DECL_P (base) && !TREE_STATIC (base))
@@ -3524,7 +3526,12 @@ emit_library_call_value_1 (int retval, rtx orgfun,
	    }

	  if (MEM_P (val) && !must_copy)
-	    slot = val;
+	    {
+	      tree val_expr = MEM_EXPR (val);
+	      if (val_expr)
+	        mark_addressable (val_expr);
+	      slot = val;
+	    }
	  else
	    {
	      slot = assign_temp (lang_hooks.types.type_for_mode (mode, 0),
Re: Mark variables addressable if they are copied using libcall in RTL expander
On Thu, Jun 23, 2011 at 12:16 PM, Jakub Jelinek ja...@redhat.com wrote:

  On Thu, Jun 23, 2011 at 12:02:35PM -0700, Easwaran Raman wrote:

  +      if (y_expr)
  +        mark_addressable (y_expr);

  Please watch formatting, a tab should be used instead of 8 spaces.

  +      if (x_expr)
  +        mark_addressable (x_expr);

  Ditto.

  @@ -1084,6 +1084,8 @@ initialize_argument_information (int num_actuals,
	      && TREE_CODE (base) != SSA_NAME
	      && (!DECL_P (base) || MEM_P (DECL_RTL (base))))
	    {
  +	      mark_addressable (args[i].tree_value);
  +

  Likewise, plus the line is indented too much; each level should be indented by 2 chars.

  Jakub

I have attached a new patch that fixes the formatting issues.

Thanks,
Easwaran

Index: gcc/expr.c
===
--- gcc/expr.c (revision 175346)
+++ gcc/expr.c (working copy)
@@ -1181,8 +1181,19 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enu
   else if (may_use_call
	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
-    retval = emit_block_move_via_libcall (x, y, size,
-					  method == BLOCK_OP_TAILCALL);
+    {
+      /* Since x and y are passed to a libcall, mark the corresponding
+	 tree EXPR as addressable.  */
+      tree y_expr = MEM_EXPR (y);
+      tree x_expr = MEM_EXPR (x);
+      if (y_expr)
+	mark_addressable (y_expr);
+      if (x_expr)
+	mark_addressable (x_expr);
+      retval = emit_block_move_via_libcall (x, y, size,
+					    method == BLOCK_OP_TAILCALL);
+    }
+
   else
     emit_block_move_via_loop (x, y, size, align);
Index: gcc/calls.c
===
--- gcc/calls.c (revision 175346)
+++ gcc/calls.c (working copy)
@@ -1084,6 +1084,8 @@ initialize_argument_information (int num_actuals,
	      && TREE_CODE (base) != SSA_NAME
	      && (!DECL_P (base) || MEM_P (DECL_RTL (base))))
	    {
+	      mark_addressable (args[i].tree_value);
+
	      /* We can't use sibcalls if a callee-copied argument is
		 stored in the current function's frame.  */
	      if (!call_from_thunk_p && DECL_P (base) && !TREE_STATIC (base))
@@ -3524,7 +3526,12 @@ emit_library_call_value_1 (int retval, rtx orgfun,
	    }

	  if (MEM_P (val) && !must_copy)
-	    slot = val;
+	    {
+	      tree val_expr = MEM_EXPR (val);
+	      if (val_expr)
+		mark_addressable (val_expr);
+	      slot = val;
+	    }
	  else
	    {
	      slot = assign_temp (lang_hooks.types.type_for_mode (mode, 0),
Re: Mark variables addressable if they are copied using libcall in RTL expander
On Wed, Jun 22, 2011 at 7:13 AM, Eric Botcazou ebotca...@adacore.com wrote: I fear this isn't enough considering pass-by-value aggregates that are callee copied. It's indeed not sufficient for arguments passed by reference but callee-copied. This is PR target/49454. For gcc.c-torture/execute/2717-1.c: typedef struct trio { int a, b, c; } trio; int foo (trio t, int i) { return bar (i, t); } yields in the .optimized dump: foo (struct trio t, int i) { int D.1968; struct trio t.0; <bb 2>: t.0 = t; D.1968_2 = bar (i_1(D), t.0); return D.1968_2; } and the aggregate copy is elided by DSE because t.0 isn't may_be_aliased. This seems to be a pre-existing bug though: its address is passed to bar in RTL. -- Eric Botcazou Is the following patch a reasonable fix for this case? I assume I should add similar code inside emit_library_call_value_1. -Easwaran --- gcc/calls.c (revision 175081) +++ gcc/calls.c (working copy) @@ -1073,6 +1073,8 @@ initialize_argument_information (int num_actuals A callee_copies = reference_callee_copied (args_so_far, TYPE_MODE (type), type, argpos < n_named_args); + if (callee_copies) +mark_addressable (args[i].tree_value); /* If we're compiling a thunk, pass through invisible references instead of making a copy. */
Re: Improve DSE in the presence of calls
This patch seems to break ia64 and some other targets. I have updated Bug 49429 with a test case and triage. It looks like for some EXPR, may_be_aliased returns an incorrect value and hence can_escape incorrectly concludes the variable can't escape, resulting in removal of useful stores. In the following list of instructions, (insn 4 3 6 2 (set (mem/s/c:DI (reg/f:DI 341) [2 s1+0 S8 A64]) (reg:DI 112 in0)) y.c:23 5 {movdi_internal} (expr_list:REG_DEAD (reg:DI 112 in0) (nil))) ... (insn 36 30 37 2 (set (reg:DI 120 out0) (reg/f:DI 357)) 5 {movdi_internal} (expr_list:REG_EQUAL (plus:DI (reg/f:DI 328 sfp) (const_int 62 [0x3e])) (nil))) (insn 37 36 38 2 (set (reg:DI 121 out1) (reg/f:DI 341)) 5 {movdi_internal} (expr_list:REG_DEAD (reg/f:DI 341) (expr_list:REG_EQUAL (plus:DI (reg/f:DI 328 sfp) (const_int 96 [0x60])) (nil (insn 38 37 39 2 (set (reg:DI 122 out2) (const_int 31 [0x1f])) 5 {movdi_internal} (nil)) (call_insn 39 38 42 2 (parallel [ (set (reg:DI 8 r8) (call (mem:DI (symbol_ref:DI ("memcpy") [flags 0x41] <function_decl 0x770d2e00 memcpy>) [0 memcpy S8 A64]) (const_int 1 [0x1]))) (clobber (reg:DI 320 b0)) (clobber (scratch:DI)) (clobber (scratch:DI)) ]) 332 {call_value_gp} (expr_list:REG_DEAD (reg:DI 122 out2) (expr_list:REG_DEAD (reg:DI 121 out1) (expr_list:REG_DEAD (reg:DI 120 out0) (expr_list:REG_UNUSED (reg:DI 8 r8) (expr_list:REG_EH_REGION (const_int 0 [0]) (nil)) (expr_list:REG_DEP_TRUE (use (reg:DI 1 r1)) (expr_list:REG_DEP_TRUE (use (reg:DI 122 out2)) (expr_list:REG_DEP_TRUE (use (reg:DI 121 out1)) (expr_list:REG_DEP_TRUE (use (reg:DI 120 out0)) (nil)) may_be_aliased() returns false for the memory expression (set (mem/s/c:DI (reg/f:DI 341) [2 s1+0 S8 A64]) (reg:DI 112 in0)), so can_escape() returns false; but later on, in the same BB, register 341 is passed as the second parameter to memcpy. I suspect this is a bug elsewhere which is exposed by this patch. If someone knows why this might be happening, I can tighten the can_escape() function appropriately. Thanks, Easwaran On Tue, Jun 14, 2011 at 9:34 AM, Jeff Law l...@redhat.com wrote: On 05/10/11 13:18, Easwaran Raman wrote: I am not sure I understand the problem here. If there is a wild read from asm, the instruction has the wild_read flag set. The if statement checks if that flag is set and if so it clears the bitmap - which was the original behavior. Originally, only if read_rec is non NULL you need to recompute the kill set. Now, even if read_rec is NULL, non_frame_wild_read could be set requiring the kill set to be modified, which is what this patch does. In fact, isn't what you have written above the equivalent to what is in the patch, as '/* Leave this clause unchanged */' is the same as if (dump_file) fprintf (dump_file, "regular read\n"); scan_reads_nospill (insn_info, v, NULL); -Easwaran Ping. I have changed the test case to use int and added another test case that shows DSE doesn't happen when the struct instance is volatile (wild_read gets set in that case) What's the purpose behind using uint64_t in the testcase? Somehow I suspect using int64_t means the test is likely not going to work across targets with different word sizes. Sorry for the exceedingly long wait. Things have been a bit crazy the last several weeks. On a positive note, re-reading things now I think my objection/comment was mis-guided. Patch approved, and again, sorry for the absurdly long period of non-responsiveness.
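Since the triage above hinges on may_be_aliased, it may help to recall its rough shape in this era; the following is a simplified paraphrase (treat the exact conditions as an assumption, not the verbatim tree-flow-inline.h source; the real predicate also exempts certain read-only statics):

  /* Simplified paraphrase of may_be_aliased.  The key point: a DECL
     that is neither public, external, nor TREE_ADDRESSABLE is deemed
     unaliased, so can_escape concludes it cannot escape.  */
  static inline bool
  may_be_aliased_sketch (const_tree var)
  {
    return (TREE_CODE (var) != CONST_DECL
            && (TREE_PUBLIC (var)
                || DECL_EXTERNAL (var)
                || TREE_ADDRESSABLE (var)));
  }

On the ia64 trace above, s1 evidently reaches RTL without TREE_ADDRESSABLE set even though its address feeds memcpy, which is the pre-existing expand-time problem the mark_addressable patches earlier in this archive address.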
jeff
Re: Ping: Re: Improve DSE in the presence of calls
Ping. Diego, David, Is this patch OK for google/main? -Easwaran On Thu, Jun 2, 2011 at 4:48 PM, Easwaran Raman era...@google.com wrote: Ping. On Sat, May 14, 2011 at 8:01 AM, Easwaran Raman era...@google.com wrote: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00781.html
Re: Ping: Re: Improve DSE in the presence of calls
Ping. On Sat, May 14, 2011 at 8:01 AM, Easwaran Raman era...@google.com wrote: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00781.html
Re: Ping: Re: Improve DSE in the presence of calls
Ping. On Sat, May 14, 2011 at 8:01 AM, Easwaran Raman era...@google.com wrote: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00781.html
Ping: Re: Improve DSE in the presence of calls
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00781.html
Re: Improve DSE in the presence of calls
On Tue, May 3, 2011 at 9:40 AM, Easwaran Raman era...@google.com wrote: On Mon, May 2, 2011 at 8:37 PM, Jeff Law l...@redhat.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 04/26/11 16:06, Easwaran Raman wrote: You're right. The patch has correctness issues. It is not possible to simply not call add_wild_read because it also resets active_local_stores and frees read_recs. During the local phase it seems better to just treat calls as wild reads and reset active_local_stores and free read_recs. I've now refactored add_wild_read so that resetting active_local_stores and free read_recs are in separate methods that can be called on non-const/non-memset calls. In addition, I have added a non_frame_wild_read field in insn_info to mark non-const and non-memset calls. I've attached the revised patch. Looking better. Just a few more things. Don't all locals escape if the callee has a static chain? Is that handled anywhere? Don't you still have the potential for wild reads in dse_step5_nospill (say from an asm)? if so, then the change can't be correct. Couldn't you just add a clause like this before the else-if? else if (insn_info-non_frame_wild_read) { if (dump_file) fprintf (dump_file, non-frame wild read\n); scan_reads_nospill (insn_info, v, NULL); } else if (insn_info-read_rec) { /* Leave this clause unchanged */ } Am I missing something? I am not sure I understand the problem here. If there is a wild read from asm, the instruction has the wild_read flag set. The if statement checks if that flag is set and if so it clears the bitmap - which was the original behavior. Originally, only if read_rec is non NULL you need to recompute the kill set. Now, even if read_rec is NULL, non_frame_wild_read could be set requiring the kill set to be modified, which is what this patch does. In fact, isn't what you have written above the equivalent to what is in the patch as '/* Leave this clause unchanged */' is the same as if (dump_file) fprintf (dump_file, regular read\n); scan_reads_nospill (insn_info, v, NULL); -Easwaran Ping. I have changed the test case to use int and added another test case that shows DSE doesn't happen when the struct instance is volatile (wild_read gets set in that case) What's the purpose behind using unit64_t in the testcase? Somehow I suspect using int64_t means the test is unlikely not going to work across targets with different word sizes. 
Jeff Index: gcc/testsuite/gcc.dg/pr44194-1.c === --- gcc/testsuite/gcc.dg/pr44194-1.c (revision 0) +++ gcc/testsuite/gcc.dg/pr44194-1.c (revision 0) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-dse1" } */ +#include <stdint.h> + +struct ints { int a, b, c; } foo(); +void bar(int a, int b); + +void func() { + struct ints s = foo(); + bar(s.a, s.b); +} +/* { dg-final { scan-rtl-dump "global deletions = 2" "dse1" } } */ +/* { dg-final { cleanup-rtl-dump "dse1" } } */ Index: gcc/testsuite/gcc.dg/pr44194-2.c === --- gcc/testsuite/gcc.dg/pr44194-2.c (revision 0) +++ gcc/testsuite/gcc.dg/pr44194-2.c (revision 0) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-dse1" } */ +#include <stdint.h> + +struct ints { int a, b, c; } foo(); +void bar(int a, int b); + +void func() { + volatile struct ints s = foo(); + bar(s.a, s.b); +} +/* { dg-final { scan-rtl-dump "global deletions = 0" "dse1" } } */ +/* { dg-final { cleanup-rtl-dump "dse1" } } */ Index: gcc/dse.c === --- gcc/dse.c (revision 172844) +++ gcc/dse.c (working copy) @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3. If not see #include "dbgcnt.h" #include "target.h" #include "params.h" +#include "tree-flow.h" /* This file contains three techniques for performing Dead Store Elimination (dse). @@ -326,6 +327,11 @@ struct insn_info contains a wild read, the use_rec will be null. */ bool wild_read; + /* This is true only for CALL instructions which could potentially read + any non-frame memory location. This field is used by the global + algorithm. */ + bool non_frame_wild_read; + /* This field is only used for the processing of const functions
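For readers tracking the clause-ordering debate quoted above, the merged form being argued for would look roughly like this inside dse_step5_nospill (a sketch of the shape under discussion, not the committed hunk):

  if (insn_info->wild_read)
    {
      /* asms and other truly wild reads still kill everything.  */
      if (dump_file)
        fprintf (dump_file, "wild read\n");
      bitmap_clear (v);
    }
  else if (insn_info->read_rec || insn_info->non_frame_wild_read)
    {
      /* Non-const, non-memset calls arrive here with
         non_frame_wild_read set and possibly no read_rec;
         scan_reads_nospill then kills only the escaped locations.  */
      if (dump_file)
        fprintf (dump_file, "regular read\n");
      scan_reads_nospill (insn_info, v, NULL);
    }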
[google]Implement an optimization based on reuse distance profiling (issue4517049)
This patch by Silvius Rus replaces calls to certain functions with a specialized version that uses non-temporal stores based on memory reuse distance profiling. Bootstraps, no test regressions and the profiling works for a small test case. Ok for google/main? -Easwaran 2011-05-09 Silvius Rus silvius@gmail.com * value-prof.c (interesting_stringop_to_profile_p): Disable stringop profiling if reuse distance profiling is turned on. * value-prof.h (gimple_gen_reusedist, optimize_reusedist): Declare. * gcov-io.h: (GCOV_COUNTER_REUSE_DIST): New counter. (GCOV_COUNTERS): Update. (GCOV_COUNTER_NAMES): Add reuse_distance. (GCOV_MERGE_FUNCTIONS): Add __gcov_merge_reusedist. (__gcov_merge_reusedist): New declaration. * profile.c (branch_prob): Enable reuse distance profiling and optimization. * common.opt (fprofile-reusedist, foptimize-locality): New options. * tree-profile.c: Include params.h. (stringop_subst, reusedist_t): New structures. (stringop_subst_t, reusedist_t): New typedefs. (stringop_decl): New global array. (RD_NUM_COUNTERS): New constant. (get_stringop_subst, reusedist_is_interesting_call) (reusedist_instr_func_name, reusedist_get_instr_decl) (reusedist_make_instr_call, reusedist_from_counters) (gimple_gen_reusedist, nt_op_name, reusedist_get_size_threshold) (reusedist_get_distance_large_threshold) (reusedist_get_distance_small_threshold) (reusedist_get_count_threshold, reusedist_nt_is_worth_it) (reusedist_get_nt_decl, maybe_issue_profile_use_note) (reusedist_maybe_replace_with_nt_version, optimize_reusedist): New functions. (gate_tree_profile_ipa): Return true if reuse distance instrumentation or use is turned on. * Makefile.in (LIBGCOV): Add _gcov_merge_reusedist. * libgcov.c (__gcov_weighted_mean2, __gcov_merge_reusedist): New functions. * params.def (PARAM_REUSEDIST_MEAN_DIST_LARGE_THRESH) (PARAM_REUSEDIST_MEAN_DIST_SMALL_THRESH) (PARAM_REUSEDIST_CALL_COUNT_THRESH, PARAM_REUSEDIST_MEMCPY_SIZE_THRESH) (PARAM_REUSEDIST_MEMSET_SIZE_THRESH): New params. Index: gcc/value-prof.c === --- gcc/value-prof.c(revision 173496) +++ gcc/value-prof.c(working copy) @@ -1708,6 +1708,14 @@ interesting_stringop_to_profile_p (tree fndecl, gi { enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl); + /* Disable stringop collection with reuse distance instrumentation + or optimization. Otherwise we end up with hard to correct profile + mismatches for functions where reuse distance-based transformations are + made. We see a number of memcpy at instrumentation time and a different + number of memcpy at profile use time. */ + if (flag_profile_reusedist || flag_optimize_locality) +return false; + if (fcode != BUILT_IN_MEMCPY && fcode != BUILT_IN_MEMPCPY && fcode != BUILT_IN_MEMSET && fcode != BUILT_IN_BZERO) return false; Index: gcc/value-prof.h === --- gcc/value-prof.h(revision 173496) +++ gcc/value-prof.h(working copy) @@ -103,6 +103,8 @@ extern void gimple_gen_const_delta_profiler (histo unsigned, unsigned); extern void gimple_gen_average_profiler (histogram_value, unsigned, unsigned); extern void gimple_gen_ior_profiler (histogram_value, unsigned, unsigned); +extern void gimple_gen_reusedist (void); +extern void optimize_reusedist (void); /* In profile.c. */ extern void init_branch_prob (void); Index: gcc/gcov-io.h === --- gcc/gcov-io.h (revision 173496) +++ gcc/gcov-io.h (working copy) @@ -374,7 +374,8 @@ typedef HOST_WIDEST_INT gcov_type; #define GCOV_LAST_VALUE_COUNTER 8 /* The last of counters used for value profiling. */ #define GCOV_COUNTER_DIRECT_CALL 9 /* Direct call counts.
*/ -#define GCOV_COUNTERS 10 +#define GCOV_COUNTER_REUSE_DIST 10 /* Reuse distance measure. */ +#define GCOV_COUNTERS 11 /* Number of counters used for value profiling. */ #define GCOV_N_VALUE_COUNTERS \ @@ -383,7 +384,8 @@ typedef HOST_WIDEST_INT gcov_type; /* A list of human readable names of the counters */ #define GCOV_COUNTER_NAMES {"arcs", "interval", "pow2", "single", \ "delta", "indirect_call", "average", "ior", \ -"indirect_call_topn", "direct_call"} +"indirect_call_topn", "direct_call", \ + "reuse_distance"} #define GCOV_ICALL_TOPN_VAL 2 /* Track two hottest callees */ #define GCOV_ICALL_TOPN_NCOUNTS 9 /* The number of counter entries per icall
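To sketch the intended end-to-end use (the workflow below is inferred from the option names in the ChangeLog, and the non-temporal entry point name is hypothetical): build and run once with -fprofile-reusedist to fill the new reuse_distance counters, then rebuild with -fprofile-use plus -foptimize-locality so reusedist_maybe_replace_with_nt_version can rewrite qualifying call sites. Conceptually:

  /* Conceptual effect of the use phase (names hypothetical; the real
     substituted decls live in the stringop_decl array mentioned above):

       memcpy (dst, src, n);       // profiled: large mean reuse
                                   // distance, big size, hot count
     becomes
       __nt_memcpy (dst, src, n);  // non-temporal stores bypass the
                                   // cache, avoiding useless evictions

     Call sites below the size/distance/count thresholds in params.def
     are left untouched.  */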
[google] Backport r172837 and r172788 to google/main
Backported r172788 and r172837 from trunk to google/main. 2011-05-06 Easwaran Raman era...@google.com Backport r172837: * cfgexpand.c (stack_var): Remove OFFSET... (add_stack_var): ...and its reference here... (expand_stack_vars): ...and here. (stack_var_cmp): Sort by descending order of size. (partition_stack_vars): Change heuristic. (union_stack_vars): Fix to reflect changes in partition_stack_vars. (dump_stack_var_partition): Add newline after each partition. 2011-05-06 Easwaran Raman era...@google.com Backport r172788: * cfgexpand.c (add_alias_set_conflicts): Add conflicts with a variable containing union type only with -fstrict-aliasing. testsuite/ChangeLog.google-main: 2011-05-06 Easwaran Raman era...@google.com Backport r172837: * gcc.dg/stack-layout-2.c: New test. 2011-05-06 Easwaran Raman era...@google.com Backport r172788: * gcc.dg/stack-layout-1.c: New test. Index: gcc/testsuite/gcc.dg/stack-layout-1.c === --- gcc/testsuite/gcc.dg/stack-layout-1.c (revision 0) +++ gcc/testsuite/gcc.dg/stack-layout-1.c (revision 173499) @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fno-strict-aliasing -fdump-rtl-expand } */ +union U { + int a; + float b; +}; +struct A { + union U u1; + char a[100]; +}; +void bar (struct A *); +void foo () + { +{ + struct A a; + bar (a); +} +{ + struct A a; + bar (a); +} + } + +/* { dg-final { scan-rtl-dump-times Partition 1 expand } } */ +/* { dg-final { cleanup-rtl-dump expand } } */ Index: gcc/testsuite/gcc.dg/stack-layout-2.c === --- gcc/testsuite/gcc.dg/stack-layout-2.c (revision 0) +++ gcc/testsuite/gcc.dg/stack-layout-2.c (revision 173499) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-rtl-expand } */ +void bar( char *); +int foo() +{ + int i=0; + { +char a[8000]; +bar(a); +i += a[0]; + } + { +char a[8192]; +char b[32]; +bar(a); +i += a[0]; +bar(b); +i += a[0]; + } + return i; +} +/* { dg-final { scan-rtl-dump size 8192 expand } } */ +/* { dg-final { scan-rtl-dump size 32 expand } } */ Index: gcc/cfgexpand.c === --- gcc/cfgexpand.c (revision 173498) +++ gcc/cfgexpand.c (revision 173499) @@ -158,11 +158,6 @@ struct stack_var /* The Variable. */ tree decl; - /* The offset of the variable. During partitioning, this is the - offset relative to the partition. After partitioning, this - is relative to the stack frame. */ - HOST_WIDE_INT offset; - /* Initially, the size of the variable. Later, the size of the partition, if this variable becomes it's partition's representative. */ HOST_WIDE_INT size; @@ -267,7 +262,6 @@ add_stack_var (tree decl) v = stack_vars[stack_vars_num]; v-decl = decl; - v-offset = 0; v-size = tree_low_cst (DECL_SIZE_UNIT (SSAVAR (decl)), 1); /* Ensure that all variables have size, so that a != b for any two variables that are simultaneously live. */ @@ -372,8 +366,9 @@ add_alias_set_conflicts (void) to elements will conflict. In case of unions we have to be careful as type based aliasing rules may say access to the same memory does not conflict. So play - safe and add a conflict in this case. */ - || contains_union) + safe and add a conflict in this case when + -fstrict-aliasing is used. */ + || (contains_union flag_strict_aliasing)) add_stack_var_conflict (i, j); } } @@ -403,9 +398,9 @@ stack_var_cmp (const void *a, const void *b) return (int)largeb - (int)largea; /* Secondary compare on size, decreasing */ - if (sizea sizeb) -return -1; if (sizea sizeb) +return -1; + if (sizea sizeb) return 1; /* Tertiary compare on true alignment, decreasing. 
*/ @@ -564,28 +559,19 @@ update_alias_info_with_stack_vars (void) /* A subroutine of partition_stack_vars. The UNION portion of a UNION/FIND partitioning algorithm. Partitions A and B are known to be non-conflicting. - Merge them into a single partition A. + Merge them into a single partition A. */ - At the same time, add OFFSET to all variables in partition B. At the end - of the partitioning process we've have a nice block easy to lay out within - the stack frame. */ - static void -union_stack_vars (size_t a, size_t b, HOST_WIDE_INT offset) +union_stack_vars (size_t a, size_t b) { - size_t i, last; struct stack_var *vb = stack_vars[b]; bitmap_iterator bi; unsigned u; - /* Update each element of partition B with the given offset, - and merge them into partition A. */ - for (last = i = b; i != EOC; last = i, i = stack_vars[i].next) -{ - stack_vars[i].offset
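A quick worked example of what the reworked heuristic buys on stack-layout-2.c above (sizes approximate, ignoring alignment padding): sorting candidates by decreasing size lets the 8000-byte array from the first scope share the partition of the 8192-byte array from the second scope, since their lifetimes are disjoint.

  /* Frame budget for stack-layout-2.c, roughly:
       without sharing:         8000 + 8192 + 32 = 16224 bytes
       with the new heuristic:  max (8000, 8192) + 32 = 8224 bytes
     which is why the dump is expected to mention only the
     "size 8192" and "size 32" slots.  */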
Re: [google] Backport r172837 and r172788 to google/main
Thanks. Fixed them (as well as the same issue in some earlier entries). -Easwaran On Fri, May 6, 2011 at 11:27 AM, Diego Novillo dnovi...@google.com wrote: On Fri, May 6, 2011 at 14:22, Easwaran Raman era...@google.com wrote: Backported r172788 and r172837 from trunk to google/main. Minor nit: 2011-05-06 Easwaran Raman era...@google.com Backport r172837: This needs to be indented by 1 tab. * cfgexpand.c (stack_var): Remove OFFSET... (add_stack_var): ...and its reference here... (expand_stack_vars): ...and here. Diego.
Re: Improve DSE in the presence of calls
On Mon, May 2, 2011 at 8:37 PM, Jeff Law l...@redhat.com wrote: On 04/26/11 16:06, Easwaran Raman wrote: You're right. The patch has correctness issues. It is not possible to simply not call add_wild_read because it also resets active_local_stores and frees read_recs. During the local phase it seems better to just treat calls as wild reads and reset active_local_stores and free read_recs. I've now refactored add_wild_read so that resetting active_local_stores and freeing read_recs are in separate methods that can be called on non-const/non-memset calls. In addition, I have added a non_frame_wild_read field in insn_info to mark non-const and non-memset calls. I've attached the revised patch. Looking better. Just a few more things. Don't all locals escape if the callee has a static chain? Is that handled anywhere? Don't you still have the potential for wild reads in dse_step5_nospill (say from an asm)? If so, then the change can't be correct. Couldn't you just add a clause like this before the else-if? else if (insn_info->non_frame_wild_read) { if (dump_file) fprintf (dump_file, "non-frame wild read\n"); scan_reads_nospill (insn_info, v, NULL); } else if (insn_info->read_rec) { /* Leave this clause unchanged */ } Am I missing something? I am not sure I understand the problem here. If there is a wild read from asm, the instruction has the wild_read flag set. The if statement checks if that flag is set and if so it clears the bitmap - which was the original behavior. Originally, only if read_rec is non NULL you need to recompute the kill set. Now, even if read_rec is NULL, non_frame_wild_read could be set requiring the kill set to be modified, which is what this patch does. In fact, isn't what you have written above the equivalent to what is in the patch, as '/* Leave this clause unchanged */' is the same as if (dump_file) fprintf (dump_file, "regular read\n"); scan_reads_nospill (insn_info, v, NULL); -Easwaran What's the purpose behind using uint64_t in the testcase? Somehow I suspect using int64_t means the test is likely not going to work across targets with different word sizes. Jeff
[google]Add support for sampled profile collection (issue4438083)
This patch from Silvius Rus adds support for sampled edge profile collection to reduce instrumentation run overhead. Bootstraps and no test regressions. Ok for google/main? 2011-04-28 Silvius Rus silvius@gmail.com * doc/invoke.texi: Document -fprofile-generate-sampling option. * gcov-io.h (__gcov_set_sampling_rate): New declaration. * profile.c (branch_prob): Add support for sampled profile collection. * profile.h (add_sampling_to_edge_counters): New declaration. * common.opt (fprofile-generate-sampling): New option. * tree-profile: Include header files; define EDGE_COUNTER_STMT_COUNT. (instrumentation_to_be_sampled, gcov_sample_counter_decl) (gcov_sampling_rate_decl): New globals. (insert_if_then, add_sampling_wrapper, is_instrumentation_to_be_sampled) (add_sampling_to_edge_counters, gimple_init_instrumentation_sampling): New functions. (gimple_init_edge_profiler): Call gimple_init_instrumentation_sampling. (gimple_gen_edge_profiler): Mark start of instrumentation block. * libgcov.c (__gcov_sampling_rate): New extern declaration. (gcov_sampling_rate_initialized, __gcov_sample_counter): New globals. (gcov_exit): Set sampling rate; minor coding style fixes. * params.def (PARAM_PROFILE_GENERATE_SAMPLING_RATE): New parameter. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 173136) +++ gcc/doc/invoke.texi (working copy) @@ -375,7 +375,7 @@ Objective-C and Objective-C++ Dialects}. -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol -fprefetch-loop-arrays @gol -fprofile-correction -fprofile-dir=@var{path} -fprofile-generate @gol --fprofile-generate=@var{path} @gol +-fprofile-generate=@var{path} -fprofile-generate-sampling @gol -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol -freciprocal-math -fregmove -frename-registers -freorder-blocks @gol -freorder-blocks-and-partition -freorder-functions @gol @@ -7923,6 +7923,20 @@ The following options are enabled: @code{-fprofile If @var{path} is specified, GCC will look at the @var{path} to find the profile feedback data files. See @option{-fprofile-dir}. +@item -fprofile-generate-sampling +@opindex -fprofile-generate-sampling + +Enable sampling for instrumented binaries. Instead of recording every event, +record only every N-th event, where N (the sampling rate) can be set either +at compile time using +@option{--param profile-generate-sampling-rate=@var{value}}, or +at execution start time through environment variable @samp{GCOV_SAMPLING_RATE}. + +At this time sampling applies only to branch counters. A sampling rate of 100 +decreases instrumentated binary slowdown from up to 20x for heavily threaded +applications down to around 2x. @option{-fprofile-correction} is always +needed with sampling. + @item -fprofile-use @itemx -fprofile-use=@var{path} @opindex fprofile-use @@ -9138,6 +9152,9 @@ recognize. If you want to pass an option that takes an argument, you must use @option{-Xassembler} twice, once for the option and once for the argument. +@item profile-generate-sampling-rate +Set the sampling rate with @option{-fprofile-generate-sampling}. + @end table @node Link Options Index: gcc/gcov-io.h === --- gcc/gcov-io.h (revision 173136) +++ gcc/gcov-io.h (working copy) @@ -544,6 +544,9 @@ struct dyn_imp_mod /* Register a new object file module. */ extern void __gcov_init (struct gcov_info *) ATTRIBUTE_HIDDEN; +/* Set sampling rate to RATE. */ +extern void __gcov_set_sampling_rate (unsigned int rate); + /* Called before fork, to avoid double counting. 
*/ extern void __gcov_flush (void) ATTRIBUTE_HIDDEN; Index: gcc/profile.c === --- gcc/profile.c (revision 173136) +++ gcc/profile.c (working copy) @@ -1210,6 +1210,9 @@ branch_prob (void) /* Commit changes done by instrumentation. */ gsi_commit_edge_inserts (); + + if (flag_profile_generate_sampling) +add_sampling_to_edge_counters (); } free_aux_for_edges (); Index: gcc/profile.h === --- gcc/profile.h (revision 173136) +++ gcc/profile.h (working copy) @@ -47,4 +47,10 @@ extern gcov_type sum_edge_counts (VEC (edge, gc) * extern void init_node_map (void); extern void del_node_map (void); +/* Implement sampling to avoid writing to edge counters very often. + Many concurrent writes to the same counters, or to counters that share + the same cache line leads to up to 30x slowdown on an application running + on 8 CPUs. With sampling, the slowdown reduced to 2x. */ +extern void add_sampling_to_edge_counters (void); + #endif /* PROFILE_H */ Index: gcc/common.opt
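In GIMPLE, the wrapper built by add_sampling_wrapper and insert_if_then amounts to the following shape around each instrumented edge-counter bump (a sketch; the variable spellings follow the declarations above, but the exact generated code is an assumption):

  /* Before: every traversal of the edge writes the counter.
       counter[e] = counter[e] + 1;

     After (sketch): only every rate-th traversal writes it.  */
  if (__gcov_sample_counter >= __gcov_sampling_rate)
    {
      __gcov_sample_counter = 0;
      counter[e] = counter[e] + 1;
    }
  __gcov_sample_counter = __gcov_sample_counter + 1;

Dropping most counter writes is what cuts the cache-line contention from roughly 20x down to 2x, and it is also why the documentation insists on -fprofile-correction: sampled counts are no longer flow-consistent.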
Re: Improve DSE in the presence of calls
On Mon, Apr 25, 2011 at 10:03 AM, Jeff Law l...@redhat.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 04/22/11 14:19, Easwaran Raman wrote: Hi, This patch improves RTL DSE by not assuming that calls read all memory locations. With this patch, calls are assumed to read any non-frame memory and any stack variables that can potentially escape. This patch partly addresses PR rtl-optimization/44194. Bootstraps and no test regressions. OK for trunk? Thanks, Easwaran 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 * dse.c (header files): Include tree-flow.h. (group_info): Add fields. (globals): Add a new variable kill_on_calls. (get_group_info): Initialize added fields. (dse_step0): Initialize kill_on_calls. (can_escape): New function. (record_store): Pass EXPR corresponding to MEM to set_usage_bits. (dse_step2_nospill): Set kill_on_calls based on group-escaped_*. (scan_reads_nospill): Handle call instructions. (find_insn_before_first_wild_read): Remove the function. (dse_step3_scan): Remove call to find_insn_before_first_wild_read. (dse_step5_nospill): Do not kill everything on call. testsuite/ChangeLog: 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 * gcc.dg/pr44194-1.c: New test. On a bookkeeping note, it doesn't appear that you ever release the bitmaps for gi-escaped_{p,n} or kill_on_calls. This should probably be done in dse_step7. I've now freed the bitmaps in dse_step7. I'm going to assume that you and Richi have agreed upon the form of can_escape. Yes, Richard suggested to use may_be_aliased for now to check for escaping. AFAICT, even after your patch we still have a wild read recorded for most CALL_INSNs (see this code in scan_insn which was not modified): else /* Every other call, including pure functions, may read memory. */ add_wild_read (bb_info); It appears that you effectively ignore the wild_read in scan_reads_nospill by special casing CALL_INSNs. Correct? Is so, does it make still sense to call add_wild_read in scan_insn? ISTM that wild reads are still possible due to a few other constructs such as asms, address canonicalization failure, etc, so why is the removal of the wild read support in dse_step5_nospill correct? You're right. The patch has correctness issues. It is not possible to simply not call add_wild_read because it also resets active_local_stores and frees read_recs. During the local phase it seems better to just treat calls as wild reads and reset active_local_stores and free read_recs. I've now refactored add_wild_read so that resetting active_local_stores and free read_recs are in separate methods that can be called on non-const/non-memset calls. In addition, I have added a non_frame_wild_read field in insn_info to mark non-const and non-memset calls. I've attached the revised patch. -Easwaran Just to be clear, I generally like the change, but there's some details that I think need to be better understood before installing. 
jeff 2011-04-26 Easwaran Raman era...@google.com * dse.c: Include tree-flow.h. (insn_info): Add new field non_frame_wild_read. (group_info): Add new fields escaped_n and escaped_p. (kill_on_calls): New variable. (get_group_info): Initialize gi->escaped_n and gi->escaped_p. (dse_step0): Initialize kill_on_calls. (can_escape): New function. (set_usage_bits): Add additional parameter; record information about escaped locations. (record_store): Pass EXPR corresponding to MEM to set_usage_bits. (dse_step2_nospill): Set kill_on_calls based on group->escaped_n and group->escaped_p. (add_wild_read): Refactor into... (reset_active_stores): ... New method, and (free_read_records): ... New method. (add_non_frame_wild_read): New method. (scan_insn): Call add_non_frame_wild_read on non-const calls. (scan_reads_nospill): Handle instructions with non_frame_wild_read. (dse_step5_nospill): Call scan_reads_nospill for instructions marked as non_frame_wild_read. (dse_step7): Free escaped_n, escaped_p and kill_on_calls bitmaps. testsuite/ChangeLog 2011-04-26 Easwaran Raman era...@google.com
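Based on the ChangeLog above, the refactoring splits the old add_wild_read into reusable pieces; a plausible shape (assumed from the entry, not the committed code) is:

  /* Sketch under assumptions; the real free_read_records must also be
     careful about spill (alias-set) read records.  */
  static void
  reset_active_stores (void)
  {
    active_local_stores = NULL;
  }

  static void
  free_read_records (bb_info_t bb_info)
  {
    insn_info_t insn_info = bb_info->last_insn;
    read_info_t *ptr = &insn_info->read_rec;
    while (*ptr)
      {
        read_info_t next = (*ptr)->next;
        pool_free (read_info_pool, *ptr);
        *ptr = next;
      }
  }

  /* Non-const, non-memset call: locally treated like a wild read, but
     globally only escaped locations are considered read.  */
  static void
  add_non_frame_wild_read (bb_info_t bb_info)
  {
    insn_info_t insn_info = bb_info->last_insn;
    insn_info->non_frame_wild_read = true;
    reset_active_stores ();
    free_read_records (bb_info);
  }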
Re: [google]Pass --save-temps to the assembler (issue4436049)
On Thu, Apr 21, 2011 at 3:20 PM, Andrew Pinski pins...@gmail.com wrote: On Thu, Apr 21, 2011 at 3:18 PM, Easwaran Raman era...@google.com wrote: When using gcc with a post-assembly tool, we use a wrapper that invokes the tool and needs to know if the tool's output (another assembly file) needs to be saved. The wrapper doesn't generate any additional .o files and hence doesn't need to know whether the obj file is to be saved (which is done by the driver). Why do we need to pass -save-temps=obj in that case? -save-temps=obj means place the save temps in the same directory as the object file would be put rather than the current directory. Thanks, Andrew Pinski Sorry, I misunderstood that option. Is the following patch ok? Configured with --with-gnu-as option, verified that it bootstraps and gcc passes --save-temps to 'as' when run with -save-temps=obj. Diego, OK for google/main? -Easwaran 2011-04-25 Easwaran Raman era...@google.com * gcc/gcc.c (asm_options): Pass --save-temps to assembler when invoked with -save-temps= option. Index: gcc/gcc.c === --- gcc/gcc.c (revision 172932) +++ gcc/gcc.c (working copy) @@ -784,7 +784,7 @@ static const char *asm_options = #if HAVE_GNU_AS /* If GNU AS is used, then convert -w (no warnings), -I, and -v to the assembler equivalents. */ -%{v} %{w:-W} %{I*} %{save-temps:--save-temps} +%{v} %{w:-W} %{I*} %{save-temps*:--save-temps} #endif %a %Y %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O};
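For clarity on the one-character spec change: %{save-temps*:--save-temps} matches any driver option beginning with -save-temps, whereas the old %{save-temps:...} matched only the bare spelling. A sketch of the resulting behavior (invocations illustrative):

  /* Spec-matching sketch for gcc.c's asm_options string:

       old: "%{save-temps:--save-temps}"
            gcc -c -save-temps foo.c      ->  as ... --save-temps
            gcc -c -save-temps=obj foo.c  ->  (nothing passed to as)

       new: "%{save-temps*:--save-temps}"
            gcc -c -save-temps foo.c      ->  as ... --save-temps
            gcc -c -save-temps=obj foo.c  ->  as ... --save-temps
  */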
Improve DSE in the presence of calls
Hi, This patch improves RTL DSE by not assuming that calls read all memory locations. With this patch, calls are assumed to read any non-frame memory and any stack variables that can potentially escape. This patch partly addresses PR rtl-optimization/44194. Bootstraps and no test regressions. OK for trunk? Thanks, Easwaran 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 * dse.c (header files): Include tree-flow.h. (group_info): Add fields. (globals): Add a new variable kill_on_calls. (get_group_info): Initialize added fields. (dse_step0): Initialize kill_on_calls. (can_escape): New function. (record_store): Pass EXPR corresponding to MEM to set_usage_bits. (dse_step2_nospill): Set kill_on_calls based on group->escaped_*. (scan_reads_nospill): Handle call instructions. (find_insn_before_first_wild_read): Remove the function. (dse_step3_scan): Remove call to find_insn_before_first_wild_read. (dse_step5_nospill): Do not kill everything on call. testsuite/ChangeLog: 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 * gcc.dg/pr44194-1.c: New test. Index: gcc/testsuite/gcc.dg/pr44194-1.c === --- gcc/testsuite/gcc.dg/pr44194-1.c (revision 0) +++ gcc/testsuite/gcc.dg/pr44194-1.c (revision 0) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-dse1" } */ +#include <stdint.h> + +struct twoints { uint64_t a, b; } foo(); +void bar(uint64_t a, uint64_t b); + +void func() { + struct twoints s = foo(); + bar(s.a, s.b); +} +/* { dg-final { scan-rtl-dump "global deletions = 2" "dse1" } } */ +/* { dg-final { cleanup-rtl-dump "dse1" } } */ Index: gcc/dse.c === --- gcc/dse.c (revision 172844) +++ gcc/dse.c (working copy) @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3. If not see #include "dbgcnt.h" #include "target.h" #include "params.h" +#include "tree-flow.h" /* This file contains three techniques for performing Dead Store Elimination (dse). @@ -501,6 +502,11 @@ struct group_info deleted. */ bitmap store1_n, store1_p, store2_n, store2_p; + /* These bitmaps keep track of offsets in this group that escape this function. + An offset escapes if it corresponds to a named variable whose + addressable flag is set. */ + bitmap escaped_n, escaped_p; + /* The positions in this bitmap have the same assignments as the in, out, gen and kill bitmaps. This bitmap is all zeros except for the positions that are occupied by stores for this group. */ @@ -588,6 +594,9 @@ static int spill_deleted; static bitmap all_blocks; +/* Locations that are killed by calls in the global phase. */ +static bitmap kill_on_calls; + /* The number of bits used in the global bitmaps.
*/ static unsigned int current_position; @@ -692,6 +701,8 @@ get_group_info (rtx base) gi->store1_p = BITMAP_ALLOC (NULL); gi->store2_n = BITMAP_ALLOC (NULL); gi->store2_p = BITMAP_ALLOC (NULL); + gi->escaped_p = BITMAP_ALLOC (NULL); + gi->escaped_n = BITMAP_ALLOC (NULL); gi->group_kill = BITMAP_ALLOC (NULL); gi->process_globally = false; gi->offset_map_size_n = 0; @@ -714,6 +725,8 @@ get_group_info (rtx base) gi->store1_p = BITMAP_ALLOC (NULL); gi->store2_n = BITMAP_ALLOC (NULL); gi->store2_p = BITMAP_ALLOC (NULL); + gi->escaped_p = BITMAP_ALLOC (NULL); + gi->escaped_n = BITMAP_ALLOC (NULL); gi->group_kill = BITMAP_ALLOC (NULL); gi->process_globally = false; gi->frame_related = @@ -739,6 +752,7 @@ dse_step0 (void) spill_deleted = 0; scratch = BITMAP_ALLOC (NULL); + kill_on_calls = BITMAP_ALLOC (NULL); rtx_store_info_pool = create_alloc_pool ("rtx_store_info_pool", @@ -881,31 +895,48 @@ delete_dead_store_insn (insn_info_t insn_info) insn_info->wild_read = false; } +/* Check if EXPR can possibly escape the current function scope. */ +static bool +can_escape (tree expr) +{ + tree base; + if (!expr) +return true; + base = get_base_address (expr); + if (DECL_P (base) + && !may_be_aliased (base)) +return false; + return true; +} /* Set the store* bitmaps offset_map_size* fields in GROUP based on OFFSET and WIDTH. */ static void -set_usage_bits (group_info_t group, HOST_WIDE_INT offset, HOST_WIDE_INT width) +set_usage_bits (group_info_t group, HOST_WIDE_INT offset, HOST_WIDE_INT width, +tree expr) { HOST_WIDE_INT i; - + bool expr_escapes = can_escape (expr); if (offset > -MAX_OFFSET && offset + width < MAX_OFFSET) for (i=offset; i<offset+width; i++) { bitmap store1; bitmap store2; +bitmap escaped; int ai; if (i < 0) { store1 = group->store1_n; store2 = group->store2_n; + escaped = group->escaped_n; ai = -i; } else { store1
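To summarize what the weaker call semantics mean in source terms, here is a hedged illustration (a hypothetical example, not one of the patch's tests):

  /* With the patch, the call to g is assumed to read any non-frame
     memory and 'esc' (whose address escapes, so may_be_aliased is
     true), but not 'priv', which is never address-taken.  A dead
     store to 'priv' before the call can now be deleted; the store to
     'esc' cannot, since g may read it through the pointer.  */
  int g (int *);

  int
  f (void)
  {
    int priv = 1;   /* dead store: no later read, and the call no
                       longer counts as a read of it */
    int esc = 2;    /* live store: must survive the call */
    (void) priv;
    return g (&esc);
  }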
Re: Improve DSE in the presence of calls
On Fri, Apr 22, 2011 at 1:26 PM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Apr 22, 2011 at 01:19:17PM -0700, Easwaran Raman wrote: The ChangeLog entry has various issues: 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 This should have a tab before PR as well. * dse.c (header files): Include tree-flow.h. This should be just * dse.c: Include tree-flow.h. (group_info): Add fields. (struct group_info): Add escaped_n and escaped_p fields. (it is always better to be explicit on what you added). (globals): Add a new variable kill_on_calls. (kill_on_calls): New variable. (get_group_info): Initialize added fields. Instead of "added fields" again mention which fields were added. 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 Again, tab before PR. * gcc.dg/pr44194-1.c: New test. Jakub Thanks for the comments. The fixed ChangeLog is given below. -Easwaran 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 * dse.c: Include tree-flow.h. (group_info): Add escaped_n and escaped_p fields. (kill_on_calls): New variable. (get_group_info): Initialize gi->escaped_n and gi->escaped_p. (dse_step0): Initialize kill_on_calls. (can_escape): New function. (record_store): Pass EXPR corresponding to MEM to set_usage_bits. (dse_step2_nospill): Set kill_on_calls based on group->escaped_n and group->escaped_p. (scan_reads_nospill): Handle call instructions. (find_insn_before_first_wild_read): Remove the function. (dse_step3_scan): Remove call to find_insn_before_first_wild_read. (dse_step5_nospill): Do not kill everything on call. testsuite/ChangeLog: 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 * gcc.dg/pr44194-1.c: New test.
Re: Improve stack layout heuristic.
On Thu, Apr 21, 2011 at 8:43 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Apr 21, 2011 at 5:22 PM, Michael Matz m...@suse.de wrote: Hi, On Wed, 20 Apr 2011, Easwaran Raman wrote: But you're right - not adding that conflict doesn't actually reduce the size of bit maps. Reverting back to what was there originally. Thanks, I have no more issues with the patch. You'll need to find someone who can formally approve it, though. Ok with a proper changelog entry. Richard. Ciao, Michael. Committed with the following Changelog entry: 2011-04-21 Easwaran Raman era...@google.com * gcc/cfgexpand.c (stack_var): Remove OFFSET... (add_stack_var): ...and its reference here... (expand_stack_vars): ...and here. (stack_var_cmp): Sort by descending order of size. (partition_stack_vars): Change heuristic. (union_stack_vars): Fix to reflect changes in partition_stack_vars. (dump_stack_var_partition): Add newline after each partition. testsuite/Changelog: 2011-04-21 Easwaran Raman era...@google.com * gcc.dg/stack-layout-2.c: New test. Thanks, Easwaran