Re: [4.9] PR 62146
I've attached the revised patch. Bootstrapped and no test regressions on x86_64/linux with the 4.9 branch. Ok for the 4.9 branch? While the bug doesn't show up in trunk, it seems obvious that this should go to trunk as well. Is it ok for trunk if tests pass? Btw, is g++.dg/opt the right directory for the test to go in?

- ChangeLog

2014-09-04  Easwaran Raman  <era...@google.com>

	PR rtl-optimization/62146
	* ifcvt.c (dead_or_predicable): Make removal of REG_EQUAL note
	of hoisted instruction unconditional.

2014-09-04  Easwaran Raman  <era...@google.com>

	PR rtl-optimization/62146
	* testsuite/g++.dg/opt/pr62146.C: New.

On Wed, Sep 3, 2014 at 11:36 AM, Jeff Law <l...@redhat.com> wrote:

On 09/02/14 12:52, Easwaran Raman wrote:

It turns out that the REG_EQUAL note is removed on a hoisted instruction (relevant code is in dead_or_predicable in ifcvt.c) if the source of the move instruction is not a function invariant. In this case, the source is a function invariant (a constant) and so that doesn't kick in. I don't understand why this exemption for function invariants is there, and the original thread in https://gcc.gnu.org/ml/gcc/2005-05/msg01710.html doesn't explain it either. Should I just remove the REG_EQUAL notes of all hoisted instructions, or are there cases where it is safe to leave the note?

I suspect it was left in to keep as many REG_EQUAL notes as possible, as the notes can help later optimizations. But even those equivalences are not necessarily safe. I'm pretty sure the right thing to do here is just drop the note regardless of whether or not it's an invariant.

jeff

Index: testsuite/g++.dg/opt/pr62146.C
===================================================================
--- testsuite/g++.dg/opt/pr62146.C	(revision 0)
+++ testsuite/g++.dg/opt/pr62146.C	(revision 0)
@@ -0,0 +1,51 @@
+/* PR rtl-optimization/62146 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+class F
+{
+public:
+  virtual ~ F ();
+};
+template <class CL> class G : public F
+{
+  int *member_;
+public:
+  G (int *b) : member_ (0)
+  {
+  }
+};
+
+class D
+{
+public:
+  template <class CL> void RegisterNonTagCallback (int, void (CL::*p3) ())
+  {
+    InternalRegisterNonTag (p3 ? new G<CL> (0) : 0);
+  }
+  void InternalRegisterNonTag (F *);
+};
+
+void fn1 ();
+class C1
+{
+  void foo ();
+  class TokenType
+  {
+  public:
+    void AddToken ()
+    {
+    }
+  };
+  C1::TokenType bar_t;
+};
+D a;
+void C1::foo ()
+{
+  if (&bar_t)
+    fn1 ();
+  for (int i = 0; i < sizeof 0; ++i)
+    a.RegisterNonTagCallback (0, &TokenType::AddToken);
+}
+
+/* { dg-final { scan-assembler-not "mov.*_ZN2C19TokenType8AddTokenEv, .\\\(" } } */

Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 214472)
+++ ifcvt.c	(working copy)
@@ -4387,17 +4387,14 @@ dead_or_predicable (basic_block test_bb, basic_blo
       insn = head;
       do
 	{
-	  rtx note, set;
+	  rtx note;

 	  if (! INSN_P (insn))
 	    continue;

 	  note = find_reg_note (insn, REG_EQUAL, NULL_RTX);
 	  if (! note)
 	    continue;
-	  set = single_set (insn);
-	  if (!set || !function_invariant_p (SET_SRC (set))
-	      || !function_invariant_p (XEXP (note, 0)))
-	    remove_note (insn, note);
+	  remove_note (insn, note);
 	}
       while (insn != end && (insn = NEXT_INSN (insn)));

   /* PR46315: when moving insns above a conditional branch, the REG_EQUAL
Re: [4.9] PR 62146
It turns out that the REG_EQUAL note is removed on a hoisted instruction (relevant code is in dead_or_predicable in ifcvt.c) if the source of the move instruction is not a function invariant. In this case, the source is a function invariant (a constant) and so that doesn't kick in. I don't understand why this exemption for function invariants is there, and the original thread in https://gcc.gnu.org/ml/gcc/2005-05/msg01710.html doesn't explain it either. Should I just remove the REG_EQUAL notes of all hoisted instructions, or are there cases where it is safe to leave the note?

Thanks,
Easwaran

On Fri, Aug 29, 2014 at 1:06 PM, Jeff Law <l...@redhat.com> wrote:

On 08/25/14 16:42, Easwaran Raman wrote:

This patch deletes the REG_EQUAL note when a source register is replaced by a constant in an assignment. This is to prevent spurious equivalences between the constant and the expression in the REG_EQUAL note. In the bug reported in PR 62146, an assignment in one branch (which is actually dead) of an IF statement has a REG_EQUAL note equating a register with an expression. Conditional copy propagation replaces the register with 0. The instruction is subsequently hoisted above the branch, and then the value 0 is equated with the expression in the REG_EQUAL note. Is this ok for the 4.9 branch if all tests pass? This patch looks applicable to trunk as well, but I don't have a test case to reproduce the issue in trunk.

Something doesn't feel right with this patch. It seems to me the real problem is when we hoist the insn with the note. If the equivalence implied by the note is no longer valid at the insn's new location, then the note needs to be removed. Now determining if the note is no longer valid at the new location may prove difficult ;-) You'd probably have to know why the note was created, how it was changed, etc. So I suspect the right thing to do is just remove REG_EQUAL notes on any insns we hoist in this manner.

Jeff
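To make the failure mode concrete, here is a schematic, source-level sketch of the scenario (a hypothetical example with made-up names; the actual reduced testcase is the pr62146.C file in the patch):

// Hypothetical sketch of PR 62146; the comments narrate the RTL-level steps.
extern int g (int);

int f (int x, bool dead)
{
  int r = 0;
  if (dead)        // Never taken at runtime.
    r = x;         // RTL: (set rR rX) carrying a note REG_EQUAL: g(x),
                   // i.e. "rR equals g(x) here" -- valid only on this path.
  // 1. Conditional copy propagation learns rX == 0 on the dead path and
  //    rewrites the insn to (set rR 0), keeping the REG_EQUAL note.
  // 2. dead_or_predicable then hoists (set rR 0) above the branch, so the
  //    note now claims 0 == g(x) unconditionally -- the spurious
  //    equivalence that the patches in this thread prevent by dropping
  //    the note.
  return r;
}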
Re: [4.9] PR 62146
On Mon, Aug 25, 2014 at 3:42 PM, Easwaran Raman <era...@google.com> wrote:

This patch deletes the REG_EQUAL note when a source register is replaced by a constant in an assignment. This is to prevent spurious equivalences between the constant and the expression in the REG_EQUAL note. In the bug reported in PR 62146, an assignment in one branch (which is actually dead) of an IF statement has a REG_EQUAL note equating a register with an expression. Conditional copy propagation replaces the register with 0. The instruction is subsequently hoisted above the branch, and then the value 0 is equated with the expression in the REG_EQUAL note. Is this ok for the 4.9 branch if all tests pass? This patch looks applicable to trunk as well, but I don't have a test case to reproduce the issue in trunk.

ChangeLog:

2014-08-25  Easwaran Raman  <era...@google.com>

	PR rtl-optimization/62146
	* cprop.c (try_replace_reg): Remove REG_EQUAL note when
	a constant is propagated into the src of an assignment.

testsuite/ChangeLog:

2014-08-25  Easwaran Raman  <era...@google.com>

	PR rtl-optimization/62146
	* testsuite/g++.dg/opt/pr62146.C: New.
[4.9] PR 62146
This patch deletes the REG_EQUAL note when a source register is replaced by a constant in an assignment. This is to prevent spurious equivalences between the constant and the expression in the REG_EQUAL note. In the bug reported in PR 62146, an assignment in one branch (which is actually dead) of an IF statement has a REG_EQUAL note equating a register with an expression. Conditional copy propagation replaces the register with 0. The instruction is subsequently hoisted above the branch, and then the value 0 is equated with the expression in the REG_EQUAL note. Is this ok for the 4.9 branch if all tests pass? This patch looks applicable to trunk as well, but I don't have a test case to reproduce the issue in trunk.

ChangeLog:

2014-08-25  Easwaran Raman  <era...@google.com>

	PR rtl-optimization/62146
	* cprop.c (try_replace_reg): Remove REG_EQUAL note when
	a constant is propagated into the src of an assignment.

testsuite/ChangeLog:

2014-08-25  Easwaran Raman  <era...@google.com>

	PR rtl-optimization/62146
	* testsuite/g++.dg/opt/pr62146.C: New.

Index: gcc/testsuite/g++.dg/opt/pr62146.C
===================================================================
--- gcc/testsuite/g++.dg/opt/pr62146.C	(revision 0)
+++ gcc/testsuite/g++.dg/opt/pr62146.C	(revision 0)
@@ -0,0 +1,51 @@
+/* PR rtl-optimization/62146 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+class F
+{
+public:
+  virtual ~ F ();
+};
+template <class CL> class G : public F
+{
+  int *member_;
+public:
+  G (int *b) : member_ (0)
+  {
+  }
+};
+
+class D
+{
+public:
+  template <class CL> void RegisterNonTagCallback (int, void (CL::*p3) ())
+  {
+    InternalRegisterNonTag (p3 ? new G<CL> (0) : 0);
+  }
+  void InternalRegisterNonTag (F *);
+};
+
+void fn1 ();
+class C1
+{
+  void foo ();
+  class TokenType
+  {
+  public:
+    void AddToken ()
+    {
+    }
+  };
+  C1::TokenType bar_t;
+};
+D a;
+void C1::foo ()
+{
+  if (&bar_t)
+    fn1 ();
+  for (int i = 0; i < sizeof 0; ++i)
+    a.RegisterNonTagCallback (0, &TokenType::AddToken);
+}
+
+/* { dg-final { scan-assembler-not "mov.*_ZN2C19TokenType8AddTokenEv, .\\\(" } } */

Index: gcc/cprop.c
===================================================================
--- gcc/cprop.c	(revision 214472)
+++ gcc/cprop.c	(working copy)
@@ -790,8 +790,11 @@ try_replace_reg (rtx from, rtx to, rtx insn)
   /* REG_EQUAL may get simplified into register.
      We don't allow that. Remove that note. This code ought
      not to happen, because previous code ought to synthesize
-     reg-reg move, but be on the safe side.  */
-  if (note && REG_NOTE_KIND (note) == REG_EQUAL && REG_P (XEXP (note, 0)))
+     reg-reg move, but be on the safe side.  The REG_EQUAL note is
+     also removed when the source is a constant.  */
+  if (note && REG_NOTE_KIND (note) == REG_EQUAL
+      && (REG_P (XEXP (note, 0))
+	  || (set && CONSTANT_P (SET_SRC (set)))))
     remove_note (insn, note);

   return success;
Re: Fix for PR c++/59031
Ping.

On Mon, Nov 11, 2013 at 9:46 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Nov 7, 2013 at 11:14 AM, Easwaran Raman <era...@google.com> wrote:

Before r193504, if a method cannot be overridden, LOOKUP_NONVIRTUAL is set and the call is direct. The changes at r193504 (to fix PR c++/11750) caused a regression to this behavior. This patch attempts to fix that. Bootstraps and no test regressions on x86_64/linux. Is this a correct fix, and is this ok for trunk?

Thanks,
Easwaran

2013-11-07  Easwaran Raman  <era...@google.com>

	PR c++/59031
	* call.c (build_new_method_call_1): Compare function context
	with BASELINK_BINFO type rather than instance type before
	marking the call with LOOKUP_NONVIRTUAL.

testsuite/ChangeLog:

2013-11-07  Easwaran Raman  <era...@google.com>

	PR c++/59031
	* g++.dg/inherit/virtual11.C: New test.
Re: Fix for PR c++/59031
Ping.

On Thu, Nov 7, 2013 at 11:14 AM, Easwaran Raman <era...@google.com> wrote:

Before r193504, if a method cannot be overridden, LOOKUP_NONVIRTUAL is set and the call is direct. The changes at r193504 (to fix PR c++/11750) caused a regression to this behavior. This patch attempts to fix that. Bootstraps and no test regressions on x86_64/linux. Is this a correct fix, and is this ok for trunk?

Thanks,
Easwaran

2013-11-07  Easwaran Raman  <era...@google.com>

	PR c++/59031
	* call.c (build_new_method_call_1): Compare function context
	with BASELINK_BINFO type rather than instance type before
	marking the call with LOOKUP_NONVIRTUAL.

testsuite/ChangeLog:

2013-11-07  Easwaran Raman  <era...@google.com>

	PR c++/59031
	* g++.dg/inherit/virtual11.C: New test.
Fix for PR c++/59031
Before r193504, if a method cannot be overridden, LOOKUP_NONVIRTUAL is set and the call is direct. The changes at r193504 (to fix PR c++/11750) caused a regression to this behavior. This patch attempts to fix that. Bootstraps and no test regressions on x86_64/linux. Is this a correct fix, and is this ok for trunk?

Thanks,
Easwaran

2013-11-07  Easwaran Raman  <era...@google.com>

	PR c++/59031
	* call.c (build_new_method_call_1): Compare function context
	with BASELINK_BINFO type rather than instance type before
	marking the call with LOOKUP_NONVIRTUAL.

testsuite/ChangeLog:

2013-11-07  Easwaran Raman  <era...@google.com>

	PR c++/59031
	* g++.dg/inherit/virtual11.C: New test.

Index: testsuite/g++.dg/inherit/virtual11.C
===================================================================
--- testsuite/g++.dg/inherit/virtual11.C	(revision 0)
+++ testsuite/g++.dg/inherit/virtual11.C	(revision 0)
@@ -0,0 +1,17 @@
+// PR c++/59031
+// { dg-do compile }
+// { dg-options "-fdump-tree-gimple" }
+class B {
+ public:
+  virtual int add (int a, int b) { return a + b; }
+};
+
+class D : public B {
+};
+
+int foo (int a, int b) {
+  D d;
+  return d.add (a, b);
+}
+// { dg-final { scan-tree-dump-not "OBJ_TYPE_REF" "gimple" } }
+// { dg-final { cleanup-tree-dump "gimple" } }

Index: cp/call.c
===================================================================
--- cp/call.c	(revision 204466)
+++ cp/call.c	(working copy)
@@ -7511,7 +7511,7 @@ build_new_method_call_1 (tree instance, tree fns,
   struct z_candidate *candidates = 0, *cand;
   tree explicit_targs = NULL_TREE;
   tree basetype = NULL_TREE;
-  tree access_binfo;
+  tree access_binfo, binfo;
   tree optype;
   tree first_mem_arg = NULL_TREE;
   tree name;
@@ -7550,6 +7550,7 @@ build_new_method_call_1 (tree instance, tree fns,
   if (!conversion_path)
     conversion_path = BASELINK_BINFO (fns);
   access_binfo = BASELINK_ACCESS_BINFO (fns);
+  binfo = BASELINK_BINFO (fns);
   optype = BASELINK_OPTYPE (fns);
   fns = BASELINK_FUNCTIONS (fns);
   if (TREE_CODE (fns) == TEMPLATE_ID_EXPR)
@@ -7802,7 +7803,7 @@ build_new_method_call_1 (tree instance, tree fns,
	 do not want to perform a non-virtual call.  */
      if (DECL_VINDEX (fn) && ! (flags & LOOKUP_NONVIRTUAL)
	  && same_type_ignoring_top_level_qualifiers_p
-	     (DECL_CONTEXT (fn), TREE_TYPE (instance))
+	     (DECL_CONTEXT (fn), TREE_TYPE (binfo))
	  && resolves_to_fixed_type_p (instance, 0))
	flags |= LOOKUP_NONVIRTUAL;
      if (explicit_targs)
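For contrast with the virtual11.C test above, here is a hypothetical sketch of a case where the devirtualization must not fire, because resolves_to_fixed_type_p cannot prove the dynamic type (names are made up, not from the patch):

// Hypothetical contrast case: the reference may bind to any derived
// object, so build_new_method_call_1 must NOT set LOOKUP_NONVIRTUAL
// and the call stays an OBJ_TYPE_REF in the gimple dump.
class B2 {
 public:
  virtual int add (int a, int b) { return a + b; }
};

int bar (B2 &b, int x, int y)
{
  return b.add (x, y);  // Must dispatch virtually.
}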
Re: Add a param to decide stack slot sharing at -O0
On Wed, Oct 9, 2013 at 4:11 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Tue, Oct 8, 2013 at 11:04 PM, Easwaran Raman <era...@google.com> wrote:

In cfgexpand.c, variables in non-overlapping lexical scopes are assigned the same stack locations at -O1 and above. At -O0, this is attempted only if the size of the stack objects is above a threshold (32). The rationale is that at -O0 more variables are going to be on the stack, and the O(n^2) stack slot sharing algorithm will increase the compilation time. This patch replaces the constant with a param which is set to 32 by default. We ran into a case where the presence of the always_inline attribute triggered -Wframe-larger-than warnings at -O0 but not at -O2, since the different inlined copies share the stack. We are ok with a slight increase in compilation time to get smaller stack frames even at -O0, and this patch would allow us to do that easily. Bootstraps on x86_64/linux. Is this ok for trunk?

Ok with

+DEFPARAM (PARAM_MIN_SIZE_FOR_STACK_SHARING,
+	  "min-size-for-stack-sharing",
+	  "Attempt to share stack slots among variables in different lexical blocks "
+	  "at O0 only if their sizes exceed this value",
+	  32, 0, 0)

changed to

	  "The minimum size of variables taking part in stack slot sharing "
	  "when not optimizing",
	  32, 0, 0)

And with adding documentation for that param in doc/invoke.texi. Btw, I'm not sure the sharing algorithm is still quadratic - can you investigate on that?

partition_stack_vars is still quadratic in the worst case (all live ranges overlap), but the expensive part is likely to be the add_scope_conflicts function.

- Easwaran

Thanks,
Richard.

Thanks,
Easwaran

2013-10-08  Easwaran Raman  <era...@google.com>

	* params.def (PARAM_MIN_SIZE_FOR_STACK_SHARING): New param...
	* cfgexpand.c (defer_stack_allocation): ...use here.
Add a param to decide stack slot sharing at -O0
In cfgexpand.c, variables in non-overlapping lexical scopes are assigned the same stack locations at -O1 and above. At -O0, this is attempted only if the size of the stack objects is above a threshold (32). The rationale is that at -O0 more variables are going to be on the stack, and the O(n^2) stack slot sharing algorithm will increase the compilation time. This patch replaces the constant with a param which is set to 32 by default. We ran into a case where the presence of the always_inline attribute triggered -Wframe-larger-than warnings at -O0 but not at -O2, since the different inlined copies share the stack. We are ok with a slight increase in compilation time to get smaller stack frames even at -O0, and this patch would allow us to do that easily. Bootstraps on x86_64/linux. Is this ok for trunk?

Thanks,
Easwaran

2013-10-08  Easwaran Raman  <era...@google.com>

	* params.def (PARAM_MIN_SIZE_FOR_STACK_SHARING): New param...
	* cfgexpand.c (defer_stack_allocation): ...use here.

cfgexpand.patch
Description: Binary data
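As a concrete illustration of the sharing this param controls (a hypothetical example, not from the patch): the two arrays below are never live at the same time, so cfgexpand can give them one slot, and at -O0 it only tries to when the objects exceed the threshold.

extern void g (char *);

void f (int which)
{
  if (which)
    {
      char a[64];  // Above the 32-byte threshold: shareable even at -O0.
      g (a);
    }
  else
    {
      char b[64];  // Disjoint scope from 'a': can reuse its stack slot,
                   // halving the frame size seen by -Wframe-larger-than.
      g (b);
    }
}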
Don't drop attributes on template member functions
This attempts to address one issue reported in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33911. Attributes at the end of a declaration of template member functions are discarded in cp_parser_init_declarator. The suggested workaround is to place the attributes into the declaration-specifiers. The code in cp_parser_init_declarator actually parses the attributes at the end, but only passes the prefix_attributes (from the decl-specifier-seq) to grokfield(). Appending the parsed attributes to prefix_attributes seems to work, but I don't know if this is sufficient in all cases. Bootstraps and all tests pass with the patch on x86_64/linux. Is this ok for trunk?

2013-10-02  Easwaran Raman  <era...@google.com>

	PR c++/33911
	* parser.c (cp_parser_init_declarator): Do not drop attributes
	of template member functions.

2013-10-02  Easwaran Raman  <era...@google.com>

	PR c++/33911
	* g++.dg/ext/attribute47.C: New.

parser_attribute.patch
Description: Binary data
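For illustration, the kind of declaration affected looks like this (a hypothetical example; the actual test added by the patch is attribute47.C):

struct S
{
  // Trailing attribute: parsed but discarded by cp_parser_init_declarator
  // before this patch, so ignoring f's result went undiagnosed.
  template <typename T> int f (T) __attribute__ ((warn_unused_result));

  // Suggested workaround: put the attribute in the decl-specifiers.
  template <typename T> __attribute__ ((warn_unused_result)) int g (T);
};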
[Google] Adjust comdat-sharing-probability param for -Os
This patch increases comdat-sharing-probability to 80 under -Os. This reduces the amount of inlining and helps internal benchmarks. Unfortunately, it causes a slight regression on SPEC 2006. Ok for google branches if all tests pass?

- Easwaran

comdat_sharing.patch
Description: Binary data
Re: [Google] Adjust comdat-sharing-probability param for -Os
On Fri, Sep 27, 2013 at 4:08 PM, Xinliang David Li <davi...@google.com> wrote:

  d.growth -= (info->size
	       * (100 - PARAM_VALUE (PARAM_COMDAT_SHARING_PROBABILITY))
	       + 50) / 100;

What is the purpose of '50' here?

Rounding after division by 100.

The patch is fine for the Google branch. Other tunings to think about -- I think the sharing probability should not be a fixed value, but should depend on the function's characteristics, such as size, number of callsites, etc. For instance, for small leaf comdat functions, the sharing probability will be small.

David

On Fri, Sep 27, 2013 at 2:57 PM, Easwaran Raman <era...@google.com> wrote:

This patch increases comdat-sharing-probability to 80 under -Os. This reduces the amount of inlining and helps internal benchmarks. Unfortunately, it causes a slight regression on SPEC 2006. Ok for google branches if all tests pass?

- Easwaran
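To spell out the rounding with made-up numbers: with info->size of 8 and the new sharing probability of 80, the discount is (8 * (100 - 80) + 50) / 100 = 210 / 100 = 2, i.e. 1.6 rounded to nearest, where plain truncation would give 1. A self-contained check:

#include <cassert>

int main ()
{
  const int size = 8, sharing_probability = 80;  // made-up values
  int discount = (size * (100 - sharing_probability) + 50) / 100;
  assert (discount == 2);  // round-to-nearest; truncation would yield 1
  return 0;
}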
Re: Fix PR middle-end/57393
Ping.

On Mon, Sep 16, 2013 at 4:42 PM, Easwaran Raman <era...@google.com> wrote:

There are two separate root causes for the problem reported in PR 57393, and this patch attempts to fix both. The first is due to newly created stmts that have the default UID of 0 being compared with statements with valid UIDs, leading to broken dependences. As discussed in an earlier thread, I check the UIDs before using them and ensure stmts have a valid UID. In the worst case, this reassigns UIDs for the entire BB containing the stmts in question. The second is due to debug stmts being out of sync with the IR after reassociation. I think the right fix is to create debug temps before the actual reassociation to decouple them from the SSA variables involved in reassociation. This bootstraps on x86_64 and I am running the tests. Ok for trunk?

Thanks,
Easwaran

2013-09-16  Easwaran Raman  <era...@google.com>

	PR middle-end/57393
	* tree-ssa-reassoc.c (get_stmt_uid_with_default): Remove.
	(build_and_add_sum): Do not set UID of newly created statements.
	(ensure_valid_uid): New function.
	(find_insert_point): Call it here before UIDs are compared.
	(insert_stmt_after): Do not reset debug statements.
	(regenerate_debug_stmts): New function.
	(reassociate_bb): Use it here.

testsuite/ChangeLog:

2013-09-16  Easwaran Raman  <era...@google.com>

	PR middle-end/57393
	* gcc.dg/tree-ssa/reassoc-32.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-33.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-34.c: New testcase.
Fix PR middle-end/57393
There are two separate root causes for the problem reported in PR 57393, and this patch attempts to fix both. The first is due to newly created stmts that have the default UID of 0 being compared with statements with valid UIDs, leading to broken dependences. As discussed in an earlier thread, I check the UIDs before using them and ensure stmts have a valid UID. In the worst case, this reassigns UIDs for the entire BB containing the stmts in question. The second is due to debug stmts being out of sync with the IR after reassociation. I think the right fix is to create debug temps before the actual reassociation to decouple them from the SSA variables involved in reassociation. This bootstraps on x86_64 and I am running the tests. Ok for trunk?

Thanks,
Easwaran

2013-09-16  Easwaran Raman  <era...@google.com>

	PR middle-end/57393
	* tree-ssa-reassoc.c (get_stmt_uid_with_default): Remove.
	(build_and_add_sum): Do not set UID of newly created statements.
	(ensure_valid_uid): New function.
	(find_insert_point): Call it here before UIDs are compared.
	(insert_stmt_after): Do not reset debug statements.
	(regenerate_debug_stmts): New function.
	(reassociate_bb): Use it here.

testsuite/ChangeLog:

2013-09-16  Easwaran Raman  <era...@google.com>

	PR middle-end/57393
	* gcc.dg/tree-ssa/reassoc-32.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-33.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-34.c: New testcase.

pr57393.patch
Description: Binary data
Re: Fix PR middle-end/57370
On Fri, Sep 6, 2013 at 12:05 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Thu, Sep 5, 2013 at 9:23 PM, Easwaran Raman <era...@google.com> wrote:

On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman <era...@google.com> wrote:

I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and
	reset of debug statements that cause inconsistent IR.

Missing ChangeLog entry for the insert_stmt_after hunk, which I do not like at all. The other hunks are ok, but we need to work harder to preserve debug stmts - simply removing all is not going to fly.

Richard.

I looked into the problem related to the debug stmts in this failing test case in detail. Initially, the code sequence looks like this:

  s$n_13 = MEM[(struct S *)s];
  # DEBUG s$n => s$n_13
  _2 = (double) s$n_13;
  _4 = _2 * a_3(D);
  _6 = (double) i_5(D);
  _7 = _4 * _6;
  _9 = (double) j_8(D);
  _10 = _7 * _9;
  bar (_10);
  # DEBUG D#2 => (double) k_12(D)
  # DEBUG D#1 => _7 * D#2
  # DEBUG t$n => (int) D#1

After reassociation:

  s$n_13 = MEM[(struct S *)s];
  # DEBUG s$n => s$n_13
  _2 = (double) s$n_13;
  _6 = (double) i_5(D);
  # DEBUG D#3 => _4 * _6
  _9 = (double) j_8(D);
  _4 = _9 * _2;
  _7 = _4 * a_3(D);
  _10 = _7 * _6;
  bar (_10);
  # DEBUG D#2 => (double) k_12(D)
  # DEBUG D#1 => D#3 * D#2
  # DEBUG t$n => (int) D#1

In the above, "# DEBUG D#3 => _4 * _6" appears above the statement that defines _4. But even if I move the def of D#3 below the statement defining _4, it is not sufficient. Before reassociation, t$n refers to the expression (s$n_13 * a_3(D) * i_5(D) * k_12(D)), but after reassociation it would refer to (j_8(D) * s$n_13 * i_5(D) * k_12(D)). It seems the correct fix is to discard the debug temps whose RHS contains a variable that is involved in reassociation and then reconstruct them. Is that the expected fix here?

The value of the DEBUG expression changes because the value that _4 computes changes - that is a no-no that may not happen. We cannot re-use _4 (without previously releasing it and allocating it newly) with a new value. Releasing _4 should have fixed up the debug stmts. So - can you verify whether we are indeed just replacing the RHS of _4 = _2 * a_3(D) with _9 * _2, without changing the SSA name of that expression?

I have confirmed that the SSA name of the LHS remains the same. The reassociation code simply calls gimple_assign_set_rhs2 to modify RHS2 and then calls update_stmt.

- Easwaran

Another reason why re-using the same LHS for another value is wrong is that annotated information like points-to sets, alignment, and soon value-range information will be wrong.

Thanks,
Richard.

- Easwaran

testsuite/ChangeLog:

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	PR tree-optimization/57393
	PR tree-optimization/58011
	* gfortran.dg/reassoc_12.f90: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.

On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman <era...@google.com> wrote:

A newly generated statement in the build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong UID (of an earlier phi statement), which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a
	phi node for a non-phi gimple statement.

testsuite/ChangeLog:

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* gfortran.dg/reassoc_12.f90: New testcase.

Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                 grad_deriv, npoints, sx)
+        IMPLICIT REAL*8 (t)
+        INTEGER, PARAMETER :: dp=8
+        REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                 e_ndrho_ndrho_rho
+        DO ii=1,npoints
+          IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+            t1425 = t233 * t557
+            t1429 = beta * t225
Re: Fix PR middle-end/57370
On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman <era...@google.com> wrote:

I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and
	reset of debug statements that cause inconsistent IR.

Missing ChangeLog entry for the insert_stmt_after hunk, which I do not like at all. The other hunks are ok, but we need to work harder to preserve debug stmts - simply removing all is not going to fly.

Richard.

I looked into the problem related to the debug stmts in this failing test case in detail. Initially, the code sequence looks like this:

  s$n_13 = MEM[(struct S *)s];
  # DEBUG s$n => s$n_13
  _2 = (double) s$n_13;
  _4 = _2 * a_3(D);
  _6 = (double) i_5(D);
  _7 = _4 * _6;
  _9 = (double) j_8(D);
  _10 = _7 * _9;
  bar (_10);
  # DEBUG D#2 => (double) k_12(D)
  # DEBUG D#1 => _7 * D#2
  # DEBUG t$n => (int) D#1

After reassociation:

  s$n_13 = MEM[(struct S *)s];
  # DEBUG s$n => s$n_13
  _2 = (double) s$n_13;
  _6 = (double) i_5(D);
  # DEBUG D#3 => _4 * _6
  _9 = (double) j_8(D);
  _4 = _9 * _2;
  _7 = _4 * a_3(D);
  _10 = _7 * _6;
  bar (_10);
  # DEBUG D#2 => (double) k_12(D)
  # DEBUG D#1 => D#3 * D#2
  # DEBUG t$n => (int) D#1

In the above, "# DEBUG D#3 => _4 * _6" appears above the statement that defines _4. But even if I move the def of D#3 below the statement defining _4, it is not sufficient. Before reassociation, t$n refers to the expression (s$n_13 * a_3(D) * i_5(D) * k_12(D)), but after reassociation it would refer to (j_8(D) * s$n_13 * i_5(D) * k_12(D)). It seems the correct fix is to discard the debug temps whose RHS contains a variable that is involved in reassociation and then reconstruct them. Is that the expected fix here?

- Easwaran

testsuite/ChangeLog:

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	PR tree-optimization/57393
	PR tree-optimization/58011
	* gfortran.dg/reassoc_12.f90: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.

On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman <era...@google.com> wrote:

A newly generated statement in the build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong UID (of an earlier phi statement), which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a
	phi node for a non-phi gimple statement.

testsuite/ChangeLog:

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* gfortran.dg/reassoc_12.f90: New testcase.

Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                 grad_deriv, npoints, sx)
+        IMPLICIT REAL*8 (t)
+        INTEGER, PARAMETER :: dp=8
+        REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                 e_ndrho_ndrho_rho
+        DO ii=1,npoints
+          IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+            t1425 = t233 * t557
+            t1429 = beta * t225
+            t1622 = t327 * t1621
+            t1626 = t327 * t1625
+            t1632 = t327 * t1631
+            t1685 = t105 * t1684
+            t2057 = t1636 + t8 * (t2635 + t3288)
+          END IF
+          IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+            t5469 = t5440 - t5443 - t5446 - t5449 - t5451 - t5454 - t5456 + t5459 - t5462 + t5466 - t5468
+            t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
+            t5489 = 0.16e2_dp * t1429 * t1658
+            t5531 = 0.160e2_dp * t112 * t1626
+            t5533 = 0.160e2_dp * t112 * t1632
+            t5537 = 0.160e2_dp * t112 * t1622
+            t5541 = t5472 - t5478 - t5523 + t5525 + t5531 + t5533 + t5535 + t5537 + t5540
+            t5565 = t112 * t1685
+            t5575 = t5545 - t5548 + t5551 + t5553 - t5558 + t5560 - t5562 + t5564 - 0.80e1_dp * t5565 + t5568 + t5572 + t5574
+            t5611 = t5579 - t5585 + t5590 - t5595 + t5597 - t5602 + t5604 + t5607 + t5610
+            t5613 = t5469 + t5541 + t5575 + t5611
+            t6223 = t6189 - 0.36e0_dp * t83 * t84 * t5613 + t6222
+            t6227 = - t8 * (t5305 + t6223)
+            e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii
Re: Fix PR middle-end/57370
Submitted the patch (r202262) without the insert_stmt_after hunk. Also fixed nits pointed out by Jakub.

- Easwaran

On Mon, Sep 2, 2013 at 2:31 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Fri, Aug 30, 2013 at 6:13 PM, Easwaran Raman <era...@google.com> wrote:

On Fri, Aug 30, 2013 at 1:25 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Fri, Aug 30, 2013 at 2:30 AM, Easwaran Raman <era...@google.com> wrote:

Richard, do you want me to commit everything but the insert_stmt_after hunk now?

Yes.

There are more similar failures reported in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached the updated patch there. Shall I send that for review? Apart from the debug statement issue, almost all the bugs are due to dependence violations because certain newly inserted statements do not have the right UID. Instead of trying to catch all of them, would it be better if I check whether the stmt has a proper UID (non-zero if it is not the first stmt) and assign a sensible value at the point where it is used (find_insert_point and appears_later_in_bb) instead of where the stmt is created? I think that would be less brittle.

In the end all this placement stuff should be reverted and done as post-processing after reassoc is finished. You can remember the inserted stmts that are candidates for moving (just set a pass-local flag on them) and assign UIDs for the whole function after all stmts are inserted.

The problem is we need sane UID values as reassoc is happening - not after it is finished. As it processes each stmt in reassoc_bb, it might generate new stmts which might have to be compared in find_insert_point or appears_later_in_bb.

But if you no longer find_insert_point during reassoc but instead do a scheduling pass after it finished, you won't need sane UIDs during reassoc.

Richard.

- Easwaran

Richard.

- Easwaran

On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman <era...@google.com> wrote:

I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and
	reset of debug statements that cause inconsistent IR.

Missing ChangeLog entry for the insert_stmt_after hunk, which I do not like at all. The other hunks are ok, but we need to work harder to preserve debug stmts - simply removing all is not going to fly.

Richard.

testsuite/ChangeLog:

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	PR tree-optimization/57393
	PR tree-optimization/58011
	* gfortran.dg/reassoc_12.f90: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.

On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman <era...@google.com> wrote:

A newly generated statement in the build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong UID (of an earlier phi statement), which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a
	phi node for a non-phi gimple statement.

testsuite/ChangeLog:

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* gfortran.dg/reassoc_12.f90: New testcase.

Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                 grad_deriv, npoints, sx)
+        IMPLICIT REAL*8 (t)
+        INTEGER, PARAMETER :: dp=8
+        REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                 e_ndrho_ndrho_rho
+        DO ii=1,npoints
+          IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+            t1425 = t233 * t557
+            t1429 = beta * t225
+            t1622 = t327 * t1621
+            t1626 = t327 * t1625
+            t1632 = t327 * t1631
+            t1685 = t105 * t1684
+            t2057 = t1636 + t8 * (t2635 + t3288)
+          END IF
+          IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+            t5469 = t5440 - t5443 - t5446
Re: Fix PR middle-end/57370
On Fri, Aug 30, 2013 at 1:25 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Fri, Aug 30, 2013 at 2:30 AM, Easwaran Raman <era...@google.com> wrote:

Richard, do you want me to commit everything but the insert_stmt_after hunk now?

Yes.

There are more similar failures reported in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached the updated patch there. Shall I send that for review? Apart from the debug statement issue, almost all the bugs are due to dependence violations because certain newly inserted statements do not have the right UID. Instead of trying to catch all of them, would it be better if I check whether the stmt has a proper UID (non-zero if it is not the first stmt) and assign a sensible value at the point where it is used (find_insert_point and appears_later_in_bb) instead of where the stmt is created? I think that would be less brittle.

In the end all this placement stuff should be reverted and done as post-processing after reassoc is finished. You can remember the inserted stmts that are candidates for moving (just set a pass-local flag on them) and assign UIDs for the whole function after all stmts are inserted.

The problem is we need sane UID values as reassoc is happening - not after it is finished. As it processes each stmt in reassoc_bb, it might generate new stmts which might have to be compared in find_insert_point or appears_later_in_bb.

- Easwaran

Richard.

- Easwaran

On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman <era...@google.com> wrote:

I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and
	reset of debug statements that cause inconsistent IR.

Missing ChangeLog entry for the insert_stmt_after hunk, which I do not like at all. The other hunks are ok, but we need to work harder to preserve debug stmts - simply removing all is not going to fly.

Richard.

testsuite/ChangeLog:

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	PR tree-optimization/57393
	PR tree-optimization/58011
	* gfortran.dg/reassoc_12.f90: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.

On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman <era...@google.com> wrote:

A newly generated statement in the build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong UID (of an earlier phi statement), which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a
	phi node for a non-phi gimple statement.

testsuite/ChangeLog:

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* gfortran.dg/reassoc_12.f90: New testcase.

Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                 grad_deriv, npoints, sx)
+        IMPLICIT REAL*8 (t)
+        INTEGER, PARAMETER :: dp=8
+        REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                 e_ndrho_ndrho_rho
+        DO ii=1,npoints
+          IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+            t1425 = t233 * t557
+            t1429 = beta * t225
+            t1622 = t327 * t1621
+            t1626 = t327 * t1625
+            t1632 = t327 * t1631
+            t1685 = t105 * t1684
+            t2057 = t1636 + t8 * (t2635 + t3288)
+          END IF
+          IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+            t5469 = t5440 - t5443 - t5446 - t5449 - t5451 - t5454 - t5456 + t5459 - t5462 + t5466 - t5468
+            t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
+            t5489 = 0.16e2_dp * t1429 * t1658
+            t5531 = 0.160e2_dp * t112 * t1626
+            t5533 = 0.160e2_dp * t112 * t1632
+            t5537 = 0.160e2_dp * t112 * t1622
+            t5541 = t5472 - t5478 - t5523 + t5525
Re: Fix PR middle-end/57370
On Fri, Aug 30, 2013 at 9:26 AM, Jakub Jelinek <ja...@redhat.com> wrote:

On Fri, Aug 30, 2013 at 09:13:34AM -0700, Easwaran Raman wrote:

There are more similar failures reported in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached the updated patch there. Shall I send that for review? Apart from the debug statement issue, almost all the bugs are due to dependence violations because certain newly inserted statements do not have the right UID. Instead of trying to catch all of them, would it be better if I check whether the stmt has a proper UID (non-zero if it is not the first stmt) and assign a sensible value at the point where it is used (find_insert_point and appears_later_in_bb) instead of where the stmt is created? I think that would be less brittle.

In the end all this placement stuff should be reverted and done as post-processing after reassoc is finished. You can remember the inserted stmts that are candidates for moving (just set a pass-local flag on them) and assign UIDs for the whole function after all stmts are inserted.

The problem is we need sane UID values as reassoc is happening - not after it is finished. As it processes each stmt in reassoc_bb, it might generate new stmts which might have to be compared in find_insert_point or appears_later_in_bb.

A new stmt will be created with UID 0, and in case there is stmt movement you could zero the UID during movement. Then you can just special-case UID zero when you are looking at UIDs. As on trunk/4.8 gsi_for_stmt is O(1), you can easily walk a couple of previous and/or later stmts and look for the first non-zero UID around it, and treat it as if the UID was the previous non-zero UID + 0.5 or the next non-zero UID - 0.5. And if you see too many consecutive statements with UID 0 (more than some constant, 32?), just reassign UIDs to the whole bb. Or you could initially assign UIDs with some gaps, then keep filling those gaps, and once you fill them, renumber again with new gaps.

Jakub

Yes, this is pretty much what I was proposing. The current implementation doesn't rely on UIDs being unique - they only have to be monotonically non-decreasing. So, when a UID of 0 is encountered, go up till a non-zero UID is found, then go down and assign this non-zero UID. This effectively implies that each UID-0 stmt is visited at most twice. I don't think we need to check whether we see more than a certain number of UID-0 stmts and redo the entire BB.

- Easwaran
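A rough sketch of the repair scheme being discussed (not the committed fix; it assumes the GCC 4.8-era gimple iterator API and elides the whole-BB renumbering fallback):

/* Give a reassoc-created statement STMT a usable UID by borrowing the
   nearest preceding non-zero UID; UIDs here only need to be monotonically
   non-decreasing, not unique.  */

static void
ensure_valid_uid_sketch (gimple stmt)
{
  if (gimple_uid (stmt) != 0)
    return;
  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);  /* O(1) on trunk/4.8.  */
  for (gsi_prev (&gsi); !gsi_end_p (gsi); gsi_prev (&gsi))
    if (gimple_uid (gsi_stmt (gsi)) != 0)
      {
        gimple_set_uid (stmt, gimple_uid (gsi_stmt (gsi)));
        return;
      }
  /* No earlier stmt with a valid UID: fall back to renumbering the BB.  */
}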
Re: [Google] Refine hot caller heuristic
On Tue, Aug 20, 2013 at 9:35 PM, Xinliang David Li <davi...@google.com> wrote:

Do you need to guard the jump function access with a check of ipa_node_params_vector.exists ()?

I believe it is not necessary since, for example, ipa_analyze_node calls ipa_check_create_node_params, which calls create. But I see it is used everywhere in ipa-inline-analysis.c, so I have added a check and conservatively return false.

Ideally, useful_cold_callee should be folded into the inline hints estimation. Question about the heuristic: why filter out PASS_THROUGH parameter cases completely? Passing the 'this' parameter can in many cases result in good PRE opportunities. Why not skip just the unknown type?

The rationale is that it is useful to inline bar into foo in the snippet below:

void foo ()
{
  A a;
  bar (a);
  ...
}

Capturing this requires the UNKNOWN and KNOWN_TYPE jump functions. I have changed the check accordingly.

I have attached the new patch.

- Easwaran

David

On Tue, Aug 20, 2013 at 12:26 PM, Easwaran Raman <era...@google.com> wrote:

The current hot caller heuristic simply promotes edges whose caller is hot. This patch does the following:

* Turn it off for applications with a large footprint, since the size increase hurts them.
* Be more selective by considering arguments to the callee when the heuristic is enabled.

This performs well on internal benchmarks. Ok for google/4_8 branch if all tests pass?

- Easwaran

hot_caller.patch
Description: Binary data
Re: [Google] Refine hot caller heuristic
On Wed, Aug 21, 2013 at 6:47 AM, Teresa Johnson <tejohn...@google.com> wrote:

+/* Knob to control hot-caller heuristic.  0 means it is turned off, 1 means
+   it is always applied, and 2 means it is applied only if the footprint is
+   smaller than PARAM_HOT_CALLER_CODESIZE_THRESHOLD.  */
 DEFPARAM (PARAM_INLINE_HOT_CALLER,
	  "inline-hot-caller",
	  "Consider cold callsites for inlining if caller contains hot code",
+	  2, 0, 2)
+
+/* The maximum code size estimate under which hot caller heuristic is
+   applied.  */
+DEFPARAM (PARAM_HOT_CALLER_CODESIZE_THRESHOLD,
+	  "hot-caller-codesize-threshold",
+	  "Maximum profile-based code size footprint estimate for "
+	  "hot caller heuristic",
+	  1, 0, 0)

Out of curiosity, how sensitive is performance to the value of this parameter? I.e. is there a clear cutoff for the codes that benefit from disabling this inlining vs. those that benefit from enabling it? Also, have you tried spec2006? I remember that the codesize of the gcc benchmark was above the larger 15000 threshold I use for tuning down unrolling/peeling, and I needed to refine my heuristics to identify profitable loops to unroll/peel even in the case of large codesize. I'm not sure if there are more benchmarks that will be above the smaller 10K threshold.

The cutoff works well for internal benchmarks, but there is a wide range in which to set the value. I ran spec2006 and the gcc benchmark does well. However, there are two C++ benchmarks (which fall below the cutoff) that degrade if I exclude PASS_THROUGH. I'll investigate that later.

+
+DEFPARAM (PARAM_INLINE_USEFUL_COLD_CALLEE,
+	  "inline-useful-cold-callee",
+	  "Consider cold callsites for inlining if caller contains hot code",
+	  1, 0, 1)

The description of this param is wrong (it is the same as the description of PARAM_INLINE_HOT_CALLER). It should probably be something like "Only consider cold callsites for inlining if analysis finds optimization opportunities".

Thanks. Fixed.

 /* Limit of iterations of early inliner.  This basically bounds number of

Index: gcc/ipa-inline.c
===================================================================
--- gcc/ipa-inline.c	(revision 201768)
+++ gcc/ipa-inline.c	(working copy)
@@ -528,12 +528,60 @@ big_speedup_p (struct cgraph_edge *e)
   return false;
 }

+/* Returns true if callee of edge E is considered useful to inline
+   even if it is cold.  A callee is considered useful if there is at
+   least one argument of pointer type that is not a pass-through.  */

Can you expand this comment a bit to add why such arguments indicate useful inlining?

Added comments.

- Easwaran

Thanks,
Teresa

+
+static inline bool
+useful_cold_callee (struct cgraph_edge *e)
+{
+  gimple call = e->call_stmt;
+  int n, arg_num = gimple_call_num_args (call);
+  struct ipa_edge_args *args = IPA_EDGE_REF (e);
+
+  for (n = 0; n < arg_num; n++)
+    {
+      tree arg = gimple_call_arg (call, n);
+      if (POINTER_TYPE_P (TREE_TYPE (arg)))
+	{
+	  struct ipa_jump_func *jfunc = ipa_get_ith_jump_func (args, n);
+	  if (jfunc->type != IPA_JF_PASS_THROUGH)
+	    return true;
+	}
+    }
+  return false;
+}
+
+/* Returns true if hot caller heuristic should be used.  */
+
+static inline bool
+enable_hot_caller_heuristic (void)
+{
+  gcov_working_set_t *ws = NULL;
+  int size_threshold = PARAM_VALUE (PARAM_HOT_CALLER_CODESIZE_THRESHOLD);
+  int num_counters = 0;
+  int param_inline_hot_caller = PARAM_VALUE (PARAM_INLINE_HOT_CALLER);
+
+  if (param_inline_hot_caller == 0)
+    return false;
+  else if (param_inline_hot_caller == 1)
+    return true;
+
+  ws = find_working_set (PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE));
+  if (!ws)
+    return false;
+  num_counters = ws->num_counters;
+  return num_counters <= size_threshold;
+}

 /* Returns true if an edge or its caller are hot enough to
    be considered for inlining.  */

 static bool
 edge_hot_enough_p (struct cgraph_edge *edge)
 {
+  static bool use_hot_caller_heuristic = enable_hot_caller_heuristic ();
   if (cgraph_maybe_hot_edge_p (edge))
     return true;

@@ -543,9 +591,17 @@ edge_hot_enough_p (struct cgraph_edge *edge)
   if (flag_auto_profile && edge->callee->count == 0
       && edge->callee->max_bb_count > 0)
     return false;
-  if (PARAM_VALUE (PARAM_INLINE_HOT_CALLER)
-      && maybe_hot_count_p (NULL, edge->caller->max_bb_count))
-    return true;
+  if (use_hot_caller_heuristic)
+    {
+      struct cgraph_node *where = edge->caller;
+      if (maybe_hot_count_p (NULL, where->max_bb_count))
+	{
+	  if (PARAM_VALUE (PARAM_INLINE_USEFUL_COLD_CALLEE))
+	    return useful_cold_callee (edge);
+	  else
+	    return true;
+	}
+    }
   return false;
 }

On Tue, Aug 20, 2013 at 12:26 PM, Easwaran Raman <era...@google.com> wrote:

The current hot caller heuristic simply promotes edges whose
Re: Fix PR middle-end/57370
Richard, do you want me to commit everything but the insert_stmt_after hunk now?

There are more similar failures reported in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57393 and I have attached the updated patch there. Shall I send that for review? Apart from the debug statement issue, almost all the bugs are due to dependence violations because certain newly inserted statements do not have the right UID. Instead of trying to catch all of them, would it be better if I check whether the stmt has a proper UID (non-zero if it is not the first stmt) and assign a sensible value at the point where it is used (find_insert_point and appears_later_in_bb) instead of where the stmt is created? I think that would be less brittle.

- Easwaran

On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman <era...@google.com> wrote:

I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and
	reset of debug statements that cause inconsistent IR.

Missing ChangeLog entry for the insert_stmt_after hunk, which I do not like at all. The other hunks are ok, but we need to work harder to preserve debug stmts - simply removing all is not going to fly.

Richard.

testsuite/ChangeLog:

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	PR tree-optimization/57393
	PR tree-optimization/58011
	* gfortran.dg/reassoc_12.f90: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.

On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman <era...@google.com> wrote:

A newly generated statement in the build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong UID (of an earlier phi statement), which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a
	phi node for a non-phi gimple statement.

testsuite/ChangeLog:

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* gfortran.dg/reassoc_12.f90: New testcase.

Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                 grad_deriv, npoints, sx)
+        IMPLICIT REAL*8 (t)
+        INTEGER, PARAMETER :: dp=8
+        REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                 e_ndrho_ndrho_rho
+        DO ii=1,npoints
+          IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+            t1425 = t233 * t557
+            t1429 = beta * t225
+            t1622 = t327 * t1621
+            t1626 = t327 * t1625
+            t1632 = t327 * t1631
+            t1685 = t105 * t1684
+            t2057 = t1636 + t8 * (t2635 + t3288)
+          END IF
+          IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+            t5469 = t5440 - t5443 - t5446 - t5449 - t5451 - t5454 - t5456 + t5459 - t5462 + t5466 - t5468
+            t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
+            t5489 = 0.16e2_dp * t1429 * t1658
+            t5531 = 0.160e2_dp * t112 * t1626
+            t5533 = 0.160e2_dp * t112 * t1632
+            t5537 = 0.160e2_dp * t112 * t1622
+            t5541 = t5472 - t5478 - t5523 + t5525 + t5531 + t5533 + t5535 + t5537 + t5540
+            t5565 = t112 * t1685
+            t5575 = t5545 - t5548 + t5551 + t5553 - t5558 + t5560 - t5562 + t5564 - 0.80e1_dp * t5565 + t5568 + t5572 + t5574
+            t5611 = t5579 - t5585 + t5590 - t5595 + t5597 - t5602 + t5604 + t5607 + t5610
+            t5613 = t5469 + t5541 + t5575 + t5611
+            t6223 = t6189 - 0.36e0_dp * t83 * t84 * t5613 + t6222
+            t6227 = - t8 * (t5305 + t6223)
+            e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii
[Google] Refine hot caller heuristic
The current hot caller heuristic simply promotes edges whose caller is hot. This patch does the following:

* Turn it off for applications with a large footprint, since the size increase hurts them.
* Be more selective by considering arguments to the callee when the heuristic is enabled.

This performs well on internal benchmarks. Ok for google/4_8 branch if all tests pass?

- Easwaran

hot_caller.patch
Description: Binary data
Re: Fix PR middle-end/57370
Ping.

On Wed, Jul 31, 2013 at 4:34 PM, Easwaran Raman <era...@google.com> wrote:

I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and
	reset of debug statements that cause inconsistent IR.

testsuite/ChangeLog:

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	PR tree-optimization/57393
	PR tree-optimization/58011
	* gfortran.dg/reassoc_12.f90: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.

On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman <era...@google.com> wrote:

A newly generated statement in the build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong UID (of an earlier phi statement), which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a
	phi node for a non-phi gimple statement.

testsuite/ChangeLog:

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* gfortran.dg/reassoc_12.f90: New testcase.

Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                 grad_deriv, npoints, sx)
+        IMPLICIT REAL*8 (t)
+        INTEGER, PARAMETER :: dp=8
+        REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                 e_ndrho_ndrho_rho
+        DO ii=1,npoints
+          IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+            t1425 = t233 * t557
+            t1429 = beta * t225
+            t1622 = t327 * t1621
+            t1626 = t327 * t1625
+            t1632 = t327 * t1631
+            t1685 = t105 * t1684
+            t2057 = t1636 + t8 * (t2635 + t3288)
+          END IF
+          IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+            t5469 = t5440 - t5443 - t5446 - t5449 - t5451 - t5454 - t5456 + t5459 - t5462 + t5466 - t5468
+            t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
+            t5489 = 0.16e2_dp * t1429 * t1658
+            t5531 = 0.160e2_dp * t112 * t1626
+            t5533 = 0.160e2_dp * t112 * t1632
+            t5537 = 0.160e2_dp * t112 * t1622
+            t5541 = t5472 - t5478 - t5523 + t5525 + t5531 + t5533 + t5535 + t5537 + t5540
+            t5565 = t112 * t1685
+            t5575 = t5545 - t5548 + t5551 + t5553 - t5558 + t5560 - t5562 + t5564 - 0.80e1_dp * t5565 + t5568 + t5572 + t5574
+            t5611 = t5579 - t5585 + t5590 - t5595 + t5597 - t5602 + t5604 + t5607 + t5610
+            t5613 = t5469 + t5541 + t5575 + t5611
+            t6223 = t6189 - 0.36e0_dp * t83 * t84 * t5613 + t6222
+            t6227 = - t8 * (t5305 + t6223)
+            e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + t6227 * sx
+            t6352 = t5440 - t5443 - t5446 - t5449 - t5451 - t5454 + 0.40e1_dp * t102 * t327 * t2057 * t557 - t5456 + t5459 - t5462 + t5466 - t5468
+            t6363 = t5480 - t5489 + 0.96e2_dp * t1054 * t640 * t3679
+            t6367 = t5472 - t5474 - t5478 - t5523 + t5525 + t5531 + t5533 + t5535 + t5537 - 0.20e1_dp * t102 * t105 * t6363 + t5540
+            t6370 = t5545 - t5548 + t5551 + t5553 - t5558 + t5560 - t5562 + t5564 - 0.40e1_dp * t5565 + t5568 + t5572 + t5574
+            t6373 = t5579 - t5585 + t5590 - t5595 + t5597 - t5602 + t5604 + t5607 + t5610
+            t6375 = t6352 + t6367 + t6370 + t6373
+            t6380 = - 0.36e0_dp * t83 * t84 * t6375 + t5701
+            t6669 = -t4704 - t8 * (t6344 + t6380 + t6665)
+            e_ndrho_ndrho_ndrho(ii
Re: Fix PR middle-end/57370
I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and
	reset of debug statements that cause inconsistent IR.

testsuite/ChangeLog:

2013-07-31  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	PR tree-optimization/57393
	PR tree-optimization/58011
	* gfortran.dg/reassoc_12.f90: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.
	* gcc.dg/tree-ssa/reassoc-31.c: New testcase.

On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman <era...@google.com> wrote:

Ping.

On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman <era...@google.com> wrote:

A newly generated statement in the build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong UID (of an earlier phi statement), which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks,
Easwaran

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a
	phi node for a non-phi gimple statement.

testsuite/ChangeLog:

2013-06-27  Easwaran Raman  <era...@google.com>

	PR middle-end/57370
	* gfortran.dg/reassoc_12.f90: New testcase.

Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/reassoc_12.f90	(revision 0)
@@ -0,0 +1,74 @@
+! { dg-do compile }
+! { dg-options "-O2 -ffast-math" }
+! PR middle-end/57370
+
+      SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
+                 grad_deriv, npoints, sx)
+        IMPLICIT REAL*8 (t)
+        INTEGER, PARAMETER :: dp=8
+        REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
+                 e_ndrho_ndrho_rho
+        DO ii=1,npoints
+          IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
+            t1425 = t233 * t557
+            t1429 = beta * t225
+            t1622 = t327 * t1621
+            t1626 = t327 * t1625
+            t1632 = t327 * t1631
+            t1685 = t105 * t1684
+            t2057 = t1636 + t8 * (t2635 + t3288)
+          END IF
+          IF( grad_deriv >= 3 .OR. grad_deriv == -3 ) THEN
+            t5469 = t5440 - t5443 - t5446 - t5449 - t5451 - t5454 - t5456 + t5459 - t5462 + t5466 - t5468
+            t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425
+            t5489 = 0.16e2_dp * t1429 * t1658
+            t5531 = 0.160e2_dp * t112 * t1626
+            t5533 = 0.160e2_dp * t112 * t1632
+            t5537 = 0.160e2_dp * t112 * t1622
+            t5541 = t5472 - t5478 - t5523 + t5525 + t5531 + t5533 + t5535 + t5537 + t5540
+            t5565 = t112 * t1685
+            t5575 = t5545 - t5548 + t5551 + t5553 - t5558 + t5560 - t5562 + t5564 - 0.80e1_dp * t5565 + t5568 + t5572 + t5574
+            t5611 = t5579 - t5585 + t5590 - t5595 + t5597 - t5602 + t5604 + t5607 + t5610
+            t5613 = t5469 + t5541 + t5575 + t5611
+            t6223 = t6189 - 0.36e0_dp * t83 * t84 * t5613 + t6222
+            t6227 = - t8 * (t5305 + t6223)
+            e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + t6227 * sx
+            t6352 = t5440 - t5443 - t5446 - t5449 - t5451 - t5454 + 0.40e1_dp * t102 * t327 * t2057 * t557 - t5456 + t5459 - t5462 + t5466 - t5468
+            t6363 = t5480 - t5489 + 0.96e2_dp * t1054 * t640 * t3679
+            t6367 = t5472 - t5474 - t5478 - t5523 + t5525 + t5531 + t5533 + t5535 + t5537 - 0.20e1_dp * t102 * t105 * t6363 + t5540
+            t6370 = t5545 - t5548 + t5551 + t5553 - t5558 + t5560 - t5562 + t5564 - 0.40e1_dp * t5565 + t5568 + t5572 + t5574
+            t6373 = t5579 - t5585 + t5590 - t5595 + t5597 - t5602 + t5604 + t5607 + t5610
+            t6375 = t6352 + t6367 + t6370 + t6373
+            t6380 = - 0.36e0_dp * t83 * t84 * t6375 + t5701
+            t6669 = -t4704 - t8 * (t6344 + t6380 + t6665)
+            e_ndrho_ndrho_ndrho(ii
Re: Fix PR middle-end/57370
Ping. On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman era...@google.com wrote: A newly generated statement in build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong uid (of an earlier phi statement) which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk? Thanks, Easwaran 2013-06-27 Easwaran Raman era...@google.com PR middle-end/57370 * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi node for a non-phi gimple statement. testsuite/ChangeLog: 2013-06-27 Easwaran Raman era...@google.com PR middle-end/57370 * gfortran.dg/reassoc_12.f90: New testcase. Index: gcc/testsuite/gfortran.dg/reassoc_12.f90 === --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) @@ -0,0 +1,74 @@ +! { dg-do compile } +! { dg-options -O2 -ffast-math } +! PR middle-end/57370 + + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, + grad_deriv,npoints, sx) +IMPLICIT REAL*8 (t) +INTEGER, PARAMETER :: dp=8 +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, + e_ndrho_ndrho_rho + DO ii=1,npoints + IF( grad_deriv = 2 .OR. grad_deriv == -2 ) THEN +t1425 = t233 * t557 +t1429 = beta * t225 +t1622 = t327 * t1621 +t1626 = t327 * t1625 +t1632 = t327 * t1631 +t1685 = t105 * t1684 +t2057 = t1636 + t8 * (t2635 + t3288) + END IF + IF( grad_deriv = 3 .OR. grad_deriv == -3 ) THEN +t5469 = t5440 - t5443 - t5446 - t5449 - +t5451 - t5454 - t5456 + t5459 - +t5462 + t5466 - t5468 +t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425 +t5489 = 0.16e2_dp * t1429 * t1658 +t5531 = 0.160e2_dp * t112 * t1626 +t5533 = 0.160e2_dp * t112 * t1632 +t5537 = 0.160e2_dp * t112 * t1622 +t5541 = t5472 - t5478 - t5523 + t5525 + +t5531 + t5533 + t5535 + t5537 + +t5540 +t5565 = t112 * t1685 +t5575 = t5545 - t5548 + t5551 + t5553 - +t5558 + t5560 - t5562 + t5564 - +0.80e1_dp * t5565 + t5568 + t5572 + +t5574 +t5611 = t5579 - t5585 + t5590 - t5595 + +t5597 - t5602 + t5604 + t5607 + +t5610 +t5613 = t5469 + t5541 + t5575 + t5611 +t6223 = t6189 - +0.36e0_dp * t83 * t84 * t5613 + +t6222 +t6227 = - t8 * (t5305 + t6223) +e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + + t6227 * sx +t6352 = t5440 - t5443 - t5446 - t5449 - +t5451 - t5454 + +0.40e1_dp * t102 * t327 * t2057 * t557 - +t5456 + t5459 - t5462 + t5466 - +t5468 +t6363 = t5480 - t5489 + +0.96e2_dp * t1054 * t640 * t3679 +t6367 = t5472 - t5474 - t5478 - t5523 + +t5525 + t5531 + t5533 + t5535 + +t5537 - 0.20e1_dp * t102 * t105 * t6363 + +t5540 +t6370 = t5545 - t5548 + t5551 + t5553 - +t5558 + t5560 - t5562 + t5564 - +0.40e1_dp * t5565 + +t5568 + t5572 + t5574 +t6373 = t5579 - t5585 + t5590 - t5595 + +t5597 - t5602 + t5604 + t5607 + +t5610 +t6375 = t6352 + t6367 + t6370 + t6373 +t6380 = - 0.36e0_dp * t83 * t84 * t6375 + t5701 +t6669 = -t4704 - t8 * (t6344 + t6380 + t6665) +e_ndrho_ndrho_ndrho(ii) = e_ndrho_ndrho_ndrho(ii) + + t6669 * sx + END IF + END DO + END SUBROUTINE xb88_lr_adiabatic_lda_calc + Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c(revision 200429) +++ gcc/tree-ssa-reassoc.c(working copy) @@ -1207,7 +1207,7 @@ build_and_add_sum (tree type, tree op1, tree op2, if (gimple_code (op1def) == GIMPLE_PHI) { gsi = gsi_after_labels (gimple_bb (op1def)); - gimple_set_uid (sum, gimple_uid (op1def)); + gimple_set_uid (sum, gimple_uid (gsi_stmt (gsi))); gsi_insert_before (gsi, sum, GSI_NEW_STMT); } else
Fix PR middle-end/57370
A newly generated statement in build_and_add_sum function of tree-ssa-reassoc.c has to be assigned the UID of its adjacent statement. In one instance, it was assigned the wrong uid (of an earlier phi statement) which messed up the IR and caused the test program to hang. Bootstraps and no test regressions on x86_64/linux. Ok for trunk? Thanks, Easwaran 2013-06-27 Easwaran Raman era...@google.com PR middle-end/57370 * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a phi node for a non-phi gimple statement. testsuite/ChangeLog: 2013-06-27 Easwaran Raman era...@google.com PR middle-end/57370 * gfortran.dg/reassoc_12.f90: New testcase. Index: gcc/testsuite/gfortran.dg/reassoc_12.f90 === --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) @@ -0,0 +1,74 @@ +! { dg-do compile } +! { dg-options -O2 -ffast-math } +! PR middle-end/57370 + + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, + grad_deriv,npoints, sx) +IMPLICIT REAL*8 (t) +INTEGER, PARAMETER :: dp=8 +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, + e_ndrho_ndrho_rho + DO ii=1,npoints + IF( grad_deriv = 2 .OR. grad_deriv == -2 ) THEN +t1425 = t233 * t557 +t1429 = beta * t225 +t1622 = t327 * t1621 +t1626 = t327 * t1625 +t1632 = t327 * t1631 +t1685 = t105 * t1684 +t2057 = t1636 + t8 * (t2635 + t3288) + END IF + IF( grad_deriv = 3 .OR. grad_deriv == -3 ) THEN +t5469 = t5440 - t5443 - t5446 - t5449 - +t5451 - t5454 - t5456 + t5459 - +t5462 + t5466 - t5468 +t5478 = 0.240e2_dp * t1616 * t973 * t645 * t1425 +t5489 = 0.16e2_dp * t1429 * t1658 +t5531 = 0.160e2_dp * t112 * t1626 +t5533 = 0.160e2_dp * t112 * t1632 +t5537 = 0.160e2_dp * t112 * t1622 +t5541 = t5472 - t5478 - t5523 + t5525 + +t5531 + t5533 + t5535 + t5537 + +t5540 +t5565 = t112 * t1685 +t5575 = t5545 - t5548 + t5551 + t5553 - +t5558 + t5560 - t5562 + t5564 - +0.80e1_dp * t5565 + t5568 + t5572 + +t5574 +t5611 = t5579 - t5585 + t5590 - t5595 + +t5597 - t5602 + t5604 + t5607 + +t5610 +t5613 = t5469 + t5541 + t5575 + t5611 +t6223 = t6189 - +0.36e0_dp * t83 * t84 * t5613 + +t6222 +t6227 = - t8 * (t5305 + t6223) +e_ndrho_ndrho_rho(ii) = e_ndrho_ndrho_rho(ii) + + t6227 * sx +t6352 = t5440 - t5443 - t5446 - t5449 - +t5451 - t5454 + +0.40e1_dp * t102 * t327 * t2057 * t557 - +t5456 + t5459 - t5462 + t5466 - +t5468 +t6363 = t5480 - t5489 + +0.96e2_dp * t1054 * t640 * t3679 +t6367 = t5472 - t5474 - t5478 - t5523 + +t5525 + t5531 + t5533 + t5535 + +t5537 - 0.20e1_dp * t102 * t105 * t6363 + +t5540 +t6370 = t5545 - t5548 + t5551 + t5553 - +t5558 + t5560 - t5562 + t5564 - +0.40e1_dp * t5565 + +t5568 + t5572 + t5574 +t6373 = t5579 - t5585 + t5590 - t5595 + +t5597 - t5602 + t5604 + t5607 + +t5610 +t6375 = t6352 + t6367 + t6370 + t6373 +t6380 = - 0.36e0_dp * t83 * t84 * t6375 + t5701 +t6669 = -t4704 - t8 * (t6344 + t6380 + t6665) +e_ndrho_ndrho_ndrho(ii) = e_ndrho_ndrho_ndrho(ii) + + t6669 * sx + END IF + END DO + END SUBROUTINE xb88_lr_adiabatic_lda_calc + Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c(revision 200429) +++ gcc/tree-ssa-reassoc.c(working copy) @@ -1207,7 +1207,7 @@ build_and_add_sum (tree type, tree op1, tree op2, if (gimple_code (op1def) == GIMPLE_PHI) { gsi = gsi_after_labels (gimple_bb (op1def)); - gimple_set_uid (sum, gimple_uid (op1def)); + gimple_set_uid (sum, gimple_uid (gsi_stmt (gsi))); gsi_insert_before (gsi, sum, GSI_NEW_STMT); } else
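Since this thread quotes the same patch several times, it may help to distill the UID invariant the fix restores. The following is a standalone toy model, not GCC code — every name in it is invented. It shows how giving the newly built statement a PHI's UID makes a UID-based "is the definition already above this use?" test disagree with where the statement was actually inserted (the position of the block's first non-PHI statement):

#include <stdio.h>

/* The pass's dominance-within-a-block test: smaller UID means earlier.  */
static int available_before (unsigned def_uid, unsigned use_uid)
{
  return def_uid < use_uid;
}

int main (void)
{
  unsigned phi_uid = 0;        /* PHI at the top of the block           */
  unsigned use_uid = 3;        /* statement that will consume the sum   */
  unsigned first_stmt_uid = 5; /* insertion point of the new statement  */

  /* Buggy choice: the PHI's UID claims the sum is above the use, even
     though it was physically placed at the uid-5 position -- a
     use-before-def once later code trusts the comparison.  */
  printf ("buggy uid (PHI's):        available=%d (wrong: placed at 5)\n",
          available_before (phi_uid, use_uid));

  /* Fixed choice: the UID of the statement at the insertion point.  */
  printf ("fixed uid (insert point): available=%d\n",
          available_before (first_stmt_uid, use_uid));
  return 0;
}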
[google gcc-4_8] Change size accounting during inlining in lipo mode
In LIPO mode, this patch updates the overall unit size only when the eventual function into which the callee is inlined is in the primary module. This avoids the situation where the module growth budget is used up by inlines into auxiliary-module functions that never get inlined into a primary-module function.

Ok for google/gcc-4_8 branch?

Thanks, Easwaran

lipo_inline.patch
Description: Binary data
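Because the attachment is binary, the shape of the check being described can only be sketched. The toy below is purely illustrative — struct fn, eventual_host and account_growth are all made-up names, not the google/gcc-4_8 LIPO code:

#include <stdbool.h>

struct fn
{
  bool in_primary_module;   /* does this function live in the primary module? */
  struct fn *inlined_into;  /* NULL if the function survives as a real body   */
};

/* Walk to the function that ultimately hosts the inlined body.  */
static struct fn *
eventual_host (struct fn *f)
{
  while (f->inlined_into)
    f = f->inlined_into;
  return f;
}

/* Charge the unit growth budget only for inlines whose eventual host
   is in the primary module, per the description above.  */
static void
account_growth (struct fn *caller, int growth, int *overall_size)
{
  if (eventual_host (caller)->in_primary_module)
    *overall_size += growth;
}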
Re: PR tree-optimization/57337
On Tue, May 28, 2013 at 4:11 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, May 27, 2013 at 10:20 AM, Richard Biener richard.guent...@gmail.com wrote: On Sun, May 26, 2013 at 5:53 AM, Easwaran Raman era...@google.com wrote: On Sat, May 25, 2013 at 4:46 AM, Richard Biener richard.guent...@gmail.com wrote: Easwaran Raman era...@google.com wrote:

In that case, if my insert_stmt immediately follows dep_stmt and both have the same UID, not_dominated_by would return true and I will end up updating insert_stmt to dep_stmt, which is wrong.

But there should be a safe default answer for equal uids. Unless we are asking different questions in different places. Thus, I do not like the stmt walking but rather have a safe fallback.

I am lost here. I don't see how we could avoid doing the stmt walking to resolve the equal uid case. How do we ensure that not_dominated_by (a, b) returns true and not_dominated_by (b, a) returns false if A and B have the same UID and A appears before B, without doing the statement walk? And I don't see why the statement walk is bad. It is not likely that there is a long sequence of statements with the same UID.

Sure, but if you are always asking a question like "is placing X before Y ok?" then you can conservatively answer no and code should handle that ok. If you are asking questions both ways then of course no conservative answer is possible. Both current uses of not_dominated_by are of the same kind: if the placement is not ok then either the insert point needs adjustment or the debug stmt reset.

Ok, thinking about this more I come to the conclusion that a safe default for equal UIDs is not possible, as we may not be able to order two dep_stmts. I still dislike the walking, but to get the quite frequent regressions fixed the patch is ok as-is. Please install.

Thanks, Richard.

Thanks. Submitted the patch at r199385.

- Easwaran
Fix PR 57442
I made a thinko when I asserted gcc_unreachable outside the main loop of appears_later_in_bb in my previous fix to PR 57337. It is quite possible for STMT1 to be followed by one or more statements with the same UID till the end of the BB without encountering STMT2. The right fix is to return STMT1 outside the loop. Ok for trunk?

Thanks, Easwaran

2013-05-28 Easwaran Raman era...@google.com
	PR tree-optimization/57442
	* tree-ssa-reassoc.c (appears_later_in_bb): Return correct value when control exits the main loop.

2013-05-28 Easwaran Raman era...@google.com
	PR tree-optimization/57442
	* gcc.dg/tree-ssa/reassoc-30.c: New testcase.

Index: gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-30.c (revision 0)
@@ -0,0 +1,13 @@
+/* PR tree-optimization/57442 */
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+short a;
+unsigned b;
+long c;
+int d;
+
+void f(void)
+{
+  b = a ? : (a = b) - c + (d - (b + b));
+}
+
Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c (revision 199385)
+++ gcc/tree-ssa-reassoc.c (working copy)
@@ -2888,7 +2888,7 @@ appears_later_in_bb (gimple stmt1, gimple stmt2)
       else if (stmt == stmt2)
         return stmt2;
     }
-  gcc_unreachable ();
+  return stmt1;
 }
 
 /* Find the statement after which STMT must be moved so that the
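A standalone toy rerun of the scenario described above (not GCC internals; arrays stand in for the statement sequence) shows why falling out of the loop must mean STMT1 is the later statement:

#include <stdio.h>

/* Mirrors appears_later_in_bb: walk forward from s1 toward the end of
   the block; uids[] plays the role of gimple_uid.  */
static int appears_later (const unsigned *uids, int n, int s1, int s2)
{
  for (int i = s1 + 1; i < n; i++)
    {
      if (uids[i] != uids[s1])
        return s1;          /* different UID seen first: s1 is later  */
      if (i == s2)
        return s2;          /* met s2 while walking: s2 is later      */
    }
  return s1;                /* end of block: the old gcc_unreachable  */
}

int main (void)
{
  /* Block: uid 3, then a run of uid 7 up to the end; s2=1 precedes
     s1=2, so the walk from s1 never meets s2 or a different UID.  */
  const unsigned uids[] = { 3, 7, 7, 7 };
  printf ("later statement: index %d\n", appears_later (uids, 4, 2, 1));
  return 0;
}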
Re: PR tree-optimization/57337
On Sat, May 25, 2013 at 4:46 AM, Richard Biener richard.guent...@gmail.com wrote: Easwaran Raman era...@google.com wrote: In that case, if my insert_stmt immediately follows dep_stmt and both have the same UID, not_dominated_by would return true and I will end up updating insert_stmt to dep_stmt which is wrong. But there should be a safe default answer for Equal uids. Unless we are asking different questions in different places. Thus, I do not like the stmt walking but rather have a safe fallback. I am lost here. I don't see how we could avoid doing the stmt walking to resolve the equal uid case. How to ensure that not_dominated_by (a, b) returns true and not_dominated_by (b, a) returns false if A and B have the same UID and A appears before B without doing the statement walk. And, I don't see why the statement walk is bad. It is not likely that there is a long sequence of statements with the same UID. Thanks, Easwaran Richard. - Easwaran On Fri, May 24, 2013 at 1:07 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, May 23, 2013 at 7:26 PM, Easwaran Raman era...@google.com wrote: This addresses the case where UID alone is not sufficient to figure out which statement appears earlier in a BB. Bootstraps and no test regressions in x86_64 on linux. Ok for trunk? Why not simply conservatively use gimple_uid (a) = gimple_uid (b) in not_dominated_by? Richard. Thanks, Easwaran 2013-05-23 Easwaran Raman era...@google.com PR tree-optimization/57337 * tree-ssa-reassoc.c (appears_later_in_bb): New function. (find_insert_point): Correctly identify the insertion point when two statements with the same UID is compared. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 199211) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2866,6 +2866,31 @@ not_dominated_by (gimple a, gimple b) } +/* Among STMT1 and STMT2, return the statement that appears later. Both + statements are in same BB and have the same UID. */ + +static gimple +appears_later_in_bb (gimple stmt1, gimple stmt2) +{ + unsigned uid = gimple_uid (stmt1); + gimple_stmt_iterator gsi = gsi_for_stmt (stmt1); + gsi_next (gsi); + if (gsi_end_p (gsi)) +return stmt1; + for (; !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + + /* If STMT has a different UID than STMT1 and we haven't seen + STMT2 during traversal, we know STMT1 appears later. */ + if (gimple_uid (stmt) != uid) +return stmt1; + else if (stmt == stmt2) +return stmt2; +} + gcc_unreachable (); +} + /* Find the statement after which STMT must be moved so that the dependency from DEP_STMT to STMT is maintained. */ @@ -2875,7 +2900,11 @@ find_insert_point (gimple stmt, gimple dep_stmt) gimple insert_stmt = stmt; if (dep_stmt == NULL) return stmt; - if (not_dominated_by (insert_stmt, dep_stmt)) + if (gimple_uid (insert_stmt) == gimple_uid (dep_stmt) + gimple_bb (insert_stmt) == gimple_bb (dep_stmt) + insert_stmt != dep_stmt) +insert_stmt = appears_later_in_bb (insert_stmt, dep_stmt); + else if (not_dominated_by (insert_stmt, dep_stmt)) insert_stmt = dep_stmt; return insert_stmt; }
Re: PR tree-optimization/57337
In that case, if my insert_stmt immediately follows dep_stmt and both have the same UID, not_dominated_by would return true and I will end up updating insert_stmt to dep_stmt which is wrong. - Easwaran On Fri, May 24, 2013 at 1:07 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, May 23, 2013 at 7:26 PM, Easwaran Raman era...@google.com wrote: This addresses the case where UID alone is not sufficient to figure out which statement appears earlier in a BB. Bootstraps and no test regressions in x86_64 on linux. Ok for trunk? Why not simply conservatively use gimple_uid (a) = gimple_uid (b) in not_dominated_by? Richard. Thanks, Easwaran 2013-05-23 Easwaran Raman era...@google.com PR tree-optimization/57337 * tree-ssa-reassoc.c (appears_later_in_bb): New function. (find_insert_point): Correctly identify the insertion point when two statements with the same UID is compared. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 199211) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2866,6 +2866,31 @@ not_dominated_by (gimple a, gimple b) } +/* Among STMT1 and STMT2, return the statement that appears later. Both + statements are in same BB and have the same UID. */ + +static gimple +appears_later_in_bb (gimple stmt1, gimple stmt2) +{ + unsigned uid = gimple_uid (stmt1); + gimple_stmt_iterator gsi = gsi_for_stmt (stmt1); + gsi_next (gsi); + if (gsi_end_p (gsi)) +return stmt1; + for (; !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + + /* If STMT has a different UID than STMT1 and we haven't seen + STMT2 during traversal, we know STMT1 appears later. */ + if (gimple_uid (stmt) != uid) +return stmt1; + else if (stmt == stmt2) +return stmt2; +} + gcc_unreachable (); +} + /* Find the statement after which STMT must be moved so that the dependency from DEP_STMT to STMT is maintained. */ @@ -2875,7 +2900,11 @@ find_insert_point (gimple stmt, gimple dep_stmt) gimple insert_stmt = stmt; if (dep_stmt == NULL) return stmt; - if (not_dominated_by (insert_stmt, dep_stmt)) + if (gimple_uid (insert_stmt) == gimple_uid (dep_stmt) + gimple_bb (insert_stmt) == gimple_bb (dep_stmt) + insert_stmt != dep_stmt) +insert_stmt = appears_later_in_bb (insert_stmt, dep_stmt); + else if (not_dominated_by (insert_stmt, dep_stmt)) insert_stmt = dep_stmt; return insert_stmt; }
PR tree-optimization/57337
This addresses the case where the UID alone is not sufficient to figure out which statement appears earlier in a BB. Bootstraps and no test regressions on x86_64/linux. Ok for trunk?

Thanks, Easwaran

2013-05-23 Easwaran Raman era...@google.com
	PR tree-optimization/57337
	* tree-ssa-reassoc.c (appears_later_in_bb): New function.
	(find_insert_point): Correctly identify the insertion point when two statements with the same UID are compared.

Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c (revision 199211)
+++ gcc/tree-ssa-reassoc.c (working copy)
@@ -2866,6 +2866,31 @@ not_dominated_by (gimple a, gimple b)
 
 }
 
+/* Among STMT1 and STMT2, return the statement that appears later.  Both
+   statements are in the same BB and have the same UID.  */
+
+static gimple
+appears_later_in_bb (gimple stmt1, gimple stmt2)
+{
+  unsigned uid = gimple_uid (stmt1);
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt1);
+  gsi_next (&gsi);
+  if (gsi_end_p (gsi))
+    return stmt1;
+  for (; !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple stmt = gsi_stmt (gsi);
+
+      /* If STMT has a different UID than STMT1 and we haven't seen
+         STMT2 during traversal, we know STMT1 appears later.  */
+      if (gimple_uid (stmt) != uid)
+        return stmt1;
+      else if (stmt == stmt2)
+        return stmt2;
+    }
+  gcc_unreachable ();
+}
+
 /* Find the statement after which STMT must be moved so that the
    dependency from DEP_STMT to STMT is maintained.  */
 
@@ -2875,7 +2900,11 @@ find_insert_point (gimple stmt, gimple dep_stmt)
   gimple insert_stmt = stmt;
   if (dep_stmt == NULL)
     return stmt;
-  if (not_dominated_by (insert_stmt, dep_stmt))
+  if (gimple_uid (insert_stmt) == gimple_uid (dep_stmt)
+      && gimple_bb (insert_stmt) == gimple_bb (dep_stmt)
+      && insert_stmt != dep_stmt)
+    insert_stmt = appears_later_in_bb (insert_stmt, dep_stmt);
+  else if (not_dominated_by (insert_stmt, dep_stmt))
     insert_stmt = dep_stmt;
   return insert_stmt;
 }
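Condensed, the ordering decision this patch teaches find_insert_point looks roughly like the sketch below. It uses the GCC-internal API of the period (gimple, gimple_bb, gimple_uid, dominated_by_p) plus appears_later_in_bb from the patch above; it is a simplification for exposition, not compilable on its own:

/* Return whichever of A and B appears later in execution order.  */
static gimple
later_of_the_two (gimple a, gimple b)
{
  basic_block ba = gimple_bb (a);
  basic_block bb = gimple_bb (b);

  if (ba == bb)
    {
      /* Same block: UIDs decide, except for the tie this patch now
         resolves by walking the statement list.  */
      if (gimple_uid (a) == gimple_uid (b))
        return appears_later_in_bb (a, b);
      return gimple_uid (a) < gimple_uid (b) ? b : a;
    }

  /* Different blocks: fall back to the dominator tree; if A's block is
     dominated by B's block, A executes after B.  */
  return dominated_by_p (CDI_DOMINATORS, ba, bb) ? a : b;
}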
Re: Fix PR tree-optimization/57322
If you are suggesting using inc_gimple_stmt_max_uid(cfun) for all statements, that won't work because we want to use the UIDs to determine dominance within a BB. If your suggestion is to use that instead of 1 when BB == NULL, that would work (even though setting it to 1 is sufficient).

Thanks, Easwaran

On Mon, May 20, 2013 at 4:43 AM, Steven Bosscher stevenb@gmail.com wrote: On Mon, May 20, 2013 at 1:41 AM, Easwaran Raman era...@google.com wrote:

	PR tree-optimization/57322
	* tree-ssa-reassoc.c (build_and_add_sum): If a BB is empty, set the UID of the statement added to the BB to be 1.

Maybe gimple_set_uid (stmt, inc_gimple_stmt_max_uid (cfun));? That would keep the UIDs unique except for the copied UIDs.

Ciao! Steven
Fix PR tree-optimization/57322
The UID of a newly generated statement in build_and_add_sum is set to that of an adjacent statement in the BB. This is a problem in one case where the BB is empty. This fix sets the UID to be 1 if the BB is empty. Bootstraps and no test regressions on x86_64. OK for trunk?

Thanks, Easwaran

---
2013-05-19 Easwaran Raman era...@google.com
	PR tree-optimization/57322
	* tree-ssa-reassoc.c (build_and_add_sum): If a BB is empty, set the UID of the statement added to the BB to be 1.

Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c (revision 199048)
+++ gcc/tree-ssa-reassoc.c (working copy)
@@ -1165,8 +1165,12 @@ build_and_add_sum (tree type, tree op1, tree op2,
   if ((!op1def || gimple_nop_p (op1def))
       && (!op2def || gimple_nop_p (op2def)))
     {
+      gimple first_stmt;
+      unsigned uid;
       gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR));
-      gimple_set_uid (sum, gimple_uid (gsi_stmt (gsi)));
+      first_stmt = gsi_stmt (gsi);
+      uid = first_stmt ? gimple_uid (first_stmt) : 1;
+      gimple_set_uid (sum, uid);
       gsi_insert_before (&gsi, sum, GSI_NEW_STMT);
     }
   else if ((!op1def || gimple_nop_p (op1def))
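For comparison, the two options discussed in this subthread, side by side; sketched against the same API as the patch (gsi_after_labels, gsi_stmt, gimple_uid, inc_gimple_stmt_max_uid) and not meant to compile standalone:

gimple_stmt_iterator gsi
  = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR));
gimple first_stmt = gsi_stmt (gsi);   /* NULL when the block is empty  */

/* (a) The fix above: any constant is safe, because an empty block has
   no other statement whose UID this one could be compared against.  */
unsigned uid = first_stmt ? gimple_uid (first_stmt) : 1;

/* (b) Steven's suggestion, which additionally keeps UIDs unique across
   the function except for the deliberately copied ones:

   unsigned uid = first_stmt ? gimple_uid (first_stmt)
                             : inc_gimple_stmt_max_uid (cfun);  */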
Re: Minimize downward code motion during reassociation
On Thu, Apr 25, 2013 at 4:42 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Dec 7, 2012 at 9:01 PM, Easwaran Raman era...@google.com wrote: It seems I need to reset the debug uses of a statement before moving the statement itself. The attached patch starts from the leaf to root of the tree to be reassociated and places them at the point where their dependences will be met after reassociation. This bootstraps and I am running the tests. Ok if there are no test failures? + if ((dep_bb != insert_bb +!dominated_by_p (CDI_DOMINATORS, insert_bb, dep_bb)) + || (dep_bb == insert_bb + gimple_uid (insert_stmt) gimple_uid (dep_stmt))) +{ + insert_stmt = dep_stmt; +} superfluous {} Removed. + /* If there are any debug uses of LHS, reset them. */ + FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs) +{ + if (is_gimple_debug (use_stmt)) +{ + gimple_debug_bind_reset_value (use_stmt); that's only needed for debug stmts that are not dominated by insert_point, no? You are right. I have added the check. + /* Statements marked for throw can not be in the middle of a basic block. So + we can not insert a statement (not marked for throw) immediately after. */ + else if (stmt_can_throw_internal (insert_point)) +{ use stmt_ends_bb_p (insert_point) instead + edge succ_edge = find_fallthru_edge (insert_bb-succs); + insert_bb = succ_edge-dest; + /* Insert STMT at the beginning of the successor basic block. */ + gsi_insert = gsi_after_labels (insert_bb); + gsi_move_before (gsistmt, gsi_insert); are you sure this is a proper insert location? How would you even arrive in this situation given find_insert_point dominance checks? Let's say the last statement in a BB which can throw feeds into a statement that is to be reassociated. Then the insert point would be this statement. Instead, you could always insert at the beginning of its (lone) successor. This is independent of the the find_insert_point checks. Otherwise the patch looks ok - can you add one or two testcases please? I have added a test case and submitted at r199048. Thanks, Easwaran Thanks, Richard. Thanks, Easwaran 2012-12-07 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(find_insert_point): New function. (insert_stmt_after): Likewise. (get_def_stmt): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. (build_and_add_sum): Assign UIDs for new statements. (linearize_expr): Likewise. (do_reassoc): Renumber gimple statement UIDs. On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman era...@google.com wrote: I am unable to figure out the right way to handle the debug statements. What I tried was to find debug statements that use the SSA name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE) and then moved them as well at the right place. Thus, if I have to move t1 = a + b down (after the definition of 'd'), I also moved all debug statements that use t1 after the new position of t1. That still caused use-before-def problems in ssa_verify. I noticed that the debug statements got modified behind the scenes causing these issues. Any hints on what is the right way to handle the debug statements would be very helpful. I think you cannot (and should not) move debug statements. Instead you have to invalidate them. Otherwise you'll introduce confusion as debug info cannot handle overlapping live ranges. 
But maybe Alex can clarify. Richard.
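The rule Easwaran describes for throwing statements above — a statement marked for throw must end its block, so the moved statement goes to the start of the lone successor instead — can be written out in isolation as below, using stmt_ends_bb_p, find_fallthru_edge and gsi_after_labels from the period's API. A simplified sketch, not the committed code:

/* Return an iterator at which a statement may be placed so that it
   executes immediately after INSERT_POINT.  */
static gimple_stmt_iterator
insertion_point_after (gimple insert_point)
{
  gimple_stmt_iterator gsi;

  if (stmt_ends_bb_p (insert_point))
    {
      /* A block-ending (e.g. throwing) statement cannot be followed
         within its own block; step into the fallthru successor.  */
      edge fallthru
        = find_fallthru_edge (gimple_bb (insert_point)->succs);
      return gsi_after_labels (fallthru->dest);
    }
  gsi = gsi_for_stmt (insert_point);
  gsi_next (&gsi);
  return gsi;
}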
Print column information in dump_loc
This patch dumps the column number as part of dump_loc, making the output similar to inform(). This allows these messages to be pattern matched by dg-message. Bootstraps with this change. Ok for trunk?

- Easwaran

2013-05-14 Easwaran Raman era...@google.com
	* dumpfile.c (dump_loc): Print column number.

Index: gcc/dumpfile.c
===
--- gcc/dumpfile.c (revision 198852)
+++ gcc/dumpfile.c (working copy)
@@ -261,12 +261,13 @@ dump_loc (int dump_kind, FILE *dfile, source_locat
   if (dump_kind)
     {
       if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
-	fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
-		 LOCATION_LINE (loc));
+	fprintf (dfile, "\n%s:%d:%d: note: ", LOCATION_FILE (loc),
+		 LOCATION_LINE (loc), LOCATION_COLUMN (loc));
       else if (current_function_decl)
-	fprintf (dfile, "\n%s:%d: note: ",
+	fprintf (dfile, "\n%s:%d:%d: note: ",
 		 DECL_SOURCE_FILE (current_function_decl),
-		 DECL_SOURCE_LINE (current_function_decl));
+		 DECL_SOURCE_LINE (current_function_decl),
+		 DECL_SOURCE_COLUMN (current_function_decl));
     }
 }
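With the patch, a dump note carries a column as well as a line, e.g. a line such as (file name and message invented for illustration):

  foo.c:12:9: note: loop vectorized

and a testcase could then pin the location down with a dg-message directive; the regexp here is hypothetical:

/* { dg-message "12:9: note: loop vectorized" "" { target *-*-* } 0 } */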
Re: Minimize downward code motion during reassociation
I want to resend this patch for consideration. I applied the patch to trunk and confirmed that it bootstraps and doesn't cause test regressions. Is this ok for trunk? Thanks, Easwaran On Fri, Dec 7, 2012 at 12:01 PM, Easwaran Raman era...@google.com wrote: It seems I need to reset the debug uses of a statement before moving the statement itself. The attached patch starts from the leaf to root of the tree to be reassociated and places them at the point where their dependences will be met after reassociation. This bootstraps and I am running the tests. Ok if there are no test failures? Thanks, Easwaran 2012-12-07 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(find_insert_point): New function. (insert_stmt_after): Likewise. (get_def_stmt): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. (build_and_add_sum): Assign UIDs for new statements. (linearize_expr): Likewise. (do_reassoc): Renumber gimple statement UIDs. On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman era...@google.com wrote: I am unable to figure out the right way to handle the debug statements. What I tried was to find debug statements that use the SSA name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE) and then moved them as well at the right place. Thus, if I have to move t1 = a + b down (after the definition of 'd'), I also moved all debug statements that use t1 after the new position of t1. That still caused use-before-def problems in ssa_verify. I noticed that the debug statements got modified behind the scenes causing these issues. Any hints on what is the right way to handle the debug statements would be very helpful. I think you cannot (and should not) move debug statements. Instead you have to invalidate them. Otherwise you'll introduce confusion as debug info cannot handle overlapping live ranges. But maybe Alex can clarify. Richard.
Re: Minimize downward code motion during reassociation
It seems I need to reset the debug uses of a statement before moving the statement itself. The attached patch starts from the leaf to root of the tree to be reassociated and places them at the point where their dependences will be met after reassociation. This bootstraps and I am running the tests. Ok if there are no test failures? Thanks, Easwaran 2012-12-07 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(find_insert_point): New function. (insert_stmt_after): Likewise. (get_def_stmt): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. (build_and_add_sum): Assign UIDs for new statements. (linearize_expr): Likewise. (do_reassoc): Renumber gimple statement UIDs. On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman era...@google.com wrote: I am unable to figure out the right way to handle the debug statements. What I tried was to find debug statements that use the SSA name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE) and then moved them as well at the right place. Thus, if I have to move t1 = a + b down (after the definition of 'd'), I also moved all debug statements that use t1 after the new position of t1. That still caused use-before-def problems in ssa_verify. I noticed that the debug statements got modified behind the scenes causing these issues. Any hints on what is the right way to handle the debug statements would be very helpful. I think you cannot (and should not) move debug statements. Instead you have to invalidate them. Otherwise you'll introduce confusion as debug info cannot handle overlapping live ranges. But maybe Alex can clarify. Richard. reassoc_revised.diff Description: Binary data
Re: Minimize downward code motion during reassociation
I am unable to figure out the right way to handle the debug statements. What I tried was to find debug statements that use the SSA name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE) and then moved them as well at the right place. Thus, if I have to move t1 = a + b down (after the definition of 'd'), I also moved all debug statements that use t1 after the new position of t1. That still caused use-before-def problems in ssa_verify. I noticed that the debug statements got modified behind the scenes causing these issues. Any hints on what is the right way to handle the debug statements would be very helpful. Thanks, Easwaran On Sun, Nov 4, 2012 at 9:10 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Oct 31, 2012 at 8:16 PM, Easwaran Raman era...@google.com wrote: On Wed, Oct 31, 2012 at 4:36 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman era...@google.com wrote: On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman era...@google.com wrote: On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassocation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. */ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe-op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). 
The reason I call the above multiple times is that gsi_move_before might get called between two calls to the above. For instance, after rewrite_expr_tree is called once, the following sequence of calls could happen: reassociate_bb - linearize_expr_tree - linearize_expr - gsi_move_before. So it is not sufficient to call renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to use renumber_gimple_stmt_uids_in_blocks instead of assign_uids_in_relevant_bbs? It's sufficient to call it once if you conservatively update UIDs at stmt move / insert time (assign the same UID as the stmt before/after). I was worried that that would make the code brittle, but looking at the places where a new statement is added or moved, I think it is fine. I have made the change in the new patch. Thanks. +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb
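The "conservatively update UIDs at stmt move / insert time" agreement reached above amounts to something like the following sketch (GCC-internal types of the period; simplified, and assuming the destination iterator points at a real, non-end statement):

/* Move STMT so that it executes just before the statement at *DEST,
   keeping intra-block UID comparisons conservatively correct without
   renumbering the whole function again.  */
static void
move_stmt_keeping_uid_order (gimple stmt, gimple_stmt_iterator *dest)
{
  gimple_stmt_iterator from = gsi_for_stmt (stmt);

  /* Reuse the UID of the statement we are placed before; ties are then
     resolved conservatively (or by a short walk) by the callers.  */
  gimple_set_uid (stmt, gimple_uid (gsi_stmt (*dest)));
  gsi_move_before (&from, dest);
}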
Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon
I'd like to get a small patch to tree reassociation ( http://gcc.gnu.org/ml/gcc-patches/2012-10/msg01761.html ) in.

Thanks, Easwaran

On Mon, Oct 29, 2012 at 10:56 AM, Jakub Jelinek ja...@redhat.com wrote:

Status
======

I'd like to close the stage 1 phase of GCC 4.8 development on Monday, November 5th. If you still have patches for new features you'd like to see in GCC 4.8, please post them for review soon. Patches posted before the freeze, but reviewed shortly after the freeze, may still go in; further changes should be just bugfixes and documentation fixes.

Quality Data
============

Priority   #    Change from Last Report
--------  ---   -----------------------
P1         23             + 23
P2         77             +  8
P3         85             + 84
--------  ---   -----------------------
Total     185             +115

Previous Report
===============

http://gcc.gnu.org/ml/gcc/2012-03/msg00011.html

The next report will be sent by me again, announcing end of stage 1.
Re: Minimize downward code motion during reassociation
Ping. - Easwaran On Wed, Oct 31, 2012 at 12:16 PM, Easwaran Raman era...@google.com wrote: On Wed, Oct 31, 2012 at 4:36 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman era...@google.com wrote: On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman era...@google.com wrote: On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassocation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. */ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe-op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). The reason I call the above multiple times is that gsi_move_before might get called between two calls to the above. For instance, after rewrite_expr_tree is called once, the following sequence of calls could happen: reassociate_bb - linearize_expr_tree - linearize_expr - gsi_move_before. So it is not sufficient to call renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to use renumber_gimple_stmt_uids_in_blocks instead of assign_uids_in_relevant_bbs? It's sufficient to call it once if you conservatively update UIDs at stmt move / insert time (assign the same UID as the stmt before/after). I was worried that that would make the code brittle, but looking at the places where a new statement is added or moved, I think it is fine. 
I have made the change in the new patch. Thanks. +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i len; i++) +{ Likewise you call this for each call to rewrite_expr_tree, so it seems to me this is quadratic in the number of ops in the op vector. The call to ensure_ops_are_available inside rewrite_expr_tree is guarded by if (!moved) and I am setting moved = true there to ensure that ensure_ops_are_available inside is called once per reassociation of a expression tree. It would be a lot easier to understand this function if you call it once after rewrite_expr_tree finished. I think you meant before rewrite_expr_tree. One thing to note here is this function is called only when we first see
Re: Fix bugs introduced by switch-case profile propagation
On Fri, Oct 26, 2012 at 8:05 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka hubi...@ucw.cz wrote: Ping. On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman era...@google.com wrote: Hi, This patch fixes bugs introduced by my previous patch to propagate profiles during switch expansion. Bootstrap and profiledbootstrap successful on x86_64. Confirmed that it fixes the crashes reported in PR middle-end/54957. OK for trunk? - Easwaran

2012-10-17 Easwaran Raman era...@google.com
	PR target/54938
	PR middle-end/54957
	* optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note only if it doesn't already exist.
	* except.c (sjlj_emit_function_enter): Remove unused variable.
	* stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL.

Seems fine, but under what conditions do you get NULL here?

Wasn't sure if this is an OK for the patch or if I need to address anything else.

Actually I think you should make except.c set current_bb when expanding the switch instead.

In the current code, in sjlj_emit_dispatch_table (except.c), a sequence of instructions including the dispatch table jump is first generated and emitted to a BB before the first of the jump table targets. I think you are asking me to first emit the instructions before the indirect jump into a new bb, so that BLOCK_FOR_INSN (before_case) in stmt.c is non-NULL. This would need some refactoring inside except.c, but more importantly this BB won't have the correct control flow, right? So, while I can avoid the check for BB being NULL in get_outgoing_edge_probs, I still need to guard against default_edge being NULL, and default_prob will still be 0. Would it be ok if I remove the check for NULL inside get_outgoing_edge_probs and instead check it when I initialize BASE?

Thanks, Easwaran

OK with this change. Honza
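For reference, the adjustment the stmt.c code alludes to — after peeling default_prob off, "the probabilities of all other edges need to be adjusted so that it sums up to" the intended total — boils down to a proportional rescale. A sketch with the CFG API (edge, edge_iterator, FOR_EACH_EDGE), simplified from the patch above, where base is the pre-adjustment sum from get_outgoing_edge_probs:

/* Rescale the outgoing probabilities of BB so they sum (up to integer
   rounding) to NEW_TOTAL instead of BASE.  */
static void
rescale_out_edges (basic_block bb, int base, int new_total)
{
  edge e;
  edge_iterator ei;

  if (!bb || base == 0)
    return;
  FOR_EACH_EDGE (e, ei, bb->succs)
    e->probability = e->probability * new_total / base;
}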
Re: Minimize downward code motion during reassociation
On Wed, Oct 31, 2012 at 4:36 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Oct 24, 2012 at 4:02 AM, Easwaran Raman era...@google.com wrote: On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman era...@google.com wrote: On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassocation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. */ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe-op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). The reason I call the above multiple times is that gsi_move_before might get called between two calls to the above. For instance, after rewrite_expr_tree is called once, the following sequence of calls could happen: reassociate_bb - linearize_expr_tree - linearize_expr - gsi_move_before. So it is not sufficient to call renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to use renumber_gimple_stmt_uids_in_blocks instead of assign_uids_in_relevant_bbs? It's sufficient to call it once if you conservatively update UIDs at stmt move / insert time (assign the same UID as the stmt before/after). I was worried that that would make the code brittle, but looking at the places where a new statement is added or moved, I think it is fine. I have made the change in the new patch. Thanks. 
+/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i len; i++) +{ Likewise you call this for each call to rewrite_expr_tree, so it seems to me this is quadratic in the number of ops in the op vector. The call to ensure_ops_are_available inside rewrite_expr_tree is guarded by if (!moved) and I am setting moved = true there to ensure that ensure_ops_are_available inside is called once per reassociation of a expression tree. It would be a lot easier to understand this function if you call it once after rewrite_expr_tree finished. I think you meant before rewrite_expr_tree. One thing to note here is this function is called only when we first see a statement whose operand changes after reassociation. That's the reason the moving
Re: Fix bugs introduced by switch-case profile propagation
Hi, On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka hubi...@ucw.cz wrote: Ping. On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman era...@google.com wrote: Hi, This patch fixes bugs introduced by my previous patch to propagate profiles during switch expansion. Bootstrap and profiledbootstrap successful on x86_64. Confirmed that it fixes the crashes reported in PR middle-end/54957. OK for trunk? - Easwaran 2012-10-17 Easwaran Raman era...@google.com PR target/54938 PR middle-end/54957 * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note only if it doesn't already exist. * except.c (sjlj_emit_function_enter): Remove unused variable. * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. Seems fine, but under what conditions you get NULL here? Wasn't sure if this is an OK for the patch or if I need to address anything else. Thanks, Easwaran Honza (emit_case_dispatch_table): Handle the case where STMT_BB is NULL. (expand_sjlj_dispatch_table): Pass BB containing before_case to emit_case_dispatch_table. Index: gcc/optabs.c === --- gcc/optabs.c (revision 192488) +++ gcc/optabs.c (working copy) @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m profile_status != PROFILE_ABSENT insn JUMP_P (insn) - any_condjump_p (insn)) -{ - gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0)); - add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); -} + any_condjump_p (insn) + !find_reg_note (insn, REG_BR_PROB, 0)) +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); } /* Generate code to compare X with Y so that the condition codes are Index: gcc/except.c === --- gcc/except.c (revision 192488) +++ gcc/except.c (working copy) @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label) if (dispatch_label) { #ifdef DONT_USE_BUILTIN_SETJMP - rtx x, last; + rtx x; x = emit_library_call_value (setjmp_libfunc, NULL_RTX, LCT_RETURNS_TWICE, TYPE_MODE (integer_type_node), 1, plus_constant (Pmode, XEXP (fc, 0), Index: gcc/stmt.c === --- gcc/stmt.c (revision 192488) +++ gcc/stmt.c (working copy) @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb) edge e; edge_iterator ei; int prob_sum = 0; + if (!bb) +return 0; FOR_EACH_EDGE(e, ei, bb-succs) prob_sum += e-probability; return prob_sum; @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in rtx fallback_label = label_rtx (case_list-code_label); rtx table_label = gen_label_rtx (); bool has_gaps = false; - edge default_edge = EDGE_SUCC(stmt_bb, 0); - int default_prob = default_edge-probability; + edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL; + int default_prob = default_edge ? default_edge-probability : 0; int base = get_outgoing_edge_probs (stmt_bb); bool try_with_tablejump = false; @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in default_prob = 0; } - default_edge-probability = default_prob; + if (default_edge) +default_edge-probability = default_prob; /* We have altered the probability of the default edge. So the probabilities of all other edges need to be adjusted so that it sums up to @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index, emit_case_dispatch_table (index_expr, index_type, case_list, default_label, - minval, maxval, range, NULL); + minval, maxval, range, +BLOCK_FOR_INSN (before_case)); emit_label (default_label); free_alloc_pool (case_node_pool); }
Re: Fix bugs introduced by switch-case profile propagation
On Tue, Oct 23, 2012 at 3:03 AM, Jan Hubicka hubi...@ucw.cz wrote: Ping. On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman era...@google.com wrote: Hi, This patch fixes bugs introduced by my previous patch to propagate profiles during switch expansion. Bootstrap and profiledbootstrap successful on x86_64. Confirmed that it fixes the crashes reported in PR middle-end/54957. OK for trunk? - Easwaran 2012-10-17 Easwaran Raman era...@google.com PR target/54938 PR middle-end/54957 * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note only if it doesn't already exist. * except.c (sjlj_emit_function_enter): Remove unused variable. * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. Seems fine, but under what conditions you get NULL here? When expand_sjlj_dispatch_table calls emit_case_dispatch_table, stmt_bb is NULL. - Easwaran Honza (emit_case_dispatch_table): Handle the case where STMT_BB is NULL. (expand_sjlj_dispatch_table): Pass BB containing before_case to emit_case_dispatch_table. Index: gcc/optabs.c === --- gcc/optabs.c (revision 192488) +++ gcc/optabs.c (working copy) @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m profile_status != PROFILE_ABSENT insn JUMP_P (insn) - any_condjump_p (insn)) -{ - gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0)); - add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); -} + any_condjump_p (insn) + !find_reg_note (insn, REG_BR_PROB, 0)) +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); } /* Generate code to compare X with Y so that the condition codes are Index: gcc/except.c === --- gcc/except.c (revision 192488) +++ gcc/except.c (working copy) @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label) if (dispatch_label) { #ifdef DONT_USE_BUILTIN_SETJMP - rtx x, last; + rtx x; x = emit_library_call_value (setjmp_libfunc, NULL_RTX, LCT_RETURNS_TWICE, TYPE_MODE (integer_type_node), 1, plus_constant (Pmode, XEXP (fc, 0), Index: gcc/stmt.c === --- gcc/stmt.c (revision 192488) +++ gcc/stmt.c (working copy) @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb) edge e; edge_iterator ei; int prob_sum = 0; + if (!bb) +return 0; FOR_EACH_EDGE(e, ei, bb-succs) prob_sum += e-probability; return prob_sum; @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in rtx fallback_label = label_rtx (case_list-code_label); rtx table_label = gen_label_rtx (); bool has_gaps = false; - edge default_edge = EDGE_SUCC(stmt_bb, 0); - int default_prob = default_edge-probability; + edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL; + int default_prob = default_edge ? default_edge-probability : 0; int base = get_outgoing_edge_probs (stmt_bb); bool try_with_tablejump = false; @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in default_prob = 0; } - default_edge-probability = default_prob; + if (default_edge) +default_edge-probability = default_prob; /* We have altered the probability of the default edge. So the probabilities of all other edges need to be adjusted so that it sums up to @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index, emit_case_dispatch_table (index_expr, index_type, case_list, default_label, - minval, maxval, range, NULL); + minval, maxval, range, +BLOCK_FOR_INSN (before_case)); emit_label (default_label); free_alloc_pool (case_node_pool); }
Re: Minimize downward code motion during reassociation
On Tue, Oct 23, 2012 at 2:52 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Oct 22, 2012 at 8:31 PM, Easwaran Raman era...@google.com wrote: On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassocation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. */ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe-op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). The reason I call the above multiple times is that gsi_move_before might get called between two calls to the above. For instance, after rewrite_expr_tree is called once, the following sequence of calls could happen: reassociate_bb - linearize_expr_tree - linearize_expr - gsi_move_before. So it is not sufficient to call renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to use renumber_gimple_stmt_uids_in_blocks instead of assign_uids_in_relevant_bbs? It's sufficient to call it once if you conservatively update UIDs at stmt move / insert time (assign the same UID as the stmt before/after). I was worried that that would make the code brittle, but looking at the places where a new statement is added or moved, I think it is fine. I have made the change in the new patch. +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. 
*/ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i len; i++) +{ Likewise you call this for each call to rewrite_expr_tree, so it seems to me this is quadratic in the number of ops in the op vector. The call to ensure_ops_are_available inside rewrite_expr_tree is guarded by if (!moved) and I am setting moved = true there to ensure that ensure_ops_are_available inside is called once per reassociation of a expression tree. It would be a lot easier to understand this function if you call it once after rewrite_expr_tree finished. I think you meant before rewrite_expr_tree. One thing to note here is this function is called only when we first see a statement whose operand changes after reassociation. That's the reason the moving of statements happen inside rewrite_expr_tree. Why make this all so complicated? It seems to me that we should fixup stmt order only after the whole ops vector has been materialized
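The UID bookkeeping agreed on above boils down to: number every statement once, then give a moved or inserted statement the UID of its new neighbor, so UIDs stay monotonically non-decreasing within the block and "uid (a) < uid (b)" still safely means A is not after B. A standalone model of that invariant (illustrative structures, not the GIMPLE API):

#include <cassert>
#include <cstddef>
#include <vector>

struct stmt { int uid; };

/* Number all statements of a block once, like renumber_gimple_stmt_uids.  */
static void
renumber (std::vector<stmt> &bb)
{
  for (size_t i = 0; i < bb.size (); i++)
    bb[i].uid = (int) i;
}

/* Move bb[from] to just before bb[to] (to < from), copying the UID of the
   statement it now precedes -- the conservative update suggested for
   gsi_move_before.  */
static void
move_before (std::vector<stmt> &bb, size_t from, size_t to)
{
  stmt s = bb[from];
  s.uid = bb[to].uid;   /* tie with the new neighbor */
  bb.erase (bb.begin () + from);
  bb.insert (bb.begin () + to, s);
}

int
main ()
{
  std::vector<stmt> bb (5);
  renumber (bb);
  move_before (bb, 4, 1);
  for (size_t i = 0; i + 1 < bb.size (); i++)
    assert (bb[i].uid <= bb[i + 1].uid);   /* ordering invariant survives */
  return 0;
}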
Re: Minimize downward code motion during reassociation
On Mon, Oct 22, 2012 at 12:59 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 19, 2012 at 12:36 AM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassocation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. */ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe-op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} Please assign UIDs once using the existing renumber_gimple_stmt_uids (). You seem to call the above multiple times and thus do work bigger than O(number of basic blocks). The reason I call the above multiple times is that gsi_move_before might get called between two calls to the above. For instance, after rewrite_expr_tree is called once, the following sequence of calls could happen: reassociate_bb - linearize_expr_tree - linearize_expr - gsi_move_before. So it is not sufficient to call renumber_gimple_stmt_uids once per do_reassoc. Or do you want me to use renumber_gimple_stmt_uids_in_blocks instead of assign_uids_in_relevant_bbs? +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i len; i++) +{ Likewise you call this for each call to rewrite_expr_tree, so it seems to me this is quadratic in the number of ops in the op vector. 
The call to ensure_ops_are_available inside rewrite_expr_tree is guarded by if (!moved) and I am setting moved = true there to ensure that ensure_ops_are_available inside is called once per reassociation of an expression tree. Why make this all so complicated? It seems to me that we should fixup stmt order only after the whole ops vector has been materialized. + operand_entry_t oe = VEC_index (operand_entry_t, ops, i); + gimple def_stmt; + basic_block def_bb; + /* Ignore constants and operands with default definitions. */ + if (TREE_CODE (oe->op) != SSA_NAME + || SSA_NAME_IS_DEFAULT_DEF (oe->op)) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe->op); + def_bb = gimple_bb (def_stmt); + if (def_bb != insert_bb + && !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb)) +{ + insert_bb = def_bb; + insert_stmt = def_stmt; +} + else if (def_bb == insert_bb + && gimple_uid (insert_stmt) < gimple_uid (def_stmt)) +insert_stmt = def_stmt; +} + if (insert_stmt == stmt) +return; + gsistmt = gsi_for_stmt (stmt); + /* If GSI_STMT is a phi node, then do not insert just
Re: Fix bugs introduced by switch-case profile propagation
Ping. On Wed, Oct 17, 2012 at 1:48 PM, Easwaran Raman era...@google.com wrote: Hi, This patch fixes bugs introduced by my previous patch to propagate profiles during switch expansion. Bootstrap and profiledbootstrap successful on x86_64. Confirmed that it fixes the crashes reported in PR middle-end/54957. OK for trunk? - Easwaran 2012-10-17 Easwaran Raman era...@google.com PR target/54938 PR middle-end/54957 * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note only if it doesn't already exist. * except.c (sjlj_emit_function_enter): Remove unused variable. * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. (emit_case_dispatch_table): Handle the case where STMT_BB is NULL. (expand_sjlj_dispatch_table): Pass BB containing before_case to emit_case_dispatch_table. Index: gcc/optabs.c === --- gcc/optabs.c (revision 192488) +++ gcc/optabs.c (working copy) @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m profile_status != PROFILE_ABSENT insn JUMP_P (insn) - any_condjump_p (insn)) -{ - gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0)); - add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); -} + any_condjump_p (insn) + !find_reg_note (insn, REG_BR_PROB, 0)) +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); } /* Generate code to compare X with Y so that the condition codes are Index: gcc/except.c === --- gcc/except.c (revision 192488) +++ gcc/except.c (working copy) @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label) if (dispatch_label) { #ifdef DONT_USE_BUILTIN_SETJMP - rtx x, last; + rtx x; x = emit_library_call_value (setjmp_libfunc, NULL_RTX, LCT_RETURNS_TWICE, TYPE_MODE (integer_type_node), 1, plus_constant (Pmode, XEXP (fc, 0), Index: gcc/stmt.c === --- gcc/stmt.c (revision 192488) +++ gcc/stmt.c (working copy) @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb) edge e; edge_iterator ei; int prob_sum = 0; + if (!bb) +return 0; FOR_EACH_EDGE(e, ei, bb-succs) prob_sum += e-probability; return prob_sum; @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in rtx fallback_label = label_rtx (case_list-code_label); rtx table_label = gen_label_rtx (); bool has_gaps = false; - edge default_edge = EDGE_SUCC(stmt_bb, 0); - int default_prob = default_edge-probability; + edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL; + int default_prob = default_edge ? default_edge-probability : 0; int base = get_outgoing_edge_probs (stmt_bb); bool try_with_tablejump = false; @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in default_prob = 0; } - default_edge-probability = default_prob; + if (default_edge) +default_edge-probability = default_prob; /* We have altered the probability of the default edge. So the probabilities of all other edges need to be adjusted so that it sums up to @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index, emit_case_dispatch_table (index_expr, index_type, case_list, default_label, - minval, maxval, range, NULL); + minval, maxval, range, +BLOCK_FOR_INSN (before_case)); emit_label (default_label); free_alloc_pool (case_node_pool); }
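The optabs.c change above replaces an assert with an idempotence check: when the expander reaches the same jump twice, the REG_BR_PROB note must simply not be added again rather than tripping gcc_assert. A tiny standalone model of the fixed behavior (the map and names stand in for RTL notes and are illustrative only):

#include <cassert>
#include <map>

enum reg_note_kind { REG_BR_PROB };
typedef std::map<reg_note_kind, int> insn_notes;

/* Attach the branch-probability note only when absent, instead of
   asserting absence.  */
static void
add_br_prob_once (insn_notes &notes, int prob)
{
  if (notes.find (REG_BR_PROB) == notes.end ())
    notes[REG_BR_PROB] = prob;
}

int
main ()
{
  insn_notes n;
  add_br_prob_once (n, 9000);
  add_br_prob_once (n, 100);    /* second attempt is silently ignored */
  assert (n[REG_BR_PROB] == 9000);
  return 0;
}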
Re: Minimize downward code motion during reassociation
On Fri, Oct 19, 2012 at 5:13 PM, Xinliang David Li davi...@google.com wrote: On Thu, Oct 18, 2012 at 3:36 PM, Easwaran Raman era...@google.com wrote: Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassocation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. */ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe-op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} + +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) format issue? +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i len; i++) +{ + operand_entry_t oe = VEC_index (operand_entry_t, ops, i); + gimple def_stmt; + basic_block def_bb; + /* Ignore constants and operands with default definitons. */ + if (TREE_CODE (oe-op) != SSA_NAME + || SSA_NAME_IS_DEFAULT_DEF (oe-op)) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + def_bb = gimple_bb (def_stmt); + if (def_bb != insert_bb + !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb)) +{ + insert_bb = def_bb; + insert_stmt = def_stmt; +} + else if (def_bb == insert_bb +gimple_uid (insert_stmt) gimple_uid (def_stmt)) +insert_stmt = def_stmt; +} + if (insert_stmt == stmt) +return; + gsistmt = gsi_for_stmt (stmt); + /* If GSI_STMT is a phi node, then do not insert just after that statement. GSI_STMT -- INSERT_STMT? Ok. + Instead, find the first non-label gimple statement in BB and insert before + that. 
*/ + if (gimple_code (insert_stmt) == GIMPLE_PHI) +{ + gsi_insert = gsi_after_labels (insert_bb); + gsi_move_before (gsistmt, gsi_insert); +} + /* Statements marked for throw can not be in the middle of a basic block. So + we can not insert a statement (not marked for throw) immediately after. */ + else if (lookup_stmt_eh_lp (insert_stmt) > 0 + && stmt_can_throw_internal (insert_stmt)) +{ + edge e, succ_edge = NULL; + edge_iterator ei; + + /* There should be exactly one normal edge out of the basic block. */ + FOR_EACH_EDGE (e, ei, insert_bb->succs) +{ + if (!(e->flags & EDGE_COMPLEX)) +{ + gcc_assert (succ_edge == NULL); + succ_edge = e; +} +} + /* Insert STMT at the beginning of the successor basic block. */ + insert_bb = succ_edge->dest; + gsi_insert = gsi_after_labels (insert_bb
Minimize downward code motion during reassociation
Hi, During expression reassociation, statements are conservatively moved downwards to ensure that dependences are correctly satisfied after reassocation. This could lead to lengthening of live ranges. This patch moves statements only to the extent necessary. Bootstraps and no test regression on x86_64/linux. OK for trunk? Thanks, Easwaran 2012-10-18 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(assign_uids): New function. (assign_uids_in_relevant_bbs): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 192487) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,6 +2250,128 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Assign UIDs to statements in basic block BB. */ + +static void +assign_uids (basic_block bb) +{ + unsigned uid = 0; + gimple_stmt_iterator gsi; + /* First assign uids to phis. */ + for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} + + /* Then assign uids to stmts. */ + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, uid++); +} +} + +/* For each operand in OPS, find the basic block that contains the statement + which defines the operand. For all such basic blocks, assign UIDs. */ + +static void +assign_uids_in_relevant_bbs (VEC(operand_entry_t, heap) * ops) +{ + operand_entry_t oe; + int i; + struct pointer_set_t *seen_bbs = pointer_set_create (); + + for (i = 0; VEC_iterate (operand_entry_t, ops, i, oe); i++) +{ + gimple def_stmt; + basic_block bb; + if (TREE_CODE (oe-op) != SSA_NAME) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + bb = gimple_bb (def_stmt); + if (!pointer_set_contains (seen_bbs, bb)) +{ + assign_uids (bb); + pointer_set_insert (seen_bbs, bb); +} +} + pointer_set_destroy (seen_bbs); +} + +/* Ensure that operands in the OPS vector starting from OPINDEXth entry are live + at STMT. This is accomplished by moving STMT if needed. */ + +static void +ensure_ops_are_available (gimple stmt, VEC(operand_entry_t, heap) * ops, int opindex) +{ + int i; + int len = VEC_length (operand_entry_t, ops); + gimple insert_stmt = stmt; + basic_block insert_bb = gimple_bb (stmt); + gimple_stmt_iterator gsi_insert, gsistmt; + for (i = opindex; i len; i++) +{ + operand_entry_t oe = VEC_index (operand_entry_t, ops, i); + gimple def_stmt; + basic_block def_bb; + /* Ignore constants and operands with default definitons. */ + if (TREE_CODE (oe-op) != SSA_NAME + || SSA_NAME_IS_DEFAULT_DEF (oe-op)) +continue; + def_stmt = SSA_NAME_DEF_STMT (oe-op); + def_bb = gimple_bb (def_stmt); + if (def_bb != insert_bb + !dominated_by_p (CDI_DOMINATORS, insert_bb, def_bb)) +{ + insert_bb = def_bb; + insert_stmt = def_stmt; +} + else if (def_bb == insert_bb +gimple_uid (insert_stmt) gimple_uid (def_stmt)) +insert_stmt = def_stmt; +} + if (insert_stmt == stmt) +return; + gsistmt = gsi_for_stmt (stmt); + /* If GSI_STMT is a phi node, then do not insert just after that statement. + Instead, find the first non-label gimple statement in BB and insert before + that. */ + if (gimple_code (insert_stmt) == GIMPLE_PHI) +{ + gsi_insert = gsi_after_labels (insert_bb); + gsi_move_before (gsistmt, gsi_insert); +} + /* Statements marked for throw can not be in the middle of a basic block. 
So + we can not insert a statement (not marked for throw) immediately after. */ + else if (lookup_stmt_eh_lp (insert_stmt) 0 +stmt_can_throw_internal (insert_stmt)) +{ + edge e, succ_edge = NULL; + edge_iterator ei; + + /* There should be exactly one normal edge out of the basic block. */ + FOR_EACH_EDGE (e, ei, insert_bb-succs) +{ + if (!(e-flags EDGE_COMPLEX)) +{ + gcc_assert (succ_edge == NULL); + succ_edge = e; +} +} + /* Insert STMT at the beginning of the successor basic block. */ + insert_bb = succ_edge-dest; + gsi_insert = gsi_after_labels (insert_bb); + gsi_move_before (gsistmt, gsi_insert); +} + else +{ + gsi_insert = gsi_for_stmt (insert_stmt); + gsi_move_after (gsistmt, gsi_insert); +} + +} + /* Recursively rewrite our linearized statements so that the operators match those in OPS[OPINDEX], putting the computation in rank order. */ @@ -2262,11 +2384,6 @@ rewrite_expr_tree
Fix bugs introduced by switch-case profile propagation
Hi, This patch fixes bugs introduced by my previous patch to propagate profiles during switch expansion. Bootstrap and profiledbootstrap successful on x86_64. Confirmed that it fixes the crashes reported in PR middle-end/54957. OK for trunk? - Easwaran 2012-10-17 Easwaran Raman era...@google.com PR target/54938 PR middle-end/54957 * optabs.c (emit_cmp_and_jump_insn_1): Add REG_BR_PROB note only if it doesn't already exist. * except.c (sjlj_emit_function_enter): Remove unused variable. * stmt.c (get_outgoing_edge_probs): Return 0 if BB is NULL. (emit_case_dispatch_table): Handle the case where STMT_BB is NULL. (expand_sjlj_dispatch_table): Pass BB containing before_case to emit_case_dispatch_table. Index: gcc/optabs.c === --- gcc/optabs.c (revision 192488) +++ gcc/optabs.c (working copy) @@ -4268,11 +4268,9 @@ emit_cmp_and_jump_insn_1 (rtx test, enum machine_m profile_status != PROFILE_ABSENT insn JUMP_P (insn) - any_condjump_p (insn)) -{ - gcc_assert (!find_reg_note (insn, REG_BR_PROB, 0)); - add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); -} + any_condjump_p (insn) + !find_reg_note (insn, REG_BR_PROB, 0)) +add_reg_note (insn, REG_BR_PROB, GEN_INT (prob)); } /* Generate code to compare X with Y so that the condition codes are Index: gcc/except.c === --- gcc/except.c (revision 192488) +++ gcc/except.c (working copy) @@ -1153,7 +1153,7 @@ sjlj_emit_function_enter (rtx dispatch_label) if (dispatch_label) { #ifdef DONT_USE_BUILTIN_SETJMP - rtx x, last; + rtx x; x = emit_library_call_value (setjmp_libfunc, NULL_RTX, LCT_RETURNS_TWICE, TYPE_MODE (integer_type_node), 1, plus_constant (Pmode, XEXP (fc, 0), Index: gcc/stmt.c === --- gcc/stmt.c (revision 192488) +++ gcc/stmt.c (working copy) @@ -1867,6 +1867,8 @@ get_outgoing_edge_probs (basic_block bb) edge e; edge_iterator ei; int prob_sum = 0; + if (!bb) +return 0; FOR_EACH_EDGE(e, ei, bb-succs) prob_sum += e-probability; return prob_sum; @@ -1916,8 +1918,8 @@ emit_case_dispatch_table (tree index_expr, tree in rtx fallback_label = label_rtx (case_list-code_label); rtx table_label = gen_label_rtx (); bool has_gaps = false; - edge default_edge = EDGE_SUCC(stmt_bb, 0); - int default_prob = default_edge-probability; + edge default_edge = stmt_bb ? EDGE_SUCC(stmt_bb, 0) : NULL; + int default_prob = default_edge ? default_edge-probability : 0; int base = get_outgoing_edge_probs (stmt_bb); bool try_with_tablejump = false; @@ -1997,7 +1999,8 @@ emit_case_dispatch_table (tree index_expr, tree in default_prob = 0; } - default_edge-probability = default_prob; + if (default_edge) +default_edge-probability = default_prob; /* We have altered the probability of the default edge. So the probabilities of all other edges need to be adjusted so that it sums up to @@ -2289,7 +2292,8 @@ expand_sjlj_dispatch_table (rtx dispatch_index, emit_case_dispatch_table (index_expr, index_type, case_list, default_label, - minval, maxval, range, NULL); + minval, maxval, range, +BLOCK_FOR_INSN (before_case)); emit_label (default_label); free_alloc_pool (case_node_pool); }
Re: Propagate profile counts during switch expansion
On Sun, Oct 14, 2012 at 8:09 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, Index: optabs.c === --- optabs.c(revision 191879) +++ optabs.c(working copy) @@ -4249,7 +4249,7 @@ prepare_operand (enum insn_code icode, rtx x, int we can do the branch. */ static void -emit_cmp_and_jump_insn_1 (rtx test, enum machine_mode mode, rtx label) +emit_cmp_and_jump_insn_1 (rtx test, enum machine_mode mode, rtx label, int prob) { enum machine_mode optab_mode; enum mode_class mclass; @@ -4261,7 +4261,16 @@ static void gcc_assert (icode != CODE_FOR_nothing); gcc_assert (insn_operand_matches (icode, 0, test)); - emit_jump_insn (GEN_FCN (icode) (test, XEXP (test, 0), XEXP (test, 1), label)); + rtx insn = emit_insn ( + GEN_FCN (icode) (test, XEXP (test, 0), XEXP (test, 1), label)); I think we did not change to style of mixing declaration and code yet. So please put declaration ahead. Ok. I think you want to keep emit_jump_insn. Also do nothing when profile_status == PROFILE_ABSENT. Why should this be dependent on profile_status? The PROB passed could also come from static prediction right. Index: cfgbuild.c === --- cfgbuild.c (revision 191879) +++ cfgbuild.c (working copy) @@ -559,8 +559,11 @@ compute_outgoing_frequencies (basic_block b) f-count = b-count - e-count; return; } + else +{ + guess_outgoing_edge_probabilities (b); +} Add comment here that we rely on multiway BBs having sane probabilities already. You still want to do guessing when the edges out are EH. Those also can be many. I think this should work: - if (single_succ_p (b)) + else if (single_succ_p (b)) { e = single_succ_edge (b); e-probability = REG_BR_PROB_BASE; e-count = b-count; return; } - guess_outgoing_edge_probabilities (b); + else +{ + /* We rely on BBs with more than two successors to have sane probabilities + and do not guess them here. For BBs terminated by switch statements + expanded to jump-table jump, we have done the right thing during + expansion. For EH edges, we still guess the probabilities here. */ + bool complex_edge = false; + FOR_EACH_EDGE (e, ei, b-succs) +if (e-flags EDGE_COMPLEX) + { +complex_edge = true; +break; + } + if (complex_edge) +guess_outgoing_edge_probabilities (b); +} + Index: expr.h === --- expr.h (revision 191879) +++ expr.h (working copy) @@ -190,7 +190,7 @@ extern int have_sub2_insn (rtx, rtx); /* Emit a pair of rtl insns to compare two rtx's and to jump to a label if the comparison is true. */ extern void emit_cmp_and_jump_insns (rtx, rtx, enum rtx_code, rtx, -enum machine_mode, int, rtx); +enum machine_mode, int, rtx, int prob=-1); Hmm, probably first appreance of this C++ construct. I suppose it is OK. http://gcc.gnu.org/codingconventions.html#Default says it is ok for POD values. +static inline void +reset_out_edges_aux (basic_block bb) +{ + edge e; + edge_iterator ei; + FOR_EACH_EDGE(e, ei, bb-succs) +e-aux = (void *)0; +} +static inline void +compute_cases_per_edge (gimple stmt) +{ + basic_block bb = gimple_bb (stmt); + reset_out_edges_aux (bb); + int ncases = gimple_switch_num_labels (stmt); + for (int i = ncases - 1; i = 1; --i) +{ + tree elt = gimple_switch_label (stmt, i); + tree lab = CASE_LABEL (elt); + basic_block case_bb = label_to_block_fn (cfun, lab); + edge case_edge = find_edge (bb, case_bb); + case_edge-aux = (void *)((long)(case_edge-aux) + 1); +} +} Comments and newlines per coding standard. Ok. With the these changes, the patch is OK Thanks, Easwaran Thanks, Honza
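compute_cases_per_edge above relies on the fact that several case labels of one switch can funnel into the same successor edge; the patch accumulates a per-edge label count in e->aux. A standalone model of that counting (a map stands in for the aux pointer; the data is made up):

#include <cstdio>
#include <map>

int
main ()
{
  /* case index -> target basic block; blocks 1 and 3 each receive
     several case labels.  */
  int case_target[] = { 1, 1, 2, 3, 3, 3 };
  std::map<int, int> cases_per_edge;
  for (int t : case_target)
    cases_per_edge[t]++;          /* what the patch keeps in e->aux */
  for (std::map<int, int>::iterator p = cases_per_edge.begin ();
       p != cases_per_edge.end (); ++p)
    std::printf ("edge to bb %d carries %d case labels\n",
                 p->first, p->second);
  return 0;
}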
Re: Propagate profile counts during switch expansion
Ping. On Mon, Oct 8, 2012 at 5:46 PM, Easwaran Raman era...@google.com wrote: I have attached a revised patch. The updated ChangeLog is given below and I have responded to your comments inline: 2012-10-08 Easwaran Raman era...@google.com * optabs.c (emit_cmp_and_jump_insn_1): Add a new parameter to specificy the probability of taking the jump. (emit_cmp_and_jump_insns): Likewise. (expand_compare_and_swap_loop): Make the jump predicted not taken. * dojump.c (do_compare_rtx_and_jump): Remove the code attaching REG_BR_PROB note and pass probability to emit_cmp_and_jump_insns. * cfgbuild.c (compute_outgoing_frequencies): Do not guess outgoing probabilities for branches with more than two successors. * expr.c (emit_block_move_via_loop): Predict the loop backedge loop to be highly taken. (try_casesi): Pass the probability of jumping to the default label. (try_tablejump): Likewise. (do_tablejump): Likewise. * expr.h (try_tablejump): Add a new parameter. (try_casesi): Likewise. (emit_cmp_and_jump_insns): Add probability as default parameter with a default value of -1. * except.c (sjlj_emit_function_enter): Pass probability to emit_cmp_and_jump_insns. * stmt.c (case_node): Add new fields PROB and SUBTREE_PROB. (do_jump_if_equal): Pass probability for REG_BR_PROB note. (add_case_node): Pass estimated probability of jumping to the case label. (emit_case_decision_tree): Pass default_prob to emit_case_nodes. (get_outgoing_edge_probs): New function. (conditional_probability): Likewise. (reset_out_edges_aux): Likewise. (compute_cases_per_edge): Likewise. (emit_case_dispatch_table): Update probabilities of edges coming out of the switch statement. (expand_case): Compute and propagate default edge probability to emit_case_dispatch_table. (expand_sjlj_dispatch_table): Update calls to add_case_node and emit_case_dispatch_table. (balance_case_nodes): Update subtree_prob values. (emit_case_nodes): Compute edge probabilities and add pass them to emit_cmp_and_jump_insns. gcc/testsuite/ChangeLog: 2012-10-02 Easwaran Raman era...@google.com * gcc.dg/tree-prof/switch-case-1.c: New test case. * gcc.dg/tree-prof/switch-case-2.c: New test case. On Thu, Oct 4, 2012 at 6:19 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, This patch propagates the profile counts during RTL expansion. In many cases, there is no way to determine the exact count of an edge generated during the expansion. So this patch uses some simple heuristics to estimate the edge counts but ensures that the counts of the basic blocks corresponding to the cases are (nearly the) same as at the gimple level. Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for trunk? Index: gcc/expr.c === --- gcc/expr.c (revision 191879) +++ gcc/expr.c (working copy) @@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum machin #ifdef PUSH_ROUNDING static void emit_single_push_insn (enum machine_mode, rtx, tree); #endif -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int); static rtx const_vector_from_tree (tree); static void write_complex_part (rtx, rtx, bool); @@ -10894,7 +10894,7 @@ try_casesi (tree index_type, tree index_expr, tree static void do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, - rtx default_label) + rtx default_label, int default_probability) Please document default_probability. Done. { rtx temp, vector; @@ -10910,9 +10910,17 @@ do_tablejump (rtx index, enum machine_mode mode, r the maximum value of the range. 
*/ if (default_label) -emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, - default_label); +{ + emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1, + default_label); + if (default_probability != -1) +{ + rtx jump_insn = get_last_insn(); + add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (default_probability)); +} +} dojump already does this kind of logic, but it is bit more cureful: emit_cmp_and_jump_insns (op0, op1, code, size, mode, unsignedp, if_true_label); if (prob != -1 profile_status != PROFILE_ABSENT) { for (last = NEXT_INSN (last); last NEXT_INSN (last); last = NEXT_INSN (last)) if (JUMP_P (last)) break; if (last JUMP_P (last) ! NEXT_INSN (last) any_condjump_p (last)) { gcc_assert (!find_reg_note (last, REG_BR_PROB, 0)); add_reg_note (last, REG_BR_PROB, GEN_INT (prob)); } } What about making emit_cmp_and_jump_insns taking the probability argument and moving the code above inside
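The dojump.c idiom quoted in the review -- emit the comparison, then walk the freshly emitted insns to find the conditional jump and attach the REG_BR_PROB note to it -- is exactly the logic Honza asks to fold into emit_cmp_and_jump_insns itself. A standalone model of that walk (the insn type is illustrative, not RTL):

#include <cassert>
#include <cstddef>
#include <vector>

struct insn { bool is_jump; int br_prob; };   /* -1 means "no note" */

/* Scan the insns emitted since FIRST_NEW, find the jump, and attach the
   probability note once -- the pattern dojump.c uses after expanding a
   comparison.  */
static void
tag_condjump (std::vector<insn> &seq, size_t first_new, int prob)
{
  for (size_t i = first_new; i < seq.size (); i++)
    if (seq[i].is_jump)
      {
        assert (seq[i].br_prob == -1);   /* no note attached yet */
        seq[i].br_prob = prob;
        return;
      }
}

int
main ()
{
  std::vector<insn> seq = { { false, -1 }, { false, -1 }, { true, -1 } };
  tag_condjump (seq, 0, 9900);
  assert (seq[2].br_prob == 9900);
  return 0;
}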
Re: Move statements upwards after reassociation
On Fri, Oct 12, 2012 at 1:45 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Oct 12, 2012 at 3:09 AM, Easwaran Raman era...@google.com wrote: Thanks for the comments. As David wrote, the intent of the patch is not to do a general purpose scheduling, but to compensate for the possible live range lengthening introduced by reassociation. On Thu, Oct 11, 2012 at 6:16 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Oct 11, 2012 at 3:52 AM, Easwaran Raman era...@google.com wrote: +/* Move STMT up within its BB until it can not be moved any further. */ + +static void move_stmt_upwards (gimple stmt) +{ + gimple_stmt_iterator gsi, gsistmt; + tree rhs1, rhs2; + gimple rhs1def = NULL, rhs2def = NULL; + rhs1 = gimple_assign_rhs1 (stmt); + rhs2 = gimple_assign_rhs2 (stmt); + gcc_assert (rhs1); Please no such senseless asserts. The following line will segfault anyway if rhs1 is NULL and with a debugger an assert doesn't add any useful information. Ok. + if (TREE_CODE (rhs1) == SSA_NAME) +rhs1def = SSA_NAME_DEF_STMT (rhs1); + else if (TREE_CODE (rhs1) != REAL_CST +TREE_CODE (rhs1) != INTEGER_CST) +return; + if (rhs2) You may not access gimple_assign_rhs2 if it is not there. So you have to check whether you have an unary, binary or ternary (yes) operation. gimple_assign_rhs2 returns NULL_TREE if it the RHS of an assignment has less than two operands. Regarding the check for ternary operation, I believe it is not necessary. A statement is considered for reassociation only if the RHS class is GIMPLE_BINARY_RHS. Subsequently, for rhs1 and rhs2, it checks if the def statements also have the same code and so it seems to me that a statement with a ternary operator in the RHS will never be considered in rewrite_expr_tree. +{ + if (TREE_CODE (rhs2) == SSA_NAME) +rhs2def = SSA_NAME_DEF_STMT (rhs2); + else if (TREE_CODE (rhs1) != REAL_CST +TREE_CODE (rhs1) != INTEGER_CST) +return; +} + gsi = gsi_for_stmt (stmt); + gsistmt = gsi; + gsi_prev (gsi); + for (; !gsi_end_p (gsi); gsi_prev (gsi)) This doesn't make much sense. You can move a stmt to the nearest common post-dominator. Assuming you only want to handle placing it after rhs1def or after rhs2def(?) you don't need any loop, just two dominator queries and an insertion after one of the definition stmts. Within a BB isn't that still O(size of BB)? Please document the fact that both stmts are in the same BB. Ok. And no, it isn't, it is O (size of BB ^ 2). You don't need a loop. operand rank should reflect the dominance relation inside the BB. The rank of phi definitions would mess this up. If that doesn't work simply assign UIDs to the stmts first. Ok. But this code should consider BBs. For reassociation to look across BBs, the code should look something like this: L1 : a_2 = a_1 + 10 jc L3 L2: a_3 = a_2 + 20 - L1 should dominate L2 (otherwise there will be a phi node at L2 and the reassociation of a_3 will not consider the definition of a_2) - There are no other uses of a_2 other than the one in L3. After reassociation, the stmt defining a_2 would be moved to L2. In that case, the downward code motion of a_2 = a_1 + 10 to L2 is beneficial (one less instruction if the branch is taken). It is not obvious to me that moving it to L1 (or whereever a_1 is defined) is beneficial. In this case it doesn't matter whether a1 lives through a3 or if a2 does. But moving the stmt is not necessary, so why not simply avoid it. I used that example to show that the current downward motion has a useful side effect and this patch preserves it. 
Yes, in this example the downward motion can be avoided but in general it may not be possible. I agree with you that there is unnecessary code motion in many cases. You cannot undo it with your patch anyway. And I don't see why more optimal placement cannot be done during rewrite_expr_tree itself. I started with that idea, but my current approach looks simpler. Simpler, but it's a hack. So, the only place we move stmts in rewrite_expr_tree is here: if (!moved) { gimple_stmt_iterator gsinow, gsirhs1; gimple stmt1 = stmt, stmt2; unsigned int count; gsinow = gsi_for_stmt (stmt); count = VEC_length (operand_entry_t, ops) - opindex - 2; while (count-- != 0) { stmt2 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt1)); gsirhs1 = gsi_for_stmt (stmt2); gsi_move_before (gsirhs1, gsinow); that moving is done unconditionally, even if stmt2 already dominates stmt1. Why not improve that instead? stmt2 does dominate stmt1. The issue is it unconditionally moves stmt2 downwards even if the statement that defines ops[op_index+1] (the new RHS of stmt2 after reassociation) already dominates stmt2
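The live-range concern driving this thread is easiest to see in source form. A compilable illustration (not taken from the thread): after reassociation, the adds that combine t1..t4 can all be sunk to the bottom of the loop body, keeping all four temporaries live at once -- exactly the spill pressure the patch tries to avoid by re-placing each statement right after its operands' definitions:

/* Four products defined early, combined late.  If reassociation sinks the
   whole add chain to the bottom, t1..t4 stay simultaneously live across
   the body of the loop.  */
double
dot4 (const double *a, const double *b, int n)
{
  double s = 0.0;
  for (int i = 0; i + 4 <= n; i += 4)
    {
      double t1 = a[i] * b[i];
      double t2 = a[i + 1] * b[i + 1];
      double t3 = a[i + 2] * b[i + 2];
      double t4 = a[i + 3] * b[i + 3];
      /* ... imagine register-hungry code here ... */
      s += ((t1 + t2) + t3) + t4;
    }
  return s;
}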
[google] Move delete with size to its own file (issue6655052)
This patch moves the two argument delete operator into its own file. When a program provides its own definition of operator delete (void *), but not operator delete (void *, size_t), we could end up linking two files that define _ZdlPv resulting in a linker error. Bootstraps with no test regressions on a x86_64 machine running linux. Ok for google/4_7 and google/main branches? Google ref b/6982747 2012-10-11 Easwaran Raman era...@google.com * libsupc++/Makefile.am: Add del_opsz.cc to sources. * libsupc++/Makefile.in: Regenerated. * libsupc++/del_opsz.cc: New file. * libsupc++/del_op.cc(operator delete(void*,std::size_t)): Remove. Index: libstdc++-v3/libsupc++/Makefile.in === --- libstdc++-v3/libsupc++/Makefile.in (revision 192373) +++ libstdc++-v3/libsupc++/Makefile.in (working copy) @@ -92,8 +92,8 @@ LTLIBRARIES = $(noinst_LTLIBRARIES) $(toolexeclib_ libsupc___la_LIBADD = am__objects_1 = array_type_info.lo atexit_arm.lo bad_alloc.lo \ bad_cast.lo bad_typeid.lo class_type_info.lo del_op.lo \ - del_opnt.lo del_opv.lo del_opvnt.lo dyncast.lo eh_alloc.lo \ - eh_arm.lo eh_aux_runtime.lo eh_call.lo eh_catch.lo \ + del_opsz.lo del_opnt.lo del_opv.lo del_opvnt.lo dyncast.lo \ + eh_alloc.lo eh_arm.lo eh_aux_runtime.lo eh_call.lo eh_catch.lo \ eh_exception.lo eh_globals.lo eh_personality.lo eh_ptr.lo \ eh_term_handler.lo eh_terminate.lo eh_tm.lo eh_throw.lo \ eh_type.lo eh_unex_handler.lo enum_type_info.lo \ @@ -362,6 +362,7 @@ sources = \ bad_typeid.cc \ class_type_info.cc \ del_op.cc \ + del_opsz.cc \ del_opnt.cc \ del_opv.cc \ del_opvnt.cc \ Index: libstdc++-v3/libsupc++/del_op.cc === --- libstdc++-v3/libsupc++/del_op.cc(revision 192373) +++ libstdc++-v3/libsupc++/del_op.cc(working copy) @@ -47,11 +47,3 @@ operator delete(void* ptr) _GLIBCXX_USE_NOEXCEPT if (ptr) std::free(ptr); } - -_GLIBCXX_WEAK_DEFINITION void -operator delete(void* ptr, -std::size_t bytes __attribute__((__unused__))) throw () -{ - if (ptr) -std::free(ptr); -} Index: libstdc++-v3/libsupc++/del_opsz.cc === --- libstdc++-v3/libsupc++/del_opsz.cc (revision 0) +++ libstdc++-v3/libsupc++/del_opsz.cc (revision 0) @@ -0,0 +1,50 @@ +// Boilerplate support routines for -*- C++ -*- dynamic memory management. + +// Copyright (C) 2012 +// Free Software Foundation +// +// This file is part of GCC. +// +// GCC is free software; you can redistribute it and/or modify +// it under the terms of the GNU General Public License as published by +// the Free Software Foundation; either version 3, or (at your option) +// any later version. +// +// GCC is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. +// +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// http://www.gnu.org/licenses/. + +#include bits/c++config.h + +#if !_GLIBCXX_HOSTED +// A freestanding C runtime may not provide free -- but there is no +// other reasonable way to implement operator delete. 
+namespace std +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + extern "C" void free(void*); +_GLIBCXX_END_NAMESPACE_VERSION +} // namespace +#else +# include <cstdlib> +#endif + +#include <new> + +_GLIBCXX_WEAK_DEFINITION void +operator delete(void* ptr, +std::size_t bytes __attribute__((__unused__))) throw () +{ + if (ptr) +std::free(ptr); +} Index: libstdc++-v3/libsupc++/Makefile.am === --- libstdc++-v3/libsupc++/Makefile.am (revision 192373) +++ libstdc++-v3/libsupc++/Makefile.am (working copy) @@ -53,6 +53,7 @@ sources = \ bad_typeid.cc \ class_type_info.cc \ del_op.cc \ + del_opsz.cc \ del_opnt.cc \ del_opv.cc \ del_opvnt.cc \ -- This patch is available for review at http://codereview.appspot.com/6655052
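The linker scenario motivating the split, as a compilable example (the malloc-backed allocator is illustrative): a program that replaces only the unsized operator delete must not be forced to link a second definition of _ZdlPv just because the library's sized form lived in the same object file:

#include <cstddef>
#include <cstdlib>
#include <new>

void *operator new (std::size_t n)
{
  if (void *p = std::malloc (n ? n : 1))
    return p;
  throw std::bad_alloc ();
}

void operator delete (void *p) throw ()   /* replaces _ZdlPv */
{
  std::free (p);
}

/* operator delete (void*, std::size_t) is intentionally not replaced;
   before the split, pulling the sized form out of a static libsupc++
   could drag in del_op.o and collide on _ZdlPv.  */

int
main ()
{
  int *ip = new int (42);
  delete ip;
  return 0;
}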
Re: Move statements upwards after reassociation
Thanks for the comments. As David wrote, the intent of the patch is not to do a general purpose scheduling, but to compensate for the possible live range lengthening introduced by reassociation. On Thu, Oct 11, 2012 at 6:16 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Oct 11, 2012 at 3:52 AM, Easwaran Raman era...@google.com wrote: +/* Move STMT up within its BB until it can not be moved any further. */ + +static void move_stmt_upwards (gimple stmt) +{ + gimple_stmt_iterator gsi, gsistmt; + tree rhs1, rhs2; + gimple rhs1def = NULL, rhs2def = NULL; + rhs1 = gimple_assign_rhs1 (stmt); + rhs2 = gimple_assign_rhs2 (stmt); + gcc_assert (rhs1); Please no such senseless asserts. The following line will segfault anyway if rhs1 is NULL and with a debugger an assert doesn't add any useful information. Ok. + if (TREE_CODE (rhs1) == SSA_NAME) +rhs1def = SSA_NAME_DEF_STMT (rhs1); + else if (TREE_CODE (rhs1) != REAL_CST +TREE_CODE (rhs1) != INTEGER_CST) +return; + if (rhs2) You may not access gimple_assign_rhs2 if it is not there. So you have to check whether you have an unary, binary or ternary (yes) operation. gimple_assign_rhs2 returns NULL_TREE if it the RHS of an assignment has less than two operands. Regarding the check for ternary operation, I believe it is not necessary. A statement is considered for reassociation only if the RHS class is GIMPLE_BINARY_RHS. Subsequently, for rhs1 and rhs2, it checks if the def statements also have the same code and so it seems to me that a statement with a ternary operator in the RHS will never be considered in rewrite_expr_tree. +{ + if (TREE_CODE (rhs2) == SSA_NAME) +rhs2def = SSA_NAME_DEF_STMT (rhs2); + else if (TREE_CODE (rhs1) != REAL_CST +TREE_CODE (rhs1) != INTEGER_CST) +return; +} + gsi = gsi_for_stmt (stmt); + gsistmt = gsi; + gsi_prev (gsi); + for (; !gsi_end_p (gsi); gsi_prev (gsi)) This doesn't make much sense. You can move a stmt to the nearest common post-dominator. Assuming you only want to handle placing it after rhs1def or after rhs2def(?) you don't need any loop, just two dominator queries and an insertion after one of the definition stmts. Within a BB isn't that still O(size of BB)? But this code should consider BBs. For reassociation to look across BBs, the code should look something like this: L1 : a_2 = a_1 + 10 jc L3 L2: a_3 = a_2 + 20 - L1 should dominate L2 (otherwise there will be a phi node at L2 and the reassociation of a_3 will not consider the definition of a_2) - There are no other uses of a_2 other than the one in L3. After reassociation, the stmt defining a_2 would be moved to L2. In that case, the downward code motion of a_2 = a_1 + 10 to L2 is beneficial (one less instruction if the branch is taken). It is not obvious to me that moving it to L1 (or whereever a_1 is defined) is beneficial. And I don't see why more optimal placement cannot be done during rewrite_expr_tree itself. I started with that idea, but my current approach looks more simpler. Thanks, Easwaran NB, the whole reassoc code needs a re-write to avoid the excessive stmt modifications when it does nothing. So I'd very much rather avoid adding anything to reassoc until that rewrite happened. Richard. +{ + gimple curr_stmt = gsi_stmt (gsi); + if (curr_stmt == rhs1def || curr_stmt == rhs2def) { +gsi_move_after (gsistmt, gsi); +return; + } +} + +} + /* Recursively rewrite our linearized statements so that the operators match those in OPS[OPINDEX], putting the computation in rank order. 
*/ static void rewrite_expr_tree (gimple stmt, unsigned int opindex, - VEC(operand_entry_t, heap) * ops, bool moved) + VEC(operand_entry_t, heap) * ops, bool moved, + VEC(gimple, heap) **stmts_to_move) { tree rhs1 = gimple_assign_rhs1 (stmt); tree rhs2 = gimple_assign_rhs2 (stmt); @@ -2299,6 +2337,7 @@ rewrite_expr_tree (gimple stmt, unsigned int opind print_gimple_stmt (dump_file, stmt, 0, 0); } } + VEC_safe_push (gimple, heap, *stmts_to_move, stmt); return; } @@ -2346,7 +2385,9 @@ rewrite_expr_tree (gimple stmt, unsigned int opind } /* Recurse on the LHS of the binary operator, which is guaranteed to be the non-leaf side. */ - rewrite_expr_tree (SSA_NAME_DEF_STMT (rhs1), opindex + 1, ops, moved); + rewrite_expr_tree (SSA_NAME_DEF_STMT (rhs1), opindex + 1, ops, moved, + stmts_to_move); + VEC_safe_push (gimple, heap, *stmts_to_move, stmt); } /* Find out how many cycles we need to compute statements chain. @@ -3427,6 +3468,9 @@ reassociate_bb (basic_block bb) { gimple_stmt_iterator gsi; basic_block son; + VEC(gimple, heap) *stmts_to_move = NULL; + gimple stmt; + int i; for (gsi = gsi_last_bb (bb
Move statements upwards after reassociation
Hi, In the expression reassociation pass, statements might get moved downwards to ensure that dependences are not violated after reassociation. This can increase the live range and, in a tight loop, result in spills. This patch simply does a code motion of those statements involved in reassociation and pushes them upwards in the BB as far as possible without violating dependences. Bootstraps and no tests regressions on a x86_64 machine running linux. Ok for trunk? - Easwaran --- 2012-10-10 Easwaran Raman era...@google.com * tree-ssa-reassoc.c (move_stmt_upwards): New function. (rewrite_expr_tree): Record statements to be moved. (reassociate_bb): Move statements affected by reassociation as early as possible. Index: gcc/tree-ssa-reassoc.c === --- gcc/tree-ssa-reassoc.c (revision 191879) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -2250,13 +2250,51 @@ swap_ops_for_binary_stmt (VEC(operand_entry_t, hea } } +/* Move STMT up within its BB until it can not be moved any further. */ + +static void move_stmt_upwards (gimple stmt) +{ + gimple_stmt_iterator gsi, gsistmt; + tree rhs1, rhs2; + gimple rhs1def = NULL, rhs2def = NULL; + rhs1 = gimple_assign_rhs1 (stmt); + rhs2 = gimple_assign_rhs2 (stmt); + gcc_assert (rhs1); + if (TREE_CODE (rhs1) == SSA_NAME) +rhs1def = SSA_NAME_DEF_STMT (rhs1); + else if (TREE_CODE (rhs1) != REAL_CST +TREE_CODE (rhs1) != INTEGER_CST) +return; + if (rhs2) +{ + if (TREE_CODE (rhs2) == SSA_NAME) +rhs2def = SSA_NAME_DEF_STMT (rhs2); + else if (TREE_CODE (rhs1) != REAL_CST +TREE_CODE (rhs1) != INTEGER_CST) +return; +} + gsi = gsi_for_stmt (stmt); + gsistmt = gsi; + gsi_prev (gsi); + for (; !gsi_end_p (gsi); gsi_prev (gsi)) +{ + gimple curr_stmt = gsi_stmt (gsi); + if (curr_stmt == rhs1def || curr_stmt == rhs2def) { +gsi_move_after (gsistmt, gsi); +return; + } +} + +} + /* Recursively rewrite our linearized statements so that the operators match those in OPS[OPINDEX], putting the computation in rank order. */ static void rewrite_expr_tree (gimple stmt, unsigned int opindex, - VEC(operand_entry_t, heap) * ops, bool moved) + VEC(operand_entry_t, heap) * ops, bool moved, + VEC(gimple, heap) **stmts_to_move) { tree rhs1 = gimple_assign_rhs1 (stmt); tree rhs2 = gimple_assign_rhs2 (stmt); @@ -2299,6 +2337,7 @@ rewrite_expr_tree (gimple stmt, unsigned int opind print_gimple_stmt (dump_file, stmt, 0, 0); } } + VEC_safe_push (gimple, heap, *stmts_to_move, stmt); return; } @@ -2346,7 +2385,9 @@ rewrite_expr_tree (gimple stmt, unsigned int opind } /* Recurse on the LHS of the binary operator, which is guaranteed to be the non-leaf side. */ - rewrite_expr_tree (SSA_NAME_DEF_STMT (rhs1), opindex + 1, ops, moved); + rewrite_expr_tree (SSA_NAME_DEF_STMT (rhs1), opindex + 1, ops, moved, + stmts_to_move); + VEC_safe_push (gimple, heap, *stmts_to_move, stmt); } /* Find out how many cycles we need to compute statements chain. 
@@ -3427,6 +3468,9 @@ reassociate_bb (basic_block bb) { gimple_stmt_iterator gsi; basic_block son; + VEC(gimple, heap) *stmts_to_move = NULL; + gimple stmt; + int i; for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (gsi)) { @@ -3542,7 +3586,7 @@ reassociate_bb (basic_block bb) VEC_length (operand_entry_t, ops) 3) rewrite_expr_tree_parallel (stmt, width, ops); else -rewrite_expr_tree (stmt, 0, ops, false); +rewrite_expr_tree (stmt, 0, ops, false, stmts_to_move); /* If we combined some repeated factors into a __builtin_powi call, multiply that result by the @@ -3560,6 +3604,7 @@ reassociate_bb (basic_block bb) target_ssa); gimple_set_location (mul_stmt, gimple_location (stmt)); gsi_insert_after (gsi, mul_stmt, GSI_NEW_STMT); + VEC_safe_push (gimple, heap, stmts_to_move, mul_stmt); } } @@ -3567,6 +3612,11 @@ reassociate_bb (basic_block bb) } } } + + FOR_EACH_VEC_ELT (gimple, stmts_to_move, i, stmt) +move_stmt_upwards (stmt); + VEC_free (gimple, heap, stmts_to_move); + for (son = first_dom_son (CDI_POST_DOMINATORS, bb); son; son = next_dom_son (CDI_POST_DOMINATORS, son))
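The core of ensure_ops_are_available in the patch above is an insertion-point computation: among the defining statements of all remaining operands, find the latest one (by dominance across blocks, or by UID within a block) and move the reassociated statement just past it. A standalone model of the within-block case (illustrative, not the GIMPLE API):

#include <algorithm>
#include <cassert>

/* The statement only needs to sit below the last of its operands'
   definitions; everything above that point is already available.  */
static int
latest_def_uid (const int *def_uid, int nops)
{
  return *std::max_element (def_uid, def_uid + nops);
}

int
main ()
{
  int def_uid[3] = { 4, 17, 9 };   /* UIDs of the operands' defs */
  int stmt_uid = 12;               /* current position of the use */
  int place_after = latest_def_uid (def_uid, 3);
  /* One operand (uid 17) is defined below the use, so the statement
     must move down to just after it -- and no further.  */
  assert (place_after == 17 && place_after > stmt_uid);
  return 0;
}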
Re: Propagate profile counts during switch expansion
Hi Honza, I am addressing some of the questions you raise here. Will send an updated patch later. On Thu, Oct 4, 2012 at 6:19 AM, Jan Hubicka hubi...@ucw.cz wrote: @@ -560,7 +577,6 @@ compute_outgoing_frequencies (basic_block b) return; } } - if (single_succ_p (b)) { e = single_succ_edge (b); @@ -568,7 +584,10 @@ compute_outgoing_frequencies (basic_block b) e-count = b-count; return; } - guess_outgoing_edge_probabilities (b); + else if (!gen_probabilities_from_existing_counts (b-succs)){ +/* All outgoing edges of B have zero count. Guess probabilities. */ +guess_outgoing_edge_probabilities (b); + } Hmm, I do not quite follow logic here. basic block B is one of many basic blocks that the original BB was split from. It is possible that B may have some of original edges, but there may be new ones. How you can guess the outgoing probabilitie shere. Do you have an example? The code reaches here only when the BB has more than two successor. I assume that this happens only when the BB is terminated by a switch statement. During conversion to RTL, the outgoing edges of this BB and their counts ar untouched. So I am just using those counts to compute the edge probabilities. Also gen_probabilities_from_existing_counts could probably also work based on original edge frequencies. Just to be clear I understand it right, you want me to use the frequency instead of count since the frequency is derived from the counts in profile use builds? + + default_edge-count = default_count; + if (count) +{ + edge e; + edge_iterator ei; + FOR_EACH_EDGE (e, ei, stmt_bb-succs) +e-probability = e-count * REG_BR_PROB_BASE / count; +} Hmm, this updates origina BB containing the switch statement? The out_edges of the original BB. Of course, modulo roundoff errors, this should hold. I wonder where did you got the diferences and why do you need this? Originally, I obtained e-probability as e-count * REG_BR_PROB_BASE / bb-count. During profile bootstrap, I noticed BBs whose sum of outgoing edge counts exceeded the BB's count and hence I normalized with the sum of edge counts. You are going to output new control flow and find_many_sub_basic_blocks will recompute all the counts/frequencies inside anyway? Sorry, I am lost here. I am just updating the probability of stmt_bb-succs right? @@ -2430,7 +2523,11 @@ emit_case_nodes (rtx index, case_node_ptr node, rt unsignedp), GT, NULL_RTX, mode, unsignedp, label_rtx (node-right-code_label)); - emit_case_nodes (index, node-left, default_label, index_type); + probability = case_probability (node-right-count, + subtree_count + default_count); + add_prob_note_to_last_insn (probability); + emit_case_nodes (index, node-left, default_label, default_count, + index_type); Hmm, here you seem to be distributing the probabilities of the counts reached. What happens in the case when the edge probability needs to be distributed across nodes of the decision tree. I.e. in t(int a) { switch (a) { case 100: case 200: case 300: case 400: case 500: case 600: case 700: case 800: case 900: t(); case 101: case 202: case 303: case 404: case 505: case 606: case 707: case 808: case 909: q(); } } Ok, that's a bug in my patch. In the above example, there are two outgoing edges from the BB containing switch but many case nodes. I would end up multiplying the counts by the number of cases that lead to a label. If I divide the counts of each case_node by the number of case nodes that jump to the same label, will that be reasonable? 
In the above case, if t() is called 900 times, I will assume that each of the 9 cases contributes a count of 100. Thanks, Easwaran
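The convention just proposed, as arithmetic: when K case labels share one outgoing edge, each case node is credited count/K (a sketch; rounding of non-divisible counts is left open in the thread):

#include <cassert>

int
main ()
{
  long edge_count = 900;    /* executions reaching t() in the example */
  int cases_on_edge = 9;    /* case labels funnelling into that edge  */
  long per_case = edge_count / cases_on_edge;
  assert (per_case == 100);
  return 0;
}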
Propagate profile counts during switch expansion
Hi, This patch propagates the profile counts during RTL expansion. In many cases, there is no way to determine the exact count of an edge generated during the expansion. So this patch uses some simple heuristics to estimate the edge counts but ensures that the counts of the basic blocks corresponding to the cases are (nearly the) same as at the gimple level. Bootstrapped and profile-bootstrapped on an x86_64/linux machine. OK for trunk? - Easwaran -- 2012-10-02 Easwaran Raman era...@google.com * cfgbuild.c (gen_probabilities_from_existing_counts): New function. (compute_outgoing_frequencies): If at least one successor of a BB has non-zero profile count, use the counts to compute probabilities. * expr.c (do_tablejump): Add a REG_BR_PROB note on the jump to default label. (try_tablejump): Add a parameter to specify the probability of jumping to the default label. * expr.h (try_tablejump): Add a new parameter. * stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT. (do_jump_if_equal): Pass probability for REG_BR_PROB note. (add_case_node): Pass execution count of the case node and use it to initialize COUNT field. (emit_case_decision_tree): Pass default_count to emit_case_nodes. (get_outgoing_edge_counts): New function. (add_prob_note_to_last_insn): Likewise. (case_probability): New macro. (emit_case_dispatch_table): Compute probability of jumping to default label and apply note to the jump. (expand_case): Compute and propagate default edge count to emit_case_dispatch_table. (expand_sjlj_dispatch_table): Update calls to add_case_node and emit_case_dispatch_table. (balance_case_nodes): Update subtree_counts. (emit_case_nodes): Compute edge probabilities and add note. gcc/testsuite/ChangeLog: 2012-10-02 Easwaran Raman era...@google.com * gcc.dg/tree-prof/switch-case-1.c: New test case. * gcc.dg/tree-prof/switch-case-2.c: New test case. 
Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c === --- gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0) @@ -0,0 +1,40 @@ +/* { dg-options -O2 -fdump-rtl-expand-all } */ +int g; + +__attribute__((noinline)) void foo (int n) +{ + switch (n) +{ +case 1: + g++; break; +case 2: + g += 2; break; +case 3: + g += 1; break; +case 4: + g += 3; break; +case 5: + g += 4; break; +case 6: + g += 5; break; +case 7: + g += 6; break; +case 8: + g += 7; break; +case 9: + g += 8; break; +default: + g += 8; break; + } +} + +int main () +{ + int i; + for (i = 0; i 1; i++) + foo ((i * i) % 5); + return 0; +} +/* { dg-final-use { scan-rtl-dump-times ;; basic block\[^\\n\]*count 4000 2 expand} } */ +/* { dg-final-use { scan-rtl-dump-times ;; basic block\[^\\n\]*count 2000 1 expand} } */ +/* { dg-final-use { cleanup-rtl-dump expand } } */ Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c === --- gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0) @@ -0,0 +1,40 @@ +/* { dg-options -O2 -fdump-rtl-expand-all } */ +int g; + +__attribute__((noinline)) void foo (int n) +{ + switch (n) +{ +case 99: + g += 2; break; +case 1: + g++; break; +case 100: + g += 1; break; +case 4: + g += 3; break; +case 5: + g += 4; break; +case 6: + g += 5; break; +case 7: + g += 6; break; +case 8: + g += 7; break; +case 9: + g += 8; break; +default: + g += 8; break; + } +} + +int main () +{ + int i; + for (i = 0; i 1; i++) + foo ((i * i) % 5); + return 0; +} +/* { dg-final-use { scan-rtl-dump-times ;; basic block\[^\\n\]*count 4000 2 expand} } */ +/* { dg-final-use { scan-rtl-dump-times ;; basic block\[^\\n\]*count 2000 1 expand} } */ +/* { dg-final-use { cleanup-rtl-dump expand } } */ Index: gcc/expr.c === --- gcc/expr.c (revision 191879) +++ gcc/expr.c (working copy) @@ -154,7 +154,7 @@ static rtx do_store_flag (sepops, rtx, enum machin #ifdef PUSH_ROUNDING static void emit_single_push_insn (enum machine_mode, rtx, tree); #endif -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx); +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int); static rtx const_vector_from_tree (tree); static void write_complex_part (rtx, rtx, bool); @@ -10894,7 +10894,7 @@ try_casesi (tree index_type, tree index_expr, tree static void do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label, - rtx default_label) + rtx default_label, int default_probability) { rtx temp, vector; @@ -10910,9 +10910,17 @@ do_tablejump (rtx index, enum machine_mode mode, r the maximum value of the range
Change the ordering of cdce pass
The conditional dead call elimination pass shrink-wraps certain dead calls to math functions. It doesn't handle a case like this: D.142420_139 = powD.549 (D.142421_138, D.142419_132); fooD.120935.barD.113815 = D.142420_139; # foo.bar is dead here. The dead store gets cleaned up by DCE, leaving only the pow call, which can then be shrink-wrapped by cdce. So it seems reasonable to do this reordering. Bootstraps on x86_64/linux with no test regressions. OK for trunk? - Easwaran -- 2012-06-14 Easwaran Raman era...@google.com * gcc/passes.c (init_optimization_passes): Remove pass_call_cdce from its current position and insert after pass_dce. Index: gcc/passes.c === --- gcc/passes.c (revision 188535) +++ gcc/passes.c (working copy) @@ -1374,7 +1374,6 @@ init_optimization_passes (void) NEXT_PASS (pass_complete_unrolli); NEXT_PASS (pass_ccp); NEXT_PASS (pass_forwprop); - NEXT_PASS (pass_call_cdce); /* pass_build_alias is a dummy pass that ensures that we execute TODO_rebuild_alias at this point. Re-building alias information also rewrites no longer addressed @@ -1387,6 +1386,7 @@ init_optimization_passes (void) NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_vrp); NEXT_PASS (pass_dce); + NEXT_PASS (pass_call_cdce); NEXT_PASS (pass_cselim); NEXT_PASS (pass_tree_ifcombine); NEXT_PASS (pass_phiopt);
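For context, what cdce's shrink-wrapping looks like once DCE has exposed the dead call (the guard below is illustrative; the pass derives the actual range test from the function's domain and overflow behavior):

#include <math.h>

double x_, y_;

void
dead_pow_before (void)
{
  pow (x_, y_);   /* result unused: a candidate for cdce */
}

void
dead_pow_after (void)
{
  /* Shrink-wrapped form, conceptually: the call survives only on inputs
     that may set errno, so the common case skips it entirely.  */
  if (__builtin_expect (x_ <= 0.0 || y_ > 256.0, 0))
    pow (x_, y_);
}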
Re: Change the ordering of cdce pass
ChangeLog entry has a gcc/ prefix that shouldn't be there. Here is the revised entry: 2012-06-14 Easwaran Raman era...@google.com * passes.c (init_optimization_passes): Remove pass_call_cdce from its current position and insert after pass_dce. On Thu, Jun 14, 2012 at 6:38 PM, Easwaran Raman era...@google.com wrote: The conditional dead call elimination pass shrink wraps certain dead calls to math functions. It doesn't handle case like this: D.142420_139 = powD.549 (D.142421_138, D.142419_132); fooD.120935.barD.113815 = D.142420_139; # foo.bar is dead here. This code gets cleaned up by DCE and leaves only pow, which can then be shrink-wrapped by cdce. So it seems reasonable to do this reordering. Bootstraps on x86_64 on linux with no test regression. OK for trunk? - Easwaran -- 2012-06-14 Easwaran Raman era...@google.com * gcc/passes.c (init_optimization_passes): Remove pass_call_cdce from its current position and insert after pass_dce. Index: gcc/passes.c === --- gcc/passes.c (revision 188535) +++ gcc/passes.c (working copy) @@ -1374,7 +1374,6 @@ init_optimization_passes (void) NEXT_PASS (pass_complete_unrolli); NEXT_PASS (pass_ccp); NEXT_PASS (pass_forwprop); - NEXT_PASS (pass_call_cdce); /* pass_build_alias is a dummy pass that ensures that we execute TODO_rebuild_alias at this point. Re-building alias information also rewrites no longer addressed @@ -1387,6 +1386,7 @@ init_optimization_passes (void) NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_vrp); NEXT_PASS (pass_dce); + NEXT_PASS (pass_call_cdce); NEXT_PASS (pass_cselim); NEXT_PASS (pass_tree_ifcombine); NEXT_PASS (pass_phiopt);
Re: Propagate profile counts after switch case expansion (issue5896043)
Ping.

On Mon, Apr 9, 2012 at 2:33 PM, Easwaran Raman era...@google.com wrote:

On Sun, Mar 25, 2012 at 9:40 PM, Easwaran Raman era...@google.com wrote:

On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka hubi...@ucw.cz wrote:

  This patch propagates the execution count of the case labels of a switch-case statement after its expansion. Bootstraps and all tests pass. OK for trunk?

  Hi, while this is a reasonable thing to do, I believe it would make most sense to do switch lowering at the gimple level: i.e. reorganize the existing code to produce gimple statements, and change the expansion code to produce a tablejump from every gimple switch statement that was left in the code. This would be both cleaner and would allow the gimple optimizers to improve the generated code. Incrementally it would also allow us to improve switch expansion, which is quite baroque and does not really produce very good code in some common cases. If you would be interested in working on this (i.e. reorganizing the expansion code to produce gimple), I would be very happy. If not, I can review the profile updating part for mainline, since in any case this is a desirable thing to do.

I am planning to explore improvements to switch expansion (peeling some cases and using jump tables for the rest, for example) and I think the reorganization you suggest is the right way to do such improvements. But until I can spend time on it and get it done, I would like this patch to get reviewed and checked in.

Thanks,
Easwaran

Ping.
Re: Propagate profile counts after switch case expansion (issue5896043)
On Sun, Mar 25, 2012 at 9:40 PM, Easwaran Raman era...@google.com wrote:

On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka hubi...@ucw.cz wrote:

  This patch propagates the execution count of the case labels of a switch-case statement after its expansion. Bootstraps and all tests pass. OK for trunk?

  Hi, while this is a reasonable thing to do, I believe it would make most sense to do switch lowering at the gimple level: i.e. reorganize the existing code to produce gimple statements, and change the expansion code to produce a tablejump from every gimple switch statement that was left in the code. This would be both cleaner and would allow the gimple optimizers to improve the generated code. Incrementally it would also allow us to improve switch expansion, which is quite baroque and does not really produce very good code in some common cases. If you would be interested in working on this (i.e. reorganizing the expansion code to produce gimple), I would be very happy. If not, I can review the profile updating part for mainline, since in any case this is a desirable thing to do.

I am planning to explore improvements to switch expansion (peeling some cases and using jump tables for the rest, for example) and I think the reorganization you suggest is the right way to do such improvements. But until I can spend time on it and get it done, I would like this patch to get reviewed and checked in.

Thanks,
Easwaran

Ping.
Re: [google]Add support for sampled profile collection (issue4438083)
I want to revive this patch for mainline and have some questions on Honza's comments.

On Fri, Apr 29, 2011 at 1:48 PM, Jan Hubicka hubi...@ucw.cz wrote:

  A known limitation is that value profiling is not yet sampled, but it does not seem to cause problems.

  For the performance alone, we probably don't need to care that much, given the fact that we value profile only relatively expensive operations. But if we want to have the turn-off/on feature, then I guess we need to guard everything. It is not much of a pain to add the code generating conditionals everywhere, after all.

If we sample value profiling instrumentation as well, does it make sense to use a single counter and rate for all instrumentations? If not, does the additional complexity (and flags) justify the benefit of uniformity?

  +/* Insert STMT_IF around given sequence of consecutive statements in the
  +   same basic block starting with STMT_START, ending with STMT_END.  */
  +
  +static void
  +insert_if_then (gimple stmt_start, gimple stmt_end, gimple stmt_if)
  +{
  +  gimple_stmt_iterator gsi;
  +  basic_block bb_original, bb_before_if, bb_after_if;
  +  edge e_if_taken, e_then_join;
  +
  +  gsi = gsi_for_stmt (stmt_start);
  +  gsi_insert_before (&gsi, stmt_if, GSI_SAME_STMT);
  +  bb_original = gsi_bb (gsi);
  +  e_if_taken = split_block (bb_original, stmt_if);
  +  e_if_taken->flags &= ~EDGE_FALLTHRU;
  +  e_if_taken->flags |= EDGE_TRUE_VALUE;
  +  e_then_join = split_block (e_if_taken->dest, stmt_end);
  +  bb_before_if = e_if_taken->src;
  +  bb_after_if = e_then_join->dest;

  On mainline, where we do profile estimation before profiling instrumentation, you really want to update the profile for performance here.

I am not sure I understand this.

  +  make_edge (bb_before_if, bb_after_if, EDGE_FALSE_VALUE);
  +}
  +
  +/* Transform:
  +
  +   ORIGINAL CODE
  +
  +   Into:
  +
  +   __gcov_sample_counter++;
  +   if (__gcov_sample_counter >= __gcov_sampling_rate)
  +     {
  +       __gcov_sample_counter = 0;
  +       ORIGINAL CODE
  +     }

  Hmm, I think the probability that an internal loop of the program will interfere with the sampling rate is relatively high, but I see it is a bit hard to do something about it. Can we think of some very basic randomization of sampling_rate?

The predominant use case we have for this technique for server workloads is as follows: Have a high value for __gcov_sampling_rate (that should probably be named __gcov_sampling_interval) during server start-up, so that it reduces the overhead as well as ensures that the start-up period doesn't pollute the rest of the counters. After start-up, change the sampling interval to a small value (in many cases, to 1) using the external interface in libgcov.c. In this use case, randomization doesn't make sense during start-up (since we want to skip profile collection) or the steady phase (since the interval is 1 or a small number). If you want to have randomization support, do you have any suggestions for how to make it work with low sampling intervals?

Thanks,
Easwaran

  Honza
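Spelling the quoted transformation out as stand-alone source may help; this is a sketch of what one instrumented counter update looks like after sampling is applied (the pass emits the equivalent GIMPLE; arc_counter is a placeholder for one edge counter, and the __gcov_* declarations follow the names used above):

  extern __thread unsigned int __gcov_sample_counter;
  extern unsigned int __gcov_sampling_rate;
  extern unsigned long long arc_counter;   // stand-in for one arc counter

  void instrumented_increment (void)
  {
    __gcov_sample_counter++;
    if (__gcov_sample_counter >= __gcov_sampling_rate)
      {
        __gcov_sample_counter = 0;
        arc_counter++;                     // ORIGINAL CODE: the counter update
      }
  }

With this shape, a single global counter/rate pair means every instrumented site shares one sampling phase, which is what the question about per-instrumentation counters is probing.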
Re: Propagate profile counts after switch case expansion (issue5896043)
On Sun, Mar 25, 2012 at 12:22 PM, Jan Hubicka hubi...@ucw.cz wrote:

  This patch propagates the execution count of the case labels of a switch-case statement after its expansion. Bootstraps and all tests pass. OK for trunk?

  Hi, while this is a reasonable thing to do, I believe it would make most sense to do switch lowering at the gimple level: i.e. reorganize the existing code to produce gimple statements, and change the expansion code to produce a tablejump from every gimple switch statement that was left in the code. This would be both cleaner and would allow the gimple optimizers to improve the generated code. Incrementally it would also allow us to improve switch expansion, which is quite baroque and does not really produce very good code in some common cases. If you would be interested in working on this (i.e. reorganizing the expansion code to produce gimple), I would be very happy. If not, I can review the profile updating part for mainline, since in any case this is a desirable thing to do.

I am planning to explore improvements to switch expansion (peeling some cases and using jump tables for the rest, for example) and I think the reorganization you suggest is the right way to do such improvements. But until I can spend time on it and get it done, I would like this patch to get reviewed and checked in.

Thanks,
Easwaran

  Honza
Propagate profile counts after switch case expansion (issue5896043)
This patch propagates the execution count of the case labels of a switch-case statement after its expansion. Bootstraps and all tests pass. OK for trunk?

2012-03-23  Easwaran Raman  era...@google.com

	* cfgbuild.c (non_zero_profile_counts): New function.
	(compute_outgoing_frequencies): If at least one successor of a
	BB has non-zero profile count, retain the counts.
	* expr.c (do_tablejump): Add a REG_BR_PROB note on the jump to
	the default label.
	(try_tablejump): Add a parameter to specify the probability
	of jumping to the default label.
	* expr.h (try_tablejump): Add a new parameter.
	* stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT.
	(add_case_node): Pass execution count of the case node and use
	it to initialize COUNT field.
	(case_probability): New macro.
	(expand_case): Propagate execution counts to generated branches
	using REG_BR_PROB notes.
	(emit_case_nodes): Likewise.
	(do_jump_if_equal): Pass probability for REG_BR_PROB note.
	(compute_subtree_counts): New function to compute
	SUBTREE_COUNT fields of case nodes.
	(add_prob_note_to_last_insn): Add a REG_BR_PROB note with the
	given probability to the last generated instruction.

gcc/testsuite/ChangeLog:

2012-03-23  Easwaran Raman  era...@google.com

	* gcc.dg/tree-prof/switch-case-1.c: New test case.
	* gcc.dg/tree-prof/switch-case-2.c: New test case.

diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c
index 692fea8..d75fbda 100644
--- a/gcc/cfgbuild.c
+++ b/gcc/cfgbuild.c
@@ -534,6 +534,21 @@ find_bb_boundaries (basic_block bb)
       purge_dead_tablejump_edges (bb, table);
 }

+/* Check if there is at least one edge in EDGES with a non-zero count
+   field.  */
+
+static bool
+non_zero_profile_counts (VEC(edge,gc) *edges)
+{
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, edges)
+    {
+      if (e->count > 0)
+        return true;
+    }
+  return false;
+}
+
 /* Assume that frequency of basic block B is known.  Compute frequencies
    and probabilities of outgoing edges.  */
@@ -569,6 +584,10 @@ compute_outgoing_frequencies (basic_block b)
       e->count = b->count;
       return;
     }
+  else if (non_zero_profile_counts (b->succs))
+    {
+      /* Profile counts already set, but REG_NOTE missing.  Retain the counts.  */
+      return;
+    }
   guess_outgoing_edge_probabilities (b);
   if (b->count)
     FOR_EACH_EDGE (e, ei, b->succs)
diff --git a/gcc/expr.c b/gcc/expr.c
index f9de908..fb8eef9 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -156,7 +156,7 @@ static rtx do_store_flag (sepops, rtx, enum machine_mode);
 #ifdef PUSH_ROUNDING
 static void emit_single_push_insn (enum machine_mode, rtx, tree);
 #endif
-static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
+static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
 static rtx const_vector_from_tree (tree);
 static void write_complex_part (rtx, rtx, bool);
@@ -10694,11 +10694,13 @@ try_casesi (tree index_type, tree index_expr, tree minval, tree range,
    TABLE_LABEL is a CODE_LABEL rtx for the table itself.

    DEFAULT_LABEL is a CODE_LABEL rtx to jump to if the
-   index value is out of range.  */
+   index value is out of range.
+   DEFAULT_PROBABILITY is the probability of jumping to the
+   DEFAULT_LABEL.  */

 static void
 do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
-              rtx default_label)
+              rtx default_label, int default_probability)
 {
   rtx temp, vector;
@@ -10714,8 +10716,16 @@ do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
      the maximum value of the range.  */
   if (default_label)
-    emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
-                             default_label);
+    {
+      emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
+                               default_label);
+      if (default_probability != -1)
+        {
+          rtx jump_insn = get_last_insn();
+          add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (default_probability));
+        }
+    }
+
   /* If index is in range, it must fit in Pmode.
      Convert to Pmode so we can index with it.  */
@@ -10758,7 +10768,7 @@ do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
 int
 try_tablejump (tree index_type, tree index_expr, tree minval, tree range,
-               rtx table_label, rtx default_label)
+               rtx table_label, rtx default_label, int default_probability)
 {
   rtx index;
@@ -10776,7 +10786,7 @@ try_tablejump (tree index_type, tree index_expr, tree minval, tree range,
                                      TYPE_MODE (TREE_TYPE (range)),
                                      expand_normal (range),
                                      TYPE_UNSIGNED (TREE_TYPE (range))),
-                table_label, default_label);
+                table_label, default_label
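A short worked example of the note value the new code attaches (REG_BR_PROB values are scaled by GCC's REG_BR_PROB_BASE, which is 10000; the counts are the ones the switch-case-1.c test collects, where the default label is reached 2000 times out of 10000 executions):

  #define REG_BR_PROB_BASE 10000     // GCC's fixed-point probability scale

  int main ()
  {
    int default_count = 2000;        // executions taking the default label
    int total_count = 10000;         // executions of the switch
    int note = default_count * REG_BR_PROB_BASE / total_count;
    return note == 2000 ? 0 : 1;     // note value 2000, i.e. a 20% branch
  }

This is exactly the value GEN_INT (default_probability) wraps for the REG_BR_PROB note on the range-check jump above.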
Re: Propagate profile counts after switch case expansion (issue5896043)
Some more background on this patch: Right now, while the execution counts of the different case labels of a switch statement are obtained during profile collection, they are not propagated to RTL. Instead, counts are regenerated at the RTL level using static heuristics that tend to weigh branches equally, which can cause poor optimization of hot code. This patch ensures that the counts collected during profile collection are correctly propagated, allowing hot code to be better optimized by the RTL optimizations. Patch tested on x86_64.

- Easwaran

On Fri, Mar 23, 2012 at 10:43 AM, Easwaran Raman era...@google.com wrote:

  This patch propagates the execution count of the case labels of a switch-case statement after its expansion. Bootstraps and all tests pass. OK for trunk?

  2012-03-23  Easwaran Raman  era...@google.com

	* cfgbuild.c (non_zero_profile_counts): New function.
	(compute_outgoing_frequencies): If at least one successor of a
	BB has non-zero profile count, retain the counts.
	* expr.c (do_tablejump): Add a REG_BR_PROB note on the jump to
	the default label.
	(try_tablejump): Add a parameter to specify the probability
	of jumping to the default label.
	* expr.h (try_tablejump): Add a new parameter.
	* stmt.c (case_node): Add new fields COUNT and SUBTREE_COUNT.
	(add_case_node): Pass execution count of the case node and use
	it to initialize COUNT field.
	(case_probability): New macro.
	(expand_case): Propagate execution counts to generated branches
	using REG_BR_PROB notes.
	(emit_case_nodes): Likewise.
	(do_jump_if_equal): Pass probability for REG_BR_PROB note.
	(compute_subtree_counts): New function to compute
	SUBTREE_COUNT fields of case nodes.
	(add_prob_note_to_last_insn): Add a REG_BR_PROB note with the
	given probability to the last generated instruction.

  gcc/testsuite/ChangeLog:

  2012-03-23  Easwaran Raman  era...@google.com

	* gcc.dg/tree-prof/switch-case-1.c: New test case.
	* gcc.dg/tree-prof/switch-case-2.c: New test case.

  diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c
  index 692fea8..d75fbda 100644
  --- a/gcc/cfgbuild.c
  +++ b/gcc/cfgbuild.c
  @@ -534,6 +534,21 @@ find_bb_boundaries (basic_block bb)
         purge_dead_tablejump_edges (bb, table);
   }

  +/* Check if there is at least one edge in EDGES with a non-zero count
  +   field.  */
  +
  +static bool
  +non_zero_profile_counts (VEC(edge,gc) *edges)
  +{
  +  edge e;
  +  edge_iterator ei;
  +  FOR_EACH_EDGE (e, ei, edges)
  +    {
  +      if (e->count > 0)
  +        return true;
  +    }
  +  return false;
  +}
  +
   /* Assume that frequency of basic block B is known.  Compute frequencies
      and probabilities of outgoing edges.  */
  @@ -569,6 +584,10 @@ compute_outgoing_frequencies (basic_block b)
         e->count = b->count;
         return;
       }
  +  else if (non_zero_profile_counts (b->succs))
  +    {
  +      /* Profile counts already set, but REG_NOTE missing.  Retain the counts.  */
  +      return;
  +    }
     guess_outgoing_edge_probabilities (b);
     if (b->count)
       FOR_EACH_EDGE (e, ei, b->succs)
  diff --git a/gcc/expr.c b/gcc/expr.c
  index f9de908..fb8eef9 100644
  --- a/gcc/expr.c
  +++ b/gcc/expr.c
  @@ -156,7 +156,7 @@ static rtx do_store_flag (sepops, rtx, enum machine_mode);
   #ifdef PUSH_ROUNDING
   static void emit_single_push_insn (enum machine_mode, rtx, tree);
   #endif
  -static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
  +static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
   static rtx const_vector_from_tree (tree);
   static void write_complex_part (rtx, rtx, bool);
  @@ -10694,11 +10694,13 @@ try_casesi (tree index_type, tree index_expr, tree minval, tree range,
     TABLE_LABEL is a CODE_LABEL rtx for the table itself.

     DEFAULT_LABEL is a CODE_LABEL rtx to jump to if the
  -  index value is out of range.  */
  +  index value is out of range.
  +  DEFAULT_PROBABILITY is the probability of jumping to the
  +  DEFAULT_LABEL.  */

   static void
   do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
  -              rtx default_label)
  +              rtx default_label, int default_probability)
   {
     rtx temp, vector;
  @@ -10714,8 +10716,16 @@ do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
       the maximum value of the range.  */
     if (default_label)
  -    emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
  -                             default_label);
  +    {
  +      emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
  +                               default_label);
  +      if (default_probability != -1)
  +        {
  +          rtx jump_insn = get_last_insn();
  +          add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (default_probability));
  +        }
  +    }
  +
     /* If index is in range, it must fit in Pmode.
        Convert to Pmode so we can index with it.  */
  @@ -10758,7 +10768,7
Re: Propagate profile counts after switch case expansion (issue5896043)
On Fri, Mar 23, 2012 at 3:29 PM, Andi Kleen a...@firstfloor.org wrote:

  Easwaran Raman era...@google.com writes:

    Some more background on this patch: Right now, while the execution counts of the different case labels of a switch statement are obtained during profile collection, they are not propagated to RTL. Instead, counts are regenerated at the RTL level using static heuristics that tend to weigh branches equally, which can cause poor optimization of hot code. This patch ensures that the counts collected during profile collection are correctly propagated, allowing hot code to be better optimized by the RTL optimizations. Patch tested on x86_64.

  I think your patch doesn't use the probabilities to weight the decision tree for the non-tablejump case, right? I looked at this some time ago, but the patch always had problems.

Do you mean using the weights to decide the shape of the binary tree (similar to the COST_TABLE heuristic)? I am planning to send a separate patch for that. This one just makes sure that the profile counts are propagated correctly. So you will still have a situation where a branch corresponding to an infrequently executed case dominates a frequently executed case, but the BB of the cases gets the right profile weight.

- Easwaran

  -Andi
  --
  a...@linux.intel.com -- Speaking for myself only
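An illustration of the distinction being discussed, with hypothetical case values {1, 50, 100} where 100 is the hot case; the functions below only sketch the shape of the comparison trees, not GCC's actual output:

  int balanced (int n)    // shape of an unweighted balanced tree
  {
    if (n < 50)
      return n == 1 ? 101 : 0;
    if (n > 50)
      return n == 100 ? 103 : 0;
    return 102;           // n == 50: hot case 100 always pays two tests
  }

  int peeled (int n)      // shape a profile-weighted expansion could produce
  {
    if (n == 100)         // hot case peeled off and tested first
      return 103;
    if (n == 1)
      return 101;
    if (n == 50)
      return 102;
    return 0;
  }

The patch under review leaves the tree shaped like balanced() but makes the edge counts on its branches correct; reshaping it toward peeled() is the follow-on work mentioned above.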
[google] Use delete with size parameter in STL deallocate (issue5794070)
This patch makes -fsized-delete define a macro __GXX_DELETE_WITH_SIZE__. If this macro is defined, the deallocate function in new_allocator.h invokes the two-parameter delete operator with the size of the object. Tested by bootstrapping the compiler and ensuring no tests are broken. Also verified that compiling a C++ program that uses std::vector with -fsized-delete invokes the two-parameter version of operator delete. OK for google/main and google/4_6 branches?

c-family/ChangeLog.google-main:

2012-03-12  Easwaran Raman  era...@google.com

	* c-cppbuiltin.c (c_cpp_builtins): Define __GXX_DELETE_WITH_SIZE__
	for C++ files compiled with -fsized-delete.

libstdc++-v3/ChangeLog.google-main:

2012-03-12  Easwaran Raman  era...@google.com

	* include/ext/new_allocator.h (deallocate): Call delete with size
	if __GXX_DELETE_WITH_SIZE__ is defined.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index ccf57fd..3c9a96b 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -970,6 +970,8 @@ c_cpp_builtins (cpp_reader *pfile)
      format.  */
   if (ENABLE_DECIMAL_FLOAT && ENABLE_DECIMAL_BID_FORMAT)
     cpp_define (pfile, "__DECIMAL_BID_FORMAT__");
+  if (c_dialect_cxx () && flag_sized_delete)
+    cpp_define (pfile, "__GXX_DELETE_WITH_SIZE__");
 }

 /* Pass an object-like macro.  If it doesn't lie in the user's
diff --git a/libstdc++-v3/include/ext/new_allocator.h b/libstdc++-v3/include/ext/new_allocator.h
index 0c82bd0..c972c1e 100644
--- a/libstdc++-v3/include/ext/new_allocator.h
+++ b/libstdc++-v3/include/ext/new_allocator.h
@@ -94,10 +94,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
	return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp)));
       }

+#ifdef __GXX_DELETE_WITH_SIZE__
+      // __p is not permitted to be a null pointer.
+      void
+      deallocate(pointer __p, size_type __t)
+      { ::operator delete(__p, __t * sizeof(_Tp)); }
+#else
       // __p is not permitted to be a null pointer.
       void
       deallocate(pointer __p, size_type)
       { ::operator delete(__p); }
+#endif

       size_type
       max_size() const _GLIBCXX_USE_NOEXCEPT

--
This patch is available for review at http://codereview.appspot.com/5794070
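A minimal usage sketch of the flag's effect, assuming the semantics described above (under a stock compiler this program still builds, but the two-argument operator is only reached for ordinary deletes when -fsized-delete is in force):

  #include <cstdio>
  #include <cstdlib>
  #include <new>

  void operator delete (void *p, std::size_t n) throw ()
  {
    std::printf ("sized delete of %lu bytes\n", (unsigned long) n);
    std::free (p);
  }

  int main ()
  {
    int *p = new int (7);
    delete p;   // with -fsized-delete: operator delete (p, sizeof (int))
    return 0;
  }

The void* exception mentioned in the ChangeLog matters here: for a void* operand the static type gives no size, so the one-argument form must still be used.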
Re: [google] Propagate profile information to RTL level during switch expansion
On Tue, Jan 31, 2012 at 10:16 PM, Xinliang David Li davi...@google.com wrote:

  Ok for google branch with the minor changes below.

  thanks, David

  +#define case_probability(x, y) ((y) ? ((x) * REG_BR_PROB_BASE / (y)) : -1)
  +

  Using upper case for the macro?

From http://gcc.gnu.org/codingconventions.html, "Macro names should be in ALL_CAPS when it's important to be aware that it's a macro (e.g. accessors and simple predicates), but in lowercase (e.g., size_int) where the macro is a wrapper for efficiency that should be considered as a function". In this case, case_probability is a wrapper for efficiency, and hence I decided to go with the lower-case name.

  +  count -= default_count;
  +  default_count = 0;

  Why reset it to 0?

If there are no gaps, then the indirect jump can't jump to the default label. Later I set default_edge->count = default_count; and this value should be 0 if there are no gaps.

I have addressed the rest of your comments.

-Easwaran

  +
  +static void
  +add_prob_note_to_last_insn (int probability)
  +{

  Missing documentation.
[google] Propagate profile information to RTL level during switch expansion
This patch propagates profile information to the RTL level when expanding switch statements using a jump table or a binary tree of branches. Ok for google/gcc-4_6 branch? I would like the patch to be considered for trunk when stage 1 opens again.

-Easwaran

2012-01-31  Easwaran Raman  era...@google.com

	* expr.c (do_tablejump): Add default_probability as the sixth
	parameter and use it to generate a REG_BR_PROB note.
	(try_tablejump): Add default_probability as a parameter.
	* cfgbuild.c (non_zero_profile_counts): New function.
	(compute_outgoing_frequencies): If edges have profile counts set,
	don't replace them with guessed values.
	* expr.h (try_tablejump): Add additional parameter to the
	declaration.
	* stmt.c (tree-flow.h): Include.
	(case_node): Add new fields count and subtree_count.
	(add_case_node): Pass count as a parameter and initialize the
	count field of case_node.
	(case_probability): New macro.
	(expand_case): Propagate profile information from the tree level
	to RTL during switch case expansion.
	(balance_case_nodes): Compute subtree_count for case nodes.
	(emit_case_nodes): Set branch probabilities for generated
	branches.

testsuite/ChangeLog.google-4_6:

2012-01-31  Easwaran Raman  era...@google.com

	* tree-prof/switch-case-1.c: New test.
	* tree-prof/switch-case-2.c: New test.

Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c
===
--- gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-prof/switch-case-1.c (revision 0)
@@ -0,0 +1,40 @@
+/* { dg-options "-O2 -fdump-rtl-expand-blocks -fdump-tree-optimized-blocks" } */
+int g;
+
+__attribute__((noinline)) void foo (int n)
+{
+  switch (n)
+    {
+    case 1:
+      g++; break;
+    case 2:
+      g += 2; break;
+    case 3:
+      g += 1; break;
+    case 4:
+      g += 3; break;
+    case 5:
+      g += 4; break;
+    case 6:
+      g += 5; break;
+    case 7:
+      g += 6; break;
+    case 8:
+      g += 7; break;
+    case 9:
+      g += 8; break;
+    default:
+      g += 8; break;
+    }
+}
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 10000; i++)
+    foo ((i * i) % 5);
+  return 0;
+}
+/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:4000" 4 "expand"} } */
+/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:2000" 1 "expand"} } */
+/* { dg-final-use { cleanup-rtl-dump "expand" } } */
Index: gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c
===
--- gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-prof/switch-case-2.c (revision 0)
@@ -0,0 +1,40 @@
+/* { dg-options "-O2 -fdump-rtl-expand-blocks" } */
+int g;
+
+__attribute__((noinline)) void foo (int n)
+{
+  switch (n)
+    {
+    case 99:
+      g += 2; break;
+    case 1:
+      g++; break;
+    case 100:
+      g += 1; break;
+    case 4:
+      g += 3; break;
+    case 5:
+      g += 4; break;
+    case 6:
+      g += 5; break;
+    case 7:
+      g += 6; break;
+    case 8:
+      g += 7; break;
+    case 9:
+      g += 8; break;
+    default:
+      g += 8; break;
+    }
+}
+
+int main ()
+{
+  int i;
+  for (i = 0; i < 10000; i++)
+    foo ((i * i) % 5);
+  return 0;
+}
+/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:4000" 4 "expand"} } */
+/* { dg-final-use { scan-rtl-dump-times "Succ edge\[^\\n\]*count:2000" 1 "expand"} } */
+/* { dg-final-use { cleanup-rtl-dump "expand" } } */
Index: gcc/expr.c
===
--- gcc/expr.c (revision 183262)
+++ gcc/expr.c (working copy)
@@ -155,7 +155,7 @@ static rtx do_store_flag (sepops, rtx, enum machin
 #ifdef PUSH_ROUNDING
 static void emit_single_push_insn (enum machine_mode, rtx, tree);
 #endif
-static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx);
+static void do_tablejump (rtx, enum machine_mode, rtx, rtx, rtx, int);
 static rtx const_vector_from_tree (tree);
 static void write_complex_part (rtx, rtx, bool);
@@ -10245,7 +10245,7 @@ try_casesi (tree index_type, tree index_expr, tree
 static void
 do_tablejump (rtx index, enum machine_mode mode, rtx range, rtx table_label,
-              rtx default_label)
+              rtx default_label, int default_probability)
 {
   rtx temp, vector;
@@ -10261,9 +10261,17 @@ do_tablejump (rtx index, enum machine_mode mode, r
      the maximum value of the range.  */
   if (default_label)
-    emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
-                             default_label);
+    {
+      emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
+                               default_label);
+      if (default_probability != -1)
+        {
+          rtx jump_insn = get_last_insn();
+          add_reg_note (jump_insn, REG_BR_PROB, GEN_INT (default_probability));
+        }
+    }
+
   /* If index
Re: [google] Add support for delete operator that takes the size of the object as a parameter
Based on Paolo's comments I am dropping the changes to baseline_symbols.txt. As for the minor version, I am bumping it up to 18. Assuming that this patch will be able to go into gcc 4.8 (with minor version 18), I want to keep it at the same version in google/main and google/gcc-4_6. Bootstraps and no test regressions. OK for google/main and google/gcc-4_6 branches?

2011-12-17  Easwaran Raman  era...@google.com

	* common.opt (fsized-delete): New option.

cp/ChangeLog.google-main:

2011-12-17  Easwaran Raman  era...@google.com

	* decl.c (cxx_init_decl_processing): Specify a function that takes
	a void* and size_t for DELETE_EXPR.
	* call.c (build_op_delete_call): If fsized-delete is used, use the
	variant that takes size_t as the second parameter except when
	deleting a pointer of type void *.

testsuite/ChangeLog.google-main:

2011-12-17  Easwaran Raman  era...@google.com

	* g++.dg/other/sized-delete-1.C: New test.

libstdc++-v3/ChangeLog.google-main:

2011-12-17  Easwaran Raman  era...@google.com

	* libsupc++/del_op.cc (delete): Define a new variant
	void operator delete(void*, std::size_t).
	* libsupc++/new (delete): Declare
	void operator delete(void*, std::size_t) throw();
	* testsuite/util/testsuite_abi.cc (check_version): Add new known
	version GLIBCXX_3.4.18.
	* config/abi/pre/gnu.ver: Add new symbol _ZdlPv[jmy] at version
	GLIBCXX_3.4.18.

On Wed, Dec 14, 2011 at 3:31 AM, Paolo Carlini paolo.carl...@oracle.com wrote:

  Hi,

  On Mon, Dec 12, 2011 at 12:41 PM, Paolo Carlini paolo.carl...@oracle.com wrote:

    On 12/12/2011 09:37 PM, Easwaran Raman wrote:

      Thanks for the comments Paolo. I have attached the new patch. I have bumped the version to 3.4.18

    You shouldn't: 4.7 is not out yet, thus there is no reason to increase the minor version beyond the current 17.

  Ok, I then don't understand your comment "Note that backporting the patch as-is to 4_6-branch would be very wrong in terms of ABI (in mainline we already have a 3.4.17)." My original patch added the new symbol in version 3.4.17. Since we don't want to add the symbol to 3.4.16 (if we have a situation where the new runtime is not available when running a program compiled with -fsized-delete) and you said I shouldn't be using 3.4.17, I assumed I had to bump up the version.

  Note this is going to be an academic discussion, because it's too late for 4.7 anyway, and I'm not even sure it's conforming to add overloads like this for operators. Anyway. For the 4.6 branch at this point the situation would be very difficult. We could have a small time window before 4.7 is out to move **all** its new symbols vs 4.6, currently in 3.4.17, to a new 3.4.18, and then we could bump 4.6 to 3.4.17. You see, after a minor version is delivered you cannot add anything to it, and you also have the general requirement that the minor version is the minor version: irrespective of whether we are talking about gcc 4.6 or gcc 4.7, a specific minor version number defines which symbols are exported. I hope the issues are more clear now. You must always be very careful.

    and used _ZdlPv[jmy] in gnu.ver. I have also added the symbol to baseline_symbols.txt of other targets.

    You should not; just read again what I wrote. And you don't have to believe me: just browse the libstdc++ ChangeLogs and see if somebody ever does that when the linker map is touched.

  Sorry, I again misunderstood this as well (and still don't have a good picture). Is the part which adds _ZdlPv[jmy] in gnu.ver ok?

  Looks fine, yes. But I didn't really check the letters.
  General rule of thumb when fiddling with linker scripts: double check what you are doing with a multilib system, like x86_64; at least you can catch errors when you are mistakenly not accounting for the 32-bit version of the symbols. In the case at issue, size_t can boil down to unsigned int, unsigned long, or unsigned long long; make sure the pattern includes all three.

  I added that by mimicking the symbol _Znw[jmy] found in the same file. From the log, it looks like the baseline_symbols.txt files seem to be generated, but I am not sure how that is to be done. For example, r145437 says a bunch of these baseline_symbols.txt are regenerated, but I don't see any other change from which this might be generated.

  General rule of thumb: normally, if you aren't a release manager or a maintainer, **never** regenerate the baseline_symbols files. Just add symbols to the linker script; that is normally fine and doesn't break anything per se. Adding is fine (subtracting/changing is not, of course).

  Paolo.

Index: libstdc++-v3/libsupc++/del_op.cc
===
--- libstdc++-v3/libsupc++/del_op.cc (revision 182251)
+++ libstdc++-v3/libsupc++/del_op.cc (working copy)
@@ -46,3 +46,11 @@ operator delete(void* ptr) throw ()
   if (ptr)
     std::free(ptr);
 }
+
+_GLIBCXX_WEAK_DEFINITION void
+operator delete(void* ptr,
+		std::size_t bytes __attribute__((__unused__))) throw ()
+{
+  if (ptr)
+    std::free(ptr);
+}
Index: libstdc++-v3/libsupc++/new
===
--- libstdc++-v3/libsupc++/new (revision 182251)
+++ libstdc++-v3/libsupc++/new (working copy)
@@ -93,6 +93,7 @@ namespace std
 void* operator new(std::size_t) throw (std::bad_alloc);
 void* operator new[](std::size_t) throw (std::bad_alloc);
 void operator delete(void*) throw();
+void operator delete(void*, std::size_t) throw();
 void operator delete[](void*) throw();
 void* operator new(std::size_t, const std::nothrow_t&) throw();
 void* operator new[](std::size_t, const std::nothrow_t&) throw();
Index: libstdc++-v3/testsuite/util/testsuite_abi.cc
===
--- libstdc++-v3/testsuite/util/testsuite_abi.cc (revision 182251)
+++ libstdc++-v3/testsuite/util/testsuite_abi.cc (working copy)
@@ -195,6 +195,7 @@ check_version(symbol& test, bool added)
   known_versions.push_back("GLIBCXX_3.4.15");
   known_versions.push_back
Re: [google] Add support for delete operator that takes the size of the object as a parameter
On Mon, Dec 12, 2011 at 1:52 PM, Easwaran Raman era...@google.com wrote: On Mon, Dec 12, 2011 at 12:41 PM, Paolo Carlini paolo.carl...@oracle.com wrote: On 12/12/2011 09:37 PM, Easwaran Raman wrote: Thanks for the comments Paolo. I have attached the new patch. I have bumped the version to 3.4.18 You shouldn't: 4.7 is not out yet, thus no reason to increase the minor version beyond the current 17. Ok, I then don't understand your comment Note that backporting the patch as-is to 4_6-branch would be very wrong in terms of ABI (in mainline we already have a 3.4.17). My original patch added the new symbol in version 3.4.17. Since we don't want to add the symbol to 3.4.16 (if we have a situation where the new runtime is not available when running a program compiled with -fsized-delete) and you said I shouldn't be using 3.4.17, I assumed I had to bump up the version. and used _ZdlPv[jmy] in gnu.ver. I have also added the symbol to baseline_symbols.txt of other targets. You should not, just read again what I wrote. And you don't have to believe me: just browse the libstdc++ ChangeLogs and see if somebody ever does that when the linker map is touched. Sorry, I again misunderstood this as well (and still don't have a good picture). Is the part which adds _ZdlPv[jmy] in gnu.ver ok? I added that by mimicking the symbol _Znw[jmy] found in the same file. From the log, it looks like the baseline_symbols.txt seems to be generated, but I am not sure how that is to be done. For example, r145437 says a bunch of these baseline_symbols.txt are regenerated, but I don't see any other change from which this might be generated. Thanks, Easwaran It looks like running 'make new-abi-baseline' under TARGET/libstdc++-v3/testsuite generates the baseline file. Should I check in that? What about other targets? Thanks, Easwaran Paolo.
Re: [google] Add support for delete operator that takes the size of the object as a parameter
Thanks for the comments Paolo. I have attached the new patch. I have bumped the version to 3.4.18 and used _ZdlPv[jmy] in gnu.ver. I have also added the symbol to baseline_symbols.txt of other targets. -Easwaran 2011-12-11 Easwaran Raman era...@google.com * common.opt (fsized-delete): New option. cp/ChangeLog.google-4_6: 2011-12-11 Easwaran Raman era...@google.com * decl.c (cxx_init_decl_processing): Specify a function that takes a void* and size_t for DELETE_EXPR. * call.c (build_op_delete_call): If fsized-delete is used, use the variant that takes size_t as the second parameter except when deleteting a pointer of type void *. testsuite/ChangeLog.google-4_6: 2011-12-11 Easwaran Raman era...@google.com * g++.dg/other/sized-delete-1.C: New test. libstdc++-v3/ChangeLog.google-4_6: 2011-12-11 Easwaran Raman era...@google.com * libsupc++/del_op.cc (delete): Define a new variant void operator delete(void*, std::size_t). * libsupc++/new (delete): Declare void operator delete(void*, std::size_t) throw(); * testsuite/util/testsuite_abi.cc (check_version): Add new known version GLIBCXX_3.4.18. * config/abi/post/i386-linux-gnu/baseline_symbols.txt: Add new symbol _ZdlPvm@@GLIBCXX_3.4.18. * config/abi/post/sparc-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/mips64-linux-gnu/64/baseline_symbols.txt: Likewise. * config/abi/post/mips64-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/alpha-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/i486-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/s390-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/powerpc64-linux-gnu/32/baseline_symbols.txt: Likewise. * config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/mips-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt: Likewise. * config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/x86_64-linux-gnu/x32/baseline_symbols.txt: Likewise. * config/abi/post/ia64-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.10/amd64/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.10/sparcv9/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.10/baseline_symbols.txt: Likewise. * config/abi/post/powerpc-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/hppa-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/s390x-linux-gnu/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.8/sparcv9/baseline_symbols.txt: Likewise. * config/abi/post/solaris2.8/baseline_symbols.txt: Likewise. * config/abi/pre/gnu.ver: Add new symbol _ZdlPv[jmy] at version GLIBCXX_3.4.18. On Mon, Dec 12, 2011 at 5:25 AM, Paolo Carlini paolo.carl...@oracle.com wrote: On 12/12/2011 02:21 PM, Diego Novillo wrote: Ah, right. I missed the ABI implications. For possible inclusion in mainline too, things don't seem completely ok: nothing should be added to the baseline and very likely the export should be adjusted to accommodate for different size_t on the various targets, by using [] in the pattern. See, eg, the existing operator new[](size_t). Paolo. 
Index: libstdc++-v3/libsupc++/del_op.cc
===
--- libstdc++-v3/libsupc++/del_op.cc (revision 182251)
+++ libstdc++-v3/libsupc++/del_op.cc (working copy)
@@ -46,3 +46,11 @@ operator delete(void* ptr) throw ()
   if (ptr)
     std::free(ptr);
 }
+
+_GLIBCXX_WEAK_DEFINITION void
+operator delete(void* ptr,
+		std::size_t bytes __attribute__((__unused__))) throw ()
+{
+  if (ptr)
+    std::free(ptr);
+}
Index: libstdc++-v3/libsupc++/new
===
--- libstdc++-v3/libsupc++/new (revision 182251)
+++ libstdc++-v3/libsupc++/new (working copy)
@@ -93,6 +93,7 @@ namespace std
 void* operator new(std::size_t) throw (std::bad_alloc);
 void* operator new[](std::size_t) throw (std::bad_alloc);
 void operator delete(void*) throw();
+void operator delete(void*, std::size_t) throw();
 void operator delete[](void*) throw();
 void* operator new(std::size_t, const std::nothrow_t&) throw();
 void* operator new[](std::size_t, const std::nothrow_t&) throw();
Index: libstdc++-v3/testsuite/util/testsuite_abi.cc
===
--- libstdc++-v3/testsuite/util/testsuite_abi.cc (revision 182251)
+++ libstdc++-v3/testsuite/util/testsuite_abi.cc (working copy)
@@ -195,6 +195,7 @@ check_version(symbol& test, bool added)
   known_versions.push_back("GLIBCXX_3.4.15");
   known_versions.push_back
Re: [google] Add support for delete operator that takes the size of the object as a parameter
On Mon, Dec 12, 2011 at 12:41 PM, Paolo Carlini paolo.carl...@oracle.com wrote: On 12/12/2011 09:37 PM, Easwaran Raman wrote: Thanks for the comments Paolo. I have attached the new patch. I have bumped the version to 3.4.18 You shouldn't: 4.7 is not out yet, thus no reason to increase the minor version beyond the current 17. Ok, I then don't understand your comment Note that backporting the patch as-is to 4_6-branch would be very wrong in terms of ABI (in mainline we already have a 3.4.17). My original patch added the new symbol in version 3.4.17. Since we don't want to add the symbol to 3.4.16 (if we have a situation where the new runtime is not available when running a program compiled with -fsized-delete) and you said I shouldn't be using 3.4.17, I assumed I had to bump up the version. and used _ZdlPv[jmy] in gnu.ver. I have also added the symbol to baseline_symbols.txt of other targets. You should not, just read again what I wrote. And you don't have to believe me: just browse the libstdc++ ChangeLogs and see if somebody ever does that when the linker map is touched. Sorry, I again misunderstood this as well (and still don't have a good picture). Is the part which adds _ZdlPv[jmy] in gnu.ver ok? I added that by mimicking the symbol _Znw[jmy] found in the same file. From the log, it looks like the baseline_symbols.txt seems to be generated, but I am not sure how that is to be done. For example, r145437 says a bunch of these baseline_symbols.txt are regenerated, but I don't see any other change from which this might be generated. Thanks, Easwaran Paolo.
[google] Add support for delete operator that takes the size of the object as a parameter
In the tcmalloc memory allocator (http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html), deallocation involves a costly lookup to get the size of the deleted object. The size is required to find the right free list to release the object back to. By passing the size of the object to the delete call, this lookup can be avoided. This patch adds support for operator delete(void*, size_t), guarded by a -fsized-delete flag. It also adds a default implementation of delete(void*, size_t) that ignores the size parameter. Bootstraps and no test regressions. OK for google/gcc-4_6 branch?

---
2011-12-11  Easwaran Raman  era...@google.com

	* common.opt (fsized-delete): New option.

cp/ChangeLog.google-4_6:

2011-12-11  Easwaran Raman  era...@google.com

	* decl.c (cxx_init_decl_processing): Specify a function that takes
	a void* and size_t for DELETE_EXPR.
	* call.c (build_op_delete_call): If fsized-delete is used, use the
	variant that takes size_t as the second parameter except when
	deleting a pointer of type void *.

testsuite/ChangeLog.google-4_6:

2011-12-11  Easwaran Raman  era...@google.com

	* g++.dg/other/sized-delete-1.C: New test.

libstdc++-v3/ChangeLog.google-4_6:

2011-12-11  Easwaran Raman  era...@google.com

	* libsupc++/del_op.cc (delete): Define a new variant
	void operator delete(void*, std::size_t).
	* libsupc++/new (delete): Declare
	void operator delete(void*, std::size_t) throw();
	* testsuite/util/testsuite_abi.cc (check_version): Add new known
	version GLIBCXX_3.4.17.
	* config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Add new
	symbol _ZdlPvm@@GLIBCXX_3.4.17.
	* config/abi/pre/gnu.ver: Add new symbol _ZdlPvm at version
	GLIBCXX_3.4.17.

Index: libstdc++-v3/libsupc++/del_op.cc
===
--- libstdc++-v3/libsupc++/del_op.cc (revision 182213)
+++ libstdc++-v3/libsupc++/del_op.cc (working copy)
@@ -46,3 +46,11 @@ operator delete(void* ptr) throw ()
   if (ptr)
     std::free(ptr);
 }
+
+_GLIBCXX_WEAK_DEFINITION void
+operator delete(void* ptr,
+		std::size_t bytes __attribute__((__unused__))) throw ()
+{
+  if (ptr)
+    std::free(ptr);
+}
Index: libstdc++-v3/libsupc++/new
===
--- libstdc++-v3/libsupc++/new (revision 182213)
+++ libstdc++-v3/libsupc++/new (working copy)
@@ -93,6 +93,7 @@ namespace std
 void* operator new(std::size_t) throw (std::bad_alloc);
 void* operator new[](std::size_t) throw (std::bad_alloc);
 void operator delete(void*) throw();
+void operator delete(void*, std::size_t) throw();
 void operator delete[](void*) throw();
 void* operator new(std::size_t, const std::nothrow_t&) throw();
 void* operator new[](std::size_t, const std::nothrow_t&) throw();
Index: libstdc++-v3/testsuite/util/testsuite_abi.cc
===
--- libstdc++-v3/testsuite/util/testsuite_abi.cc (revision 182213)
+++ libstdc++-v3/testsuite/util/testsuite_abi.cc (working copy)
@@ -194,6 +194,7 @@ check_version(symbol& test, bool added)
   known_versions.push_back("GLIBCXX_3.4.14");
   known_versions.push_back("GLIBCXX_3.4.15");
   known_versions.push_back("GLIBCXX_3.4.16");
+  known_versions.push_back("GLIBCXX_3.4.17");
   known_versions.push_back("GLIBCXX_LDBL_3.4");
   known_versions.push_back("GLIBCXX_LDBL_3.4.7");
   known_versions.push_back("GLIBCXX_LDBL_3.4.10");
@@ -560,4 +561,3 @@ demangle(const std::string& mangled)
   }
   return name;
 }
-
Index: libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt
===
--- libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt (revision 182213)
+++ libstdc++-v3/config/abi/post/x86_64-linux-gnu/baseline_symbols.txt (working copy)
@@ -2422,6 +2422,7 @@
 FUNC:_ZTv0_n24_NSt9strstreamD1Ev@@GLIBCXX_3.4
 FUNC:_ZdaPv@@GLIBCXX_3.4
 FUNC:_ZdaPvRKSt9nothrow_t@@GLIBCXX_3.4
 FUNC:_ZdlPv@@GLIBCXX_3.4
+FUNC:_ZdlPvm@@GLIBCXX_3.4.17
 FUNC:_ZdlPvRKSt9nothrow_t@@GLIBCXX_3.4
 FUNC:_Znam@@GLIBCXX_3.4
 FUNC:_ZnamRKSt9nothrow_t@@GLIBCXX_3.4
Index: libstdc++-v3/config/abi/pre/gnu.ver
===
--- libstdc++-v3/config/abi/pre/gnu.ver (revision 182213)
+++ libstdc++-v3/config/abi/pre/gnu.ver (working copy)
@@ -1279,6 +1279,13 @@ GLIBCXX_3.4.16 {

 } GLIBCXX_3.4.15;

+GLIBCXX_3.4.17 {
+
+  # operator delete(void*, unsigned long)
+  _ZdlPvm;
+
+} GLIBCXX_3.4.16;
+
 # Symbols in the support library (libsupc++) have their own tag.
 CXXABI_1.3 {

Index: gcc/testsuite/g++.dg/other/sized-delete-1.C
===
--- gcc/testsuite/g++.dg/other/sized-delete-1.C (revision 0)
+++ gcc/testsuite/g++.dg/other/sized-delete-1.C (revision
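To see why the size parameter pays off, here is a hypothetical, heavily simplified size-class free-list sketch (not tcmalloc's code, and the 16-byte class width is an assumption): with the size supplied, picking the free list is pure arithmetic; without it, the allocator must first map the pointer back to its span to recover the size.

  #include <cstddef>

  struct FreeList { void *head; };
  static FreeList lists[65];                 // classes for n <= 1024 bytes

  static inline unsigned int size_class (std::size_t n)
  {
    return (unsigned int) ((n + 15) / 16);   // 16-byte size classes
  }

  void deallocate_sized (void *p, std::size_t n)
  {
    FreeList &l = lists[size_class (n)];     // no pointer-to-size lookup
    *static_cast<void **> (p) = l.head;      // push block onto its list
    l.head = p;
  }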
[google] Fix bugs in sampled profile collection
This fixes two issues with sampled profile collection. It delays the cleanup of instrumentation_to_be_sampled until after all callgraph nodes have been instrumented, and prevents gcov_sample_counter_decl and gcov_sampling_rate_decl from being garbage collected. Ok for google/gcc-4_6 and google/main branches?

-Easwaran

2011-09-30  Easwaran Raman  era...@google.com

	* tree-profile.c (gcov_sample_counter_decl): Add GTY marker.
	(gcov_sampling_rate_decl): Likewise.
	(add_sampling_to_edge_counters): Do not free
	instrumentation_to_be_sampled.
	(cleanup_instrumentation_sampling): New function.
	(tree_profiling): Call cleanup_instrumentation_sampling at the end.

testsuite/ChangeLog.google-4_6:

2011-09-30  Easwaran Raman  era...@google.com

	* gcc.dg/sample-profile-generate-1.c: New test.

Index: gcc/testsuite/gcc.dg/sample-profile-generate-1.c
===
--- gcc/testsuite/gcc.dg/sample-profile-generate-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/sample-profile-generate-1.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fprofile-generate -fprofile-generate-sampling" } */
+
+void foobar(int);
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      foobar(i);
+    }
+}
+
+void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      foobar(i);
+    }
+}
+
+/* { dg-final { cleanup-coverage-files } } */
Index: tree-profile.c
===
--- tree-profile.c	(revision 178897)
+++ tree-profile.c	(working copy)
@@ -163,10 +163,10 @@ init_ic_make_global_vars (void)
 static struct pointer_set_t *instrumentation_to_be_sampled = NULL;

 /* extern __thread gcov_unsigned_t __gcov_sample_counter  */
-static tree gcov_sample_counter_decl = NULL_TREE;
+static GTY(()) tree gcov_sample_counter_decl = NULL_TREE;

 /* extern gcov_unsigned_t __gcov_sampling_rate  */
-static tree gcov_sampling_rate_decl = NULL_TREE;
+static GTY(()) tree gcov_sampling_rate_decl = NULL_TREE;

 /* forward declaration.  */
 void gimple_init_instrumentation_sampling (void);
@@ -281,9 +281,13 @@ add_sampling_to_edge_counters (void)
           break;
         }
     }
+}

+static void
+cleanup_instrumentation_sampling (void)
+{
   /* Free the bitmap.  */
-  if (instrumentation_to_be_sampled)
+  if (flag_profile_generate_sampling && instrumentation_to_be_sampled)
     {
       pointer_set_destroy (instrumentation_to_be_sampled);
       instrumentation_to_be_sampled = NULL;
@@ -1452,6 +1456,7 @@ tree_profiling (void)
     }

   del_node_map();
+  cleanup_instrumentation_sampling();
   return 0;
 }
Re: [google] Linker plugin to do function reordering using callgraph edge profiles (issue5124041)
+static inline hashval_t
+edge_hash_function (unsigned int id1, unsigned int id2)
+{
+  /* If the number of functions is less than 1000, this gives a unique value
+     for every function id combination.  */
+  const int MULTIPLIER = 1000;
+  return id1 * MULTIPLIER + id2;

Change this to (id1 << 16) | id2.

+  if (edge_map == NULL)
+    {
+      edge_map = htab_create (25, edge_map_htab_hash_descriptor,
+			      edge_map_htab_eq_descriptor, NULL);

Use a larger size and don't hardcode the value.

+  if (function_map == NULL)
+    {
+      function_map = htab_create (25, function_map_htab_hash_descriptor,
+				  function_map_htab_eq_descriptor, NULL);

Use a larger size and don't hardcode the value.

+  caller_node = get_function_node (caller);
+  /* Real nodes are nodes that correspond to .text sections found.  These will
+     definitely be sorted.  */
+  set_as_real_node (caller_node);

This is redundant, as set_node_type sets the type correctly.

+  if (*slot == NULL)
+    {
+      reset_functions (edge, new_node, kept_node);
+      *slot = edge;
+      add_edge_to_node (new_node, edge);

edge will still be in old_node's edge_list.

+static void set_node_type (Node *n)
+{
+  void **slot;
+  char *name = n->name;
+  slot = htab_find_slot_with_hash (section_map, name, htab_hash_string (name),
+				   INSERT);

Use htab_find_with_hash or pass NO_INSERT.

+  /* Allocate section_map.  */
+  if (section_map == NULL)
+    {
+      section_map = htab_create (10, section_map_htab_hash_descriptor,
+				 section_map_htab_eq_descriptor, NULL);

With -ffunction-sections, this should be roughly equal to the size of function_map.

+static void
+write_out_node (FILE *fp, char *name,
+		void **handles, unsigned int *shndx, int position)
+{
+  void **slot;
+  slot = htab_find_slot_with_hash (section_map, name, htab_hash_string (name),
+				   INSERT);

Use htab_find_with_hash or pass NO_INSERT.

Index: function_reordering_plugin/function_reordering_plugin.c
===
--- function_reordering_plugin/function_reordering_plugin.c (revision 0)
+++ function_reordering_plugin/function_reordering_plugin.c (revision 0)

+      parse_callgraph_section_contents (section_contents, (unsigned int)length);
+    }
+  else if (name != NULL)
+    free (name);

name should be freed in the bodies of the then and else-if branches as well.

-Easwaran

On Mon, Sep 26, 2011 at 5:22 PM, Sriraman Tallam tmsri...@google.com wrote:

  *Resending as plain text*

  I am attaching a patch of the updated files. This patch was meant for the google gcc-4_6 branch. Sorry for not mentioning this upfront in my original mail.

  Thanks,
  -Sri.

On Mon, Sep 26, 2011 at 9:53 AM, Sriraman Tallam tmsri...@google.com wrote:

On Mon, Sep 26, 2011 at 9:22 AM, Nathan Froyd nfr...@mozilla.com wrote:

  On 9/23/2011 6:03 PM, Sriraman Tallam wrote:

    This patch adds a new linker plugin to re-order functions.

  This is great stuff. We were experimenting with using the coverage files to generate an ordering for --section-ordering-file, but this might be even better; we will have to experiment with it.

  A couple of comments on the code itself:

  Index: function_reordering_plugin/callgraph.h

  +inline static Edge_list *
  +make_edge_list (Edge *e)
  +{
  +  Edge_list *list = (Edge_list *)malloc (sizeof (Edge_list));

  If you are going to use libiberty via hashtab.h, you might as well make use of the *ALLOC family of macros to make this and other allocations a little neater.

  +/* Represents an edge in the call graph.  */
  +struct __edge__

  I think the usual convention is to call this edge_d or something similar, avoiding the profusion of underscores.

  +void
  +map_section_name_to_index (char *section_name, void *handle, int shndx)
  +{
  +  void **slot;
  +  char *function_name;
  +
  +  if (is_prefix_of (".text.hot.", section_name))
  +    function_name = section_name + 10 /* strlen (".text.hot.") */;
  +  else if (is_prefix_of (".text.unlikely.", section_name))
  +    function_name = section_name + 15 /* strlen (".text.unlikely.") */;
  +  else if (is_prefix_of (".text.cold.", section_name))
  +    function_name = section_name + 11 /* strlen (".text.cold.") */;
  +  else if (is_prefix_of (".text.startup.", section_name))
  +    function_name = section_name + 14 /* strlen (".text.startup.") */;
  +  else
  +    function_name = section_name + 6 /* strlen (".text.") */;

  You don't handle plain .text; this is assuming -ffunction-sections, right? Can this limitation be easily removed?

  You need to compile with -ffunction-sections or the individual sections cannot be re-ordered. That is why I assumed a .text.* prefix before every function section. Did you have something else in mind?

  Thanks for your other comments. I will make those changes.

  -Sri.

  I think it is customary to put the comment after the semicolon. Might just be easier to
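For reference, the hash suggested in the review above, written out (hashval_t is typedef'd here for self-containment; in the plugin it comes from libiberty's hashtab.h). The pair is a unique key as long as function ids fit in 16 bits, and beyond that it degrades into an ordinary, still-usable hash:

  typedef unsigned int hashval_t;   // as in libiberty's hashtab.h

  static inline hashval_t
  edge_hash_function (unsigned int id1, unsigned int id2)
  {
    return (id1 << 16) | id2;       // caller id in the high half, callee in the low
  }

Compared with the id1 * 1000 + id2 version, this removes the multiply and lifts the 1000-function limit on uniqueness.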
Re: [google] Linker plugin to do function reordering using callgraph edge profiles (issue5124041)
OK for google/gcc-4_6 and google/main branches. -Easwaran On Tue, Sep 27, 2011 at 4:07 PM, Sriraman Tallam tmsri...@google.com wrote: Made all the changes. Attaching new patch of updated files. On Tue, Sep 27, 2011 at 11:26 AM, Easwaran Raman era...@google.com wrote: +static inline hashval_t +edge_hash_function (unsigned int id1, unsigned int id2) +{ + /* If the number of functions is less than 1000, this gives a unique value + for every function id combination. */ + const int MULTIPLIER = 1000; + return id1* MULTIPLIER + id2; Change to id1 16 | id2 + if (edge_map == NULL) + { + edge_map = htab_create (25, edge_map_htab_hash_descriptor, + edge_map_htab_eq_descriptor , NULL); Use a larger size and don't hardcode the value. + if (function_map == NULL) + { + function_map = htab_create (25, function_map_htab_hash_descriptor, + function_map_htab_eq_descriptor , NULL); Use a larger size and don't hardcode the value. + caller_node = get_function_node (caller); + /* Real nodes are nodes that correspond to .text sections found. These will + definitely be sorted. */ + set_as_real_node (caller_node); This is redundant as set_node_type sets the type correctly. + if (*slot == NULL) + { + reset_functions (edge, new_node, kept_node); + *slot = edge; + add_edge_to_node (new_node, edge); edge will still be in old_node's edge_list I do not have to explicitly delete this as the merged node's edge_list will never be traversed except when cleaning up. +static void set_node_type (Node *n) +{ + void **slot; + char *name = n-name; + slot = htab_find_slot_with_hash (section_map, name, htab_hash_string (name), + INSERT); Use htab_find_with_hash or pass NOINSERT. + /* Allocate section_map. */ + if (section_map == NULL) + { + section_map = htab_create (10, section_map_htab_hash_descriptor, + section_map_htab_eq_descriptor , NULL); With -ffunction-sections, this should be roughly equal to size of function_map. +static void +write_out_node (FILE *fp, char *name, + void **handles, unsigned int *shndx, int position) +{ + void **slot; + slot = htab_find_slot_with_hash (section_map, name, htab_hash_string (name), + INSERT); Use htab_find_with_hash or pass NOINSERT. Index: function_reordering_plugin/function_reordering_plugin.c === --- function_reordering_plugin/function_reordering_plugin.c (revision 0) +++ function_reordering_plugin/function_reordering_plugin.c (revision 0) + parse_callgraph_section_contents (section_contents, (unsigned int)length); + } + else if (name != NULL) + free (name); name should be freed in the body of then and else-if as well. -Easwaran On Mon, Sep 26, 2011 at 5:22 PM, Sriraman Tallam tmsri...@google.com wrote: *Resending as plain text* I am attaching a patch of the updated files. This patch was meant for the google gcc-4_6 branch. Sorry for not mentioning this upfront in my original mail. Thanks, -Sri. On Mon, Sep 26, 2011 at 9:53 AM, Sriraman Tallam tmsri...@google.com wrote: On Mon, Sep 26, 2011 at 9:22 AM, Nathan Froyd nfr...@mozilla.com wrote: On 9/23/2011 6:03 PM, Sriraman Tallam wrote: This patch adds a new linker plugin to re-order functions. This is great stuff. We were experimenting with using the coverage files to generate an ordering for --section-ordering-file, but this might be even better, will have to experiment with it. 
A couple of comments on the code itself: Index: function_reordering_plugin/callgraph.h +inline static Edge_list * +make_edge_list (Edge *e) +{ + Edge_list *list = (Edge_list *)malloc (sizeof (Edge_list)); If you are going to use libiberty via hashtab.h, you might as well make use of the *ALLOC family of macros to make this and other allocations a little neater. +/*Represents an edge in the call graph. */ +struct __edge__ I think the usual convention is to call this edge_d or something similar, avoiding the profusion of underscores. +void +map_section_name_to_index (char *section_name, void *handle, int shndx) +{ + void **slot; + char *function_name; + + if (is_prefix_of (.text.hot., section_name)) + function_name = section_name + 10 /* strlen (.text.hot.) */; + else if (is_prefix_of (.text.unlikely., section_name)) + function_name = section_name + 15 /* strlen (.text.unlikely.) */; + else if (is_prefix_of (.text.cold., section_name)) + function_name = section_name + 11 /* strlen (.text.cold
[google] Backport r175063, r175082 and r175384 from trunk to google/gcc-4_6 branch.
Committed.
Re: Mark variables addressable if they are copied using libcall in RTL expander
On Thu, Jun 23, 2011 at 3:22 AM, Eric Botcazou ebotca...@adacore.com wrote:

  So, what's the patch(es) that need approval now?

  Original expr.c patch for PR rtl-optimization/49429 + adjusted and augmented calls.c patch for PR target/49454. Everything is in this thread. Easwaran, would you mind posting a consolidated patch?

  --
  Eric Botcazou

Here is the revised patch. Bootstraps and all tests pass on x86_64-unknown-linux. OK for trunk?

2011-06-23  Easwaran Raman  era...@google.com

	PR rtl-optimization/49429
	PR target/49454
	* expr.c (emit_block_move_hints): Mark MEM_EXPR(x) and
	MEM_EXPR(y) addressable if emit_block_move_via_libcall is
	used to copy y into x.
	* calls.c (initialize_argument_information): Mark
	an argument addressable if it is passed by invisible reference.
	(emit_library_call_value_1): Mark MEM_EXPR (val) addressable
	if it is passed by reference.

Index: gcc/expr.c
===
--- gcc/expr.c (revision 175346)
+++ gcc/expr.c (working copy)
@@ -1181,8 +1181,19 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enu
   else if (may_use_call
	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
-    retval = emit_block_move_via_libcall (x, y, size,
-					  method == BLOCK_OP_TAILCALL);
+    {
+      /* Since x and y are passed to a libcall, mark the corresponding
+         tree EXPR as addressable.  */
+      tree y_expr = MEM_EXPR (y);
+      tree x_expr = MEM_EXPR (x);
+      if (y_expr)
+        mark_addressable (y_expr);
+      if (x_expr)
+        mark_addressable (x_expr);
+      retval = emit_block_move_via_libcall (x, y, size,
+                                            method == BLOCK_OP_TAILCALL);
+    }
+
   else
     emit_block_move_via_loop (x, y, size, align);
Index: gcc/calls.c
===
--- gcc/calls.c (revision 175346)
+++ gcc/calls.c (working copy)
@@ -1084,6 +1084,8 @@ initialize_argument_information (int num_actuals,
	      && TREE_CODE (base) != SSA_NAME
	      && (!DECL_P (base) || MEM_P (DECL_RTL (base))))
	    {
+	      mark_addressable (args[i].tree_value);
+
	      /* We can't use sibcalls if a callee-copied argument is
		 stored in the current function's frame.  */
	      if (!call_from_thunk_p && DECL_P (base) && !TREE_STATIC (base))
@@ -3524,7 +3526,12 @@ emit_library_call_value_1 (int retval, rtx orgfun,
	    }

	  if (MEM_P (val) && !must_copy)
-	    slot = val;
+	    {
+	      tree val_expr = MEM_EXPR (val);
+	      if (val_expr)
+	        mark_addressable (val_expr);
+	      slot = val;
+	    }
	  else
	    {
	      slot = assign_temp (lang_hooks.types.type_for_mode (mode, 0),
Re: Mark variables addressable if they are copied using libcall in RTL expander
On Thu, Jun 23, 2011 at 12:16 PM, Jakub Jelinek ja...@redhat.com wrote:

  On Thu, Jun 23, 2011 at 12:02:35PM -0700, Easwaran Raman wrote:

  +      if (y_expr)
  +        mark_addressable (y_expr);

  Please watch formatting, a tab should be used instead of 8 spaces.

  +      if (x_expr)
  +        mark_addressable (x_expr);

  Ditto.

  @@ -1084,6 +1084,8 @@ initialize_argument_information (int num_actuals,
	      && TREE_CODE (base) != SSA_NAME
	      && (!DECL_P (base) || MEM_P (DECL_RTL (base))))
	    {
  +	      mark_addressable (args[i].tree_value);
  +

  Likewise, plus the line is indented too much; each level should be indented by 2 chars.

  Jakub

I have attached a new patch that fixes the formatting issues.

Thanks,
Easwaran

Index: gcc/expr.c
===
--- gcc/expr.c (revision 175346)
+++ gcc/expr.c (working copy)
@@ -1181,8 +1181,19 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enu
   else if (may_use_call
	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
-    retval = emit_block_move_via_libcall (x, y, size,
-					  method == BLOCK_OP_TAILCALL);
+    {
+      /* Since x and y are passed to a libcall, mark the corresponding
+	 tree EXPR as addressable.  */
+      tree y_expr = MEM_EXPR (y);
+      tree x_expr = MEM_EXPR (x);
+      if (y_expr)
+	mark_addressable (y_expr);
+      if (x_expr)
+	mark_addressable (x_expr);
+      retval = emit_block_move_via_libcall (x, y, size,
+					    method == BLOCK_OP_TAILCALL);
+    }
+
   else
     emit_block_move_via_loop (x, y, size, align);
Index: gcc/calls.c
===
--- gcc/calls.c (revision 175346)
+++ gcc/calls.c (working copy)
@@ -1084,6 +1084,8 @@ initialize_argument_information (int num_actuals,
	      && TREE_CODE (base) != SSA_NAME
	      && (!DECL_P (base) || MEM_P (DECL_RTL (base))))
	    {
+	      mark_addressable (args[i].tree_value);
+
	      /* We can't use sibcalls if a callee-copied argument is
		 stored in the current function's frame.  */
	      if (!call_from_thunk_p && DECL_P (base) && !TREE_STATIC (base))
@@ -3524,7 +3526,12 @@ emit_library_call_value_1 (int retval, rtx orgfun,
	    }

	  if (MEM_P (val) && !must_copy)
-	    slot = val;
+	    {
+	      tree val_expr = MEM_EXPR (val);
+	      if (val_expr)
+		mark_addressable (val_expr);
+	      slot = val;
+	    }
	  else
	    {
	      slot = assign_temp (lang_hooks.types.type_for_mode (mode, 0),
Re: Mark variables addressable if they are copied using libcall in RTL expander
On Wed, Jun 22, 2011 at 7:13 AM, Eric Botcazou ebotca...@adacore.com wrote: I fear this isn't enough considering pass-by-value aggregates that are callee copied. It's indeed not sufficient for arguments passed by reference but callee-copied. This is PR target/49454. For gcc.c-torture/execute/2717-1.c: typedef struct trio { int a, b, c; } trio; int foo (trio t, int i) { return bar (i, t); } yields in the .optimized dump: foo (struct trio t, int i) { int D.1968; struct trio t.0; <bb 2>: t.0 = t; D.1968_2 = bar (i_1(D), t.0); return D.1968_2; } and the aggregate copy is elided by DSE because t.0 isn't may_be_aliased. This seems to be a pre-existing bug though: its address is passed to bar in RTL. -- Eric Botcazou Is the following patch a reasonable fix for this case? I assume I should add similar code inside emit_library_call_value_1. -Easwaran --- gcc/calls.c (revision 175081) +++ gcc/calls.c (working copy) @@ -1073,6 +1073,8 @@ initialize_argument_information (int num_actuals A callee_copies = reference_callee_copied (args_so_far, TYPE_MODE (type), type, argpos < n_named_args); + if (callee_copies) +mark_addressable (args[i].tree_value); /* If we're compiling a thunk, pass through invisible references instead of making a copy. */
Re: Improve DSE in the presence of calls
This patch seems to break ia64 and some other targets. I have updated Bug 49429 with a test case and triage. It looks like for some EXPR, may_be_aliased returns an incorrect value and hence can_escape incorrectly concludes the variable can't escape, resulting in removal of useful stores. In the following list of instructions, (insn 4 3 6 2 (set (mem/s/c:DI (reg/f:DI 341) [2 s1+0 S8 A64]) (reg:DI 112 in0)) y.c:23 5 {movdi_internal} (expr_list:REG_DEAD (reg:DI 112 in0) (nil))) ... (insn 36 30 37 2 (set (reg:DI 120 out0) (reg/f:DI 357)) 5 {movdi_internal} (expr_list:REG_EQUAL (plus:DI (reg/f:DI 328 sfp) (const_int 62 [0x3e])) (nil))) (insn 37 36 38 2 (set (reg:DI 121 out1) (reg/f:DI 341)) 5 {movdi_internal} (expr_list:REG_DEAD (reg/f:DI 341) (expr_list:REG_EQUAL (plus:DI (reg/f:DI 328 sfp) (const_int 96 [0x60])) (nil (insn 38 37 39 2 (set (reg:DI 122 out2) (const_int 31 [0x1f])) 5 {movdi_internal} (nil)) (call_insn 39 38 42 2 (parallel [ (set (reg:DI 8 r8) (call (mem:DI (symbol_ref:DI ("memcpy") [flags 0x41] <function_decl 0x770d2e00 memcpy>) [0 memcpy S8 A64]) (const_int 1 [0x1]))) (clobber (reg:DI 320 b0)) (clobber (scratch:DI)) (clobber (scratch:DI)) ]) 332 {call_value_gp} (expr_list:REG_DEAD (reg:DI 122 out2) (expr_list:REG_DEAD (reg:DI 121 out1) (expr_list:REG_DEAD (reg:DI 120 out0) (expr_list:REG_UNUSED (reg:DI 8 r8) (expr_list:REG_EH_REGION (const_int 0 [0]) (nil)) (expr_list:REG_DEP_TRUE (use (reg:DI 1 r1)) (expr_list:REG_DEP_TRUE (use (reg:DI 122 out2)) (expr_list:REG_DEP_TRUE (use (reg:DI 121 out1)) (expr_list:REG_DEP_TRUE (use (reg:DI 120 out0)) (nil)) may_be_aliased() returns false for the memory expression (set (mem/s/c:DI (reg/f:DI 341) [2 s1+0 S8 A64]) (reg:DI 112 in0)), so can_escape() returns false; but later on, in the same BB, register 341 is passed as the second parameter to memcpy. I suspect this is a bug elsewhere which is exposed by this patch. If someone knows why this might be happening, I can tighten the can_escape() function appropriately. Thanks, Easwaran On Tue, Jun 14, 2011 at 9:34 AM, Jeff Law l...@redhat.com wrote: On 05/10/11 13:18, Easwaran Raman wrote: I am not sure I understand the problem here. If there is a wild read from asm, the instruction has the wild_read flag set. The if statement checks if that flag is set and if so it clears the bitmap - which was the original behavior. Originally, only if read_rec is non NULL you need to recompute the kill set. Now, even if read_rec is NULL, non_frame_wild_read could be set requiring the kill set to be modified, which is what this patch does. In fact, isn't what you have written above the equivalent to what is in the patch, as '/* Leave this clause unchanged */' is the same as if (dump_file) fprintf (dump_file, "regular read\n"); scan_reads_nospill (insn_info, v, NULL); -Easwaran Ping. I have changed the test case to use int and added another test case that shows DSE doesn't happen when the struct instance is volatile (wild_read gets set in that case) What's the purpose behind using uint64_t in the testcase? Somehow I suspect using int64_t means the test is likely not going to work across targets with different word sizes. Sorry for the exceedingly long wait. Things have been a bit crazy the last several weeks. On a positive note, re-reading things now I think my objection/comment was mis-guided. Patch approved, and again, sorry for the absurdly long period of non-responsiveness.
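Since the triage above hinges on may_be_aliased, it may help to recall its rough shape in this era; the following is a simplified paraphrase (treat the exact conditions as an assumption, not the verbatim tree-flow-inline.h source; the real predicate also exempts certain read-only statics):

  /* Simplified paraphrase of may_be_aliased.  The key point: a DECL
     that is neither public, external, nor TREE_ADDRESSABLE is deemed
     unaliased, so can_escape concludes it cannot escape.  */
  static inline bool
  may_be_aliased_sketch (const_tree var)
  {
    return (TREE_CODE (var) != CONST_DECL
            && (TREE_PUBLIC (var)
                || DECL_EXTERNAL (var)
                || TREE_ADDRESSABLE (var)));
  }

On the ia64 trace above, s1 evidently reaches RTL without TREE_ADDRESSABLE set even though its address feeds memcpy, which is the pre-existing expand-time problem the mark_addressable patches earlier in this archive address.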
jeff
Re: Ping: Re: Improve DSE in the presence of calls
Ping. Diego, David, Is this patch OK for google/main? -Easwaran On Thu, Jun 2, 2011 at 4:48 PM, Easwaran Raman era...@google.com wrote: Ping. On Sat, May 14, 2011 at 8:01 AM, Easwaran Raman era...@google.com wrote: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00781.html
Re: Ping: Re: Improve DSE in the presence of calls
Ping. On Sat, May 14, 2011 at 8:01 AM, Easwaran Raman era...@google.com wrote: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00781.html
Re: Ping: Re: Improve DSE in the presence of calls
Ping. On Sat, May 14, 2011 at 8:01 AM, Easwaran Raman era...@google.com wrote: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00781.html
Ping: Re: Improve DSE in the presence of calls
http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00781.html
Re: Improve DSE in the presence of calls
On Tue, May 3, 2011 at 9:40 AM, Easwaran Raman era...@google.com wrote: On Mon, May 2, 2011 at 8:37 PM, Jeff Law l...@redhat.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 04/26/11 16:06, Easwaran Raman wrote: You're right. The patch has correctness issues. It is not possible to simply not call add_wild_read because it also resets active_local_stores and frees read_recs. During the local phase it seems better to just treat calls as wild reads and reset active_local_stores and free read_recs. I've now refactored add_wild_read so that resetting active_local_stores and free read_recs are in separate methods that can be called on non-const/non-memset calls. In addition, I have added a non_frame_wild_read field in insn_info to mark non-const and non-memset calls. I've attached the revised patch. Looking better. Just a few more things. Don't all locals escape if the callee has a static chain? Is that handled anywhere? Don't you still have the potential for wild reads in dse_step5_nospill (say from an asm)? if so, then the change can't be correct. Couldn't you just add a clause like this before the else-if? else if (insn_info-non_frame_wild_read) { if (dump_file) fprintf (dump_file, non-frame wild read\n); scan_reads_nospill (insn_info, v, NULL); } else if (insn_info-read_rec) { /* Leave this clause unchanged */ } Am I missing something? I am not sure I understand the problem here. If there is a wild read from asm, the instruction has the wild_read flag set. The if statement checks if that flag is set and if so it clears the bitmap - which was the original behavior. Originally, only if read_rec is non NULL you need to recompute the kill set. Now, even if read_rec is NULL, non_frame_wild_read could be set requiring the kill set to be modified, which is what this patch does. In fact, isn't what you have written above the equivalent to what is in the patch as '/* Leave this clause unchanged */' is the same as if (dump_file) fprintf (dump_file, regular read\n); scan_reads_nospill (insn_info, v, NULL); -Easwaran Ping. I have changed the test case to use int and added another test case that shows DSE doesn't happen when the struct instance is volatile (wild_read gets set in that case) What's the purpose behind using unit64_t in the testcase? Somehow I suspect using int64_t means the test is unlikely not going to work across targets with different word sizes. 
Jeff Index: gcc/testsuite/gcc.dg/pr44194-1.c === --- gcc/testsuite/gcc.dg/pr44194-1.c (revision 0) +++ gcc/testsuite/gcc.dg/pr44194-1.c (revision 0) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-dse1" } */ +#include <stdint.h> + +struct ints { int a, b, c; } foo(); +void bar(int a, int b); + +void func() { + struct ints s = foo(); + bar(s.a, s.b); +} +/* { dg-final { scan-rtl-dump "global deletions = 2" "dse1" } } */ +/* { dg-final { cleanup-rtl-dump "dse1" } } */ Index: gcc/testsuite/gcc.dg/pr44194-2.c === --- gcc/testsuite/gcc.dg/pr44194-2.c (revision 0) +++ gcc/testsuite/gcc.dg/pr44194-2.c (revision 0) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-dse1" } */ +#include <stdint.h> + +struct ints { int a, b, c; } foo(); +void bar(int a, int b); + +void func() { + volatile struct ints s = foo(); + bar(s.a, s.b); +} +/* { dg-final { scan-rtl-dump "global deletions = 0" "dse1" } } */ +/* { dg-final { cleanup-rtl-dump "dse1" } } */ Index: gcc/dse.c === --- gcc/dse.c (revision 172844) +++ gcc/dse.c (working copy) @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3. If not see #include "dbgcnt.h" #include "target.h" #include "params.h" +#include "tree-flow.h" /* This file contains three techniques for performing Dead Store Elimination (dse). @@ -326,6 +327,11 @@ struct insn_info contains a wild read, the use_rec will be null. */ bool wild_read; + /* This is true only for CALL instructions which could potentially read + any non-frame memory location. This field is used by the global + algorithm. */ + bool non_frame_wild_read; + /* This field is only used for the processing of const functions
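For readers tracking the clause-ordering debate quoted above, the merged form being argued for would look roughly like this inside dse_step5_nospill (a sketch of the shape under discussion, not the committed hunk):

  if (insn_info->wild_read)
    {
      /* asms and other truly wild reads still kill everything.  */
      if (dump_file)
        fprintf (dump_file, "wild read\n");
      bitmap_clear (v);
    }
  else if (insn_info->read_rec || insn_info->non_frame_wild_read)
    {
      /* Non-const, non-memset calls arrive here with
         non_frame_wild_read set and possibly no read_rec;
         scan_reads_nospill then kills only the escaped locations.  */
      if (dump_file)
        fprintf (dump_file, "regular read\n");
      scan_reads_nospill (insn_info, v, NULL);
    }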
[google]Implement an optimization based on reuse distance profiling (issue4517049)
This patch by Silvius Rus replaces calls to certain functions with a specialized version that uses non-temporal stores based on memory reuse distance profiling. Bootstraps, no test regressions and the profiling works for a small test case. Ok for google/main? -Easwaran 2011-05-09 Silvius Rus silvius@gmail.com * value-prof.c (interesting_stringop_to_profile_p): Disable stringop profiling if reuse distance profiling is turned on. * value-prof.h (gimple_gen_reusedist, optimize_reusedist): Declare. * gcov-io.h: (GCOV_COUNTER_REUSE_DIST): New counter. (GCOV_COUNTERS): Update. (GCOV_COUNTER_NAMES): Add reuse_distance. (GCOV_MERGE_FUNCTIONS): Add __gcov_merge_reusedist. (__gcov_merge_reusedist): New declaration. * profile.c (branch_prob): Enable reuse distance profiling and optimization. * common.opt (fprofile-reusedist, foptimize-locality): New options. * tree-profile.c: Include params.h. (stringop_subst, reusedist_t): New structures. (stringop_subst_t, reusedist_t): New typedefs. (stringop_decl): New global array. (RD_NUM_COUNTERS): New constant. (get_stringop_subst, reusedist_is_interesting_call) (reusedist_instr_func_name, reusedist_get_instr_decl) (reusedist_make_instr_call, reusedist_from_counters) (gimple_gen_reusedist, nt_op_name, reusedist_get_size_threshold) (reusedist_get_distance_large_threshold) (reusedist_get_distance_small_threshold) (reusedist_get_count_threshold, reusedist_nt_is_worth_it) (reusedist_get_nt_decl, maybe_issue_profile_use_note) (reusedist_maybe_replace_with_nt_version, optimize_reusedist): New functions. (gate_tree_profile_ipa): Return true if reuse distance instrumentation or use is turned on. * Makefile.in (LIBGCOV): Add _gcov_merge_reusedist. * libgcov.c (__gcov_weighted_mean2, __gcov_merge_reusedist): New functions. * params.def (PARAM_REUSEDIST_MEAN_DIST_LARGE_THRESH) (PARAM_REUSEDIST_MEAN_DIST_SMALL_THRESH) (PARAM_REUSEDIST_CALL_COUNT_THRESH, PARAM_REUSEDIST_MEMCPY_SIZE_THRESH) (PARAM_REUSEDIST_MEMSET_SIZE_THRESH): New params. Index: gcc/value-prof.c === --- gcc/value-prof.c(revision 173496) +++ gcc/value-prof.c(working copy) @@ -1708,6 +1708,14 @@ interesting_stringop_to_profile_p (tree fndecl, gi { enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl); + /* Disable stringop collection with reuse distance instrumentation + or optimization. Otherwise we end up with hard to correct profile + mismatches for functions where reuse distance-based transformations are + made. We see a number of memcpy at instrumentation time and a different + number of memcpy at profile use time. */ + if (flag_profile_reusedist || flag_optimize_locality) +return false; + if (fcode != BUILT_IN_MEMCPY && fcode != BUILT_IN_MEMPCPY && fcode != BUILT_IN_MEMSET && fcode != BUILT_IN_BZERO) return false; Index: gcc/value-prof.h === --- gcc/value-prof.h(revision 173496) +++ gcc/value-prof.h(working copy) @@ -103,6 +103,8 @@ extern void gimple_gen_const_delta_profiler (histo unsigned, unsigned); extern void gimple_gen_average_profiler (histogram_value, unsigned, unsigned); extern void gimple_gen_ior_profiler (histogram_value, unsigned, unsigned); +extern void gimple_gen_reusedist (void); +extern void optimize_reusedist (void); /* In profile.c. */ extern void init_branch_prob (void); Index: gcc/gcov-io.h === --- gcc/gcov-io.h (revision 173496) +++ gcc/gcov-io.h (working copy) @@ -374,7 +374,8 @@ typedef HOST_WIDEST_INT gcov_type; #define GCOV_LAST_VALUE_COUNTER 8 /* The last of counters used for value profiling. */ #define GCOV_COUNTER_DIRECT_CALL 9 /* Direct call counts.
*/ -#define GCOV_COUNTERS 10 +#define GCOV_COUNTER_REUSE_DIST 10 /* Reuse distance measure. */ +#define GCOV_COUNTERS 11 /* Number of counters used for value profiling. */ #define GCOV_N_VALUE_COUNTERS \ @@ -383,7 +384,8 @@ typedef HOST_WIDEST_INT gcov_type; /* A list of human readable names of the counters */ #define GCOV_COUNTER_NAMES {"arcs", "interval", "pow2", "single", \ "delta", "indirect_call", "average", "ior", \ -"indirect_call_topn", "direct_call"} +"indirect_call_topn", "direct_call", \ + "reuse_distance"} #define GCOV_ICALL_TOPN_VAL 2 /* Track two hottest callees */ #define GCOV_ICALL_TOPN_NCOUNTS 9 /* The number of counter entries per icall
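To sketch the intended end-to-end use (the workflow below is inferred from the option names in the ChangeLog, and the non-temporal entry point name is hypothetical): build and run once with -fprofile-reusedist to fill the new reuse_distance counters, then rebuild with -fprofile-use plus -foptimize-locality so reusedist_maybe_replace_with_nt_version can rewrite qualifying call sites. Conceptually:

  /* Conceptual effect of the use phase (names hypothetical; the real
     substituted decls live in the stringop_decl array mentioned above):

       memcpy (dst, src, n);       // profiled: large mean reuse
                                   // distance, big size, hot count
     becomes
       __nt_memcpy (dst, src, n);  // non-temporal stores bypass the
                                   // cache, avoiding useless evictions

     Call sites below the size/distance/count thresholds in params.def
     are left untouched.  */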
[google] Backport r172837 and r172788 to google/main
Backported r172788 and r172837 from trunk to google/main. 2011-05-06 Easwaran Raman era...@google.com Backport r172837: * cfgexpand.c (stack_var): Remove OFFSET... (add_stack_var): ...and its reference here... (expand_stack_vars): ...and here. (stack_var_cmp): Sort by descending order of size. (partition_stack_vars): Change heuristic. (union_stack_vars): Fix to reflect changes in partition_stack_vars. (dump_stack_var_partition): Add newline after each partition. 2011-05-06 Easwaran Raman era...@google.com Backport r172788: * cfgexpand.c (add_alias_set_conflicts): Add conflicts with a variable containing union type only with -fstrict-aliasing. testsuite/ChangeLog.google-main: 2011-05-06 Easwaran Raman era...@google.com Backport r172837: * gcc.dg/stack-layout-2.c: New test. 2011-05-06 Easwaran Raman era...@google.com Backport r172788: * gcc.dg/stack-layout-1.c: New test. Index: gcc/testsuite/gcc.dg/stack-layout-1.c === --- gcc/testsuite/gcc.dg/stack-layout-1.c (revision 0) +++ gcc/testsuite/gcc.dg/stack-layout-1.c (revision 173499) @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fno-strict-aliasing -fdump-rtl-expand } */ +union U { + int a; + float b; +}; +struct A { + union U u1; + char a[100]; +}; +void bar (struct A *); +void foo () + { +{ + struct A a; + bar (a); +} +{ + struct A a; + bar (a); +} + } + +/* { dg-final { scan-rtl-dump-times Partition 1 expand } } */ +/* { dg-final { cleanup-rtl-dump expand } } */ Index: gcc/testsuite/gcc.dg/stack-layout-2.c === --- gcc/testsuite/gcc.dg/stack-layout-2.c (revision 0) +++ gcc/testsuite/gcc.dg/stack-layout-2.c (revision 173499) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-rtl-expand } */ +void bar( char *); +int foo() +{ + int i=0; + { +char a[8000]; +bar(a); +i += a[0]; + } + { +char a[8192]; +char b[32]; +bar(a); +i += a[0]; +bar(b); +i += a[0]; + } + return i; +} +/* { dg-final { scan-rtl-dump size 8192 expand } } */ +/* { dg-final { scan-rtl-dump size 32 expand } } */ Index: gcc/cfgexpand.c === --- gcc/cfgexpand.c (revision 173498) +++ gcc/cfgexpand.c (revision 173499) @@ -158,11 +158,6 @@ struct stack_var /* The Variable. */ tree decl; - /* The offset of the variable. During partitioning, this is the - offset relative to the partition. After partitioning, this - is relative to the stack frame. */ - HOST_WIDE_INT offset; - /* Initially, the size of the variable. Later, the size of the partition, if this variable becomes it's partition's representative. */ HOST_WIDE_INT size; @@ -267,7 +262,6 @@ add_stack_var (tree decl) v = stack_vars[stack_vars_num]; v-decl = decl; - v-offset = 0; v-size = tree_low_cst (DECL_SIZE_UNIT (SSAVAR (decl)), 1); /* Ensure that all variables have size, so that a != b for any two variables that are simultaneously live. */ @@ -372,8 +366,9 @@ add_alias_set_conflicts (void) to elements will conflict. In case of unions we have to be careful as type based aliasing rules may say access to the same memory does not conflict. So play - safe and add a conflict in this case. */ - || contains_union) + safe and add a conflict in this case when + -fstrict-aliasing is used. */ + || (contains_union flag_strict_aliasing)) add_stack_var_conflict (i, j); } } @@ -403,9 +398,9 @@ stack_var_cmp (const void *a, const void *b) return (int)largeb - (int)largea; /* Secondary compare on size, decreasing */ - if (sizea sizeb) -return -1; if (sizea sizeb) +return -1; + if (sizea sizeb) return 1; /* Tertiary compare on true alignment, decreasing. 
*/ @@ -564,28 +559,19 @@ update_alias_info_with_stack_vars (void) /* A subroutine of partition_stack_vars. The UNION portion of a UNION/FIND partitioning algorithm. Partitions A and B are known to be non-conflicting. - Merge them into a single partition A. + Merge them into a single partition A. */ - At the same time, add OFFSET to all variables in partition B. At the end - of the partitioning process we've have a nice block easy to lay out within - the stack frame. */ - static void -union_stack_vars (size_t a, size_t b, HOST_WIDE_INT offset) +union_stack_vars (size_t a, size_t b) { - size_t i, last; struct stack_var *vb = stack_vars[b]; bitmap_iterator bi; unsigned u; - /* Update each element of partition B with the given offset, - and merge them into partition A. */ - for (last = i = b; i != EOC; last = i, i = stack_vars[i].next) -{ - stack_vars[i].offset
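A quick worked example of what the reworked heuristic buys on stack-layout-2.c above (sizes approximate, ignoring alignment padding): sorting candidates by decreasing size lets the 8000-byte array from the first scope share the partition of the 8192-byte array from the second scope, since their lifetimes are disjoint.

  /* Frame budget for stack-layout-2.c, roughly:
       without sharing:         8000 + 8192 + 32 = 16224 bytes
       with the new heuristic:  max (8000, 8192) + 32 = 8224 bytes
     which is why the dump is expected to mention only the
     "size 8192" and "size 32" slots.  */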
Re: [google] Backport r172837 and r172788 to google/main
Thanks. Fixed them (as well as the same issue in some earlier entries). -Easwaran On Fri, May 6, 2011 at 11:27 AM, Diego Novillo dnovi...@google.com wrote: On Fri, May 6, 2011 at 14:22, Easwaran Raman era...@google.com wrote: Backported r172788 and r172837 from trunk to google/main. Minor nit: 2011-05-06 Easwaran Raman era...@google.com Backport r172837: This needs to be indented by 1 tab. * cfgexpand.c (stack_var): Remove OFFSET... (add_stack_var): ...and its reference here... (expand_stack_vars): ...and here. Diego.
Re: Improve DSE in the presence of calls
On Mon, May 2, 2011 at 8:37 PM, Jeff Law l...@redhat.com wrote: On 04/26/11 16:06, Easwaran Raman wrote: You're right. The patch has correctness issues. It is not possible to simply not call add_wild_read because it also resets active_local_stores and frees read_recs. During the local phase it seems better to just treat calls as wild reads and reset active_local_stores and free read_recs. I've now refactored add_wild_read so that resetting active_local_stores and freeing read_recs are in separate methods that can be called on non-const/non-memset calls. In addition, I have added a non_frame_wild_read field in insn_info to mark non-const and non-memset calls. I've attached the revised patch. Looking better. Just a few more things. Don't all locals escape if the callee has a static chain? Is that handled anywhere? Don't you still have the potential for wild reads in dse_step5_nospill (say from an asm)? If so, then the change can't be correct. Couldn't you just add a clause like this before the else-if? else if (insn_info->non_frame_wild_read) { if (dump_file) fprintf (dump_file, "non-frame wild read\n"); scan_reads_nospill (insn_info, v, NULL); } else if (insn_info->read_rec) { /* Leave this clause unchanged */ } Am I missing something? I am not sure I understand the problem here. If there is a wild read from asm, the instruction has the wild_read flag set. The if statement checks if that flag is set and if so it clears the bitmap - which was the original behavior. Originally, only if read_rec is non NULL you need to recompute the kill set. Now, even if read_rec is NULL, non_frame_wild_read could be set requiring the kill set to be modified, which is what this patch does. In fact, isn't what you have written above the equivalent to what is in the patch, as '/* Leave this clause unchanged */' is the same as if (dump_file) fprintf (dump_file, "regular read\n"); scan_reads_nospill (insn_info, v, NULL); -Easwaran What's the purpose behind using uint64_t in the testcase? Somehow I suspect using int64_t means the test is likely not going to work across targets with different word sizes. Jeff
[google]Add support for sampled profile collection (issue4438083)
This patch from Silvius Rus adds support for sampled edge profile collection to reduce instrumentation run overhead. Bootstraps and no test regressions. Ok for google/main? 2011-04-28 Silvius Rus silvius@gmail.com * doc/invoke.texi: Document -fprofile-generate-sampling option. * gcov-io.h (__gcov_set_sampling_rate): New declaration. * profile.c (branch_prob): Add support for sampled profile collection. * profile.h (add_sampling_to_edge_counters): New declaration. * common.opt (fprofile-generate-sampling): New option. * tree-profile: Include header files; define EDGE_COUNTER_STMT_COUNT. (instrumentation_to_be_sampled, gcov_sample_counter_decl) (gcov_sampling_rate_decl): New globals. (insert_if_then, add_sampling_wrapper, is_instrumentation_to_be_sampled) (add_sampling_to_edge_counters, gimple_init_instrumentation_sampling): New functions. (gimple_init_edge_profiler): Call gimple_init_instrumentation_sampling. (gimple_gen_edge_profiler): Mark start of instrumentation block. * libgcov.c (__gcov_sampling_rate): New extern declaration. (gcov_sampling_rate_initialized, __gcov_sample_counter): New globals. (gcov_exit): Set sampling rate; minor coding style fixes. * params.def (PARAM_PROFILE_GENERATE_SAMPLING_RATE): New parameter. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 173136) +++ gcc/doc/invoke.texi (working copy) @@ -375,7 +375,7 @@ Objective-C and Objective-C++ Dialects}. -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol -fprefetch-loop-arrays @gol -fprofile-correction -fprofile-dir=@var{path} -fprofile-generate @gol --fprofile-generate=@var{path} @gol +-fprofile-generate=@var{path} -fprofile-generate-sampling @gol -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol -freciprocal-math -fregmove -frename-registers -freorder-blocks @gol -freorder-blocks-and-partition -freorder-functions @gol @@ -7923,6 +7923,20 @@ The following options are enabled: @code{-fprofile If @var{path} is specified, GCC will look at the @var{path} to find the profile feedback data files. See @option{-fprofile-dir}. +@item -fprofile-generate-sampling +@opindex -fprofile-generate-sampling + +Enable sampling for instrumented binaries. Instead of recording every event, +record only every N-th event, where N (the sampling rate) can be set either +at compile time using +@option{--param profile-generate-sampling-rate=@var{value}}, or +at execution start time through environment variable @samp{GCOV_SAMPLING_RATE}. + +At this time sampling applies only to branch counters. A sampling rate of 100 +decreases instrumentated binary slowdown from up to 20x for heavily threaded +applications down to around 2x. @option{-fprofile-correction} is always +needed with sampling. + @item -fprofile-use @itemx -fprofile-use=@var{path} @opindex fprofile-use @@ -9138,6 +9152,9 @@ recognize. If you want to pass an option that takes an argument, you must use @option{-Xassembler} twice, once for the option and once for the argument. +@item profile-generate-sampling-rate +Set the sampling rate with @option{-fprofile-generate-sampling}. + @end table @node Link Options Index: gcc/gcov-io.h === --- gcc/gcov-io.h (revision 173136) +++ gcc/gcov-io.h (working copy) @@ -544,6 +544,9 @@ struct dyn_imp_mod /* Register a new object file module. */ extern void __gcov_init (struct gcov_info *) ATTRIBUTE_HIDDEN; +/* Set sampling rate to RATE. */ +extern void __gcov_set_sampling_rate (unsigned int rate); + /* Called before fork, to avoid double counting. 
*/ extern void __gcov_flush (void) ATTRIBUTE_HIDDEN; Index: gcc/profile.c === --- gcc/profile.c (revision 173136) +++ gcc/profile.c (working copy) @@ -1210,6 +1210,9 @@ branch_prob (void) /* Commit changes done by instrumentation. */ gsi_commit_edge_inserts (); + + if (flag_profile_generate_sampling) +add_sampling_to_edge_counters (); } free_aux_for_edges (); Index: gcc/profile.h === --- gcc/profile.h (revision 173136) +++ gcc/profile.h (working copy) @@ -47,4 +47,10 @@ extern gcov_type sum_edge_counts (VEC (edge, gc) * extern void init_node_map (void); extern void del_node_map (void); +/* Implement sampling to avoid writing to edge counters very often. + Many concurrent writes to the same counters, or to counters that share + the same cache line leads to up to 30x slowdown on an application running + on 8 CPUs. With sampling, the slowdown reduced to 2x. */ +extern void add_sampling_to_edge_counters (void); + #endif /* PROFILE_H */ Index: gcc/common.opt
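In GIMPLE, the wrapper built by add_sampling_wrapper and insert_if_then amounts to the following shape around each instrumented edge-counter bump (a sketch; the variable spellings follow the declarations above, but the exact generated code is an assumption):

  /* Before: every traversal of the edge writes the counter.
       counter[e] = counter[e] + 1;

     After (sketch): only every rate-th traversal writes it.  */
  if (__gcov_sample_counter >= __gcov_sampling_rate)
    {
      __gcov_sample_counter = 0;
      counter[e] = counter[e] + 1;
    }
  __gcov_sample_counter = __gcov_sample_counter + 1;

Dropping most counter writes is what cuts the cache-line contention from roughly 20x down to 2x, and it is also why the documentation insists on -fprofile-correction: sampled counts are no longer flow-consistent.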
Re: Improve DSE in the presence of calls
On Mon, Apr 25, 2011 at 10:03 AM, Jeff Law l...@redhat.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 04/22/11 14:19, Easwaran Raman wrote: Hi, This patch improves RTL DSE by not assuming that calls read all memory locations. With this patch, calls are assumed to read any non-frame memory and any stack variables that can potentially escape. This patch partly addresses PR rtl-optimization/44194. Bootstraps and no test regressions. OK for trunk? Thanks, Easwaran 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 * dse.c (header files): Include tree-flow.h. (group_info): Add fields. (globals): Add a new variable kill_on_calls. (get_group_info): Initialize added fields. (dse_step0): Initialize kill_on_calls. (can_escape): New function. (record_store): Pass EXPR corresponding to MEM to set_usage_bits. (dse_step2_nospill): Set kill_on_calls based on group-escaped_*. (scan_reads_nospill): Handle call instructions. (find_insn_before_first_wild_read): Remove the function. (dse_step3_scan): Remove call to find_insn_before_first_wild_read. (dse_step5_nospill): Do not kill everything on call. testsuite/ChangeLog: 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 * gcc.dg/pr44194-1.c: New test. On a bookkeeping note, it doesn't appear that you ever release the bitmaps for gi-escaped_{p,n} or kill_on_calls. This should probably be done in dse_step7. I've now freed the bitmaps in dse_step7. I'm going to assume that you and Richi have agreed upon the form of can_escape. Yes, Richard suggested to use may_be_aliased for now to check for escaping. AFAICT, even after your patch we still have a wild read recorded for most CALL_INSNs (see this code in scan_insn which was not modified): else /* Every other call, including pure functions, may read memory. */ add_wild_read (bb_info); It appears that you effectively ignore the wild_read in scan_reads_nospill by special casing CALL_INSNs. Correct? Is so, does it make still sense to call add_wild_read in scan_insn? ISTM that wild reads are still possible due to a few other constructs such as asms, address canonicalization failure, etc, so why is the removal of the wild read support in dse_step5_nospill correct? You're right. The patch has correctness issues. It is not possible to simply not call add_wild_read because it also resets active_local_stores and frees read_recs. During the local phase it seems better to just treat calls as wild reads and reset active_local_stores and free read_recs. I've now refactored add_wild_read so that resetting active_local_stores and free read_recs are in separate methods that can be called on non-const/non-memset calls. In addition, I have added a non_frame_wild_read field in insn_info to mark non-const and non-memset calls. I've attached the revised patch. -Easwaran Just to be clear, I generally like the change, but there's some details that I think need to be better understood before installing. 
jeff 2011-04-26 Easwaran Raman era...@google.com * dse.c: Include tree-flow.h. (insn_info): Add new field non_frame_wild_read. (group_info): Add new fields escaped_n and escaped_p. (kill_on_calls): New variable. (get_group_info): Initialize gi->escaped_n and gi->escaped_p. (dse_step0): Initialize kill_on_calls. (can_escape): New function. (set_usage_bits): Add additional parameter; record information about escaped locations. (record_store): Pass EXPR corresponding to MEM to set_usage_bits. (dse_step2_nospill): Set kill_on_calls based on group->escaped_n and group->escaped_p. (add_wild_read): Refactor into... (reset_active_stores): ... New method, and (free_read_records): ... New method. (add_non_frame_wild_read): New method. (scan_insn): Call add_non_frame_wild_read on non-const calls. (scan_reads_nospill): Handle instructions with non_frame_wild_read. (dse_step5_nospill): Call scan_reads_nospill for instructions marked as non_frame_wild_read. (dse_step7): Free escaped_n, escaped_p and kill_on_calls bitmaps. testsuite/ChangeLog 2011-04-26 Easwaran Raman era...@google.com
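Based on the ChangeLog above, the refactoring splits the old add_wild_read into reusable pieces; a plausible shape (assumed from the entry, not the committed code) is:

  /* Sketch under assumptions; the real free_read_records must also be
     careful about spill (alias-set) read records.  */
  static void
  reset_active_stores (void)
  {
    active_local_stores = NULL;
  }

  static void
  free_read_records (bb_info_t bb_info)
  {
    insn_info_t insn_info = bb_info->last_insn;
    read_info_t *ptr = &insn_info->read_rec;
    while (*ptr)
      {
        read_info_t next = (*ptr)->next;
        pool_free (read_info_pool, *ptr);
        *ptr = next;
      }
  }

  /* Non-const, non-memset call: locally treated like a wild read, but
     globally only escaped locations are considered read.  */
  static void
  add_non_frame_wild_read (bb_info_t bb_info)
  {
    insn_info_t insn_info = bb_info->last_insn;
    insn_info->non_frame_wild_read = true;
    reset_active_stores ();
    free_read_records (bb_info);
  }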
Re: [google]Pass --save-temps to the assembler (issue4436049)
On Thu, Apr 21, 2011 at 3:20 PM, Andrew Pinski pins...@gmail.com wrote: On Thu, Apr 21, 2011 at 3:18 PM, Easwaran Raman era...@google.com wrote: When using gcc with a post-assembly tool, we use a wrapper that invokes the tool and needs to know if the tool's output (another assembly file) needs to be saved. The wrapper doesn't generate any additional .o files and hence doesn't need to know whether the obj file is to be saved (which is done by the driver). Why do we need to pass -save-temps=obj in that case? -save-temps=obj means place the save temps in the same directory as the object file would be put rather than the current directory. Thanks, Andrew Pinski Sorry, I misunderstood that option. Is the following patch ok? Configured with --with-gnu-as option, verified that it bootstraps and gcc passes --save-temps to 'as' when run with -save-temps=obj. Diego, OK for google/main? -Easwaran 2011-04-25 Easwaran Raman era...@google.com * gcc/gcc.c (asm_options): Pass --save-temps to assembler when invoked with -save-temps= option. Index: gcc/gcc.c === --- gcc/gcc.c (revision 172932) +++ gcc/gcc.c (working copy) @@ -784,7 +784,7 @@ static const char *asm_options = #if HAVE_GNU_AS /* If GNU AS is used, then convert -w (no warnings), -I, and -v to the assembler equivalents. */ -%{v} %{w:-W} %{I*} %{save-temps:--save-temps} +%{v} %{w:-W} %{I*} %{save-temps*:--save-temps} #endif %a %Y %{c:%W{o*}%{!o*:-o %w%b%O}}%{!c:-o %d%w%u%O};
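For clarity on the one-character spec change: %{save-temps*:--save-temps} matches any driver option beginning with -save-temps, whereas the old %{save-temps:...} matched only the bare spelling. A sketch of the resulting behavior (invocations illustrative):

  /* Spec-matching sketch for gcc.c's asm_options string:

       old: "%{save-temps:--save-temps}"
            gcc -c -save-temps foo.c      ->  as ... --save-temps
            gcc -c -save-temps=obj foo.c  ->  (nothing passed to as)

       new: "%{save-temps*:--save-temps}"
            gcc -c -save-temps foo.c      ->  as ... --save-temps
            gcc -c -save-temps=obj foo.c  ->  as ... --save-temps
  */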
Improve DSE in the presence of calls
Hi, This patch improves RTL DSE by not assuming that calls read all memory locations. With this patch, calls are assumed to read any non-frame memory and any stack variables that can potentially escape. This patch partly addresses PR rtl-optimization/44194. Bootstraps and no test regressions. OK for trunk? Thanks, Easwaran 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 * dse.c (header files): Include tree-flow.h. (group_info): Add fields. (globals): Add a new variable kill_on_calls. (get_group_info): Initialize added fields. (dse_step0): Initialize kill_on_calls. (can_escape): New function. (record_store): Pass EXPR corresponding to MEM to set_usage_bits. (dse_step2_nospill): Set kill_on_calls based on group->escaped_*. (scan_reads_nospill): Handle call instructions. (find_insn_before_first_wild_read): Remove the function. (dse_step3_scan): Remove call to find_insn_before_first_wild_read. (dse_step5_nospill): Do not kill everything on call. testsuite/ChangeLog: 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 * gcc.dg/pr44194-1.c: New test. Index: gcc/testsuite/gcc.dg/pr44194-1.c === --- gcc/testsuite/gcc.dg/pr44194-1.c (revision 0) +++ gcc/testsuite/gcc.dg/pr44194-1.c (revision 0) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-dse1" } */ +#include <stdint.h> + +struct twoints { uint64_t a, b; } foo(); +void bar(uint64_t a, uint64_t b); + +void func() { + struct twoints s = foo(); + bar(s.a, s.b); +} +/* { dg-final { scan-rtl-dump "global deletions = 2" "dse1" } } */ +/* { dg-final { cleanup-rtl-dump "dse1" } } */ Index: gcc/dse.c === --- gcc/dse.c (revision 172844) +++ gcc/dse.c (working copy) @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3. If not see #include "dbgcnt.h" #include "target.h" #include "params.h" +#include "tree-flow.h" /* This file contains three techniques for performing Dead Store Elimination (dse). @@ -501,6 +502,11 @@ struct group_info deleted. */ bitmap store1_n, store1_p, store2_n, store2_p; + /* These bitmaps keep track of offsets in this group that escape this function. + An offset escapes if it corresponds to a named variable whose + addressable flag is set. */ + bitmap escaped_n, escaped_p; + /* The positions in this bitmap have the same assignments as the in, out, gen and kill bitmaps. This bitmap is all zeros except for the positions that are occupied by stores for this group. */ @@ -588,6 +594,9 @@ static int spill_deleted; static bitmap all_blocks; +/* Locations that are killed by calls in the global phase. */ +static bitmap kill_on_calls; + /* The number of bits used in the global bitmaps.
*/ static unsigned int current_position; @@ -692,6 +701,8 @@ get_group_info (rtx base) gi->store1_p = BITMAP_ALLOC (NULL); gi->store2_n = BITMAP_ALLOC (NULL); gi->store2_p = BITMAP_ALLOC (NULL); + gi->escaped_p = BITMAP_ALLOC (NULL); + gi->escaped_n = BITMAP_ALLOC (NULL); gi->group_kill = BITMAP_ALLOC (NULL); gi->process_globally = false; gi->offset_map_size_n = 0; @@ -714,6 +725,8 @@ get_group_info (rtx base) gi->store1_p = BITMAP_ALLOC (NULL); gi->store2_n = BITMAP_ALLOC (NULL); gi->store2_p = BITMAP_ALLOC (NULL); + gi->escaped_p = BITMAP_ALLOC (NULL); + gi->escaped_n = BITMAP_ALLOC (NULL); gi->group_kill = BITMAP_ALLOC (NULL); gi->process_globally = false; gi->frame_related = @@ -739,6 +752,7 @@ dse_step0 (void) spill_deleted = 0; scratch = BITMAP_ALLOC (NULL); + kill_on_calls = BITMAP_ALLOC (NULL); rtx_store_info_pool = create_alloc_pool ("rtx_store_info_pool", @@ -881,31 +895,48 @@ delete_dead_store_insn (insn_info_t insn_info) insn_info->wild_read = false; } +/* Check if EXPR can possibly escape the current function scope. */ +static bool +can_escape (tree expr) +{ + tree base; + if (!expr) +return true; + base = get_base_address (expr); + if (DECL_P (base) + && !may_be_aliased (base)) +return false; + return true; +} /* Set the store* bitmaps offset_map_size* fields in GROUP based on OFFSET and WIDTH. */ static void -set_usage_bits (group_info_t group, HOST_WIDE_INT offset, HOST_WIDE_INT width) +set_usage_bits (group_info_t group, HOST_WIDE_INT offset, HOST_WIDE_INT width, +tree expr) { HOST_WIDE_INT i; - + bool expr_escapes = can_escape (expr); if (offset > -MAX_OFFSET && offset + width < MAX_OFFSET) for (i=offset; i<offset+width; i++) { bitmap store1; bitmap store2; +bitmap escaped; int ai; if (i < 0) { store1 = group->store1_n; store2 = group->store2_n; + escaped = group->escaped_n; ai = -i; } else { store1
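To summarize what the weaker call semantics mean in source terms, here is a hedged illustration (a hypothetical example, not one of the patch's tests):

  /* With the patch, the call to g is assumed to read any non-frame
     memory and 'esc' (whose address escapes, so may_be_aliased is
     true), but not 'priv', which is never address-taken.  A dead
     store to 'priv' before the call can now be deleted; the store to
     'esc' cannot, since g may read it through the pointer.  */
  int g (int *);

  int
  f (void)
  {
    int priv = 1;   /* dead store: no later read, and the call no
                       longer counts as a read of it */
    int esc = 2;    /* live store: must survive the call */
    (void) priv;
    return g (&esc);
  }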
Re: Improve DSE in the presence of calls
On Fri, Apr 22, 2011 at 1:26 PM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Apr 22, 2011 at 01:19:17PM -0700, Easwaran Raman wrote: The ChangeLog entry has various issues: 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 This should have a tab before PR as well. * dse.c (header files): Include tree-flow.h. This should be just * dse.c: Include tree-flow.h. (group_info): Add fields. (struct group_info): Add escaped_n and escaped_p fields. (it is always better to be explicit on what you added). (globals): Add a new variable kill_on_calls. (kill_on_calls): New variable. (get_group_info): Initialize added fields. Instead of "added fields" again mention which fields were added. 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 Again, tab before PR. * gcc.dg/pr44194-1.c: New test. Jakub Thanks for the comments. The fixed ChangeLog is given below. -Easwaran 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 * dse.c: Include tree-flow.h. (group_info): Add escaped_n and escaped_p fields. (kill_on_calls): New variable. (get_group_info): Initialize gi->escaped_n and gi->escaped_p. (dse_step0): Initialize kill_on_calls. (can_escape): New function. (record_store): Pass EXPR corresponding to MEM to set_usage_bits. (dse_step2_nospill): Set kill_on_calls based on group->escaped_n and group->escaped_p. (scan_reads_nospill): Handle call instructions. (find_insn_before_first_wild_read): Remove the function. (dse_step3_scan): Remove call to find_insn_before_first_wild_read. (dse_step5_nospill): Do not kill everything on call. testsuite/ChangeLog: 2011-04-22 Easwaran Raman era...@google.com PR rtl-optimization/44194 * gcc.dg/pr44194-1.c: New test.
Re: Improve stack layout heuristic.
On Thu, Apr 21, 2011 at 8:43 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Apr 21, 2011 at 5:22 PM, Michael Matz m...@suse.de wrote: Hi, On Wed, 20 Apr 2011, Easwaran Raman wrote: But you're right - not adding that conflict doesn't actually reduce the size of bit maps. Reverting back to what was there originally. Thanks, I have no more issues with the patch. You'll need to find someone who can formally approve it, though. Ok with a proper changelog entry. Richard. Ciao, Michael. Committed with the following Changelog entry: 2011-04-21 Easwaran Raman era...@google.com * gcc/cfgexpand.c (stack_var): Remove OFFSET... (add_stack_var): ...and its reference here... (expand_stack_vars): ...and here. (stack_var_cmp): Sort by descending order of size. (partition_stack_vars): Change heuristic. (union_stack_vars): Fix to reflect changes in partition_stack_vars. (dump_stack_var_partition): Add newline after each partition. testsuite/Changelog: 2011-04-21 Easwaran Raman era...@google.com * gcc.dg/stack-layout-2.c: New test. Thanks, Easwaran