Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On 03/05/12 12:21, Richard Guenther wrote: On Wed, May 2, 2012 at 4:06 PM, Tom de Vries tom_devr...@mentor.com wrote: On 27/04/12 11:01, Richard Guenther wrote: SNIP I see you do not handle SNIP struct S { int i; }; struct S foo (void); struct S bar (void) { struct S s1, s2; if (...) s = foo (); else s = foo (); because the calls have a LHS that is not an SSA name. Indeed, the gvn patch handles this example conservatively, and tree-tail-merge fails to optimize this test-case: ... struct S { int i; }; extern struct S foo (void); extern int foo2 (void); struct S s; int bar (int c) { int r; if (c) { s = foo (); r = foo2 (); } else { s = foo (); r = foo2 (); } return r; } ... A todo. SNIP bootstrapped and reg-tested on x86_64 (ada inclusive). Is this patch ok, or is the todo required? No, you can followup with that. Richard, here is the follow-up patch, which adds value numbering of a call for which the lhs is not an SSA_NAME. The only thing I ended up using from the patch in http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html was the idea of using MODIFY_EXPR. I don't include any handling of MODIFY_EXPR in create_component_ref_by_pieces_1 because I don't think it will trigger with PRE. bootstrapped and reg-tested on x86_64. Ok for trunk? Hmm, I wonder why if (!gimple_call_internal_p (stmt) (gimple_call_flags (stmt) (ECF_PURE | ECF_CONST) /* If the call has side effects, subsequent calls won't have the same incoming vuse, so it's save to assume equality. */ || gimple_has_side_effects (stmt))) works - I realize you added the gimple_has_side_effects () call - but if you consider ECF_LOOPING_CONST_OR_PURE functions, which have no VDEF, then it's odd how the comment applies. And together both tests turn out to let all calls pass. Richard, You're right, this is not correct. The test for gimple_has_side_effect should be a test for gimple_vdef. A ECF_LOOPING_CONST_OR_PURE function will be rejected by the updated condition. I fixed this in the patch, and added comments describing both the const/pure clause, and the vdef clause. I also removed the comment 'We should handle stores from calls' since this patch implements that. + tree lhs = gimple_call_lhs (call); + + if (lhs TREE_CODE (lhs) != SSA_NAME) +{ + memset (temp, 0, sizeof (temp)); + temp.opcode = MODIFY_EXPR; + temp.type = TREE_TYPE (lhs); + temp.op0 = lhs; + temp.off = -1; + VEC_safe_push (vn_reference_op_s, heap, *result, temp); +} this deserves a comment Done. - you are adding the extra operand solely for the purpose of hashing. You are also not doing a good job identifying common calls. Consider if () *p = foo (); else *q = foo (); where p and q are value-numbered the same. You fail to properly commonize the blocks. That is because valueization of the ops of the call does not work for arbitrarily complex operands - see how we handle call operands. Instead you should probably use copy_reference_ops_from_ref on the lhs, similar to call operands. If p and q are value numbered, it means they're SSA_NAMEs, and that means they're not handled by this patch which is only about handling calls for which the lhs is not an SSA_NAME. This example is handled by the patch I posted for pr52009. I reposted the patch and added this test-case (http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00155.html). So I'm not using copy_reference_ops_from_ref on the lhs, since it's not an SSA_NAME. Using MODIFY_EXPR as toplevel code for the vn_reference is going to indeed disable PRE for them, likewise any other call handling in VN. Otherwise the idea looks ok - can you change the patch like above and add a testcase with an equal-VNed indirect store? I updated the patch as indicated in my comments, and added the test-case to the patch for pr52009. Bootstrapped and reg-tested on x86_64 (ada inclusive). OK for trunk? Thanks, - Tom 2012-07-05 Tom de Vries t...@codesourcery.com * tree-ssa-sccvn.c (copy_reference_ops_from_call) (visit_reference_op_call): Handle case that lhs is not an SSA_NAME. (visit_use): Also call visit_reference_op_call for calls with a vdef. * gcc.dg/pr51879-16.c: New test. * gcc.dg/pr51879-17.c: Same. Index: gcc/tree-ssa-sccvn.c === --- gcc/tree-ssa-sccvn.c (revision 189007) +++ gcc/tree-ssa-sccvn.c (working copy) @@ -942,6 +942,20 @@ copy_reference_ops_from_call (gimple cal { vn_reference_op_s temp; unsigned i; + tree lhs = gimple_call_lhs (call); + + /* If 2 calls have a different non-ssa lhs, vdef value numbers should be + different. By adding the lhs here in the vector, we ensure that the +
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On Wed, May 2, 2012 at 4:06 PM, Tom de Vries tom_devr...@mentor.com wrote: On 27/04/12 11:01, Richard Guenther wrote: SNIP I see you do not handle SNIP struct S { int i; }; struct S foo (void); struct S bar (void) { struct S s1, s2; if (...) s = foo (); else s = foo (); because the calls have a LHS that is not an SSA name. Indeed, the gvn patch handles this example conservatively, and tree-tail-merge fails to optimize this test-case: ... struct S { int i; }; extern struct S foo (void); extern int foo2 (void); struct S s; int bar (int c) { int r; if (c) { s = foo (); r = foo2 (); } else { s = foo (); r = foo2 (); } return r; } ... A todo. SNIP bootstrapped and reg-tested on x86_64 (ada inclusive). Is this patch ok, or is the todo required? No, you can followup with that. Richard, here is the follow-up patch, which adds value numbering of a call for which the lhs is not an SSA_NAME. The only thing I ended up using from the patch in http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html was the idea of using MODIFY_EXPR. I don't include any handling of MODIFY_EXPR in create_component_ref_by_pieces_1 because I don't think it will trigger with PRE. bootstrapped and reg-tested on x86_64. Ok for trunk? Hmm, I wonder why if (!gimple_call_internal_p (stmt) (gimple_call_flags (stmt) (ECF_PURE | ECF_CONST) /* If the call has side effects, subsequent calls won't have the same incoming vuse, so it's save to assume equality. */ || gimple_has_side_effects (stmt))) works - I realize you added the gimple_has_side_effects () call - but if you consider ECF_LOOPING_CONST_OR_PURE functions, which have no VDEF, then it's odd how the comment applies. And together both tests turn out to let all calls pass. + tree lhs = gimple_call_lhs (call); + + if (lhs TREE_CODE (lhs) != SSA_NAME) +{ + memset (temp, 0, sizeof (temp)); + temp.opcode = MODIFY_EXPR; + temp.type = TREE_TYPE (lhs); + temp.op0 = lhs; + temp.off = -1; + VEC_safe_push (vn_reference_op_s, heap, *result, temp); +} this deserves a comment - you are adding the extra operand solely for the purpose of hashing. You are also not doing a good job identifying common calls. Consider if () *p = foo (); else *q = foo (); where p and q are value-numbered the same. You fail to properly commonize the blocks. That is because valueization of the ops of the call does not work for arbitrarily complex operands - see how we handle call operands. Instead you should probably use copy_reference_ops_from_ref on the lhs, similar to call operands. Using MODIFY_EXPR as toplevel code for the vn_reference is going to indeed disable PRE for them, likewise any other call handling in VN. Otherwise the idea looks ok - can you change the patch like above and add a testcase with an equal-VNed indirect store? Thanks, Richard. Thanks, - Tom 2012-05-02 Tom de Vries t...@codesourcery.com * tree-ssa-sccvn.c (copy_reference_ops_from_call) (visit_reference_op_call): Handle case that lhs is not an SSA_NAME. (visit_use): Call visit_reference_op_call for calls with lhs that is not an SSA_NAME. * gcc.dg/pr51879-16.c: New test. * gcc.dg/pr51879-17.c: Same.
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On 27/04/12 11:01, Richard Guenther wrote: SNIP I see you do not handle SNIP struct S { int i; }; struct S foo (void); struct S bar (void) { struct S s1, s2; if (...) s = foo (); else s = foo (); because the calls have a LHS that is not an SSA name. Indeed, the gvn patch handles this example conservatively, and tree-tail-merge fails to optimize this test-case: ... struct S { int i; }; extern struct S foo (void); extern int foo2 (void); struct S s; int bar (int c) { int r; if (c) { s = foo (); r = foo2 (); } else { s = foo (); r = foo2 (); } return r; } ... A todo. SNIP bootstrapped and reg-tested on x86_64 (ada inclusive). Is this patch ok, or is the todo required? No, you can followup with that. Richard, here is the follow-up patch, which adds value numbering of a call for which the lhs is not an SSA_NAME. The only thing I ended up using from the patch in http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01731.html was the idea of using MODIFY_EXPR. I don't include any handling of MODIFY_EXPR in create_component_ref_by_pieces_1 because I don't think it will trigger with PRE. bootstrapped and reg-tested on x86_64. Ok for trunk? Thanks, - Tom 2012-05-02 Tom de Vries t...@codesourcery.com * tree-ssa-sccvn.c (copy_reference_ops_from_call) (visit_reference_op_call): Handle case that lhs is not an SSA_NAME. (visit_use): Call visit_reference_op_call for calls with lhs that is not an SSA_NAME. * gcc.dg/pr51879-16.c: New test. * gcc.dg/pr51879-17.c: Same. Index: gcc/tree-ssa-sccvn.c === --- gcc/tree-ssa-sccvn.c (revision 186907) +++ gcc/tree-ssa-sccvn.c (working copy) @@ -942,6 +947,17 @@ copy_reference_ops_from_call (gimple cal { vn_reference_op_s temp; unsigned i; + tree lhs = gimple_call_lhs (call); + + if (lhs TREE_CODE (lhs) != SSA_NAME) +{ + memset (temp, 0, sizeof (temp)); + temp.opcode = MODIFY_EXPR; + temp.type = TREE_TYPE (lhs); + temp.op0 = lhs; + temp.off = -1; + VEC_safe_push (vn_reference_op_s, heap, *result, temp); +} /* Copy the type, opcode, function being called and static chain. */ memset (temp, 0, sizeof (temp)); @@ -2624,6 +2641,9 @@ visit_reference_op_call (tree lhs, gimpl tree vuse = gimple_vuse (stmt); tree vdef = gimple_vdef (stmt); + if (lhs TREE_CODE (lhs) != SSA_NAME) +lhs = NULL_TREE; + vr1.vuse = vuse ? SSA_VAL (vuse) : NULL_TREE; vr1.operands = valueize_shared_reference_ops_from_call (stmt); vr1.type = gimple_expr_type (stmt); @@ -3410,9 +3432,7 @@ visit_use (tree use) /* If the call has side effects, subsequent calls won't have the same incoming vuse, so it's save to assume equality. */ - || gimple_has_side_effects (stmt)) - ((lhs TREE_CODE (lhs) == SSA_NAME) - || (!lhs gimple_vdef (stmt + || gimple_has_side_effects (stmt))) { changed = visit_reference_op_call (lhs, stmt); } Index: gcc/testsuite/gcc.dg/pr51879-16.c === --- /dev/null (new file) +++ gcc/testsuite/gcc.dg/pr51879-16.c (revision 0) @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-pre } */ + +struct S { + int i; +}; + +extern struct S foo (void); +extern int foo2 (void); + +struct S s; + +int bar (int c) { + int r; + + if (c) +{ + s = foo (); + r = foo2 (); +} + else +{ + s = foo (); + r = foo2 (); +} + + return r; +} + +/* { dg-final { scan-tree-dump-times foo \\( 1 pre} } */ +/* { dg-final { scan-tree-dump-times foo2 \\( 1 pre} } */ +/* { dg-final { cleanup-tree-dump pre } } */ Index: gcc/testsuite/gcc.dg/pr51879-17.c === --- /dev/null (new file) +++ gcc/testsuite/gcc.dg/pr51879-17.c (revision 0) @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-pre } */ + +struct S { + int i; +}; + +extern struct S foo (void); +extern int foo2 (void); + +struct S s, s2; + +int bar (int c) { + int r; + + if (c) +{ + s = foo (); + r = foo2 (); +} + else +{ + s2 = foo (); + r = foo2 (); +} + + return r; +} + +/* { dg-final { scan-tree-dump-times foo \\( 2 pre} } */ +/* { dg-final { scan-tree-dump-times foo2 \\( 2 pre} } */ +/* { dg-final { cleanup-tree-dump pre } } */
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On 26/04/12 12:20, Richard Guenther wrote: On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries tom_devr...@mentor.com wrote: On 25/04/12 11:57, Richard Guenther wrote: SNIP Hmm. I'm not sure we can conclude that they have the same value! +int bar (int); +void baz (int); +void bla (int); + +void +foo (int y) +{ + int a; + if (y == 6) +{ + bla (5); + a = bar (7); +} + else +{ + bla (5); + a = bar (7); +} + baz (a); +} I can implement bla as void bla(int) { a = random (); } thus a would not have the same value on both paths I think it's hard to decide whether they have the same value or not, since we cannot write a test-case that compares the results of calls from 2 branches of an if statement. void *foo () { return __builtin_return_address (0); } void *bar (_Bool b) { if (b) return foo (); else return foo (); } int main() { if (bar(true) == bar(false)) abort (); } ok ... outside of the scope of standard C, but we certainly _can_ do this. Which would question tail-merging the above at all, of course. I added noinline to make sure there's no variation from inlining. ... extern void abort (void) __attribute__ ((__nothrow__)) __attribute__ ((__noreturn__)); __attribute__((noinline)) void *foo () { return __builtin_return_address (0); } __attribute__((noinline)) void *bar (int b) { if (b) return foo (); else return foo (); } int main() { void *a, *b; a = bar (1); b = bar (0); if (a == b) abort (); return 0; } ... This test-case passes with: ... gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fno-crossjumping ... and fails with: ... gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fcrossjumping ... with the proposed patch, it also fails with: ... gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge -fno-crossjumping ... Tree-tail-merge performs the same transformation as cross-jumping for this test-case. I think that the transformation is valid, and that the test-case behaves as expected. Subsequent calls to foo may or may not return the same value, depending on the optimizations, we can't assert a != b, that's just the nature of __builtin_return_address. Btw, this example is not what I meant when I said 'a test-case that compares the results of calls from 2 branches of an if statement'. I tried to say that the semantics of C is defined in terms of taking either 1 branch or the other, rather than executing both in parallel, after which both results would be available. From that point of view, this example compares subsequent calls to foo, not parallel. Ah, I see. Btw, I was not suggesting that we should not optimize the above. I started looking at the problem in terms of subsequent calls. Consider the imaginary attribute nosideeffect, similar to const and pure. const: - no effects except the return value - the return value depends only on the parameters pure: - no effects except the return value - the return value depends only on the parameters and/or global variables nosideeffect: - no effects except the return value The code fragment: ... extern int f (int) __attribute ((nosideeffect)); int g() { int a; int b; a = f (5); b = f (5); return a + b; } ... would result in a gimple sequence more or less like this: ... # VUSE .MEMD.1712_4(D) aD.1707_1 = fD.1704 (5); # VUSE .MEMD.1712_4(D) bD.1708_2 = fD.1704 (5); D.1710_3 = aD.1707_1 + bD.1708_2; # VUSE .MEMD.1712_6 return D.1710_3; ... The value numbering modification I'm proposing would say that a and b have the same value, which is not true. I updated the patch to ensure in this case that a and b would be seen as different. Note that things won't break if the attribute is missing on a function, since that just means that the function would have a vdef, and there would be no 2 subsequent function calls with the same vuse. - but that is not what matters - because obviously we can still merge the blocks. Right. That is, we cannot be sure that the side-effects on memory of bla (5) and bla (5) are the same. But is that not what your value-numbering changes will assert? In the case of calls in different branches of an control flow statement, we can aggressively assume they're the same, since we cannot write a check that would start failing. Apart from the above ... which shows that even the returned values can matter. In the case of subsequent calls with side effects, the vuses will never be the same. In the case of subsequent calls without side effects, but not pure or const, we can't assume they have the same result. AFAIK, there's currently no way to specify such a function, but the updated patch should be able handle this. (not sure how you achieve this, but it looks like you insert/lookup general calls, thus not
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On Fri, Apr 27, 2012 at 8:20 AM, Tom de Vries tom_devr...@mentor.com wrote: On 26/04/12 12:20, Richard Guenther wrote: On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries tom_devr...@mentor.com wrote: On 25/04/12 11:57, Richard Guenther wrote: SNIP Hmm. I'm not sure we can conclude that they have the same value! +int bar (int); +void baz (int); +void bla (int); + +void +foo (int y) +{ + int a; + if (y == 6) + { + bla (5); + a = bar (7); + } + else + { + bla (5); + a = bar (7); + } + baz (a); +} I can implement bla as void bla(int) { a = random (); } thus a would not have the same value on both paths I think it's hard to decide whether they have the same value or not, since we cannot write a test-case that compares the results of calls from 2 branches of an if statement. void *foo () { return __builtin_return_address (0); } void *bar (_Bool b) { if (b) return foo (); else return foo (); } int main() { if (bar(true) == bar(false)) abort (); } ok ... outside of the scope of standard C, but we certainly _can_ do this. Which would question tail-merging the above at all, of course. I added noinline to make sure there's no variation from inlining. ... extern void abort (void) __attribute__ ((__nothrow__)) __attribute__ ((__noreturn__)); __attribute__((noinline)) void *foo () { return __builtin_return_address (0); } __attribute__((noinline)) void *bar (int b) { if (b) return foo (); else return foo (); } int main() { void *a, *b; a = bar (1); b = bar (0); if (a == b) abort (); return 0; } ... This test-case passes with: ... gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fno-crossjumping ... and fails with: ... gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fcrossjumping ... with the proposed patch, it also fails with: ... gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge -fno-crossjumping ... Tree-tail-merge performs the same transformation as cross-jumping for this test-case. I think that the transformation is valid, and that the test-case behaves as expected. Subsequent calls to foo may or may not return the same value, depending on the optimizations, we can't assert a != b, that's just the nature of __builtin_return_address. Btw, this example is not what I meant when I said 'a test-case that compares the results of calls from 2 branches of an if statement'. I tried to say that the semantics of C is defined in terms of taking either 1 branch or the other, rather than executing both in parallel, after which both results would be available. From that point of view, this example compares subsequent calls to foo, not parallel. Ah, I see. Btw, I was not suggesting that we should not optimize the above. I started looking at the problem in terms of subsequent calls. Consider the imaginary attribute nosideeffect, similar to const and pure. const: - no effects except the return value - the return value depends only on the parameters pure: - no effects except the return value - the return value depends only on the parameters and/or global variables nosideeffect: - no effects except the return value The code fragment: ... extern int f (int) __attribute ((nosideeffect)); int g() { int a; int b; a = f (5); b = f (5); return a + b; } ... would result in a gimple sequence more or less like this: ... # VUSE .MEMD.1712_4(D) aD.1707_1 = fD.1704 (5); # VUSE .MEMD.1712_4(D) bD.1708_2 = fD.1704 (5); D.1710_3 = aD.1707_1 + bD.1708_2; # VUSE .MEMD.1712_6 return D.1710_3; ... The value numbering modification I'm proposing would say that a and b have the same value, which is not true. I updated the patch to ensure in this case that a and b would be seen as different. Note that things won't break if the attribute is missing on a function, since that just means that the function would have a vdef, and there would be no 2 subsequent function calls with the same vuse. - but that is not what matters - because obviously we can still merge the blocks. Right. That is, we cannot be sure that the side-effects on memory of bla (5) and bla (5) are the same. But is that not what your value-numbering changes will assert? In the case of calls in different branches of an control flow statement, we can aggressively assume they're the same, since we cannot write a check that would start failing. Apart from the above ... which shows that even the returned values can matter. In the case of subsequent calls with side effects, the vuses will never be the same. In the case of subsequent calls without side effects, but not pure or const, we can't assume they have the same result. AFAIK, there's currently no way to specify such a function, but the updated patch should be able handle this. (not
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On Wed, Apr 25, 2012 at 11:56 PM, Tom de Vries tom_devr...@mentor.com wrote: On 25/04/12 11:57, Richard Guenther wrote: SNIP Hmm. I'm not sure we can conclude that they have the same value! +int bar (int); +void baz (int); +void bla (int); + +void +foo (int y) +{ + int a; + if (y == 6) + { + bla (5); + a = bar (7); + } + else + { + bla (5); + a = bar (7); + } + baz (a); +} I can implement bla as void bla(int) { a = random (); } thus a would not have the same value on both paths I think it's hard to decide whether they have the same value or not, since we cannot write a test-case that compares the results of calls from 2 branches of an if statement. void *foo () { return __builtin_return_address (0); } void *bar (_Bool b) { if (b) return foo (); else return foo (); } int main() { if (bar(true) == bar(false)) abort (); } ok ... outside of the scope of standard C, but we certainly _can_ do this. Which would question tail-merging the above at all, of course. I added noinline to make sure there's no variation from inlining. ... extern void abort (void) __attribute__ ((__nothrow__)) __attribute__ ((__noreturn__)); __attribute__((noinline)) void *foo () { return __builtin_return_address (0); } __attribute__((noinline)) void *bar (int b) { if (b) return foo (); else return foo (); } int main() { void *a, *b; a = bar (1); b = bar (0); if (a == b) abort (); return 0; } ... This test-case passes with: ... gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fno-crossjumping ... and fails with: ... gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fcrossjumping ... with the proposed patch, it also fails with: ... gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge -fno-crossjumping ... Tree-tail-merge performs the same transformation as cross-jumping for this test-case. I think that the transformation is valid, and that the test-case behaves as expected. Subsequent calls to foo may or may not return the same value, depending on the optimizations, we can't assert a != b, that's just the nature of __builtin_return_address. Btw, this example is not what I meant when I said 'a test-case that compares the results of calls from 2 branches of an if statement'. I tried to say that the semantics of C is defined in terms of taking either 1 branch or the other, rather than executing both in parallel, after which both results would be available. From that point of view, this example compares subsequent calls to foo, not parallel. Ah, I see. Btw, I was not suggesting that we should not optimize the above. I started looking at the problem in terms of subsequent calls. Consider the imaginary attribute nosideeffect, similar to const and pure. const: - no effects except the return value - the return value depends only on the parameters pure: - no effects except the return value - the return value depends only on the parameters and/or global variables nosideeffect: - no effects except the return value The code fragment: ... extern int f (int) __attribute ((nosideeffect)); int g() { int a; int b; a = f (5); b = f (5); return a + b; } ... would result in a gimple sequence more or less like this: ... # VUSE .MEMD.1712_4(D) aD.1707_1 = fD.1704 (5); # VUSE .MEMD.1712_4(D) bD.1708_2 = fD.1704 (5); D.1710_3 = aD.1707_1 + bD.1708_2; # VUSE .MEMD.1712_6 return D.1710_3; ... The value numbering modification I'm proposing would say that a and b have the same value, which is not true. I updated the patch to ensure in this case that a and b would be seen as different. Note that things won't break if the attribute is missing on a function, since that just means that the function would have a vdef, and there would be no 2 subsequent function calls with the same vuse. - but that is not what matters - because obviously we can still merge the blocks. Right. That is, we cannot be sure that the side-effects on memory of bla (5) and bla (5) are the same. But is that not what your value-numbering changes will assert? In the case of calls in different branches of an control flow statement, we can aggressively assume they're the same, since we cannot write a check that would start failing. Apart from the above ... which shows that even the returned values can matter. In the case of subsequent calls with side effects, the vuses will never be the same. In the case of subsequent calls without side effects, but not pure or const, we can't assume they have the same result. AFAIK, there's currently no way to specify such a function, but the updated patch should be able handle this. (not sure how you achieve this, but it
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On Tue, Apr 24, 2012 at 11:19 PM, Tom de Vries tom_devr...@mentor.com wrote: On 17/04/12 14:24, Richard Guenther wrote: On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries tom_devr...@mentor.com wrote: On 27/01/12 21:37, Tom de Vries wrote: On 24/01/12 11:40, Richard Guenther wrote: On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries tom_devr...@mentor.com wrote: Richard, Jakub, the following patch fixes PR51879. Consider the following test-case: ... int bar (int); void baz (int); void foo (int y) { int a; if (y == 6) a = bar (7); else a = bar (7); baz (a); } ... after compiling at -02, the representation looks like this before tail-merging: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1714_7 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_3 = barD.1703 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # .MEMD.1714_8 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_4 = barD.1703 (7); # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 5 freq:1 # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) # aD.1709_1 = PHI aD.1709_3(3), aD.1709_4(4) # .MEMD.1714_5 = PHI .MEMD.1714_7(3), .MEMD.1714_8(4) # .MEMD.1714_9 = VDEF .MEMD.1714_5 # USE = nonlocal # CLB = nonlocal bazD.1705 (aD.1709_1); # VUSE .MEMD.1714_9 return; ... the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8 to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5. The patch makes sure non-const/pure call results (gimple_vdef and gimple_call_lhs) are properly value numbered. Bootstrapped and reg-tested on x86_64. ok for stage1? The following cannot really work: @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl result = vn_reference_lookup_1 (vr1, NULL); if (result) { - changed = set_ssa_val_to (lhs, result); + tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result)); + if (vdef) + changed |= set_ssa_val_to (vdef, result_vdef); + changed |= set_ssa_val_to (lhs, result); because 'result' may be not an SSA name. It might also not have a proper definition statement (if VN_INFO (result)-needs_insertion is true). So you at least need to guard things properly. Right. And that also doesn't work if the function is without lhs, such as in the new test-case pr51879-6.c. I fixed this by storing both lhs and vdef, such that I don't have to derive the vdef from the lhs. (On a side-note - I _did_ want to remove value-numbering virtual operands at some point ...) Doing so willl hurt performance of tail-merging in its current form. OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that value numbering as used in tail-merging has its limitations too. Do you have any ideas how to address that one? @@ -3359,8 +3366,10 @@ visit_use (tree use) /* ??? We should handle stores from calls. */ else if (TREE_CODE (lhs) == SSA_NAME) { + tree vuse = gimple_vuse (stmt); if (!gimple_call_internal_p (stmt) - gimple_call_flags (stmt) (ECF_PURE | ECF_CONST)) + (gimple_call_flags (stmt) (ECF_PURE | ECF_CONST) + || (vuse SSA_VAL (vuse) != VN_TOP))) changed = visit_reference_op_call (lhs, stmt); else changed = defs_to_varying (stmt); ... exactly because of the issue that a stmt has multiple defs. Btw, vuse should have been visited here or be part of our SCC, so, why do you need this check? Removed now, that was a workaround for a bug in an earlier version of the patch, that I didn't need anymore. Bootstrapped and reg-tested on x86_64. OK for stage1? Richard, quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html: ... I think these fixes hint at that we should use structural equality as fallback if value-numbering doesn't equate two stmt effects. Thus, treat two stmts with exactly the same operands and flags as equal and using value-numbering to canonicalize operands (when they are SSA names) for that comparison, or use VN entirely if there are no side-effects on the stmt. Changing value-numbering of virtual operands, even if it looks correct in the simple cases you change, doesn't look like a general solution for the missed tail merging opportunities. ... The test-case pr51879-6.c shows a case where improving value numbering will help tail-merging, but structural equality comparison not: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1717_7 = VDEF .MEMD.1717_6(D) # USE = nonlocal # CLB = nonlocal blaD.1708 (5); # .MEMD.1717_8 = VDEF .MEMD.1717_7 # USE = nonlocal # CLB = nonlocal aD.1712_3 = barD.1704 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 #
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On Wed, Apr 25, 2012 at 11:57:09AM +0200, Richard Guenther wrote: void *foo () { return __builtin_return_address (0); } void *bar (_Bool b) { if (b) return foo (); else return foo (); } int main() { if (bar(true) == bar(false)) abort (); } ok ... outside of the scope of standard C, but we certainly _can_ do this. Which would question tail-merging the above at all, of course. I don't think we guarantee the above, after all, even pure functions may use __builtin_return_address (0) - it doesn't modify memory, and we happily remove pure calls, CSE the return values etc. Jakub
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On 25/04/12 12:09, Jakub Jelinek wrote: On Wed, Apr 25, 2012 at 11:57:09AM +0200, Richard Guenther wrote: void *foo () { return __builtin_return_address (0); } void *bar (_Bool b) { if (b) return foo (); else return foo (); } int main() { if (bar(true) == bar(false)) abort (); } ok ... outside of the scope of standard C, but we certainly _can_ do this. Which would question tail-merging the above at all, of course. I don't think we guarantee the above, after all, even pure functions may use __builtin_return_address (0) - it doesn't modify memory, and we happily remove pure calls, CSE the return values etc. Jakub, pure: - no effects except the return value - return value depends only on the parameters and/or global variables AFAIU, given this definition, a pure function cannot use __builtin_return_address since it would mean that the return value depends on something else than parameters and global variables. Thanks, - Tom Jakub
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On 25/04/12 11:57, Richard Guenther wrote: SNIP Hmm. I'm not sure we can conclude that they have the same value! +int bar (int); +void baz (int); +void bla (int); + +void +foo (int y) +{ + int a; + if (y == 6) +{ + bla (5); + a = bar (7); +} + else +{ + bla (5); + a = bar (7); +} + baz (a); +} I can implement bla as void bla(int) { a = random (); } thus a would not have the same value on both paths I think it's hard to decide whether they have the same value or not, since we cannot write a test-case that compares the results of calls from 2 branches of an if statement. void *foo () { return __builtin_return_address (0); } void *bar (_Bool b) { if (b) return foo (); else return foo (); } int main() { if (bar(true) == bar(false)) abort (); } ok ... outside of the scope of standard C, but we certainly _can_ do this. Which would question tail-merging the above at all, of course. I added noinline to make sure there's no variation from inlining. ... extern void abort (void) __attribute__ ((__nothrow__)) __attribute__ ((__noreturn__)); __attribute__((noinline)) void *foo () { return __builtin_return_address (0); } __attribute__((noinline)) void *bar (int b) { if (b) return foo (); else return foo (); } int main() { void *a, *b; a = bar (1); b = bar (0); if (a == b) abort (); return 0; } ... This test-case passes with: ... gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fno-crossjumping ... and fails with: ... gcc -O2 calls.c -fno-optimize-sibling-calls -fno-tree-tail-merge -fcrossjumping ... with the proposed patch, it also fails with: ... gcc -O2 calls.c -fno-optimize-sibling-calls -ftree-tail-merge -fno-crossjumping ... Tree-tail-merge performs the same transformation as cross-jumping for this test-case. I think that the transformation is valid, and that the test-case behaves as expected. Subsequent calls to foo may or may not return the same value, depending on the optimizations, we can't assert a != b, that's just the nature of __builtin_return_address. Btw, this example is not what I meant when I said 'a test-case that compares the results of calls from 2 branches of an if statement'. I tried to say that the semantics of C is defined in terms of taking either 1 branch or the other, rather than executing both in parallel, after which both results would be available. From that point of view, this example compares subsequent calls to foo, not parallel. I started looking at the problem in terms of subsequent calls. Consider the imaginary attribute nosideeffect, similar to const and pure. const: - no effects except the return value - the return value depends only on the parameters pure: - no effects except the return value - the return value depends only on the parameters and/or global variables nosideeffect: - no effects except the return value The code fragment: ... extern int f (int) __attribute ((nosideeffect)); int g() { int a; int b; a = f (5); b = f (5); return a + b; } ... would result in a gimple sequence more or less like this: ... # VUSE .MEMD.1712_4(D) aD.1707_1 = fD.1704 (5); # VUSE .MEMD.1712_4(D) bD.1708_2 = fD.1704 (5); D.1710_3 = aD.1707_1 + bD.1708_2; # VUSE .MEMD.1712_6 return D.1710_3; ... The value numbering modification I'm proposing would say that a and b have the same value, which is not true. I updated the patch to ensure in this case that a and b would be seen as different. Note that things won't break if the attribute is missing on a function, since that just means that the function would have a vdef, and there would be no 2 subsequent function calls with the same vuse. - but that is not what matters - because obviously we can still merge the blocks. Right. That is, we cannot be sure that the side-effects on memory of bla (5) and bla (5) are the same. But is that not what your value-numbering changes will assert? In the case of calls in different branches of an control flow statement, we can aggressively assume they're the same, since we cannot write a check that would start failing. Apart from the above ... which shows that even the returned values can matter. In the case of subsequent calls with side effects, the vuses will never be the same. In the case of subsequent calls without side effects, but not pure or const, we can't assume they have the same result. AFAIK, there's currently no way to specify such a function, but the updated patch should be able handle this. (not sure how you achieve this, but it looks like you insert/lookup general calls, thus not pure/const ones Correct. - that could easily lead to miscompiles(?)). And to increased compile-time (not sure if it matters). I
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On 17/04/12 14:24, Richard Guenther wrote: On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries tom_devr...@mentor.com wrote: On 27/01/12 21:37, Tom de Vries wrote: On 24/01/12 11:40, Richard Guenther wrote: On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries tom_devr...@mentor.com wrote: Richard, Jakub, the following patch fixes PR51879. Consider the following test-case: ... int bar (int); void baz (int); void foo (int y) { int a; if (y == 6) a = bar (7); else a = bar (7); baz (a); } ... after compiling at -02, the representation looks like this before tail-merging: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1714_7 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_3 = barD.1703 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # .MEMD.1714_8 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_4 = barD.1703 (7); # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 5 freq:1 # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) # aD.1709_1 = PHI aD.1709_3(3), aD.1709_4(4) # .MEMD.1714_5 = PHI .MEMD.1714_7(3), .MEMD.1714_8(4) # .MEMD.1714_9 = VDEF .MEMD.1714_5 # USE = nonlocal # CLB = nonlocal bazD.1705 (aD.1709_1); # VUSE .MEMD.1714_9 return; ... the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8 to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5. The patch makes sure non-const/pure call results (gimple_vdef and gimple_call_lhs) are properly value numbered. Bootstrapped and reg-tested on x86_64. ok for stage1? The following cannot really work: @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl result = vn_reference_lookup_1 (vr1, NULL); if (result) { - changed = set_ssa_val_to (lhs, result); + tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result)); + if (vdef) + changed |= set_ssa_val_to (vdef, result_vdef); + changed |= set_ssa_val_to (lhs, result); because 'result' may be not an SSA name. It might also not have a proper definition statement (if VN_INFO (result)-needs_insertion is true). So you at least need to guard things properly. Right. And that also doesn't work if the function is without lhs, such as in the new test-case pr51879-6.c. I fixed this by storing both lhs and vdef, such that I don't have to derive the vdef from the lhs. (On a side-note - I _did_ want to remove value-numbering virtual operands at some point ...) Doing so willl hurt performance of tail-merging in its current form. OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that value numbering as used in tail-merging has its limitations too. Do you have any ideas how to address that one? @@ -3359,8 +3366,10 @@ visit_use (tree use) /* ??? We should handle stores from calls. */ else if (TREE_CODE (lhs) == SSA_NAME) { + tree vuse = gimple_vuse (stmt); if (!gimple_call_internal_p (stmt) - gimple_call_flags (stmt) (ECF_PURE | ECF_CONST)) + (gimple_call_flags (stmt) (ECF_PURE | ECF_CONST) + || (vuse SSA_VAL (vuse) != VN_TOP))) changed = visit_reference_op_call (lhs, stmt); else changed = defs_to_varying (stmt); ... exactly because of the issue that a stmt has multiple defs. Btw, vuse should have been visited here or be part of our SCC, so, why do you need this check? Removed now, that was a workaround for a bug in an earlier version of the patch, that I didn't need anymore. Bootstrapped and reg-tested on x86_64. OK for stage1? Richard, quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html: ... I think these fixes hint at that we should use structural equality as fallback if value-numbering doesn't equate two stmt effects. Thus, treat two stmts with exactly the same operands and flags as equal and using value-numbering to canonicalize operands (when they are SSA names) for that comparison, or use VN entirely if there are no side-effects on the stmt. Changing value-numbering of virtual operands, even if it looks correct in the simple cases you change, doesn't look like a general solution for the missed tail merging opportunities. ... The test-case pr51879-6.c shows a case where improving value numbering will help tail-merging, but structural equality comparison not: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1717_7 = VDEF .MEMD.1717_6(D) # USE = nonlocal # CLB = nonlocal blaD.1708 (5); # .MEMD.1717_8 = VDEF .MEMD.1717_7 # USE = nonlocal # CLB = nonlocal aD.1712_3 = barD.1704 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # .MEMD.1717_9 = VDEF .MEMD.1717_6(D) # USE =
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries tom_devr...@mentor.com wrote: On 27/01/12 21:37, Tom de Vries wrote: On 24/01/12 11:40, Richard Guenther wrote: On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries tom_devr...@mentor.com wrote: Richard, Jakub, the following patch fixes PR51879. Consider the following test-case: ... int bar (int); void baz (int); void foo (int y) { int a; if (y == 6) a = bar (7); else a = bar (7); baz (a); } ... after compiling at -02, the representation looks like this before tail-merging: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1714_7 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_3 = barD.1703 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # .MEMD.1714_8 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_4 = barD.1703 (7); # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 5 freq:1 # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) # aD.1709_1 = PHI aD.1709_3(3), aD.1709_4(4) # .MEMD.1714_5 = PHI .MEMD.1714_7(3), .MEMD.1714_8(4) # .MEMD.1714_9 = VDEF .MEMD.1714_5 # USE = nonlocal # CLB = nonlocal bazD.1705 (aD.1709_1); # VUSE .MEMD.1714_9 return; ... the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8 to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5. The patch makes sure non-const/pure call results (gimple_vdef and gimple_call_lhs) are properly value numbered. Bootstrapped and reg-tested on x86_64. ok for stage1? The following cannot really work: @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl result = vn_reference_lookup_1 (vr1, NULL); if (result) { - changed = set_ssa_val_to (lhs, result); + tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result)); + if (vdef) + changed |= set_ssa_val_to (vdef, result_vdef); + changed |= set_ssa_val_to (lhs, result); because 'result' may be not an SSA name. It might also not have a proper definition statement (if VN_INFO (result)-needs_insertion is true). So you at least need to guard things properly. Right. And that also doesn't work if the function is without lhs, such as in the new test-case pr51879-6.c. I fixed this by storing both lhs and vdef, such that I don't have to derive the vdef from the lhs. (On a side-note - I _did_ want to remove value-numbering virtual operands at some point ...) Doing so willl hurt performance of tail-merging in its current form. OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that value numbering as used in tail-merging has its limitations too. Do you have any ideas how to address that one? @@ -3359,8 +3366,10 @@ visit_use (tree use) /* ??? We should handle stores from calls. */ else if (TREE_CODE (lhs) == SSA_NAME) { + tree vuse = gimple_vuse (stmt); if (!gimple_call_internal_p (stmt) - gimple_call_flags (stmt) (ECF_PURE | ECF_CONST)) + (gimple_call_flags (stmt) (ECF_PURE | ECF_CONST) + || (vuse SSA_VAL (vuse) != VN_TOP))) changed = visit_reference_op_call (lhs, stmt); else changed = defs_to_varying (stmt); ... exactly because of the issue that a stmt has multiple defs. Btw, vuse should have been visited here or be part of our SCC, so, why do you need this check? Removed now, that was a workaround for a bug in an earlier version of the patch, that I didn't need anymore. Bootstrapped and reg-tested on x86_64. OK for stage1? Richard, quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html: ... I think these fixes hint at that we should use structural equality as fallback if value-numbering doesn't equate two stmt effects. Thus, treat two stmts with exactly the same operands and flags as equal and using value-numbering to canonicalize operands (when they are SSA names) for that comparison, or use VN entirely if there are no side-effects on the stmt. Changing value-numbering of virtual operands, even if it looks correct in the simple cases you change, doesn't look like a general solution for the missed tail merging opportunities. ... The test-case pr51879-6.c shows a case where improving value numbering will help tail-merging, but structural equality comparison not: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1717_7 = VDEF .MEMD.1717_6(D) # USE = nonlocal # CLB = nonlocal blaD.1708 (5); # .MEMD.1717_8 = VDEF .MEMD.1717_7 # USE = nonlocal # CLB = nonlocal aD.1712_3 = barD.1704 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # .MEMD.1717_9 = VDEF .MEMD.1717_6(D) # USE = nonlocal # CLB = nonlocal blaD.1708 (5);
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On 27/01/12 21:37, Tom de Vries wrote: On 24/01/12 11:40, Richard Guenther wrote: On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries tom_devr...@mentor.com wrote: Richard, Jakub, the following patch fixes PR51879. Consider the following test-case: ... int bar (int); void baz (int); void foo (int y) { int a; if (y == 6) a = bar (7); else a = bar (7); baz (a); } ... after compiling at -02, the representation looks like this before tail-merging: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1714_7 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_3 = barD.1703 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # .MEMD.1714_8 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_4 = barD.1703 (7); # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 5 freq:1 # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) # aD.1709_1 = PHI aD.1709_3(3), aD.1709_4(4) # .MEMD.1714_5 = PHI .MEMD.1714_7(3), .MEMD.1714_8(4) # .MEMD.1714_9 = VDEF .MEMD.1714_5 # USE = nonlocal # CLB = nonlocal bazD.1705 (aD.1709_1); # VUSE .MEMD.1714_9 return; ... the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8 to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5. The patch makes sure non-const/pure call results (gimple_vdef and gimple_call_lhs) are properly value numbered. Bootstrapped and reg-tested on x86_64. ok for stage1? The following cannot really work: @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl result = vn_reference_lookup_1 (vr1, NULL); if (result) { - changed = set_ssa_val_to (lhs, result); + tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result)); + if (vdef) + changed |= set_ssa_val_to (vdef, result_vdef); + changed |= set_ssa_val_to (lhs, result); because 'result' may be not an SSA name. It might also not have a proper definition statement (if VN_INFO (result)-needs_insertion is true). So you at least need to guard things properly. Right. And that also doesn't work if the function is without lhs, such as in the new test-case pr51879-6.c. I fixed this by storing both lhs and vdef, such that I don't have to derive the vdef from the lhs. (On a side-note - I _did_ want to remove value-numbering virtual operands at some point ...) Doing so willl hurt performance of tail-merging in its current form. OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that value numbering as used in tail-merging has its limitations too. Do you have any ideas how to address that one? @@ -3359,8 +3366,10 @@ visit_use (tree use) /* ??? We should handle stores from calls. */ else if (TREE_CODE (lhs) == SSA_NAME) { + tree vuse = gimple_vuse (stmt); if (!gimple_call_internal_p (stmt) - gimple_call_flags (stmt) (ECF_PURE | ECF_CONST)) + (gimple_call_flags (stmt) (ECF_PURE | ECF_CONST) + || (vuse SSA_VAL (vuse) != VN_TOP))) changed = visit_reference_op_call (lhs, stmt); else changed = defs_to_varying (stmt); ... exactly because of the issue that a stmt has multiple defs. Btw, vuse should have been visited here or be part of our SCC, so, why do you need this check? Removed now, that was a workaround for a bug in an earlier version of the patch, that I didn't need anymore. Bootstrapped and reg-tested on x86_64. OK for stage1? Richard, quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html: ... I think these fixes hint at that we should use structural equality as fallback if value-numbering doesn't equate two stmt effects. Thus, treat two stmts with exactly the same operands and flags as equal and using value-numbering to canonicalize operands (when they are SSA names) for that comparison, or use VN entirely if there are no side-effects on the stmt. Changing value-numbering of virtual operands, even if it looks correct in the simple cases you change, doesn't look like a general solution for the missed tail merging opportunities. ... The test-case pr51879-6.c shows a case where improving value numbering will help tail-merging, but structural equality comparison not: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1717_7 = VDEF .MEMD.1717_6(D) # USE = nonlocal # CLB = nonlocal blaD.1708 (5); # .MEMD.1717_8 = VDEF .MEMD.1717_7 # USE = nonlocal # CLB = nonlocal aD.1712_3 = barD.1704 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # .MEMD.1717_9 = VDEF .MEMD.1717_6(D) # USE = nonlocal # CLB = nonlocal blaD.1708 (5); # .MEMD.1717_10 = VDEF .MEMD.1717_9 # USE = nonlocal # CLB = nonlocal
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On 24/01/12 11:40, Richard Guenther wrote: On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries tom_devr...@mentor.com wrote: Richard, Jakub, the following patch fixes PR51879. Consider the following test-case: ... int bar (int); void baz (int); void foo (int y) { int a; if (y == 6) a = bar (7); else a = bar (7); baz (a); } ... after compiling at -02, the representation looks like this before tail-merging: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1714_7 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_3 = barD.1703 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # .MEMD.1714_8 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_4 = barD.1703 (7); # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 5 freq:1 # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) # aD.1709_1 = PHI aD.1709_3(3), aD.1709_4(4) # .MEMD.1714_5 = PHI .MEMD.1714_7(3), .MEMD.1714_8(4) # .MEMD.1714_9 = VDEF .MEMD.1714_5 # USE = nonlocal # CLB = nonlocal bazD.1705 (aD.1709_1); # VUSE .MEMD.1714_9 return; ... the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8 to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5. The patch makes sure non-const/pure call results (gimple_vdef and gimple_call_lhs) are properly value numbered. Bootstrapped and reg-tested on x86_64. ok for stage1? The following cannot really work: @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl result = vn_reference_lookup_1 (vr1, NULL); if (result) { - changed = set_ssa_val_to (lhs, result); + tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result)); + if (vdef) + changed |= set_ssa_val_to (vdef, result_vdef); + changed |= set_ssa_val_to (lhs, result); because 'result' may be not an SSA name. It might also not have a proper definition statement (if VN_INFO (result)-needs_insertion is true). So you at least need to guard things properly. Right. And that also doesn't work if the function is without lhs, such as in the new test-case pr51879-6.c. I fixed this by storing both lhs and vdef, such that I don't have to derive the vdef from the lhs. (On a side-note - I _did_ want to remove value-numbering virtual operands at some point ...) Doing so willl hurt performance of tail-merging in its current form. OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that value numbering as used in tail-merging has its limitations too. Do you have any ideas how to address that one? @@ -3359,8 +3366,10 @@ visit_use (tree use) /* ??? We should handle stores from calls. */ else if (TREE_CODE (lhs) == SSA_NAME) { + tree vuse = gimple_vuse (stmt); if (!gimple_call_internal_p (stmt) - gimple_call_flags (stmt) (ECF_PURE | ECF_CONST)) + (gimple_call_flags (stmt) (ECF_PURE | ECF_CONST) + || (vuse SSA_VAL (vuse) != VN_TOP))) changed = visit_reference_op_call (lhs, stmt); else changed = defs_to_varying (stmt); ... exactly because of the issue that a stmt has multiple defs. Btw, vuse should have been visited here or be part of our SCC, so, why do you need this check? Removed now, that was a workaround for a bug in an earlier version of the patch, that I didn't need anymore. Bootstrapped and reg-tested on x86_64. OK for stage1? Thanks, - Tom Thanks, Richard. 2012-01-27 Tom de Vries t...@codesourcery.com PR tree-optimization/51879 * tree-ssa-sccvn.h (struct vn_reference_s): Add vdef field. * tree-ssa-sccvn.c (visit_reference_op_call): Handle gimple_vdef. Handle case that lhs is NULL_TREE. (visit_use): Handle non-pure/const calls and calls without result using visit_reference_op_call. gcc.dg/pr51879.c: New test. gcc.dg/pr51879-2.c: Same. gcc.dg/pr51879-3.c: Same. gcc.dg/pr51879-4.c: Same. gcc.dg/pr51879-6.c: Same. Index: gcc/tree-ssa-sccvn.c === --- gcc/tree-ssa-sccvn.c (revision 183325) +++ gcc/tree-ssa-sccvn.c (working copy) @@ -2589,27 +2589,44 @@ visit_reference_op_call (tree lhs, gimpl { bool changed = false; struct vn_reference_s vr1; - tree result; + vn_reference_t vnresult = NULL; tree vuse = gimple_vuse (stmt); + tree vdef = gimple_vdef (stmt); + + if (vdef) +VN_INFO (vdef)-use_processed = true; vr1.vuse = vuse ? SSA_VAL (vuse) : NULL_TREE; vr1.operands = valueize_shared_reference_ops_from_call (stmt); vr1.type = gimple_expr_type (stmt); vr1.set = 0; vr1.hashcode = vn_reference_compute_hash (vr1); - result = vn_reference_lookup_1 (vr1, NULL); - if (result) +
Re: [PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries tom_devr...@mentor.com wrote: Richard, Jakub, the following patch fixes PR51879. Consider the following test-case: ... int bar (int); void baz (int); void foo (int y) { int a; if (y == 6) a = bar (7); else a = bar (7); baz (a); } ... after compiling at -02, the representation looks like this before tail-merging: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1714_7 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_3 = barD.1703 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # .MEMD.1714_8 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_4 = barD.1703 (7); # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 5 freq:1 # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) # aD.1709_1 = PHI aD.1709_3(3), aD.1709_4(4) # .MEMD.1714_5 = PHI .MEMD.1714_7(3), .MEMD.1714_8(4) # .MEMD.1714_9 = VDEF .MEMD.1714_5 # USE = nonlocal # CLB = nonlocal bazD.1705 (aD.1709_1); # VUSE .MEMD.1714_9 return; ... the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8 to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5. The patch makes sure non-const/pure call results (gimple_vdef and gimple_call_lhs) are properly value numbered. Bootstrapped and reg-tested on x86_64. ok for stage1? The following cannot really work: @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl result = vn_reference_lookup_1 (vr1, NULL); if (result) { - changed = set_ssa_val_to (lhs, result); + tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result)); + if (vdef) + changed |= set_ssa_val_to (vdef, result_vdef); + changed |= set_ssa_val_to (lhs, result); becase 'result' may be not an SSA name. It might also not have a proper definition statement (if VN_INFO (result)-needs_insertion is true). So you at least need to guard things properly. (On a side-note - I _did_ want to remove value-numbering virtual operands at some point ...) @@ -3359,8 +3366,10 @@ visit_use (tree use) /* ??? We should handle stores from calls. */ else if (TREE_CODE (lhs) == SSA_NAME) { + tree vuse = gimple_vuse (stmt); if (!gimple_call_internal_p (stmt) - gimple_call_flags (stmt) (ECF_PURE | ECF_CONST)) + (gimple_call_flags (stmt) (ECF_PURE | ECF_CONST) + || (vuse SSA_VAL (vuse) != VN_TOP))) changed = visit_reference_op_call (lhs, stmt); else changed = defs_to_varying (stmt); ... exactly because of the issue that a stmt has multiple defs. Btw, vuse should have been visited here or be part of our SCC, so, why do you need this check? Thanks, Richard. Thanks, - Tom 2012-01-23 Tom de Vries t...@codesourcery.com PR tree-optimization/51879 tree-ssa-sccvn.c (visit_reference_op_call): Handle gimple_vdef. (visit_use): Handle non-pure/const calls using visit_reference_op_call. gcc.dg/pr51879.c: New test. gcc.dg/pr51879-2.c: Same. gcc.dg/pr51879-3.c: Same. gcc.dg/pr51879-4.c: Same.
[PATCH] Fix for PR51879 - Missed tail merging with non-const/pure calls
Richard, Jakub, the following patch fixes PR51879. Consider the following test-case: ... int bar (int); void baz (int); void foo (int y) { int a; if (y == 6) a = bar (7); else a = bar (7); baz (a); } ... after compiling at -02, the representation looks like this before tail-merging: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1714_7 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_3 = barD.1703 (7); goto bb 5; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # .MEMD.1714_8 = VDEF .MEMD.1714_6(D) # USE = nonlocal # CLB = nonlocal aD.1709_4 = barD.1703 (7); # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 5 freq:1 # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) # aD.1709_1 = PHI aD.1709_3(3), aD.1709_4(4) # .MEMD.1714_5 = PHI .MEMD.1714_7(3), .MEMD.1714_8(4) # .MEMD.1714_9 = VDEF .MEMD.1714_5 # USE = nonlocal # CLB = nonlocal bazD.1705 (aD.1709_1); # VUSE .MEMD.1714_9 return; ... the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8 to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5. The patch makes sure non-const/pure call results (gimple_vdef and gimple_call_lhs) are properly value numbered. Bootstrapped and reg-tested on x86_64. ok for stage1? Thanks, - Tom 2012-01-23 Tom de Vries t...@codesourcery.com PR tree-optimization/51879 tree-ssa-sccvn.c (visit_reference_op_call): Handle gimple_vdef. (visit_use): Handle non-pure/const calls using visit_reference_op_call. gcc.dg/pr51879.c: New test. gcc.dg/pr51879-2.c: Same. gcc.dg/pr51879-3.c: Same. gcc.dg/pr51879-4.c: Same. Index: gcc/tree-ssa-sccvn.c === --- gcc/tree-ssa-sccvn.c (revision 183325) +++ gcc/tree-ssa-sccvn.c (working copy) @@ -2591,6 +2591,7 @@ visit_reference_op_call (tree lhs, gimpl struct vn_reference_s vr1; tree result; tree vuse = gimple_vuse (stmt); + tree vdef = gimple_vdef (stmt); vr1.vuse = vuse ? SSA_VAL (vuse) : NULL_TREE; vr1.operands = valueize_shared_reference_ops_from_call (stmt); @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl result = vn_reference_lookup_1 (vr1, NULL); if (result) { - changed = set_ssa_val_to (lhs, result); + tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result)); + if (vdef) + changed |= set_ssa_val_to (vdef, result_vdef); + changed |= set_ssa_val_to (lhs, result); + if (TREE_CODE (result) == SSA_NAME VN_INFO (result)-has_constants) VN_INFO (lhs)-has_constants = true; @@ -2609,7 +2614,9 @@ visit_reference_op_call (tree lhs, gimpl { void **slot; vn_reference_t vr2; - changed = set_ssa_val_to (lhs, lhs); + if (vdef) + changed |= set_ssa_val_to (vdef, vdef); + changed |= set_ssa_val_to (lhs, lhs); vr2 = (vn_reference_t) pool_alloc (current_info-references_pool); vr2-vuse = vr1.vuse; vr2-operands = valueize_refs (create_reference_ops_from_call (stmt)); @@ -3359,8 +3366,10 @@ visit_use (tree use) /* ??? We should handle stores from calls. */ else if (TREE_CODE (lhs) == SSA_NAME) { + tree vuse = gimple_vuse (stmt); if (!gimple_call_internal_p (stmt) - gimple_call_flags (stmt) (ECF_PURE | ECF_CONST)) + (gimple_call_flags (stmt) (ECF_PURE | ECF_CONST) + || (vuse SSA_VAL (vuse) != VN_TOP))) changed = visit_reference_op_call (lhs, stmt); else changed = defs_to_varying (stmt); Index: gcc/testsuite/gcc.dg/pr51879-2.c === --- /dev/null (new file) +++ gcc/testsuite/gcc.dg/pr51879-2.c (revision 0) @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-pre } */ + +int bar (int); +void baz (int); + +void +foo (int y) +{ + int a; + if (y) +baz (bar (7) + 6); + else +baz (bar (7) + 6); +} + +/* { dg-final { scan-tree-dump-times bar \\( 1 pre} } */ +/* { dg-final { scan-tree-dump-times baz \\( 1 pre} } */ +/* { dg-final { cleanup-tree-dump pre } } */ Index: gcc/testsuite/gcc.dg/pr51879.c === --- /dev/null (new file) +++ gcc/testsuite/gcc.dg/pr51879.c (revision 0) @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-pre } */ + +int bar (int); +void baz (int); + +void +foo (int y) +{ + int a; + if (y == 6) +a = bar (7); + else +a = bar (7); + baz (a); +} + +/* { dg-final { scan-tree-dump-times bar \\( 1 pre} } */ +/* { dg-final { cleanup-tree-dump pre } } */ Index: gcc/testsuite/gcc.dg/pr51879-3.c === --- /dev/null (new file) +++ gcc/testsuite/gcc.dg/pr51879-3.c (revision 0) @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-pre } */ + +int