Re: [PATCH] Improve detection of constant conditions during jump threading

2016-05-03 Thread Richard Biener
On Mon, May 2, 2016 at 9:25 PM, H.J. Lu  wrote:
> On Sat, Apr 30, 2016 at 2:28 PM, Patrick Palka  wrote:
>> On Fri, Apr 29, 2016 at 3:15 PM, Jeff Law  wrote:
>>> On 04/19/2016 11:50 AM, Patrick Palka wrote:
>>>
 1. This patch introduces a "regression" in gcc.dg/tree-ssa/ssa-thread-11.c
 in that we no longer perform FSM threading during vrp2 but instead we
 detect two new jump threading opportunities during vrp1.  Not sure if
 the new code is better but it is shorter.  I wonder how this should be
 resolved...
>>>
>>> Definitely not a regression.  As you note we thread two jumps earlier
>>> utilizing your new code.  With the old code we're dependent upon other
>>> simplifications occurring which eventually exposes the FSM threads that the
>>> test is checking for in vrp2.
>>>
>>> I think we just want to remove the test for FSM jump threads in VRP2. We get
>>> coverage for your test via ssa-thread-14.  That should just leave the
>>> verification that we do not have irreducible loops at the end of VRP2 in
>>> ssa-thread-11.  I'll make that change.
>>>
>>> I do see what appears to be a missed jump thread, but that's not affected
>>> positively or negatively by your change.  There's a reasonable chance it's
>>> only exposed by normal thread jumps in VRP2 and since we don't iterate, it's
>>> left in the IL.  I haven't analyzed it in detail, but my hope is that when I
>>> pull the backwards threading out into its own pass that we'll start picking
>>> up more of these secondary effects.
>>
>> Interesting info.  I also spotted another minor optimization
>> opportunity within this test case, although more related to vrp than
>> to jump threading.  It would require iterating over the use statements
>> of a subexpression within a conditional, and it turns out that VRP
>> already does this so it's only a matter of adding another case to test
>> for during each iteration.  I'll post a patch when it's ready.
>>
>>>
>>>

 gcc/ChangeLog:

 * tree-ssa-threadedge.c (simplify_control_stmt_condition): Split
 out into ...
 (simplify_control_stmt_condition_1): ... here.  Recurse into
 BIT_AND_EXPRs and BIT_IOR_EXPRs.

 gcc/testsuite/ChangeLog:

 * gcc.dg/tree-ssa/ssa-thread-14.c: New test.
 ---
>>>
>>> I fixed a few formatting nits (too long lines).
>>>
>
> This one fails on Linux/x86-64 and Linux/x86.   This patch fixes it.  OK
> for trunk?

Ok.

Richard.

>
> --
> H.J.
> ---
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
> index db9ed3b..e2ac2f7 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile }  */
> -/* { dg-additional-options "-O2 -fdump-tree-vrp" }  */
> +/* { dg-additional-options "-O2 -fdump-tree-vrp-details" }  */
>  /* { dg-final { scan-tree-dump-times "Threaded jump" 8 "vrp1" } }  */
>
>  void foo (void);


Re: [PATCH] Improve detection of constant conditions during jump threading

2016-05-02 Thread H.J. Lu
On Sat, Apr 30, 2016 at 2:28 PM, Patrick Palka  wrote:
> On Fri, Apr 29, 2016 at 3:15 PM, Jeff Law  wrote:
>> On 04/19/2016 11:50 AM, Patrick Palka wrote:
>>
>>> 1. This patch introduces a "regression" in gcc.dg/tree-ssa/ssa-thread-11.c
>>> in that we no longer perform FSM threading during vrp2 but instead we
>>> detect two new jump threading opportunities during vrp1.  Not sure if
>>> the new code is better but it is shorter.  I wonder how this should be
>>> resolved...
>>
>> Definitely not a regression.  As you note we thread two jumps earlier
>> utilizing your new code.  With the old code we're dependent upon other
>> simplifications occurring which eventually exposes the FSM threads that the
>> test is checking for in vrp2.
>>
>> I think we just want to remove the test for FSM jump threads in VRP2. We get
>> coverage for your test via ssa-thread-14.  That should just leave the
>> verification that we do not have irreducible loops at the end of VRP2 in
>> ssa-thread-11.  I'll make that change.
>>
>> I do see what appears to be a missed jump thread, but that's not affected
>> positively or negatively by your change.  There's a reasonable chance it's
>> only exposed by normal thread jumps in VRP2 and since we don't iterate, it's
>> left in the IL.  I haven't analyzed it in detail, but my hope is that when I
>> pull the backwards threading out into its own pass that we'll start picking
>> up more of these secondary effects.
>
> Interesting info.  I also spotted another minor optimization
> opportunity within this test case, although more related to vrp than
> to jump threading.  It would require iterating over the use statements
> of a subexpression within a conditional, and it turns out that VRP
> already does this so it's only a matter of adding another case to test
> for during each iteration.  I'll post a patch when it's ready.
>
>>
>>
>>>
>>> gcc/ChangeLog:
>>>
>>> * tree-ssa-threadedge.c (simplify_control_stmt_condition): Split
>>> out into ...
>>> (simplify_control_stmt_condition_1): ... here.  Recurse into
>>> BIT_AND_EXPRs and BIT_IOR_EXPRs.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.dg/tree-ssa/ssa-thread-14.c: New test.
>>> ---
>>
>> I fixed a few formatting nits (too long lines).
>>

This one fails on Linux/x86-64 and Linux/x86.   This patch fixes it.  OK
for trunk?


-- 
H.J.
---
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
index db9ed3b..e2ac2f7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
@@ -1,5 +1,5 @@
 /* { dg-do compile }  */
-/* { dg-additional-options "-O2 -fdump-tree-vrp" }  */
+/* { dg-additional-options "-O2 -fdump-tree-vrp-details" }  */
 /* { dg-final { scan-tree-dump-times "Threaded jump" 8 "vrp1" } }  */

 void foo (void);


Re: [PATCH] Improve detection of constant conditions during jump threading

2016-04-30 Thread Patrick Palka
On Fri, Apr 29, 2016 at 3:15 PM, Jeff Law  wrote:
> On 04/19/2016 11:50 AM, Patrick Palka wrote:
>
>> 1. This patch introduces a "regression" in gcc.dg/tree-ssa/ssa-thread-11.c
>> in that we no longer perform FSM threading during vrp2 but instead we
>> detect two new jump threading opportunities during vrp1.  Not sure if
>> the new code is better but it is shorter.  I wonder how this should be
>> resolved...
>
> Definitely not a regression.  As you note we thread two jumps earlier
> utilizing your new code.  With the old code we're dependent upon other
> simplifications occurring which eventually exposes the FSM threads that the
> test is checking for in vrp2.
>
> I think we just want to remove the test for FSM jump threads in VRP2. We get
> coverage for your test via ssa-thread-14.  That should just leave the
> verification that we do not have irreducible loops at the end of VRP2 in
> ssa-thread-11.  I'll make that change.
>
> I do see what appears to be a missed jump thread, but that's not affected
> positively or negatively by your change.  There's a reasonable chance it's
> only exposed by normal thread jumps in VRP2 and since we don't iterate, it's
> left in the IL.  I haven't analyzed it in detail, but my hope is that when I
> pull the backwards threading out into its own pass that we'll start picking
> up more of these secondary effects.

Interesting info.  I also spotted another minor optimization
opportunity within this test case, although more related to vrp than
to jump threading.  It would require iterating over the use statements
of a subexpression within a conditional, and it turns out that VRP
already does this so it's only a matter of adding another case to test
for during each iteration.  I'll post a patch when it's ready.

>
>
>>
>> gcc/ChangeLog:
>>
>> * tree-ssa-threadedge.c (simplify_control_stmt_condition): Split
>> out into ...
>> (simplify_control_stmt_condition_1): ... here.  Recurse into
>> BIT_AND_EXPRs and BIT_IOR_EXPRs.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.dg/tree-ssa/ssa-thread-14.c: New test.
>> ---
>
> I fixed a few formatting nits (too long lines).
>
>
>> +
>> + if (res1 != NULL_TREE && res2 != NULL_TREE)
>> +   {
>> + if (rhs_code == BIT_AND_EXPR
>> + && TYPE_PRECISION (TREE_TYPE (op0)) == 1
>> + && integer_nonzerop (res1)
>> + && integer_nonzerop (res2))
>> +   {
>> + /* If A != 0 and B != 0 then (bool)(A | B) != 0 is true.  */
>> + if (cond_code == NE_EXPR)
>> +   return one_cst;
>> + /* If A != 0 and B != 0 then (bool)(A | B) == 0 is false.  */
>> + if (cond_code == EQ_EXPR)
>> +   return zero_cst;
>
> I think you wanted (A & B) in the two immediately preceding comments, which
> I fixed.
>
>
> I suspect there's other stuff we could do in this space, but you've probably
> covered the most important cases.
>
>> +  /* Handle (A CMP B) CMP 0.  */
>> +  else if (TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt))
>> +  == tcc_comparison)
>> +   {
>> + tree rhs1 = gimple_assign_rhs1 (def_stmt);
>> + tree rhs2 = gimple_assign_rhs2 (def_stmt);
>> +
>> + tree_code new_cond = gimple_assign_rhs_code (def_stmt);
>> + if (cond_code == EQ_EXPR)
>> +   new_cond = invert_tree_comparison (new_cond, false);
>> +
>> + tree res
>> +   = simplify_control_stmt_condition_1 (e, def_stmt, avail_exprs_stack,
>> +rhs1, new_cond, rhs2,
>> +dummy_cond, simplify,
>> +handle_dominating_asserts,
>> +limit - 1);
>> + if (res != NULL_TREE && is_gimple_min_invariant (res))
>> +   return res;
>
> I was a bit confused by this case, but then realized that we already
> narrowed COND_CODE to EQ_EXPR/NE_EXPR.
>
> I made the minor edits noted above and committed the change.

Awesome, thanks for your help.

>
> Thanks,
> Jeff


Re: [PATCH] Improve detection of constant conditions during jump threading

2016-04-29 Thread Jeff Law

On 04/19/2016 11:50 AM, Patrick Palka wrote:


1. This patch introduces a "regression" in gcc.dg/tree-ssa/ssa-thread-11.c
in that we no longer perform FSM threading during vrp2 but instead we
detect two new jump threading opportunities during vrp1.  Not sure if
the new code is better but it is shorter.  I wonder how this should be
resolved...
Definitely not a regression.  As you note we thread two jumps earlier 
utilizing your new code.  With the old code we're dependent upon other 
simplifications occurring which eventually exposes the FSM threads that 
the test is checking for in vrp2.


I think we just want to remove the test for FSM jump threads in VRP2. 
We get coverage for your test via ssa-thread-14.  That should just leave 
the verification that we do not have irreducible loops at the end of 
VRP2 in ssa-thread-11.  I'll make that change.


I do see what appears to be a missed jump thread, but that's not 
affected positively or negatively by your change.  There's a reasonable 
chance it's only exposed by normal thread jumps in VRP2 and since we 
don't iterate, it's left in the IL.  I haven't analyzed it in detail, 
but my hope is that when I pull the backwards threading out into its own 
pass that we'll start picking up more of these secondary effects.





gcc/ChangeLog:

* tree-ssa-threadedge.c (simplify_control_stmt_condition): Split
out into ...
(simplify_control_stmt_condition_1): ... here.  Recurse into
BIT_AND_EXPRs and BIT_IOR_EXPRs.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-thread-14.c: New test.
---

I fixed a few formatting nits (too long lines).



+
+ if (res1 != NULL_TREE && res2 != NULL_TREE)
+   {
+ if (rhs_code == BIT_AND_EXPR
+ && TYPE_PRECISION (TREE_TYPE (op0)) == 1
+ && integer_nonzerop (res1)
+ && integer_nonzerop (res2))
+   {
+ /* If A != 0 and B != 0 then (bool)(A | B) != 0 is true.  */
+ if (cond_code == NE_EXPR)
+   return one_cst;
+ /* If A != 0 and B != 0 then (bool)(A | B) == 0 is false.  */
+ if (cond_code == EQ_EXPR)
+   return zero_cst;
I think you wanted (A & B) in the two immediately preceding comments, 
which I fixed.
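
For the record, the TYPE_PRECISION (TREE_TYPE (op0)) == 1 check above is
what makes that deduction valid in the first place; a standalone
illustration in plain C (not part of the patch):

#include <stdio.h>

int
main (void)
{
  /* For types wider than one bit, A != 0 and B != 0 do NOT imply
     (A & B) != 0.  */
  int a = 1, b = 2;
  printf ("%d\n", a & b);   /* prints 0 */

  /* For 1-bit (boolean) values the implication does hold, which is
     what the TYPE_PRECISION check guarantees.  */
  _Bool p = 1, q = 1;
  printf ("%d\n", p & q);   /* prints 1 */
  return 0;
}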



I suspect there's other stuff we could do in this space, but you've 
probably covered the most important cases.



+  /* Handle (A CMP B) CMP 0.  */
+  else if (TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt))
+  == tcc_comparison)
+   {
+ tree rhs1 = gimple_assign_rhs1 (def_stmt);
+ tree rhs2 = gimple_assign_rhs2 (def_stmt);
+
+ tree_code new_cond = gimple_assign_rhs_code (def_stmt);
+ if (cond_code == EQ_EXPR)
+   new_cond = invert_tree_comparison (new_cond, false);
+
+ tree res
+   = simplify_control_stmt_condition_1 (e, def_stmt, avail_exprs_stack,
+rhs1, new_cond, rhs2,
+dummy_cond, simplify,
+handle_dominating_asserts,
+limit - 1);
+ if (res != NULL_TREE && is_gimple_min_invariant (res))
+   return res;
I was a bit confused by this case, but then realized that we already 
narrowed COND_CODE to EQ_EXPR/NE_EXPR.
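
Concretely, once COND_CODE is known to be EQ_EXPR or NE_EXPR, the hunk
just forwards or flips the inner comparison before recursing (x and y
here stand for arbitrary operands):

  (x < y) != 0   -->  recurse on  x < y
  (x < y) == 0   -->  recurse on  x >= y   (via invert_tree_comparison)

so the recursion always sees a plain comparison again.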


I made the minor edits noted above and committed the change.

Thanks,
Jeff


Re: [PATCH] Improve detection of constant conditions during jump threading

2016-04-29 Thread Patrick Palka
On Fri, Apr 29, 2016 at 11:00 AM, Jeff Law  wrote:
> On 04/28/2016 06:08 PM, Patrick Palka wrote:
>>>
>>>
>>>
>>> The glitch in that plan is there is no easy linkage between the use of
>>> b_5
>>> in bb4 and the ASSERT_EXPR in bb3.  That's something Aldy, Andrew and
>>> myself
>>> are looking at independently for some of Aldy's work.
>>
>>
>> I see.. One other deficiency I noticed in the existing threading code
>> is that there may have been multiple ASSERT_EXPRs registered for b_5,
>> so bb3 could look like
>>
>> :
>> b_15 = ASSERT_EXPR ;
>> b_16 = ASSERT_EXPR ;
>> foo ();
>>
>> but currently we don't consider the 2nd ASSERT_EXPR because we only
>> look at the immediate uses of b_5.  This oversight makes us fail to
>> thread
>>
>> void bar (void);
>> void baz (void);
>>
>> void
>> foo (int a)
>> {
>>   if (a != 5 && a != 10)
>>     bar ();
>>   if (a == 10)
>>     baz ();
>> }
>
> Can you file this as a BZ please and add me to the cc list.  I'll probably
> add Andrew as well since this is great example of something we'd like to
> catch with his work.   Thanks.

Done, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70879

>
>
>>
>>>
>>> In this specific instance, there's a good chance your analysis is
>>> catching
>>> something earlier and allowing it to be better simplified.  But let's do
>>> the
>>> analysis to make sure.
>>
>>
>> From what I can tell, the patch does cause fewer conditionals to get
>> executed in general.  I spot two extra jumps that are threaded in the
>> final code compared to without the patch.  I wouldn't trust my
>> analysis though!
>
> I'll walk through it.

Cool, I'll be eagerly awaiting your analysis.

>
>>
>> By the way, the test case ssa-thread-11.c is somewhat buggy since its
>> two functions lack return statements.  Also I would expect abort() to
>> have the noreturn attribute.
>
> Those testcases are heavily reduced and ultimately are useful only to show
> cases where jump threading ought to happen -- there's all kinds of undefined
> behaviour in those tests.  Their only purpose is to set up a CFG and
> conditionals in which jump threading should be happening and wasn't at some
> point or another.

Oh, okay.

>
>>
>> That makes sense!  I will play around with this technique.
>
> Aside from the time, the biggest problem is ASLR and the ld.so hashing bits
> which cause slight variations from one run to the next.  Otherwise it is
> highly stable, results are independent of the runtime load on the machine
> and measure the primary effect I'm usually searching for.  All excellent
> properties :-)

I see.

>
> For your patch the reduction in runtime branches is tiny, on the order of
> 0.01%, but clearly still visible and well outside the typical noise.
>
> What is far more interesting is its overall effect on total instructions
> executed.  Typically for each runtime branch eliminated I see 2-4 total
> instruction fetches eliminated.  Which makes sense if you think about it --
> the branch usually has a comparison and perhaps some setup code which
> becomes dead as a result of threading the jump.

That makes sense.

>
> Your patch eliminates 8.5 instruction fetches per branch eliminated, which
> is very good, in fact, it's by far the highest ratio I can recall ever
> seeing.  So essentially while it doesn't fire often, when it fires it's a
> much bigger win than most jump threading cases.

Cool!  That makes sense to me because the changes in this patch, when
they fire, eliminate not just one conditional but multiple (along with
setup code).  So a huge test like if (a && (b || c) && d && !e) could be
completely eliminated if we know that e.g. d == 0 along some incoming
edge.  Thanks for taking the time to measure this.

>
> Jeff
>


Re: [PATCH] Improve detection of constant conditions during jump threading

2016-04-29 Thread Jeff Law

On 04/28/2016 06:08 PM, Patrick Palka wrote:



The glitch in that plan is there is no easy linkage between the use of b_5
in bb4 and the ASSERT_EXPR in bb3.  That's something Aldy, Andrew and myself
are looking at independently for some of Aldy's work.


I see.. One other deficiency I noticed in the existing threading code
is that there may have been multiple ASSERT_EXPRs registered for b_5,
so bb3 could look like

:
b_15 = ASSERT_EXPR ;
b_16 = ASSERT_EXPR ;
foo ();

but currently we don't consider the 2nd ASSERT_EXPR because we only
look at the immediate uses of b_5.  This oversight makes us fail to
thread

void bar (void);
void baz (void);

void
foo (int a)
{
  if (a != 5 && a != 10)
    bar ();
  if (a == 10)
    baz ();
}
Can you file this as a BZ please and add me to the cc list.  I'll 
probably add Andrew as well since this is great example of something 
we'd like to catch with his work.   Thanks.







In this specific instance, there's a good chance your analysis is catching
something earlier and allowing it to be better simplified.  But let's do the
analysis to make sure.


From what I can tell, the patch does cause fewer conditionals to get
executed in general.  I spot two extra jumps that are threaded in the
final code compared to without the patch.  I wouldn't trust my
analysis though!

I'll walk through it.



By the way, the test case ssa-thread-11.c is somewhat buggy since its
two functions lack return statements.  Also I would expect abort() to
have the noreturn attribute.
Those testcases are heavily reduced and ultimately are useful only to 
show cases where jump threading ought to happen -- there's all kinds of 
undefined behaviour in those tests.  Their only purpose is to set up a 
CFG and conditionals in which jump threading should be happening and 
wasn't at some point or another.




That makes sense!  I will play around with this technique.
Aside from the time, the biggest problem is ASLR and the ld.so hashing 
bits which cause slight variations from one run to the next.  Otherwise 
it is highly stable, results are independent of the runtime load on the 
machine and measure the primary effect I'm usually searching for.  All 
excellent properties :-)


For your patch the reduction in runtime branches is tiny, on the order 
of 0.01%, but clearly still visible and well outside the typical noise.


What is far more interesting is its overall effect on total instructions 
executed.  Typically for each runtime branch eliminated I see 2-4 total 
instruction fetches eliminated.  Which makes sense if you think about it 
-- the branch usually has a comparison and perhaps some setup code which 
becomes dead as a result of threading the jump.


Your patch eliminates 8.5 instruction fetches per branch eliminated, 
which is very good, in fact, it's by far the highest ratio I can recall 
ever seeing.  So essentially while it doesn't fire often, when it fires 
it's a much bigger win than most jump threading cases.


Jeff



Re: [PATCH] Improve detection of constant conditions during jump threading

2016-04-28 Thread Patrick Palka
On Thu, Apr 28, 2016 at 12:52 PM, Jeff Law  wrote:
> On 04/20/2016 03:02 AM, Richard Biener wrote:
>>
>> On Tue, Apr 19, 2016 at 7:50 PM, Patrick Palka 
>> wrote:
>>>
>>> This patch makes the jump threader look through the BIT_AND_EXPRs and
>>> BIT_IOR_EXPRs within a condition so that we could find dominating
>>> ASSERT_EXPRs that could help make the overall condition evaluate to a
>>> constant.  For example, we currently don't perform any jump threading in
>>> the following test case even though it's known that if the code calls
>>> foo() then it can't possibly call bar() afterwards:
>
> I'd always envisioned we'd do more simplifications than we're doing now and
> this fits well within how I expected to exploit ASSERT_EXPRs and DOM's
> available expressions/const/copies tables.
>
> However, I do have some long term direction plans that may make how we do
> this change a bit.  In the mean time I don't see a reason not to go forward
> with your change.
>
>
>
>
>>>
>>> void
>>> baz_1 (int a, int b, int c)
>>> {
>>>   if (a && b)
>>>     foo ();
>>>   if (!b && c)
>>>     bar ();
>>> }
>>>
>>>    <bb 2>:
>>>    _4 = a_3(D) != 0;
>>>    _6 = b_5(D) != 0;
>>>    _7 = _4 & _6;
>>>    if (_7 != 0)
>>>      goto <bb 3>;
>>>    else
>>>      goto <bb 4>;
>>>
>>>    <bb 3>:
>>>    b_15 = ASSERT_EXPR <b_5(D), b_5(D) != 0>;
>>>    foo ();
>>>
>>>    <bb 4>:
>>>    _10 = b_5(D) == 0;
>>>    _12 = c_11(D) != 0;
>>>    _13 = _10 & _12;
>>>    if (_13 != 0)
>>>      goto <bb 5>;
>>>    else
>>>      goto <bb 6>;
>>>
>>>    <bb 5>:
>>>    bar ();
>>>
>>>    <bb 6>:
>>>    return;
>>>
>>> So we here miss a jump threading opportunity that would have made bb 3
>>> jump
>>> straight to bb 6 instead of falling through to bb 4.
>>>
>>> If we inspect the operands of the BIT_AND_EXPR of _13 we'll notice that
>>> there is an ASSERT_EXPR that says its left operand b_5 is non-zero.  We
>>> could use this ASSERT_EXPR to deduce that the condition (_13 != 0) is
>>> always false.  This is what this patch does, basically by making
>>> simplify_control_stmt_condition recurse into BIT_AND_EXPRs and
>>> BIT_IOR_EXPRs.
>>>
>>> Does this seem like a good idea/approach?
>
> So the other approach I've been pondering for a while is backwards
> substitution.
>
> So given _13 != 0, we expand that to
>
> (_10 & _12) != 0
>
> Which further expands into
> ((b_5 == 0) & (c_11 != 0)) != 0
>
> And we follow b_5 back to the ASSERT_EXPR which allows us to start
> simplifying terms.
>
>
> The glitch in that plan is there is no easy linkage between the use of b_5
> in bb4 and the ASSERT_EXPR in bb3.  That's something Aldy, Andrew and myself
> are looking at independently for some of Aldy's work.

I see.. One other deficiency I noticed in the existing threading code
is that there may have been multiple ASSERT_EXPRs registered for b_5,
so bb3 could look like

:
b_15 = ASSERT_EXPR ;
b_16 = ASSERT_EXPR ;
foo ();

but currently we don't consider the 2nd ASSERT_EXPR because we only
look at the immediate uses of b_5.  This oversight makes us fail to
thread the second condition here (on the path that calls bar () we
know a != 10, so the a == 10 test ought to fold to false):

void bar (void);
void baz (void);

void
foo (int a)
{
  if (a != 5 && a != 10)
    bar ();
  if (a == 10)
    baz ();
}

>
> But that's all future work...  Back to your patch...
>
>>>
>>> Notes:
>>>
>>> 1. This patch introduces a "regression" in
>>> gcc.dg/tree-ssa/ssa-thread-11.c
>>> in that we no longer perform FSM threading during vrp2 but instead we
>>> detect two new jump threading opportunities during vrp1.  Not sure if
>>> the new code is better but it is shorter.  I wonder how this should be
>>> resolved...
>>
>>
>> Try adjusting the testcase so that it performs the FSM threading again
>> or adjust the expected outcome...
>
> Right.  We just need to look closely at the before/after dumps, make a
> decision about whether the result is better or worse.  If it's better, then
> we adjust the output to the new better result (and I would claim that the
> same threading, but done earlier is better).
>
> Shorter isn't generally a good indicator of whether or not something is
> better.  The thing to look at is the number of conditional executed on the
> various paths through the CFG.
>
> In this specific instance, there's a good chance your analysis is catching
> something earlier and allowing it to be better simplified.  But let's do the
> analysis to make sure.

From what I can tell, the patch does cause fewer conditionals to get
executed in general.  I spot two extra jumps that are threaded in the
final code compared to without the patch.  I wouldn't trust my
analysis though!

By the way, the test case ssa-thread-11.c is somewhat buggy since its
two functions lack return statements.  Also I would expect abort() to
have the noreturn attribute.

>
>>
>>> 2. I haven't tested the performance impact of this patch.  What would be
>>> a good way to do this?
>
> I have ways to do that.  Jump threading inherently is about reducing the
> 

Re: [PATCH] Improve detection of constant conditions during jump threading

2016-04-28 Thread Jeff Law

On 04/20/2016 03:02 AM, Richard Biener wrote:

On Tue, Apr 19, 2016 at 7:50 PM, Patrick Palka  wrote:

This patch makes the jump threader look through the BIT_AND_EXPRs and
BIT_IOR_EXPRs within a condition so that we could find dominating
ASSERT_EXPRs that could help make the overall condition evaluate to a
constant.  For example, we currently don't perform any jump threading in
the following test case even though it's known that if the code calls
foo() then it can't possibly call bar() afterwards:
I'd always envisioned we'd do more simplifications than we're doing now 
and this fits well within how I expected to exploit ASSERT_EXPRs and 
DOM's available expressions/const/copies tables.


However, I do have some long term direction plans that may make how we 
do this change a bit.  In the mean time I don't see a reason not to go 
forward with your change.






void
baz_1 (int a, int b, int c)
{
  if (a && b)
    foo ();
  if (!b && c)
    bar ();
}

   <bb 2>:
   _4 = a_3(D) != 0;
   _6 = b_5(D) != 0;
   _7 = _4 & _6;
   if (_7 != 0)
     goto <bb 3>;
   else
     goto <bb 4>;

   <bb 3>:
   b_15 = ASSERT_EXPR <b_5(D), b_5(D) != 0>;
   foo ();

   <bb 4>:
   _10 = b_5(D) == 0;
   _12 = c_11(D) != 0;
   _13 = _10 & _12;
   if (_13 != 0)
     goto <bb 5>;
   else
     goto <bb 6>;

   <bb 5>:
   bar ();

   <bb 6>:
   return;

So we here miss a jump threading opportunity that would have made bb 3 jump
straight to bb 6 instead of falling through to bb 4.

If we inspect the operands of the BIT_AND_EXPR of _13 we'll notice that
there is an ASSERT_EXPR that says its left operand b_5 is non-zero.  We
could use this ASSERT_EXPR to deduce that the condition (_13 != 0) is
always false.  This is what this patch does, basically by making
simplify_control_stmt_condition recurse into BIT_AND_EXPRs and
BIT_IOR_EXPRs.

Does this seem like a good idea/approach?
So the other approach I've been pondering for a while is backwards 
substitution.


So given _13 != 0, we expand that to

(_10 & _12) != 0

Which further expands into
((b_5 == 0) & (c_11 != 0)) != 0

And we follow b_5 back to the ASSERT_EXPR which allows us to start 
simplifying terms.
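
Spelled out on the example above (assuming the linkage back to the
ASSERT_EXPR in bb 3 were available), the substitution chain would be:

  _13 != 0
    --> (_10 & _12) != 0
    --> ((b_5 == 0) & (c_11 != 0)) != 0
    --> (0 & (c_11 != 0)) != 0       [ASSERT_EXPR gives b_5(D) != 0]
    --> false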



The glitch in that plan is there is no easy linkage between the use of 
b_5 in bb4 and the ASSERT_EXPR in bb3.  That's something Aldy, Andrew 
and myself are looking at independently for some of Aldy's work.


But that's all future work...  Back to your patch...



Notes:

1. This patch introduces a "regression" in gcc.dg/tree-ssa/ssa-thread-11.c
in that we no longer perform FSM threading during vrp2 but instead we
detect two new jump threading opportunities during vrp1.  Not sure if
the new code is better but it is shorter.  I wonder how this should be
resolved...


Try adjusting the testcase so that it performs the FSM threading again
or adjust the expected outcome...
Right.  We just need to look closely at the before/after dumps, make a 
decision about whether the result is better or worse.  If it's better, 
then we adjust the output to the new better result (and I would claim 
that the same threading, but done earlier is better).


Shorter isn't generally a good indicator of whether or not something is 
better.  The thing to look at is the number of conditionals executed on 
the various paths through the CFG.


In this specific instance, there's a good chance your analysis is 
catching something earlier and allowing it to be better simplified.  But 
let's do the analysis to make sure.





2. I haven't tested the performance impact of this patch.  What would be
a good way to do this?
I have ways to do that.  Jump threading inherently is about reducing the 
number of conditionals we have to evaluate at runtime.   Valgrind can 
give us that information.  So what I do is...


Build a control compiler.  Use that to build an older version of gcc 
(gcc-4.7.3 in particular).  I then use the just-built gcc-4.7.3 to 
build a bunch of .i files under valgrind control.


I then repeat, but the first compiler has whatever patch I want to 
evaluate installed.


I can then compare the output from valgrind to see which has better 
branching behaviour.


It takes a few hours, but it's not terrible and has provided good data 
through the years.  I'll throw your patch into the tester and see what 
it spits out.


I'll also have a look at the details of the patch.

jeff


Re: [PATCH] Improve detection of constant conditions during jump threading

2016-04-20 Thread Richard Biener
On Tue, Apr 19, 2016 at 7:50 PM, Patrick Palka  wrote:
> This patch makes the jump threader look through the BIT_AND_EXPRs and
> BIT_IOR_EXPRs within a condition so that we could find dominating
> ASSERT_EXPRs that could help make the overall condition evaluate to a
> constant.  For example, we currently don't perform any jump threading in
> the following test case even though it's known that if the code calls
> foo() then it can't possibly call bar() afterwards:
>
> void
> baz_1 (int a, int b, int c)
> {
>   if (a && b)
>     foo ();
>   if (!b && c)
>     bar ();
> }
>
>    <bb 2>:
>    _4 = a_3(D) != 0;
>    _6 = b_5(D) != 0;
>    _7 = _4 & _6;
>    if (_7 != 0)
>      goto <bb 3>;
>    else
>      goto <bb 4>;
>
>    <bb 3>:
>    b_15 = ASSERT_EXPR <b_5(D), b_5(D) != 0>;
>    foo ();
>
>    <bb 4>:
>    _10 = b_5(D) == 0;
>    _12 = c_11(D) != 0;
>    _13 = _10 & _12;
>    if (_13 != 0)
>      goto <bb 5>;
>    else
>      goto <bb 6>;
>
>    <bb 5>:
>    bar ();
>
>    <bb 6>:
>    return;
>
> So we here miss a jump threading opportunity that would have made bb 3 jump
> straight to bb 6 instead of falling through to bb 4.
>
> If we inspect the operands of the BIT_AND_EXPR of _13 we'll notice that
> there is an ASSERT_EXPR that says its left operand b_5 is non-zero.  We
> could use this ASSERT_EXPR to deduce that the condition (_13 != 0) is
> always false.  This is what this patch does, basically by making
> simplify_control_stmt_condition recurse into BIT_AND_EXPRs and
> BIT_IOR_EXPRs.
>
> Does this seem like a good idea/approach?
>
> Notes:
>
> 1. This patch introduces a "regression" in gcc.dg/tree-ssa/ssa-thread-11.c
> in that we no longer perform FSM threading during vrp2 but instead we
> detect two new jump threading opportunities during vrp1.  Not sure if
> the new code is better but it is shorter.  I wonder how this should be
> resolved...

Try adjusting the testcase so that it performs the FSM threading again
or adjust the expected outcome...

> 2. I haven't tested the performance impact of this patch.  What would be
> a good way to do this?
>
> 3. According to my instrumentation, an older version of this change
> added 4000 new threaded jumps during bootstrap.

Looks reasonable to me.  I wonder if we want to limit the amount of
recursion done - we might get exponential explosion for, say

  _1 = a_1 < 0;
  _2 = b_3 < 0;
  _4 = c_5 < 0;
  _6 = _1 & _2;
  _7 = _6 & _4;
  _8 = _1 & _4;
  _9 = _7 & _6;
 ...

that is, if the conditions don't form a tree but a graph, then we visit
some conditions multiple times (unnecessarily).  reassoc might
simplify this, but the first DOM/VRP pass runs before it.

Richard.

> gcc/ChangeLog:
>
> * tree-ssa-threadedge.c (simplify_control_stmt_condition): Split
> out into ...
> (simplify_control_stmt_condition_1): ... here.  Recurse into
> BIT_AND_EXPRs and BIT_IOR_EXPRs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ssa-thread-14.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c |  81 +
>  gcc/tree-ssa-threadedge.c | 249 +-
>  2 files changed, 285 insertions(+), 45 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
> new file mode 100644
> index 000..db9ed3b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
> @@ -0,0 +1,81 @@
> +/* { dg-do compile }  */
> +/* { dg-additional-options "-O2 -fdump-tree-vrp" }  */
> +/* { dg-final { scan-tree-dump-times "Threaded jump" 8 "vrp1" } }  */
> +
> +void foo (void);
> +void bar (void);
> +void blah (void);
> +
> +/* One jump threaded here.  */
> +
> +void
> +baz_1 (int a, int b, int c)
> +{
> +  if (a && b)
> +foo ();
> +  if (!b && c)
> +bar ();
> +}
> +
> +/* One jump threaded here.  */
> +
> +void
> +baz_2 (int a, int b, int c)
> +{
> +  if (a && b)
> +foo ();
> +  if (b || c)
> +bar ();
> +}
> +
> +/* One jump threaded here.  */
> +
> +void
> +baz_3 (int a, int b, int c)
> +{
> +  if (a && b > 10)
> +foo ();
> +  if (b < 5 && c)
> +bar ();
> +}
> +
> +/* Two jumps threaded here.  */
> +
> +void
> +baz_4 (int a, int b, int c)
> +{
> +  if (a && b)
> +{
> +  foo ();
> +  if (c)
> +bar ();
> +}
> +  if (b && c)
> +blah ();
> +}
> +
> +/* Two jumps threaded here.  */
> +
> +void
> +baz_5 (int a, int b, int c)
> +{
> +  if (a && b)
> +{
> +  foo ();
> +  if (c)
> +bar ();
> +}
> +  if (!b || !c)
> +blah ();
> +}
> +
> +/* One jump threaded here.  */
> +
> +void
> +baz_6 (int a, int b, int c)
> +{
> +  if (a == 39 && b == 41)
> +foo ();
> +  if (c == 12 || b == 41)
> +bar ();
> +}
> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
> index f60be38..a4e8a26 100644
> --- a/gcc/tree-ssa-threadedge.c
> +++ b/gcc/tree-ssa-threadedge.c
> @@ -376,6 +376,12 @@ 

[PATCH] Improve detection of constant conditions during jump threading

2016-04-19 Thread Patrick Palka
This patch makes the jump threader look through the BIT_AND_EXPRs and
BIT_IOR_EXPRs within a condition so that we could find dominating
ASSERT_EXPRs that could help make the overall condition evaluate to a
constant.  For example, we currently don't perform any jump threading in
the following test case even though it's known that if the code calls
foo() then it can't possibly call bar() afterwards:

void
baz_1 (int a, int b, int c)
{
  if (a && b)
    foo ();
  if (!b && c)
    bar ();
}

   <bb 2>:
   _4 = a_3(D) != 0;
   _6 = b_5(D) != 0;
   _7 = _4 & _6;
   if (_7 != 0)
     goto <bb 3>;
   else
     goto <bb 4>;

   <bb 3>:
   b_15 = ASSERT_EXPR <b_5(D), b_5(D) != 0>;
   foo ();

   <bb 4>:
   _10 = b_5(D) == 0;
   _12 = c_11(D) != 0;
   _13 = _10 & _12;
   if (_13 != 0)
     goto <bb 5>;
   else
     goto <bb 6>;

   <bb 5>:
   bar ();

   <bb 6>:
   return;

So we here miss a jump threading opportunity that would have made bb 3 jump
straight to bb 6 instead of falling through to bb 4.

If we inspect the operands of the BIT_AND_EXPR of _13 we'll notice that
there is an ASSERT_EXPR that says its left operand b_5 is non-zero.  We
could use this ASSERT_EXPR to deduce that the condition (_13 != 0) is
always false.  This is what this patch does, basically by making
simplify_control_stmt_condition recurse into BIT_AND_EXPRs and
BIT_IOR_EXPRs.

Does this seem like a good idea/approach?
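
To make the shape of the idea concrete, here is a minimal, self-contained
sketch in plain C.  It is purely illustrative: the names and the
hard-coded "facts" stand in for dominating ASSERT_EXPRs, and none of this
is the actual GIMPLE/tree-ssa-threadedge.c code.

#include <stdio.h>

enum kind { LEAF, AND, OR };
enum fact { UNKNOWN, KNOWN_ZERO, KNOWN_NONZERO };

struct expr
{
  enum kind kind;
  enum fact fact;                /* meaningful for LEAF only */
  const struct expr *lhs, *rhs;  /* meaningful for AND/OR only */
};

/* Return KNOWN_ZERO or KNOWN_NONZERO if the boolean expression folds to
   a constant under the recorded facts, UNKNOWN otherwise.  LIMIT caps
   the recursion so a DAG of conditions cannot blow up.  */
static enum fact
fold_condition (const struct expr *e, unsigned limit)
{
  if (limit == 0)
    return UNKNOWN;
  if (e->kind == LEAF)
    return e->fact;

  enum fact l = fold_condition (e->lhs, limit - 1);
  enum fact r = fold_condition (e->rhs, limit - 1);

  if (e->kind == AND)
    {
      if (l == KNOWN_ZERO || r == KNOWN_ZERO)
        return KNOWN_ZERO;
      if (l == KNOWN_NONZERO && r == KNOWN_NONZERO)
        return KNOWN_NONZERO;   /* valid because these model 1-bit values */
    }
  else
    {
      if (l == KNOWN_NONZERO || r == KNOWN_NONZERO)
        return KNOWN_NONZERO;
      if (l == KNOWN_ZERO && r == KNOWN_ZERO)
        return KNOWN_ZERO;
    }
  return UNKNOWN;
}

int
main (void)
{
  /* Model _13 = (b_5 == 0) & (c_11 != 0) on the path through bb 3,
     where the ASSERT_EXPR tells us b_5 != 0, i.e. (b_5 == 0) is 0.  */
  struct expr b_is_zero = { LEAF, KNOWN_ZERO, 0, 0 };
  struct expr c_nonzero = { LEAF, UNKNOWN, 0, 0 };
  struct expr cond = { AND, UNKNOWN, &b_is_zero, &c_nonzero };

  if (fold_condition (&cond, 4) == KNOWN_ZERO)
    printf ("condition is always false; thread bb 3 straight to bb 6\n");
  return 0;
}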

Notes:

1. This patch introduces a "regression" in gcc.dg/tree-ssa/ssa-thread-11.c
in that we no longer perform FSM threading during vrp2 but instead we
detect two new jump threading opportunities during vrp1.  Not sure if
the new code is better but it is shorter.  I wonder how this should be
resolved...

2. I haven't tested the performance impact of this patch.  What would be
a good way to do this?

3. According to my instrumentation, an older version of this change
added 4000 new threaded jumps during bootstrap.

gcc/ChangeLog:

* tree-ssa-threadedge.c (simplify_control_stmt_condition): Split
out into ...
(simplify_control_stmt_condition_1): ... here.  Recurse into
BIT_AND_EXPRs and BIT_IOR_EXPRs.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-thread-14.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c |  81 +
 gcc/tree-ssa-threadedge.c | 249 +-
 2 files changed, 285 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
new file mode 100644
index 000..db9ed3b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
@@ -0,0 +1,81 @@
+/* { dg-do compile }  */
+/* { dg-additional-options "-O2 -fdump-tree-vrp" }  */
+/* { dg-final { scan-tree-dump-times "Threaded jump" 8 "vrp1" } }  */
+
+void foo (void);
+void bar (void);
+void blah (void);
+
+/* One jump threaded here.  */
+
+void
+baz_1 (int a, int b, int c)
+{
+  if (a && b)
+    foo ();
+  if (!b && c)
+    bar ();
+}
+
+/* One jump threaded here.  */
+
+void
+baz_2 (int a, int b, int c)
+{
+  if (a && b)
+    foo ();
+  if (b || c)
+    bar ();
+}
+
+/* One jump threaded here.  */
+
+void
+baz_3 (int a, int b, int c)
+{
+  if (a && b > 10)
+    foo ();
+  if (b < 5 && c)
+    bar ();
+}
+
+/* Two jumps threaded here.  */
+
+void
+baz_4 (int a, int b, int c)
+{
+  if (a && b)
+    {
+      foo ();
+      if (c)
+        bar ();
+    }
+  if (b && c)
+    blah ();
+}
+
+/* Two jumps threaded here.  */
+
+void
+baz_5 (int a, int b, int c)
+{
+  if (a && b)
+    {
+      foo ();
+      if (c)
+        bar ();
+    }
+  if (!b || !c)
+    blah ();
+}
+
+/* One jump threaded here.  */
+
+void
+baz_6 (int a, int b, int c)
+{
+  if (a == 39 && b == 41)
+    foo ();
+  if (c == 12 || b == 41)
+    bar ();
+}
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index f60be38..a4e8a26 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -376,6 +376,12 @@ record_temporary_equivalences_from_stmts_at_dest (edge e,
   return stmt;
 }
 
+static tree simplify_control_stmt_condition_1 (edge, gimple *,
+  class avail_exprs_stack *,
+  tree, enum tree_code, tree,
+  gcond *, pfn_simplify, bool,
+  unsigned);
+
 /* Simplify the control statement at the end of the block E->dest.
 
To avoid allocating memory unnecessarily, a scratch GIMPLE_COND
@@ -436,52 +442,14 @@ simplify_control_stmt_condition (edge e,
}
}
 
-  if (handle_dominating_asserts)
-   {
- /* Now see if the operand was consumed by an ASSERT_EXPR
-which dominates E->src.  If so, we want to replace the
-operand with the LHS of the ASSERT_EXPR.  */
- if (TREE_CODE (op0) == SSA_NAME)
-   op0 = lhs_of_dominating_assert (op0, e->src,