Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-31 Thread Richard Biener
On Mon, Jan 30, 2017 at 6:19 PM, Aldy Hernandez  wrote:
> On 01/30/2017 10:03 AM, Richard Biener wrote:
>>
>> On Fri, Jan 27, 2017 at 12:20 PM, Aldy Hernandez  wrote:
>>>
>>> On 01/26/2017 07:29 AM, Richard Biener wrote:


 On Thu, Jan 26, 2017 at 1:04 PM, Aldy Hernandez 
 wrote:
>
>
> On 01/24/2017 07:23 AM, Richard Biener wrote:
>>>
>>>
>>>
 Your initial on-demand approach is fine to catch some of the cases but
 it
 will not handle those for which we'd need post-dominance.

 I guess we can incrementally add that.
>>>
>>>
>>>
>>> No complaints from me.
>>>
>>> This is my initial on-demand approach, with a few fixes you've commented
>>> on
>>> throughout.
>>>
>>> As you can see, there is still an #if 0 wrt to using your suggested
>>> conservative handling of memory loads, which I'm not entirely convinced
>>> of,
>>> as it pessimizes gcc.dg/loop-unswitch-1.c.  If you feel strongly about
>>> it, I
>>> can enable the code again.
>>
>>
>> It is really required -- fortunately loop-unswitch-1.c is one of the cases
>> where
>> the post-dom / always-executed bits help .  The comparison is inside the
>> loop header and thus always executed when the loop enters, so inserrting
>> it
>> on the preheader edge is fine.
>
>
> Left as is then.
>
>>
>>> Also, I enhanced gcc.dg/loop-unswitch-1.c to verify that we're actually
>>> unswitching something.  It seems kinda silly that we have various
>>> unswitch
>>> tests, but we don't actually check whether we have unswitched anything.
>>
>>
>> Heh.  It probably was added for an ICE...
>>
>>> This test was the only one in the *unswitch*.c set that I saw was
>>> actually
>>> being unswitched.  Of course, if you don't agree with my #if 0 above, I
>>> will
>>> remove this change to the test.
>>>
>>> Finally, are we even guaranteed to unswitch in loop-unswitch-1.c across
>>> architectures?  If not, again, I can remove the one-liner.
>>
>>
>> I think so.
>
>
> Left as well.
>
>>
>>>
>>> How does this look for trunk?
>>
>>
>> With a unswitch-local solution I meant to not add new files but put the
>> defined_or_undefined class (well, or rather a single function...) into
>> tree-ssa-loop-unswitch.c.
>
>
> Done.
>
>>
>> @@ -138,7 +141,7 @@ tree_may_unswitch_on (basic_block bb, struct loop
>> *loop)
>>  {
>>/* Unswitching on undefined values would introduce undefined
>>  behavior that the original program might never exercise.  */
>> -  if (ssa_undefined_value_p (use, true))
>> +  if (defined_or_undefined->is_maybe_undefined (use))
>> return NULL_TREE;
>>def = SSA_NAME_DEF_STMT (use);
>>def_bb = gimple_bb (def);
>>
>> as I said, moving this (now more expensive check) after
>>
>>   if (def_bb
>>   && flow_bb_inside_loop_p (loop, def_bb))
>> return NULL_TREE;
>>
>> this cheap check would be better.  It should avoid 99% of all calls I bet.
>
>
> Done.
>
>>
>> You can recover the loop-unswitch-1.c case by passing down
>> the using stmt and checking its BB against loop_header (the only
>> block that we trivially know is always executed when entering the region).
>> Or do that check in the caller, like
>>
>> if (bb != loop->header
>>&& ssa_undefined_value_p (use, true) /
>> defined_or_undefined->is_maybe_undefined (use))
>
>
> Done in callee.
>
>>
>> +  gimple *def = SSA_NAME_DEF_STMT (t);
>> +
>> +  /* Check that all the PHI args are fully defined.  */
>> +  if (gphi *phi = dyn_cast  (def))
>> +   {
>> + if (virtual_operand_p (PHI_RESULT (phi)))
>> +   continue;
>>
>> You should never run into virtual operands (you only walk SSA_OP_USEs).
>
>
> Done.
>
>>
>> You can stop walking at stmts that dominate the region header,
>> like with
>>
>> +  gimple *def = SSA_NAME_DEF_STMT (t);
>> /* Uses in stmts always executed when the region header
>> executes are fine.  */
>> if (dominated_by_p (CDI_DOMINATORS, loop_header, gimple_bb (def)))
>>   continue;
>
>
> H... doing this causes the PR testcase (gcc.dg/loop-unswitch-5.c in the
> attached patch to FAIL).  I haven't looked at it, but I seem to recall in
> the testcase that we could have a DEF that dominated the loop but was a mess
> of PHI's, some of whose args were undefined.
>
> Did you perhaps want to put that dominated_by_p call after the PHI arg
> checks (which seems to work)?:

Ah, yes - of course.  PHIs are not really "defs".

>   /* Check that all the PHI args are fully defined.  */
>   if (gphi *phi = dyn_cast  (def))
> ...
> ...
> ...
>
> +  /* Uses in stmts always executed when the region header executes
> +are fine.  */
> +  if (dominated_by_p (CDI_DOMINATORS, loop->header, gimple_bb (def)))
> +   continue;
> +
>/* Handle calls and memory loads conservatively.  */
>if (!is_gimple_assign (def)
>   || (gimple_assign_single_p (def)
>
> Until this is clear, I've left this dominated_by_p call 

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-30 Thread Aldy Hernandez

On 01/30/2017 10:03 AM, Richard Biener wrote:

On Fri, Jan 27, 2017 at 12:20 PM, Aldy Hernandez  wrote:

On 01/26/2017 07:29 AM, Richard Biener wrote:


On Thu, Jan 26, 2017 at 1:04 PM, Aldy Hernandez  wrote:


On 01/24/2017 07:23 AM, Richard Biener wrote:




Your initial on-demand approach is fine to catch some of the cases but it
will not handle those for which we'd need post-dominance.

I guess we can incrementally add that.



No complaints from me.

This is my initial on-demand approach, with a few fixes you've commented on
throughout.

As you can see, there is still an #if 0 wrt to using your suggested
conservative handling of memory loads, which I'm not entirely convinced of,
as it pessimizes gcc.dg/loop-unswitch-1.c.  If you feel strongly about it, I
can enable the code again.


It is really required -- fortunately loop-unswitch-1.c is one of the cases where
the post-dom / always-executed bits help .  The comparison is inside the
loop header and thus always executed when the loop enters, so inserrting it
on the preheader edge is fine.


Left as is then.




Also, I enhanced gcc.dg/loop-unswitch-1.c to verify that we're actually
unswitching something.  It seems kinda silly that we have various unswitch
tests, but we don't actually check whether we have unswitched anything.


Heh.  It probably was added for an ICE...


This test was the only one in the *unswitch*.c set that I saw was actually
being unswitched.  Of course, if you don't agree with my #if 0 above, I will
remove this change to the test.

Finally, are we even guaranteed to unswitch in loop-unswitch-1.c across
architectures?  If not, again, I can remove the one-liner.


I think so.


Left as well.





How does this look for trunk?


With a unswitch-local solution I meant to not add new files but put the
defined_or_undefined class (well, or rather a single function...) into
tree-ssa-loop-unswitch.c.


Done.



@@ -138,7 +141,7 @@ tree_may_unswitch_on (basic_block bb, struct loop *loop)
 {
   /* Unswitching on undefined values would introduce undefined
 behavior that the original program might never exercise.  */
-  if (ssa_undefined_value_p (use, true))
+  if (defined_or_undefined->is_maybe_undefined (use))
return NULL_TREE;
   def = SSA_NAME_DEF_STMT (use);
   def_bb = gimple_bb (def);

as I said, moving this (now more expensive check) after

  if (def_bb
  && flow_bb_inside_loop_p (loop, def_bb))
return NULL_TREE;

this cheap check would be better.  It should avoid 99% of all calls I bet.


Done.



You can recover the loop-unswitch-1.c case by passing down
the using stmt and checking its BB against loop_header (the only
block that we trivially know is always executed when entering the region).
Or do that check in the caller, like

if (bb != loop->header
   && ssa_undefined_value_p (use, true) /
defined_or_undefined->is_maybe_undefined (use))


Done in callee.



+  gimple *def = SSA_NAME_DEF_STMT (t);
+
+  /* Check that all the PHI args are fully defined.  */
+  if (gphi *phi = dyn_cast  (def))
+   {
+ if (virtual_operand_p (PHI_RESULT (phi)))
+   continue;

You should never run into virtual operands (you only walk SSA_OP_USEs).


Done.



You can stop walking at stmts that dominate the region header,
like with

+  gimple *def = SSA_NAME_DEF_STMT (t);
/* Uses in stmts always executed when the region header
executes are fine.  */
if (dominated_by_p (CDI_DOMINATORS, loop_header, gimple_bb (def)))
  continue;


H... doing this causes the PR testcase (gcc.dg/loop-unswitch-5.c in 
the attached patch to FAIL).  I haven't looked at it, but I seem to 
recall in the testcase that we could have a DEF that dominated the loop 
but was a mess of PHI's, some of whose args were undefined.


Did you perhaps want to put that dominated_by_p call after the PHI arg 
checks (which seems to work)?:


  /* Check that all the PHI args are fully defined.  */
  if (gphi *phi = dyn_cast  (def))
...
...
...

+  /* Uses in stmts always executed when the region header executes
+are fine.  */
+  if (dominated_by_p (CDI_DOMINATORS, loop->header, gimple_bb (def)))
+   continue;
+
   /* Handle calls and memory loads conservatively.  */
   if (!is_gimple_assign (def)
  || (gimple_assign_single_p (def)

Until this is clear, I've left this dominated_by_p call #if 0'ed out.



and the bail out for PARM_DECLs is wrong:

+  /* A PARM_DECL will not have an SSA_NAME_DEF_STMT.  Parameters
+get their initial value from function entry.  */
+  if (SSA_NAME_VAR (t) && TREE_CODE (SSA_NAME_VAR (t)) == PARM_DECL)
+   return false;

needs to be a continue; rather than a return false.


Done.

Preliminary test show the attached patch works.  Further tests on-going.

Aldy
gcc/

PR tree-optimization/71691
* bitmap.h (class auto_bitmap): New.
* tree-ssa-loop-unswit

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-30 Thread Richard Biener
On Fri, Jan 27, 2017 at 12:20 PM, Aldy Hernandez  wrote:
> On 01/26/2017 07:29 AM, Richard Biener wrote:
>>
>> On Thu, Jan 26, 2017 at 1:04 PM, Aldy Hernandez  wrote:
>>>
>>> On 01/24/2017 07:23 AM, Richard Biener wrote:
>
>
>> Your initial on-demand approach is fine to catch some of the cases but it
>> will not handle those for which we'd need post-dominance.
>>
>> I guess we can incrementally add that.
>
>
> No complaints from me.
>
> This is my initial on-demand approach, with a few fixes you've commented on
> throughout.
>
> As you can see, there is still an #if 0 wrt to using your suggested
> conservative handling of memory loads, which I'm not entirely convinced of,
> as it pessimizes gcc.dg/loop-unswitch-1.c.  If you feel strongly about it, I
> can enable the code again.

It is really required -- fortunately loop-unswitch-1.c is one of the cases where
the post-dom / always-executed bits help .  The comparison is inside the
loop header and thus always executed when the loop enters, so inserrting it
on the preheader edge is fine.

> Also, I enhanced gcc.dg/loop-unswitch-1.c to verify that we're actually
> unswitching something.  It seems kinda silly that we have various unswitch
> tests, but we don't actually check whether we have unswitched anything.

Heh.  It probably was added for an ICE...

> This test was the only one in the *unswitch*.c set that I saw was actually
> being unswitched.  Of course, if you don't agree with my #if 0 above, I will
> remove this change to the test.
>
> Finally, are we even guaranteed to unswitch in loop-unswitch-1.c across
> architectures?  If not, again, I can remove the one-liner.

I think so.

>
> How does this look for trunk?

With a unswitch-local solution I meant to not add new files but put the
defined_or_undefined class (well, or rather a single function...) into
tree-ssa-loop-unswitch.c.

@@ -138,7 +141,7 @@ tree_may_unswitch_on (basic_block bb, struct loop *loop)
 {
   /* Unswitching on undefined values would introduce undefined
 behavior that the original program might never exercise.  */
-  if (ssa_undefined_value_p (use, true))
+  if (defined_or_undefined->is_maybe_undefined (use))
return NULL_TREE;
   def = SSA_NAME_DEF_STMT (use);
   def_bb = gimple_bb (def);

as I said, moving this (now more expensive check) after

  if (def_bb
  && flow_bb_inside_loop_p (loop, def_bb))
return NULL_TREE;

this cheap check would be better.  It should avoid 99% of all calls I bet.

You can recover the loop-unswitch-1.c case by passing down
the using stmt and checking its BB against loop_header (the only
block that we trivially know is always executed when entering the region).
Or do that check in the caller, like

if (bb != loop->header
   && ssa_undefined_value_p (use, true) /
defined_or_undefined->is_maybe_undefined (use))

+  gimple *def = SSA_NAME_DEF_STMT (t);
+
+  /* Check that all the PHI args are fully defined.  */
+  if (gphi *phi = dyn_cast  (def))
+   {
+ if (virtual_operand_p (PHI_RESULT (phi)))
+   continue;

You should never run into virtual operands (you only walk SSA_OP_USEs).

You can stop walking at stmts that dominate the region header,
like with

+  gimple *def = SSA_NAME_DEF_STMT (t);
/* Uses in stmts always executed when the region header
executes are fine.  */
if (dominated_by_p (CDI_DOMINATORS, loop_header, gimple_bb (def)))
  continue;

and the bail out for PARM_DECLs is wrong:

+  /* A PARM_DECL will not have an SSA_NAME_DEF_STMT.  Parameters
+get their initial value from function entry.  */
+  if (SSA_NAME_VAR (t) && TREE_CODE (SSA_NAME_VAR (t)) == PARM_DECL)
+   return false;

needs to be a continue; rather than a return false.


Otherwise looks ok and sorry for the continuing delays in reviewing this...

Thanks,
Richard.

> Aldy
>


Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-27 Thread Aldy Hernandez

On 01/26/2017 07:29 AM, Richard Biener wrote:

On Thu, Jan 26, 2017 at 1:04 PM, Aldy Hernandez  wrote:

On 01/24/2017 07:23 AM, Richard Biener wrote:



Your initial on-demand approach is fine to catch some of the cases but it
will not handle those for which we'd need post-dominance.

I guess we can incrementally add that.


No complaints from me.

This is my initial on-demand approach, with a few fixes you've commented 
on throughout.


As you can see, there is still an #if 0 wrt to using your suggested 
conservative handling of memory loads, which I'm not entirely convinced 
of, as it pessimizes gcc.dg/loop-unswitch-1.c.  If you feel strongly 
about it, I can enable the code again.


Also, I enhanced gcc.dg/loop-unswitch-1.c to verify that we're actually 
unswitching something.  It seems kinda silly that we have various 
unswitch tests, but we don't actually check whether we have unswitched 
anything.  This test was the only one in the *unswitch*.c set that I saw 
was actually being unswitched.  Of course, if you don't agree with my 
#if 0 above, I will remove this change to the test.


Finally, are we even guaranteed to unswitch in loop-unswitch-1.c across 
architectures?  If not, again, I can remove the one-liner.


How does this look for trunk?

Aldy

gcc/

PR tree-optimization/71691
* bitmap.h (class auto_bitmap): New.
* tree-ssa-defined-or-undefined.c: New file.
* tree-ssa-defined-or-undefined.h: New file.
* Makefile.in (tree-ssa-defined-or-undefined.o): New.
* tree-ssa-loop-unswitch.c (tree_may_unswitch_on): Add
defined_or_undefined parameter.  Use defined_or_undefined instead
of ssa_undefined_value_p.
(tree_unswitch_single_loop): Instantiate defined_or_undefined
variable.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 821584a..e0de276 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1507,6 +1507,7 @@ OBJS = \
tree-ssa-coalesce.o \
tree-ssa-copy.o \
tree-ssa-dce.o \
+   tree-ssa-defined-or-undefined.o \
tree-ssa-dom.o \
tree-ssa-dse.o \
tree-ssa-forwprop.o \
diff --git a/gcc/bitmap.h b/gcc/bitmap.h
index 196738b..f158b44 100644
--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -802,4 +802,25 @@ bmp_iter_and_compl (bitmap_iterator *bi, unsigned *bit_no)
bmp_iter_and_compl (&(ITER), &(BITNUM));
\
bmp_iter_next (&(ITER), &(BITNUM)))
 
+/* A class that ties the lifetime of a bitmap to its scope.  */
+class auto_bitmap
+{
+ public:
+  auto_bitmap () { bits = BITMAP_ALLOC (NULL); }
+  ~auto_bitmap () { BITMAP_FREE (bits); }
+  // Allow calling bitmap functions on our bitmap.
+  operator bitmap () { return bits; }
+
+ private:
+  // Prevent making a copy that references our bitmap.
+  auto_bitmap (const auto_bitmap &);
+  auto_bitmap &operator = (const auto_bitmap &);
+#if __cplusplus >= 201103L
+  auto_bitmap (auto_bitmap &&);
+  auto_bitmap &operator = (auto_bitmap &&);
+#endif
+
+  bitmap bits;
+};
+
 #endif /* GCC_BITMAP_H */
diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-1.c 
b/gcc/testsuite/gcc.dg/loop-unswitch-1.c
index 930364c..f6fc41d 100644
--- a/gcc/testsuite/gcc.dg/loop-unswitch-1.c
+++ b/gcc/testsuite/gcc.dg/loop-unswitch-1.c
@@ -1,6 +1,6 @@
 /* For PR rtl-optimization/27735  */
 /* { dg-do compile } */
-/* { dg-options "-O2 -funswitch-loops" } */
+/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-details" } */
 
 void set_color(void);
 void xml_colorize_line(unsigned int *p, int state)
@@ -32,3 +32,5 @@ parse_tag: ;
 }
 }
 
+/* Test that we actually unswitched something.  */
+/* { dg-final { scan-tree-dump ";; Unswitching loop" "unswitch" } } */
diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-5.c 
b/gcc/testsuite/gcc.dg/loop-unswitch-5.c
new file mode 100644
index 000..b41e853
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-unswitch-5.c
@@ -0,0 +1,51 @@
+/* PR middle-end/71691 */
+/* { dg-do run } */
+/* { dg-options "-fno-tree-vrp -O2 -funswitch-loops 
-fdump-tree-unswitch-details" } */
+
+/* Note: The -fno-tree-vrp above is only there to avoid VRP papering
+   over the problem.  */
+
+char b;
+short f;
+unsigned e;
+int g = 20;
+
+void
+foo ()
+{
+  int l, h;
+  for (l = 0; l <= 7; l++)
+{
+  int j = 38;
+  if (g)
+   h = 0;
+  for (; h <= 7; h++)
+   {
+ int i, k = b % (j % 4);
+ g = f;
+ for (;;)
+   {
+ j = 6 || b;
+ if (e)
+   {
+ for (; j; --j)
+   if (k)
+ __builtin_printf ("%d", 9);
+ if (i)
+   __builtin_printf ("%d", j);
+   }
+ if (l)
+   continue;
+ break;
+   }
+ i = f || b;
+   }
+}
+}
+
+int
+main ()
+{
+  foo ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-defined-or-undefined.c 
b/gcc/tree-ssa-defined-or-undefined.c
new file mode 10064

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-26 Thread Richard Biener
On Thu, Jan 26, 2017 at 1:04 PM, Aldy Hernandez  wrote:
> On 01/24/2017 07:23 AM, Richard Biener wrote:
>>
>> On Mon, Jan 23, 2017 at 6:26 PM, Aldy Hernandez  wrote:
>>>
>>> On 01/18/2017 10:10 AM, Richard Biener wrote:


 On Fri, Jan 13, 2017 at 7:48 PM, Aldy Hernandez 
 wrote:
>>>
>>>
>>>
>>> Hi Richard.
>>>
>>> I'd be happy to provide a patch, but could you please elaborate, as I'm
>>> afraid I'm not following.
>>>
> We could go back to my original, no caching version (with the other
> suggestions).  That solves the problem quite simply, with no
> regressions.



 So let's go with a unswitching-local solution then.  Based on
 tree_may_unswitch_on:
>>>
>>>
>>>
>>> What do you mean by unswitching-local?
>>
>>
>> Like your original patch, not adding new generic infrastructure
>> outside of tree-ssa-unswitch.c.
>>

   /* Condition must be invariant.  */
   FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
 {
   /* Unswitching on undefined values would introduce undefined
  behavior that the original program might never exercise.  */
   if (ssa_undefined_value_p (use, true))
 return NULL_TREE;
   def = SSA_NAME_DEF_STMT (use);
   def_bb = gimple_bb (def);
   if (def_bb
   && flow_bb_inside_loop_p (loop, def_bb))
 return NULL_TREE;

 we only have to look for uses in blocks dominating the loop header block
 (or in blocks post-dominating that loop header, but we can probably
 implement
 that by simply including the loop header itself with a FIXME comment).
>>>
>>>
>>>
>>> Look for *uses*??  Do you mean definitions?  I mean, we're trying to
>>> figure
>>> out whether we are going to unswitch on a use that is inside a the loop,
>>> not
>>> before or after. So perhaps we only care about *definitions*
>>> (SSA_NAME_DEF_STMT) dominating the loop header.
>>
>>
>> We're looking for stmts using the 'use's in the above loop on a path that
>> is
>> always executed when the loop is entered.  So we're not introducing a
>> use of a possibly undefined value.
>>
>> Of course we can also prove that 'use' is in fact not undefined looking at
>> its defs which are obviously always dominating the loop header if the
>> condition satisfies tree_may_unswitch_on (non-dominating defs will
>> have uses in PHIs dominating  the header which we have to treat
>> conservatively of course).
>>
>>>

 Note that we only need to know whether a BB will be always executed when
 the loop is entered -- there's "infrastructure" elsewhere that computes
 this
 w/o the need of post-dominance.  For example see
 fill_always_executed_in_1
 tree-ssa-loop-im.c (we can't use that 1:1 I think because we already use
 ->aux
 via the original copy tables, but we could simplify it as we're only
 interested in
 the loop which preheader we put the unswitch condition on so we can use
 a bitmap to record whether a block of the loop is always executed or
 not).
>>>
>>>
>>>
>>> I don't see any use of ->aux within loop unswitching, so perhaps no
>>> adjustment is needed?  I verified this by successfully bootstrapping
>>> with:
>>>
>>> diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
>>> index 92599fb..774d6bf 100644
>>> --- a/gcc/tree-ssa-loop-unswitch.c
>>> +++ b/gcc/tree-ssa-loop-unswitch.c
>>> @@ -94,6 +94,14 @@ tree_ssa_unswitch_loops (void)
>>>struct loop *loop;
>>>bool changed = false;
>>>
>>> +  basic_block bb;
>>> +  FOR_ALL_BB_FN (bb, cfun)
>>> +if (bb->aux)
>>> +  {
>>> +   gcc_unreachable ();
>>> +  }
>>>
>>> Furthermore, you say that we have this "infrastructure" without the need
>>> for
>>> post-dominance.  But we still need dominance info.  The function
>>> fill_always_execute_in_1 uses CDI_DOMINATORS both inside said function,
>>> and
>>> throughout its dependencies.  I thought the point of pre-calculating
>>> dominance info (or post-dominance info) originally was because we
>>> couldn't
>>> depend on dominance info being kept up to date between executions of
>>> tree_unswitch_single_loop(), which BTW, runs recursively.
>>
>>
>> dominators are kept up-to-date within cfg manipulation routines and
>> unswitching
>> already uses them.
>>
>> So if you just try to prove 'use' is defined you don't even need
>> dominators.  But
>> that misses cases like
>>
>> int x;
>> foo ()
>> {
>>   for (;;)
>> {
>>if (x == 5)
>>  ...;
>> }
>> }
>>
>> where unswitching is valid because x is always used when the loop is
>> entered.
>> Similar
>>
>> int x, a[10];
>> foo (int c)
>> {
>>   foo (x);
>>   for (i=0;i> {
>>if (a[c])
>>  break;
>>if (x == 5)
>>  ...;
>> }
>> }
>>
>> even though the condition is not always executed if the loop is entered.
>
>
> I don't think fill_always_execute_in_1 gives us what (I think) you want.
> For the attac

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-26 Thread Aldy Hernandez

On 01/24/2017 07:23 AM, Richard Biener wrote:

On Mon, Jan 23, 2017 at 6:26 PM, Aldy Hernandez  wrote:

On 01/18/2017 10:10 AM, Richard Biener wrote:


On Fri, Jan 13, 2017 at 7:48 PM, Aldy Hernandez  wrote:



Hi Richard.

I'd be happy to provide a patch, but could you please elaborate, as I'm
afraid I'm not following.


We could go back to my original, no caching version (with the other
suggestions).  That solves the problem quite simply, with no regressions.



So let's go with a unswitching-local solution then.  Based on
tree_may_unswitch_on:



What do you mean by unswitching-local?


Like your original patch, not adding new generic infrastructure
outside of tree-ssa-unswitch.c.



  /* Condition must be invariant.  */
  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
{
  /* Unswitching on undefined values would introduce undefined
 behavior that the original program might never exercise.  */
  if (ssa_undefined_value_p (use, true))
return NULL_TREE;
  def = SSA_NAME_DEF_STMT (use);
  def_bb = gimple_bb (def);
  if (def_bb
  && flow_bb_inside_loop_p (loop, def_bb))
return NULL_TREE;

we only have to look for uses in blocks dominating the loop header block
(or in blocks post-dominating that loop header, but we can probably
implement
that by simply including the loop header itself with a FIXME comment).



Look for *uses*??  Do you mean definitions?  I mean, we're trying to figure
out whether we are going to unswitch on a use that is inside a the loop, not
before or after. So perhaps we only care about *definitions*
(SSA_NAME_DEF_STMT) dominating the loop header.


We're looking for stmts using the 'use's in the above loop on a path that is
always executed when the loop is entered.  So we're not introducing a
use of a possibly undefined value.

Of course we can also prove that 'use' is in fact not undefined looking at
its defs which are obviously always dominating the loop header if the
condition satisfies tree_may_unswitch_on (non-dominating defs will
have uses in PHIs dominating  the header which we have to treat
conservatively of course).





Note that we only need to know whether a BB will be always executed when
the loop is entered -- there's "infrastructure" elsewhere that computes
this
w/o the need of post-dominance.  For example see fill_always_executed_in_1
tree-ssa-loop-im.c (we can't use that 1:1 I think because we already use
->aux
via the original copy tables, but we could simplify it as we're only
interested in
the loop which preheader we put the unswitch condition on so we can use
a bitmap to record whether a block of the loop is always executed or not).



I don't see any use of ->aux within loop unswitching, so perhaps no
adjustment is needed?  I verified this by successfully bootstrapping with:

diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
index 92599fb..774d6bf 100644
--- a/gcc/tree-ssa-loop-unswitch.c
+++ b/gcc/tree-ssa-loop-unswitch.c
@@ -94,6 +94,14 @@ tree_ssa_unswitch_loops (void)
   struct loop *loop;
   bool changed = false;

+  basic_block bb;
+  FOR_ALL_BB_FN (bb, cfun)
+if (bb->aux)
+  {
+   gcc_unreachable ();
+  }

Furthermore, you say that we have this "infrastructure" without the need for
post-dominance.  But we still need dominance info.  The function
fill_always_execute_in_1 uses CDI_DOMINATORS both inside said function, and
throughout its dependencies.  I thought the point of pre-calculating
dominance info (or post-dominance info) originally was because we couldn't
depend on dominance info being kept up to date between executions of
tree_unswitch_single_loop(), which BTW, runs recursively.


dominators are kept up-to-date within cfg manipulation routines and unswitching
already uses them.

So if you just try to prove 'use' is defined you don't even need
dominators.  But
that misses cases like

int x;
foo ()
{
  for (;;)
{
   if (x == 5)
 ...;
}
}

where unswitching is valid because x is always used when the loop is entered.
Similar

int x, a[10];
foo (int c)
{
  foo (x);
  for (i=0;i

I don't think fill_always_execute_in_1 gives us what (I think) you want. 
  For the attached graph, fill_always_execute_in_1 only marks the 
following basic blocks:


bb 4 in loop 2
bb 6 in loop 3
bb 9 in loop 1

Which are basically the loop headers.  Unless I'm misunderstanding 
something, this is useless for the problem at hand.


For example, for loop 3, I would've expected bb5 to be marked as always 
executed if we enter loop 3 since it is its immediate predecessor. 
Similarly for loop 2.  I would expect bb3 to be marked as always being 
executed if we enter loop 2.  And bb10 always enters loop 1.


Also, in other testcases, for loops where the flow loops back to the 
loop-header, we don't even get the loop header marked as always executed 
in the loop.


I still think my initial on-demand approach is straightforward, 
conservative, and solves the problem 

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-24 Thread Richard Biener
On Mon, Jan 23, 2017 at 6:26 PM, Aldy Hernandez  wrote:
> On 01/18/2017 10:10 AM, Richard Biener wrote:
>>
>> On Fri, Jan 13, 2017 at 7:48 PM, Aldy Hernandez  wrote:
>
>
> Hi Richard.
>
> I'd be happy to provide a patch, but could you please elaborate, as I'm
> afraid I'm not following.
>
>>> We could go back to my original, no caching version (with the other
>>> suggestions).  That solves the problem quite simply, with no regressions.
>>
>>
>> So let's go with a unswitching-local solution then.  Based on
>> tree_may_unswitch_on:
>
>
> What do you mean by unswitching-local?

Like your original patch, not adding new generic infrastructure
outside of tree-ssa-unswitch.c.

>>
>>   /* Condition must be invariant.  */
>>   FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
>> {
>>   /* Unswitching on undefined values would introduce undefined
>>  behavior that the original program might never exercise.  */
>>   if (ssa_undefined_value_p (use, true))
>> return NULL_TREE;
>>   def = SSA_NAME_DEF_STMT (use);
>>   def_bb = gimple_bb (def);
>>   if (def_bb
>>   && flow_bb_inside_loop_p (loop, def_bb))
>> return NULL_TREE;
>>
>> we only have to look for uses in blocks dominating the loop header block
>> (or in blocks post-dominating that loop header, but we can probably
>> implement
>> that by simply including the loop header itself with a FIXME comment).
>
>
> Look for *uses*??  Do you mean definitions?  I mean, we're trying to figure
> out whether we are going to unswitch on a use that is inside a the loop, not
> before or after. So perhaps we only care about *definitions*
> (SSA_NAME_DEF_STMT) dominating the loop header.

We're looking for stmts using the 'use's in the above loop on a path that is
always executed when the loop is entered.  So we're not introducing a
use of a possibly undefined value.

Of course we can also prove that 'use' is in fact not undefined looking at
its defs which are obviously always dominating the loop header if the
condition satisfies tree_may_unswitch_on (non-dominating defs will
have uses in PHIs dominating  the header which we have to treat
conservatively of course).

>
>>
>> Note that we only need to know whether a BB will be always executed when
>> the loop is entered -- there's "infrastructure" elsewhere that computes
>> this
>> w/o the need of post-dominance.  For example see fill_always_executed_in_1
>> tree-ssa-loop-im.c (we can't use that 1:1 I think because we already use
>> ->aux
>> via the original copy tables, but we could simplify it as we're only
>> interested in
>> the loop which preheader we put the unswitch condition on so we can use
>> a bitmap to record whether a block of the loop is always executed or not).
>
>
> I don't see any use of ->aux within loop unswitching, so perhaps no
> adjustment is needed?  I verified this by successfully bootstrapping with:
>
> diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
> index 92599fb..774d6bf 100644
> --- a/gcc/tree-ssa-loop-unswitch.c
> +++ b/gcc/tree-ssa-loop-unswitch.c
> @@ -94,6 +94,14 @@ tree_ssa_unswitch_loops (void)
>struct loop *loop;
>bool changed = false;
>
> +  basic_block bb;
> +  FOR_ALL_BB_FN (bb, cfun)
> +if (bb->aux)
> +  {
> +   gcc_unreachable ();
> +  }
>
> Furthermore, you say that we have this "infrastructure" without the need for
> post-dominance.  But we still need dominance info.  The function
> fill_always_execute_in_1 uses CDI_DOMINATORS both inside said function, and
> throughout its dependencies.  I thought the point of pre-calculating
> dominance info (or post-dominance info) originally was because we couldn't
> depend on dominance info being kept up to date between executions of
> tree_unswitch_single_loop(), which BTW, runs recursively.

dominators are kept up-to-date within cfg manipulation routines and unswitching
already uses them.

So if you just try to prove 'use' is defined you don't even need
dominators.  But
that misses cases like

int x;
foo ()
{
  for (;;)
{
   if (x == 5)
 ...;
}
}

where unswitching is valid because x is always used when the loop is entered.
Similar

int x, a[10];
foo (int c)
{
  foo (x);
  for (i=0;i> Can you send a patch that does this maybe?
>
>
> Sure, when I understand what you suggest ;-).
>
> Again, thanks for your input here.
> Aldy
>


Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-23 Thread Aldy Hernandez

On 01/18/2017 10:10 AM, Richard Biener wrote:

On Fri, Jan 13, 2017 at 7:48 PM, Aldy Hernandez  wrote:


Hi Richard.

I'd be happy to provide a patch, but could you please elaborate, as I'm 
afraid I'm not following.



We could go back to my original, no caching version (with the other
suggestions).  That solves the problem quite simply, with no regressions.


So let's go with a unswitching-local solution then.  Based on
tree_may_unswitch_on:


What do you mean by unswitching-local?



  /* Condition must be invariant.  */
  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
{
  /* Unswitching on undefined values would introduce undefined
 behavior that the original program might never exercise.  */
  if (ssa_undefined_value_p (use, true))
return NULL_TREE;
  def = SSA_NAME_DEF_STMT (use);
  def_bb = gimple_bb (def);
  if (def_bb
  && flow_bb_inside_loop_p (loop, def_bb))
return NULL_TREE;

we only have to look for uses in blocks dominating the loop header block
(or in blocks post-dominating that loop header, but we can probably implement
that by simply including the loop header itself with a FIXME comment).


Look for *uses*??  Do you mean definitions?  I mean, we're trying to 
figure out whether we are going to unswitch on a use that is inside a 
the loop, not before or after. So perhaps we only care about 
*definitions* (SSA_NAME_DEF_STMT) dominating the loop header.




Note that we only need to know whether a BB will be always executed when
the loop is entered -- there's "infrastructure" elsewhere that computes this
w/o the need of post-dominance.  For example see fill_always_executed_in_1
tree-ssa-loop-im.c (we can't use that 1:1 I think because we already use ->aux
via the original copy tables, but we could simplify it as we're only
interested in
the loop which preheader we put the unswitch condition on so we can use
a bitmap to record whether a block of the loop is always executed or not).


I don't see any use of ->aux within loop unswitching, so perhaps no 
adjustment is needed?  I verified this by successfully bootstrapping with:


diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
index 92599fb..774d6bf 100644
--- a/gcc/tree-ssa-loop-unswitch.c
+++ b/gcc/tree-ssa-loop-unswitch.c
@@ -94,6 +94,14 @@ tree_ssa_unswitch_loops (void)
   struct loop *loop;
   bool changed = false;

+  basic_block bb;
+  FOR_ALL_BB_FN (bb, cfun)
+if (bb->aux)
+  {
+   gcc_unreachable ();
+  }

Furthermore, you say that we have this "infrastructure" without the need 
for post-dominance.  But we still need dominance info.  The function 
fill_always_execute_in_1 uses CDI_DOMINATORS both inside said function, 
and throughout its dependencies.  I thought the point of pre-calculating 
dominance info (or post-dominance info) originally was because we 
couldn't depend on dominance info being kept up to date between 
executions of tree_unswitch_single_loop(), which BTW, runs recursively.



Can you send a patch that does this maybe?


Sure, when I understand what you suggest ;-).

Again, thanks for your input here.
Aldy



Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-18 Thread Richard Biener
On Fri, Jan 13, 2017 at 7:48 PM, Aldy Hernandez  wrote:
> [Sorry for the delay, I was sick.]
>
>
> On 01/09/2017 04:30 AM, Richard Biener wrote:
>>
>> On Sat, Jan 7, 2017 at 1:54 PM, Aldy Hernandez  wrote:
>>>
>>> On 01/04/2017 07:11 AM, Richard Biener wrote:


 On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez  wrote:
>
>
> On 12/20/2016 09:16 AM, Richard Biener wrote:
>>
>>
>>
>> On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez 
>> wrote:
>>>
>>>
>>>
>>> Hi folks.
>>>
>>> This is a follow-up on Jeff and Richi's interaction on the
>>> aforementioned
>>> PR
>>> here:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>>>
>>> I decided to explore the idea of analyzing may-undefness on-demand,
>>> which
>>> actually looks rather cheap.
>>>
>>> BTW, I don't understand why we don't have auto_bitmap's, as we
>>> already
>>> have
>>> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's
>>> code
>>> we
>>> already have.
>>>
>>> The attached patch fixes the bug without introducing any regressions.
>>>
>>> I also tested the patch by compiling 242 .ii files with -O3.  These
>>> were
>>> gathered from a stage1 build with -save-temps.  There is a slight
>>> time
>>> degradation of 4 seconds within 27 minutes of user time:
>>>
>>> tainted:26:52
>>> orig:   26:48
>>>
>>> This was the average aggregate time of two runs compiling all 242 .ii
>>> files.
>>> IMO, this looks reasonable.  It is after all, -O3.Is it
>>> acceptable?
>>
>>
>>
>>
>> +  while (!worklist.is_empty ())
>> +{
>> +  name = worklist.pop ();
>> +  gcc_assert (TREE_CODE (name) == SSA_NAME);
>> +
>> +  if (ssa_undefined_value_p (name, true))
>> +   return true;
>> +
>> +  bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>
>> it should be already set as we use visited_ssa as "was it ever on the
>> worklist",
>> so maybe renaming it would be a good thing as well.
>
>
>
>
> I don't understand what you would prefer here.



 Set the bit when you put name on the worklist (and do not do that if the
 bit is set).  Thus simply remove the above and add a bitmap_set_bit
 for the initial name you put on the worklist.

>>
>> + if (TREE_CODE (name) == SSA_NAME)
>> +   {
>> + /* If an SSA has already been seen, this may be a
>> loop.
>> +Fail conservatively.  */
>> + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION
>> (name)))
>> +   return false;
>>
>> so to me "conservative" is returning true, not false.
>
>
>
>
> OK
>
>>
>> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
>> (name));
>> + worklist.safe_push (name);
>>
>> but for loops we can just continue and ignore this use.  And
>> bitmap_set_bit
>> returns whether it set a bit, thus
>>
>> if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
>> (name)))
>>   worklist.safe_push (name);
>>
>> should work?
>
>
>
>
> Fixed.
>
>>
>> +  /* Check that any SSA names used to define NAME is also fully
>> +defined.  */
>> +  use_operand_p use_p;
>> +  ssa_op_iter iter;
>> +  FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
>> +   {
>> + name = USE_FROM_PTR (use_p);
>> + if (TREE_CODE (name) == SSA_NAME)
>>
>> always true.
>>
>> +   {
>> + /* If an SSA has already been seen, this may be a loop.
>> +Fail conservatively.  */
>> + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
>> +   return false;
>> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>> + worklist.safe_push (name);
>>
>> See above.
>>
>> In principle the thing is sound but I'd like to be able to pass in a
>> bitmap of
>> known maybe-undefined/must-defined SSA names to have a cache for
>> multiple invocations from within a pass (like this unswitching case).
>
>
>
>
> Done, though bitmaps are now calculated as part of the instantiation.
>
>>
>> Also once you hit defs that are in a post-dominated region of the loop
>> entry
>> you can treat them as not undefined (as their use invokes undefined
>> behavior).  This is also how you treat function parameters (well,
>> ssa_undefined_value_p does), where the call site invokes undefined
>> behavior
>> when passing i

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-13 Thread Aldy Hernandez

[Sorry for the delay, I was sick.]

On 01/09/2017 04:30 AM, Richard Biener wrote:

On Sat, Jan 7, 2017 at 1:54 PM, Aldy Hernandez  wrote:

On 01/04/2017 07:11 AM, Richard Biener wrote:


On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez  wrote:


On 12/20/2016 09:16 AM, Richard Biener wrote:



On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez 
wrote:



Hi folks.

This is a follow-up on Jeff and Richi's interaction on the
aforementioned
PR
here:

https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html

I decided to explore the idea of analyzing may-undefness on-demand,
which
actually looks rather cheap.

BTW, I don't understand why we don't have auto_bitmap's, as we already
have
auto_sbitmap's.  I've implemented the former based on auto_sbitmap's
code
we
already have.

The attached patch fixes the bug without introducing any regressions.

I also tested the patch by compiling 242 .ii files with -O3.  These
were
gathered from a stage1 build with -save-temps.  There is a slight time
degradation of 4 seconds within 27 minutes of user time:

tainted:26:52
orig:   26:48

This was the average aggregate time of two runs compiling all 242 .ii
files.
IMO, this looks reasonable.  It is after all, -O3.Is it acceptable?




+  while (!worklist.is_empty ())
+{
+  name = worklist.pop ();
+  gcc_assert (TREE_CODE (name) == SSA_NAME);
+
+  if (ssa_undefined_value_p (name, true))
+   return true;
+
+  bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));

it should be already set as we use visited_ssa as "was it ever on the
worklist",
so maybe renaming it would be a good thing as well.




I don't understand what you would prefer here.



Set the bit when you put name on the worklist (and do not do that if the
bit is set).  Thus simply remove the above and add a bitmap_set_bit
for the initial name you put on the worklist.



+ if (TREE_CODE (name) == SSA_NAME)
+   {
+ /* If an SSA has already been seen, this may be a
loop.
+Fail conservatively.  */
+ if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION
(name)))
+   return false;

so to me "conservative" is returning true, not false.




OK



+ bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+ worklist.safe_push (name);

but for loops we can just continue and ignore this use.  And
bitmap_set_bit
returns whether it set a bit, thus

if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
(name)))
  worklist.safe_push (name);

should work?




Fixed.



+  /* Check that any SSA names used to define NAME is also fully
+defined.  */
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
+   {
+ name = USE_FROM_PTR (use_p);
+ if (TREE_CODE (name) == SSA_NAME)

always true.

+   {
+ /* If an SSA has already been seen, this may be a loop.
+Fail conservatively.  */
+ if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+   return false;
+ bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+ worklist.safe_push (name);

See above.

In principle the thing is sound but I'd like to be able to pass in a
bitmap of
known maybe-undefined/must-defined SSA names to have a cache for
multiple invocations from within a pass (like this unswitching case).




Done, though bitmaps are now calculated as part of the instantiation.



Also once you hit defs that are in a post-dominated region of the loop
entry
you can treat them as not undefined (as their use invokes undefined
behavior).  This is also how you treat function parameters (well,
ssa_undefined_value_p does), where the call site invokes undefined
behavior
when passing in undefined values.  So we need an extra parameter
specifying
the post-dominance region.




Done.



You do not handle memory or calls conservatively which means the
existing
testcase only needs some obfuscation to become a problem again.  To fix
that before /* Check that any SSA names used to define NAME is also
fully
defined.  */ bail out conservatively, like

   if (! is_gimple_assign (def)
  || gimple_assign_single_p (def))
return true;




As I asked previously, I understand the !is_gimple_assign, which will
skip
over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def)
==
true"??

Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't
we
want to follow up the def chain precisely on these?

The attached implementation uses a cache, and a pre-computed
post-dominance
region.  A previous incantation of this patch (sans the post-dominance
stuff) had performance within the noise of the previous implementation.

I am testing again, and will do some performance comparisons in a bit,
but
for now-- are we on the same page here?  Is this what yo

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-09 Thread Richard Biener
On Sat, Jan 7, 2017 at 1:54 PM, Aldy Hernandez  wrote:
> On 01/04/2017 07:11 AM, Richard Biener wrote:
>>
>> On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez  wrote:
>>>
>>> On 12/20/2016 09:16 AM, Richard Biener wrote:


 On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez 
 wrote:
>
>
> Hi folks.
>
> This is a follow-up on Jeff and Richi's interaction on the
> aforementioned
> PR
> here:
>
> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>
> I decided to explore the idea of analyzing may-undefness on-demand,
> which
> actually looks rather cheap.
>
> BTW, I don't understand why we don't have auto_bitmap's, as we already
> have
> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's
> code
> we
> already have.
>
> The attached patch fixes the bug without introducing any regressions.
>
> I also tested the patch by compiling 242 .ii files with -O3.  These
> were
> gathered from a stage1 build with -save-temps.  There is a slight time
> degradation of 4 seconds within 27 minutes of user time:
>
> tainted:26:52
> orig:   26:48
>
> This was the average aggregate time of two runs compiling all 242 .ii
> files.
> IMO, this looks reasonable.  It is after all, -O3.Is it acceptable?



 +  while (!worklist.is_empty ())
 +{
 +  name = worklist.pop ();
 +  gcc_assert (TREE_CODE (name) == SSA_NAME);
 +
 +  if (ssa_undefined_value_p (name, true))
 +   return true;
 +
 +  bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));

 it should be already set as we use visited_ssa as "was it ever on the
 worklist",
 so maybe renaming it would be a good thing as well.
>>>
>>>
>>>
>>> I don't understand what you would prefer here.
>>
>>
>> Set the bit when you put name on the worklist (and do not do that if the
>> bit is set).  Thus simply remove the above and add a bitmap_set_bit
>> for the initial name you put on the worklist.
>>

 + if (TREE_CODE (name) == SSA_NAME)
 +   {
 + /* If an SSA has already been seen, this may be a
 loop.
 +Fail conservatively.  */
 + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION
 (name)))
 +   return false;

 so to me "conservative" is returning true, not false.
>>>
>>>
>>>
>>> OK
>>>

 + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
 + worklist.safe_push (name);

 but for loops we can just continue and ignore this use.  And
 bitmap_set_bit
 returns whether it set a bit, thus

 if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
 (name)))
   worklist.safe_push (name);

 should work?
>>>
>>>
>>>
>>> Fixed.
>>>

 +  /* Check that any SSA names used to define NAME is also fully
 +defined.  */
 +  use_operand_p use_p;
 +  ssa_op_iter iter;
 +  FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
 +   {
 + name = USE_FROM_PTR (use_p);
 + if (TREE_CODE (name) == SSA_NAME)

 always true.

 +   {
 + /* If an SSA has already been seen, this may be a loop.
 +Fail conservatively.  */
 + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
 +   return false;
 + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
 + worklist.safe_push (name);

 See above.

 In principle the thing is sound but I'd like to be able to pass in a
 bitmap of
 known maybe-undefined/must-defined SSA names to have a cache for
 multiple invocations from within a pass (like this unswitching case).
>>>
>>>
>>>
>>> Done, though bitmaps are now calculated as part of the instantiation.
>>>

 Also once you hit defs that are in a post-dominated region of the loop
 entry
 you can treat them as not undefined (as their use invokes undefined
 behavior).  This is also how you treat function parameters (well,
 ssa_undefined_value_p does), where the call site invokes undefined
 behavior
 when passing in undefined values.  So we need an extra parameter
 specifying
 the post-dominance region.
>>>
>>>
>>>
>>> Done.
>>>

 You do not handle memory or calls conservatively which means the
 existing
 testcase only needs some obfuscation to become a problem again.  To fix
 that before /* Check that any SSA names used to define NAME is also
 fully
 defined.  */ bail out conservatively, like

if (! is_gimple_assign (def)
   || gimple_assign_single_p (def))
 return true;
>>>

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-07 Thread Aldy Hernandez

On 01/04/2017 07:11 AM, Richard Biener wrote:

On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez  wrote:

On 12/20/2016 09:16 AM, Richard Biener wrote:


On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez  wrote:


Hi folks.

This is a follow-up on Jeff and Richi's interaction on the aforementioned
PR
here:

https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html

I decided to explore the idea of analyzing may-undefness on-demand, which
actually looks rather cheap.

BTW, I don't understand why we don't have auto_bitmap's, as we already
have
auto_sbitmap's.  I've implemented the former based on auto_sbitmap's code
we
already have.

The attached patch fixes the bug without introducing any regressions.

I also tested the patch by compiling 242 .ii files with -O3.  These were
gathered from a stage1 build with -save-temps.  There is a slight time
degradation of 4 seconds within 27 minutes of user time:

tainted:26:52
orig:   26:48

This was the average aggregate time of two runs compiling all 242 .ii
files.
IMO, this looks reasonable.  It is after all, -O3.Is it acceptable?



+  while (!worklist.is_empty ())
+{
+  name = worklist.pop ();
+  gcc_assert (TREE_CODE (name) == SSA_NAME);
+
+  if (ssa_undefined_value_p (name, true))
+   return true;
+
+  bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));

it should be already set as we use visited_ssa as "was it ever on the
worklist",
so maybe renaming it would be a good thing as well.



I don't understand what you would prefer here.


Set the bit when you put name on the worklist (and do not do that if the
bit is set).  Thus simply remove the above and add a bitmap_set_bit
for the initial name you put on the worklist.



+ if (TREE_CODE (name) == SSA_NAME)
+   {
+ /* If an SSA has already been seen, this may be a loop.
+Fail conservatively.  */
+ if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+   return false;

so to me "conservative" is returning true, not false.



OK



+ bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+ worklist.safe_push (name);

but for loops we can just continue and ignore this use.  And
bitmap_set_bit
returns whether it set a bit, thus

if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)))
  worklist.safe_push (name);

should work?



Fixed.



+  /* Check that any SSA names used to define NAME is also fully
+defined.  */
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
+   {
+ name = USE_FROM_PTR (use_p);
+ if (TREE_CODE (name) == SSA_NAME)

always true.

+   {
+ /* If an SSA has already been seen, this may be a loop.
+Fail conservatively.  */
+ if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+   return false;
+ bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+ worklist.safe_push (name);

See above.

In principle the thing is sound but I'd like to be able to pass in a
bitmap of
known maybe-undefined/must-defined SSA names to have a cache for
multiple invocations from within a pass (like this unswitching case).



Done, though bitmaps are now calculated as part of the instantiation.



Also once you hit defs that are in a post-dominated region of the loop
entry
you can treat them as not undefined (as their use invokes undefined
behavior).  This is also how you treat function parameters (well,
ssa_undefined_value_p does), where the call site invokes undefined
behavior
when passing in undefined values.  So we need an extra parameter
specifying
the post-dominance region.



Done.



You do not handle memory or calls conservatively which means the existing
testcase only needs some obfuscation to become a problem again.  To fix
that before /* Check that any SSA names used to define NAME is also fully
defined.  */ bail out conservatively, like

   if (! is_gimple_assign (def)
  || gimple_assign_single_p (def))
return true;



As I asked previously, I understand the !is_gimple_assign, which will skip
over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def) ==
true"??

Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't we
want to follow up the def chain precisely on these?

The attached implementation uses a cache, and a pre-computed post-dominance
region.  A previous incantation of this patch (sans the post-dominance
stuff) had performance within the noise of the previous implementation.

I am testing again, and will do some performance comparisons in a bit, but
for now-- are we on the same page here?  Is this what you wanted?


+  /* DEFs in the post-dominated region of the loop entry invoke
+undefined behavior.  Adding any use won't make things any
+  

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-04 Thread Trevor Saunders
On Fri, Dec 16, 2016 at 09:41:33AM -0500, Aldy Hernandez wrote:
> Hi folks.
> 
> This is a follow-up on Jeff and Richi's interaction on the aforementioned PR
> here:
> 
>   https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
> 
> I decided to explore the idea of analyzing may-undefness on-demand, which
> actually looks rather cheap.
> 
> BTW, I don't understand why we don't have auto_bitmap's, as we already have
> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's code we
> already have.

No good reason, it just hasn't been done yet.

I'm tempted to fit both this and auto_sbitmap into a unique_ptr with
custom deletor, but that would make it hard to do small object
optimizations, so maybe it isn't actually a great idea.

> +class auto_bitmap
> +{
> + public:
> +  auto_bitmap () { bits = BITMAP_ALLOC (NULL); }
> +  ~auto_bitmap () { BITMAP_FREE (bits); }

We could make the member a bitmap_head, and then use bitmap_initialize
and bitmap_clear in the ctor and dtor, that'd save a alloc / dealloc.

You might also want to use the CXX_MEM_STAT macros on the ctor to get
better attribution of memory usage.

> +  // Prevent making a copy that references our bitmap.
> +  auto_bitmap (const auto_bitmap &);
> +  auto_bitmap &operator = (const auto_bitmap &);
> +#if __cplusplus >= 201103L
> +  auto_bitmap (auto_bitmap &&);
> +  auto_bitmap &operator = (auto_bitmap &&);

We could actually support moving bitmaps pretty easily, though we would
need to do the auto_ptr style hacks to keep building as c++98.  I'll
probably do that work eventually for unique_ptr, but its kind of late
for gcc 8 unless we just use it to fix memory leaks.

> +#endif
> +
> +  bitmap bits;

style guide would say that should be m_bits I believe.

Trev

sorry about the lag here.


Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-04 Thread Richard Biener
On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez  wrote:
> On 12/20/2016 09:16 AM, Richard Biener wrote:
>>
>> On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez  wrote:
>>>
>>> Hi folks.
>>>
>>> This is a follow-up on Jeff and Richi's interaction on the aforementioned
>>> PR
>>> here:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>>>
>>> I decided to explore the idea of analyzing may-undefness on-demand, which
>>> actually looks rather cheap.
>>>
>>> BTW, I don't understand why we don't have auto_bitmap's, as we already
>>> have
>>> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's code
>>> we
>>> already have.
>>>
>>> The attached patch fixes the bug without introducing any regressions.
>>>
>>> I also tested the patch by compiling 242 .ii files with -O3.  These were
>>> gathered from a stage1 build with -save-temps.  There is a slight time
>>> degradation of 4 seconds within 27 minutes of user time:
>>>
>>> tainted:26:52
>>> orig:   26:48
>>>
>>> This was the average aggregate time of two runs compiling all 242 .ii
>>> files.
>>> IMO, this looks reasonable.  It is after all, -O3.Is it acceptable?
>>
>>
>> +  while (!worklist.is_empty ())
>> +{
>> +  name = worklist.pop ();
>> +  gcc_assert (TREE_CODE (name) == SSA_NAME);
>> +
>> +  if (ssa_undefined_value_p (name, true))
>> +   return true;
>> +
>> +  bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>>
>> it should be already set as we use visited_ssa as "was it ever on the
>> worklist",
>> so maybe renaming it would be a good thing as well.
>
>
> I don't understand what you would prefer here.

Set the bit when you put name on the worklist (and do not do that if the
bit is set).  Thus simply remove the above and add a bitmap_set_bit
for the initial name you put on the worklist.

>>
>> + if (TREE_CODE (name) == SSA_NAME)
>> +   {
>> + /* If an SSA has already been seen, this may be a loop.
>> +Fail conservatively.  */
>> + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
>> +   return false;
>>
>> so to me "conservative" is returning true, not false.
>
>
> OK
>
>>
>> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>> + worklist.safe_push (name);
>>
>> but for loops we can just continue and ignore this use.  And
>> bitmap_set_bit
>> returns whether it set a bit, thus
>>
>> if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)))
>>   worklist.safe_push (name);
>>
>> should work?
>
>
> Fixed.
>
>>
>> +  /* Check that any SSA names used to define NAME is also fully
>> +defined.  */
>> +  use_operand_p use_p;
>> +  ssa_op_iter iter;
>> +  FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
>> +   {
>> + name = USE_FROM_PTR (use_p);
>> + if (TREE_CODE (name) == SSA_NAME)
>>
>> always true.
>>
>> +   {
>> + /* If an SSA has already been seen, this may be a loop.
>> +Fail conservatively.  */
>> + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
>> +   return false;
>> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
>> + worklist.safe_push (name);
>>
>> See above.
>>
>> In principle the thing is sound but I'd like to be able to pass in a
>> bitmap of
>> known maybe-undefined/must-defined SSA names to have a cache for
>> multiple invocations from within a pass (like this unswitching case).
>
>
> Done, though bitmaps are now calculated as part of the instantiation.
>
>>
>> Also once you hit defs that are in a post-dominated region of the loop
>> entry
>> you can treat them as not undefined (as their use invokes undefined
>> behavior).  This is also how you treat function parameters (well,
>> ssa_undefined_value_p does), where the call site invokes undefined
>> behavior
>> when passing in undefined values.  So we need an extra parameter
>> specifying
>> the post-dominance region.
>
>
> Done.
>
>>
>> You do not handle memory or calls conservatively which means the existing
>> testcase only needs some obfuscation to become a problem again.  To fix
>> that before /* Check that any SSA names used to define NAME is also fully
>> defined.  */ bail out conservatively, like
>>
>>if (! is_gimple_assign (def)
>>   || gimple_assign_single_p (def))
>> return true;
>
>
> As I asked previously, I understand the !is_gimple_assign, which will skip
> over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def) ==
> true"??
>
> Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't we
> want to follow up the def chain precisely on these?
>
> The attached implementation uses a cache, and a pre-computed post-dominance
> region.  A previous incantation of this patch (sans the post-dominance
> stuff) had performance within t

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-04 Thread Richard Biener
On Wed, Dec 21, 2016 at 7:43 PM, Aldy Hernandez  wrote:
> On 12/20/2016 09:16 AM, Richard Biener wrote:
>
>> You do not handle memory or calls conservatively which means the existing
>> testcase only needs some obfuscation to become a problem again.  To fix
>> that before /* Check that any SSA names used to define NAME is also fully
>> defined.  */ bail out conservatively, like
>>
>>if (! is_gimple_assign (def)
>>   || gimple_assign_single_p (def))
>> return true;
>
>
> I understand the !is_gimple_assign, which will skip over GIMPLE_CALLs
> setting a value, but the "gimple_assign_single_p(def) == true"??
>
> Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't we
> want to follow up the def chain precisely on these?

For the first (= e) we can't, for the second, yes, the predicate is too strict.
But it's conservative ;)

Likewise for e_2 = VIEW_CONVERT_EXPR  btw, or REAL/IMAGPART_EXPR.

Note both VIEW_CONVERT_EXPR and REAL/IMAGPART_EXPR will also appear
in RHSs we can't handle.

Richard.

>
> Thanks.
> Aldy


Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-03 Thread Aldy Hernandez

On 12/20/2016 09:16 AM, Richard Biener wrote:

On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez  wrote:

Hi folks.

This is a follow-up on Jeff and Richi's interaction on the aforementioned PR
here:

https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html

I decided to explore the idea of analyzing may-undefness on-demand, which
actually looks rather cheap.

BTW, I don't understand why we don't have auto_bitmap's, as we already have
auto_sbitmap's.  I've implemented the former based on auto_sbitmap's code we
already have.

The attached patch fixes the bug without introducing any regressions.

I also tested the patch by compiling 242 .ii files with -O3.  These were
gathered from a stage1 build with -save-temps.  There is a slight time
degradation of 4 seconds within 27 minutes of user time:

tainted:26:52
orig:   26:48

This was the average aggregate time of two runs compiling all 242 .ii files.
IMO, this looks reasonable.  It is after all, -O3.Is it acceptable?


+  while (!worklist.is_empty ())
+{
+  name = worklist.pop ();
+  gcc_assert (TREE_CODE (name) == SSA_NAME);
+
+  if (ssa_undefined_value_p (name, true))
+   return true;
+
+  bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));

it should be already set as we use visited_ssa as "was it ever on the worklist",
so maybe renaming it would be a good thing as well.


I don't understand what you would prefer here.



+ if (TREE_CODE (name) == SSA_NAME)
+   {
+ /* If an SSA has already been seen, this may be a loop.
+Fail conservatively.  */
+ if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+   return false;

so to me "conservative" is returning true, not false.


OK



+ bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+ worklist.safe_push (name);

but for loops we can just continue and ignore this use.  And bitmap_set_bit
returns whether it set a bit, thus

if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)))
  worklist.safe_push (name);

should work?


Fixed.



+  /* Check that any SSA names used to define NAME is also fully
+defined.  */
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
+   {
+ name = USE_FROM_PTR (use_p);
+ if (TREE_CODE (name) == SSA_NAME)

always true.

+   {
+ /* If an SSA has already been seen, this may be a loop.
+Fail conservatively.  */
+ if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+   return false;
+ bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+ worklist.safe_push (name);

See above.

In principle the thing is sound but I'd like to be able to pass in a bitmap of
known maybe-undefined/must-defined SSA names to have a cache for
multiple invocations from within a pass (like this unswitching case).


Done, though bitmaps are now calculated as part of the instantiation.



Also once you hit defs that are in a post-dominated region of the loop entry
you can treat them as not undefined (as their use invokes undefined
behavior).  This is also how you treat function parameters (well,
ssa_undefined_value_p does), where the call site invokes undefined behavior
when passing in undefined values.  So we need an extra parameter specifying
the post-dominance region.


Done.



You do not handle memory or calls conservatively which means the existing
testcase only needs some obfuscation to become a problem again.  To fix
that before /* Check that any SSA names used to define NAME is also fully
defined.  */ bail out conservatively, like

   if (! is_gimple_assign (def)
  || gimple_assign_single_p (def))
return true;


As I asked previously, I understand the !is_gimple_assign, which will 
skip over GIMPLE_CALLs setting a value, but the 
"gimple_assign_single_p(def) == true"??


Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't 
we want to follow up the def chain precisely on these?


The attached implementation uses a cache, and a pre-computed 
post-dominance region.  A previous incantation of this patch (sans the 
post-dominance stuff) had performance within the noise of the previous 
implementation.


I am testing again, and will do some performance comparisons in a bit, 
but for now-- are we on the same page here?  Is this what you wanted?


Aldy

p.s. I could turn the post-dominance region into a bitmap for faster 
access if preferred.
commit 47d0c1b3144d4d56405d72c3ad55183d8632d0a7
Author: Aldy Hernandez 
Date:   Fri Dec 16 03:44:52 2016 -0500

PR tree-optimization/71691
* bitmap.h (class auto_bitmap): New.
* tree-ssa-defined-or-undefined.c: New file.
* tree-ssa-defined-or-undefined.h: New file.
* Makef

Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2016-12-21 Thread Aldy Hernandez

On 12/20/2016 09:16 AM, Richard Biener wrote:


You do not handle memory or calls conservatively which means the existing
testcase only needs some obfuscation to become a problem again.  To fix
that before /* Check that any SSA names used to define NAME is also fully
defined.  */ bail out conservatively, like

   if (! is_gimple_assign (def)
  || gimple_assign_single_p (def))
return true;


I understand the !is_gimple_assign, which will skip over GIMPLE_CALLs 
setting a value, but the "gimple_assign_single_p(def) == true"??


Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? 
Don't we want to follow up the def chain precisely on these?


Thanks.
Aldy


Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2016-12-21 Thread Jeff Law

On 12/20/2016 10:33 AM, Richard Biener wrote:


but for loops we can just continue and ignore this use.  And

bitmap_set_bit

returns whether it set a bit, thus

if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION

(name)))

  worklist.safe_push (name);

should work?

What if we're in an irreducible region?


Handling all back edges by deferring to their defs should work.  At least I 
can't see how it would not.

I'll take your word for it -- I haven't thought deeply about this.









In principle the thing is sound but I'd like to be able to pass in a

bitmap of

known maybe-undefined/must-defined SSA names to have a cache for
multiple invocations from within a pass (like this unswitching case).

So that means keeping another bitmap for things positively identified
as
defined, then saving it for later invocations.


We eventually need a tristate here for maximum caching.  And as the result 
depends on the dominating BB of the postdom region the savings might be 
questionable.

Possibly.

Jeff


Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2016-12-20 Thread Richard Biener
On December 20, 2016 4:50:25 PM GMT+01:00, Jeff Law  wrote:
>On 12/20/2016 07:16 AM, Richard Biener wrote:
>>
>> it should be already set as we use visited_ssa as "was it ever on the
>worklist",
>> so maybe renaming it would be a good thing as well.
>>
>> + if (TREE_CODE (name) == SSA_NAME)
>> +   {
>> + /* If an SSA has already been seen, this may be a
>loop.
>> +Fail conservatively.  */
>> + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION
>(name)))
>> +   return false;
>>
>> so to me "conservative" is returning true, not false.
>Agreed.
>
>>
>> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
>(name));
>> + worklist.safe_push (name);
>>
>> but for loops we can just continue and ignore this use.  And
>bitmap_set_bit
>> returns whether it set a bit, thus
>>
>> if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
>(name)))
>>   worklist.safe_push (name);
>>
>> should work?
>What if we're in an irreducible region?

Handling all back edges by deferring to their defs should work.  At least I 
can't see how it would not.

>
>>
>> In principle the thing is sound but I'd like to be able to pass in a
>bitmap of
>> known maybe-undefined/must-defined SSA names to have a cache for
>> multiple invocations from within a pass (like this unswitching case).
>So that means keeping another bitmap for things positively identified
>as 
>defined, then saving it for later invocations.

We eventually need a tristate here for maximum caching.  And as the result 
depends on the dominating BB of the postdom region the savings might be 
questionable.

>  So maybe some of my
>work 
>+ some of his melded together.  (mine computes once for the entire FN 
>and saves the result, so converting it to on-demand saving the result 
>shouldn't be terrible).
>
>
>>
>> Only unswitching on conditions that post-dominate the loop entry
>might be a
>> simpler solution for the PR in question.  OTOH this may disable too
>much
>> unswitching in practice.
>It might be worth experimentation.
>
>Jeff




Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2016-12-20 Thread Jeff Law

On 12/20/2016 07:16 AM, Richard Biener wrote:


it should be already set as we use visited_ssa as "was it ever on the worklist",
so maybe renaming it would be a good thing as well.

+ if (TREE_CODE (name) == SSA_NAME)
+   {
+ /* If an SSA has already been seen, this may be a loop.
+Fail conservatively.  */
+ if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+   return false;

so to me "conservative" is returning true, not false.

Agreed.



+ bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+ worklist.safe_push (name);

but for loops we can just continue and ignore this use.  And bitmap_set_bit
returns whether it set a bit, thus

if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)))
  worklist.safe_push (name);

should work?

What if we're in an irreducible region?




In principle the thing is sound but I'd like to be able to pass in a bitmap of
known maybe-undefined/must-defined SSA names to have a cache for
multiple invocations from within a pass (like this unswitching case).
So that means keeping another bitmap for things positively identified as 
defined, then saving it for later invocations.  So maybe some of my work 
+ some of his melded together.  (mine computes once for the entire FN 
and saves the result, so converting it to on-demand saving the result 
shouldn't be terrible).





Only unswitching on conditions that post-dominate the loop entry might be a
simpler solution for the PR in question.  OTOH this may disable too much
unswitching in practice.

It might be worth experimentation.

Jeff


Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2016-12-20 Thread Richard Biener
On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez  wrote:
> Hi folks.
>
> This is a follow-up on Jeff and Richi's interaction on the aforementioned PR
> here:
>
> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>
> I decided to explore the idea of analyzing may-undefness on-demand, which
> actually looks rather cheap.
>
> BTW, I don't understand why we don't have auto_bitmap's, as we already have
> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's code we
> already have.
>
> The attached patch fixes the bug without introducing any regressions.
>
> I also tested the patch by compiling 242 .ii files with -O3.  These were
> gathered from a stage1 build with -save-temps.  There is a slight time
> degradation of 4 seconds within 27 minutes of user time:
>
> tainted:26:52
> orig:   26:48
>
> This was the average aggregate time of two runs compiling all 242 .ii files.
> IMO, this looks reasonable.  It is after all, -O3.Is it acceptable?

+  while (!worklist.is_empty ())
+{
+  name = worklist.pop ();
+  gcc_assert (TREE_CODE (name) == SSA_NAME);
+
+  if (ssa_undefined_value_p (name, true))
+   return true;
+
+  bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));

it should be already set as we use visited_ssa as "was it ever on the worklist",
so maybe renaming it would be a good thing as well.

+ if (TREE_CODE (name) == SSA_NAME)
+   {
+ /* If an SSA has already been seen, this may be a loop.
+Fail conservatively.  */
+ if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+   return false;

so to me "conservative" is returning true, not false.

+ bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+ worklist.safe_push (name);

but for loops we can just continue and ignore this use.  And bitmap_set_bit
returns whether it set a bit, thus

if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)))
  worklist.safe_push (name);

should work?

+  /* Check that any SSA names used to define NAME is also fully
+defined.  */
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
+   {
+ name = USE_FROM_PTR (use_p);
+ if (TREE_CODE (name) == SSA_NAME)

always true.

+   {
+ /* If an SSA has already been seen, this may be a loop.
+Fail conservatively.  */
+ if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+   return false;
+ bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+ worklist.safe_push (name);

See above.

In principle the thing is sound but I'd like to be able to pass in a bitmap of
known maybe-undefined/must-defined SSA names to have a cache for
multiple invocations from within a pass (like this unswitching case).

Also once you hit defs that are in a post-dominated region of the loop entry
you can treat them as not undefined (as their use invokes undefined
behavior).  This is also how you treat function parameters (well,
ssa_undefined_value_p does), where the call site invokes undefined behavior
when passing in undefined values.  So we need an extra parameter specifying
the post-dominance region.

You do not handle memory or calls conservatively which means the existing
testcase only needs some obfuscation to become a problem again.  To fix
that before /* Check that any SSA names used to define NAME is also fully
defined.  */ bail out conservatively, like

   if (! is_gimple_assign (def)
  || gimple_assign_single_p (def))
return true;

Only unswitching on conditions that post-dominate the loop entry might be a
simpler solution for the PR in question.  OTOH this may disable too much
unswitching in practice.

Richard.

> Aldy


Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2016-12-19 Thread Jeff Law

On 12/16/2016 07:41 AM, Aldy Hernandez wrote:


BTW, I don't understand why we don't have auto_bitmap's, as we already
have auto_sbitmap's.  I've implemented the former based on
auto_sbitmap's code we already have.
Trevor poked at it a bit.  bitmaps are a bit more complex than sbitmaps 
in terms of implementation details.


https://gcc.gnu.org/ml/gcc/2016-04/msg00013.html

But his suggestion was to first create auto_bitmap, then look to convert 
to using that as opposed to his other approaches.





The attached patch fixes the bug without introducing any regressions.

I also tested the patch by compiling 242 .ii files with -O3.  These were
gathered from a stage1 build with -save-temps.  There is a slight time
degradation of 4 seconds within 27 minutes of user time:

tainted:26:52
orig:26:48

This was the average aggregate time of two runs compiling all 242 .ii
files.  IMO, this looks reasonable.  It is after all, -O3.Is it
acceptable?

Aldy

curr


commit 2310bcd0e2552a40ca1de354faf005ed3e9daf4e
Author: Aldy Hernandez 
Date:   Fri Dec 16 03:44:52 2016 -0500

PR tree-optimization/71691
* bitmap.h (class auto_bitmap): New.
* tree-ssa-loop-unswitch.c (ssa_maybe_undefined_value_p): New.
(tree_may_unswitch_on): Call ssa_maybe_undefined_value_p.
So I'm going to defer to Richi since he was reviewing my original 
attempt in this space.


It probably doesn't matter in practice (when I looked at this I couldn't 
get into the code in question with a -O3 bootstrap or with the 
testsuite, just with the testcase in the BZ) but you might consider 
handling an already visited node slightly differently.


If the the node was visited and resolved as undefined, then we would 
have already exited the loop.


If the node was visited and resolved as defined, then we could just keep 
processing other items on the the worklist.


The case where you want to conservatively return false is when you're 
actively processing the name in question.


Jeff


[PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2016-12-16 Thread Aldy Hernandez

Hi folks.

This is a follow-up on Jeff and Richi's interaction on the 
aforementioned PR here:


https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html

I decided to explore the idea of analyzing may-undefness on-demand, 
which actually looks rather cheap.


BTW, I don't understand why we don't have auto_bitmap's, as we already 
have auto_sbitmap's.  I've implemented the former based on 
auto_sbitmap's code we already have.


The attached patch fixes the bug without introducing any regressions.

I also tested the patch by compiling 242 .ii files with -O3.  These were 
gathered from a stage1 build with -save-temps.  There is a slight time 
degradation of 4 seconds within 27 minutes of user time:


tainted:26:52
orig:   26:48

This was the average aggregate time of two runs compiling all 242 .ii 
files.  IMO, this looks reasonable.  It is after all, -O3.Is it 
acceptable?


Aldy
commit 2310bcd0e2552a40ca1de354faf005ed3e9daf4e
Author: Aldy Hernandez 
Date:   Fri Dec 16 03:44:52 2016 -0500

PR tree-optimization/71691
* bitmap.h (class auto_bitmap): New.
* tree-ssa-loop-unswitch.c (ssa_maybe_undefined_value_p): New.
(tree_may_unswitch_on): Call ssa_maybe_undefined_value_p.

diff --git a/gcc/bitmap.h b/gcc/bitmap.h
index 1b2c8e0..bc32a21 100644
--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -802,4 +802,25 @@ bmp_iter_and_compl (bitmap_iterator *bi, unsigned *bit_no)
bmp_iter_and_compl (&(ITER), &(BITNUM));
\
bmp_iter_next (&(ITER), &(BITNUM)))
 
+/* A class that ties the lifetime of a bitmap to its scope.  */
+class auto_bitmap
+{
+ public:
+  auto_bitmap () { bits = BITMAP_ALLOC (NULL); }
+  ~auto_bitmap () { BITMAP_FREE (bits); }
+  // Allow calling bitmap functions on our bitmap.
+  operator bitmap () { return bits; }
+
+ private:
+  // Prevent making a copy that references our bitmap.
+  auto_bitmap (const auto_bitmap &);
+  auto_bitmap &operator = (const auto_bitmap &);
+#if __cplusplus >= 201103L
+  auto_bitmap (auto_bitmap &&);
+  auto_bitmap &operator = (auto_bitmap &&);
+#endif
+
+  bitmap bits;
+};
+
 #endif /* GCC_BITMAP_H */
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr71691.c 
b/gcc/testsuite/gcc.c-torture/execute/pr71691.c
new file mode 100644
index 000..1ffd1a2
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr71691.c
@@ -0,0 +1,47 @@
+/* { dg-options "-fno-tree-vrp" } */
+
+char b;
+short f;
+unsigned e;
+int g = 20;
+
+void
+foo ()
+{
+  int l, h;
+  for (l = 0; l <= 7; l++)
+{
+  int j = 38;
+  if (g)
+   h = 0;
+  for (; h <= 7; h++)
+   {
+ int i, k = b % (j % 4);
+ g = f;
+ for (;;)
+   {
+ j = 6 || b;
+ if (e)
+   {
+ for (; j; --j)
+   if (k)
+ __builtin_printf ("%d", 9);
+ if (i)
+   __builtin_printf ("%d", j);
+   }
+ if (l)
+   continue;
+ break;
+   }
+ i = f || b;
+   }
+}
+}
+
+int
+main ()
+{
+  foo ();
+  return 0;
+}
+
diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
index 40905af..bc34395 100644
--- a/gcc/tree-ssa-loop-unswitch.c
+++ b/gcc/tree-ssa-loop-unswitch.c
@@ -109,6 +109,71 @@ tree_ssa_unswitch_loops (void)
   return 0;
 }
 
+/* Return TRUE if an SSA_NAME may be undefined.  */
+
+static bool
+ssa_maybe_undefined_value_p (tree name)
+{
+  /* If it's obviously undefined, avoid further computations.  */
+  if (ssa_undefined_value_p (name, true))
+return true;
+
+  auto_bitmap visited_ssa;
+  auto_vec worklist;
+  worklist.safe_push (name);
+  while (!worklist.is_empty ())
+{
+  name = worklist.pop ();
+  gcc_assert (TREE_CODE (name) == SSA_NAME);
+
+  if (ssa_undefined_value_p (name, true))
+   return true;
+
+  bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+
+  /* Check that all the PHI args are fully defined.  */
+  gimple *def = SSA_NAME_DEF_STMT (name);
+  if (gphi *phi = dyn_cast  (def))
+   {
+ if (virtual_operand_p (PHI_RESULT (phi)))
+   continue;
+ for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+   {
+ name = gimple_phi_arg_def (phi, i);
+ if (TREE_CODE (name) == SSA_NAME)
+   {
+ /* If an SSA has already been seen, this may be a loop.
+Fail conservatively.  */
+ if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+   return false;
+ bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+ worklist.safe_push (name);
+   }
+   }
+ continue;
+   }
+
+  /* Check that any SSA names used to define NAME is also fully
+defined.  */
+  use_operand_p use_p;
+  ssa_op_iter i