Re: [PATCH 0/3][POPCOUNT]

2018-07-06 Thread Kugan Vivekanandarajah
Hi Jeff,

Thanks for looking into it.

On 6 July 2018 at 08:03, Jeff Law  wrote:
> On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote:
>> Hi Jeff,
>>
>> Thanks for the comments.
>>
>> On 23 June 2018 at 02:06, Jeff Law  wrote:
>>> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote:
 When we set niter with maybe_zero, currently final_value_relacement
 will not happen due to expression_expensive_p not handling. Patch 1
 adds this.

 With that we have the following optimized gimple.

[local count: 118111601]:
   if (b_4(D) != 0)
 goto ; [89.00%]
   else
 goto ; [11.00%]

[local count: 105119324]:
   _2 = (unsigned long) b_4(D);
   _9 = __builtin_popcountl (_2);
   c_3 = b_4(D) != 0 ? _9 : 1;

[local count: 118111601]:
   # c_12 = PHI 

 I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
 latch execute zero times for b_4 == 0 means that the body will execute
 ones.
>>> ISTM that DOM ought to have simplified the conditional, unless there's
>>> some other way to get to bb3.  We know that b_4 is nonzero and thus c_3
>>> must have the value _9.
>> As of now, dom is not optimizing it. With the attached hack, it can be made 
>> to.
> What's strange is I'm not getting the c_3 = (b_4 != 0) ... in any of the
> dumps I'm looking at.  Instead it's c_3 = _9, which is what I would
> expect since we know that b_4 != 0
>
>
> My tests have been on x86_64 and aarch64 linux targets.  I've tried with
> patch#1 installed as well as with patch #1 and patch #2 together.
>
> What target, what flags and what patches do I need to see this?
You need the patch 1 (attaching) to get that. With Patch 2 in this
series, it will be optimized.

I haven't committed the patches yet as I am testing all the three
patches. I will commit after testing on current trunk.

Thanks,
Kugan


>
> Jeff
From 12263df77931aa55d205b9db470436848d762684 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Fri, 22 Jun 2018 14:10:26 +1000
Subject: [PATCH 1/3] generate popcount when checked for zero

Change-Id: I951e6d487268b757cbdaa8dcf671ab1377490db6
---
 gcc/gimplify.c  |  2 +-
 gcc/gimplify.h  |  1 +
 gcc/testsuite/gcc.dg/tree-ssa/pr64183.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr85073.c |  2 +-
 gcc/tree-scalar-evolution.c | 12 
 5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 48ac92e..c86ad1a 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -3878,7 +3878,7 @@ gimplify_pure_cond_expr (tree *expr_p, gimple_seq *pre_p)
EXPR is GENERIC, while tree_could_trap_p can be called
only on GIMPLE.  */
 
-static bool
+bool
 generic_expr_could_trap_p (tree expr)
 {
   unsigned i, n;
diff --git a/gcc/gimplify.h b/gcc/gimplify.h
index dd0e4c0..62ca869 100644
--- a/gcc/gimplify.h
+++ b/gcc/gimplify.h
@@ -83,6 +83,7 @@ extern enum gimplify_status gimplify_arg (tree *, gimple_seq *, location_t,
 extern void gimplify_function_tree (tree);
 extern enum gimplify_status gimplify_va_arg_expr (tree *, gimple_seq *,
 		  gimple_seq *);
+extern bool generic_expr_could_trap_p (tree expr);
 gimple *gimplify_assign (tree, tree, gimple_seq *);
 
 #endif /* GCC_GIMPLIFY_H */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64183.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
index 7a854fc..50d0c5a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fno-tree-vectorize -fdump-tree-cunroll-details" } */
+/* { dg-options "-O3 -fno-tree-vectorize -fdisable-tree-sccp -fdump-tree-cunroll-details" } */
 
 int bits;
 unsigned int size;
diff --git a/gcc/testsuite/gcc.target/i386/pr85073.c b/gcc/testsuite/gcc.target/i386/pr85073.c
index 187102d..71a5d23 100644
--- a/gcc/testsuite/gcc.target/i386/pr85073.c
+++ b/gcc/testsuite/gcc.target/i386/pr85073.c
@@ -1,6 +1,6 @@
 /* PR target/85073 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -mbmi" } */
+/* { dg-options "-O2 -mbmi -fdisable-tree-sccp" } */
 
 int
 foo (unsigned x)
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 4b0ec02..8e29005 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -3508,6 +3508,18 @@ expression_expensive_p (tree expr)
   return false;
 }
 
+  if (code == COND_EXPR)
+return (expression_expensive_p (TREE_OPERAND (expr, 0))
+	|| (EXPR_P (TREE_OPERAND (expr, 1))
+		&& EXPR_P (TREE_OPERAND (expr, 2)))
+	/* If either branch has side effects or could trap.  */
+	|| TREE_SIDE_EFFECTS (TREE_OPERAND (expr, 1))
+	|| generic_expr_could_trap_p (TREE_OPERAND (expr, 1))
+	|| TREE_SIDE_EFFECTS (TREE_OPERAND (expr, 0))
+	|| generic_expr_could_trap_p (TREE_OPERAND (expr, 0))
+	|| expression_expensive_p (TREE_OPERAND (expr, 1))
+	|| expression_expensive_p 

Re: [PATCH 0/3][POPCOUNT]

2018-07-05 Thread Jeff Law
On 07/05/2018 11:39 PM, Richard Biener wrote:
> On July 6, 2018 12:03:11 AM GMT+02:00, Jeff Law  wrote:
>> On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote:
>>> Hi Jeff,
>>>
>>> Thanks for the comments.
>>>
>>> On 23 June 2018 at 02:06, Jeff Law  wrote:
 On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote:
> When we set niter with maybe_zero, currently final_value_relacement
> will not happen due to expression_expensive_p not handling. Patch 1
> adds this.
>
> With that we have the following optimized gimple.
>
>[local count: 118111601]:
>   if (b_4(D) != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 105119324]:
>   _2 = (unsigned long) b_4(D);
>   _9 = __builtin_popcountl (_2);
>   c_3 = b_4(D) != 0 ? _9 : 1;
>
>[local count: 118111601]:
>   # c_12 = PHI 
>
> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when
>> the
> latch execute zero times for b_4 == 0 means that the body will
>> execute
> ones.
 ISTM that DOM ought to have simplified the conditional, unless
>> there's
 some other way to get to bb3.  We know that b_4 is nonzero and thus
>> c_3
 must have the value _9.
>>> As of now, dom is not optimizing it. With the attached hack, it can
>> be made to.
>> What's strange is I'm not getting the c_3 = (b_4 != 0) ... in any of
>> the
>> dumps I'm looking at.  Instead it's c_3 = _9, which is what I would
>> expect since we know that b_4 != 0
>>
>>
>> My tests have been on x86_64 and aarch64 linux targets.  I've tried
>> with
>> patch#1 installed as well as with patch #1 and patch #2 together.
>>
>> What target, what flags and what patches do I need to see this?
> 
> I believe it has been fixed in niters analysis to avoid the condition if it 
> is known to be true. 
Ah.  Presumably the change from 6-16.  I'll back that out and retry.

jeff


Re: [PATCH 0/3][POPCOUNT]

2018-07-05 Thread Richard Biener
On July 6, 2018 12:03:11 AM GMT+02:00, Jeff Law  wrote:
>On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote:
>> Hi Jeff,
>> 
>> Thanks for the comments.
>> 
>> On 23 June 2018 at 02:06, Jeff Law  wrote:
>>> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote:
 When we set niter with maybe_zero, currently final_value_relacement
 will not happen due to expression_expensive_p not handling. Patch 1
 adds this.

 With that we have the following optimized gimple.

[local count: 118111601]:
   if (b_4(D) != 0)
 goto ; [89.00%]
   else
 goto ; [11.00%]

[local count: 105119324]:
   _2 = (unsigned long) b_4(D);
   _9 = __builtin_popcountl (_2);
   c_3 = b_4(D) != 0 ? _9 : 1;

[local count: 118111601]:
   # c_12 = PHI 

 I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when
>the
 latch execute zero times for b_4 == 0 means that the body will
>execute
 ones.
>>> ISTM that DOM ought to have simplified the conditional, unless
>there's
>>> some other way to get to bb3.  We know that b_4 is nonzero and thus
>c_3
>>> must have the value _9.
>> As of now, dom is not optimizing it. With the attached hack, it can
>be made to.
>What's strange is I'm not getting the c_3 = (b_4 != 0) ... in any of
>the
>dumps I'm looking at.  Instead it's c_3 = _9, which is what I would
>expect since we know that b_4 != 0
>
>
>My tests have been on x86_64 and aarch64 linux targets.  I've tried
>with
>patch#1 installed as well as with patch #1 and patch #2 together.
>
>What target, what flags and what patches do I need to see this?

I believe it has been fixed in niters analysis to avoid the condition if it is 
known to be true. 

Richard. 

>Jeff



Re: [PATCH 0/3][POPCOUNT]

2018-07-05 Thread Jeff Law
On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote:
> Hi Jeff,
> 
> Thanks for the comments.
> 
> On 23 June 2018 at 02:06, Jeff Law  wrote:
>> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote:
>>> When we set niter with maybe_zero, currently final_value_relacement
>>> will not happen due to expression_expensive_p not handling. Patch 1
>>> adds this.
>>>
>>> With that we have the following optimized gimple.
>>>
>>>[local count: 118111601]:
>>>   if (b_4(D) != 0)
>>> goto ; [89.00%]
>>>   else
>>> goto ; [11.00%]
>>>
>>>[local count: 105119324]:
>>>   _2 = (unsigned long) b_4(D);
>>>   _9 = __builtin_popcountl (_2);
>>>   c_3 = b_4(D) != 0 ? _9 : 1;
>>>
>>>[local count: 118111601]:
>>>   # c_12 = PHI 
>>>
>>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
>>> latch execute zero times for b_4 == 0 means that the body will execute
>>> ones.
>> ISTM that DOM ought to have simplified the conditional, unless there's
>> some other way to get to bb3.  We know that b_4 is nonzero and thus c_3
>> must have the value _9.
> As of now, dom is not optimizing it. With the attached hack, it can be made 
> to.
What's strange is I'm not getting the c_3 = (b_4 != 0) ... in any of the
dumps I'm looking at.  Instead it's c_3 = _9, which is what I would
expect since we know that b_4 != 0


My tests have been on x86_64 and aarch64 linux targets.  I've tried with
patch#1 installed as well as with patch #1 and patch #2 together.

What target, what flags and what patches do I need to see this?

Jeff


Re: [PATCH 0/3][POPCOUNT]

2018-06-27 Thread Jeff Law
On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote:
> Hi Jeff,
> 
> Thanks for the comments.
> 
> On 23 June 2018 at 02:06, Jeff Law  wrote:
>> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote:
>>> When we set niter with maybe_zero, currently final_value_relacement
>>> will not happen due to expression_expensive_p not handling. Patch 1
>>> adds this.
>>>
>>> With that we have the following optimized gimple.
>>>
>>>[local count: 118111601]:
>>>   if (b_4(D) != 0)
>>> goto ; [89.00%]
>>>   else
>>> goto ; [11.00%]
>>>
>>>[local count: 105119324]:
>>>   _2 = (unsigned long) b_4(D);
>>>   _9 = __builtin_popcountl (_2);
>>>   c_3 = b_4(D) != 0 ? _9 : 1;
>>>
>>>[local count: 118111601]:
>>>   # c_12 = PHI 
>>>
>>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
>>> latch execute zero times for b_4 == 0 means that the body will execute
>>> ones.
>> ISTM that DOM ought to have simplified the conditional, unless there's
>> some other way to get to bb3.  We know that b_4 is nonzero and thus c_3
>> must have the value _9.
> As of now, dom is not optimizing it. With the attached hack, it can be made 
> to.
Thanks.   It's not as much of a hack as you might think.

Right now there's two ways to get at the data we're looking for.  One is
to query the VRP engine like you've done.  The other is to query the
expression table which should have something like 1 = (b4 != 0) in it as
that expression holds on the 2->3 edge which dominates bb3.

One of the things I want to do this summer is integrate the VRP engine
queries better into DOM and stop relying on adding conditionals into the
expression table (which isn't as powerful and doesn't scale well).  So
your patch isn't a terrible hack.  It's just not the well integrated
solution I want to pull together :-)


I think it was just last year when I started wondering if we were
handling COND_EXPRs on the RHS of an assignment correctly, but it got
lost on my stack of TODO items.  It's good to have a testcase which
confirms my suspicion that we're not handling them as well as we should.

I've opened bz86339 to track this issue.

Jeff


Re: [PATCH 0/3][POPCOUNT]

2018-06-25 Thread Richard Biener
On Mon, Jun 25, 2018 at 11:30 AM Bin.Cheng  wrote:
>
> On Mon, Jun 25, 2018 at 1:50 PM, Kugan Vivekanandarajah
>  wrote:
> > Hi Bin,
> >
> > On 25 June 2018 at 13:56, Bin.Cheng  wrote:
> >> On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah
> >>  wrote:
> >>> Hi Bin,
> >>>
> >>> Thanks for your comments.
> >>>
> >>> On 25 June 2018 at 11:15, Bin.Cheng  wrote:
>  On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah
>   wrote:
> > When we set niter with maybe_zero, currently final_value_relacement
> > will not happen due to expression_expensive_p not handling. Patch 1
> > adds this.
> >
> > With that we have the following optimized gimple.
> >
> >[local count: 118111601]:
> >   if (b_4(D) != 0)
> > goto ; [89.00%]
> >   else
> > goto ; [11.00%]
> >
> >[local count: 105119324]:
> >   _2 = (unsigned long) b_4(D);
> >   _9 = __builtin_popcountl (_2);
> >   c_3 = b_4(D) != 0 ? _9 : 1;
> >
> >[local count: 118111601]:
> >   # c_12 = PHI 
> >
> > I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
>  No, it doesn't make much sense.  when b_4(D) == 0, the popcount
>  computed should be 0.  Point is you can never get b_4(D) == 0 with
>  guard condition in basic block 2.  So the result should simply be:
> >>>
> >>> When we do  calculate niter (for the copy header case), we set
> >>> may_be_zero as (which I think is correct)
> >>> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
> >>>   build_zero_cst
> >>>   (TREE_TYPE (src)));
> >>>
> >>> Then in final_value_replacement_loop (struct loop *loop)
> >>>
> >>> for the PHI stmt for which we are going to do the final value replacement,
> >>> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC.
> >>>
> >>> then we do
> >>> compute_overall_effect_of_inner_loop (struct loop *loop, tree 
> >>> evolution_fn)
> >>>
> >>> where when we do chrec_apply to the polynomial_chrec with niter from
> >>> popcount which also has the may_be_zero, we end up with the 1.
> >>> Looking at this, I am not sure if this is wrong. May be I am missing 
> >>> something.
> >> I think it is wrong.  How could you get popcount == 1 when b_4(D) ==
> >> 0?  Though it never happens in this case.
> >
> > We dont set popcount = 1. When we set niter for popcount pattern with
> > niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
> >   build_zero_cst (TREE_TYPE (src)));
> Hmm, I think this is unnecessary and causing the weird cond_expr in
> following optimization.  What happens if you simply set it to false?

It is surely not unnecessary because we set it to non-NULL only when
the result is _not_ simply popcount() but popcount()-1.

All the above suggests that SCEV does sth wrong.  Dumps show

  (set_nb_iterations_in_loop = b_4(D) != 0 ? (unsigned long)
__builtin_popcountl ((unsigned long) b_4(D)) + 18446744073709551615 :
0))

(chrec_apply
  (varying_loop = 1
)
  (chrec = {1, +, 1}_1)
  (x = b_4(D) != 0 ? (int) ((unsigned int) __builtin_popcountl
((unsigned long) b_4(D)) + 4294967295) : 0)
  (res = b_4(D) != 0 ? __builtin_popcountl ((unsigned long) b_4(D)) : 1))
)

this is because the analysis works on a header-copied loop and thus
the IV for C is {1, +, 1}
so this is all correct and to be simply optimized by following passes
factoring in
the range of b_4(D) which was tested to be _not_zero before.

Richard.

> Thanks,
> bin
> >
> > Because of which, we have an niter in the final_value_replacement, we have
> > (gdb) p debug_tree (niter)
> >   > type  > size 
> > unit-size 
> > align:64 warn_if_not_align:0 symtab:0 alias-set -1
> > canonical-type 0x7694d1f8 precision:64 min  > 0x7694a120 0> max  > 18446744073709551615>>
> >
> > arg:0  > type  > size 
> > unit-size 
> > align:8 warn_if_not_align:0 symtab:0 alias-set -1
> > canonical-type 0x76945b28 precision:1 min  > 0x7694a048 0> max >
> >
> > arg:0  > 0x76945738 long int>
> > visited var 
> > def_stmt GIMPLE_NOPvolu
> > version:4>
> > arg:1 >
> > arg:1 
> >
> > arg:0 
> >
> > arg:0  > 0x769455e8 int>
> >
> > fn  > 0x76a55888>
> > readonly constant arg:0  > 0x769ff600 __builtin_popcountl>>
> > arg:0  > 0x7694d1f8>
> > arg:0 >>>
> > arg:1 >
> > arg:2  > 0x7694d1f8> constant 0>>
> >
> > Then from there then we do compute_overall_effect_of_inner_loop for
> > scalar evolution of PHI with niter we get the 1.
> >
> >>>
> >>> In this testcase, before we enter the loop we have a check for (b_4(D)
>  0). Thus, setting niter->may_be_zero is not strictly necessary but
> >>> conservatively correct (?).
> >> Yes, but not necessarily.  Setting maybe_zero could 

Re: [PATCH 0/3][POPCOUNT]

2018-06-25 Thread Bin.Cheng
On Mon, Jun 25, 2018 at 1:50 PM, Kugan Vivekanandarajah
 wrote:
> Hi Bin,
>
> On 25 June 2018 at 13:56, Bin.Cheng  wrote:
>> On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah
>>  wrote:
>>> Hi Bin,
>>>
>>> Thanks for your comments.
>>>
>>> On 25 June 2018 at 11:15, Bin.Cheng  wrote:
 On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah
  wrote:
> When we set niter with maybe_zero, currently final_value_relacement
> will not happen due to expression_expensive_p not handling. Patch 1
> adds this.
>
> With that we have the following optimized gimple.
>
>[local count: 118111601]:
>   if (b_4(D) != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 105119324]:
>   _2 = (unsigned long) b_4(D);
>   _9 = __builtin_popcountl (_2);
>   c_3 = b_4(D) != 0 ? _9 : 1;
>
>[local count: 118111601]:
>   # c_12 = PHI 
>
> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
 No, it doesn't make much sense.  when b_4(D) == 0, the popcount
 computed should be 0.  Point is you can never get b_4(D) == 0 with
 guard condition in basic block 2.  So the result should simply be:
>>>
>>> When we do  calculate niter (for the copy header case), we set
>>> may_be_zero as (which I think is correct)
>>> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
>>>   build_zero_cst
>>>   (TREE_TYPE (src)));
>>>
>>> Then in final_value_replacement_loop (struct loop *loop)
>>>
>>> for the PHI stmt for which we are going to do the final value replacement,
>>> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC.
>>>
>>> then we do
>>> compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn)
>>>
>>> where when we do chrec_apply to the polynomial_chrec with niter from
>>> popcount which also has the may_be_zero, we end up with the 1.
>>> Looking at this, I am not sure if this is wrong. May be I am missing 
>>> something.
>> I think it is wrong.  How could you get popcount == 1 when b_4(D) ==
>> 0?  Though it never happens in this case.
>
> We dont set popcount = 1. When we set niter for popcount pattern with
> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
>   build_zero_cst (TREE_TYPE (src)));
Hmm, I think this is unnecessary and causing the weird cond_expr in
following optimization.  What happens if you simply set it to false?

Thanks,
bin
>
> Because of which, we have an niter in the final_value_replacement, we have
> (gdb) p debug_tree (niter)
>   type  size 
> unit-size 
> align:64 warn_if_not_align:0 symtab:0 alias-set -1
> canonical-type 0x7694d1f8 precision:64 min  0x7694a120 0> max  18446744073709551615>>
>
> arg:0  type  size 
> unit-size 
> align:8 warn_if_not_align:0 symtab:0 alias-set -1
> canonical-type 0x76945b28 precision:1 min  0x7694a048 0> max >
>
> arg:0  0x76945738 long int>
> visited var 
> def_stmt GIMPLE_NOPvolu
> version:4>
> arg:1 >
> arg:1 
>
> arg:0 
>
> arg:0  0x769455e8 int>
>
> fn  0x76a55888>
> readonly constant arg:0  0x769ff600 __builtin_popcountl>>
> arg:0  0x7694d1f8>
> arg:0 >>>
> arg:1 >
> arg:2  0x7694d1f8> constant 0>>
>
> Then from there then we do compute_overall_effect_of_inner_loop for
> scalar evolution of PHI with niter we get the 1.
>
>>>
>>> In this testcase, before we enter the loop we have a check for (b_4(D)
 0). Thus, setting niter->may_be_zero is not strictly necessary but
>>> conservatively correct (?).
>> Yes, but not necessarily.  Setting maybe_zero could confuse following
>> optimizations and we should avoid doing that whenever possible.  If
>> any pass goes wrong because it's not set conservatively, it is that
>> pass' responsibility and should be fixed accordingly.  Here IMHO, we
>> don't need to set it.
>
> My patch 2 is for not setting this when we know know a_4(D) is not
> zero in this path.
>
> Thanks,
> Kugan
>
>
>
>
>>
>> Thanks,
>> bin
>>>
>>> Thanks,
>>> Kugan
>>>

>[local count: 118111601]:
>   if (b_4(D) != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 105119324]:
>   _2 = (unsigned long) b_4(D);
>   c_3 = __builtin_popcountl (_2);
>
>[local count: 118111601]:
>   # c_12 = PHI 

 I think this is the code generated if maybe_zero is not set?  which it
 should not be set here.
 For the same reason, it can be further optimized into:

>[local count: 118111601]:
>   _2 = (unsigned long) b_4(D);
>   c_12 = __builtin_popcountl (_2);
>

> latch execute zero times for b_4 == 0 means that the body will execute

Re: [PATCH 0/3][POPCOUNT]

2018-06-24 Thread Kugan Vivekanandarajah
Hi Bin,

On 25 June 2018 at 13:56, Bin.Cheng  wrote:
> On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah
>  wrote:
>> Hi Bin,
>>
>> Thanks for your comments.
>>
>> On 25 June 2018 at 11:15, Bin.Cheng  wrote:
>>> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah
>>>  wrote:
 When we set niter with maybe_zero, currently final_value_relacement
 will not happen due to expression_expensive_p not handling. Patch 1
 adds this.

 With that we have the following optimized gimple.

[local count: 118111601]:
   if (b_4(D) != 0)
 goto ; [89.00%]
   else
 goto ; [11.00%]

[local count: 105119324]:
   _2 = (unsigned long) b_4(D);
   _9 = __builtin_popcountl (_2);
   c_3 = b_4(D) != 0 ? _9 : 1;

[local count: 118111601]:
   # c_12 = PHI 

 I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
>>> No, it doesn't make much sense.  when b_4(D) == 0, the popcount
>>> computed should be 0.  Point is you can never get b_4(D) == 0 with
>>> guard condition in basic block 2.  So the result should simply be:
>>
>> When we do  calculate niter (for the copy header case), we set
>> may_be_zero as (which I think is correct)
>> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
>>   build_zero_cst
>>   (TREE_TYPE (src)));
>>
>> Then in final_value_replacement_loop (struct loop *loop)
>>
>> for the PHI stmt for which we are going to do the final value replacement,
>> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC.
>>
>> then we do
>> compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn)
>>
>> where when we do chrec_apply to the polynomial_chrec with niter from
>> popcount which also has the may_be_zero, we end up with the 1.
>> Looking at this, I am not sure if this is wrong. May be I am missing 
>> something.
> I think it is wrong.  How could you get popcount == 1 when b_4(D) ==
> 0?  Though it never happens in this case.

We dont set popcount = 1. When we set niter for popcount pattern with
niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
  build_zero_cst (TREE_TYPE (src)));

Because of which, we have an niter in the final_value_replacement, we have
(gdb) p debug_tree (niter)
 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7694d1f8 precision:64 min  max >

arg:0 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x76945b28 precision:1 min  max >

arg:0 
visited var 
def_stmt GIMPLE_NOPvolu
version:4>
arg:1 >
arg:1 

arg:0 

arg:0 

fn 
readonly constant arg:0 >
arg:0 
arg:0 >>>
arg:1 >
arg:2  constant 0>>

Then from there then we do compute_overall_effect_of_inner_loop for
scalar evolution of PHI with niter we get the 1.

>>
>> In this testcase, before we enter the loop we have a check for (b_4(D)
>>> 0). Thus, setting niter->may_be_zero is not strictly necessary but
>> conservatively correct (?).
> Yes, but not necessarily.  Setting maybe_zero could confuse following
> optimizations and we should avoid doing that whenever possible.  If
> any pass goes wrong because it's not set conservatively, it is that
> pass' responsibility and should be fixed accordingly.  Here IMHO, we
> don't need to set it.

My patch 2 is for not setting this when we know know a_4(D) is not
zero in this path.

Thanks,
Kugan




>
> Thanks,
> bin
>>
>> Thanks,
>> Kugan
>>
>>>
[local count: 118111601]:
   if (b_4(D) != 0)
 goto ; [89.00%]
   else
 goto ; [11.00%]

[local count: 105119324]:
   _2 = (unsigned long) b_4(D);
   c_3 = __builtin_popcountl (_2);

[local count: 118111601]:
   # c_12 = PHI 
>>>
>>> I think this is the code generated if maybe_zero is not set?  which it
>>> should not be set here.
>>> For the same reason, it can be further optimized into:
>>>
[local count: 118111601]:
   _2 = (unsigned long) b_4(D);
   c_12 = __builtin_popcountl (_2);

>>>
 latch execute zero times for b_4 == 0 means that the body will execute
 ones.
>>> You never get zero times latch here with the aforementioned guard condition.
>>>
>>> BTW, I didn't look at following patches which could be wanted optimizations.
>>>
>>> Thanks,
>>> bin

 The issue here is, since we are checking if (b_4(D) != 0) before
 entering the loop means we don't need to set maybe_zero. Patch 2
 handles this.

 With that we have
[local count: 118111601]:
   if (b_4(D) != 0)
 goto ; [89.00%]
   else
 goto ; [11.00%]

[local count: 105119324]:
   _2 = (unsigned long) b_4(D);
   _9 = __builtin_popcountl (_2);


Re: [PATCH 0/3][POPCOUNT]

2018-06-24 Thread Bin.Cheng
On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah
 wrote:
> Hi Bin,
>
> Thanks for your comments.
>
> On 25 June 2018 at 11:15, Bin.Cheng  wrote:
>> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah
>>  wrote:
>>> When we set niter with maybe_zero, currently final_value_relacement
>>> will not happen due to expression_expensive_p not handling. Patch 1
>>> adds this.
>>>
>>> With that we have the following optimized gimple.
>>>
>>>[local count: 118111601]:
>>>   if (b_4(D) != 0)
>>> goto ; [89.00%]
>>>   else
>>> goto ; [11.00%]
>>>
>>>[local count: 105119324]:
>>>   _2 = (unsigned long) b_4(D);
>>>   _9 = __builtin_popcountl (_2);
>>>   c_3 = b_4(D) != 0 ? _9 : 1;
>>>
>>>[local count: 118111601]:
>>>   # c_12 = PHI 
>>>
>>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
>> No, it doesn't make much sense.  when b_4(D) == 0, the popcount
>> computed should be 0.  Point is you can never get b_4(D) == 0 with
>> guard condition in basic block 2.  So the result should simply be:
>
> When we do  calculate niter (for the copy header case), we set
> may_be_zero as (which I think is correct)
> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
>   build_zero_cst
>   (TREE_TYPE (src)));
>
> Then in final_value_replacement_loop (struct loop *loop)
>
> for the PHI stmt for which we are going to do the final value replacement,
> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC.
>
> then we do
> compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn)
>
> where when we do chrec_apply to the polynomial_chrec with niter from
> popcount which also has the may_be_zero, we end up with the 1.
> Looking at this, I am not sure if this is wrong. May be I am missing 
> something.
I think it is wrong.  How could you get popcount == 1 when b_4(D) ==
0?  Though it never happens in this case.
>
> In this testcase, before we enter the loop we have a check for (b_4(D)
>> 0). Thus, setting niter->may_be_zero is not strictly necessary but
> conservatively correct (?).
Yes, but not necessarily.  Setting maybe_zero could confuse following
optimizations and we should avoid doing that whenever possible.  If
any pass goes wrong because it's not set conservatively, it is that
pass' responsibility and should be fixed accordingly.  Here IMHO, we
don't need to set it.

Thanks,
bin
>
> Thanks,
> Kugan
>
>>
>>>[local count: 118111601]:
>>>   if (b_4(D) != 0)
>>> goto ; [89.00%]
>>>   else
>>> goto ; [11.00%]
>>>
>>>[local count: 105119324]:
>>>   _2 = (unsigned long) b_4(D);
>>>   c_3 = __builtin_popcountl (_2);
>>>
>>>[local count: 118111601]:
>>>   # c_12 = PHI 
>>
>> I think this is the code generated if maybe_zero is not set?  which it
>> should not be set here.
>> For the same reason, it can be further optimized into:
>>
>>>[local count: 118111601]:
>>>   _2 = (unsigned long) b_4(D);
>>>   c_12 = __builtin_popcountl (_2);
>>>
>>
>>> latch execute zero times for b_4 == 0 means that the body will execute
>>> ones.
>> You never get zero times latch here with the aforementioned guard condition.
>>
>> BTW, I didn't look at following patches which could be wanted optimizations.
>>
>> Thanks,
>> bin
>>>
>>> The issue here is, since we are checking if (b_4(D) != 0) before
>>> entering the loop means we don't need to set maybe_zero. Patch 2
>>> handles this.
>>>
>>> With that we have
>>>[local count: 118111601]:
>>>   if (b_4(D) != 0)
>>> goto ; [89.00%]
>>>   else
>>> goto ; [11.00%]
>>>
>>>[local count: 105119324]:
>>>   _2 = (unsigned long) b_4(D);
>>>   _9 = __builtin_popcountl (_2);
>>>
>>>[local count: 118111601]:
>>>   # c_12 = PHI <0(2), _9(3)>
>>>
>>> As advised earlier, patch 3 adds phiopt support to remove this.
>>>
>>> Bootstrap and regression testing are ongoing.
>>>
>>> Is this OK for trunk.
>>>
>>> Thanks,
>>> Kugan


Re: [PATCH 0/3][POPCOUNT]

2018-06-24 Thread Kugan Vivekanandarajah
Hi Bin,

Thanks for your comments.

On 25 June 2018 at 11:15, Bin.Cheng  wrote:
> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah
>  wrote:
>> When we set niter with maybe_zero, currently final_value_relacement
>> will not happen due to expression_expensive_p not handling. Patch 1
>> adds this.
>>
>> With that we have the following optimized gimple.
>>
>>[local count: 118111601]:
>>   if (b_4(D) != 0)
>> goto ; [89.00%]
>>   else
>> goto ; [11.00%]
>>
>>[local count: 105119324]:
>>   _2 = (unsigned long) b_4(D);
>>   _9 = __builtin_popcountl (_2);
>>   c_3 = b_4(D) != 0 ? _9 : 1;
>>
>>[local count: 118111601]:
>>   # c_12 = PHI 
>>
>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
> No, it doesn't make much sense.  when b_4(D) == 0, the popcount
> computed should be 0.  Point is you can never get b_4(D) == 0 with
> guard condition in basic block 2.  So the result should simply be:

When we do  calculate niter (for the copy header case), we set
may_be_zero as (which I think is correct)
niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
  build_zero_cst
  (TREE_TYPE (src)));

Then in final_value_replacement_loop (struct loop *loop)

for the PHI stmt for which we are going to do the final value replacement,
we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC.

then we do
compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn)

where when we do chrec_apply to the polynomial_chrec with niter from
popcount which also has the may_be_zero, we end up with the 1.
Looking at this, I am not sure if this is wrong. May be I am missing something.

In this testcase, before we enter the loop we have a check for (b_4(D)
> 0). Thus, setting niter->may_be_zero is not strictly necessary but
conservatively correct (?).

Thanks,
Kugan

>
>>[local count: 118111601]:
>>   if (b_4(D) != 0)
>> goto ; [89.00%]
>>   else
>> goto ; [11.00%]
>>
>>[local count: 105119324]:
>>   _2 = (unsigned long) b_4(D);
>>   c_3 = __builtin_popcountl (_2);
>>
>>[local count: 118111601]:
>>   # c_12 = PHI 
>
> I think this is the code generated if maybe_zero is not set?  which it
> should not be set here.
> For the same reason, it can be further optimized into:
>
>>[local count: 118111601]:
>>   _2 = (unsigned long) b_4(D);
>>   c_12 = __builtin_popcountl (_2);
>>
>
>> latch execute zero times for b_4 == 0 means that the body will execute
>> ones.
> You never get zero times latch here with the aforementioned guard condition.
>
> BTW, I didn't look at following patches which could be wanted optimizations.
>
> Thanks,
> bin
>>
>> The issue here is, since we are checking if (b_4(D) != 0) before
>> entering the loop means we don't need to set maybe_zero. Patch 2
>> handles this.
>>
>> With that we have
>>[local count: 118111601]:
>>   if (b_4(D) != 0)
>> goto ; [89.00%]
>>   else
>> goto ; [11.00%]
>>
>>[local count: 105119324]:
>>   _2 = (unsigned long) b_4(D);
>>   _9 = __builtin_popcountl (_2);
>>
>>[local count: 118111601]:
>>   # c_12 = PHI <0(2), _9(3)>
>>
>> As advised earlier, patch 3 adds phiopt support to remove this.
>>
>> Bootstrap and regression testing are ongoing.
>>
>> Is this OK for trunk.
>>
>> Thanks,
>> Kugan


Re: [PATCH 0/3][POPCOUNT]

2018-06-24 Thread Kugan Vivekanandarajah
Hi Jeff,

Thanks for the comments.

On 23 June 2018 at 02:06, Jeff Law  wrote:
> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote:
>> When we set niter with maybe_zero, currently final_value_relacement
>> will not happen due to expression_expensive_p not handling. Patch 1
>> adds this.
>>
>> With that we have the following optimized gimple.
>>
>>[local count: 118111601]:
>>   if (b_4(D) != 0)
>> goto ; [89.00%]
>>   else
>> goto ; [11.00%]
>>
>>[local count: 105119324]:
>>   _2 = (unsigned long) b_4(D);
>>   _9 = __builtin_popcountl (_2);
>>   c_3 = b_4(D) != 0 ? _9 : 1;
>>
>>[local count: 118111601]:
>>   # c_12 = PHI 
>>
>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
>> latch execute zero times for b_4 == 0 means that the body will execute
>> ones.
> ISTM that DOM ought to have simplified the conditional, unless there's
> some other way to get to bb3.  We know that b_4 is nonzero and thus c_3
> must have the value _9.
As of now, dom is not optimizing it. With the attached hack, it can be made to.

>
>
>>
>> The issue here is, since we are checking if (b_4(D) != 0) before
>> entering the loop means we don't need to set maybe_zero. Patch 2
>> handles this.
>>
>> With that we have
>>[local count: 118111601]:
>>   if (b_4(D) != 0)
>> goto ; [89.00%]
>>   else
>> goto ; [11.00%]
>>
>>[local count: 105119324]:
>>   _2 = (unsigned long) b_4(D);
>>   _9 = __builtin_popcountl (_2);
>>
>>[local count: 118111601]:
>>   # c_12 = PHI <0(2), _9(3)>
>>
>> As advised earlier, patch 3 adds phiopt support to remove this.
> So if DOM or some other pass fixed up the assignment to c_3 I'd hope we
> wouldn't set maybe_zero.
>
> So I'd like to start by first determining if we should have already
> simplified the COND_EXPR in bb3.  Do you have a testcase for that?
Sorry, It is hidden in patch 3 (attaching now). You will need patch 1 as well.

Thanks,
Kugan

>
>
> jeff
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index a6f176c..77ae7d1b 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1991,6 +1991,25 @@ dom_opt_dom_walker::optimize_stmt (basic_block bb, 
gimple_stmt_iterator si)
}
}
 
+  if (gimple_code (stmt) == GIMPLE_ASSIGN
+ && gimple_assign_rhs_code (stmt) == COND_EXPR)
+   {
+ /* If this is a conditional stmt, see if we can optimize the
+condition.  */
+ x_vr_values = evrp_range_analyzer.get_vr_values ();
+ tree exp = gimple_assign_rhs1 (stmt);
+ tree lhs = TREE_OPERAND (exp, 0);
+ tree rhs = TREE_OPERAND (exp, 1);
+ tree val = x_vr_values->vrp_evaluate_conditional (TREE_CODE (exp),
+   lhs, rhs, stmt);
+ if (is_gimple_min_invariant (val))
+   {
+ gimple_assign_set_rhs1 (stmt, val);
+ update_stmt (stmt);
+   }
+ x_vr_values = NULL;
+   }
+
   if (gimple_code (stmt) == GIMPLE_COND)
{
  tree lhs = gimple_cond_lhs (stmt);
int PopCount (long b) {
int c = 0;

while (b) {
	b &= b - 1;
	c++;
}
return c;
}


Re: [PATCH 0/3][POPCOUNT]

2018-06-24 Thread Bin.Cheng
On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah
 wrote:
> When we set niter with maybe_zero, currently final_value_relacement
> will not happen due to expression_expensive_p not handling. Patch 1
> adds this.
>
> With that we have the following optimized gimple.
>
>[local count: 118111601]:
>   if (b_4(D) != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 105119324]:
>   _2 = (unsigned long) b_4(D);
>   _9 = __builtin_popcountl (_2);
>   c_3 = b_4(D) != 0 ? _9 : 1;
>
>[local count: 118111601]:
>   # c_12 = PHI 
>
> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
No, it doesn't make much sense.  when b_4(D) == 0, the popcount
computed should be 0.  Point is you can never get b_4(D) == 0 with
guard condition in basic block 2.  So the result should simply be:

>[local count: 118111601]:
>   if (b_4(D) != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 105119324]:
>   _2 = (unsigned long) b_4(D);
>   c_3 = __builtin_popcountl (_2);
>
>[local count: 118111601]:
>   # c_12 = PHI 

I think this is the code generated if maybe_zero is not set?  which it
should not be set here.
For the same reason, it can be further optimized into:

>[local count: 118111601]:
>   _2 = (unsigned long) b_4(D);
>   c_12 = __builtin_popcountl (_2);
>

> latch execute zero times for b_4 == 0 means that the body will execute
> ones.
You never get zero times latch here with the aforementioned guard condition.

BTW, I didn't look at following patches which could be wanted optimizations.

Thanks,
bin
>
> The issue here is, since we are checking if (b_4(D) != 0) before
> entering the loop means we don't need to set maybe_zero. Patch 2
> handles this.
>
> With that we have
>[local count: 118111601]:
>   if (b_4(D) != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
>
>[local count: 105119324]:
>   _2 = (unsigned long) b_4(D);
>   _9 = __builtin_popcountl (_2);
>
>[local count: 118111601]:
>   # c_12 = PHI <0(2), _9(3)>
>
> As advised earlier, patch 3 adds phiopt support to remove this.
>
> Bootstrap and regression testing are ongoing.
>
> Is this OK for trunk.
>
> Thanks,
> Kugan


Re: [PATCH 0/3][POPCOUNT]

2018-06-22 Thread Jeff Law
On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote:
> When we set niter with maybe_zero, currently final_value_relacement
> will not happen due to expression_expensive_p not handling. Patch 1
> adds this.
> 
> With that we have the following optimized gimple.
> 
>[local count: 118111601]:
>   if (b_4(D) != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
> 
>[local count: 105119324]:
>   _2 = (unsigned long) b_4(D);
>   _9 = __builtin_popcountl (_2);
>   c_3 = b_4(D) != 0 ? _9 : 1;
> 
>[local count: 118111601]:
>   # c_12 = PHI 
> 
> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
> latch execute zero times for b_4 == 0 means that the body will execute
> ones.
ISTM that DOM ought to have simplified the conditional, unless there's
some other way to get to bb3.  We know that b_4 is nonzero and thus c_3
must have the value _9.


> 
> The issue here is, since we are checking if (b_4(D) != 0) before
> entering the loop means we don't need to set maybe_zero. Patch 2
> handles this.
> 
> With that we have
>[local count: 118111601]:
>   if (b_4(D) != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
> 
>[local count: 105119324]:
>   _2 = (unsigned long) b_4(D);
>   _9 = __builtin_popcountl (_2);
> 
>[local count: 118111601]:
>   # c_12 = PHI <0(2), _9(3)>
> 
> As advised earlier, patch 3 adds phiopt support to remove this.
So if DOM or some other pass fixed up the assignment to c_3 I'd hope we
wouldn't set maybe_zero.

So I'd like to start by first determining if we should have already
simplified the COND_EXPR in bb3.  Do you have a testcase for that?


jeff


[PATCH 0/3][POPCOUNT]

2018-06-22 Thread Kugan Vivekanandarajah
When we set niter with maybe_zero, currently final_value_relacement
will not happen due to expression_expensive_p not handling. Patch 1
adds this.

With that we have the following optimized gimple.

   [local count: 118111601]:
  if (b_4(D) != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119324]:
  _2 = (unsigned long) b_4(D);
  _9 = __builtin_popcountl (_2);
  c_3 = b_4(D) != 0 ? _9 : 1;

   [local count: 118111601]:
  # c_12 = PHI 

I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
latch execute zero times for b_4 == 0 means that the body will execute
ones.

The issue here is, since we are checking if (b_4(D) != 0) before
entering the loop means we don't need to set maybe_zero. Patch 2
handles this.

With that we have
   [local count: 118111601]:
  if (b_4(D) != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119324]:
  _2 = (unsigned long) b_4(D);
  _9 = __builtin_popcountl (_2);

   [local count: 118111601]:
  # c_12 = PHI <0(2), _9(3)>

As advised earlier, patch 3 adds phiopt support to remove this.

Bootstrap and regression testing are ongoing.

Is this OK for trunk.

Thanks,
Kugan