RE: [RFC] Should widening_mul should consider block frequency?

2020-03-26 Thread Yangfei (Felix)
> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Thursday, March 26, 2020 3:37 PM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [RFC] Should widening_mul should consider block frequency?
> 
> >
> > That's a good point.  I have attached the v2 patch.
> > Also did a spec2017 test on aarch64, no obvious impact witnessed with this.
> > Can you sponsor this patch please?  My networking does not work well
> > and I am having some trouble pushing it : - (
> 
> Pushed.  For the future can you please attach patches suitable for git am?

Sure, will do.  Thanks for the help : - )  

Felix


Re: [RFC] Should widening_mul should consider block frequency?

2020-03-26 Thread Richard Biener via Gcc-patches
On Thu, Mar 26, 2020 at 2:06 AM Yangfei (Felix)  wrote:
>
> Hi!
>
> > -Original Message-
> > From: Richard Biener [mailto:richard.guent...@gmail.com]
> > Sent: Tuesday, March 24, 2020 10:14 PM
> > To: Yangfei (Felix) 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [RFC] Should widening_mul should consider block frequency?
> >
> > > > As written in the PR I'd follow fma generation and restrict defs to the 
> > > > same
> > BB.
> > >
> > > Thanks for the suggestion.  That should be more consistent.
> > > Attached please find the adapted patch.
> > > Bootstrap and tested on both x86_64 and aarch64 Linux platform.
> >
> > OK with moving the BB check before the is_widening_mult_p call since it's 
> > way
> > cheaper.
>
> That's a good point.  I have attached the v2 patch.
> Also did a spec2017 test on aarch64, no obvious impact witnessed with this.
> Can you sponsor this patch please?  My networking does not work well and I am 
> having some trouble pushing it : - (

Pushed.  For the future can you please attach patches suitable for git am?

Thanks,
Richard.

> git commit msg:
>
> widening_mul: restrict ops to be defined in the same basic-block when 
> convert plusminus to widen
>
> In the testcase for PR94269, widening_mul moves two multiply instructions 
> from outside the loop to inside
> the loop, merging with two add instructions separately.  This increases 
> the cost of the loop.  Like FMA detection
> in the same pass, simply restrict ops to be defined in the same 
> basic-block to avoid possibly moving multiply
> to a different block with a higher execution frequency.
>
> 2020-03-26  Felix Yang  
>
> gcc/
> PR tree-optimization/94269
> * tree-ssa-math-opts.c (convert_plusminus_to_widen): Restrict this
> operation to single basic block.
>
> gcc/testsuite/
> PR tree-optimization/94269
> * gcc.dg/pr94269.c: New test.
>
> change log:
>
> gcc:
> +2020-03-26  Felix Yang  
> +
> +   PR tree-optimization/94269
> +   * tree-ssa-math-opts.c (convert_plusminus_to_widen): Restrict this
> +   operation to single basic block.
>
> gcc/testsuite:
> +2020-03-26  Felix Yang  
> +
> +   PR tree-optimization/94269
> +   * gcc.dg/pr94269.c: New test.
> +


RE: [RFC] Should widening_mul should consider block frequency?

2020-03-25 Thread Yangfei (Felix)
Hi!

> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Tuesday, March 24, 2020 10:14 PM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [RFC] Should widening_mul should consider block frequency?
> 
> > > As written in the PR I'd follow fma generation and restrict defs to the 
> > > same
> BB.
> >
> > Thanks for the suggestion.  That should be more consistent.
> > Attached please find the adapted patch.
> > Bootstrap and tested on both x86_64 and aarch64 Linux platform.
> 
> OK with moving the BB check before the is_widening_mult_p call since it's way
> cheaper.

That's a good point.  I have attached the v2 patch.  
Also did a spec2017 test on aarch64, no obvious impact witnessed with this.  
Can you sponsor this patch please?  My networking does not work well and I am 
having some trouble pushing it : - (  

git commit msg: 

widening_mul: restrict ops to be defined in the same basic-block when 
convert plusminus to widen

In the testcase for PR94269, widening_mul moves two multiply instructions 
from outside the loop to inside
the loop, merging with two add instructions separately.  This increases the 
cost of the loop.  Like FMA detection
in the same pass, simply restrict ops to be defined in the same basic-block 
to avoid possibly moving multiply
to a different block with a higher execution frequency.  

2020-03-26  Felix Yang  

gcc/
PR tree-optimization/94269
* tree-ssa-math-opts.c (convert_plusminus_to_widen): Restrict this
operation to single basic block.

gcc/testsuite/
PR tree-optimization/94269
* gcc.dg/pr94269.c: New test.

change log:

gcc:
+2020-03-26  Felix Yang  
+
+   PR tree-optimization/94269
+   * tree-ssa-math-opts.c (convert_plusminus_to_widen): Restrict this
+   operation to single basic block.

gcc/testsuite:
+2020-03-26  Felix Yang  
+
+   PR tree-optimization/94269
+   * gcc.dg/pr94269.c: New test.
+


pr94269-v2.patch
Description: pr94269-v2.patch


Re: [RFC] Should widening_mul should consider block frequency?

2020-03-24 Thread Richard Biener via Gcc-patches
On Tue, Mar 24, 2020 at 12:37 PM Yangfei (Felix)  wrote:
>
> Hi!
>
> > -Original Message-
> > From: Richard Biener [mailto:richard.guent...@gmail.com]
> > Sent: Monday, March 23, 2020 11:25 PM
> > To: Yangfei (Felix) 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [RFC] Should widening_mul should consider block frequency?
> >
> > On Mon, Mar 23, 2020 at 10:53 AM Yangfei (Felix) 
> > wrote:
> > >
> > > Hi,
> > >
> > >   I created a bug for this issue:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94269
> > >   Looks like widening_mul phase may move multiply instruction from outside
> > the loop to inside the loop, merging with one add instruction inside the 
> > loop.
> > >   This will increase the cost of the loop at least on aarch64 (4 cycles 
> > > vs 1
> > cycle).  I think widening_mul should consider block frequency when doing 
> > such
> > a combination.
> > >   I mean something like:
> >
> > As written in the PR I'd follow fma generation and restrict defs to the 
> > same BB.
>
> Thanks for the suggestion.  That should be more consistent.
> Attached please find the adapted patch.
> Bootstrap and tested on both x86_64 and aarch64 Linux platform.

OK with moving the BB check before the is_widening_mult_p call
since it's way cheaper.

Thanks,
Richard.

> gcc:
> +2020-03-24  Felix Yang  
> +
> +   PR tree-optimization/94269
> +   * tree-ssa-math-opts.c (convert_plusminus_to_widen): Restrict this
> +   operation to single basic block.
>
> gcc/testsuite:
> +2020-03-24  Felix Yang  
> +
> +   PR tree-optimization/94269
> +   * gcc.dg/pr94269.c: New test.
> +
>
> Thanks,
> Felix


RE: [RFC] Should widening_mul should consider block frequency?

2020-03-24 Thread Yangfei (Felix)
Hi!

> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Monday, March 23, 2020 11:25 PM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [RFC] Should widening_mul should consider block frequency?
> 
> On Mon, Mar 23, 2020 at 10:53 AM Yangfei (Felix) 
> wrote:
> >
> > Hi,
> >
> >   I created a bug for this issue:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94269
> >   Looks like widening_mul phase may move multiply instruction from outside
> the loop to inside the loop, merging with one add instruction inside the loop.
> >   This will increase the cost of the loop at least on aarch64 (4 cycles vs 1
> cycle).  I think widening_mul should consider block frequency when doing such
> a combination.
> >   I mean something like:
> 
> As written in the PR I'd follow fma generation and restrict defs to the same 
> BB.

Thanks for the suggestion.  That should be more consistent.  
Attached please find the adapted patch.  
Bootstrap and tested on both x86_64 and aarch64 Linux platform.  

gcc:
+2020-03-24  Felix Yang  
+
+   PR tree-optimization/94269
+   * tree-ssa-math-opts.c (convert_plusminus_to_widen): Restrict this
+   operation to single basic block.

gcc/testsuite:
+2020-03-24  Felix Yang  
+
+   PR tree-optimization/94269
+   * gcc.dg/pr94269.c: New test.
+

Thanks,
Felix


pr94269-v1.patch
Description: pr94269-v1.patch


Re: [RFC] Should widening_mul should consider block frequency?

2020-03-23 Thread Richard Biener via Gcc-patches
On Mon, Mar 23, 2020 at 10:53 AM Yangfei (Felix)  wrote:
>
> Hi,
>
>   I created a bug for this issue: 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94269
>   Looks like widening_mul phase may move multiply instruction from outside 
> the loop to inside the loop, merging with one add instruction inside the loop.
>   This will increase the cost of the loop at least on aarch64 (4 cycles vs 1 
> cycle).  I think widening_mul should consider block frequency when doing such 
> a combination.
>   I mean something like:

As written in the PR I'd follow fma generation and restrict defs to the same BB.

Richard.

> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> index 54ba035..4439452 100644
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -2721,7 +2721,10 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, 
> gimple *stmt,
>  {
>if (!has_single_use (rhs1)
>   || !is_widening_mult_p (rhs1_stmt, , _rhs1,
> - , _rhs2))
> + , _rhs2)
> + || (gimple_bb (rhs1_stmt) != gimple_bb (stmt)
> + && gimple_bb (rhs1_stmt)->count.to_frequency(cfun)
> +< gimple_bb (stmt)->count.to_frequency(cfun)))
> return false;
>add_rhs = rhs2;
>conv_stmt = conv1_stmt;
> @@ -2730,7 +2733,10 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, 
> gimple *stmt,
>  {
>if (!has_single_use (rhs2)
>   || !is_widening_mult_p (rhs2_stmt, , _rhs1,
> - , _rhs2))
> + , _rhs2)
> + || (gimple_bb (rhs2_stmt) != gimple_bb (stmt)
> + && gimple_bb (rhs2_stmt)->count.to_frequency(cfun)
> +< gimple_bb (stmt)->count.to_frequency(cfun)))
> return false;
>add_rhs = rhs1;
>conv_stmt = conv2_stmt;
>
>   Comments?
>
> Thanks,
> Felix


[RFC] Should widening_mul should consider block frequency?

2020-03-23 Thread Yangfei (Felix)
Hi,

  I created a bug for this issue: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94269 
  Looks like widening_mul phase may move multiply instruction from outside the 
loop to inside the loop, merging with one add instruction inside the loop.  
  This will increase the cost of the loop at least on aarch64 (4 cycles vs 1 
cycle).  I think widening_mul should consider block frequency when doing such a 
combination.  
  I mean something like:
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 54ba035..4439452 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2721,7 +2721,10 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, 
gimple *stmt,
 {
   if (!has_single_use (rhs1)
  || !is_widening_mult_p (rhs1_stmt, , _rhs1,
- , _rhs2))
+ , _rhs2)
+ || (gimple_bb (rhs1_stmt) != gimple_bb (stmt)
+ && gimple_bb (rhs1_stmt)->count.to_frequency(cfun)
+< gimple_bb (stmt)->count.to_frequency(cfun)))
return false;
   add_rhs = rhs2;
   conv_stmt = conv1_stmt;
@@ -2730,7 +2733,10 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, 
gimple *stmt,
 {
   if (!has_single_use (rhs2)
  || !is_widening_mult_p (rhs2_stmt, , _rhs1,
- , _rhs2))
+ , _rhs2)
+ || (gimple_bb (rhs2_stmt) != gimple_bb (stmt)
+ && gimple_bb (rhs2_stmt)->count.to_frequency(cfun)
+< gimple_bb (stmt)->count.to_frequency(cfun)))
return false;
   add_rhs = rhs1;
   conv_stmt = conv2_stmt;

  Comments?

Thanks,
Felix