Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-07-12 Thread Jeff Law
On 7/8/19 1:39 AM, JiangNing OS wrote:
> Hi Jeff and Richard B.,
> 
> Following your tips, I've found a much simpler solution in tree-ssa-phiopt.c. 
> Attached is the new patch. Review again, please!
> 
> Thanks a lot!
> -Jiangning
> 
>> -Original Message-
>> From: Jeff Law 
>> Sent: Saturday, June 22, 2019 12:10 AM
>> To: Richard Biener 
>> Cc: JiangNing OS ; gcc-
>> patc...@gcc.gnu.org
>> Subject: Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
>>
>> On 6/21/19 6:23 AM, Richard Biener wrote:
>>> That looks like a pretty easy form to analyze.  I'd suggest looking
>>> through tree-ssa-phiopt.c closely.  There's several transformations in
>>> there that share similarities with yours.
>>>
>>>
>>> More specifically there is the conditional store elimination (cselim)
>>> pass inside this file.
>> That's the one I was thinking of.  It looks reasonably close to the cases
>> JiangNing is trying to capture.
>>
>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>> 2) My current solution fits into current back-end if-conversion
>>> pass very well. I don't want to invent
>>>> a new framework to solve this relatively small issue. Besides,
>>> this back-end patch doesn't only
>>>> enhance store speculation detection, but also fix a bug in the
>>> original code. Understood, but I still wonder if we're better off
>>> addressing this in gimple.
>>>
>>>
>>> Traditionally if-conversions profitability heavily depends on the
>>> target and esp. if memory is involved costing on GIMPLE is hard.
>>> That's also one reason why we turned off loop if-conversion on GIMPLE
>>> for non-vectorized code.
>> Yea.  That's always the concern for transformations that aren't trivial to 
>> show
>> are better.
>>
>> Conditional store elimination as implemented right now in phiopt requires a
>> single store in the then/else clauses.  So we're really just sinking the 
>> stores
>> through a PHI.  We're really not changing the number of loads or stores on
>> any path.
>>
>> In the cases I think JiangNing is trying to handle we are adding a store on a
>> path where it didn't previously exist because there is no else clause.  So we
>> likely need to check the edge probabilities to avoid doing something dumb.  I
>> don't have a good sense where the cutoff should be.  tree-ssa-sink might
>> have some nuggets of information to help guide.
>>
>>> phiopt is really about followup optimization opportunities in passes
>>> that do not handle conditionals well.
>>>
>>> cselim is on the border...
>> ACK.  In fact, looking at it it I wonder if it'd be better in 
>> tree-ssa-sink.c :-)
>>
>>
>> jeff
> 
> csel5.patch
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 9f83f75b0bf..1a7acf7f1ba 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2019-07-08  Jiangning Liu  
> +
> + PR tree-optimization/89430
> + * tree-ssa-phiopt.c (cond_store_replacement): Support conditional
> + store elimination for local variable without address escape.
> +
>  2019-07-07  Jeff Law  
>  
>   PR tree-optimization/91090
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 12e5bc167e0..65a34b43fb5 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,13 @@
> +2019-07-08  Jiangning Liu  
> +
> + PR tree-optimization/89430
> + * gcc.dg/tree-ssa/pr89430-1.c: New test.
> + * gcc.dg/tree-ssa/pr89430-2.c: New test.
> + * gcc.dg/tree-ssa/pr89430-3.c: New test.
> + * gcc.dg/tree-ssa/pr89430-4.c: New test.
> + * gcc.dg/tree-ssa/pr89430-5.c: New test.
> + * gcc.dg/tree-ssa/pr89430-6.c: New test.
THanks!

I made a trivial update to the comment before cond_store_replacement and
installed this patch on the trunk.

jeff


Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-07-12 Thread Jeff Law
On 7/11/19 10:08 PM, Andrew Pinski wrote:
> On Mon, Jul 8, 2019 at 12:39 AM JiangNing OS
>  wrote:
>>
>> Hi Jeff and Richard B.,
>>
>> Following your tips, I've found a much simpler solution in 
>> tree-ssa-phiopt.c. Attached is the new patch. Review again, please!
> 
>   /* Prove that we can move the store down.  We could also check
>  TREE_THIS_NOTRAP here, but in that case we also could move stores,
>  whose value is not available readily, which we want to avoid.  */
> 
> I think the comment above the change needs to be changed or extended slightly.
Actually, I don't think that comment needs to change.  But the one
before cond_store_replacement needs minor updating which I'll take care of.

My reasoning is that addressables in the local stack are most likely
going to already have their lines in the cache and thus are available
readily.


Jeff


Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-07-11 Thread Andrew Pinski
On Mon, Jul 8, 2019 at 12:39 AM JiangNing OS
 wrote:
>
> Hi Jeff and Richard B.,
>
> Following your tips, I've found a much simpler solution in tree-ssa-phiopt.c. 
> Attached is the new patch. Review again, please!

  /* Prove that we can move the store down.  We could also check
 TREE_THIS_NOTRAP here, but in that case we also could move stores,
 whose value is not available readily, which we want to avoid.  */

I think the comment above the change needs to be changed or extended slightly.

Thanks,
Andrew Pinski

>
> Thanks a lot!
> -Jiangning
>
> > -Original Message-
> > From: Jeff Law 
> > Sent: Saturday, June 22, 2019 12:10 AM
> > To: Richard Biener 
> > Cc: JiangNing OS ; gcc-
> > patc...@gcc.gnu.org
> > Subject: Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
> >
> > On 6/21/19 6:23 AM, Richard Biener wrote:
> > >
> > > That looks like a pretty easy form to analyze.  I'd suggest looking
> > > through tree-ssa-phiopt.c closely.  There's several transformations in
> > > there that share similarities with yours.
> > >
> > >
> > > More specifically there is the conditional store elimination (cselim)
> > > pass inside this file.
> > That's the one I was thinking of.  It looks reasonably close to the cases
> > JiangNing is trying to capture.
> >
> >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >> 2) My current solution fits into current back-end if-conversion
> > > pass very well. I don't want to invent
> > >> a new framework to solve this relatively small issue. Besides,
> > > this back-end patch doesn't only
> > >> enhance store speculation detection, but also fix a bug in the
> > > original code. Understood, but I still wonder if we're better off
> > > addressing this in gimple.
> > >
> > >
> > > Traditionally if-conversions profitability heavily depends on the
> > > target and esp. if memory is involved costing on GIMPLE is hard.
> > > That's also one reason why we turned off loop if-conversion on GIMPLE
> > > for non-vectorized code.
> > Yea.  That's always the concern for transformations that aren't trivial to 
> > show
> > are better.
> >
> > Conditional store elimination as implemented right now in phiopt requires a
> > single store in the then/else clauses.  So we're really just sinking the 
> > stores
> > through a PHI.  We're really not changing the number of loads or stores on
> > any path.
> >
> > In the cases I think JiangNing is trying to handle we are adding a store on 
> > a
> > path where it didn't previously exist because there is no else clause.  So 
> > we
> > likely need to check the edge probabilities to avoid doing something dumb.  
> > I
> > don't have a good sense where the cutoff should be.  tree-ssa-sink might
> > have some nuggets of information to help guide.
> >
> > >
> > > phiopt is really about followup optimization opportunities in passes
> > > that do not handle conditionals well.
> > >
> > > cselim is on the border...
> > ACK.  In fact, looking at it it I wonder if it'd be better in 
> > tree-ssa-sink.c :-)
> >
> >
> > jeff


RE: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-07-08 Thread JiangNing OS
Hi Jeff and Richard B.,

Following your tips, I've found a much simpler solution in tree-ssa-phiopt.c. 
Attached is the new patch. Review again, please!

Thanks a lot!
-Jiangning

> -Original Message-
> From: Jeff Law 
> Sent: Saturday, June 22, 2019 12:10 AM
> To: Richard Biener 
> Cc: JiangNing OS ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
> 
> On 6/21/19 6:23 AM, Richard Biener wrote:
> >
> > That looks like a pretty easy form to analyze.  I'd suggest looking
> > through tree-ssa-phiopt.c closely.  There's several transformations in
> > there that share similarities with yours.
> >
> >
> > More specifically there is the conditional store elimination (cselim)
> > pass inside this file.
> That's the one I was thinking of.  It looks reasonably close to the cases
> JiangNing is trying to capture.
> 
> 
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >> 2) My current solution fits into current back-end if-conversion
> > pass very well. I don't want to invent
> >> a new framework to solve this relatively small issue. Besides,
> > this back-end patch doesn't only
> >> enhance store speculation detection, but also fix a bug in the
> > original code. Understood, but I still wonder if we're better off
> > addressing this in gimple.
> >
> >
> > Traditionally if-conversions profitability heavily depends on the
> > target and esp. if memory is involved costing on GIMPLE is hard.
> > That's also one reason why we turned off loop if-conversion on GIMPLE
> > for non-vectorized code.
> Yea.  That's always the concern for transformations that aren't trivial to 
> show
> are better.
> 
> Conditional store elimination as implemented right now in phiopt requires a
> single store in the then/else clauses.  So we're really just sinking the 
> stores
> through a PHI.  We're really not changing the number of loads or stores on
> any path.
> 
> In the cases I think JiangNing is trying to handle we are adding a store on a
> path where it didn't previously exist because there is no else clause.  So we
> likely need to check the edge probabilities to avoid doing something dumb.  I
> don't have a good sense where the cutoff should be.  tree-ssa-sink might
> have some nuggets of information to help guide.
> 
> >
> > phiopt is really about followup optimization opportunities in passes
> > that do not handle conditionals well.
> >
> > cselim is on the border...
> ACK.  In fact, looking at it it I wonder if it'd be better in tree-ssa-sink.c 
> :-)
> 
> 
> jeff


csel5.patch
Description: csel5.patch


Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-06-21 Thread Jeff Law
On 6/21/19 6:23 AM, Richard Biener wrote:
> 
> That looks like a pretty easy form to analyze.  I'd suggest looking 
> through tree-ssa-phiopt.c closely.  There's several transformations
> in there that share similarities with yours.
> 
> 
> More specifically there is the conditional store elimination (cselim)
> pass inside this file.
That's the one I was thinking of.  It looks reasonably close to the
cases JiangNing is trying to capture.


> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>> 2) My current solution fits into current back-end if-conversion
> pass very well. I don't want to invent
>> a new framework to solve this relatively small issue. Besides,
> this back-end patch doesn't only
>> enhance store speculation detection, but also fix a bug in the
> original code. Understood, but I still wonder if we're better off
> addressing this in gimple.
> 
> 
> Traditionally if-conversions profitability heavily depends on the
> target and esp. if memory is involved costing on GIMPLE is hard.
> That's also one reason why we turned off loop if-conversion on GIMPLE
> for non-vectorized code.
Yea.  That's always the concern for transformations that aren't trivial
to show are better.

Conditional store elimination as implemented right now in phiopt
requires a single store in the then/else clauses.  So we're really just
sinking the stores through a PHI.  We're really not changing the number
of loads or stores on any path.

In the cases I think JiangNing is trying to handle we are adding a store
on a path where it didn't previously exist because there is no else
clause.  So we likely need to check the edge probabilities to avoid
doing something dumb.  I don't have a good sense where the cutoff should
be.  tree-ssa-sink might have some nuggets of information to help guide.

> 
> phiopt is really about followup optimization opportunities in passes
> that do not handle conditionals well.
> 
> cselim is on the border...
ACK.  In fact, looking at it it I wonder if it'd be better in
tree-ssa-sink.c :-)


jeff


Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-06-21 Thread Richard Biener
On Fri, Jun 21, 2019 at 1:13 AM Jeff Law  wrote:

> On 6/20/19 3:53 AM, JiangNing OS wrote:
> > Hi Jeff,
> >
> > Appreciate your effort to review my patch! I've updated my patch as
> attached. See my answers below.
> >
> >> in current function, so the store speculation can be avoided.
> >> So at a high level should we be doing this in gimple rather than RTL?
> >> We're going to have a lot more information about types, better
> >> infrastructure for looking at uses/defs, access to the alias oracle, we
> should
> >> be able to accurately distinguish between potentially shared objects vs
> those
> >> which are local to the thread, etc.  We lose the low level costing
> information
> >> though.
> >>
> >> I'm still going to go through the patch and do some level of review,
> but I do
> >> think we need to answer the higher level question though.
> >>
> > I have the following reasons,
> >
> > 1) Following the clue Richard B gave me before about parameter --param
> allow-store-data-races,
> > I did check the middle-end pass tree-if-conv, but I think this pass at
> the moment doesn't work
> > for the issue I'm trying to solve. Tree-if-conv is to do if conversion
> for loop, and its final goal is to
> > help loop vectorization, while my case doesn't have a loop at all.
> I think the fact that it's focused so much on loops is a historical
> accident.  We certainly have a variety of places in the gimple
> optimizers that do if-conversion, and they're not all in tree-if-conv :(
>  For example, some are done in tree-ssa-phiopt.
>
> In the gimple optimizers the testcase from 89430 is going to look
> something like this:
>
>
> > ;   basic block 2, loop depth 0, count 1073741824 (estimated locally),
> maybe hot
> > ;;prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
> > ;;pred:   ENTRY [always]  count:1073741824 (estimated locally)
> (FALLTHRU,EXECUTABLE)
> >   a.0_1 = a;
> >   _2 = (long unsigned int) k_8(D);
> >   _3 = _2 * 4;
> >   _4 = a.0_1 + _3;
> >   _5 = *_4;
> >   if (_5 > b_9(D))
> > goto ; [50.00%]
> >   else
> > goto ; [50.00%]
> > ;;succ:   3 [50.0% (guessed)]  count:536870912 (estimated
> locally) (TRUE_VALUE,EXECUTABLE)
> > ;;4 [50.0% (guessed)]  count:536870912 (estimated
> locally) (FALSE_VALUE,EXECUTABLE)
> >
> > ;;   basic block 3, loop depth 0, count 536870913 (estimated locally),
> maybe hot
> > ;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
> > ;;pred:   2 [50.0% (guessed)]  count:536870912 (estimated
> locally) (TRUE_VALUE,EXECUTABLE)
> >   *_4 = b_9(D);
> > ;;succ:   4 [always]  count:536870913 (estimated locally)
> (FALLTHRU,EXECUTABLE)
> >
> > ;;   basic block 4, loop depth 0, count 1073741824 (estimated locally),
> maybe hot
> > ;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
> > ;;pred:   3 [always]  count:536870913 (estimated locally)
> (FALLTHRU,EXECUTABLE)
> > ;;2 [50.0% (guessed)]  count:536870912 (estimated
> locally) (FALSE_VALUE,EXECUTABLE)
> >   return;
>
> That looks like a pretty easy form to analyze.  I'd suggest looking
> through tree-ssa-phiopt.c closely.  There's several transformations in
> there that share similarities with yours.
>

More specifically there is the conditional store elimination (cselim) pass
inside this file.


>
>
>
>
>
>
>
>
> > 2) My current solution fits into current back-end if-conversion pass
> very well. I don't want to invent
> > a new framework to solve this relatively small issue. Besides, this
> back-end patch doesn't only
> > enhance store speculation detection, but also fix a bug in the original
> code.
> Understood, but I still wonder if we're better off addressing this in
> gimple.
>

Traditionally if-conversions profitability heavily depends on the target
and esp.
if memory is involved costing on GIMPLE is hard.  That's also one reason why
we turned off loop if-conversion on GIMPLE for non-vectorized code.

phiopt is really about followup optimization opportunities in passes that do
not handle conditionals well.

cselim is on the border...


> >> Just from a design standpoint, what are the consequences if this
> function
> >> returns true for something that isn't actually in the stack or false for
> >> something that is on the stack?
> >>
> > If noce_mem_is_on_stack returns true for something that isn't actually
> in the stack,
> > it could potentially introduce store speculation, then the if-conversion
> optimization
> > will be incorrect. If this function returns false for something that is
> on stack, it doesn't
> > matter, because the optimization will not be triggered.
> OK.  That's what I expected.
> >
> >
> >
> >
> >>
> >>> +
> >>> +/* Always return true, if there is a dominating write.
> >>> +
> >>> +   When there is a dominating read from memory on stack,
> >>> +   1) if x = a is a memory read, return true.
> >>> +   2) if x = a is a memory write, return true if the memory is on
> stack.
> >>> +  This 

Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-06-20 Thread Jeff Law
On 6/20/19 3:53 AM, JiangNing OS wrote:
> Hi Jeff,
> 
> Appreciate your effort to review my patch! I've updated my patch as attached. 
> See my answers below.
> 
>> in current function, so the store speculation can be avoided.
>> So at a high level should we be doing this in gimple rather than RTL?
>> We're going to have a lot more information about types, better
>> infrastructure for looking at uses/defs, access to the alias oracle, we 
>> should
>> be able to accurately distinguish between potentially shared objects vs those
>> which are local to the thread, etc.  We lose the low level costing 
>> information
>> though.
>>
>> I'm still going to go through the patch and do some level of review, but I do
>> think we need to answer the higher level question though.
>>
> I have the following reasons,
> 
> 1) Following the clue Richard B gave me before about parameter --param 
> allow-store-data-races,
> I did check the middle-end pass tree-if-conv, but I think this pass at the 
> moment doesn't work
> for the issue I'm trying to solve. Tree-if-conv is to do if conversion for 
> loop, and its final goal is to
> help loop vectorization, while my case doesn't have a loop at all. 
I think the fact that it's focused so much on loops is a historical
accident.  We certainly have a variety of places in the gimple
optimizers that do if-conversion, and they're not all in tree-if-conv :(
 For example, some are done in tree-ssa-phiopt.

In the gimple optimizers the testcase from 89430 is going to look
something like this:


> ;   basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe 
> hot
> ;;prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   ENTRY [always]  count:1073741824 (estimated locally) 
> (FALLTHRU,EXECUTABLE)
>   a.0_1 = a;
>   _2 = (long unsigned int) k_8(D);
>   _3 = _2 * 4;
>   _4 = a.0_1 + _3;
>   _5 = *_4;
>   if (_5 > b_9(D))
> goto ; [50.00%]
>   else
> goto ; [50.00%]
> ;;succ:   3 [50.0% (guessed)]  count:536870912 (estimated locally) 
> (TRUE_VALUE,EXECUTABLE)
> ;;4 [50.0% (guessed)]  count:536870912 (estimated locally) 
> (FALSE_VALUE,EXECUTABLE)
> 
> ;;   basic block 3, loop depth 0, count 536870913 (estimated locally), maybe 
> hot
> ;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   2 [50.0% (guessed)]  count:536870912 (estimated locally) 
> (TRUE_VALUE,EXECUTABLE)
>   *_4 = b_9(D);
> ;;succ:   4 [always]  count:536870913 (estimated locally) 
> (FALLTHRU,EXECUTABLE)
> 
> ;;   basic block 4, loop depth 0, count 1073741824 (estimated locally), maybe 
> hot
> ;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   3 [always]  count:536870913 (estimated locally) 
> (FALLTHRU,EXECUTABLE)
> ;;2 [50.0% (guessed)]  count:536870912 (estimated locally) 
> (FALSE_VALUE,EXECUTABLE)
>   return;

That looks like a pretty easy form to analyze.  I'd suggest looking
through tree-ssa-phiopt.c closely.  There's several transformations in
there that share similarities with yours.










> 2) My current solution fits into current back-end if-conversion pass very 
> well. I don't want to invent
> a new framework to solve this relatively small issue. Besides, this back-end 
> patch doesn't only
> enhance store speculation detection, but also fix a bug in the original code. 
Understood, but I still wonder if we're better off addressing this in
gimple.


>> Just from a design standpoint, what are the consequences if this function
>> returns true for something that isn't actually in the stack or false for
>> something that is on the stack?
>>
> If noce_mem_is_on_stack returns true for something that isn't actually in the 
> stack, 
> it could potentially introduce store speculation, then the if-conversion 
> optimization
> will be incorrect. If this function returns false for something that is on 
> stack, it doesn't
> matter, because the optimization will not be triggered. 
OK.  That's what I expected.
> 
> 
> 
> 
>>
>>> +
>>> +/* Always return true, if there is a dominating write.
>>> +
>>> +   When there is a dominating read from memory on stack,
>>> +   1) if x = a is a memory read, return true.
>>> +   2) if x = a is a memory write, return true if the memory is on stack.
>>> +  This is the guarantee the memory is *not* readonly. */
>>> +
>>> +static bool
>>> +noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn,
>>> +   const_rtx x, bool is_store) {
>>> +  rtx_insn *insn;
>>> +  rtx set;
>>> +
>>> +  gcc_assert (MEM_P (x));
>>> +
>>> +  FOR_BB_INSNS (bb, insn)
>>> +{
>>> +  set = single_set (insn);
>>> +  if (!set)
>>> +continue;
>>> +
>>> +  /* Dominating store */
>>> +  if (rtx_equal_p (x, SET_DEST (set)))
>>> +return true;
>>> +
>>> +  /* Dominating load */
>>> +  if (rtx_equal_p (x, SET_SRC (set)))
>>> +if (is_store && noce_mem_is_on_stack (a_insn, 

RE: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-06-20 Thread JiangNing OS
Hi Jeff,

Appreciate your effort to review my patch! I've updated my patch as attached. 
See my answers below.

> in current function, so the store speculation can be avoided.
> So at a high level should we be doing this in gimple rather than RTL?
> We're going to have a lot more information about types, better
> infrastructure for looking at uses/defs, access to the alias oracle, we should
> be able to accurately distinguish between potentially shared objects vs those
> which are local to the thread, etc.  We lose the low level costing information
> though.
> 
> I'm still going to go through the patch and do some level of review, but I do
> think we need to answer the higher level question though.
> 
I have the following reasons,

1) Following the clue Richard B gave me before about parameter --param 
allow-store-data-races,
I did check the middle-end pass tree-if-conv, but I think this pass at the 
moment doesn't work
for the issue I'm trying to solve. Tree-if-conv is to do if conversion for 
loop, and its final goal is to
help loop vectorization, while my case doesn't have a loop at all. 
2) My current solution fits into current back-end if-conversion pass very well. 
I don't want to invent
a new framework to solve this relatively small issue. Besides, this back-end 
patch doesn't only
enhance store speculation detection, but also fix a bug in the original code. 

> Nits: We typically refer to parameters, variables, etc in comments using
> upper case.  You'll need to review the entire patch for these its.
> 
> So perhaps the comment should be something like:
> 
> /* Return true of X, a MEM expression, is on the stack.  A_INSN contains
>X if A_INSN exists.  */
> 
Fixed in attached new patch.

> 
> Just from a design standpoint, what are the consequences if this function
> returns true for something that isn't actually in the stack or false for
> something that is on the stack?
> 
If noce_mem_is_on_stack returns true for something that isn't actually in the 
stack, 
it could potentially introduce store speculation, then the if-conversion 
optimization
will be incorrect. If this function returns false for something that is on 
stack, it doesn't
matter, because the optimization will not be triggered. 

> Nit: Space between the function name and its open paren for arguments.  ie
> 
> if (fixed_base_plus_p (a)
>  ^
> I see other instances of this nit, please review the patch and correct them.
> 
Fixed in attached new patch.

> 
> > +
> > +  if (!a_insn)
> > +return false;
> I'm not sure what the calling profile is for this function, but this is a 
> cheaper
> test, so you might consider moving it before the test of fixed_base_plus_p.
> 
Fixed in attached new patch.

> 
> > +
> > +  if (!reg_mentioned_p (x, a_insn))
> > +return false;
> > +
> > +  /* Check if x is on stack. Assume a mem expression using registers
> > + related to stack register is always on stack. */
> > + FOR_EACH_INSN_USE (use, a_insn)
> > +if (reg_mentioned_p (DF_REF_REG (use), x)
> > +&& bitmap_bit_p (bba_sets_must_be_sfp, DF_REF_REGNO (use)))
> > +  return true;
> > +
> > +  return false;
> > +}
> So is X always a MEM?  Just wanted to make sure I understand.
> reg_mentioned_p will do the right thing (using rtx_equal_p) for the
> comparison.
> 
Yes. X is always a MEM. There is an assertion for this in the code above.

> 
> > +
> > +/* Always return true, if there is a dominating write.
> > +
> > +   When there is a dominating read from memory on stack,
> > +   1) if x = a is a memory read, return true.
> > +   2) if x = a is a memory write, return true if the memory is on stack.
> > +  This is the guarantee the memory is *not* readonly. */
> > +
> > +static bool
> > +noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn,
> > +   const_rtx x, bool is_store) {
> > +  rtx_insn *insn;
> > +  rtx set;
> > +
> > +  gcc_assert (MEM_P (x));
> > +
> > +  FOR_BB_INSNS (bb, insn)
> > +{
> > +  set = single_set (insn);
> > +  if (!set)
> > +continue;
> > +
> > +  /* Dominating store */
> > +  if (rtx_equal_p (x, SET_DEST (set)))
> > +return true;
> > +
> > +  /* Dominating load */
> > +  if (rtx_equal_p (x, SET_SRC (set)))
> > +if (is_store && noce_mem_is_on_stack (a_insn, x))
> > +  return true;
> > +}
> > +
> > +  return false;
> > +}
> So what would be the consequences here of returning false when in fact
> there was a dominating read or write?  That could easily happen if the
> dominating read or write was not a single_set.
If return false when in fact there is a dominating read or write, the 
optimization will not
be triggered, because noce_mem_maybe_invalid_p will be true, which is playing 
the same
role as may_trap_or_fault_p.

> 
> I'm guessing that from a design standpoint you're trying to find cases where
> you know an object was written to within the block and does not escape.  So
> 

Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-06-17 Thread Jeff Law
On 3/14/19 10:46 PM, JiangNing OS wrote:
> This patch is to fix a missing ifcvt opportunity in back-end. For the simple 
> case below,
> 
> if (...)
> x = a;  /* x is memory */
> /* no else */
> 
> We can generate conditional move and remove the branch as below if the target 
> cost is acceptable. 
> 
> r1 = x
> r2 = a
> cmp ...
> csel r3, r1, r2, cond
> x = r3
> 
> This could be safe if x is a stack variable, and there isn't any address 
> taken in current function, so the store speculation can be avoided. 
So at a high level should we be doing this in gimple rather than RTL?
We're going to have a lot more information about types, better
infrastructure for looking at uses/defs, access to the alias oracle, we
should be able to accurately distinguish between potentially shared
objects vs those which are local to the thread, etc.  We lose the low
level costing information though.

I'm still going to go through the patch and do some level of review, but
I do think we need to answer the higher level question though.


>  
> Index: ifcvt.c
> ===
> --- ifcvt.c   (revision 269698)
> +++ ifcvt.c   (working copy)
> @@ -2029,6 +2045,93 @@
>return true;
>  }
>  
> +/* Return true if MEM x is on stack. a_insn contains x, if it exists. */
Nits: We typically refer to parameters, variables, etc in comments using
upper case.  You'll need to review the entire patch for these its.

So perhaps the comment should be something like:

/* Return true of X, a MEM expression, is on the stack.  A_INSN contains
   X if A_INSN exists.  */


Just from a design standpoint, what are the consequences if this
function returns true for something that isn't actually in the stack or
false for something that is on the stack?



> +
> +static bool
> +noce_mem_is_on_stack (rtx_insn *a_insn, const_rtx x)
> +{
> +  df_ref use;
> +
> +  gcc_assert (x);
> +  gcc_assert (MEM_P (x));
> +
> +  /* Early exits if find base is a stack register. */
> +  rtx a = XEXP (x, 0);
> +  if (fixed_base_plus_p(a))
> +return true;
Nit: Space between the function name and its open paren for arguments.  ie

if (fixed_base_plus_p (a)
 ^
I see other instances of this nit, please review the patch and correct them.


> +
> +  if (!a_insn)
> +return false;
I'm not sure what the calling profile is for this function, but this is
a cheaper test, so you might consider moving it before the test of
fixed_base_plus_p.


> +
> +  if (!reg_mentioned_p (x, a_insn))
> +return false;
> +
> +  /* Check if x is on stack. Assume a mem expression using registers
> + related to stack register is always on stack. */
> +  FOR_EACH_INSN_USE (use, a_insn)
> +if (reg_mentioned_p (DF_REF_REG (use), x)
> +&& bitmap_bit_p (bba_sets_must_be_sfp, DF_REF_REGNO (use)))
> +  return true;
> +
> +  return false;
> +}
So is X always a MEM?  Just wanted to make sure I understand.
reg_mentioned_p will do the right thing (using rtx_equal_p) for the
comparison.


> +
> +/* Always return true, if there is a dominating write.
> +   
> +   When there is a dominating read from memory on stack,
> +   1) if x = a is a memory read, return true.
> +   2) if x = a is a memory write, return true if the memory is on stack.
> +  This is the guarantee the memory is *not* readonly. */
> +
> +static bool
> +noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn,
> +   const_rtx x, bool is_store)
> +{
> +  rtx_insn *insn;
> +  rtx set;
> +
> +  gcc_assert (MEM_P (x));
> +
> +  FOR_BB_INSNS (bb, insn)
> +{
> +  set = single_set (insn);
> +  if (!set)
> +continue;
> +
> +  /* Dominating store */
> +  if (rtx_equal_p (x, SET_DEST (set)))
> +return true;
> +
> +  /* Dominating load */
> +  if (rtx_equal_p (x, SET_SRC (set)))
> +if (is_store && noce_mem_is_on_stack (a_insn, x))
> +  return true;
> +}
> +
> +  return false;
> +}
So what would be the consequences here of returning false when in fact
there was a dominating read or write?  That could easily happen if the
dominating read or write was not a single_set.

I'm guessing that from a design standpoint you're trying to find cases
where you know an object was written to within the block and does not
escape.  So returning false when there was a dominating write is safe.
Returning true when there was not would be disastrous.  Right?





> @@ -5347,6 +5462,234 @@
>  
>return FALSE;
>  }
> +
> +
> +/* Find all insns that must be stack address and store REGNO into
> +   bitmap bba_sets_must_be_sfp. */
> +
> +static void
> +find_all_must_be_sfp_insns (void)
> +{
> +  rtx_insn *a_insn;
> +  basic_block bb;
> +  bool sfp_found;
> +  auto_vec def_count;
> +  df_ref def, use;
> +
> +  /* Calculate def counts for each insn. */
> +  def_count.safe_grow_cleared (max_reg_num () + 1);
> +  FOR_ALL_BB_FN (bb, cfun)
> +FOR_BB_INSNS (bb, a_insn)
> +  {
> + 

RE: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-06-16 Thread JiangNing OS
Hi,

May I know any feedback comments to this patch?

Thanks,
-Jiangning

From: Kyrill Tkachov 
Sent: Friday, May 24, 2019 10:55 PM
To: JiangNing OS ; gcc-patches@gcc.gnu.org
Cc: ebotca...@adacore.com; ste...@gcc.gnu.org; l...@redhat.com
Subject: Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)


Hi Jiangning
On 3/15/19 4:46 AM, JiangNing OS wrote:
This patch is to fix a missing ifcvt opportunity in back-end. For the simple 
case below,

if (...)
x = a;  /* x is memory */
/* no else */

We can generate conditional move and remove the branch as below if the target 
cost is acceptable.

r1 = x
r2 = a
cmp ...
csel r3, r1, r2, cond
x = r3

This could be safe if x is a stack variable, and there isn't any address taken 
in current function, so the store speculation can be avoided.

In practice, this optimization can improve a real application performance by %4 
on aarch64.



Now that GCC 10 development is open, this should appropriate for considering.

I've cc'ed folks who are either listed maintainers in this area or have 
reviewed patches in this area in my recent memory.

Thanks,

Kyrill


Thanks,
-Jiangning


Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-05-24 Thread Kyrill Tkachov

Hi Jiangning

On 3/15/19 4:46 AM, JiangNing OS wrote:
This patch is to fix a missing ifcvt opportunity in back-end. For the 
simple case below,


if (...)
    x = a;  /* x is memory */
/* no else */

We can generate conditional move and remove the branch as below if the 
target cost is acceptable.


r1 = x
r2 = a
cmp ...
csel r3, r1, r2, cond
x = r3

This could be safe if x is a stack variable, and there isn't any 
address taken in current function, so the store speculation can be 
avoided.


In practice, this optimization can improve a real application 
performance by %4 on aarch64.




Now that GCC 10 development is open, this should appropriate for 
considering.


I've cc'ed folks who are either listed maintainers in this area or have 
reviewed patches in this area in my recent memory.


Thanks,

Kyrill



Thanks,
-Jiangning


RE: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-05-04 Thread JiangNing OS
Hi Jeff,

Yes. The latter one "[PATCH] improve ifcvt optimization (PR 
rtl-optimization/89430)" supersedes the earlier one " Fixing ifcvt issue as 
exposed by BZ89430".

Thanks,
-Jiangning

-Original Message-
From: Jeff Law  
Sent: Tuesday, April 30, 2019 11:54 PM
To: JiangNing OS ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

On 3/14/19 10:46 PM, JiangNing OS wrote:
> This patch is to fix a missing ifcvt opportunity in back-end. For the 
> simple case below,
> 
> if (...)
> x = a;  /* x is memory */
> /* no else */
> 
> We can generate conditional move and remove the branch as below if the target 
> cost is acceptable. 
> 
> r1 = x
> r2 = a
> cmp ...
> csel r3, r1, r2, cond
> x = r3
> 
> This could be safe if x is a stack variable, and there isn't any address 
> taken in current function, so the store speculation can be avoided. 
> 
> In practice, this optimization can improve a real application performance by 
> %4 on aarch64.
[ ... ]
I notice that a couple weeks you'd sent a similar patch under the heading 
"Fixing ifcvt issue as exposed by BZ89430".  Does this patch supersede the 
earlier patch?

Jeff



Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-04-30 Thread Jeff Law
On 3/14/19 10:46 PM, JiangNing OS wrote:
> This patch is to fix a missing ifcvt opportunity in back-end. For the simple 
> case below,
> 
> if (...)
> x = a;  /* x is memory */
> /* no else */
> 
> We can generate conditional move and remove the branch as below if the target 
> cost is acceptable. 
> 
> r1 = x
> r2 = a
> cmp ...
> csel r3, r1, r2, cond
> x = r3
> 
> This could be safe if x is a stack variable, and there isn't any address 
> taken in current function, so the store speculation can be avoided. 
> 
> In practice, this optimization can improve a real application performance by 
> %4 on aarch64.
[ ... ]
I notice that a couple weeks you'd sent a similar patch under the
heading "Fixing ifcvt issue as exposed by BZ89430".  Does this patch
supersede the earlier patch?

Jeff



Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

2019-03-18 Thread Jeff Law
On 3/14/19 10:46 PM, JiangNing OS wrote:
> This patch is to fix a missing ifcvt opportunity in back-end. For the simple 
> case below,
> 
> if (...)
> x = a;  /* x is memory */
> /* no else */
> 
> We can generate conditional move and remove the branch as below if the target 
> cost is acceptable. 
> 
> r1 = x
> r2 = a
> cmp ...
> csel r3, r1, r2, cond
> x = r3
> 
> This could be safe if x is a stack variable, and there isn't any address 
> taken in current function, so the store speculation can be avoided. 
> 
> In practice, this optimization can improve a real application performance by 
> %4 on aarch64.
This seems like something that should be addressed for gcc-10 rather
than gcc-9.  Can we come back to it once gcc-10 development starts?

jeff