Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
On 7/8/19 1:39 AM, JiangNing OS wrote: > Hi Jeff and Richard B., > > Following your tips, I've found a much simpler solution in tree-ssa-phiopt.c. > Attached is the new patch. Review again, please! > > Thanks a lot! > -Jiangning > >> -Original Message- >> From: Jeff Law >> Sent: Saturday, June 22, 2019 12:10 AM >> To: Richard Biener >> Cc: JiangNing OS ; gcc- >> patc...@gcc.gnu.org >> Subject: Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430) >> >> On 6/21/19 6:23 AM, Richard Biener wrote: >>> That looks like a pretty easy form to analyze. I'd suggest looking >>> through tree-ssa-phiopt.c closely. There's several transformations in >>> there that share similarities with yours. >>> >>> >>> More specifically there is the conditional store elimination (cselim) >>> pass inside this file. >> That's the one I was thinking of. It looks reasonably close to the cases >> JiangNing is trying to capture. >> >> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>>> 2) My current solution fits into current back-end if-conversion >>> pass very well. I don't want to invent >>>> a new framework to solve this relatively small issue. Besides, >>> this back-end patch doesn't only >>>> enhance store speculation detection, but also fix a bug in the >>> original code. Understood, but I still wonder if we're better off >>> addressing this in gimple. >>> >>> >>> Traditionally if-conversions profitability heavily depends on the >>> target and esp. if memory is involved costing on GIMPLE is hard. >>> That's also one reason why we turned off loop if-conversion on GIMPLE >>> for non-vectorized code. >> Yea. That's always the concern for transformations that aren't trivial to >> show >> are better. >> >> Conditional store elimination as implemented right now in phiopt requires a >> single store in the then/else clauses. So we're really just sinking the >> stores >> through a PHI. We're really not changing the number of loads or stores on >> any path. >> >> In the cases I think JiangNing is trying to handle we are adding a store on a >> path where it didn't previously exist because there is no else clause. So we >> likely need to check the edge probabilities to avoid doing something dumb. I >> don't have a good sense where the cutoff should be. tree-ssa-sink might >> have some nuggets of information to help guide. >> >>> phiopt is really about followup optimization opportunities in passes >>> that do not handle conditionals well. >>> >>> cselim is on the border... >> ACK. In fact, looking at it it I wonder if it'd be better in >> tree-ssa-sink.c :-) >> >> >> jeff > > csel5.patch > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog > index 9f83f75b0bf..1a7acf7f1ba 100644 > --- a/gcc/ChangeLog > +++ b/gcc/ChangeLog > @@ -1,3 +1,9 @@ > +2019-07-08 Jiangning Liu > + > + PR tree-optimization/89430 > + * tree-ssa-phiopt.c (cond_store_replacement): Support conditional > + store elimination for local variable without address escape. > + > 2019-07-07 Jeff Law > > PR tree-optimization/91090 > diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog > index 12e5bc167e0..65a34b43fb5 100644 > --- a/gcc/testsuite/ChangeLog > +++ b/gcc/testsuite/ChangeLog > @@ -1,3 +1,13 @@ > +2019-07-08 Jiangning Liu > + > + PR tree-optimization/89430 > + * gcc.dg/tree-ssa/pr89430-1.c: New test. > + * gcc.dg/tree-ssa/pr89430-2.c: New test. > + * gcc.dg/tree-ssa/pr89430-3.c: New test. > + * gcc.dg/tree-ssa/pr89430-4.c: New test. > + * gcc.dg/tree-ssa/pr89430-5.c: New test. > + * gcc.dg/tree-ssa/pr89430-6.c: New test. THanks! I made a trivial update to the comment before cond_store_replacement and installed this patch on the trunk. jeff
Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
On 7/11/19 10:08 PM, Andrew Pinski wrote: > On Mon, Jul 8, 2019 at 12:39 AM JiangNing OS > wrote: >> >> Hi Jeff and Richard B., >> >> Following your tips, I've found a much simpler solution in >> tree-ssa-phiopt.c. Attached is the new patch. Review again, please! > > /* Prove that we can move the store down. We could also check > TREE_THIS_NOTRAP here, but in that case we also could move stores, > whose value is not available readily, which we want to avoid. */ > > I think the comment above the change needs to be changed or extended slightly. Actually, I don't think that comment needs to change. But the one before cond_store_replacement needs minor updating which I'll take care of. My reasoning is that addressables in the local stack are most likely going to already have their lines in the cache and thus are available readily. Jeff
Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
On Mon, Jul 8, 2019 at 12:39 AM JiangNing OS wrote: > > Hi Jeff and Richard B., > > Following your tips, I've found a much simpler solution in tree-ssa-phiopt.c. > Attached is the new patch. Review again, please! /* Prove that we can move the store down. We could also check TREE_THIS_NOTRAP here, but in that case we also could move stores, whose value is not available readily, which we want to avoid. */ I think the comment above the change needs to be changed or extended slightly. Thanks, Andrew Pinski > > Thanks a lot! > -Jiangning > > > -Original Message- > > From: Jeff Law > > Sent: Saturday, June 22, 2019 12:10 AM > > To: Richard Biener > > Cc: JiangNing OS ; gcc- > > patc...@gcc.gnu.org > > Subject: Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430) > > > > On 6/21/19 6:23 AM, Richard Biener wrote: > > > > > > That looks like a pretty easy form to analyze. I'd suggest looking > > > through tree-ssa-phiopt.c closely. There's several transformations in > > > there that share similarities with yours. > > > > > > > > > More specifically there is the conditional store elimination (cselim) > > > pass inside this file. > > That's the one I was thinking of. It looks reasonably close to the cases > > JiangNing is trying to capture. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> 2) My current solution fits into current back-end if-conversion > > > pass very well. I don't want to invent > > >> a new framework to solve this relatively small issue. Besides, > > > this back-end patch doesn't only > > >> enhance store speculation detection, but also fix a bug in the > > > original code. Understood, but I still wonder if we're better off > > > addressing this in gimple. > > > > > > > > > Traditionally if-conversions profitability heavily depends on the > > > target and esp. if memory is involved costing on GIMPLE is hard. > > > That's also one reason why we turned off loop if-conversion on GIMPLE > > > for non-vectorized code. > > Yea. That's always the concern for transformations that aren't trivial to > > show > > are better. > > > > Conditional store elimination as implemented right now in phiopt requires a > > single store in the then/else clauses. So we're really just sinking the > > stores > > through a PHI. We're really not changing the number of loads or stores on > > any path. > > > > In the cases I think JiangNing is trying to handle we are adding a store on > > a > > path where it didn't previously exist because there is no else clause. So > > we > > likely need to check the edge probabilities to avoid doing something dumb. > > I > > don't have a good sense where the cutoff should be. tree-ssa-sink might > > have some nuggets of information to help guide. > > > > > > > > phiopt is really about followup optimization opportunities in passes > > > that do not handle conditionals well. > > > > > > cselim is on the border... > > ACK. In fact, looking at it it I wonder if it'd be better in > > tree-ssa-sink.c :-) > > > > > > jeff
RE: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
Hi Jeff and Richard B., Following your tips, I've found a much simpler solution in tree-ssa-phiopt.c. Attached is the new patch. Review again, please! Thanks a lot! -Jiangning > -Original Message- > From: Jeff Law > Sent: Saturday, June 22, 2019 12:10 AM > To: Richard Biener > Cc: JiangNing OS ; gcc- > patc...@gcc.gnu.org > Subject: Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430) > > On 6/21/19 6:23 AM, Richard Biener wrote: > > > > That looks like a pretty easy form to analyze. I'd suggest looking > > through tree-ssa-phiopt.c closely. There's several transformations in > > there that share similarities with yours. > > > > > > More specifically there is the conditional store elimination (cselim) > > pass inside this file. > That's the one I was thinking of. It looks reasonably close to the cases > JiangNing is trying to capture. > > > > > > > > > > > > > > > > > > > > > > > >> 2) My current solution fits into current back-end if-conversion > > pass very well. I don't want to invent > >> a new framework to solve this relatively small issue. Besides, > > this back-end patch doesn't only > >> enhance store speculation detection, but also fix a bug in the > > original code. Understood, but I still wonder if we're better off > > addressing this in gimple. > > > > > > Traditionally if-conversions profitability heavily depends on the > > target and esp. if memory is involved costing on GIMPLE is hard. > > That's also one reason why we turned off loop if-conversion on GIMPLE > > for non-vectorized code. > Yea. That's always the concern for transformations that aren't trivial to > show > are better. > > Conditional store elimination as implemented right now in phiopt requires a > single store in the then/else clauses. So we're really just sinking the > stores > through a PHI. We're really not changing the number of loads or stores on > any path. > > In the cases I think JiangNing is trying to handle we are adding a store on a > path where it didn't previously exist because there is no else clause. So we > likely need to check the edge probabilities to avoid doing something dumb. I > don't have a good sense where the cutoff should be. tree-ssa-sink might > have some nuggets of information to help guide. > > > > > phiopt is really about followup optimization opportunities in passes > > that do not handle conditionals well. > > > > cselim is on the border... > ACK. In fact, looking at it it I wonder if it'd be better in tree-ssa-sink.c > :-) > > > jeff csel5.patch Description: csel5.patch
Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
On 6/21/19 6:23 AM, Richard Biener wrote: > > That looks like a pretty easy form to analyze. I'd suggest looking > through tree-ssa-phiopt.c closely. There's several transformations > in there that share similarities with yours. > > > More specifically there is the conditional store elimination (cselim) > pass inside this file. That's the one I was thinking of. It looks reasonably close to the cases JiangNing is trying to capture. > > > > > > > > > > >> 2) My current solution fits into current back-end if-conversion > pass very well. I don't want to invent >> a new framework to solve this relatively small issue. Besides, > this back-end patch doesn't only >> enhance store speculation detection, but also fix a bug in the > original code. Understood, but I still wonder if we're better off > addressing this in gimple. > > > Traditionally if-conversions profitability heavily depends on the > target and esp. if memory is involved costing on GIMPLE is hard. > That's also one reason why we turned off loop if-conversion on GIMPLE > for non-vectorized code. Yea. That's always the concern for transformations that aren't trivial to show are better. Conditional store elimination as implemented right now in phiopt requires a single store in the then/else clauses. So we're really just sinking the stores through a PHI. We're really not changing the number of loads or stores on any path. In the cases I think JiangNing is trying to handle we are adding a store on a path where it didn't previously exist because there is no else clause. So we likely need to check the edge probabilities to avoid doing something dumb. I don't have a good sense where the cutoff should be. tree-ssa-sink might have some nuggets of information to help guide. > > phiopt is really about followup optimization opportunities in passes > that do not handle conditionals well. > > cselim is on the border... ACK. In fact, looking at it it I wonder if it'd be better in tree-ssa-sink.c :-) jeff
Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
On Fri, Jun 21, 2019 at 1:13 AM Jeff Law wrote: > On 6/20/19 3:53 AM, JiangNing OS wrote: > > Hi Jeff, > > > > Appreciate your effort to review my patch! I've updated my patch as > attached. See my answers below. > > > >> in current function, so the store speculation can be avoided. > >> So at a high level should we be doing this in gimple rather than RTL? > >> We're going to have a lot more information about types, better > >> infrastructure for looking at uses/defs, access to the alias oracle, we > should > >> be able to accurately distinguish between potentially shared objects vs > those > >> which are local to the thread, etc. We lose the low level costing > information > >> though. > >> > >> I'm still going to go through the patch and do some level of review, > but I do > >> think we need to answer the higher level question though. > >> > > I have the following reasons, > > > > 1) Following the clue Richard B gave me before about parameter --param > allow-store-data-races, > > I did check the middle-end pass tree-if-conv, but I think this pass at > the moment doesn't work > > for the issue I'm trying to solve. Tree-if-conv is to do if conversion > for loop, and its final goal is to > > help loop vectorization, while my case doesn't have a loop at all. > I think the fact that it's focused so much on loops is a historical > accident. We certainly have a variety of places in the gimple > optimizers that do if-conversion, and they're not all in tree-if-conv :( > For example, some are done in tree-ssa-phiopt. > > In the gimple optimizers the testcase from 89430 is going to look > something like this: > > > > ; basic block 2, loop depth 0, count 1073741824 (estimated locally), > maybe hot > > ;;prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED) > > ;;pred: ENTRY [always] count:1073741824 (estimated locally) > (FALLTHRU,EXECUTABLE) > > a.0_1 = a; > > _2 = (long unsigned int) k_8(D); > > _3 = _2 * 4; > > _4 = a.0_1 + _3; > > _5 = *_4; > > if (_5 > b_9(D)) > > goto ; [50.00%] > > else > > goto ; [50.00%] > > ;;succ: 3 [50.0% (guessed)] count:536870912 (estimated > locally) (TRUE_VALUE,EXECUTABLE) > > ;;4 [50.0% (guessed)] count:536870912 (estimated > locally) (FALSE_VALUE,EXECUTABLE) > > > > ;; basic block 3, loop depth 0, count 536870913 (estimated locally), > maybe hot > > ;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED) > > ;;pred: 2 [50.0% (guessed)] count:536870912 (estimated > locally) (TRUE_VALUE,EXECUTABLE) > > *_4 = b_9(D); > > ;;succ: 4 [always] count:536870913 (estimated locally) > (FALLTHRU,EXECUTABLE) > > > > ;; basic block 4, loop depth 0, count 1073741824 (estimated locally), > maybe hot > > ;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED) > > ;;pred: 3 [always] count:536870913 (estimated locally) > (FALLTHRU,EXECUTABLE) > > ;;2 [50.0% (guessed)] count:536870912 (estimated > locally) (FALSE_VALUE,EXECUTABLE) > > return; > > That looks like a pretty easy form to analyze. I'd suggest looking > through tree-ssa-phiopt.c closely. There's several transformations in > there that share similarities with yours. > More specifically there is the conditional store elimination (cselim) pass inside this file. > > > > > > > > > > 2) My current solution fits into current back-end if-conversion pass > very well. I don't want to invent > > a new framework to solve this relatively small issue. Besides, this > back-end patch doesn't only > > enhance store speculation detection, but also fix a bug in the original > code. > Understood, but I still wonder if we're better off addressing this in > gimple. > Traditionally if-conversions profitability heavily depends on the target and esp. if memory is involved costing on GIMPLE is hard. That's also one reason why we turned off loop if-conversion on GIMPLE for non-vectorized code. phiopt is really about followup optimization opportunities in passes that do not handle conditionals well. cselim is on the border... > >> Just from a design standpoint, what are the consequences if this > function > >> returns true for something that isn't actually in the stack or false for > >> something that is on the stack? > >> > > If noce_mem_is_on_stack returns true for something that isn't actually > in the stack, > > it could potentially introduce store speculation, then the if-conversion > optimization > > will be incorrect. If this function returns false for something that is > on stack, it doesn't > > matter, because the optimization will not be triggered. > OK. That's what I expected. > > > > > > > > > >> > >>> + > >>> +/* Always return true, if there is a dominating write. > >>> + > >>> + When there is a dominating read from memory on stack, > >>> + 1) if x = a is a memory read, return true. > >>> + 2) if x = a is a memory write, return true if the memory is on > stack. > >>> + This
Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
On 6/20/19 3:53 AM, JiangNing OS wrote: > Hi Jeff, > > Appreciate your effort to review my patch! I've updated my patch as attached. > See my answers below. > >> in current function, so the store speculation can be avoided. >> So at a high level should we be doing this in gimple rather than RTL? >> We're going to have a lot more information about types, better >> infrastructure for looking at uses/defs, access to the alias oracle, we >> should >> be able to accurately distinguish between potentially shared objects vs those >> which are local to the thread, etc. We lose the low level costing >> information >> though. >> >> I'm still going to go through the patch and do some level of review, but I do >> think we need to answer the higher level question though. >> > I have the following reasons, > > 1) Following the clue Richard B gave me before about parameter --param > allow-store-data-races, > I did check the middle-end pass tree-if-conv, but I think this pass at the > moment doesn't work > for the issue I'm trying to solve. Tree-if-conv is to do if conversion for > loop, and its final goal is to > help loop vectorization, while my case doesn't have a loop at all. I think the fact that it's focused so much on loops is a historical accident. We certainly have a variety of places in the gimple optimizers that do if-conversion, and they're not all in tree-if-conv :( For example, some are done in tree-ssa-phiopt. In the gimple optimizers the testcase from 89430 is going to look something like this: > ; basic block 2, loop depth 0, count 1073741824 (estimated locally), maybe > hot > ;;prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED) > ;;pred: ENTRY [always] count:1073741824 (estimated locally) > (FALLTHRU,EXECUTABLE) > a.0_1 = a; > _2 = (long unsigned int) k_8(D); > _3 = _2 * 4; > _4 = a.0_1 + _3; > _5 = *_4; > if (_5 > b_9(D)) > goto ; [50.00%] > else > goto ; [50.00%] > ;;succ: 3 [50.0% (guessed)] count:536870912 (estimated locally) > (TRUE_VALUE,EXECUTABLE) > ;;4 [50.0% (guessed)] count:536870912 (estimated locally) > (FALSE_VALUE,EXECUTABLE) > > ;; basic block 3, loop depth 0, count 536870913 (estimated locally), maybe > hot > ;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED) > ;;pred: 2 [50.0% (guessed)] count:536870912 (estimated locally) > (TRUE_VALUE,EXECUTABLE) > *_4 = b_9(D); > ;;succ: 4 [always] count:536870913 (estimated locally) > (FALLTHRU,EXECUTABLE) > > ;; basic block 4, loop depth 0, count 1073741824 (estimated locally), maybe > hot > ;;prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED) > ;;pred: 3 [always] count:536870913 (estimated locally) > (FALLTHRU,EXECUTABLE) > ;;2 [50.0% (guessed)] count:536870912 (estimated locally) > (FALSE_VALUE,EXECUTABLE) > return; That looks like a pretty easy form to analyze. I'd suggest looking through tree-ssa-phiopt.c closely. There's several transformations in there that share similarities with yours. > 2) My current solution fits into current back-end if-conversion pass very > well. I don't want to invent > a new framework to solve this relatively small issue. Besides, this back-end > patch doesn't only > enhance store speculation detection, but also fix a bug in the original code. Understood, but I still wonder if we're better off addressing this in gimple. >> Just from a design standpoint, what are the consequences if this function >> returns true for something that isn't actually in the stack or false for >> something that is on the stack? >> > If noce_mem_is_on_stack returns true for something that isn't actually in the > stack, > it could potentially introduce store speculation, then the if-conversion > optimization > will be incorrect. If this function returns false for something that is on > stack, it doesn't > matter, because the optimization will not be triggered. OK. That's what I expected. > > > > >> >>> + >>> +/* Always return true, if there is a dominating write. >>> + >>> + When there is a dominating read from memory on stack, >>> + 1) if x = a is a memory read, return true. >>> + 2) if x = a is a memory write, return true if the memory is on stack. >>> + This is the guarantee the memory is *not* readonly. */ >>> + >>> +static bool >>> +noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn, >>> + const_rtx x, bool is_store) { >>> + rtx_insn *insn; >>> + rtx set; >>> + >>> + gcc_assert (MEM_P (x)); >>> + >>> + FOR_BB_INSNS (bb, insn) >>> +{ >>> + set = single_set (insn); >>> + if (!set) >>> +continue; >>> + >>> + /* Dominating store */ >>> + if (rtx_equal_p (x, SET_DEST (set))) >>> +return true; >>> + >>> + /* Dominating load */ >>> + if (rtx_equal_p (x, SET_SRC (set))) >>> +if (is_store && noce_mem_is_on_stack (a_insn,
RE: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
Hi Jeff, Appreciate your effort to review my patch! I've updated my patch as attached. See my answers below. > in current function, so the store speculation can be avoided. > So at a high level should we be doing this in gimple rather than RTL? > We're going to have a lot more information about types, better > infrastructure for looking at uses/defs, access to the alias oracle, we should > be able to accurately distinguish between potentially shared objects vs those > which are local to the thread, etc. We lose the low level costing information > though. > > I'm still going to go through the patch and do some level of review, but I do > think we need to answer the higher level question though. > I have the following reasons, 1) Following the clue Richard B gave me before about parameter --param allow-store-data-races, I did check the middle-end pass tree-if-conv, but I think this pass at the moment doesn't work for the issue I'm trying to solve. Tree-if-conv is to do if conversion for loop, and its final goal is to help loop vectorization, while my case doesn't have a loop at all. 2) My current solution fits into current back-end if-conversion pass very well. I don't want to invent a new framework to solve this relatively small issue. Besides, this back-end patch doesn't only enhance store speculation detection, but also fix a bug in the original code. > Nits: We typically refer to parameters, variables, etc in comments using > upper case. You'll need to review the entire patch for these its. > > So perhaps the comment should be something like: > > /* Return true of X, a MEM expression, is on the stack. A_INSN contains >X if A_INSN exists. */ > Fixed in attached new patch. > > Just from a design standpoint, what are the consequences if this function > returns true for something that isn't actually in the stack or false for > something that is on the stack? > If noce_mem_is_on_stack returns true for something that isn't actually in the stack, it could potentially introduce store speculation, then the if-conversion optimization will be incorrect. If this function returns false for something that is on stack, it doesn't matter, because the optimization will not be triggered. > Nit: Space between the function name and its open paren for arguments. ie > > if (fixed_base_plus_p (a) > ^ > I see other instances of this nit, please review the patch and correct them. > Fixed in attached new patch. > > > + > > + if (!a_insn) > > +return false; > I'm not sure what the calling profile is for this function, but this is a > cheaper > test, so you might consider moving it before the test of fixed_base_plus_p. > Fixed in attached new patch. > > > + > > + if (!reg_mentioned_p (x, a_insn)) > > +return false; > > + > > + /* Check if x is on stack. Assume a mem expression using registers > > + related to stack register is always on stack. */ > > + FOR_EACH_INSN_USE (use, a_insn) > > +if (reg_mentioned_p (DF_REF_REG (use), x) > > +&& bitmap_bit_p (bba_sets_must_be_sfp, DF_REF_REGNO (use))) > > + return true; > > + > > + return false; > > +} > So is X always a MEM? Just wanted to make sure I understand. > reg_mentioned_p will do the right thing (using rtx_equal_p) for the > comparison. > Yes. X is always a MEM. There is an assertion for this in the code above. > > > + > > +/* Always return true, if there is a dominating write. > > + > > + When there is a dominating read from memory on stack, > > + 1) if x = a is a memory read, return true. > > + 2) if x = a is a memory write, return true if the memory is on stack. > > + This is the guarantee the memory is *not* readonly. */ > > + > > +static bool > > +noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn, > > + const_rtx x, bool is_store) { > > + rtx_insn *insn; > > + rtx set; > > + > > + gcc_assert (MEM_P (x)); > > + > > + FOR_BB_INSNS (bb, insn) > > +{ > > + set = single_set (insn); > > + if (!set) > > +continue; > > + > > + /* Dominating store */ > > + if (rtx_equal_p (x, SET_DEST (set))) > > +return true; > > + > > + /* Dominating load */ > > + if (rtx_equal_p (x, SET_SRC (set))) > > +if (is_store && noce_mem_is_on_stack (a_insn, x)) > > + return true; > > +} > > + > > + return false; > > +} > So what would be the consequences here of returning false when in fact > there was a dominating read or write? That could easily happen if the > dominating read or write was not a single_set. If return false when in fact there is a dominating read or write, the optimization will not be triggered, because noce_mem_maybe_invalid_p will be true, which is playing the same role as may_trap_or_fault_p. > > I'm guessing that from a design standpoint you're trying to find cases where > you know an object was written to within the block and does not escape. So >
Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
On 3/14/19 10:46 PM, JiangNing OS wrote: > This patch is to fix a missing ifcvt opportunity in back-end. For the simple > case below, > > if (...) > x = a; /* x is memory */ > /* no else */ > > We can generate conditional move and remove the branch as below if the target > cost is acceptable. > > r1 = x > r2 = a > cmp ... > csel r3, r1, r2, cond > x = r3 > > This could be safe if x is a stack variable, and there isn't any address > taken in current function, so the store speculation can be avoided. So at a high level should we be doing this in gimple rather than RTL? We're going to have a lot more information about types, better infrastructure for looking at uses/defs, access to the alias oracle, we should be able to accurately distinguish between potentially shared objects vs those which are local to the thread, etc. We lose the low level costing information though. I'm still going to go through the patch and do some level of review, but I do think we need to answer the higher level question though. > > Index: ifcvt.c > === > --- ifcvt.c (revision 269698) > +++ ifcvt.c (working copy) > @@ -2029,6 +2045,93 @@ >return true; > } > > +/* Return true if MEM x is on stack. a_insn contains x, if it exists. */ Nits: We typically refer to parameters, variables, etc in comments using upper case. You'll need to review the entire patch for these its. So perhaps the comment should be something like: /* Return true of X, a MEM expression, is on the stack. A_INSN contains X if A_INSN exists. */ Just from a design standpoint, what are the consequences if this function returns true for something that isn't actually in the stack or false for something that is on the stack? > + > +static bool > +noce_mem_is_on_stack (rtx_insn *a_insn, const_rtx x) > +{ > + df_ref use; > + > + gcc_assert (x); > + gcc_assert (MEM_P (x)); > + > + /* Early exits if find base is a stack register. */ > + rtx a = XEXP (x, 0); > + if (fixed_base_plus_p(a)) > +return true; Nit: Space between the function name and its open paren for arguments. ie if (fixed_base_plus_p (a) ^ I see other instances of this nit, please review the patch and correct them. > + > + if (!a_insn) > +return false; I'm not sure what the calling profile is for this function, but this is a cheaper test, so you might consider moving it before the test of fixed_base_plus_p. > + > + if (!reg_mentioned_p (x, a_insn)) > +return false; > + > + /* Check if x is on stack. Assume a mem expression using registers > + related to stack register is always on stack. */ > + FOR_EACH_INSN_USE (use, a_insn) > +if (reg_mentioned_p (DF_REF_REG (use), x) > +&& bitmap_bit_p (bba_sets_must_be_sfp, DF_REF_REGNO (use))) > + return true; > + > + return false; > +} So is X always a MEM? Just wanted to make sure I understand. reg_mentioned_p will do the right thing (using rtx_equal_p) for the comparison. > + > +/* Always return true, if there is a dominating write. > + > + When there is a dominating read from memory on stack, > + 1) if x = a is a memory read, return true. > + 2) if x = a is a memory write, return true if the memory is on stack. > + This is the guarantee the memory is *not* readonly. */ > + > +static bool > +noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn, > + const_rtx x, bool is_store) > +{ > + rtx_insn *insn; > + rtx set; > + > + gcc_assert (MEM_P (x)); > + > + FOR_BB_INSNS (bb, insn) > +{ > + set = single_set (insn); > + if (!set) > +continue; > + > + /* Dominating store */ > + if (rtx_equal_p (x, SET_DEST (set))) > +return true; > + > + /* Dominating load */ > + if (rtx_equal_p (x, SET_SRC (set))) > +if (is_store && noce_mem_is_on_stack (a_insn, x)) > + return true; > +} > + > + return false; > +} So what would be the consequences here of returning false when in fact there was a dominating read or write? That could easily happen if the dominating read or write was not a single_set. I'm guessing that from a design standpoint you're trying to find cases where you know an object was written to within the block and does not escape. So returning false when there was a dominating write is safe. Returning true when there was not would be disastrous. Right? > @@ -5347,6 +5462,234 @@ > >return FALSE; > } > + > + > +/* Find all insns that must be stack address and store REGNO into > + bitmap bba_sets_must_be_sfp. */ > + > +static void > +find_all_must_be_sfp_insns (void) > +{ > + rtx_insn *a_insn; > + basic_block bb; > + bool sfp_found; > + auto_vec def_count; > + df_ref def, use; > + > + /* Calculate def counts for each insn. */ > + def_count.safe_grow_cleared (max_reg_num () + 1); > + FOR_ALL_BB_FN (bb, cfun) > +FOR_BB_INSNS (bb, a_insn) > + { > +
RE: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
Hi, May I know any feedback comments to this patch? Thanks, -Jiangning From: Kyrill Tkachov Sent: Friday, May 24, 2019 10:55 PM To: JiangNing OS ; gcc-patches@gcc.gnu.org Cc: ebotca...@adacore.com; ste...@gcc.gnu.org; l...@redhat.com Subject: Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430) Hi Jiangning On 3/15/19 4:46 AM, JiangNing OS wrote: This patch is to fix a missing ifcvt opportunity in back-end. For the simple case below, if (...) x = a; /* x is memory */ /* no else */ We can generate conditional move and remove the branch as below if the target cost is acceptable. r1 = x r2 = a cmp ... csel r3, r1, r2, cond x = r3 This could be safe if x is a stack variable, and there isn't any address taken in current function, so the store speculation can be avoided. In practice, this optimization can improve a real application performance by %4 on aarch64. Now that GCC 10 development is open, this should appropriate for considering. I've cc'ed folks who are either listed maintainers in this area or have reviewed patches in this area in my recent memory. Thanks, Kyrill Thanks, -Jiangning
Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
Hi Jiangning On 3/15/19 4:46 AM, JiangNing OS wrote: This patch is to fix a missing ifcvt opportunity in back-end. For the simple case below, if (...) x = a; /* x is memory */ /* no else */ We can generate conditional move and remove the branch as below if the target cost is acceptable. r1 = x r2 = a cmp ... csel r3, r1, r2, cond x = r3 This could be safe if x is a stack variable, and there isn't any address taken in current function, so the store speculation can be avoided. In practice, this optimization can improve a real application performance by %4 on aarch64. Now that GCC 10 development is open, this should appropriate for considering. I've cc'ed folks who are either listed maintainers in this area or have reviewed patches in this area in my recent memory. Thanks, Kyrill Thanks, -Jiangning
RE: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
Hi Jeff, Yes. The latter one "[PATCH] improve ifcvt optimization (PR rtl-optimization/89430)" supersedes the earlier one " Fixing ifcvt issue as exposed by BZ89430". Thanks, -Jiangning -Original Message- From: Jeff Law Sent: Tuesday, April 30, 2019 11:54 PM To: JiangNing OS ; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430) On 3/14/19 10:46 PM, JiangNing OS wrote: > This patch is to fix a missing ifcvt opportunity in back-end. For the > simple case below, > > if (...) > x = a; /* x is memory */ > /* no else */ > > We can generate conditional move and remove the branch as below if the target > cost is acceptable. > > r1 = x > r2 = a > cmp ... > csel r3, r1, r2, cond > x = r3 > > This could be safe if x is a stack variable, and there isn't any address > taken in current function, so the store speculation can be avoided. > > In practice, this optimization can improve a real application performance by > %4 on aarch64. [ ... ] I notice that a couple weeks you'd sent a similar patch under the heading "Fixing ifcvt issue as exposed by BZ89430". Does this patch supersede the earlier patch? Jeff
Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
On 3/14/19 10:46 PM, JiangNing OS wrote: > This patch is to fix a missing ifcvt opportunity in back-end. For the simple > case below, > > if (...) > x = a; /* x is memory */ > /* no else */ > > We can generate conditional move and remove the branch as below if the target > cost is acceptable. > > r1 = x > r2 = a > cmp ... > csel r3, r1, r2, cond > x = r3 > > This could be safe if x is a stack variable, and there isn't any address > taken in current function, so the store speculation can be avoided. > > In practice, this optimization can improve a real application performance by > %4 on aarch64. [ ... ] I notice that a couple weeks you'd sent a similar patch under the heading "Fixing ifcvt issue as exposed by BZ89430". Does this patch supersede the earlier patch? Jeff
Re: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)
On 3/14/19 10:46 PM, JiangNing OS wrote: > This patch is to fix a missing ifcvt opportunity in back-end. For the simple > case below, > > if (...) > x = a; /* x is memory */ > /* no else */ > > We can generate conditional move and remove the branch as below if the target > cost is acceptable. > > r1 = x > r2 = a > cmp ... > csel r3, r1, r2, cond > x = r3 > > This could be safe if x is a stack variable, and there isn't any address > taken in current function, so the store speculation can be avoided. > > In practice, this optimization can improve a real application performance by > %4 on aarch64. This seems like something that should be addressed for gcc-10 rather than gcc-9. Can we come back to it once gcc-10 development starts? jeff