May be you can convince Fred the dead chi analysis is correct (baring this fix)? Sun
On Mon, Aug 1, 2011 at 12:22 PM, Ann <ann...@gmail.com> wrote: > Hi Fred, > The goal of Propagate_vsym_bb() is based on the dead chi analysis, and > the goal of it is to maintain ud chain after dead chi analysis. So I > believe the root cause is dead chi identification, and I think Hui's fix is > in the right phase. > I haven't figured out a better and cleaner fix which can maintain the > same performance at the same time. > Thank you. > > Best Regards, > Ann > > > On Sun, Jul 31, 2011 at 1:48 AM, Fred Chow <frdc...@gmail.com> wrote: >> >> Ann, >> >> Since I do not completely understand the enhancements in Open64's version >> of opt_dse.cxx (whose purpose I understand is to enable more aliased stores >> to be deleted), I'm not in a position to judge whether Hui's latest fix is >> the best solution. Since you consider Hui's fix as correct, the question is >> whether there is a better and cleaner fix. >> >> Do you agree that the overlapped live range bug was caused by >> DSE::Propagate_vsym_bb()? If so, would a better and cleaner fix be to >> improve Propagate_vsym_bb() so it will not introduce overlapped live range >> by taking into account the implicit mu's? >> >> Fred >> >> On 07/26/2011 09:45 AM, Ann wrote: >> >> Hi Fred, >> I think Hui's fix is to mark necessary chi live, so as to avoid copy >> propagation and avoid overlapped live range. >> I think the mu-chi representation in open64 are conservative, which >> caused redundant chi and mu contained in the IR. The new dse is aggressive >> to delete redundant chi's. >> I once tried to detect all implicit mus and use them to prevent chi >> elimination. But this approach caused 2~5% performance degradation for 3 >> programs in SPECK2006. That means the mu lists carry redundant mu nodes >> which may prevent optimization. >> >> I agree with Hui's fix, it will not cause any performance degradation >> and make the IR optimized and right. The "right" means the IR is >> consistent, leaving necessary chi left and causing no overlapped live >> range. >> >> Best Regards, >> Ann >> >> On Tue, Jul 26, 2011 at 6:31 AM, Fred Chow <frdc...@gmail.com> wrote: >>> >>> My answers below: >>> >>> On 07/25/2011 07:11 AM, Hui Shi wrote: >>> >>> Hi Fred, >>> >>> The new enhancement in DSE introduce problem that some chi node is not >>> live due to same location with last store. >>> >>> For fixing the problem from copy propagation. >>> " For example, you can leave opt_dse.cxx intact and consider preventing >>> the bad copy propagation instead. At PathScale, we did not do copy >>> propagation on chi occurrences, which I think is the root cause of the bug." >>> >>> In my initial example in this thread, do you mean don't copy propgate >>> sym2v3 to STMT3? >>> >>> Yes. >>> >>> It's not the copy propagation in CreateCodeMap. >>> This happens after DSE mark sym2 chi node on STMT1 not live. >>> DSE::Dead_store_elim calles DSE::Propagate_vsym_bb, propagte sym2v3 to >>> STMT3, because sym2v4 = chi(sym2v3) is dead. >>> >>> DSE::Propagate_vsym_bb() is exactly what I was referring to by "copy >>> propagation on chi occurrences". I did not know that this is done in >>> opt_dse.cxx, this being another enhancement put into Open64 that was not in >>> PathScale. >>> >>> I think this logical is correct, otherwise, STMT3's sym2 chi node's >>> operand can't directly find a live def. >>> >>> There is an implicit use of sym2 in STMT2. If there is an explicit MU >>> node at STMT2 for sym2, it would be using sym2v4. That's why the copy >>> propagation of sym2v3 to STMT3 is wrong, because it causes the overlapped >>> live range. When the implicit use of sym2 at STMT2 is converted to explicit >>> due to ILOAD-folding, this triggers the live range overlap assertion. >>> >>> STMT1 >>> sym2v4 = chi(sym2v3) NOT LIVE >>> default_vsym_v4 = chi(default_vsym_v3) LIVE >>> .... >>> STMT2 >>> ILOAD mu(default_vsym_v4) >>> ... >>> STMT3 >>> sym2v5 = chi(sym2v3) LIVE // sym2v3 is propagate down from above dead >>> chi. >>> default_vsym_v5 = chi(default_vsym_v4) >>> >>> I think this patch is safe to apply on the current DSE. >>> >>> I think your patch is not needed if you fix Propagate_vsym_bb() by >>> preventing propagation across implicit MU's. The root cause of the bug is >>> in DSE::Propagate_vsym_bb(). >>> >>> Fred >>> >>> Regards >>> Shi Hui >>> >>> On Sun, Jul 24, 2011 at 11:18 AM, Fred Chow <frdc...@gmail.com> wrote: >>>> >>>> My comments below: >>>> >>>> On 07/23/2011 09:00 AM, Hui Shi wrote: >>>> >>>> Hi Fred, >>>> >>>> Thanks for your comments. >>>> Comments below. >>>> >>>> On Sat, Jul 23, 2011 at 8:04 PM, Fred Chow <frdc...@gmail.com> wrote: >>>>> >>>>> Hi Shi, >>>>> >>>>> Your change to force other aliased chi nodes to live in >>>>> DSE::Set_Reqruied_CHI() is fine with me. But I'm not comfortable with >>>>> your >>>>> introduction of the _vse_required_set, which I understand is for finding >>>>> more dead stores. More below: >>>>> >>>>> On 07/22/2011 01:50 AM, Hui Shi wrote: >>>>> >>>>> Hi Fred, >>>>> >>>>> Follow your suggestion, if one chi node is live, other aliased chi >>>>> nodes in same chi_list should live. >>>>> I fix this issue in DSE, could you help review this patch? >>>>> >>>>> Fix is: >>>>> 1. Add hash_set _vse_requried_set in DSE, record if VER_STAB_ENTRY is >>>>> once processed by DSE::Set_Required_VSE. >>>>> >>>>> Each vse entry is an SSA version of an aux entry. Thus, from vse to >>>>> aux is a many-to-one mapping. Given a vse, vse->Any_use() already tells >>>>> me >>>>> that this vse has once been required. Why do you have to set up this >>>>> _vse_required_set, and using DSE::VSE_required() to check the information >>>>> that is given by vse->Any_use()? Should it be an aux set instead of a vse >>>>> set? >>>> >>>> One vse entry set in _vse_required_set means this vse entry was once >>>> processed by DSE::Set_Required_VSE. >>>> However this doesn't means vse->Any_use() is true. >>>> >>>> I agree that the above is the case with the latest version of >>>> opt_dse.cxx. But in the original version of opt_dse.cxx from PathScale, >>>> vse->Any_use() is set to true whenever Set_required_VSE() is entered. So >>>> there has obviously been enhancements put in, and your creation of >>>> _vse_required_set is made necessary due to those enhancements. Since those >>>> enhancements has not been reviewed by me, I would appreciate the folks who >>>> did those enhancements provide their review opinion on this latest >>>> addition. >>>> >>>> In DSE::Set_Required_VSE( VER_STAB_ENTRY *vse, BOOL real_use, WN *ref_wn >>>> ). >>>> When VSE's def type is WHIRL statement or PHI, vse->any_use is true. >>>> When VSE's def type is CHI, VSE entry any_use will not set true, if >>>> AUX's last store write the same mem location with current chi's statement. >>>> >>>> VSE entry in _vse_required_set means its use in ldid/mu/phi/chi is live >>>> then this VSE entry is processed by DSE::Set_Required_VSE. >>>> But not necessary means VSE entry's def chi node is live. >>>> >>>>> >>>>> 2. In DSE::Set_Required_CHI >>>>> 1. If chi node is live, find aliased/non-live, once required chi >>>>> nodes in same chi list. If find, force these chi nodes live. >>>>> >>>>> If a chi is non-live, then "once required" must be false. So there is >>>>> a contradiction here. If your intention is "aux-once-required", there is >>>>> still a problem, because the aux may be found to be required later. >>>> >>>> From above explanation, it force live chi nodes that are >>>> 1. once processed in DSE::Set_Required_CHI before. >>>> 2. not live due to chi node's whirl statment write same mem location >>>> with same AUX's last store. >>>> >>>>> Set last_store be NULL and call set_required_vse. >>>>> 2. If chi node is not live, find aliased chi nodes live in same chi >>>>> list. If find, set current chi node live. >>>>> 3. Remove previous fix in set_required_mu >>>>> >>>>> VER_STAB_ENTRY is required means this AUX symbol's some version is >>>>> really used in program. >>>>> Using this information, we can exclude mark some AUX symbol's chi node >>>>> live, if this AUX is never used. >>>>> This is meaningful in following case, >>>>> int foo(int a) { >>>>> int t1, t2; >>>>> int *p; >>>>> >>>>> if(a){ >>>>> p = &t1; >>>>> } >>>>> else { >>>>> p = &t2; >>>>> } >>>>> t1 =3; >>>>> *p = 5; >>>>> return t2; >>>>> } >>>>> >>>>> chi nodes live info after original DSE >>>>> I4I4LDID 0 <st 20> T<4,.predef_I4,4> >>>>> I4INTCONST 0 (0x0) >>>>> I4I4NE >>>>> U8LDA 0 <st 4> T<53,anon_ptr.,8> >>>>> U8LDA 0 <st 5> T<53,anon_ptr.,8> >>>>> U8SELECT >>>>> U8STID 0 <st 21> T<53,anon_ptr.,8> {line: 1/6} {freq: 0, ln: 6, col: 0} >>>>> I4INTCONST 3 (0x3) >>>>> I4STID 0 <st 23> T<4,.predef_I4,4> {line: 1/12} {freq: 0, ln: 12, col: >>>>> 0} // STMT1 >>>>> sym9v22 <- chi( sym9v19 ) NOT LIVE // default vsym chi is dead. >>>>> this statement is dead >>>>> I4INTCONST 5 (0x5) >>>>> U8U8LDID 0 <st 21> T<53,anon_ptr.,8> >>>>> I4ISTORE 0 T<53,anon_ptr.,8> {line: 1/13} {freq: 0, ln: 13, col: 0} >>>>> //STMT2 >>>>> sym4v24 <- chi( sym4v23 ) NOT LIVE // t1 chi node is dead >>>>> sym5v25 <- chi( sym5v15 ) LIVE // t2 chi node is live >>>>> sym8v26 <- chi( sym8v18 ) LIVE // return vsym chi ndoe is live >>>>> sym9v27 <- chi( sym9v22 ) NOT LIVE // default vsym chi is dead >>>>> I4I4LDID 0 <st 25> T<4,.predef_I4,4> >>>>> I4STID 1 <st 28> T<4,.predef_I4,4> {line: 1/14} {freq: 0, ln: 14, col: >>>>> 0} >>>>> RETURN {line: 1/14} {freq: 0, ln: 14, col: 0} >>>>> mu[ sym8v26 sym8v26 ] >>>>> >>>>> If simply mark alias chi node live, default vsym chi ndoe will live in >>>>> STMT2, because it's aliased with t2. >>>>> Then follow DU chian, default vsym chi nodes in STMT1 is also live. >>>>> Cause STMT1 not dead. >>>>> >>>>> Actually, default vsym will not be requried from return vsym or t2's DU >>>>> chain. >>>>> So we need allow aliased chi nodes not live exist, if this chi node is >>>>> never required in DSE analysis. >>>>> >>>>> I consider default vsym as being live at STMT2, because it aliases with >>>>> t2. This will cause STMT1 to be live. I understand by keeping the >>>>> default >>>>> vsym non-live, you can delete STMT1. But my intuition is that making the >>>>> default vsym non-live (when in fact it is live) will cause other bugs. >>>>> >>>> >>>> default vsym chi nodes in STMT2 is not mark live, because it never >>>> processed by DSE::Set_Required_VSE. >>>> This means default vsym chi nodes in STMT2 has no explicit use. >>>> So there is no requirement for getting def statement from deafult vsym's >>>> use to STMT2. That's why I think its okay to leave the default vsym >>>> non-live. >>>> >>>> Is this a legacy problem or not?Is there same problem in DSE's initial >>>> version? >>>> >>>> Before your current fix, and also in the original version from >>>> PathScale, default vsym is not live at STMT2. But with your fix, and >>>> without your _vse_required_set mechanism, default vsym would be made live >>>> at >>>> STMT2. I understand your effort is in trying to prevent performance >>>> regression due to your fix. >>>> >>>> Strictly speaking, default vsym is live at STMT2, because it aliases >>>> with t2 which is used at the return statement. If this incorrect liveness >>>> of the default vsym does not cause any wrong optimization, then it is still >>>> OK. Maybe you should consider using a different approach to fix the bug. >>>> For example, you can leave opt_dse.cxx intact and consider preventing the >>>> bad copy propagation instead. At PathScale, we did not do copy propagation >>>> on chi occurrences, which I think is the root cause of the bug. >>>> >>>> Fred >>>> >>>> Regards >>>> Shi Hui >>>> >>>> >>>>> Fred >>>>> >>>>> Regards >>>>> Shi Hui >>>>> >>>>> >>>>> >>>>> On Tue, Jul 5, 2011 at 5:16 PM, Fred Chow <frdc...@gmail.com> wrote: >>>>>> >>>>>> Hi Shi, >>>>>> >>>>>> Your fix by calling CODEMAP rename is OK in its own right. But this >>>>>> fix is in reality for patching up a bug that was introduced by copy >>>>>> propagation. We should fix the root cause of the problem, NOT to patch >>>>>> it >>>>>> so the symptom does not appear. >>>>>> >>>>>> Basically, the following from your original mail: >>>>>> >>>>>> STMT1 >>>>>> sym2v4 = chi(sym2v3) NOT LIVE >>>>>> default_vsym_v4 = chi(default_vsym_v3) LIVE >>>>>> >>>>>> is already wrong. sym2v aliases with the default vsym. Thus, one >>>>>> cannot be live while the other is dead. >>>>>> >>>>>> In looking at the code in opt_dse.cxx, I found that it has diverged a >>>>>> lot from the version that I'm familiar with. Thus, I'm not in a >>>>>> position to >>>>>> point out what the root problem is. At least the version I worked with >>>>>> at >>>>>> PathScale does not have this problem. I would welcome others to provide >>>>>> input on this. >>>>>> >>>>>> Fred >>>>>> >>>>>> On 07/04/2011 07:37 PM, Hui Shi wrote: >>>>>> >>>>>> Hi Fred, >>>>>> >>>>>> Thanks for your comments. >>>>>> The reanming happens only when there are some dead phi/chi nodes >>>>>> revive in CODEMAP::Iload_folded. >>>>>> To evaluate how this patch will affect compile-time, I add dump when >>>>>> this renaming happens. >>>>>> Tested with SPEC2006 base/peak >>>>>> Following cases have CODEMAP rename after create CODEMAPE, and >>>>>> renaming count is seldom consider test's PU count.. >>>>>> 447.dealII base/peak rename 20+ times >>>>>> 483.xalancbmk base/peak rename 30+ times >>>>>> 416.gamess peak rename 4 times >>>>>> 458.sjeng base rename 1 time. >>>>>> So this renaming rarely happens and I think we can afford the >>>>>> increased compile time. >>>>>> >>>>>> And from another perspective. Rename_CODEMAP is already used in other >>>>>> phase like aft CGF optimization, COMP_UNIT::Fold_lda_iload_istore, etc. >>>>>> These optimizations and renaming are more likely happen. >>>>>> >>>>>> One utiIity I hope WOPT can have is CODEMAP rename only rename somve >>>>>> AUX (give in a table or list). Then rename cost can be decreased a lot. >>>>>> >>>>>> >>>>>> For copy propagation issue. >>>>>> The copy propagation happens in Dead_Store_Elminination, >>>>>> DSE::Propagate_vsym_bb. >>>>>> To disable the propagation, need change Dead_Store_Elimination. If one >>>>>> chi node is live, >>>>>> all other chi nodes binding to same statment and set_required should >>>>>> live because they are aliased with each other. >>>>>> This will also cause compile time isuee, because more live phi/chi >>>>>> node in codrep. >>>>>> >>>>>> Another issue is though we fix the copy propgation problem, the revive >>>>>> logical is still there in CODEMAP::Iload_folded. >>>>>> There still possible dead phi/chi nodes revive in CODEMAP creation. So >>>>>> complete fix is rename after create CODEMAP. >>>>>> >>>>>> The main issue is create_codemap not simply create coderep, it also >>>>>> perform iload folding may change version. >>>>>> If iload folding function is totolay peformed in >>>>>> COMP_UNIT::Fold_lda_iload_istore, it has no problem. >>>>>> >>>>>> Regards >>>>>> Shi Hui >>>>>> >>>>>> On Mon, Jul 4, 2011 at 9:26 PM, Fred Chow <frdc...@gmail.com> wrote: >>>>>>> >>>>>>> Doing renaming is quite compile-time expensive, because it is like >>>>>>> running the SSA construction algorithm. The problem your mentioned >>>>>>> would >>>>>>> not have occurred if sym2v3 has not been propagated down to STMT3. I >>>>>>> don't >>>>>>> recall we were doing copy propagation for chi operands. The overlapped >>>>>>> live >>>>>>> range was caused by this bad copy propagation, NOT by the ILOAD folding. >>>>>>> >>>>>>> The above copy propagation is bad because sym2v4 is NOT dead, because >>>>>>> it aliases with the default vsym which appears in the mu of STMT2. So >>>>>>> your >>>>>>> fix should be in preventing the copy propagation. >>>>>>> >>>>>>> Fred >>>>>>> >>>>>>> On 06/29/2011 11:35 PM, Hui Shi wrote: >>>>>>> >>>>>>> Could gatekeeper help review this fix? >>>>>>> >>>>>>> In SSA::Create_CODEMAP ILOAD folding is performed during coderep >>>>>>> setup. >>>>>>> The problem is iload folding may revive some dead phi/chi node. >>>>>>> >>>>>>> For example >>>>>>> STMT1 >>>>>>> sym2v4 = chi(sym2v3) NOT LIVE >>>>>>> default_vsym_v4 = chi(default_vsym_v3) LIVE >>>>>>> .... >>>>>>> STMT2 >>>>>>> ILOAD mu(default_vsym_v4) >>>>>>> ... >>>>>>> STMT3 >>>>>>> sym2v5 = chi(sym2v3) LIVE // sym2v3 is propagate down from above dead >>>>>>> chi. >>>>>>> default_vsym_v5 = chi(default_vsym_v4) >>>>>>> >>>>>>> If sym2v4 may use at STMT2, ILOAD's MU node must on STMT1's chi list >>>>>>> (because MU node must alias with sym2, otherwise its alias issue). >>>>>>> And MU node's corresponding CHI node must live on STMT1, because >>>>>>> set_required_mu will set chi node live. >>>>>>> So it's okay for DSE to mark fisrt CHI dead if STMT1 and STMT3 write >>>>>>> same memory location. >>>>>>> 1. It will not make STMT1 dead, because the other chi node is live. >>>>>>> 2. ILOAD's MU node's def statment still STMT1 >>>>>>> >>>>>>> If ILOAD folding transform ILOAD to LDID and LDID's AUX is sym2. >>>>>>> ILOD_FOLDING revives dead chi sym2v4 = chi(sym2v3) without update the >>>>>>> late chi node's opnd. >>>>>>> >>>>>>> This introduce overlapped between sym2v3 and sym2v4. >>>>>>> Fix is rename CODEMAP if iload folding revive dead phi/chi node. >>>>>>> OPT_REVISE_SSA::Fold_lda_indirects also rename CODEMAP after >>>>>>> optimization. >>>>>>> >>>>>>> Fix is >>>>>>> 1. record in htable, if dead chi/phi node is revived in >>>>>>> SSA::Create_CODEMAP. >>>>>>> This only happends when ILOAD_Folding happens and find >>>>>>> corresponding chi/phi is dead. >>>>>>> 2. After SSA::Create_CODEMAP, rename coderep if dead chi/phi node is >>>>>>> revive in CODEMAP setup. >>>>>>> >>>>>>> Different with fixing in DSE to mark first chi node live, this fix >>>>>>> not affect DSE, so not affect performance. >>>>>>> And this is verified with spec2006.rate on xeon. >>>>>>> >>>>>>> Regards >>>>>>> Shi Hui >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------------------ >>>>>>> All of the data generated in your IT infrastructure is seriously >>>>>>> valuable. >>>>>>> Why? It contains a definitive record of application performance, >>>>>>> security >>>>>>> threats, fraudulent activity, and more. Splunk takes this data and >>>>>>> makes >>>>>>> sense of it. IT sense. And common sense. >>>>>>> http://p.sf.net/sfu/splunk-d2d-c2 >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Open64-devel mailing list >>>>>>> Open64-devel@lists.sourceforge.net >>>>>>> https://lists.sourceforge.net/lists/listinfo/open64-devel >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >>> >>> >>> ------------------------------------------------------------------------------ >>> Magic Quadrant for Content-Aware Data Loss Prevention >>> Research study explores the data loss prevention market. Includes >>> in-depth >>> analysis on the changes within the DLP market, and the criteria used to >>> evaluate the strengths and weaknesses of these DLP solutions. >>> http://www.accelacomm.com/jaw/sfnl/114/51385063/ >>> _______________________________________________ >>> Open64-devel mailing list >>> Open64-devel@lists.sourceforge.net >>> https://lists.sourceforge.net/lists/listinfo/open64-devel >>> >> >> > > > ------------------------------------------------------------------------------ > Got Input? Slashdot Needs You. > Take our quick survey online. Come on, we don't ask for help often. > Plus, you'll get a chance to win $100 to spend on ThinkGeek. > http://p.sf.net/sfu/slashdot-survey > > _______________________________________________ > Open64-devel mailing list > Open64-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/open64-devel > > ------------------------------------------------------------------------------ Got Input? Slashdot Needs You. Take our quick survey online. Come on, we don't ask for help often. Plus, you'll get a chance to win $100 to spend on ThinkGeek. http://p.sf.net/sfu/slashdot-survey _______________________________________________ Open64-devel mailing list Open64-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/open64-devel