can you get Pathscale's version of dse (or compiler) and see what is the diff in behavior first? Sun
On Mon, Jul 25, 2011 at 10:11 PM, Hui Shi <kalin....@gmail.com> wrote: > > Hi Fred, > > The new enhancement in DSE introduce problem that some chi node is not live > due to same location with last store. > > For fixing the problem from copy propagation. > " For example, you can leave opt_dse.cxx intact and consider preventing the > bad copy propagation instead. At PathScale, we did not do copy propagation > on chi occurrences, which I think is the root cause of the bug." > > In my initial example in this thread, do you mean don't copy propgate sym2v3 > to STMT3? > It's not the copy propagation in CreateCodeMap. > This happens after DSE mark sym2 chi node on STMT1 not live. > DSE::Dead_store_elim calles DSE::Propagate_vsym_bb, propagte sym2v3 to > STMT3, because sym2v4 = chi(sym2v3) is dead. > > I think this logical is correct, otherwise, STMT3's sym2 chi node's operand > can't directly find a live def. > > STMT1 > sym2v4 = chi(sym2v3) NOT LIVE > default_vsym_v4 = chi(default_vsym_v3) LIVE > .... > STMT2 > ILOAD mu(default_vsym_v4) > ... > STMT3 > sym2v5 = chi(sym2v3) LIVE // sym2v3 is propagate down from above dead chi. > default_vsym_v5 = chi(default_vsym_v4) > > I think this patch is safe to apply on the current DSE. > > Regards > Shi Hui > > On Sun, Jul 24, 2011 at 11:18 AM, Fred Chow <frdc...@gmail.com> wrote: >> >> My comments below: >> >> On 07/23/2011 09:00 AM, Hui Shi wrote: >> >> Hi Fred, >> >> Thanks for your comments. >> Comments below. >> >> On Sat, Jul 23, 2011 at 8:04 PM, Fred Chow <frdc...@gmail.com> wrote: >>> >>> Hi Shi, >>> >>> Your change to force other aliased chi nodes to live in >>> DSE::Set_Reqruied_CHI() is fine with me. But I'm not comfortable with your >>> introduction of the _vse_required_set, which I understand is for finding >>> more dead stores. More below: >>> >>> On 07/22/2011 01:50 AM, Hui Shi wrote: >>> >>> Hi Fred, >>> >>> Follow your suggestion, if one chi node is live, other aliased chi nodes >>> in same chi_list should live. >>> I fix this issue in DSE, could you help review this patch? >>> >>> Fix is: >>> 1. Add hash_set _vse_requried_set in DSE, record if VER_STAB_ENTRY is >>> once processed by DSE::Set_Required_VSE. >>> >>> Each vse entry is an SSA version of an aux entry. Thus, from vse to aux >>> is a many-to-one mapping. Given a vse, vse->Any_use() already tells me that >>> this vse has once been required. Why do you have to set up this >>> _vse_required_set, and using DSE::VSE_required() to check the information >>> that is given by vse->Any_use()? Should it be an aux set instead of a vse >>> set? >> >> One vse entry set in _vse_required_set means this vse entry was once >> processed by DSE::Set_Required_VSE. >> However this doesn't means vse->Any_use() is true. >> >> I agree that the above is the case with the latest version of >> opt_dse.cxx. But in the original version of opt_dse.cxx from PathScale, >> vse->Any_use() is set to true whenever Set_required_VSE() is entered. So >> there has obviously been enhancements put in, and your creation of >> _vse_required_set is made necessary due to those enhancements. Since those >> enhancements has not been reviewed by me, I would appreciate the folks who >> did those enhancements provide their review opinion on this latest addition. >> >> In DSE::Set_Required_VSE( VER_STAB_ENTRY *vse, BOOL real_use, WN *ref_wn >> ). >> When VSE's def type is WHIRL statement or PHI, vse->any_use is true. >> When VSE's def type is CHI, VSE entry any_use will not set true, if AUX's >> last store write the same mem location with current chi's statement. >> >> VSE entry in _vse_required_set means its use in ldid/mu/phi/chi is live >> then this VSE entry is processed by DSE::Set_Required_VSE. >> But not necessary means VSE entry's def chi node is live. >> >>> >>> 2. In DSE::Set_Required_CHI >>> 1. If chi node is live, find aliased/non-live, once required chi >>> nodes in same chi list. If find, force these chi nodes live. >>> >>> If a chi is non-live, then "once required" must be false. So there is a >>> contradiction here. If your intention is "aux-once-required", there is >>> still a problem, because the aux may be found to be required later. >> >> From above explanation, it force live chi nodes that are >> 1. once processed in DSE::Set_Required_CHI before. >> 2. not live due to chi node's whirl statment write same mem location with >> same AUX's last store. >> >>> Set last_store be NULL and call set_required_vse. >>> 2. If chi node is not live, find aliased chi nodes live in same chi >>> list. If find, set current chi node live. >>> 3. Remove previous fix in set_required_mu >>> >>> VER_STAB_ENTRY is required means this AUX symbol's some version is really >>> used in program. >>> Using this information, we can exclude mark some AUX symbol's chi node >>> live, if this AUX is never used. >>> This is meaningful in following case, >>> int foo(int a) { >>> int t1, t2; >>> int *p; >>> >>> if(a){ >>> p = &t1; >>> } >>> else { >>> p = &t2; >>> } >>> t1 =3; >>> *p = 5; >>> return t2; >>> } >>> >>> chi nodes live info after original DSE >>> I4I4LDID 0 <st 20> T<4,.predef_I4,4> >>> I4INTCONST 0 (0x0) >>> I4I4NE >>> U8LDA 0 <st 4> T<53,anon_ptr.,8> >>> U8LDA 0 <st 5> T<53,anon_ptr.,8> >>> U8SELECT >>> U8STID 0 <st 21> T<53,anon_ptr.,8> {line: 1/6} {freq: 0, ln: 6, col: 0} >>> I4INTCONST 3 (0x3) >>> I4STID 0 <st 23> T<4,.predef_I4,4> {line: 1/12} {freq: 0, ln: 12, col: >>> 0} // STMT1 >>> sym9v22 <- chi( sym9v19 ) NOT LIVE // default vsym chi is dead. this >>> statement is dead >>> I4INTCONST 5 (0x5) >>> U8U8LDID 0 <st 21> T<53,anon_ptr.,8> >>> I4ISTORE 0 T<53,anon_ptr.,8> {line: 1/13} {freq: 0, ln: 13, col: 0} >>> //STMT2 >>> sym4v24 <- chi( sym4v23 ) NOT LIVE // t1 chi node is dead >>> sym5v25 <- chi( sym5v15 ) LIVE // t2 chi node is live >>> sym8v26 <- chi( sym8v18 ) LIVE // return vsym chi ndoe is live >>> sym9v27 <- chi( sym9v22 ) NOT LIVE // default vsym chi is dead >>> I4I4LDID 0 <st 25> T<4,.predef_I4,4> >>> I4STID 1 <st 28> T<4,.predef_I4,4> {line: 1/14} {freq: 0, ln: 14, col: 0} >>> RETURN {line: 1/14} {freq: 0, ln: 14, col: 0} >>> mu[ sym8v26 sym8v26 ] >>> >>> If simply mark alias chi node live, default vsym chi ndoe will live in >>> STMT2, because it's aliased with t2. >>> Then follow DU chian, default vsym chi nodes in STMT1 is also live. Cause >>> STMT1 not dead. >>> >>> Actually, default vsym will not be requried from return vsym or t2's DU >>> chain. >>> So we need allow aliased chi nodes not live exist, if this chi node is >>> never required in DSE analysis. >>> >>> I consider default vsym as being live at STMT2, because it aliases with >>> t2. This will cause STMT1 to be live. I understand by keeping the default >>> vsym non-live, you can delete STMT1. But my intuition is that making the >>> default vsym non-live (when in fact it is live) will cause other bugs. >>> >> >> default vsym chi nodes in STMT2 is not mark live, because it never >> processed by DSE::Set_Required_VSE. >> This means default vsym chi nodes in STMT2 has no explicit use. >> So there is no requirement for getting def statement from deafult vsym's >> use to STMT2. That's why I think its okay to leave the default vsym >> non-live. >> >> Is this a legacy problem or not?Is there same problem in DSE's initial >> version? >> >> Before your current fix, and also in the original version from PathScale, >> default vsym is not live at STMT2. But with your fix, and without your >> _vse_required_set mechanism, default vsym would be made live at STMT2. I >> understand your effort is in trying to prevent performance regression due to >> your fix. >> >> Strictly speaking, default vsym is live at STMT2, because it aliases with >> t2 which is used at the return statement. If this incorrect liveness of the >> default vsym does not cause any wrong optimization, then it is still OK. >> Maybe you should consider using a different approach to fix the bug. For >> example, you can leave opt_dse.cxx intact and consider preventing the bad >> copy propagation instead. At PathScale, we did not do copy propagation on >> chi occurrences, which I think is the root cause of the bug. >> >> Fred >> >> Regards >> Shi Hui >> >> >>> Fred >>> >>> Regards >>> Shi Hui >>> >>> >>> >>> On Tue, Jul 5, 2011 at 5:16 PM, Fred Chow <frdc...@gmail.com> wrote: >>>> >>>> Hi Shi, >>>> >>>> Your fix by calling CODEMAP rename is OK in its own right. But this fix >>>> is in reality for patching up a bug that was introduced by copy >>>> propagation. We should fix the root cause of the problem, NOT to patch it >>>> so the symptom does not appear. >>>> >>>> Basically, the following from your original mail: >>>> >>>> STMT1 >>>> sym2v4 = chi(sym2v3) NOT LIVE >>>> default_vsym_v4 = chi(default_vsym_v3) LIVE >>>> >>>> is already wrong. sym2v aliases with the default vsym. Thus, one >>>> cannot be live while the other is dead. >>>> >>>> In looking at the code in opt_dse.cxx, I found that it has diverged a >>>> lot from the version that I'm familiar with. Thus, I'm not in a position >>>> to >>>> point out what the root problem is. At least the version I worked with at >>>> PathScale does not have this problem. I would welcome others to provide >>>> input on this. >>>> >>>> Fred >>>> >>>> On 07/04/2011 07:37 PM, Hui Shi wrote: >>>> >>>> Hi Fred, >>>> >>>> Thanks for your comments. >>>> The reanming happens only when there are some dead phi/chi nodes revive >>>> in CODEMAP::Iload_folded. >>>> To evaluate how this patch will affect compile-time, I add dump when >>>> this renaming happens. >>>> Tested with SPEC2006 base/peak >>>> Following cases have CODEMAP rename after create CODEMAPE, and renaming >>>> count is seldom consider test's PU count.. >>>> 447.dealII base/peak rename 20+ times >>>> 483.xalancbmk base/peak rename 30+ times >>>> 416.gamess peak rename 4 times >>>> 458.sjeng base rename 1 time. >>>> So this renaming rarely happens and I think we can afford the increased >>>> compile time. >>>> >>>> And from another perspective. Rename_CODEMAP is already used in other >>>> phase like aft CGF optimization, COMP_UNIT::Fold_lda_iload_istore, etc. >>>> These optimizations and renaming are more likely happen. >>>> >>>> One utiIity I hope WOPT can have is CODEMAP rename only rename somve AUX >>>> (give in a table or list). Then rename cost can be decreased a lot. >>>> >>>> >>>> For copy propagation issue. >>>> The copy propagation happens in Dead_Store_Elminination, >>>> DSE::Propagate_vsym_bb. >>>> To disable the propagation, need change Dead_Store_Elimination. If one >>>> chi node is live, >>>> all other chi nodes binding to same statment and set_required should >>>> live because they are aliased with each other. >>>> This will also cause compile time isuee, because more live phi/chi node >>>> in codrep. >>>> >>>> Another issue is though we fix the copy propgation problem, the revive >>>> logical is still there in CODEMAP::Iload_folded. >>>> There still possible dead phi/chi nodes revive in CODEMAP creation. So >>>> complete fix is rename after create CODEMAP. >>>> >>>> The main issue is create_codemap not simply create coderep, it also >>>> perform iload folding may change version. >>>> If iload folding function is totolay peformed in >>>> COMP_UNIT::Fold_lda_iload_istore, it has no problem. >>>> >>>> Regards >>>> Shi Hui >>>> >>>> On Mon, Jul 4, 2011 at 9:26 PM, Fred Chow <frdc...@gmail.com> wrote: >>>>> >>>>> Doing renaming is quite compile-time expensive, because it is like >>>>> running the SSA construction algorithm. The problem your mentioned would >>>>> not have occurred if sym2v3 has not been propagated down to STMT3. I >>>>> don't >>>>> recall we were doing copy propagation for chi operands. The overlapped >>>>> live >>>>> range was caused by this bad copy propagation, NOT by the ILOAD folding. >>>>> >>>>> The above copy propagation is bad because sym2v4 is NOT dead, because >>>>> it aliases with the default vsym which appears in the mu of STMT2. So >>>>> your >>>>> fix should be in preventing the copy propagation. >>>>> >>>>> Fred >>>>> >>>>> On 06/29/2011 11:35 PM, Hui Shi wrote: >>>>> >>>>> Could gatekeeper help review this fix? >>>>> >>>>> In SSA::Create_CODEMAP ILOAD folding is performed during coderep setup. >>>>> The problem is iload folding may revive some dead phi/chi node. >>>>> >>>>> For example >>>>> STMT1 >>>>> sym2v4 = chi(sym2v3) NOT LIVE >>>>> default_vsym_v4 = chi(default_vsym_v3) LIVE >>>>> .... >>>>> STMT2 >>>>> ILOAD mu(default_vsym_v4) >>>>> ... >>>>> STMT3 >>>>> sym2v5 = chi(sym2v3) LIVE // sym2v3 is propagate down from above dead >>>>> chi. >>>>> default_vsym_v5 = chi(default_vsym_v4) >>>>> >>>>> If sym2v4 may use at STMT2, ILOAD's MU node must on STMT1's chi list >>>>> (because MU node must alias with sym2, otherwise its alias issue). >>>>> And MU node's corresponding CHI node must live on STMT1, because >>>>> set_required_mu will set chi node live. >>>>> So it's okay for DSE to mark fisrt CHI dead if STMT1 and STMT3 write >>>>> same memory location. >>>>> 1. It will not make STMT1 dead, because the other chi node is live. >>>>> 2. ILOAD's MU node's def statment still STMT1 >>>>> >>>>> If ILOAD folding transform ILOAD to LDID and LDID's AUX is sym2. >>>>> ILOD_FOLDING revives dead chi sym2v4 = chi(sym2v3) without update the >>>>> late chi node's opnd. >>>>> >>>>> This introduce overlapped between sym2v3 and sym2v4. >>>>> Fix is rename CODEMAP if iload folding revive dead phi/chi node. >>>>> OPT_REVISE_SSA::Fold_lda_indirects also rename CODEMAP after >>>>> optimization. >>>>> >>>>> Fix is >>>>> 1. record in htable, if dead chi/phi node is revived in >>>>> SSA::Create_CODEMAP. >>>>> This only happends when ILOAD_Folding happens and find corresponding >>>>> chi/phi is dead. >>>>> 2. After SSA::Create_CODEMAP, rename coderep if dead chi/phi node is >>>>> revive in CODEMAP setup. >>>>> >>>>> Different with fixing in DSE to mark first chi node live, this fix not >>>>> affect DSE, so not affect performance. >>>>> And this is verified with spec2006.rate on xeon. >>>>> >>>>> Regards >>>>> Shi Hui >>>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> All of the data generated in your IT infrastructure is seriously >>>>> valuable. >>>>> Why? It contains a definitive record of application performance, >>>>> security >>>>> threats, fraudulent activity, and more. Splunk takes this data and >>>>> makes >>>>> sense of it. IT sense. And common sense. >>>>> http://p.sf.net/sfu/splunk-d2d-c2 >>>>> >>>>> _______________________________________________ >>>>> Open64-devel mailing list >>>>> Open64-devel@lists.sourceforge.net >>>>> https://lists.sourceforge.net/lists/listinfo/open64-devel >>>>> >>>> >>>> >>> >>> >> >> > > > ------------------------------------------------------------------------------ > Storage Efficiency Calculator > This modeling tool is based on patent-pending intellectual property that > has been used successfully in hundreds of IBM storage optimization engage- > ments, worldwide. Store less, Store more with what you own, Move data to > the right place. Try It Now! > http://www.accelacomm.com/jaw/sfnl/114/51427378/ > _______________________________________________ > Open64-devel mailing list > Open64-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/open64-devel > > ------------------------------------------------------------------------------ Storage Efficiency Calculator This modeling tool is based on patent-pending intellectual property that has been used successfully in hundreds of IBM storage optimization engage- ments, worldwide. Store less, Store more with what you own, Move data to the right place. Try It Now! http://www.accelacomm.com/jaw/sfnl/114/51427378/ _______________________________________________ Open64-devel mailing list Open64-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/open64-devel