May be you can convince Fred the dead chi analysis is correct (baring this fix)?
Sun

On Mon, Aug 1, 2011 at 12:22 PM, Ann <ann...@gmail.com> wrote:
> Hi Fred,
>     The goal of Propagate_vsym_bb() is based on the dead chi analysis, and
> the goal of it is to maintain ud chain after dead chi analysis.  So I
> believe the root cause is dead chi identification, and I think Hui's fix is
> in the right phase.
>     I haven't figured out a better and cleaner fix which can maintain the
> same performance at the same time.
> Thank you.
>
> Best Regards,
> Ann
>
>
> On Sun, Jul 31, 2011 at 1:48 AM, Fred Chow <frdc...@gmail.com> wrote:
>>
>> Ann,
>>
>> Since I do not completely understand the enhancements in Open64's version
>> of opt_dse.cxx (whose purpose I understand is to enable more aliased stores
>> to be deleted), I'm not in a position to judge whether Hui's latest fix is
>> the best solution.  Since you consider Hui's fix as correct, the question is
>> whether there is a better and cleaner fix.
>>
>> Do you agree that the overlapped live range bug was caused by
>> DSE::Propagate_vsym_bb()?  If so, would a better and cleaner fix be to
>> improve Propagate_vsym_bb() so it will not introduce overlapped live range
>> by taking into account the implicit mu's?
>>
>> Fred
>>
>> On 07/26/2011 09:45 AM, Ann wrote:
>>
>> Hi Fred,
>>   I think Hui's  fix is to mark necessary chi live, so as to avoid copy
>> propagation and avoid overlapped live range.
>>   I think the mu-chi representation in open64 are conservative, which
>> caused redundant chi and mu contained in the IR. The new dse is aggressive
>> to delete redundant chi's.
>>   I once tried to detect all implicit mus and use them to prevent chi
>> elimination. But this approach caused 2~5% performance degradation for 3
>> programs in SPECK2006. That means the mu lists carry redundant mu nodes
>> which may prevent optimization.
>>
>>   I agree with Hui's fix, it will not cause any performance degradation
>> and make the IR optimized and right.  The "right" means the IR is
>> consistent, leaving necessary chi left and causing no overlapped live
>> range.
>>
>> Best Regards,
>> Ann
>>
>> On Tue, Jul 26, 2011 at 6:31 AM, Fred Chow <frdc...@gmail.com> wrote:
>>>
>>> My answers below:
>>>
>>> On 07/25/2011 07:11 AM, Hui Shi wrote:
>>>
>>> Hi Fred,
>>>
>>> The new enhancement in DSE introduce problem that some chi node is not
>>> live due to same location with last store.
>>>
>>> For fixing the problem from copy propagation.
>>> " For example, you can leave opt_dse.cxx intact and consider preventing
>>> the bad copy propagation instead.  At PathScale, we did not do copy
>>> propagation on chi occurrences, which I think is the root cause of the bug."
>>>
>>> In my initial example in this thread, do you mean don't copy propgate
>>> sym2v3 to STMT3?
>>>
>>> Yes.
>>>
>>> It's not the copy propagation in CreateCodeMap.
>>> This happens after DSE mark sym2 chi node on STMT1 not live.
>>> DSE::Dead_store_elim calles DSE::Propagate_vsym_bb, propagte sym2v3 to
>>> STMT3, because sym2v4 = chi(sym2v3) is dead.
>>>
>>> DSE::Propagate_vsym_bb() is exactly what I was referring to by "copy
>>> propagation on chi occurrences".  I did not know that this is done in
>>> opt_dse.cxx, this being another enhancement put into Open64 that was not in
>>> PathScale.
>>>
>>> I think this logical is correct, otherwise, STMT3's sym2 chi node's
>>> operand can't directly find a live def.
>>>
>>> There is an implicit use of sym2 in STMT2.  If there is an explicit MU
>>> node at STMT2 for sym2, it would be using sym2v4.  That's why the copy
>>> propagation of sym2v3 to STMT3 is wrong, because it causes the overlapped
>>> live range.  When the implicit use of sym2 at STMT2 is converted to explicit
>>> due to ILOAD-folding, this triggers the live range overlap assertion.
>>>
>>> STMT1
>>> sym2v4 = chi(sym2v3)  NOT LIVE
>>> default_vsym_v4 = chi(default_vsym_v3) LIVE
>>> ....
>>> STMT2
>>>   ILOAD mu(default_vsym_v4)
>>> ...
>>> STMT3
>>> sym2v5 = chi(sym2v3) LIVE // sym2v3 is propagate down from above dead
>>> chi.
>>> default_vsym_v5 = chi(default_vsym_v4)
>>>
>>> I think this patch is safe to apply on the current DSE.
>>>
>>> I think your patch is not needed if you fix Propagate_vsym_bb() by
>>> preventing propagation across implicit MU's.  The root cause of the bug is
>>> in DSE::Propagate_vsym_bb().
>>>
>>> Fred
>>>
>>> Regards
>>> Shi Hui
>>>
>>> On Sun, Jul 24, 2011 at 11:18 AM, Fred Chow <frdc...@gmail.com> wrote:
>>>>
>>>> My comments below:
>>>>
>>>> On 07/23/2011 09:00 AM, Hui Shi wrote:
>>>>
>>>> Hi Fred,
>>>>
>>>> Thanks for your comments.
>>>> Comments below.
>>>>
>>>> On Sat, Jul 23, 2011 at 8:04 PM, Fred Chow <frdc...@gmail.com> wrote:
>>>>>
>>>>> Hi Shi,
>>>>>
>>>>> Your change to force other aliased chi nodes to live in
>>>>> DSE::Set_Reqruied_CHI() is fine with me.  But I'm not comfortable with 
>>>>> your
>>>>> introduction of the _vse_required_set, which I understand is for finding
>>>>> more dead stores.  More below:
>>>>>
>>>>> On 07/22/2011 01:50 AM, Hui Shi wrote:
>>>>>
>>>>> Hi Fred,
>>>>>
>>>>> Follow your suggestion, if one chi node is live, other aliased chi
>>>>> nodes in same chi_list should live.
>>>>> I fix this issue in DSE,  could you help review this patch?
>>>>>
>>>>> Fix is:
>>>>> 1. Add hash_set _vse_requried_set in DSE, record if VER_STAB_ENTRY is
>>>>> once processed by DSE::Set_Required_VSE.
>>>>>
>>>>> Each vse entry is an SSA version of an aux entry.  Thus, from vse to
>>>>> aux is a many-to-one mapping.  Given a vse, vse->Any_use() already tells 
>>>>> me
>>>>> that this vse has once been required.  Why do you have to set up this
>>>>> _vse_required_set, and using DSE::VSE_required() to check the information
>>>>> that is given by vse->Any_use()?  Should it be an aux set instead of a vse
>>>>> set?
>>>>
>>>> One vse entry set in _vse_required_set means this vse entry was once
>>>> processed by DSE::Set_Required_VSE.
>>>> However this doesn't means vse->Any_use() is true.
>>>>
>>>> I agree that the above is the case with the latest version of
>>>> opt_dse.cxx.  But in the original version of opt_dse.cxx from PathScale,
>>>> vse->Any_use() is set to true whenever Set_required_VSE() is entered.  So
>>>> there has obviously been enhancements put in, and your creation of
>>>> _vse_required_set is made necessary due to those enhancements.  Since those
>>>> enhancements has not been reviewed by me, I would appreciate the folks who
>>>> did those enhancements provide their review opinion on this latest 
>>>> addition.
>>>>
>>>> In DSE::Set_Required_VSE( VER_STAB_ENTRY *vse, BOOL real_use, WN *ref_wn
>>>> ).
>>>> When VSE's def type is WHIRL statement or PHI, vse->any_use is true.
>>>> When VSE's def type is CHI, VSE entry any_use will not set true, if
>>>> AUX's last store write the same mem location with current chi's statement.
>>>>
>>>> VSE entry in _vse_required_set means its use in ldid/mu/phi/chi is live
>>>> then this VSE entry is processed by DSE::Set_Required_VSE.
>>>> But not necessary means VSE entry's def chi node is live.
>>>>
>>>>>
>>>>> 2. In DSE::Set_Required_CHI
>>>>>     1. If chi node is live, find aliased/non-live, once required chi
>>>>> nodes in same chi list. If find, force these chi nodes live.
>>>>>
>>>>> If a chi is non-live, then "once required" must be false.  So there is
>>>>> a contradiction here.  If your intention is "aux-once-required", there is
>>>>> still a problem, because the aux may be found to be required later.
>>>>
>>>> From above explanation, it force live chi nodes that are
>>>> 1. once processed in DSE::Set_Required_CHI before.
>>>> 2. not live due to chi node's whirl statment write same mem location
>>>> with same AUX's last store.
>>>>
>>>>>         Set last_store be NULL and call set_required_vse.
>>>>>     2. If chi node is not live, find aliased chi nodes live in same chi
>>>>> list. If find, set current chi node live.
>>>>> 3. Remove previous fix in set_required_mu
>>>>>
>>>>> VER_STAB_ENTRY is required means this AUX symbol's some version is
>>>>> really used in program.
>>>>> Using this information, we can exclude mark some AUX symbol's chi node
>>>>> live, if this AUX is never used.
>>>>> This is meaningful in following case,
>>>>> int foo(int a) {
>>>>> int t1, t2;
>>>>> int *p;
>>>>>
>>>>> if(a){
>>>>>   p = &t1;
>>>>> }
>>>>> else {
>>>>>   p = &t2;
>>>>> }
>>>>> t1 =3;
>>>>> *p = 5;
>>>>> return t2;
>>>>> }
>>>>>
>>>>> chi nodes live info after original DSE
>>>>>    I4I4LDID 0 <st 20> T<4,.predef_I4,4>
>>>>>    I4INTCONST 0 (0x0)
>>>>>   I4I4NE
>>>>>   U8LDA 0 <st 4> T<53,anon_ptr.,8>
>>>>>   U8LDA 0 <st 5> T<53,anon_ptr.,8>
>>>>>  U8SELECT
>>>>> U8STID 0 <st 21> T<53,anon_ptr.,8> {line: 1/6} {freq: 0, ln: 6, col: 0}
>>>>>  I4INTCONST 3 (0x3)
>>>>> I4STID 0 <st 23> T<4,.predef_I4,4> {line: 1/12} {freq: 0, ln: 12, col:
>>>>> 0}  // STMT1
>>>>> sym9v22 <- chi( sym9v19 ) NOT LIVE     // default vsym chi is dead.
>>>>> this statement is dead
>>>>>  I4INTCONST 5 (0x5)
>>>>>  U8U8LDID 0 <st 21> T<53,anon_ptr.,8>
>>>>> I4ISTORE 0 T<53,anon_ptr.,8> {line: 1/13} {freq: 0, ln: 13, col: 0}
>>>>> //STMT2
>>>>> sym4v24 <- chi( sym4v23 ) NOT LIVE  // t1 chi node is dead
>>>>> sym5v25 <- chi( sym5v15 ) LIVE         // t2 chi node is live
>>>>> sym8v26 <- chi( sym8v18 ) LIVE          // return vsym chi ndoe is live
>>>>> sym9v27 <- chi( sym9v22 ) NOT LIVE  // default vsym chi is dead
>>>>>  I4I4LDID 0 <st 25> T<4,.predef_I4,4>
>>>>> I4STID 1 <st 28> T<4,.predef_I4,4> {line: 1/14} {freq: 0, ln: 14, col:
>>>>> 0}
>>>>> RETURN {line: 1/14} {freq: 0, ln: 14, col: 0}
>>>>>        mu[ sym8v26  sym8v26 ]
>>>>>
>>>>> If simply mark alias chi node live, default vsym chi ndoe will live in
>>>>> STMT2, because it's aliased with t2.
>>>>> Then follow DU chian, default vsym chi nodes in STMT1 is also live.
>>>>> Cause STMT1 not dead.
>>>>>
>>>>> Actually, default vsym will not be requried from return vsym or t2's DU
>>>>> chain.
>>>>> So we need allow aliased chi nodes not live exist, if this chi node is
>>>>> never required in DSE analysis.
>>>>>
>>>>> I consider default vsym as being live at STMT2, because it aliases with
>>>>> t2.  This will cause STMT1 to be live.  I understand by keeping the 
>>>>> default
>>>>> vsym non-live, you can delete STMT1.  But my intuition is that making the
>>>>> default vsym non-live (when in fact it is live) will cause other bugs.
>>>>>
>>>>
>>>> default vsym chi nodes in STMT2 is not mark live, because it never
>>>> processed by DSE::Set_Required_VSE.
>>>> This means default vsym chi nodes in STMT2 has no explicit use.
>>>> So there is no requirement for getting def statement from deafult vsym's
>>>> use to STMT2. That's why I think its okay to leave the default vsym
>>>> non-live.
>>>>
>>>> Is this a legacy problem or not?Is there same problem in DSE's initial
>>>> version?
>>>>
>>>> Before your current fix, and also in the original version from
>>>> PathScale, default vsym is not live at STMT2.  But with your fix, and
>>>> without your _vse_required_set mechanism, default vsym would be made live 
>>>> at
>>>> STMT2.  I understand your effort is in trying to prevent performance
>>>> regression due to your fix.
>>>>
>>>> Strictly speaking, default vsym is live at STMT2, because it aliases
>>>> with t2 which is used at the return statement.  If this incorrect liveness
>>>> of the default vsym does not cause any wrong optimization, then it is still
>>>> OK.  Maybe you should consider using a different approach to fix the bug.
>>>> For example, you can leave opt_dse.cxx intact and consider preventing the
>>>> bad copy propagation instead.  At PathScale, we did not do copy propagation
>>>> on chi occurrences, which I think is the root cause of the bug.
>>>>
>>>> Fred
>>>>
>>>> Regards
>>>> Shi Hui
>>>>
>>>>
>>>>> Fred
>>>>>
>>>>> Regards
>>>>> Shi Hui
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jul 5, 2011 at 5:16 PM, Fred Chow <frdc...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Shi,
>>>>>>
>>>>>> Your fix by calling CODEMAP rename is OK in its own right.  But this
>>>>>> fix is in reality for patching up a bug that was introduced by copy
>>>>>> propagation.  We should fix the root cause of the problem, NOT to patch 
>>>>>> it
>>>>>> so the symptom does not appear.
>>>>>>
>>>>>> Basically, the following from your original mail:
>>>>>>
>>>>>> STMT1
>>>>>> sym2v4 = chi(sym2v3)  NOT LIVE
>>>>>> default_vsym_v4 = chi(default_vsym_v3) LIVE
>>>>>>
>>>>>> is already wrong.  sym2v aliases with the default vsym.  Thus, one
>>>>>> cannot be live while the other is dead.
>>>>>>
>>>>>> In looking at the code in opt_dse.cxx, I found that it has diverged a
>>>>>> lot from the version that I'm familiar with.  Thus, I'm not in a 
>>>>>> position to
>>>>>> point out what the root problem is.  At least the version I worked with 
>>>>>> at
>>>>>> PathScale does not have this problem.  I would welcome others to provide
>>>>>> input on this.
>>>>>>
>>>>>> Fred
>>>>>>
>>>>>> On 07/04/2011 07:37 PM, Hui Shi wrote:
>>>>>>
>>>>>> Hi Fred,
>>>>>>
>>>>>> Thanks for your comments.
>>>>>> The reanming happens only when there are some dead phi/chi nodes
>>>>>> revive in CODEMAP::Iload_folded.
>>>>>> To evaluate how this patch will affect compile-time, I add dump when
>>>>>> this renaming happens.
>>>>>> Tested with SPEC2006 base/peak
>>>>>> Following cases have CODEMAP rename after create CODEMAPE, and
>>>>>> renaming count is seldom consider test's PU count..
>>>>>> 447.dealII   base/peak  rename 20+ times
>>>>>> 483.xalancbmk  base/peak rename 30+ times
>>>>>> 416.gamess  peak  rename 4 times
>>>>>> 458.sjeng     base   rename 1 time.
>>>>>> So this renaming rarely happens and I think we can afford the
>>>>>> increased compile time.
>>>>>>
>>>>>> And from another perspective. Rename_CODEMAP is already used in other
>>>>>> phase like aft CGF optimization,  COMP_UNIT::Fold_lda_iload_istore, etc.
>>>>>> These optimizations and renaming are more likely happen.
>>>>>>
>>>>>> One utiIity I hope WOPT can have is CODEMAP rename only rename somve
>>>>>> AUX (give in a table or list). Then rename cost can be decreased a lot.
>>>>>>
>>>>>>
>>>>>> For copy propagation issue.
>>>>>> The copy propagation happens in Dead_Store_Elminination,
>>>>>> DSE::Propagate_vsym_bb.
>>>>>> To disable the propagation, need change Dead_Store_Elimination. If one
>>>>>> chi node is live,
>>>>>> all other chi nodes binding to same statment and set_required should
>>>>>> live because they are  aliased with each other.
>>>>>> This will also cause compile time isuee, because more live phi/chi
>>>>>> node in codrep.
>>>>>>
>>>>>> Another issue is though we fix the copy propgation problem, the revive
>>>>>> logical is still there in CODEMAP::Iload_folded.
>>>>>> There still possible dead phi/chi nodes revive in CODEMAP creation. So
>>>>>> complete fix is rename after create CODEMAP.
>>>>>>
>>>>>> The main issue is create_codemap not simply create coderep, it also
>>>>>> perform iload folding may change version.
>>>>>> If iload folding function is totolay peformed in
>>>>>> COMP_UNIT::Fold_lda_iload_istore, it has no problem.
>>>>>>
>>>>>> Regards
>>>>>> Shi Hui
>>>>>>
>>>>>> On Mon, Jul 4, 2011 at 9:26 PM, Fred Chow <frdc...@gmail.com> wrote:
>>>>>>>
>>>>>>> Doing renaming is quite compile-time expensive, because it is like
>>>>>>> running the SSA construction algorithm.  The problem your mentioned 
>>>>>>> would
>>>>>>> not have occurred if sym2v3 has not been propagated down to STMT3.  I 
>>>>>>> don't
>>>>>>> recall we were doing copy propagation for chi operands.  The overlapped 
>>>>>>> live
>>>>>>> range was caused by this bad copy propagation, NOT by the ILOAD folding.
>>>>>>>
>>>>>>> The above copy propagation is bad because sym2v4 is NOT dead, because
>>>>>>> it aliases with the default vsym which appears in the mu of STMT2.  So 
>>>>>>> your
>>>>>>> fix should be in preventing the copy propagation.
>>>>>>>
>>>>>>> Fred
>>>>>>>
>>>>>>> On 06/29/2011 11:35 PM, Hui Shi wrote:
>>>>>>>
>>>>>>> Could gatekeeper help review this fix?
>>>>>>>
>>>>>>> In SSA::Create_CODEMAP ILOAD folding is performed during coderep
>>>>>>> setup.
>>>>>>> The problem is iload folding may revive some dead phi/chi node.
>>>>>>>
>>>>>>> For example
>>>>>>> STMT1
>>>>>>> sym2v4 = chi(sym2v3)  NOT LIVE
>>>>>>> default_vsym_v4 = chi(default_vsym_v3) LIVE
>>>>>>> ....
>>>>>>> STMT2
>>>>>>>   ILOAD mu(default_vsym_v4)
>>>>>>> ...
>>>>>>> STMT3
>>>>>>> sym2v5 = chi(sym2v3) LIVE // sym2v3 is propagate down from above dead
>>>>>>> chi.
>>>>>>> default_vsym_v5 = chi(default_vsym_v4)
>>>>>>>
>>>>>>> If sym2v4 may use at STMT2, ILOAD's MU node must on STMT1's chi list
>>>>>>> (because MU node must alias with sym2, otherwise its alias issue).
>>>>>>> And MU node's corresponding CHI node must live on STMT1, because
>>>>>>> set_required_mu will set chi node live.
>>>>>>> So it's okay for DSE to mark fisrt CHI dead if STMT1 and STMT3 write
>>>>>>> same memory location.
>>>>>>> 1. It will not make STMT1 dead, because the other chi node is live.
>>>>>>> 2. ILOAD's MU node's def statment still STMT1
>>>>>>>
>>>>>>> If ILOAD folding transform ILOAD to LDID and LDID's AUX is sym2.
>>>>>>> ILOD_FOLDING revives dead chi sym2v4 = chi(sym2v3) without update the
>>>>>>> late chi node's opnd.
>>>>>>>
>>>>>>> This introduce overlapped between sym2v3 and sym2v4.
>>>>>>> Fix is rename CODEMAP if iload folding revive dead phi/chi node.
>>>>>>> OPT_REVISE_SSA::Fold_lda_indirects also rename CODEMAP after
>>>>>>> optimization.
>>>>>>>
>>>>>>> Fix is
>>>>>>> 1. record in htable, if dead chi/phi node is revived in
>>>>>>> SSA::Create_CODEMAP.
>>>>>>>    This only happends when ILOAD_Folding happens and find
>>>>>>> corresponding chi/phi is dead.
>>>>>>> 2. After SSA::Create_CODEMAP, rename coderep if dead chi/phi node is
>>>>>>> revive in CODEMAP setup.
>>>>>>>
>>>>>>> Different with fixing in DSE to mark first chi node live, this fix
>>>>>>> not affect DSE, so not affect performance.
>>>>>>> And this is verified with spec2006.rate on xeon.
>>>>>>>
>>>>>>> Regards
>>>>>>> Shi Hui
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> All of the data generated in your IT infrastructure is seriously
>>>>>>> valuable.
>>>>>>> Why? It contains a definitive record of application performance,
>>>>>>> security
>>>>>>> threats, fraudulent activity, and more. Splunk takes this data and
>>>>>>> makes
>>>>>>> sense of it. IT sense. And common sense.
>>>>>>> http://p.sf.net/sfu/splunk-d2d-c2
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Open64-devel mailing list
>>>>>>> Open64-devel@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/open64-devel
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Magic Quadrant for Content-Aware Data Loss Prevention
>>> Research study explores the data loss prevention market. Includes
>>> in-depth
>>> analysis on the changes within the DLP market, and the criteria used to
>>> evaluate the strengths and weaknesses of these DLP solutions.
>>> http://www.accelacomm.com/jaw/sfnl/114/51385063/
>>> _______________________________________________
>>> Open64-devel mailing list
>>> Open64-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/open64-devel
>>>
>>
>>
>
>
> ------------------------------------------------------------------------------
> Got Input?   Slashdot Needs You.
> Take our quick survey online.  Come on, we don't ask for help often.
> Plus, you'll get a chance to win $100 to spend on ThinkGeek.
> http://p.sf.net/sfu/slashdot-survey
>
> _______________________________________________
> Open64-devel mailing list
> Open64-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/open64-devel
>
>

------------------------------------------------------------------------------
Got Input?   Slashdot Needs You.
Take our quick survey online.  Come on, we don't ask for help often.
Plus, you'll get a chance to win $100 to spend on ThinkGeek.
http://p.sf.net/sfu/slashdot-survey
_______________________________________________
Open64-devel mailing list
Open64-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/open64-devel

Reply via email to