Dear Sun:

   This is an insight review and shed the light for me, thank you very
much!!

Why we have this assertion?

When we look at the detail trace of the LRA process, we get the answer:
OP:20>> [ 762,27] TN160(%rdx) :- movzbl TN161 ;
Avail integer registers:  %rax %rdi %rsi
Deallocated register %rdx
Allocated register %rdx for TN161
OP:19>> [ 762,26] TN161(%rdx) :- setne TN33(%rflags) ;
Avail integer registers:  %rax %rdi %rsi
Deallocated register %rdx
OP:18>> [ 762,25] TN33(%rflags) :- test32 TN158 TN158 ;
Avail integer registers:  %rax %rdi %rsi %rdx
Allocated register %rsi for TN158
OP:17>> [ 762,25] TN155(%rbp) :- mov32 TN173(%rax) ; copy
Avail integer registers:  %rax %rdi %rdx
Deallocated register %rbp
Allocated register %rax for TN173(%rax)
OP:16>> [ 762,24] TN158(%rsi) :- mov32 TN174(%rdx) ; copy
Avail integer registers:  %rbp %rdi %rdx
Deallocated register %rsi
Allocated register %rdx for TN174(%rdx)
OP:15>> [ 763,24] :- store8 TN170 TN149(%rcx) (0x8) ; WN id:29
Avail integer registers:  %rbp %rdi %rsi
The last OP, i.e OP15, we get
store8 TN170 TN149(%rcx) (0x8) ; WN id:29
Avail integer registers: %rbp %rdi %rsi
since all rbp,rdi,rsi are not byte addressable registers under -m32, so,
TN170 is unable to get a physical register each round, finally it get to
the max iterative rounds and asserts.

New suggested patch:
Under -m32, when we iterate the avail registers and return the
physics register for current op, we have a furthur look on the remaining
registers. If the current allocate register is a byte-addressable register
while all remainings are non byte-addressable, we just skip the current
allocate and return one of the avail non-addressable.

Patch below:
diff --git a/osprey/be/cg/lra.cxx b/osprey/be/cg/lra.cxx
index 834f4c6..79f200e 100644
--- a/osprey/be/cg/lra.cxx
+++ b/osprey/be/cg/lra.cxx
@@ -1899,7 +1899,56 @@ Get_Avail_Reg (ISA_REGISTER_CLASS regclass,
       }
     }
     if (lru_reg != REGISTER_UNDEFINED) {
-      return lru_reg;
+      // Open64.net bug953. Under -m32,
+      // In case of one byte-addressable register and the others
non-addressable avalable,
+      // we choose to skip the byte-addressable one and allocate one of
the non-addressables.
+      const REGISTER_SET byte_regs =
REGISTER_SUBCLASS_members(ISA_REGISTER_SUBCLASS_m32_8bit_regs);
+      if (Is_Target_32bit() &&
+          regclass == ISA_REGISTER_CLASS_integer && // We only handle the
integer case
+          REGISTER_SET_MemberP(byte_regs, lru_reg)) { // We are trying to
allocate a byte addressable register
+        BOOL Has_avail_m32_8bit_regs = TRUE;
+        for (reg = REGISTER_MIN; reg <= REGISTER_MAX; reg++) {
+          if (reg == lru_reg)
+            continue;
+          if (Is_Reg_Available(regclass, usable_regs, reg, lr) &&
+              reg != skip_reg) {
+            if (REGISTER_SET_MemberP(byte_regs, reg)) {
+              // Yes, there are other byte addressable registers available
+              Has_avail_m32_8bit_regs = TRUE;
+              break;
+            }
+            else
+              Has_avail_m32_8bit_regs = FALSE;
+          }
+        }
+        if (Has_avail_m32_8bit_regs) {
+          // has other available addressable registers, just return lru_rg;
+          return lru_reg;
+        } else {
+          // no byte-addressable registers available
+          if (Do_LRA_Trace(Trace_LRA_Detail)) {
+            fprintf(TFile, "The case of only non byte-addressable
registers left\n");
+            fprintf(TFile, "LRU reg: %s\n",
REGISTER_name(regclass,lru_reg));
+            fprintf(TFile, "usable registers: ");
+            for (reg = REGISTER_MIN; reg <= REGISTER_MAX; reg++) {
+              if (reg == lru_reg)
+                continue;
+              if (Is_Reg_Available(regclass, usable_regs, reg, lr) &&
+                  reg != skip_reg) {
+                fprintf(TFile, "%s ", REGISTER_name(regclass, reg));
+              }
+            }
+            fprintf(TFile, "\nskip %s\n", REGISTER_name(regclass,
lru_reg));
+          }
+          // We just skip current lru_reg, and select the non
byte-addressable alternative
+          return Get_Avail_Reg (regclass,
+                                usable_regs,
+                                lr,
+                                lru_reg);
+        }
+      } else {
+        return lru_reg;
+      }
     }
   } else
 #endif
The expected dump:

OP:29>> [ 764,33] :- store32 TN154 GTN4(%rsp) (0x4) ; WN id:31
Avail integer registers:  %rax %rbx %rbp %rdi %rsi %rdx
The case of only non byte-addressable registers left
LRU reg: %rbx
usable registers: %rbp %rdi %rsi
skip %rbx
Allocated register %rbp for TN154
......
OP:16>> [ 763,24] :- store8 TN170 TN149(%rcx) (0x8) ; WN id:29
Avail integer registers:  %rbx %rdi %rsi
Allocated register %rbx for TN170

Would you please kindly review the new patch? thank you very much.

Regards
Gang


On Tue, Mar 6, 2012 at 10:12 PM, Sun Chan <sun.c...@gmail.com> wrote:

> why don't you bite the bullet and figure out why you got that
> assertion. I used your same reasoning and conclude that you should
> always do -O0 and you have no answer.
> Sun
>
> On Mon, Mar 5, 2012 at 9:54 PM, Gang Yu <yugang...@gmail.com> wrote:
> > Some reasons to defend the fix suggestion:
> >
> > a.) it's a bug due to aggressive scheduling under inappropriate context,
> > disable the scheduling may fix the bug.
> > b.) Other two approaches may fix the bug:
> >    b.1) improved heuristic on pre-allocation scheduling like other
> compilers
> > do, but we haven't done that, before we get a better solution, we can get
> > temporay solution and track this problem further.
> >    b.2) enhancing the register allocation algorithm to get a spill for
> the
> > register that unable to spill. The is not preferrable since the LRA has
> > served as the production algorithm for a long time, changes to the grand
> > algorithmx need plenty of tests and carefully verification. Even we get
> that
> > TN170 spilled, this is still not good, since the code is scheduled, but
> also
> > introduced more spills lead to the performance regression.
> > c.) some optimizations are not good for some architectures, for example,
> x86
> > does not employ the Register Variable Identification(RVI) phase in WOPT
> > which will obviously increase the register pressures. So, shall we still
> > apply aggressive preallocation scheduling for IA32 ,PIC which is with
> very
> > limited register set?
> >
> > Thanks
> >
> > Regards
> > Gang
> >
> >
> > On Fri, Feb 24, 2012 at 8:18 PM, Sun Chan <sun.c...@gmail.com> wrote:
> >>
> >> this is not a fix, you might as well say, always do -O0 and you can
> >> close 99% of a lot of optimization bugs
> >> Sun
> >>
> >> On Fri, Feb 24, 2012 at 3:56 PM, Gang Yu <yugang...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> >   Could a gatekeeper please help review the fix for
> >> > bug953(http://bugs.open64.net/show_bug.cgi?id=953)?
> >> >
> >> > Cut-down case please have a look at the attachment:
> >> >
> >> > http://bugs.open64.net/attachment.cgi?id=512
> >> >
> >> > openCC -S bug953.cpp -fpic -m32 -O2
> >> >
> >> > ### Assertion failure at line 4678 of
> >> > /fc/home/yug/open64/osprey/be/cg/lra.cxx:
> >> > ### Compiler Error in file bug953.cpp during Local Register Allocation
> >> > phase:
> >> > ### LRA: Unable to spill TN:170 (cl:1) in BB:1, GRA grant:3
> >> > openCC INTERNAL ERROR:
> >> > /fc/home/yug/target/x86_64_dbg/lib/gcc-lib/x86_64-open64-linux/5.0/be
> >> > returned non-zero status 1
> >> >
> >> > Analysis:
> >> >
> >> > Dump of the failed bb before the LRA is:
> >> >
> >> > [ 762, 0] TN154 :- ld32 GTN4(%rsp) (sym:bufSize+0) ; WN id:23
> >> > bufSize+0x0
> >> > [ 762, 0] TN156 :- ld32 GTN4(%rsp) (sym:imageSize+0) ; WN id:27
> >> > imageSize+0x0
> >> > [   0, 0] TN159 :- zero32 ;
> >> > [ 759, 1] TN149 :- ld32 GTN4(%rsp) (sym:buffer+0) ; WN id:24
> buffer+0x0
> >> > [ 763, 1] TN164 :- ldc32 (0x1) ;
> >> > [ 761, 1] TN152 :- ldc32 (0x12345678) ;
> >> > [ 759, 2] TN150 :- ldc32 (0x2) ;
> >> > [ 762, 3] TN157 :- lea32 TN154 (0xffffffffffffffe0) ;
> >> > [ 762, 5] TN155 TN158 :- div32 TN156 TN159 TN157 ;
> >> > [ 763, 8] :- store8 TN164 TN149 (0x8) ; WN id:29
> >> > [ 761, 8] :- store32 TN152 TN149 (0x2) ; WN id:26
> >> > [ 759, 8] :- store16 TN150 TN149 (0x0) ; WN id:25
> >> > [ 762,23] TN33(%rflags) :- test32 TN158 TN158 ;
> >> > [ 762,24] TN161 :- setne TN33(%rflags) ;
> >> > [ 762,25] TN160 :- movzbl TN161 ;
> >> > [ 762,26] TN162 :- leax32 TN160 TN155 (0x1) (0x0) ;
> >> > [ 762,28] TN163 :- lea32 TN162 (0x1) ;
> >> > [ 762,30] :- store16 TN163 TN149 (0x6) ; WN id:28
> >> > [ 764,30] :- store32 TN154 GTN4(%rsp) (0x4) ; WN id:31
> >> > [ 764,30] :- store32 TN149 GTN4(%rsp) (0x0) ; WN id:30
> >> > [ 764,31] :- call
> (sym:_ZN7factory14checksumPacketEPNS_10rbu_packetEj+0)
> >> > ;
> >> > WN
> >> >
> >> > while before the IGLS_Schedule_Region, we can get
> >> >
> >> > [ 759, 0] TN149 :- ld32 GTN4(%rsp) (sym:buffer+0) ; WN id:24
> buffer+0x0
> >> > [ 759, 0] TN150 :- ldc32 (0x2) ;
> >> > [ 759, 0] :- store16 TN150 TN149 (0x0) ; WN id:25
> >> > [ 761, 0] TN152 :- ldc32 (0x12345678) ;
> >> > [ 761, 0] :- store32 TN152 TN149 (0x2) ; WN id:26
> >> > [ 762, 0] TN154 :- ld32 GTN4(%rsp) (sym:bufSize+0) ; WN id:23
> >> > bufSize+0x0
> >> > [ 762, 0] TN156 :- ld32 GTN4(%rsp) (sym:imageSize+0) ; WN id:27
> >> > imageSize+0x0
> >> > [ 762, 0] TN157 :- lea32 TN154 (0xffffffffffffffe0) ;
> >> > [   0, 0] TN159 :- zero32 ;
> >> > [ 762, 0] TN155 TN158 :- div32 TN156 TN159 TN157 ;
> >> > [ 762, 0] TN33(%rflags) :- test32 TN158 TN158 ;
> >> > [ 762, 0] TN161 :- setne TN33(%rflags) ;
> >> > [ 762, 0] TN160 :- movzbl TN161 ;
> >> > [ 762, 0] TN162 :- leax32 TN160 TN155 (0x1) (0x0) ;
> >> > [ 762, 0] TN163 :- lea32 TN162 (0x1) ;
> >> > [ 762, 0] :- store16 TN163 TN149 (0x6) ; WN id:28
> >> > [ 763, 0] TN164 :- ldc32 (0x1) ;
> >> > [ 763, 0] :- store8 TN164 TN149 (0x8) ; WN id:29
> >> > [ 764, 0] :- store32 TN149 GTN4(%rsp) (0x0) ; WN id:30
> >> > [ 764, 0] :- store32 TN154 GTN4(%rsp) (0x4) ; WN id:31
> >> > [ 764, 0] :- call
> (sym:_ZN7factory14checksumPacketEPNS_10rbu_packetEj+0)
> >> > ;
> >> > WN
> >> >
> >> > For the scheduling consideration, this is good since constants and
> loads
> >> > TNs
> >> > are lifted to the head , dependency between adjacent registers are
> >> > reduced.
> >> > However, this increase the register pressures and under m32, fpic ,
> the
> >> > physical registers are very limited (EBX used as GOT indexing dedicate
> >> > register) which cause the final TN170 unable to get a physical
> register.
> >> >
> >> > Suggested patch:
> >> >
> >> > disable IGLS_scheduling under m32 , PIC environment.
> >> >
> >> > index 9e644f8..a3c978a 100644
> >> > --- a/osprey/be/cg/cg.cxx
> >> > +++ b/osprey/be/cg/cg.cxx
> >> > @@ -1505,7 +1505,14 @@ extern void Generate_Return_Address(void);
> >> >      GRA_LIVE_Rename_TNs();
> >> >    }
> >> >  #if !defined(TARG_PPC32)    //  PPC IGLS_Schedule_Region bugs
> >> > +#ifdef TARG_X8664
> >> > +  // open64.net bug953. disable IGLS_Schedule_Region under
> >> > +  // -m32,-fpic environment.
> >> > +  if (!(Is_Target_32bit() && Gen_PIC_Shared))
> >> > +    IGLS_Schedule_Region (TRUE /* before register allocation */);
> >> > +#else
> >> >    IGLS_Schedule_Region (TRUE /* before register allocation */);
> >> > +#endif
> >> >  #ifdef TARG_X8664
> >> >    Examine_Loop_Info("after prescheduling", TRUE);
> >> >    void Counter_Merge (char*);
> >> >
> >> > TIA for review.
> >> >
> >> > Regards
> >> > Gang
> >> >
> >> >
> >> >
> ------------------------------------------------------------------------------
> >> > Virtualization & Cloud Management Using Capacity Planning
> >> > Cloud computing makes use of virtualization - but cloud computing
> >> > also focuses on allowing computing to be delivered as a service.
> >> > http://www.accelacomm.com/jaw/sfnl/114/51521223/
> >> > _______________________________________________
> >> > Open64-devel mailing list
> >> > Open64-devel@lists.sourceforge.net
> >> > https://lists.sourceforge.net/lists/listinfo/open64-devel
> >> >
> >
> >
>
------------------------------------------------------------------------------
Try before you buy = See our experts in action!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-dev2
_______________________________________________
Open64-devel mailing list
Open64-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/open64-devel

Reply via email to