When I compile the following code

void f(int *x, int *y){
  *x = 7;
  *y = 4;
}

at -O2 for Itanium, I get the following assembly:

f:
        .prologue
        .body
        .mmi
        addl r14 = 7, r0
        ;;
        st4 [r32] = r14
        addl r14 = 4, r0
        ;;
        .mib
        st4 [r33] = r14
        nop 0
        br.ret.sptk.many b0
        .endp f#

The expected output is

f:
        .prologue
        .body
        .mii
        addl r14 = 7, r0
        addl r15 = 4, r0
        ;;
        nop 0
        .mmb
        st4 [r32] = r14
        st4 [r33] = r15
        br.ret.sptk.many b0
        .endp f#

In the .sched1 dump, I see the expected schedule:

;;        0-->     7 r341=0x7                          :2_A
;;        0-->     9 r342=0x4                          :2_A
;;        1-->     8 [in0]=r341                        :2_M_only_um23
;;        1-->    10 [in1]=r342                        :2_M_only_um23
;;   total time = 1

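but in the .ira dump, the RTL has reverted to the serial order, and both
constants end up in r14.  Because of the anti-dependency that register
allocation introduces (the second write to r14 cannot issue before the first
store has read it), the .mach dump shows an inferior schedule: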

;;        0-->     7 r14=0x7                           :2_A
;;        1-->     8 [r32]=r14                         :2_M_only_um23
;;        1-->    21 r14=0x4                           :2_A
;;        2-->    10 [r33]=r14                         :2_M_only_um23
;;        2-->    25 {return;use b0;}                  :2_B
;;   total time = 2
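
To make the cost of the register reuse concrete, here is a minimal sketch that
recomputes the "total time" of both dumps from register dependencies alone.
It is not GCC's scheduler: struct insn, schedule_length() and the two helper
functions are hypothetical names, the delays (1 cycle for true and output
dependencies, 0 for anti-dependencies) are assumptions chosen to agree with
the dumps above, and functional-unit limits and the return insn are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define NO_REG (-1)

/* Hypothetical, simplified insn: at most one register written, one read. */
struct insn {
    int def;  /* register written, or NO_REG */
    int use;  /* register read, or NO_REG */
};

/* Assumed delays, picked to match the dumps in this report. */
enum { RAW_DELAY = 1, WAW_DELAY = 1, WAR_DELAY = 0 };

static int max_int(int a, int b) { return a > b ? a : b; }

/* Store the earliest issue cycle of each insn in cycle[] and return the
   last cycle used, which corresponds to "total time" in the dumps. */
static int schedule_length(const struct insn *insns, size_t n, int *cycle)
{
    int total = 0;

    assert(insns != NULL && cycle != NULL);
    for (size_t i = 0; i < n; i++) {
        cycle[i] = 0;
        for (size_t j = 0; j < i; j++) {
            /* True dependency: insn i reads what insn j wrote. */
            if (insns[i].use != NO_REG && insns[i].use == insns[j].def)
                cycle[i] = max_int(cycle[i], cycle[j] + RAW_DELAY);
            /* Output dependency: both insns write the same register. */
            if (insns[i].def != NO_REG && insns[i].def == insns[j].def)
                cycle[i] = max_int(cycle[i], cycle[j] + WAW_DELAY);
            /* Anti-dependency: insn i overwrites what insn j reads. */
            if (insns[i].def != NO_REG && insns[i].def == insns[j].use)
                cycle[i] = max_int(cycle[i], cycle[j] + WAR_DELAY);
        }
        total = max_int(total, cycle[i]);
    }
    return total;
}

/* Before register allocation: two distinct pseudos (.sched1 dump). */
static int pseudo_reg_total_time(void)
{
    const struct insn insns[] = {
        { 341, NO_REG },  /* r341 = 0x7    */
        { 342, NO_REG },  /* r342 = 0x4    */
        { NO_REG, 341 },  /* [in0] = r341  */
        { NO_REG, 342 },  /* [in1] = r342  */
    };
    int cycle[4];
    return schedule_length(insns, 4, cycle);
}

/* After register allocation: both constants share r14 (.mach dump). */
static int hard_reg_total_time(void)
{
    const struct insn insns[] = {
        { 14, NO_REG },   /* r14 = 0x7     */
        { NO_REG, 14 },   /* [r32] = r14   */
        { 14, NO_REG },   /* r14 = 0x4     */
        { NO_REG, 14 },   /* [r33] = r14   */
    };
    int cycle[4];
    return schedule_length(insns, 4, cycle);
}
```

In this model pseudo_reg_total_time() returns 1 and hard_reg_total_time()
returns 2, the same totals the two dumps report.  The extra cycle comes
entirely from the second store having to wait for the second write to r14.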

With GCC 4.3.2 and 3.4.6 targeting Itanium, I see the lreg pass likewise
producing an inferior schedule.  For PowerPC, MIPS, ARM, and FR-V, however,
GCC 4.3.2 leaves the initial schedule intact, whereas GCC 4.4.0 changes the
order of insns in the IRA pass for all targets.

For targets other than Itanium, I'm not sure this transformation in IRA
degrades performance, and it reduces register pressure, so it seems like a
positive change.  For Itanium, it does degrade performance: the schedule above
takes a total time of 2 instead of 1.  What's odd is that the Itanium port
already had this behavior prior to GCC 4.4.0, while other ports did not.  Is
there some set of machine-specific parameters that the Itanium port could tune
to prevent this transformation in IRA (hopefully without degrading performance
elsewhere)?


-- 
           Summary: register allocator undoing optimal schedule
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: TabonyEE at austin dot rr dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: ia64-elf


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41171
