https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77848

            Bug ID: 77848
           Summary: Gimple if-conversion results in redundant comparisons
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wschmidt at gcc dot gnu.org
  Target Milestone: ---
            Target: powerpc64le-unknown-linux-gnu

Gimple if-conversion is aggressive about converting PHIs to conditional
expressions.  When these expressions are not vectorized, they remain in
conditional form throughout the middle end phases.  Sometimes such conditionals
do not correspond to any target instructions, so they must be re-expanded to
branching logic.  When this happens, and several conditionals have the same
condition, GCC doesn't manage to combine the redundant conditions (at least,
not always).

I suspect that if such unusable conditionals were converted back to branching
logic after failed vectorization, jump threading would be able to pick up the
pieces and generate good code again, but I'm not certain.

As an example, on powerpc64le-linux, consider the following Fortran code
(d138.f), compiled with:

$ gfortran -S -O3 -mcpu=power8 -mtune=power8 -funroll-loops -ffast-math \
    -mrecip=all d138.f

      subroutine sub(x,a,n,m)
      implicit none
      real*8 x(*),a(*),atemp
      integer i,j,k,m,n
      real*8 s,t,u,v
      do j=1,m
         atemp=0.d0
         do i=1,n
            if (abs(a(i)).gt.atemp) then
               atemp=a(i)
               k = i
            end if
         enddo
         call dummy(atemp,k)
      enddo
      return
      end

Prior to if-conversion, we have:

  <bb 7>:
  # i_29 = PHI <i_20(10), 1(6)>
  # atemp_lsm.3_7 = PHI <atemp_lsm.3_9(10), 0.0(6)>
  # atemp_lsm.4_6 = PHI <atemp_lsm.4_28(10), 0(6)>
  # k_lsm.5_27 = PHI <k_lsm.5_26(10), k_lsm.5_38(6)>
  _1 = (integer(kind=8)) i_29;
  _2 = _1 + -1;
  _3 = *a_17(D)[_2];
  _4 = ABS_EXPR <_3>;
  if (_4 > atemp_lsm.3_7)
    goto <bb 8>;
  else
    goto <bb 9>;

  <bb 8>:

  <bb 9>:
  # atemp_lsm.3_9 = PHI <atemp_lsm.3_7(7), _3(8)>
  # atemp_lsm.4_28 = PHI <atemp_lsm.4_6(7), 1(8)>
  # k_lsm.5_26 = PHI <k_lsm.5_27(7), i_29(8)>
  i_20 = i_29 + 1;
  if (_16 < i_20)
    goto <bb 11>;
  else
    goto <bb 10>;

Following if-conversion, the PHIs in <bb 9> have been converted into
conditional expressions in <bb 7>:

  <bb 7>:
  # i_29 = PHI <i_20(8), 1(6)>
  # atemp_lsm.3_7 = PHI <atemp_lsm.3_9(8), 0.0(6)>
  # atemp_lsm.4_6 = PHI <atemp_lsm.4_28(8), 0(6)>
  # k_lsm.5_27 = PHI <k_lsm.5_26(8), k_lsm.5_38(6)>
  _1 = (integer(kind=8)) i_29;
  _2 = _1 + -1;
  _3 = *a_17(D)[_2];
  _4 = ABS_EXPR <_3>;
  atemp_lsm.3_9 = _4 > atemp_lsm.3_7 ? _3 : atemp_lsm.3_7;
  atemp_lsm.4_28 = _4 > atemp_lsm.3_7 ? 1 : atemp_lsm.4_6;
  k_lsm.5_26 = _4 > atemp_lsm.3_7 ? i_29 : k_lsm.5_27;
  i_20 = i_29 + 1;
  if (_16 < i_20)
    goto <bb 9>;
  else
    goto <bb 8>;

Types of the vars in the converted expressions are:

  integer(kind=4) k_lsm.5;
  logical(kind=4) atemp_lsm.4;
  real(kind=8) atemp_lsm.3;
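
For readers less familiar with GIMPLE dumps, the effect of if-conversion on
this loop can be sketched in C.  This is only an illustrative translation, not
compiler output: the names scan_branch, scan_select and abs_f64, and the
"stored" flag standing in for atemp_lsm.4, are made up for the example.  The
first function mirrors the branching form, where one comparison guards all
three updates; the second mirrors the if-converted form, where the same
comparison is repeated in three separate selects.

```c
/* Hypothetical helper standing in for ABS_EXPR. */
static double abs_f64(double v)
{
    return v < 0.0 ? -v : v;
}

/* Branching form (before if-conversion): a single comparison
   guards the updates of atemp, the store flag, and k. */
static void scan_branch(const double *a, int n,
                        double *atemp, int *stored, int *k)
{
    for (int i = 1; i <= n; i++) {
        double v = a[i - 1];
        if (abs_f64(v) > *atemp) {
            *atemp = v;
            *stored = 1;
            *k = i;
        }
    }
}

/* If-converted form: each PHI becomes its own conditional
   expression, all three testing the same condition. */
static void scan_select(const double *a, int n,
                        double *atemp, int *stored, int *k)
{
    for (int i = 1; i <= n; i++) {
        double v = a[i - 1];
        double av = abs_f64(v);
        double old = *atemp;          /* all selects compare against the old value */
        *atemp  = av > old ? v : old;       /* floating-point select */
        *stored = av > old ? 1 : *stored;   /* FP compare, logical result */
        *k      = av > old ? i : *k;        /* FP compare, integer result */
    }
}
```

Both functions compute the same values.  The difference is that, on a target
without a matching select instruction, each of the last two conditional
expressions in scan_select has to be expanded back into its own compare and
branch.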

The vectorizer is unable to vectorize the loop (unsupported pattern), so these
conditionals stay in place until expand time.  The first of these corresponds
to a floating-point select instruction (fsel), so it is fine.  But the other
two use a floating-point comparison to select between integer or logical
values, and there is no such instruction for POWER.

The resulting code is (one iteration of an unrolled loop):

.L20:
        addi 8,3,1
        extsw 10,10
        extsw 3,8
        addi 4,4,8
.L42:
        lfd 2,0(4)
        fabs 3,2
        fcmpu 7,3,6
        fsub 4,6,3
        fsel 5,4,6,2
        ble 7,.L23
        li 9,1
.L23:
        fcmpu 0,3,6
        rldicl 9,9,0,32
        ble 0,.L24
        mr 10,3
.L24:

We did not if-convert these PHIs prior to r235436
(https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/tree-if-conv.c?r1=235436&r2=235435&pathrev=235436).
With GCC 6.2, we get the following preferable code:

.L46:
        addi 10,10,1
        addi 8,8,8
        extsw 10,10
.L37:
        lfd 3,0(8)
        fabs 4,3
        fcmpu 5,4,0
        ble 5,.L47
        fmr 12,3
        fmr 0,3
        mr 3,10
        li 4,1
        li 6,1
.L47:

The added if-conversion causes an approximately 30% degradation in performance.

(I am not specifically blaming r235436; this just exposed the problem for this
particular case.)
