Compiling this code with GCC 3.4.6:
void fill2 (unsigned int *arr, unsigned int val, unsigned int start,
            unsigned int limit)
{
  unsigned int i;
  for (i = start; i < start + limit; i++)
    arr[i] = val;
}
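A minimal self-checking sketch of the function's semantics, useful before reading the generated code. The helper `fill2_count` and its 8-element buffer are hypothetical and not part of the report. It illustrates that `start + limit` is evaluated in unsigned arithmetic, so when the sum wraps around, the loop body never runs.

```c
#include <assert.h>

/* fill2 from the report, reformatted; behaviour unchanged. */
void fill2 (unsigned int *arr, unsigned int val, unsigned int start,
            unsigned int limit)
{
  unsigned int i;
  for (i = start; i < start + limit; i++)
    arr[i] = val;
}

/* Hypothetical helper: runs fill2 on a zeroed 8-element buffer and
   returns how many elements were set to the fill value. */
unsigned int fill2_count (unsigned int start, unsigned int limit)
{
  unsigned int buf[8] = {0};
  unsigned int n = 0;

  /* The filled range must fit in buf, unless limit is 0 or
     start + limit wraps (in both cases fill2 writes nothing). */
  assert (limit == 0u || start + limit < start || start + limit <= 8u);

  fill2 (buf, 7u, start, limit);
  for (int k = 0; k < 8; k++)
    n += (buf[k] == 7u);
  return n;
}
```

For example, `fill2_count (2u, 3u)` sets elements 2 to 4 and returns 3, while `fill2_count (2u, ~0u)` returns 0 because `2 + ~0u` wraps to 1.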
generates: 
.L10:
        movl    %ecx, (%ebx,%eax,4)
        incl    %eax
.L8:
        cmpl    %eax, %edx
        ja      .L10
GCC 4.0, 4.1 and 4.2 with -O2 generate:

.L4:
        incl    %edx
        movl    %esi, (%eax)
        addl    $4, %eax
        cmpl    %ecx, %edx
        jne     .L4
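which is both slower and bigger: the loop body has five instructions instead
of four, because it updates both a counter (%edx) and a pointer (%eax).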

Using -O2 -fno-ivopts, the result is much better:
.L4:
        movl    %ecx, (%ebx,%eax,4)
        incl    %eax
        cmpl    %edx, %eax
        jb      .L4

The difference in the .final_cleanup dump with and without ivopts is obvious:
With ivopts: 

  void * ivtmp.29;
  unsigned int ivtmp.26;
  unsigned int D.1290;

<bb 0>:
  D.1290 = start + limit;
  if (start < D.1290) goto <L6>; else goto <L2>;

<L6>:;
  ivtmp.29 = arr + (unsigned int *) (start * 4);
  ivtmp.26 = 0;

<L0>:;
  MEM[base: (unsigned int *) ivtmp.29] = val;
  ivtmp.26 = ivtmp.26 + 1;
  ivtmp.29 = ivtmp.29 + 4B;
  if (ivtmp.26 != D.1290 - start) goto <L0>; else goto <L2>;

<L2>:;
  return;

Without ivopts:
  unsigned int i;
  unsigned int D.1290;

<bb 0>:
  D.1290 = start + limit;
  if (start < D.1290) goto <L11>; else goto <L2>;

<L11>:;
  i = start;

<L0>:;
  *((unsigned int *) (i * 4) + arr) = val;
  i = i + 1;
  if (i < D.1290) goto <L0>; else goto <L2>;

<L2>:;
  return;

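To make the comparison concrete, the two dumps can be transcribed by hand back into C. This is a sketch, not compiler output: the names `fill2_ivopts`, `fill2_noivopts` and `check_equivalent` are hypothetical, and because arithmetic on `void *` is a GNU extension, an `unsigned char *` stands in for `ivtmp.29`. The ivopts version carries two induction variables (a byte pointer and a trip counter compared with `D.1290 - start`), while the other keeps the single index `i`, which maps onto the scaled `(%ebx,%eax,4)` addressing mode.

```c
#include <assert.h>

/* Transcription of the dump WITH ivopts. */
void fill2_ivopts (unsigned int *arr, unsigned int val,
                   unsigned int start, unsigned int limit)
{
  unsigned int d = start + limit;                /* D.1290 */
  if (start < d)
    {
      /* ivtmp.29: byte pointer to arr[start] */
      unsigned char *p = (unsigned char *) arr + start * sizeof (unsigned int);
      unsigned int n = 0;                        /* ivtmp.26: trip counter */
      do
        {
          *(unsigned int *) p = val;             /* MEM[base: ivtmp.29] = val */
          n = n + 1;
          p = p + sizeof (unsigned int);         /* "+ 4B" on i686 */
        }
      while (n != d - start);                    /* equality exit test */
    }
}

/* Transcription of the dump WITHOUT ivopts. */
void fill2_noivopts (unsigned int *arr, unsigned int val,
                     unsigned int start, unsigned int limit)
{
  unsigned int d = start + limit;                /* D.1290 */
  if (start < d)
    {
      unsigned int i = start;
      do
        {
          arr[i] = val;                          /* *((unsigned int *) (i * 4) + arr) */
          i = i + 1;
        }
      while (i < d);                             /* unsigned less-than exit test */
    }
}

/* Hypothetical self-check: both transcriptions must write the same
   elements of a 16-element buffer. */
void check_equivalent (unsigned int start, unsigned int limit)
{
  unsigned int a[16] = {0}, b[16] = {0};

  /* Range must fit, unless limit is 0 or start + limit wraps. */
  assert (limit == 0u || start + limit < start || start + limit <= 16u);

  fill2_ivopts (a, 5u, start, limit);
  fill2_noivopts (b, 5u, start, limit);
  for (int k = 0; k < 16; k++)
    assert (a[k] == b[k]);
}
```

The equality test `n != d - start` is only safe because the guard `start < d` has already excluded both a zero trip count and wrap-around; it corresponds to the `jne` in the 4.x assembly, whereas the index form keeps the `ja`/`jb` comparison.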

The "void * ivtmp.29" is created by the ivopts pass. Why is it a void * when
it is initialized from an unsigned int * and only dereferenced as one?

Note that loops like the one in this example are quite common. For example,
the assembly for PR8361 contains about 37 "fill" functions with very similar
code (they are instantiations of only 2 different templates, but still...).


-- 
           Summary: [4.0/4.1/4.2 regression] code quality regression due to
                    ivopts
           Product: gcc
           Version: 4.0.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27440
