Re: [vox-tech] another gcc question

Mike Simons Wed, 27 Feb 2002 23:14:44 -0800

On Wed, Feb 27, 2002 at 09:25:41AM -0800, Peter Jay Salzman wrote:
>        -funroll-loops
>               Perform the optimization of loop  unrolling.   This
>               is  only  done for loops whose number of iterations
>               can be determined at compile time or run time.


  /* compile time */
  for (l = 100; l--;) do_something();

  /* run time */
  for (l = rand(); l--;) do_something();

>        -funroll-all-loops
>               Perform the optimization of loop  unrolling.   This
>               is done for all loops.  This usually makes programs
>               run more slowly.

  /* never known */
  while (do_something());


  There are thousands of variations of the three cases above... 
basically you have to ask yourself if the compiler knows for sure 
how many times it will run, else if the runtime system will know 
before the loop begins how many times to run, else if each loop 
iteration will decide if the next one runs.


Disclaimers:
  - All of this is user space trial and error, I've not actually looked
    at the gcc source.
  - Also, I don't know intel or gnu assembly, so if someone who does could 
    verify these observations that would be nice... 
  - Platform used was Debian/woody gcc 2.95.4 running on a Pentium.


  For case 1 (compile time) the compiler appears to do the following:
    - find the largest number that divides evenly into the loop counter 
      between 1 and 21... if the LCD isn't 1... then do this:
        for (l = loop_counter / LCD; l--;)
        {
          do_something();
          do_something();
          /* ...repeat LCD times... */
        }
      if however LCD *is* equal to 1, then behave the same as runtime
      case below;

  For case 2 (runtime) the compiler appears to:
    - if the loop body _modifies_ the loop counter in a non-constant way
      then use case 3 below...
    - otherwise break the code into something like the following case 
      statement:

      l = some_int;
    
      while (l)
      {
        switch (l > 4 ? 4 : l)
        {
          4: do_something();
          3: do_something();
          2: do_something();
          1: do_something();
          0: ;
        }
        l -= l > 4 : 4 : l;
      }

    
  For case 3 (never known) the compiler appears to:
  - Do something like the following...

  do {
    done = do_something() || do_something() || do_something() || 
           do_something() || do_something() || do_something() || 
           do_something() || do_something();
  } while (!done);


> ok, -funroll-all-loops usually makes programs run more slowly.  it also
> clearly makes the executable larger.
> 
> so what exactly are we optimizing here?

  Just theory... possible advantages:
  - Making it more likely to be able to fill all of the branch delay slots 
    on a machine with many slots.  (see other post on subject).
  - On some architectures taking a branch is (much) more expensive than not 
    taking unrolling will help, if an "never known" loop runs a long time.


On Wed, Feb 27, 2002 at 10:03:22AM -0800, ME wrote:
> $ gcc gcc -funroll-all-loops -S sample.c
[...]
> This would lead me to believe the generated asm, code is not unrolled if I
> understand the expectation of the unrolling process. 

  You are absolutely correct.

First, you need the following:
$ gcc -O -funroll-all-loops -S sample.c

  No optimizations are done unless optimizations are turned on.  Since
-funroll-all-loops is an optimizations without -O nothing happens.


  Second, the code sample doesn't actually do anything with m,
so with -O the compiler is smart enough to just remove the loops 
all together main becomes:
========
main:
        pushl %ebp
        movl %esp,%ebp
        xorl %eax,%eax
        leave
        ret
========

  I've changed the sample code to demo the three cases listed above
(see attachment).  If you look for in the assembly code for "call" ... 
in main, "printf" marks the boundary between block types, 
"do_something" shows you how the unrolling works out.

  Don't _run_ the sample code, it is just provided for viewing of assembly
code, the while loop is endless ... something more complex would
be needed to halt that loop and would just make the assembly produced
harder to understand.

    Later,
      Mike

ps:
  I noticed that people don't appear to be using text attachements much...
hope it's not against some list policy.

#include <stdio.h>
#include <stdlib.h>

int do_something(void)
{
  return printf(".");
}

int main(void)
{
  int l;

  /* compile time decision */
  for (l = 100; l--;) do_something();
  printf("\n");

  /* runtime decision */
  for (l = rand(); l--;) do_something();
  printf("\n");

  /* always unknown */
  while (do_something());
  printf("\n");

#if 0
  l = 100;

  while (l > 0)
  {
    switch (l > 4 ? 4 : l)
    {
      case 4: do_something();
      case 3: do_something();
      case 2: do_something();
      case 1: do_something();
      case 0: ;
    }
    l -= l > 4 ? 4 : l;
  }
#endif

  return 0;
}

        .file   "s.c"
        .version        "01.01"
gcc2_compiled.:
.section        .rodata
.LC0:
        .string "."
.text
        .align 4
.globl do_something
        .type    do_something,@function
do_something:
        pushl %ebp
        movl %esp,%ebp
        subl $8,%esp
        addl $-12,%esp
        pushl $.LC0
        call printf
        leave
        ret
.Lfe1:
        .size    do_something,.Lfe1-do_something
.section        .rodata
.LC1:
        .string "\n"
.text
        .align 4
.globl main
        .type    main,@function
main:
        pushl %ebp
        movl %esp,%ebp
        subl $20,%esp
        pushl %ebx
        movl $99,%ebx
        .p2align 4,,7
.L37:
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        call do_something
        addl $-20,%ebx
        cmpl $-1,%ebx
        jne .L37
        addl $-12,%esp
        pushl $.LC1
        call printf
        call rand
        movl %eax,%ebx
        addl $16,%esp
        subl $1,%ebx
        jc .L40
        movl %ebx,%eax
        notl %eax
        andl $3,%eax
        je .L42
        cmpl $3,%eax
        jge .L50
        cmpl $2,%eax
        jge .L51
        call do_something
        decl %ebx
.L51:
        call do_something
        decl %ebx
.L50:
        call do_something
        subl $1,%ebx
        jc .L40
        .p2align 4,,7
.L42:
        call do_something
        call do_something
        call do_something
        call do_something
        addl $-4,%ebx
        cmpl $-1,%ebx
        jne .L42
.L40:
        addl $-12,%esp
        pushl $.LC1
        call printf
        addl $16,%esp
        .p2align 4,,7
.L46:
        call do_something
        testl %eax,%eax
        jne .L46
        addl $-12,%esp
        pushl $.LC1
        call printf
        xorl %eax,%eax
        movl -24(%ebp),%ebx
        leave
        ret
.Lfe2:
        .size    main,.Lfe2-main
        .ident  "GCC: (GNU) 2.95.4  (Debian prerelease)"

Re: [vox-tech] another gcc question

Reply via email to