disabling loop unrolling in GCC
Hi all,

I'm trying, without success, to disable loop unrolling when compiling a program with -O3 with gcc (4.4, but I see the same problem with 4.3). The program is the following one:

    volatile int v;

    void func()
    {
        int i;

        for( i=0; i<8; ++i ) {
            v=0;
        }
    }

I compile it with the following command line:

    gcc -c -O3 test.c

An objdump -S test.o gives:

    test.o:     file format elf64-x86-64

    Disassembly of section .text:

    <func>:
       0:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # a <func+0xa>
       7:   00 00 00
       a:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 14 <func+0x14>
      11:   00 00 00
      14:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 1e <func+0x1e>
      1b:   00 00 00
      1e:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 28 <func+0x28>
      25:   00 00 00
      28:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 32 <func+0x32>
      2f:   00 00 00
      32:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 3c <func+0x3c>
      39:   00 00 00
      3c:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 46 <func+0x46>
      43:   00 00 00
      46:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 50 <func+0x50>
      4d:   00 00 00
      50:   c3                      retq

If I compile with -O2, the results are:

    test.o:     file format elf64-x86-64

    Disassembly of section .text:

    <func>:
       0:   31 c0                   xor    %eax,%eax
       2:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
       8:   83 c0 01                add    $0x1,%eax
       b:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 15 <func+0x15>
      12:   00 00 00
      15:   83 f8 08                cmp    $0x8,%eax
      18:   75 ee                   jne    8 <func+0x8>
      1a:   f3 c3                   repz retq

Where it gets worrying is when I try to cancel loop unrolling. I tried -fno-unroll-loops and -fno-peel-loops, to no effect. I even tried messing with the --param option (max-unrolled-insns, max-unroll-times, max-peel-times) to no noticeable effect.

Even more worryingly, the documentation seems totally wrong. It claims (http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Optimize-Options.html#index-O3-632) that -O3 is equal to -O2 plus -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload and -ftree-vectorize. Trying to compile with -O2 and the additional optimization options does not, however, unroll the loop, which suggests that -O3 differs from -O2 in another way as well.

Help?

Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com

___
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
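For anyone skimming the listings above: the -O3 output is the loop unrolled completely into eight stores, while the -O2 output keeps the counter, the compare and the backward branch. A minimal C sketch of the two shapes follows. The names func_rolled, func_unrolled, stores and store_count are made up for this illustration and are not part of the original test.c; the counter exists only so the two variants can be compared.

```c
/* Illustration only: source-level equivalents of the two listings.
 * func_rolled, func_unrolled, stores and store_count are hypothetical
 * names, not taken from the original test.c. */
volatile int v;
static int stores;  /* counts stores so both variants can be compared */

/* Shape of the -O2 output: counter, compare, backward branch. */
void func_rolled(void)
{
    int i;

    for (i = 0; i < 8; ++i) {
        v = 0;
        ++stores;
    }
}

/* Shape of the -O3 output: the body repeated eight times, no counter. */
void func_unrolled(void)
{
    v = 0; ++stores;
    v = 0; ++stores;
    v = 0; ++stores;
    v = 0; ++stores;
    v = 0; ++stores;
    v = 0; ++stores;
    v = 0; ++stores;
    v = 0; ++stores;
}

/* Total number of stores performed so far by either variant. */
int store_count(void)
{
    return stores;
}
```

Because v is volatile, both variants must perform all eight stores; the only thing that changes is how much code it takes to do so.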
Re: disabling loop unrolling in GCC
Just out of curiosity: why do you care about the resulting assembly? It's a strong indication that you are doing something wrong :)

If the flags don't work, I would try making i volatile, or an extern, to trick the compiler into dropping the optimization.

--Aviv

2009/12/21 Shachar Shemesh <shac...@shemesh.biz>:
> I'm trying, without success, to disable loop unrolling when compiling a
> program with -O3 with gcc (4.4, but I see the same problem with 4.3).
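A minimal sketch of the volatile-counter suggestion, assuming the test program from the first message. With i declared volatile the compiler has to load and store it on every iteration and cannot treat the trip count as a compile-time constant; the price is extra memory traffic inside the loop. The return value and the local n are added here only so the iteration count can be checked; they are not in the original program.

```c
volatile int v;

/* Same loop as the original func(), but with a volatile counter.
 * Returns the number of iterations executed (added for checking only). */
int func(void)
{
    volatile int i;  /* every read and write of i must really happen */
    int n = 0;       /* hypothetical helper, not in the original */

    for (i = 0; i < 8; ++i) {
        v = 0;
        ++n;
    }
    return n;
}
```

This is a source-level workaround, so it has to be repeated for every loop that matters, and it makes the loop itself slower and somewhat larger than the plain -O2 version.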
Re: disabling loop unrolling in GCC
Aviv Greenberg wrote:
> Just out of curiosity: why do you care about the resulting assembly?
> It's a strong indication that you are doing something wrong :)

First, we have found several bugs in GCC as a result of caring about the assembly. Let's agree that it's an indication that someone is doing something wrong.

The reason I'm trying to disable this optimization is that it makes the code too big to fit in the available ROM onto which it needs to be flashed. The x86 version I gave here shows the problem, but it is not the platform on which the problem was diagnosed.

Shachar
Re: disabling loop unrolling in GCC
This is what I get if I make i volatile, with gcc 4.3.1 and -O3:

       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   83 ec 10                sub    $0x10,%esp
       6:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
       d:   8b 45 fc                mov    -0x4(%ebp),%eax
      10:   83 f8 07                cmp    $0x7,%eax
      13:   7f 1e                   jg     33 <func+0x33>
      15:   8d 76 00                lea    0x0(%esi),%esi
      18:   c7 05 00 00 00 00 00    movl   $0x0,0x0
      1f:   00 00 00
      22:   8b 45 fc                mov    -0x4(%ebp),%eax
      25:   83 c0 01                add    $0x1,%eax
      28:   89 45 fc                mov    %eax,-0x4(%ebp)
      2b:   8b 45 fc                mov    -0x4(%ebp),%eax
      2e:   83 f8 07                cmp    $0x7,%eax
      31:   7e e5                   jle    18 <func+0x18>
      33:   c9                      leave
      34:   c3                      ret

Looks like I was right!

On Mon, Dec 21, 2009 at 15:54, Shachar Shemesh <shac...@shemesh.biz> wrote:
> The reason I'm trying to disable this optimization is because it causes
> the code to be too big to fit onto the available ROM on which the code
> needs to be flashed.
Re: disabling loop unrolling in GCC
On Monday 21 December 2009 14:00:39 Shachar Shemesh wrote:
> Where it gets worrying is when I try to cancel loop unrolling. I tried
> -fno-unroll-loops and -fno-peel-loops, to no effect. I even tried messing
> with the --param option (max-unrolled-insns, max-unroll-times,
> max-peel-times) to no noticeable effect

max-completely-peeled-insns is your friend.

This param's value is also the difference between -O3 and -O2 that you were missing.
Re: disabling loop unrolling in GCC
Also, I tried grepping for "loop" and then negating all the loop-related flags:

    linux-gec2:~/projects/lu # gcc -c -O3 -fno-align-loops \
        -fno-move-loop-invariants -fno-peel-loops -fno-prefetch-loop-arrays \
        -fno-rerun-cse-after-loop -fno-reschedule-modulo-scheduled-loops \
        -fno-tree-loop-im -fno-tree-loop-ivcanon -fno-tree-loop-linear \
        -fno-tree-loop-optimize -fno-tree-vect-loop-version \
        -fno-unroll-all-loops -fno-unroll-loops \
        -fno-unsafe-loop-optimizations -fno-unswitch-loops \
        -fno-loop-optimize -fno-rerun-loop-opt main.c
    linux-gec2:~/projects/lu # objdump -S main.o

    main.o:     file format elf32-i386

    Disassembly of section .text:

    <func>:
       0:   55                      push   %ebp
       1:   31 c0                   xor    %eax,%eax
       3:   89 e5                   mov    %esp,%ebp
       5:   8d 76 00                lea    0x0(%esi),%esi
       8:   83 c0 01                add    $0x1,%eax
       b:   83 f8 07                cmp    $0x7,%eax
       e:   c7 05 00 00 00 00 00    movl   $0x0,0x0
      15:   00 00 00
      18:   7e ee                   jle    8 <func+0x8>
      1a:   5d                      pop    %ebp
      1b:   c3                      ret
      1c:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi

Just need to figure out which flag is the one that matters :)

On Mon, Dec 21, 2009 at 16:41, Dotan Shavit <do...@shavitos.com> wrote:
> max-completely-peeled-insns is your friend
>
> This param's value is also the difference between -O3 and -O2 you were
> missing
Re: disabling loop unrolling in GCC
2009/12/21 Shachar Shemesh <shac...@shemesh.biz>:
> Hi all,
>
> I'm trying, without success, to disable loop unrolling when compiling a
> program with -O3 with gcc (4.4, but I see the same problem with 4.3).

I am actually very surprised that -O3 unrolls loops. It is not supposed to. The idea of including -funroll-loops in -O3 was raised quite a few times and was always rejected. Maybe something changed in recent years. The documentation certainly does not say loop unrolling is enabled with either -O2 or -O3.

I suspect something is the matter with -ftree-loop-optimize. The gcc documentation says,

    `-ftree-loop-optimize'
        Perform loop optimizations on trees. This flag is enabled by
        default at `-O' and higher.

However, the behaviour depends on which optimization options you use. E.g., -O2 won't unroll no matter what:

    $ gcc -c -O2 -ftree-loop-optimize loop.c
    $ objdump -S loop.o

    loop.o:     file format elf64-x86-64

    Disassembly of section .text:

    <func>:
       0:   31 c0                   xor    %eax,%eax
       2:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
       8:   83 c0 01                add    $0x1,%eax
       b:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 15 <func+0x15>
      12:   00 00 00
      15:   83 f8 08                cmp    $0x8,%eax
      18:   75 ee                   jne    8 <func+0x8>
      1a:   f3 c3                   repz retq

However, try compiling with -O3 -fno-tree-loop-optimize and you will succeed.

    $ gcc -c -O3 -fno-tree-loop-optimize loop.c
    $ objdump -S loop.o

    loop.o:     file format elf64-x86-64

    Disassembly of section .text:

    <func>:
       0:   31 c0                   xor    %eax,%eax
       2:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
       8:   83 c0 01                add    $0x1,%eax
       b:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 15 <func+0x15>
      12:   00 00 00
      15:   83 f8 07                cmp    $0x7,%eax
      18:   7e ee                   jle    8 <func+0x8>
      1a:   f3 c3                   repz retq

Or, if you are primarily interested in code size as you indicate, why not -Os?

    $ gcc -c -Os loop.c
    $ objdump -S loop.o

    loop.o:     file format elf64-x86-64

    Disassembly of section .text:

    <func>:
       0:   31 c0                   xor    %eax,%eax
       2:   ff c0                   inc    %eax
       4:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # e <func+0xe>
       b:   00 00 00
       e:   83 f8 08                cmp    $0x8,%eax
      11:   75 ef                   jne    2 <func+0x2>
      13:   c3                      retq

Hope it helps,

-- 
Oleg Goldshmidt | p...@goldshmidt.org
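Since the concern is ROM footprint, it may help to put numbers on the x86-64 listings in this thread. The sketch below derives each function's size from the offset and length of its last instruction, relying on the fact that func starts at offset 0 in every listing. The struct, array and function names are made up for the illustration, and the figures apply only to these x86-64 builds, not to the actual target platform.

```c
#include <stddef.h>

/* Offset and byte length of the final instruction in each x86-64 listing
 * quoted in this thread. All names here are hypothetical. */
struct listing {
    const char *flags;   /* optimization level used for the build */
    size_t last_offset;  /* offset of the last instruction */
    size_t last_length;  /* length of the last instruction in bytes */
};

static const struct listing listings[] = {
    { "-O3", 0x50, 1 },  /* 50: c3       retq      */
    { "-O2", 0x1a, 2 },  /* 1a: f3 c3    repz retq */
    { "-Os", 0x13, 1 },  /* 13: c3       retq      */
};

/* Size of the function in bytes, given that it starts at offset 0. */
size_t code_size(const struct listing *l)
{
    return l->last_offset + l->last_length;
}
```

That works out to 81 bytes at -O3, 28 bytes at -O2 and 20 bytes at -Os for this one small function.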
Re: disabling loop unrolling in GCC
Dotan Shavit wrote:
> max-completely-peeled-insns is your friend
>
> This param's value is also the difference between -O3 and -O2 you were
> missing

Out of curiosity, how do you know that? Did you grep the gcc sources?

Shachar