[Bug tree-optimization/21827] unroll misses simple elimination - works with manual unroll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21827

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |6.1.0
      Known to fail|                            |5.5.0

--- Comment #12 from Andrew Pinski ---
Looks like the testcase in comment #5 is fixed for GCC 6+.
--- Comment #7 from martsummsw at hotmail dot com  2006-08-27 06:37 ---
I am the reporter of this bug (with a new email address).
This problem seems to be solved with 4.1.1 =)
(Consider only #5 and forward - the first is wrong/irrelevant.)

--

martsummsw at hotmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |martsummsw at hotmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21827
--- Comment #8 from rguenth at gcc dot gnu dot org  2006-08-27 10:51 ---
The problem is still visible on the mainline: unrolling the loop on the tree level pessimizes the generated code. Zdenek, maybe you can have a look at this (testcase in comment #5).

--

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rakdver at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21827
--- Comment #9 from martsummsw at hotmail dot com  2006-08-27 13:21 ---
Hmmm - I was (also) wrong when I claimed it was solved in 4.1.1.
It is improved, since the example that goes wrong in #5 is now right, but only the limit (for when the compiler gets confused) has been pushed a bit.

e.g.
  for (int bp=0;bp<11;++bp)  // Up to 11 is fine unrolled in gcc 4.1.1

However 12 and above, e.g.
  for (int bp=0;bp<12;++bp)  // this still produces the poor performing code

--

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21827
--- Comment #10 from rakdver at gcc dot gnu dot org  2006-08-27 14:56 ---
(In reply to comment #8)
> The problem is still visible on the mainline, unrolling the loop on the tree
> level pessimizes the generated code. Zdenek, maybe you can have a look at
> this (testcase in comment #5).

I probably do not understand what the problem is supposed to be:

In auto_unrolled_knight_count8, the loop is fully unrolled (i.e., the loop ceases to exist), and as expected, constant propagation makes sure that the compile-time resolvable conditions are eliminated.

In auto_unrolled_knight_count9, the number of unrollings necessary to fully unroll the loop (9) is considered too high, hence the loop gets only partially unrolled (the body of the loop is copied three times). This time, there are no compile-time resolvable conditions.

If you really want even large loops to be unrolled, you may play with the --param max-completely-peeled-insns and --param max-completely-peel-times parameters. There were thoughts about providing pragmas to enable requiring more unrolling just for specific loops, but as far as I know, nobody is working on that just now.

--

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21827
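The distinction drawn in comment #10 can be illustrated with a small sketch. The testcase from comment #5 is not reproduced in this thread, so the two functions below are hypothetical stand-ins, not the reporter's code. The first contains a condition that depends only on the loop counter. The second shows what full unrolling followed by constant propagation could reduce it to.

```c
/* Hypothetical stand-in for the testcase in comment #5 (not shown in this
   thread).  The condition `i % 2 == 0` depends only on the loop counter,
   so it becomes resolvable at compile time once the loop is fully
   unrolled and `i` is a known constant in each copy of the body. */
int sum_even_slots_loop(const int *v)
{
    int sum = 0;
    for (int i = 0; i < 8; ++i) {
        if (i % 2 == 0)
            sum += v[i];
    }
    return sum;
}

/* Hand-written equivalent of the fully unrolled loop after constant
   propagation: the loop and every branch have disappeared.  With only
   partial unrolling, `i` stays a runtime value and the checks remain. */
int sum_even_slots_unrolled(const int *v)
{
    return v[0] + v[2] + v[4] + v[6];
}
```

Whether a given GCC version turns the first function into the second depends on the unrolling limits discussed above. One way to check is to compare the assembly produced with and without raised values for --param max-completely-peel-times and --param max-completely-peeled-insns.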
--- Comment #11 from martsummsw at hotmail dot com  2006-08-27 19:33 ---
You are right =)
I recall I did play with some params in 3.4, without result, but I did not in 4.0, since I did not expect such a (in my head) fairly low number to count as large ...

It would be real nice if gcc had an option forcing it to compile both unrolled and not-unrolled versions of loops of known size, and at the end weigh the speed gain against the extra space used. In this case, with e.g. 14 iterations, the code size is not even doubled, yet the speed is increased by more than 400%. (I know gcc cannot know how much faster it is.)

The #pragma would also be real nice. I could dream about a pragma with the following behaviour ...

  #pragma unroll-next-loop [guess x1,x2,x3]

and if guess was used (for unknown sizes) it expanded

  for (int u=0;u<x;u++)

to

  switch(x) {
    case x1:
      unrolled x1 times
      break;
    case x2:
      unrolled x2 times
      break;
    and so on..
    default:
      not unrolled ...
  }

But to be realistic - you probably have a lot of work and a lot of better suggestions to put into gcc. So maybe this should just be closed now.

Thanks for the reply! I am sorry I have wasted your time.

regards
Thorbjørn

--

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21827
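For readers who want to see the expansion wished for in comment #11 written out by hand, here is a minimal sketch. The function name sum_first_n and the guessed trip counts 2 and 4 are assumptions made up for this illustration. The proposed #pragma unroll-next-loop is only an idea from this thread, so the dispatch is coded manually.

```c
/* Hand-written version of the dispatch that the proposed (hypothetical)
   `#pragma unroll-next-loop [guess 2,4]` would generate for a loop whose
   trip count x is only known at run time. */
int sum_first_n(const int *v, int x)
{
    int sum = 0;
    switch (x) {
    case 2:                       /* guessed trip count x1 = 2 */
        sum = v[0] + v[1];
        break;
    case 4:                       /* guessed trip count x2 = 4 */
        sum = v[0] + v[1] + v[2] + v[3];
        break;
    default:                      /* any other count: plain loop */
        for (int u = 0; u < x; u++)
            sum += v[u];
        break;
    }
    return sum;
}
```

Each guessed case is a fully unrolled copy of the body, so conditions that depend on the loop counter can be resolved at compile time within that case. The price is one copy of the loop body per iteration for every guess, which is the space-versus-speed trade-off raised in the comment.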