[Bug tree-optimization/21827] unroll misses simple elimination - works with manual unroll

2023-05-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21827

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |6.1.0
      Known to fail|                            |5.5.0

--- Comment #12 from Andrew Pinski  ---
Looks like the testcase in comment #5 is fixed for GCC 6+.


2006-08-27 Thread martsummsw at hotmail dot com


--- Comment #7 from martsummsw at hotmail dot com  2006-08-27 06:37 ---
I am the reporter of this bug (with a new email address).
This problem seems to be solved with 4.1.1 =)

(Consider only #5 and forward - the first is wrong/irrelevant)


-- 

martsummsw at hotmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |martsummsw at hotmail dot com






2006-08-27 Thread rguenth at gcc dot gnu dot org


--- Comment #8 from rguenth at gcc dot gnu dot org  2006-08-27 10:51 ---
The problem is still visible on the mainline: unrolling the loop on the tree
level pessimizes the generated code.  Zdenek, maybe you can have a look at this
(testcase in comment #5).


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rakdver at gcc dot gnu dot org






2006-08-27 Thread martsummsw at hotmail dot com


--- Comment #9 from martsummsw at hotmail dot com  2006-08-27 13:21 ---
Hmmm - I was (also) wrong when I claimed it was solved in 4.1.1. It has improved,
since the example that goes wrong in #5 is now right, but only the limit
(at which the compiler gets confused) has been pushed up a bit.

e.g.
for (int bp=0;bp<11;++bp) 
// Up to 11 iterations are unrolled fine in gcc 4.1.1 

However, 12 and above, e.g.
for (int bp=0;bp<12;++bp) 
// this still produces the poorly performing code


-- 






2006-08-27 Thread rakdver at gcc dot gnu dot org


--- Comment #10 from rakdver at gcc dot gnu dot org  2006-08-27 14:56 ---
(In reply to comment #8)
> The problem is still visible on the mainline, unrolling the loop on the tree
> level pessimizes the generated code.  Zdenek, maybe you can have a look at
> this (testcase in comment #5).

I probably do not understand what the problem is supposed to be:

In auto_unrolled_knight_count8, the loop is fully unrolled (i.e., the loop
ceases to exist), and as expected, constant propagation makes sure that the
compile-time resolvable conditions are eliminated.

In auto_unrolled_knight_count9, the number of unrollings necessary to fully
unroll the loop (9) is considered too high, hence the loop gets only partially
unrolled (the body of the loop is copied three times).  This time, there are no
compile-time resolvable conditions.

If you really want even large loops to be unrolled, you may play with the
--param max-completely-peeled-insns and --param max-completely-peel-times
parameters.  There were thoughts about providing pragmas to request more
unrolling just for specific loops, but as far as I know, nobody is working on
that just now.
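
To make the distinction above concrete, here is a minimal sketch. It is not the
testcase from comment #5 (which is not reproduced in this thread); the function
names and the trip count N are hypothetical. The loop has a constant trip count
and a condition that depends only on the induction variable, so the condition
becomes resolvable at compile time only if the loop is completely unrolled. The
second function shows what a manual unroll looks like once the conditions are
gone.

```c
/* Hypothetical example, not the testcase from comment #5.
   N is a compile-time constant trip count chosen for illustration. */
#define N 12

/* The condition depends only on the induction variable i, so after
   complete unrolling each copy of it has a known value. */
int sum_even_slots(const int *v)
{
    int total = 0;
    for (int i = 0; i < N; ++i) {
        if ((i & 1) == 0)   /* known per iteration once fully unrolled */
            total += v[i];
    }
    return total;
}

/* Manually unrolled equivalent: no loop and no conditions remain. */
int sum_even_slots_manual(const int *v)
{
    return v[0] + v[2] + v[4] + v[6] + v[8] + v[10];
}
```

To experiment with the limits, one could compile with something like
gcc -O3 --param max-completely-peel-times=16 --param max-completely-peeled-insns=400
and compare the assembly of the two functions. The values 16 and 400 are
arbitrary assumptions, and whether a given GCC version needs them at all for
this loop has to be checked by inspecting the output.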


-- 






2006-08-27 Thread martsummsw at hotmail dot com


--- Comment #11 from martsummsw at hotmail dot com  2006-08-27 19:33 ---
You are right =) 

I recall that I played with some params in 3.4 without result, but I did not
do so in 4.0, since I did not expect such a (in my head) fairly low number to
be considered large ...

It would be really nice if gcc had an option forcing it to compile both
unrolled and not-unrolled versions of loops of known size and then weigh the
speed gain against the extra space used. In this case, with e.g. 14 iterations,
the code size is not even doubled, yet the speed increases by more than 400%.
(I know gcc cannot know how much faster it is.)

The #pragma would also be really nice.
I could dream about a pragma with the following behaviour ...
#pragma unroll-next-loop [guess x1,x2,x3] 

and if guess was used (for unknown sizes) it would expand for (int u=0;u<x;u++)
to
switch(x)
{
  case x1:
    unrolled x1 times
    break;
  case x2:
    unrolled x2 times
    break;
  and so on..
  default: 
    not unrolled ...
}
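
The dispatch described above can be written by hand today. The following is
only a sketch under assumptions: the function names are hypothetical, the
guessed trip counts 4 and 8 stand in for x1 and x2, and a simple sum stands in
for the real loop body. Each case calls a helper whose loop has a constant
bound, which makes it a candidate for complete unrolling; the default case
keeps the general loop.

```c
/* Hypothetical hand-written version of the "guess" dispatch.
   4 and 8 stand in for the guessed trip counts x1 and x2. */

/* Loop with a constant bound: a candidate for complete unrolling. */
static inline int sum_fixed4(const int *v)
{
    int t = 0;
    for (int u = 0; u < 4; ++u)
        t += v[u];
    return t;
}

static inline int sum_fixed8(const int *v)
{
    int t = 0;
    for (int u = 0; u < 8; ++u)
        t += v[u];
    return t;
}

/* Dispatch on the run-time trip count x. */
int sum_dispatch(const int *v, int x)
{
    switch (x) {
    case 4:
        return sum_fixed4(v);
    case 8:
        return sum_fixed8(v);
    default: {
        /* Unknown size: fall back to the ordinary loop. */
        int t = 0;
        for (int u = 0; u < x; ++u)
            t += v[u];
        return t;
    }
    }
}
```

For example, sum_dispatch(v, 8) takes the fixed-size path, while
sum_dispatch(v, 5) falls through to the general loop. Separately, GCC 8 and
later accept #pragma GCC unroll n in front of a loop, which covers the
per-loop request but not the guess-based dispatch sketched here.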


But to be realistic - you probably have a lot of work and a lot of better
suggestions to put into gcc. So maybe this should just be closed now.

Thanks for the reply! I am sorry I have wasted your time.

regards
Thorbjørn


-- 

