[Bug tree-optimization/92649] dead store elimination by iteration domain pruning

2019-11-26 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92649

--- Comment #8 from rguenther at suse dot de  ---
On Tue, 26 Nov 2019, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92649
> 
> Jakub Jelinek  changed:
> 
>What|Removed |Added
> 
>  CC||jakub at gcc dot gnu.org
> 
> --- Comment #7 from Jakub Jelinek  ---
> Or it could be handled by constexpr-like evaluation of the loops

But that's probibitly expensive for large number of iterations and
doesn't work for non-constant niters (but related ones for earlier
vs. later one)

[Bug tree-optimization/92649] dead store elimination by iteration domain pruning

2019-11-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92649

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
Or it could be handled by constexpr-like evaluation of the loops, where we'd
evaluate the side-effects at compile time both to PHIs after loop and written
memory, and then FRE-like propagate that to the second loop if there is some
cheap simple expression with which it can be replaced.  Though, unsure if it
wouldn't handle only toy testcases like these.

[Bug tree-optimization/92649] dead store elimination

2019-11-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92649

--- Comment #6 from Richard Biener  ---
I wonder if it makes sense for a production compiler to do this kind of
transformation.  When presented with a program representation as SSA
on a CFG such transform would be quite ad-hoc or rather translating the
SSA + CFG representation into something a lot more high-level.  On that
high-level it would indeed be kind of a liveness analysis and dead code
removal by means of iteration domain adjustments.

I wonder whether the polyhedral representation that -floop-nest-optimize
uses provides any advantage here (we certainly do not transfer knowledge
that the polyhedral represents all uses of a specific array and thus
unneeded computes can be eliminated - but if we did, would it make such
transforms automagically by simply pruning the polyhedral?)

In GCC the closest "high-level" representation we try to build is the
RDG loop-distribution builds, but obviously that doesn't even cover
the iteration domain.

As said, we're still missing a classical loop fusion transform - such
transform would benefit from a better "high-level" representation of
("simple") "loops" as well (and ideally fusion would work in concert
with distribution, not as separate passes).  When we then have such
then pruning iteration domains from liveness analysis can be implemented
in such a framework (and liveness analysis should work on that high-level
representation).  Alternatively loop splitting sounds like a transform
this could piggy-back on (given liveness analysis is done), split the
domain and elide one of the loops as dead.

I would advise against trying to invent a new pass just for this particular
transform.

[Bug tree-optimization/92649] dead store elimination

2019-11-25 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92649

--- Comment #5 from Jiangning Liu  ---
Unrolling 1024 iterations would increase code size a lot, so usually we don't
do that. 1024 is only an example. Without knowing we could eliminate most of
them, we don't really want to do loop unrolling, I guess.

Yes. Assigning 5 to all a's elements is only an example as well. It could be
any random value or predefined number.

Let me give a more complicated case,

extern int rand(void);

#define LIVE_SIZE 100
#define DATA_SIZE 256

int f(void)
{
int a[DATA_SIZE], b[DATA_SIZE][DATA_SIZE];
int i,j;
long long s = 0;
int next;

for (i=0; i

[Bug tree-optimization/92649] dead store elimination

2019-11-25 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92649

--- Comment #4 from Richard Biener  ---
(In reply to Jiangning Liu from comment #3)
> It is a stupid test, but it is simplified from a real application.
> 
> To solve even more complicated scenario, this simple case needs to be
> addressed first.
> 
> If we change the case to be as below,
> 
> int f(void)
> {
> int i, a[1024], s=0;
> 
> for (i=0; i<1024; i++)
> a[i] = 5;
> 
> for (i=0; i<37; i++)
> s += a[i];
> return s;
> }
> 
> the loop peeling will not work, but compiler should still know the store to
> elements with index >= 37 can all be eliminated. Can any framework in GCC
> solve this problem?

No, not when faced with loops.  If both loops were completely unrolled then
DSE would do this job but as it stands DSE doesn't know how to adjust
a loops iteration space to elide dead stores.  In the above case splitting
the first loop into two so that the first iterates [0, 37[ and the second
[37, 1024[ would make the problem easier for DSE.  We could then also
fuse the loops, eliding a completely.  So I guess loop fusion might be
the actual transform we're looking after (which for enablement needs the
loop splitting).

I suppose the "real" application doesn't iniitalize a[i] to all 5?  But
the first loop is actually some kind of initialization?

[Bug tree-optimization/92649] dead store elimination

2019-11-25 Thread jiangning.liu at amperecomputing dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92649

--- Comment #3 from Jiangning Liu  ---
It is a stupid test, but it is simplified from a real application.

To solve even more complicated scenario, this simple case needs to be addressed
first.

If we change the case to be as below,

int f(void)
{
int i, a[1024], s=0;

for (i=0; i<1024; i++)
a[i] = 5;

for (i=0; i<37; i++)
s += a[i];
return s;
}

the loop peeling will not work, but compiler should still know the store to
elements with index >= 37 can all be eliminated. Can any framework in GCC solve
this problem?

[Bug tree-optimization/92649] dead store elimination

2019-11-25 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92649

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-11-25
 Ever confirmed|0   |1

--- Comment #2 from Richard Biener  ---
Confirmed.  Note this would require DSE to adjust the iteration domain.

Peeling one iteratiou would "solve" that.

The testcase is also stupid, so what's the original motivating testcase?

[Bug tree-optimization/92649] dead store elimination

2019-11-24 Thread prathamesh3492 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92649

prathamesh3492 at gcc dot gnu.org changed:

   What|Removed |Added

 CC||prathamesh3492 at gcc dot 
gnu.org

--- Comment #1 from prathamesh3492 at gcc dot gnu.org ---
This is likely dup of PR89332.

Thanks,
Prathamesh