http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34265
Richard Guenther rguenth at gcc dot gnu.org changed:
What    |Removed      |Added
Status  |UNCONFIRMED  |RESOLVED
Dominique d'Humieres dominiq at lps dot ens.fr changed:
What    |Removed      |Added
CC      |             |irar at
--- Comment #35 from Dominique d'Humieres dominiq at lps dot ens.fr
2011-09-16 15:42:15 UTC ---
This pr (as well as pr49006) seems to have been fixed between revisions 176696
and 177649. I am closing pr49006 as fixed and I'll use this pr to
--- Comment #34 from Dominique d'Humieres dominiq at lps dot ens.fr
2011-05-22 12:06:20 UTC ---
Created attachment 24325
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24325
reduced tests
The attached bzipped tar contains the files
--- Comment #33 from dominiq at lps dot ens dot fr 2008-04-23 21:26 ---
Created an attachment (id=15523)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15523&action=view)
induct.f90 variants and their diff with the original file
The original diffs have space problems.
--
--- Comment #27 from dominiq at lps dot ens dot fr 2007-12-03 14:32 ---
I have had a look at the failure of gfortran.dg/array_1.f90 with patch #5. The
following reduced code gives the same failure:
! { dg-do run }
! PR 15553 : the array used to be filled with garbage
! this problem
--- Comment #28 from dominiq at lps dot ens dot fr 2007-12-03 14:33 ---
Created an attachment (id=14691)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14691&action=view)
result of -fdump-tree-optimized with patch #5
--
--- Comment #26 from dominiq at lps dot ens dot fr 2007-12-03 14:08 ---
> IMO, SLP should vectorize the sequence.
Uros,
What is the meaning of the above sentence?
--
--- Comment #29 from dominiq at lps dot ens dot fr 2007-12-03 14:34 ---
Created an attachment (id=14692)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14692&action=view)
result of -fdump-tree-optimized without patch #5
--
--- Comment #30 from ubizjak at gmail dot com 2007-12-03 16:30 ---
(In reply to comment #26)
> > IMO, SLP should vectorize the sequence.
> What is the meaning of the above sentence?
Uh, sorry for being terse. If there are no loops, then straight-line
parallelization [SLP] should vectorize
--- Comment #31 from dominiq at lps dot ens dot fr 2007-12-03 18:58 ---
> If there are no loops, then straight-line parallelization [SLP] should
> vectorize your manually unrolled sequence in comment #24.
Yes it should, but it does not after patch #5. The unanswered question so far
--- Comment #32 from irar at il dot ibm dot com 2007-12-04 06:56 ---
(In reply to comment #30)
> Uh, sorry for being terse. If there are no loops, then straight-line
> parallelization [SLP] should vectorize your manually unrolled sequence in
> comment #24.
Currently only loop-aware SLP
--- Comment #25 from ubizjak at gmail dot com 2007-11-30 21:38 ---
(In reply to comment #24)
> Then the loop is vectorized again.
IMO, SLP should vectorize the sequence.
--
--- Comment #16 from dominiq at lps dot ens dot fr 2007-11-29 08:06 ---
A quick report of the comparison between the regression results for revision
130500 + patch in comment #5 + Tobias' patch for pr34262 and revision 130489 +
some patches applied to rev. 130500. I have the following
--- Comment #18 from dominiq at lps dot ens dot fr 2007-11-29 10:22 ---
I have had a look at what's happening for kepler.f90 (from the 2004 polyhedron
test suite?) and it looks like another missed vectorization: if I count the
mulpd in the kepler.s files, I find 24 before the patch and
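The instruction count described above can be reproduced with a few lines of script. The following is only a sketch: the helper name `count_mnemonic` and the sample assembly text are hypothetical and are not taken from kepler.s, and the matching assumes one instruction per line with no label in front of it.

```python
import re


def count_mnemonic(asm_text: str, mnemonic: str) -> int:
    """Count lines whose instruction mnemonic is exactly `mnemonic`.

    Assumes one instruction per line with no leading label (a
    simplification; real compiler output may need more care).
    """
    pattern = re.compile(r"^\s*" + re.escape(mnemonic) + r"\b", re.MULTILINE)
    return len(pattern.findall(asm_text))


# Hypothetical excerpt for illustration only, NOT taken from kepler.s.
sample = """\
	movapd	(%rax), %xmm0
	mulpd	%xmm1, %xmm0
	mulsd	%xmm2, %xmm3
	mulpd	(%rdx), %xmm4
"""

print(count_mnemonic(sample, "mulpd"))  # prints 2
```

Running the same count on the assembly produced before and after a patch gives a rough indication of how many packed multiplies the vectorizer emitted. It says nothing about where they sit relative to the hot loops.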
--- Comment #17 from rguenth at gcc dot gnu dot org 2007-11-29 10:11 ---
Doh, not only did I fail to diff the chunk mentioned in comment #6, but I also
added the original unrolling pass, not the one that is only supposed to unroll
inner loops #)
So, change the passes.c hunk to
Index:
--- Comment #19 from dominiq at lps dot ens dot fr 2007-11-29 10:40 ---
Richard,
I am not sure I understand your patch in comment #17. I have already in
gcc/passes.c (after your patch in comment #5):
NEXT_PASS (pass_merge_phi);
NEXT_PASS (pass_vrp);
NEXT_PASS
--- Comment #20 from dominiq at lps dot ens dot fr 2007-11-29 11:00 ---
I have applied my interpretation of the first two changes in comment #17.
gfortran.dg/array_1.f90 still aborts and induct.v3.f90 is still not vectorized.
The good news is that induct.f90 is still properly unrolled
--- Comment #21 from rguenther at suse dot de 2007-11-29 11:13 ---
Subject: Re: Missed optimizations
On Thu, 29 Nov 2007, dominiq at lps dot ens dot fr wrote:
> Richard,
> I am not sure I understand your patch in comment #17. I have already in
> gcc/passes.c (after your patch in
--- Comment #22 from dominiq at lps dot ens dot fr 2007-11-29 11:16 ---
On top of the first two patches of comment #17, I have MOVED
+ NEXT_PASS (pass_tree_loop_init);
+ NEXT_PASS (pass_complete_unrolli);
+ NEXT_PASS (pass_tree_loop_done);
to the first suggested place.
--- Comment #23 from dominiq at lps dot ens dot fr 2007-11-29 12:24 ---
On top of the first two patches of comment #17, I have MOVED
+ NEXT_PASS (pass_tree_loop_init);
+ NEXT_PASS (pass_complete_unrolli);
+ NEXT_PASS (pass_tree_loop_done);
to the second suggested place.
--- Comment #24 from dominiq at lps dot ens dot fr 2007-11-29 15:49 ---
I think I now have a partial understanding of what is happening for the induct
variants that do not vectorize with the patch in comment #5: they do not
contain any loop inside the k loop. If I replace
--- Comment #1 from dominiq at lps dot ens dot fr 2007-11-28 15:30 ---
Created an attachment (id=14654)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14654&action=view)
Diffs between the original file and the simplest variants
In induct.v1.f90 'nominator' and 'denominator' are
--- Comment #2 from rguenth at gcc dot gnu dot org 2007-11-28 16:06 ---
GCC doesn't have a facility to split the inner loop and move it out of the
outer loops by introducing an array temporary.
As for completely unrolling, this only happens for innermost loops(?) and you
can tune the
--- Comment #3 from dominiq at lps dot ens dot fr 2007-11-28 16:14 ---
> Note that complete unrolling happens too late to help LIM or vectorization.
Could this be translated as a YES to my first question: the Fortran front end
should unroll computations for short vectors?
--
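To make "unroll computations for short vectors" concrete, here is a minimal sketch of what complete unrolling does to a loop with a small, fixed trip count. It is written in Python purely to show the shape of the transformation. The length of 3 and both function names are assumptions chosen for illustration, and in GCC the transformation is applied to the compiler's intermediate representation, not to source code.

```python
def dot3_loop(a, b):
    """Dot product of two length-3 vectors written as a loop."""
    total = 0.0
    for i in range(3):  # trip count is a small compile-time constant
        total += a[i] * b[i]
    return total


def dot3_unrolled(a, b):
    """The same computation after complete unrolling: no loop remains."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
print(dot3_loop(a, b), dot3_unrolled(a, b))  # prints 32.0 32.0
```

Once the loop is gone, later passes see only straight-line code, which is why the position of the unrolling pass in the pipeline matters for LIM and for the vectorizer.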
--- Comment #4 from rguenth at gcc dot gnu dot org 2007-11-28 16:17 ---
I would in principle say no - we can instead improve the middle-end here. But
it may pay off to not generate a loop for short vectors in case the resulting
IL is smaller for example. Of course it would duplicate
--- Comment #5 from rguenth at gcc dot gnu dot org 2007-11-28 16:33 ---
Created an attachment (id=14655)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14655&action=view)
patch for early complete unrolling of inner loops
For example with a patch like this.
--
--- Comment #6 from dominiq at lps dot ens dot fr 2007-11-28 18:18 ---
Subject: Re: Missed optimizations
> For example with a patch like this.
You also need
--- ../_gcc_clean/gcc/tree-flow.h 2007-11-16 16:17:46.0 +0100
+++ ../gcc-4.3-work/gcc/tree-flow.h 2007-11-28
--- Comment #7 from dominiq at lps dot ens dot fr 2007-11-28 18:48 ---
Subject: Re: Missed optimizations
With your patch the runtime went from
93.670u 0.103s 1:33.85 99.9% 0+0k 0+0io 32pf+0w
to
38.741u 0.038s 0:38.85 99.7% 0+0k 0+1io 32pf+0w
Pretty impressive!
Note that
--- Comment #8 from jb at gcc dot gnu dot org 2007-11-28 20:48 ---
The vectorization of dot products is covered by PR31738, I suppose
--
jb at gcc dot gnu dot org changed:
What|Removed |Added
--- Comment #9 from burnus at gcc dot gnu dot org 2007-11-28 21:27 ---
> With your patch the runtime went from
> 93.670u 0.103s 1:33.85 99.9% 0+0k 0+0io 32pf+0w
> to
> 38.741u 0.038s 0:38.85 99.7% 0+0k 0+1io 32pf+0w
Thus: 59% faster. Here, it only went ~30% down from 49.89s to
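For reference, the percentage can be recomputed from the two user times quoted above. This is a small sketch. The function names are hypothetical, and it distinguishes "reduction in run time" from "speedup factor", which are easy to conflate when a figure is quoted as "N% faster".

```python
def runtime_reduction_pct(before_s: float, after_s: float) -> float:
    """Percentage by which the run time shrank relative to the original."""
    return 100.0 * (before_s - after_s) / before_s


def speedup_factor(before_s: float, after_s: float) -> float:
    """How many times faster the new run is (old time / new time)."""
    return before_s / after_s


# User times quoted in the thread (seconds).
before, after = 93.670, 38.741

print(f"{runtime_reduction_pct(before, after):.1f}% less run time")  # 58.6% less run time
print(f"{speedup_factor(before, after):.2f}x speedup")               # 2.42x speedup
```

So the quoted timings correspond to roughly a 59% reduction in run time, or equivalently a speedup of about 2.4x.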
--- Comment #10 from rguenth at gcc dot gnu dot org 2007-11-28 22:05 ---
Indeed - unexpectedly impressive ;) The patch has (obviously) received no
tuning as to the placement of the early unrolling in the pass pipeline, and
early unrolling is only done if that doesn't increase code-size
--- Comment #11 from dominiq at lps dot ens dot fr 2007-11-28 22:35 ---
Here are the timings before and after the patch for the polyhedron tests and
some variants:
Before patch After patch
Benchmark Ave Run Number Estim: Ave Run
--- Comment #12 from steven at gcc dot gnu dot org 2007-11-28 22:49 ---
The only timings significantly changed are actually the compile times, which go
up significantly.
--
--- Comment #13 from kargl at gcc dot gnu dot org 2007-11-28 23:06 ---
(In reply to comment #12)
> The only timings significantly changed are actually the compile times, which
> go up significantly.
Look at the kepler execution time. 22.73 s without the patch and
26.11 s with the
--- Comment #14 from steven at gcc dot gnu dot org 2007-11-28 23:17 ---
Yes, that too. It was more a sarcastic addendum to your remark that there were
so few significantly changed numbers. It seemed to me you should not look at
just the execution times ;-)
--
--- Comment #15 from dominiq at lps dot ens dot fr 2007-11-28 23:57 ---
If I am allowed to be sarcastic too, I'll say that the increase in compile time
(worst case 11%, arithmetic average 5%) is not against the current trend one
can see for instance in