https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #39 from CVS Commits ---
The releases/gcc-10 branch has been updated by Jiu Fu Guo:
https://gcc.gnu.org/g:60bd3f20baebeeddd60f8a2b85927e7da7c6016e
commit r10-8327-g60bd3f20baebeeddd60f8a2b85927e7da7c6016e
Author: guojiufu
Date:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #38 from Jiu Fu Guo ---
(In reply to Thomas Koenig from comment #37)
> (In reply to Jiu Fu Guo from comment #36)
>
> Will you also backport to gcc 10, the other affected branch?
Yes, once it is stable on trunk, I will backport it to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #37 from Thomas Koenig ---
(In reply to Jiu Fu Guo from comment #36)
> The patch which restores cunroll behavior was committed.
Thanks!
Will you also backport to gcc 10, the other affected branch?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
Jiu Fu Guo changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #35 from CVS Commits ---
The master branch has been updated by Jiu Fu Guo:
https://gcc.gnu.org/g:557a40f599f64e40cc1b20254bf82acc775375f5
commit r11-1020-g557a40f599f64e40cc1b20254bf82acc775375f5
Author: guojiufu
Date: Thu May
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #34 from Jiu Fu Guo ---
The previous patch 6d099a76a0f6a040a3e678f2bce7fc69cc3257d8 (rs6000: Enable
limited unrolling at -O2) only affects simple loops on rs6000.
We may also set limits for GIMPLE cunroll, similar to those for the RTL unroller
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #33 from Jiu Fu Guo ---
(In reply to Richard Biener from comment #32)
> Note I don't think the unrolling is excessive - store motion then applying
> to all count[] and all computations hoisted out of the loop may be a bit
> too much
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #32 from Richard Biener ---
Note I don't think the unrolling is excessive - store motion then applying
to all count[] and all computations hoisted out of the loop may be a bit
too much for register pressure though, especially since
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #31 from Jiu Fu Guo ---
(In reply to Richard Biener from comment #28)
> > For a loop which has multiple exits, it may not be helpful to unroll it;
> > a "complete unroll" in particular may not be helpful, as with the loop in
> > in_pack_i4.c. Since
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #30 from Richard Biener ---
(In reply to Thomas Koenig from comment #29)
> It is also interesting that this variant
>
> --- a/libgfortran/generated/in_pack_i4.c
> +++ b/libgfortran/generated/in_pack_i4.c
> @@ -88,7 +88,7 @@
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #29 from Thomas Koenig ---
It is also interesting that this variant
--- a/libgfortran/generated/in_pack_i4.c
+++ b/libgfortran/generated/in_pack_i4.c
@@ -88,7 +88,7 @@ internal_pack_4 (gfc_array_i4 * source)
count[0]++;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #28 from Richard Biener ---
> It seems the growth limit could be refined. The ^ is an exponent operation,
> right?
Yes. The idea is to limit growth more when there is no benefit of unrolling
detected by the cost model (which
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #27 from Jiu Fu Guo ---
(In reply to Jiu Fu Guo from comment #26)
> (In reply to Richard Biener from comment #20)
> > (In reply to Jiu Fu Guo from comment #18)
> > > Currently, I'm thinking to enhance GCC 'cunroll' as:
> > > if the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #26 from Jiu Fu Guo ---
(In reply to Richard Biener from comment #20)
> (In reply to Jiu Fu Guo from comment #18)
> > Currently, I'm thinking to enhance GCC 'cunroll' as:
> > if the loop has multi-exits or upbound is not a fixed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #25 from Jiu Fu Guo ---
(In reply to Richard Biener from comment #23)
> (In reply to Richard Biener from comment #20)
> > (In reply to Jiu Fu Guo from comment #18)
> > > Currently, I'm thinking to enhance GCC 'cunroll' as:
> > > if
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #24 from rguenther at suse dot de ---
On Tue, 12 May 2020, tkoenig at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
>
> --- Comment #21 from Thomas Koenig ---
> (In reply to Richard Biener from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #23 from Richard Biener ---
(In reply to Richard Biener from comment #20)
> (In reply to Jiu Fu Guo from comment #18)
> > Currently, I'm thinking to enhance GCC 'cunroll' as:
> > if the loop has multi-exits or upbound is not a fixed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #22 from Thomas Koenig ---
Here are the details of how I tested this.
I generated in_pack_r4.i and in_unpack_r4.i by adding -save-temps to the
Makefile options in ~/trunk-bin/powerpc64le-unknown-linux-gnu/libgfortran,
then
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #21 from Thomas Koenig ---
(In reply to Richard Biener from comment #19)
> Is libgfortran built with -O2 -funroll-loops or with -O3 (IIRC -O3?).
Just plain -O2 (for size reasons), with matmul as an exception
where we add
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #20 from Richard Biener ---
(In reply to Jiu Fu Guo from comment #18)
> Currently, I'm thinking to enhance GCC 'cunroll' as:
> if the loop has multi-exits or upbound is not a fixed number, we may not do
> 'complete unroll' for the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #19 from Richard Biener ---
Is libgfortran built with -O2 -funroll-loops or with -O3 (IIRC -O3?). Note we
see
Estimating sizes for loop 3
BB: 14, after_exit: 0
size: 1 _20 = count[n_95];
size: 1 _21 = _20 + 1;
size: 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #18 from Jiu Fu Guo ---
Currently, I'm thinking of enhancing GCC's 'cunroll' as follows:
if the loop has multiple exits or its upper bound is not a fixed number, we may
skip 'complete unroll' for the loop unless -funroll-all-loops is specified.
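The proposed gate can be sketched as a small predicate. This is a minimal illustration under stated assumptions only: the `loop_summary` struct, its fields, and `allow_complete_unroll` are hypothetical names, not GCC's internal API, and `-funroll-all-loops` is modelled as a plain boolean argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what a complete unroller knows about a loop.
   Illustrative only; these are not GCC internals. */
struct loop_summary {
  int num_exits;          /* number of exit edges leaving the loop */
  bool niter_is_constant; /* exact trip count known at compile time */
};

/* Sketch of the proposed gate: allow complete unrolling only for a
   single-exit loop with a fixed trip count, unless the user asked for
   -funroll-all-loops (modelled here by unroll_all_loops). */
static bool
allow_complete_unroll (const struct loop_summary *loop, bool unroll_all_loops)
{
  assert (loop != NULL);
  if (unroll_all_loops)
    return true;
  return loop->num_exits == 1 && loop->niter_is_constant;
}
```

Under this sketch a single-exit loop with a constant trip count is still completely unrolled, while a multi-exit loop such as the one in in_pack_i4.c is left rolled unless `-funroll-all-loops` is given.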
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #17 from Jiu Fu Guo ---
For this case, as you said, I also think it is better to avoid unrolling the
loop.
'#pragma GCC unroll 1' could prevent the loop from being unrolled, even if
someone compiles it with aggressive unroll
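As a sketch of how the pragma could be applied, here is a simplified, hypothetical `pack_int` routine modelled loosely on the pack loop discussed in this bug. It is not the libgfortran source; the function name, `MAX_DIM`, and element-based strides are assumptions. `#pragma GCC unroll 1` sits directly before the inner carry loop, `while (count[n] == extent[n])`, which should keep GCC from unrolling that loop.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIM 7 /* assumption: small fixed rank limit for this sketch */

/* Hypothetical pack routine: copy a strided rank-`dim` array into the
   contiguous buffer `dest`. extent[] and stride[] are in elements. */
static void
pack_int (int *dest, const int *src, const ptrdiff_t *extent,
          const ptrdiff_t *stride, int dim)
{
  ptrdiff_t count[MAX_DIM] = {0};
  assert (dim > 0 && dim <= MAX_DIM);

  for (int i = 0; i < dim; i++)
    if (extent[i] <= 0)
      return; /* empty array: nothing to copy */

  while (src)
    {
      *dest++ = *src;
      src += stride[0];
      count[0]++;
      int n = 0;
      /* Carry into the next dimension. Bounded by `dim` iterations per
         element but usually zero, so ask GCC not to unroll it. */
#pragma GCC unroll 1
      while (count[n] == extent[n])
        {
          count[n] = 0;
          src -= stride[n] * extent[n];
          n++;
          if (n == dim)
            {
              src = NULL; /* every element copied: stop the outer loop */
              break;
            }
          count[n]++;
          src += stride[n];
        }
    }
}
```

The outer `while (src)` loop carries no pragma, so the compiler stays free to unroll it if its heuristics favour that.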
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #16 from Thomas Koenig ---
Hi,
I was unable to find a performance problem, so I take back my
presumption of the original problem. I have checked two versions
of the preprocessed source, with
+#pragma GCC unroll 1
while
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #15 from Jiu Fu Guo ---
Hi Thomas,
Are you using a test case to check the performance? If you have one, would you
please share it? Then we can use it to tune a heuristic improvement for
cunroll.
Thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #14 from Thomas Koenig ---
Most Fortran arrays are one- or two-dimensional.
Assuming a 10*10 two-dimensional array that is being packed, the
condition will be tested 121 times and the loop body will be entered
12 times. Only once
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #13 from Jiu Fu Guo ---
In this case, the loop body executes at most a given number of times, but not
an exact number of times: only some of the iterations may be executed at
runtime. As said above, this may be false for 'while (count[n] ==
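The difference between an upper bound and an exact trip count can be made concrete with a hypothetical odometer-style helper (`advance_odometer` is an illustrative name, not libgfortran code). Each call steps the index by one element; the carry loop body then runs at most `dim` times, but how often it actually runs depends on the current contents of `count[]`, and for most calls it does not run at all. A complete unroller relying only on the bound would have to keep an exit test in every unrolled copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: advance a rank-`dim` index odometer by one
   element and return how many times the carry loop body ran.
   Sets *done to 1 once the last element has been passed. */
static int
advance_odometer (ptrdiff_t *count, const ptrdiff_t *extent, int dim,
                  int *done)
{
  assert (dim > 0);
  int body_runs = 0;
  int n = 0;
  count[0]++;
  /* Bounded by `dim` iterations, but the actual number is data-dependent. */
  while (count[n] == extent[n])
    {
      body_runs++;
      count[n] = 0;
      n++;
      if (n == dim)
        {
          *done = 1; /* wrapped past the final element */
          break;
        }
      count[n]++;
    }
  return body_runs;
}
```

For a 10x10 array (`extent = {10, 10}`), the body does not run on most steps, runs once when the first index wraps, and reaches its bound of two only on the final step, when both indices wrap.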
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #12 from Jiu Fu Guo ---
> executed at most 13 times. Then the complete unroller could handle this loop.
Correction: 13+1 times
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #11 from Jiu Fu Guo ---
In general, 'cunroll' can visibly help performance on some workloads, such as
SPEC.
In this case, it is questionable whether performance is improved.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #10 from Jiu Fu Guo ---
For Power, the patch enables -funroll-loops (with the small-loop unroller in
RTL), which also enables 'cunroll' (the complete unroller) at the tree level.
For this loop (the inner loop), 'cunroll' figures out that the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #9 from Thomas Koenig ---
Created attachment 48502
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48502&action=edit
Assembly file on x86 with -O2 -funroll-loops
So, it seems the decisions made for unrolling are bad for this case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
Thomas Koenig changed:
What|Removed |Added
Ever confirmed|0 |1
Last reconfirmed|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #7 from Thomas Koenig ---
Just checked aarch64, and that also isn't affected:
tkoenig@gcc116:~/gcc-bin/aarch64-unknown-linux-gnu/libgfortran$ objdump --disassemble in_pack_i4.o | wc -l
95
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
Thomas Koenig changed:
What|Removed |Added
Summary|[11 Regression] Excessive |[10/11 Regression]