[Bug rtl-optimization/90001] Compile-time hog in swing modulo scheduler

2019-12-13 Thread zhroma at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90001

Roman Zhuykov  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Assignee|unassigned at gcc dot gnu.org  |zhroma at gcc dot 
gnu.org

--- Comment #8 from Roman Zhuykov  ---
Fixed.

Yes, I'm closing the PR, I suppose this should not be backported.  It's a pity
we don't have an ability to create regression test for such situation.

[Bug rtl-optimization/90001] Compile-time hog in swing modulo scheduler

2019-12-13 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90001

--- Comment #7 from Arseny Solokha  ---
There are no backporting pending, I believe? If so, should this PR be closed
now?

[Bug rtl-optimization/90001] Compile-time hog in swing modulo scheduler

2019-12-13 Thread zhroma at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90001

--- Comment #6 from Roman Zhuykov  ---
Author: zhroma
Date: Fri Dec 13 17:02:53 2019
New Revision: 279375

URL: https://gcc.gnu.org/viewcvs?rev=279375=gcc=rev
Log:
modulo-sched: speed up DDG analysis (PR90001)

PR rtl-optimization/90001
* ddg.c (create_ddg): Init max_dist array for each node.
(free_ddg): Free max_dist array.
(create_ddg_edge): Use bool field instead of aux union.
(set_recurrence_length): Use prepared max_dist information instead
of calling longest_simple_path.
(create_scc): Remove graph argument, fill node's aux.count with
SCC id, and move set_recurrence_length call to...
(create_ddg_all_sccs): ...here, after filling all max_dist arrays
using Floyd–Warshall-like algorithm.
(update_dist_to_successors): Remove the whole function.
(longest_simple_path): Likewise.
* ddg.h (struct ddg_node): Add max_dist pointer.
(struct ddg_edge): Use bool field instead of unused aux union.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/ddg.c
trunk/gcc/ddg.h

[Bug rtl-optimization/90001] Compile-time hog in swing modulo scheduler

2019-04-16 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90001

--- Comment #5 from Roman Zhuykov  ---
Retested patch separately, everything works.
Have found 2 more slow Fortran examples on (obsolete) spu platform and with
additional options like -O1/O2 -fomit-frame-pointer -funroll-loops -fpeel-loops
-ftracer -finline-functions

gfortran.dg/sms-2.f90 
gfortran.dg/forall_10.f90

Compilation|
time, sec  |unpatched vs patched| sms options
forall_10  |  35.02   0.66  | -fmodulo-sched
forall_10  |  87.44   0.52  | -fmodulo-sched -fmodulo-sched-allow-regmoves
sms-2  |  34.58   0.44  | -fmodulo-sched
sms-2  |  86.77   0.27  | -fmodulo-sched -fmodulo-sched-allow-regmoves

[Bug rtl-optimization/90001] Compile-time hog in swing modulo scheduler

2019-04-08 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90001

--- Comment #4 from Roman Zhuykov  ---
Thanks for testcase.
2-3 weeks ago I already caught and fixed this on my local branch, see some info
in the bottom.

Current algorithm which finds recurrence_length for all DDG strongly connected
components works in like O(N^6) time, where N in the number of nodes in DDG.
The time is so bad mostly for graphs with lots of edges, like almost N^2 edges.
Richard's suggestion is right - it will be still something like O(N^5) in worst
case even without bitmap overhead for such graphs. My proposed algorithm works
in O(N^3). Algorithm of finding SCCs itself is also not optimal (maybe up to
O(N^4)), but here it left untouched.

For some situations, when amount of edges is smaller (like equal to N), new
algorithm can be unfortunately slower than old one. But I think it's better
here to add some bail-out when we got more than 1000 nodes for example.

Before creating this patch, I tested special version of it, where both
approaches were in action and asserts were inserted to check that algorithms
results (longest_simple_path values) are absolutely the same. I can publish
this special version if needed.

I wonder how regression test can be created for such a situation?

[Testing]
Proposed patch with a bunch of ~25 other patches was tested a lot:
*(1) Bootstrapped and regtested on x86_64
*(2) Cross-compiler regtest on aarch64, arm, powerpc, powerpc64, ia64 and s390
*(3) Also done (1) and (2) with -fmodulo-sched enabled by default
*(4) Also done (1) and (2) with -fmodulo-sched and
-fmodulo-sched-allow-regmoves enabled by default
*(5) Moreover, all (1-4) was also done with 4.9, 5, 6, 7, and 8 branches, on
active branches an trunk date was 20190327.

More than 250 compiler instances built and tested in total (counting
both "unpatched" vs "patched").
No new failures which can relate to this algorithm were found.
"Special version" was also tested in practically same scenarios (and one more
week before, like 20190320), but not exactly all of them.

But still have to retest it separately without all my stuff :)

[PS] Last month I spent a lot of time updating my patches described here
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01647.html and have locally added
several other patches, including this fix. My updated branches are not
published yet, because there are still some unsolved issues, I can't fix some
bugzilla PRs also. I'll try to add comments in another modulo-sched-related PRs
soon.

[Bug rtl-optimization/90001] Compile-time hog in swing modulo scheduler

2019-04-08 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90001

Roman Zhuykov  changed:

   What|Removed |Added

 CC||zhroma at ispras dot ru

--- Comment #3 from Roman Zhuykov  ---
Created attachment 46099
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46099=edit
Proposed patch

Untested on trunk yet

[Bug rtl-optimization/90001] Compile-time hog in swing modulo scheduler

2019-04-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90001

Richard Biener  changed:

   What|Removed |Added

 Target|powerpc-*-linux-gnu |powerpc-*-linux-gnu
   ||powerpc64le-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-04-08
  Component|target  |rtl-optimization
 Ever confirmed|0   |1

--- Comment #2 from Richard Biener  ---
Confirmed also on ppc64le (thus likely on any doloop capable target which SMS
needs).  Note that SMS is unmaintained.

The issue is that set_recurrence_length is called with a graph with
18662 back-arcs and the longest_simple_path does work on the order
of the graph size because it uses sbitmaps.  Using bitmaps doesn't
help here though.  I guess the underlying algorithm for
set_recurrence_length simply isn't scalable and needs to be replaced.