[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-12-10 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #29 from Ilya Leoshkevich  ---
Created attachment 47463
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47463&action=edit
nop plugin

Hi Maxim,

Just to clear my conscience, could you please try the nop trick in your
setup?  I normally use the attached plugin for that.  Just build it and
add e.g.

-fplugin=$HOME/gcc-nop-plugin/libgcc_nop_plugin.so
-fplugin-arg-libgcc_nop_plugin-S_regmatch=4

to the compiler flags.

Best regards,
Ilya

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-12-10 Thread mkuvyrkov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #28 from Maxim Kuvyrkov  ---
(In reply to Ilya Leoshkevich from comment #27)
> With
> 
> -DSPEC_CPU -DNDEBUG -DPERL_CORE   -O3 -save-temps=obj
> -fopt-info-vec-optimized   -DSPEC_CPU_LP64 -DSPEC_CPU_LINUX_X64
> -fgnu89-inline
> 
> on gcc113 I can see a 2% slowdown:
> 
> r277511 (without this fix): 880.09s
> r277515 (with this fix):    897.85s
> 
> The function that degraded the most is indeed S_regmatch:
> 
> $ perf diff perf-9760321.data perf-44b2b4c.data
> 32.24%           exe            [.] S_regmatch
>  8.92%           exe            [.] S_find_byclass.isra.0
>  6.80%   +0.28%  libc-2.19.so   [.] 0x0007dec0
>  5.20%           exe            [.] S_regtry
> 
> However, the "shape" of S_regmatch did not change, that is, when all
> offsets and register numbers are replaced with "x" in the objdump
> output, the old and the new versions are identical.  This hints at some
> microarchitectural effect - aliasing in the branch predictor maybe?
> 
> From my perspective, this happens too often, so I use the following test
> to rule this out: just add a nop at the beginning of the problematic
> function.  This changes all the offsets and makes the aliasing situation
> completely different.  And indeed, by adding a single nop to S_regmatch,
> I get wildly different results (for now this is just 1 repeat, I will
> run best-of-3 overnight):
> 
> r277511 (without this fix): 929.1s
> r277515 (with this fix):    931.48s

Hi Ilya,

Thanks for the analysis.  Doesn't seem like we can do anything useful about
this regression.

[For completeness, I see the same 5% slowdown with "-O3 -funroll-loops" as
with plain -O3 on Cortex-A57.]

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-12-09 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #27 from Ilya Leoshkevich  ---
With

-DSPEC_CPU -DNDEBUG -DPERL_CORE   -O3 -save-temps=obj -fopt-info-vec-optimized 
 -DSPEC_CPU_LP64 -DSPEC_CPU_LINUX_X64 -fgnu89-inline

on gcc113 I can see a 2% slowdown:

r277511 (without this fix): 880.09s
r277515 (with this fix):    897.85s

The function that degraded the most is indeed S_regmatch:

$ perf diff perf-9760321.data perf-44b2b4c.data
32.24%           exe            [.] S_regmatch
 8.92%           exe            [.] S_find_byclass.isra.0
 6.80%   +0.28%  libc-2.19.so   [.] 0x0007dec0
 5.20%           exe            [.] S_regtry

However, the "shape" of S_regmatch did not change, that is, when all
offsets and register numbers are replaced with "x" in the objdump
output, the old and the new versions are identical.  This hints at some
microarchitectural effect - aliasing in the branch predictor maybe?
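
One way to compare the "shape" of two builds is to normalize every
disassembly line before diffing.  The sketch below is only an
illustration: it assumes AArch64 "objdump -d" output with an 8-digit
encoding column, and the helper name normalize and its regular
expressions are hypothetical, not the script used for the measurements
in this comment.

```python
import re

# Assumed line layout: "  400580:\ta9bf7bfd \tstp\tx29, x30, [sp, #-16]!"
ADDR_AND_ENCODING = re.compile(r"^\s*[0-9a-f]+:\s+[0-9a-f]{8}\s+")
# Register names such as x29, w0, v3, q1, d2, s4, b5, h6.
REGISTER = re.compile(r"\b([xwvqdsbh])\d+\b")
# Symbol annotations, 0x-prefixed literals, and bare hex/decimal tokens
# that contain at least one digit (so mnemonics like "add" survive).
NUMBER = re.compile(r"<[^>]*>|0x[0-9a-f]+|\b[0-9a-f]*\d[0-9a-f]*\b")


def normalize(line):
    """Drop the address/encoding prefix and replace register numbers
    and offsets with "x"."""
    line = ADDR_AND_ENCODING.sub("", line)
    line = REGISTER.sub(r"\1x", line)
    return NUMBER.sub("x", line)


old = "  400580:\ta9bf7bfd \tstp\tx29, x30, [sp, #-16]!"
new = "  400584:\ta9be7bfd \tstp\tx19, x20, [sp, #-32]!"
print(normalize(old) == normalize(new))
```

Applying normalize to every line of S_regmatch in both dumps and
comparing the resulting lists would show whether only offsets and
register numbers differ.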

From my perspective, this happens too often, so I use the following test
to rule this out: just add a nop at the beginning of the problematic
function.  This changes all the offsets and makes the aliasing situation
completely different.  And indeed, by adding a single nop to S_regmatch,
I get wildly different results (for now this is just 1 repeat, I will
run best-of-3 overnight):

r277511 (without this fix): 929.1s
r277515 (with this fix):    931.48s
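
For reference, the relative slowdowns implied by the two pairs of
timings can be computed as follows.  The helper slowdown_percent is a
name introduced here for illustration; the numbers are the ones quoted
above.

```python
def slowdown_percent(before_s, after_s):
    """Relative change of after_s with respect to before_s, in percent."""
    return (after_s - before_s) / before_s * 100.0


plain = slowdown_percent(880.09, 897.85)     # original builds
with_nop = slowdown_percent(929.1, 931.48)   # one nop added to S_regmatch

print(f"plain:    {plain:.2f}%")
print(f"with nop: {with_nop:.2f}%")
```

The gap between the two revisions shrinks from about 2% to roughly a
quarter of a percent once the nop is added, although both nop builds
are slower in absolute terms.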

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-12-09 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #26 from Ilya Leoshkevich  ---
Whoops, I accidentally used a script I normally use for big-endian
machines (s390 actually).  But gcc113 is an APM X-Gene Mustang board.
I'll try again with your flags and see if I can reproduce the regression
there.

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-12-09 Thread mkuvyrkov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #25 from Maxim Kuvyrkov  ---
(In reply to Ilya Leoshkevich from comment #24)
> I got the following results on gcc113:
> 
> 400.perlbench
> 
> Compiler flags: -DSPEC_CPU -DNDEBUG -DPERL_CORE   -march=native -g -O3
> -funroll-loops -fopt-info-vec-optimized   -DSPEC_CPU -DNDEBUG
> -DPERL_CORE -DSPEC_CPU_LINUX -DSPEC_CPU_BIGENDIAN -D_GNU_SOURCE
> -DSPEC_CPU_LP64 -fno-strict-aliasing -std=gnu90
> 
> r277511 (without this fix): 884.11s
> r277515 (with this fix):    874.93s
> 
> Maxim, could you please share compiler flags with which you are seeing the
> regression?

Hi Ilya,

Thank you for looking into this.

The flags were "-O3 -save-temps=obj -c -o av.o -DSPEC_CPU -DNDEBUG -DPERL_CORE
-DSPEC_CPU_LP64 -DSPEC_CPU_LINUX_X64 -fgnu89-inline".  From
"-DSPEC_CPU_BIGENDIAN" I'm guessing a Power architecture, and I've confirmed
the regression on AArch64 Cortex-A57.

I'll start a run with "-funroll-loops -fopt-info-vec-optimized" to check if
they are making the problem go away.

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-12-09 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #24 from Ilya Leoshkevich  ---
I got the following results on gcc113:

400.perlbench

Compiler flags: -DSPEC_CPU -DNDEBUG -DPERL_CORE   -march=native -g -O3
-funroll-loops -fopt-info-vec-optimized   -DSPEC_CPU -DNDEBUG -DPERL_CORE
-DSPEC_CPU_LINUX -DSPEC_CPU_BIGENDIAN -D_GNU_SOURCE -DSPEC_CPU_LP64
-fno-strict-aliasing -std=gnu90

r277511 (without this fix): 884.11s
r277515 (with this fix):    874.93s

Maxim, could you please share compiler flags with which you are seeing the
regression?

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-12-06 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #23 from Andrew Pinski  ---
(In reply to Maxim Kuvyrkov from comment #21)
> 
> Is there a way to fix the problem on gcc-9-branch in a less intrusive way?

Could this be an alignment issue?

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-12-06 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #22 from Ilya Leoshkevich  ---
Hello Maxim,

Sorry about that!

I don't think it's possible to simply move jump threading back, since
it's not correct to have it where it used to be.  So I will build and
run the new and the old 400.perlbench on gcc compile farm and see what
else I can do about the difference.

Best regards,
Ilya

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-12-06 Thread mkuvyrkov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

Maxim Kuvyrkov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mkuvyrkov at gcc dot gnu.org

--- Comment #21 from Maxim Kuvyrkov  ---
(In reply to iii from comment #18)
> Author: iii
> Date: Mon Oct 28 13:09:54 2019
> New Revision: 277515
> 
> URL: https://gcc.gnu.org/viewcvs?rev=277515&root=gcc&view=rev
> Log:
> Move jump threading before reload

Hi Ilya,

This patch regresses performance of SPEC CPU2006's 400.perlbench on
aarch64-linux-gnu -O3 by 5% with most of the slowdown in the hottest function
S_regmatch (this is for gcc-9-branch).

benchmark,symbol,rel_sample,rel_size,results-0:sample,results-1:sample,results-0:size,results-1:size
400.perlbench,perlbench_base.default, 105,100,9281,9761,1281408,1285488
400.perlbench,[.] S_regmatch, 107,100,3641,3910,16460,16460

Is there a way to fix the problem on gcc-9-branch in a less intrusive way?

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-12-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #20 from Jakub Jelinek  ---
.

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-12-04 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #19 from Arseny Solokha  ---
I believe this has been fixed on both affected branches for over a month,
so the PR can be closed.

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-28 Thread iii at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #18 from iii at gcc dot gnu.org ---
Author: iii
Date: Mon Oct 28 13:09:54 2019
New Revision: 277515

URL: https://gcc.gnu.org/viewcvs?rev=277515&root=gcc&view=rev
Log:
Move jump threading before reload

r266734 has introduced a new instance of jump threading pass in order to
take advantage of opportunities that combine opens up.  It was perceived
back then that it was beneficial to delay it after reload, since that
might produce even more such opportunities.

Unfortunately jump threading interferes with hot/cold partitioning.  In
the code from PR92007, it converts the following

  +-------------------------- 2/HOT ------------------------+
  |                                                         |
  v                                                         v
3/HOT --> 5/HOT --> 8/HOT --> 11/COLD --> 6/HOT --EH--> 16/HOT
            |                               ^
            |                               |
            +-------------------------------+

into the following:

  +--------------------- 2/HOT --------------------+
  |                                                |
  v                                                v
3/HOT --> 8/HOT --> 11/COLD --> 6/COLD --EH--> 16/HOT

This makes hot bb 6 dominated by cold bb 11, and because of this
fixup_partitions makes bb 6 cold as well, which in turn makes EH edge
6->16 a crossing one.  Not only can't we have crossing EH edges, we are
also not allowed to introduce new crossing edges after reload in
general, since it might require extra registers on some targets.

Therefore, move the jump threading pass between combine and hot/cold
partitioning.  Building SPEC 2006 and SPEC 2017 with the old and the new
code indicates that:

* When doing jump threading right after reload, 3889 edges are threaded.
* When doing jump threading right after combine, 3918 edges are
  threaded.

This means this change will not introduce performance regressions.

gcc/ChangeLog:

2019-10-28  Ilya Leoshkevich

        Backport from mainline
        PR rtl-optimization/92007
        * cfgcleanup.c (thread_jump): Add an assertion that we don't
        call it after reload if hot/cold partitioning has been done.
        (class pass_postreload_jump): Rename to
        pass_jump_after_combine.
        (make_pass_postreload_jump): Rename to
        make_pass_jump_after_combine.
        * passes.def (pass_postreload_jump): Move before reload, rename
        to pass_jump_after_combine.
        * tree-pass.h (make_pass_postreload_jump): Rename to
        make_pass_jump_after_combine.

gcc/testsuite/ChangeLog:

2019-10-28  Ilya Leoshkevich

        Backport from mainline
        PR rtl-optimization/92007
        * g++.dg/opt/pr92007.C: New test (from Arseny Solokha).

Added:
branches/gcc-9-branch/gcc/testsuite/g++.dg/opt/pr92007.C
Modified:
branches/gcc-9-branch/gcc/ChangeLog
branches/gcc-9-branch/gcc/cfgcleanup.c
branches/gcc-9-branch/gcc/passes.def
branches/gcc-9-branch/gcc/testsuite/ChangeLog
branches/gcc-9-branch/gcc/tree-pass.h

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-28 Thread iii at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #17 from iii at gcc dot gnu.org ---
Author: iii
Date: Mon Oct 28 10:04:31 2019
New Revision: 277507

URL: https://gcc.gnu.org/viewcvs?rev=277507&root=gcc&view=rev
Log:
Move jump threading before reload

r266734 has introduced a new instance of jump threading pass in order to
take advantage of opportunities that combine opens up.  It was perceived
back then that it was beneficial to delay it after reload, since that
might produce even more such opportunities.

Unfortunately jump threading interferes with hot/cold partitioning.  In
the code from PR92007, it converts the following

  +-------------------------- 2/HOT ------------------------+
  |                                                         |
  v                                                         v
3/HOT --> 5/HOT --> 8/HOT --> 11/COLD --> 6/HOT --EH--> 16/HOT
            |                               ^
            |                               |
            +-------------------------------+

into the following:

  +--------------------- 2/HOT --------------------+
  |                                                |
  v                                                v
3/HOT --> 8/HOT --> 11/COLD --> 6/COLD --EH--> 16/HOT

This makes hot bb 6 dominated by cold bb 11, and because of this
fixup_partitions makes bb 6 cold as well, which in turn makes EH edge
6->16 a crossing one.  Not only can't we have crossing EH edges, we are
also not allowed to introduce new crossing edges after reload in
general, since it might require extra registers on some targets.

Therefore, move the jump threading pass between combine and hot/cold
partitioning.  Building SPEC 2006 and SPEC 2017 with the old and the new
code indicates that:

* When doing jump threading right after reload, 3889 edges are threaded.
* When doing jump threading right after combine, 3918 edges are
  threaded.

This means this change will not introduce performance regressions.

gcc/ChangeLog:

2019-10-28  Ilya Leoshkevich

        PR rtl-optimization/92007
        * cfgcleanup.c (thread_jump): Add an assertion that we don't
        call it after reload if hot/cold partitioning has been done.
        (class pass_postreload_jump): Rename to
        pass_jump_after_combine.
        (make_pass_postreload_jump): Rename to
        make_pass_jump_after_combine.
        * passes.def (pass_postreload_jump): Move before reload, rename
        to pass_jump_after_combine.
        * tree-pass.h (make_pass_postreload_jump): Rename to
        make_pass_jump_after_combine.

gcc/testsuite/ChangeLog:

2019-10-28  Ilya Leoshkevich

        PR rtl-optimization/92007
        * g++.dg/opt/pr92007.C: New test (from Arseny Solokha).

Added:
trunk/gcc/testsuite/g++.dg/opt/pr92007.C
Modified:
trunk/gcc/ChangeLog
trunk/gcc/cfgcleanup.c
trunk/gcc/passes.def
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-pass.h

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-17 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #16 from Segher Boessenkool  ---
Oh doh, I am blind, apparently :-)

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-17 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #15 from Ilya Leoshkevich  ---
Created attachment 47059
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47059&action=edit
proposed fix (without renaming the pass so far)

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-17 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #14 from Ilya Leoshkevich  ---
Created attachment 47058
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47058&action=edit
temporary patch for finding out the number of threaded edges

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-17 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #13 from Segher Boessenkool  ---
I don't see a patch there?  If you have one, please attach it?

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-17 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #12 from Ilya Leoshkevich  ---
> Well, it apparently has found new jump threading opportunities after
> partition_blocks.  Are such changes useful?  Does it happen often?

It's still combine that was responsible for this particular opportunity.
I've added a simple counter of threaded edges and built two compiler
versions: with and without the patch from comment 3.  The value of the
counter is the same in both cases for the code from this bug report.

Furthermore, I've built SPEC 2006 and SPEC 2017 with vanilla and patched
compilers and aggregated the counter values.

When doing jump threading right after reload, 3889 edges are threaded.
When doing jump threading right after combine, 3918 edges are threaded.

Both figures are more or less the same; if anything, we end up losing
some opportunities if we delay jump threading.

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-16 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #11 from Segher Boessenkool  ---
Well, it apparently has found new jump threading opportunities after
partition_blocks.  Are such changes useful?  Does it happen often?

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-16 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #10 from Ilya Leoshkevich  ---
> Question is how to figure out which to do when.

I would always do the former before reload, and always the latter after
reload.

However, I have a concern regarding this approach: in more complicated
cases instead of just a single 11/COLD we might have a larger lump of
cold basic blocks.  In order to avoid introducing new crossing edges, we
would have to make them all hot (using e.g. a simple worklist
algorithm).  Is such an end result desirable?
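
To make the worklist idea concrete, here is a minimal sketch on a toy
CFG.  It is not GCC code: the function name cold_blocks_to_promote,
the edge-list representation, and the example lump of cold blocks
(11, 12, 13) are all hypothetical and chosen only for illustration.

```python
def cold_blocks_to_promote(edges, target, cold):
    """Collect the cold blocks from which `target` is reachable through
    cold blocks only, walking predecessor edges with a worklist."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, []).append(src)

    promoted = set()
    worklist = [target]
    while worklist:
        bb = worklist.pop()
        for pred in preds.get(bb, []):
            # Hot predecessors stop the walk; cold ones are promoted
            # and their own predecessors are examined in turn.
            if pred in cold and pred not in promoted:
                promoted.add(pred)
                worklist.append(pred)
    return promoted


# Hypothetical CFG: like the one in this PR after threading, but with a
# lump of three cold blocks (11, 12, 13) in front of bb 6.
edges = [(2, 3), (2, 16), (3, 8), (8, 11), (11, 12), (11, 13),
         (12, 6), (13, 6), (6, 16)]
cold = {11, 12, 13}

print(sorted(cold_blocks_to_promote(edges, target=6, cold=cold)))
```

In this example the whole lump ends up hot, which is exactly the
outcome whose desirability is in question.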

I'd also still like to understand the motivation behind doing this pass
after reload.  When I introduced it in r266734, the only goal was to
clean up the CFG after combine.  I was advised to put it where it is
now, and back then I did not see any downsides to doing so.  But now
that this problem has arisen - what is the advantage of doing this after
the following 16 additional passes?  What would be the downside of doing
it between pass_combine and pass_partition_blocks?

  NEXT_PASS (pass_combine);
--
  NEXT_PASS (pass_if_after_combine);
  NEXT_PASS (pass_partition_blocks);
  NEXT_PASS (pass_outof_cfg_layout_mode);
  NEXT_PASS (pass_split_all_insns);
  NEXT_PASS (pass_lower_subreg3);
  NEXT_PASS (pass_df_initialize_no_opt);
  NEXT_PASS (pass_stack_ptr_mod);
  NEXT_PASS (pass_mode_switching);
  NEXT_PASS (pass_match_asm_constraints);
  NEXT_PASS (pass_sms);
  NEXT_PASS (pass_live_range_shrinkage);
  NEXT_PASS (pass_sched);
  NEXT_PASS (pass_early_remat);
  NEXT_PASS (pass_ira);
  NEXT_PASS (pass_reload);
  NEXT_PASS (pass_postreload);
  PUSH_INSERT_PASSES_WITHIN (pass_postreload)
--
  NEXT_PASS (pass_postreload_jump);

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-15 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #9 from Segher Boessenkool  ---
(In reply to Ilya Leoshkevich from comment #7)
> Having eliminated bb 5, we cannot avoid making bb 6 cold, since this
> would violate CFG integrity: as far as I understand, it's important to
> maintain the property that cold bbs cannot dominate hot bbs.

But we can make bb 11 HOT, instead?

Question is how to figure out which to do when.

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-11 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #8 from Segher Boessenkool  ---
The current two jump passes we have after reload are there for a reason.
Some targets will be very unhappy if you delete them.

Like Jakub says, you need to avoid doing stuff with crossing edges in
many cases, in the late passes.

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-11 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #7 from Ilya Leoshkevich  ---
How can we do this here?

When we make a decision to eliminate bb 5, all the "nearby" edges are
hot.

Having eliminated bb 5, we cannot avoid making bb 6 cold, since this
would violate CFG integrity: as far as I understand, it's important to
maintain the property that cold bbs cannot dominate hot bbs.

So we would have to avoid eliminating bb 5 in the first place, and for
that we would need to analyze which consequences that would have w.r.t.
dominators and partitioning, and that might be costly.

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #6 from Jakub Jelinek  ---
We can also just punt on crossing edges where needed.

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-11 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #5 from Ilya Leoshkevich  ---
+1 regarding renaming, I just wanted to keep it simple here.

Landing pad issue aside, I'm beginning to wonder if we can have a jump
pass after reload at all?  For example, if hotness of a basic block
changes, and a related jump becomes a crossing one: can it be that on
some targets we would have to change a "simple" branching instruction
to a sequence that first fetches a target address from a literal pool?
And then, since reload has already completed, how do we allocate a
register for that?

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-11 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #4 from Segher Boessenkool  ---
Well it should at least be renamed then ;-)

But is that good anyway?  We then do not have a jump pass after reload
(and before split2 and pro/epi, i.e. shrink-wrapping) any more.

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-11 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

--- Comment #3 from Ilya Leoshkevich  ---
Jump threading has converted this:

  +-------------------------- 2/HOT ------------------------+
  |                                                         |
  v                                                         v
3/HOT --> 5/HOT --> 8/HOT --> 11/COLD --> 6/HOT --EH--> 16/HOT
            |                               ^
            |                               |
            +-------------------------------+

into this:

  +--------------------- 2/HOT --------------------+
  |                                                |
  v                                                v
3/HOT --> 8/HOT --> 11/COLD --> 6/COLD --EH--> 16/HOT

by eliminating bb 5.  This made bb 6 dominated by cold bb 11, and
because of this fixup_partitions made bb 6 cold as well, which in turn
made EH edge 6->16 a crossing one.
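
The effect can be reproduced on a toy model.  The sketch below is not
GCC's fixup_partitions: it assumes a simplified rule (a block stays hot
only if it is reachable from the entry through hot blocks) and an edge
list read off the diagrams (2->3, 2->16, 3->5, 5->8, 5->6, 8->11,
11->6, 6->16).  Both are assumptions made for illustration, and the
function names are hypothetical.

```python
from collections import deque


def demote_unreachable_hot(edges, entry, cold):
    """Return the cold set after demoting every block that cannot be
    reached from `entry` along a path of hot blocks."""
    succs = {}
    for src, dst in edges:
        succs.setdefault(src, []).append(dst)

    seen = {entry}
    work = deque([entry])
    while work:
        bb = work.popleft()
        for nxt in succs.get(bb, []):
            if nxt not in cold and nxt not in seen:
                seen.add(nxt)
                work.append(nxt)

    blocks = {bb for edge in edges for bb in edge}
    return set(cold) | (blocks - seen)


def crossing_edges(edges, cold):
    """Edges whose endpoints lie in different partitions."""
    return [(s, d) for s, d in edges if (s in cold) != (d in cold)]


before = [(2, 3), (2, 16), (3, 5), (5, 8), (5, 6), (8, 11), (11, 6), (6, 16)]
after = [(2, 3), (2, 16), (3, 8), (8, 11), (11, 6), (6, 16)]  # bb 5 removed

for name, edges in (("before", before), ("after", after)):
    cold = demote_unreachable_hot(edges, entry=2, cold={11})
    print(name, sorted(cold), crossing_edges(edges, cold))
```

Before threading only bb 11 is cold and 6->16 stays within the hot
partition; after threading bb 6 is demoted and 6->16 becomes a crossing
edge.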

According to

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=176696

in this situation we need to create a cold landing pad.


But I wonder whether we could just do the following instead?

--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -439,6 +439,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ud_rtl_dce);
   NEXT_PASS (pass_combine);
   NEXT_PASS (pass_if_after_combine);
+  NEXT_PASS (pass_postreload_jump);
   NEXT_PASS (pass_partition_blocks);
   NEXT_PASS (pass_outof_cfg_layout_mode);
   NEXT_PASS (pass_split_all_insns);
@@ -455,7 +456,6 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_reload);
   NEXT_PASS (pass_postreload);
   PUSH_INSERT_PASSES_WITHIN (pass_postreload)
-  NEXT_PASS (pass_postreload_jump);
   NEXT_PASS (pass_postreload_cse);
   NEXT_PASS (pass_gcse2);
   NEXT_PASS (pass_split_after_reload);

This will fix this problem while retaining the benefits of the
additional jump threading pass.

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-09 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

Ilya Leoshkevich  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |iii at linux dot ibm.com

--- Comment #2 from Ilya Leoshkevich  ---
I will have a look at this and try to adjust the CLEANUP_THREADING code.

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-10-07
                 CC|                            |iii at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Jakub Jelinek  ---
Started with r266734.
Guess CLEANUP_THREADING code isn't prepared to handle hot/cold partitioning,
which wasn't a problem before, when CLEANUP_THREADING was only invoked before
the hot/cold partitions are created.

[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)

2019-10-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |9.4