[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-04-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

Richard Biener  changed:

   What|Removed |Added

   Priority|P1  |P2
   Target Milestone|10.0|11.0

--- Comment #16 from Richard Biener  ---
Downgrading and re-targeting to GCC 11 (sorry).  The underlying issue is latent
since a long time while the actual testcase with SMS + partitioning on the
respective target worked before "by accident" and now no longer works.

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-04-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

--- Comment #15 from Jakub Jelinek  ---
And I agree having such a named patterns (but also with standardized RTL in it,
so that jump.c can recognize those and redirect) looks like a good idea, we
could then enable partitioning if HAVE_LONG_UNCOND_BRANCH or this named pattern
exists and disable otherwise, and targets could then enable it one by one at
their own pace.

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-04-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

--- Comment #14 from Jakub Jelinek  ---
I bet even i386 shouldn't claim to support it if ix86_cmodel == CM_LARGE ||
ix86_cmodel == CM_LARGE_PIC.

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-04-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

--- Comment #13 from Richard Biener  ---
And for the moment we could declare !HAS_LONG_UNCOND_BRANCH as unsupported for
partitioning.  The list of unconditionally supported targets then is
just cr16, ft32, i386, m32c, moxie and pru.  aarch64 supports such branches
when aarch64_cmodel == AARCH64_CMODEL_TINY || aarch64_cmodel ==
AARCH64_CMODEL_TINY_PIC.

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-04-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

--- Comment #12 from Richard Biener  ---
(In reply to Roman Zhuykov from comment #6)
> First, I want here to mention that Richard have recently discussed
> partitioning in mailing list with Segher, starting from
> https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00666.html
> 
> > I'll run some cross-testing to check how it works for now.
> Second, those tests have finished and there is nothing to say about the
> results. From correctness aspect everything looks fine.  But I haven't tried
> to look at each example where it prevents/allows loops to be scheduled, and
> know nothing about performance impact.
> 
> (In reply to rsand...@gcc.gnu.org from comment #4)
> > Yeah, it ought to be better to do mode switching first.  But I think
> > the more important ones are:
> >
> >        NEXT_PASS (pass_split_all_insns);
> >        NEXT_PASS (pass_lower_subreg3);
> >
> > Scheduling should happen on the split form of insns rather than the
> > unsplit form.  lower_subreg should also improve "schedulability".
> Agreed, so it would be much better to fix the issue conservatively without
> moving the pass.

As of pass ordering partitioning needs to run before RA but I don't see
why it necessarily needs to run that early.  Well, it works on CFG layout
mode and it's the "last" pass before we go out of CFG layout - but obviously
SMS goes back into so we could do the very same dance for partitioning
and only schedule it after SMS (or even later).

The whole complication arises from the direct CFG edges that are
implemented with indirect jumps and which we cannot easily redirect.
A more high-level representation of those "direct jumps" would make
redirection possible again (at the expense of micro-optimizing the
reg set + indirect jump between partitioning and the point we
expose the operation).  The only thing we need is the RA giving us
a scratch register we can use.  Maybe we can have a special named
pattern for those jumps allocating an extra scratch and have targets
either split them after reload into true indirect jumps or leave them
in place all the way to asm output.  So they'd appear as simple_jump_p
for a much longer time?

Currently we have

(insn 577 350 578 27 (set (reg:DI 309)
(high:DI (label_ref:DI 220))) "pr71550.c":19:20 -1
 (insn_list:REG_LABEL_OPERAND 220 (nil)))
(insn 578 577 579 27 (set (reg:DI 308)
(lo_sum:DI (reg:DI 309)
(label_ref:DI 220))) "pr71550.c":19:20 -1
 (insn_list:REG_LABEL_OPERAND 220 (expr_list:REG_EQUAL (label_ref:DI 220)
(nil
(jump_insn/j 579 578 220 27 (set (pc)
(reg:DI 308)) "pr71550.c":19:20 -1
 (nil)

not sure if jump_insns can have extra operands for the scratch but
we want part of the jump to be like

(jump_insn/j 579 578 220 27 (set (pc)
(label_ref:DI 220)) "pr71550.c":19:20 -1
 (nil)

but also mention (reg:DI 309) as scratch.

Ideas?  I guess doing

(jump_insn/j 579 578 220 27 [
  (set (pc) (label_ref:DI 220))
  (clobber (reg:DI 309))])

might work?  The pattern of the jump insn woudl be a parallel though
so not match simplejump_p.  There's also JUMP_LABEL, not sure if that's
usable.  There's condjump_in_parallel_p so we could add
simplejump_in_parallel_p as well and use that in CFG manipulation.

This woudl of course need support from targets but where partitioning
decides it needs to emit indirect jumps it could say that it needs
such target support or otherwise refuse to operate.  Currently
we look at HAS_LONG_COND_BRANCH / HAS_LONG_UNCOND_BRANCH where
only failure in the latter needs indirect jumps using "indirect_jump".
So we'd add sth like

(define_insn "crossing_jump"
  [(set (pc) (label_ref (match_operand 0 "" ""))
   (clobber (match_scratch:DI 1))]
  ""
  "br\\t%1"
  [(set_attr "type" "branch")]
)

with the assembly part adjusted accordingly (the move is missing).  As said
a late splitter may be the best approach (but that's up to the target).

And partitioning would require either HAS_LONG_UNCOND_BRANCH or
targetm.have_crossing_jump and we'd arrange for CFG manipulation to
handle this form.
Oh, and while these issues can trigger in a way appearing as P1 regression
they of course really are not new issues.  So yeah, not really P1.

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-04-02 Thread zhroma at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

--- Comment #11 from Roman Zhuykov  ---
(In reply to Richard Biener from comment #9)

Thank you, I'm glad to see new ideas and some discussion.

> On the testcase itself
> 
> diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
> index 77254b31b42..66260fa34f1 100644
> --- a/gcc/modulo-sched.c
> +++ b/gcc/modulo-sched.c
> @@ -1347,8 +1347,7 @@ sms_schedule (void)
>edge latch_edge;
>HOST_WIDE_INT trip_count, max_trip_count;
>  
> -  loop_optimizer_init (LOOPS_HAVE_PREHEADERS
> -  | LOOPS_HAVE_RECORDED_EXITS);
> +  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
>if (number_of_loops (cfun) <= 1)
>  {
>loop_optimizer_finalize ();
> 
> works.
Unfortunately, this brokes the whole SMS workflow.  See e.g. loop_canon_p
function and a call to single_exit inside.  I've actually tried only improved
(with all my patches) master version and only few examples, but with
AVOID_CFG_MODIFICATIONS some of them bailout with "SMS loop many exits" instead
of succesfully passing analysis and transformation phases with "SMS succeeded".

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-04-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

--- Comment #10 from Richard Biener  ---
Makes me wonder if hot/cold splitting should use a special jump instruction
for crossing jumps which we could fixup/split very late so we see

 (parallel
(set reg (label_ref ..))
(set pc (reg))
(clobber reg))

or something like that.  That is, make sure the crossing jump, if
indirect, has the destination computation easily accessible and
replaceable.

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-04-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

--- Comment #9 from Richard Biener  ---
Pilot error.

loop->header is in the cold partition, both latch sources are as well,
the loop entry source is in the hot partition.  We're correctly
redirecting that from hot -> cold to hot -> cold state so

  if (e->flags & EDGE_CROSSING
  && BB_PARTITION (e->src) == BB_PARTITION (dest)
  && simplejump_p (BB_END (src)))
{

doesn't apply anyways so we run into

edge
try_redirect_by_replacing_jump (edge e, basic_block target, bool in_cfglayout)
{
...
  /* If we are partitioning hot/cold basic blocks, we don't want to
 mess up unconditional or indirect jumps that cross between hot
 and cold sections.

 Basic block partitioning may result in some jumps that appear to
 be optimizable (or blocks that appear to be mergeable), but which really
 must be left untouched (they are required to make it safely across
 partition boundaries).  See  the comments at the top of
 bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */

  if (BB_PARTITION (src) != BB_PARTITION (target))
return NULL;

where the referenced comment says

   IMPORTANT NOTE: This optimization causes some messy interactions
   with the cfg cleanup optimizations; those optimizations want to
   merge blocks wherever possible, and to collapse indirect jump
   sequences (change "A jumps to B jumps to C" directly into "A jumps
   to C").  Those optimizations can undo the jump fixes that
   partitioning is required to make (see above), in order to ensure
   that jumps attempting to cross section boundaries are really able
   to cover whatever distance the jump requires (on many architectures
   conditional or unconditional jumps are not able to reach all of
   memory).  Therefore tests have to be inserted into each such
   optimization to make sure that it does not undo stuff necessary to
   cross partition boundaries.  This would be much less of a problem
   if we could perform this optimization later in the compilation, but
   unfortunately the fact that we may need to create indirect jumps
   (through registers) requires that this optimization be performed
   before register allocation.

but any such fixup jumps would appear inside the original section
(and not crossing).  So preserving the crossing state should be a
good enough and better check?

diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index fb551e3efc0..f2783b1d7ed 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -1046,7 +1046,7 @@ try_redirect_by_replacing_jump (edge e, basic_block
target, bool in_cfglayout)
  partition boundaries).  See  the comments at the top of
  bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */

-  if (BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (e->dest) != BB_PARTITION (target))
 return NULL;

   /* We can replace or remove a complex jump only when we have exactly

covers the testcase but will inhibit redirects allowed previously
when e->src is hot, e->dest was cold and target is now hot.  But that
should be covered by the special-case using simplejump_p.

Ah, OK, here the jump is an indirect one, so not simple.  And we'd
need to replace it with an indirect one for partitioning correctness
which we are not able to do here.

For the testcase at hand we could use sth else than make_forwarder_block,
like manually create a new BB, redirect all latches to it and then
create a new edge to the old header from it.  The redirects/creations
would be all in the same partition.  But then there may be the case
of a latch edge being crossing (a very cold latch in a hot loop) and
we'd face the very same issue.

Currently loop analysis thinks it can always make loops only have a single
latch so "failure" isn't an option here.

So we need to teach cfgrtl.c to redirect forming a similar instruction
as it was there before (create another indirect jump, but after reload
this needs a register - we don't know whether the jump target reg is
live).  Ugh.

On the testcase itself

diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
index 77254b31b42..66260fa34f1 100644
--- a/gcc/modulo-sched.c
+++ b/gcc/modulo-sched.c
@@ -1347,8 +1347,7 @@ sms_schedule (void)
   edge latch_edge;
   HOST_WIDE_INT trip_count, max_trip_count;

-  loop_optimizer_init (LOOPS_HAVE_PREHEADERS
-  | LOOPS_HAVE_RECORDED_EXITS);
+  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
   if (number_of_loops (cfun) <= 1)
 {
   loop_optimizer_finalize ();

works.

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-04-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

--- Comment #8 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #7)
> Unfortunately I can't reproduce on todays trunk, will try rewiding backwards
> to the reporting time to have a closer look.

Strange, I can (tried r10-7514).
./cc1 -quiet -nostdinc -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops
-ftracer -finline-functions -fmodulo-sched -freorder-blocks-and-partition
pr71550.c
x86_64-linux -> aarch64-linux cross with installed cross-binutils for the
target.

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-04-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

--- Comment #7 from Richard Biener  ---
So let's try to address this in cfgloop.c - we're likely facing the situation
of

 header:
...
  if (...) goto latch1;

 latch2:
   goto header;

 latch1: // in cold section
   goto header;

where latch disambiguation via merge_latch_egdes tries to build

  header:
...
  if (...) goto latch1;

 latch2:
   goto latch3;

 latch1: // in cold section
   goto latch3;

 latch3: // somewhere
   goto header;

but somehow we end up redirecting a jump that was formerly crossing
to non-crossing.  Looking at the backtrace it must be entry edges
that are being redirected but the whole setup should be so that
the crossing state of a branch is never changed.

Unfortunately I can't reproduce on todays trunk, will try rewiding backwards
to the reporting time to have a closer look.

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-02-21 Thread zhroma at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

Roman Zhuykov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #6 from Roman Zhuykov  ---
First, I want here to mention that Richard have recently discussed partitioning
in mailing list with Segher, starting from
https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00666.html

> I'll run some cross-testing to check how it works for now.
Second, those tests have finished and there is nothing to say about the
results. From correctness aspect everything looks fine.  But I haven't tried to
look at each example where it prevents/allows loops to be scheduled, and know
nothing about performance impact.

(In reply to rsand...@gcc.gnu.org from comment #4)
> Yeah, it ought to be better to do mode switching first.  But I think
> the more important ones are:
>
>        NEXT_PASS (pass_split_all_insns);
>        NEXT_PASS (pass_lower_subreg3);
>
> Scheduling should happen on the split form of insns rather than the
> unsplit form.  lower_subreg should also improve "schedulability".
Agreed, so it would be much better to fix the issue conservatively without
moving the pass.

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-02-12 Thread zhroma at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

--- Comment #5 from Roman Zhuykov  ---
Created attachment 47820
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47820&action=edit
Considered "moving sms earlier" patch

I haven't tested "moving sms earlier" patch since 2011.  But I remember there
were some testsuite items where it makes difference, not sure whether it was
arm or ia64 platform.  I decided those passes between substantially change the
loop code and DDG.

I'll run some cross-testing to check how it works for now.

> In general it's a bad idea to try go "back" to CFG layout mode and the fix is
> to not do that.
Also qouting https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00714.html:

> I think the expectation that you can go back to CFG layout mode
> and then work with CFG layout tools after we've lowered to CFG RTL
> is simply bogus.  Yeah, you can probably do analysis things but
> I wouldn't be surprised if a CFG RTL -> CFG layout -> CFG RTL cycle 
> can wreck things.  Undoubtedly doing CFG manipulations is not going
> to work since CFG layout does not respect CFG RTL restrictions.
> Partitioning simply uncovered latent bugs, there's nothing wrong
> with it IMHO.
There are two more passes re-entering cfg layout: pass_reorder_blocks and
pass_machine_reorg.

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-02-11 Thread rsandifo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

rsandifo at gcc dot gnu.org  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org

--- Comment #4 from rsandifo at gcc dot gnu.org  
---
(In reply to Richard Biener from comment #3)
> In general it's a bad idea to try go "back" to CFG layout mode and the fix is
> to not do that.  Which probably means scheduling pass_sms earlier and indeed
> then best before pass_partition_blocks.  I don't see any reason to _not_ do
> this, the current pipeline is
> 
>   NEXT_PASS (pass_partition_blocks);
>   NEXT_PASS (pass_outof_cfg_layout_mode);
>   NEXT_PASS (pass_split_all_insns);
>   NEXT_PASS (pass_lower_subreg3);
>   NEXT_PASS (pass_df_initialize_no_opt);
>   NEXT_PASS (pass_stack_ptr_mod);
>   NEXT_PASS (pass_mode_switching);
>   NEXT_PASS (pass_match_asm_constraints);
>   NEXT_PASS (pass_sms);
>   NEXT_PASS (pass_live_range_shrinkage);
>   NEXT_PASS (pass_sched);
>   NEXT_PASS (pass_early_remat);
>   NEXT_PASS (pass_ira);
>   NEXT_PASS (pass_reload);
>   NEXT_PASS (pass_postreload);
>   PUSH_INSERT_PASSES_WITHIN (pass_postreload)
> 
> and while I understand sms wants to run as close to RA as possible the
> passes between its current position and partitioning should not mess
> with the schedule (mode switching might insert code I think).

Yeah, it ought to be better to do mode switching first.  But I think
the more important ones are:

   NEXT_PASS (pass_split_all_insns);
   NEXT_PASS (pass_lower_subreg3);

Scheduling should happen on the split form of insns rather than the
unsplit form.  lower_subreg should also improve "schedulability".

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-02-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

--- Comment #3 from Richard Biener  ---
In general it's a bad idea to try go "back" to CFG layout mode and the fix is
to not do that.  Which probably means scheduling pass_sms earlier and indeed
then best before pass_partition_blocks.  I don't see any reason to _not_ do
this, the current pipeline is

  NEXT_PASS (pass_partition_blocks);
  NEXT_PASS (pass_outof_cfg_layout_mode);
  NEXT_PASS (pass_split_all_insns);
  NEXT_PASS (pass_lower_subreg3);
  NEXT_PASS (pass_df_initialize_no_opt);
  NEXT_PASS (pass_stack_ptr_mod);
  NEXT_PASS (pass_mode_switching);
  NEXT_PASS (pass_match_asm_constraints);
  NEXT_PASS (pass_sms);
  NEXT_PASS (pass_live_range_shrinkage);
  NEXT_PASS (pass_sched);
  NEXT_PASS (pass_early_remat);
  NEXT_PASS (pass_ira);
  NEXT_PASS (pass_reload);
  NEXT_PASS (pass_postreload);
  PUSH_INSERT_PASSES_WITHIN (pass_postreload)

and while I understand sms wants to run as close to RA as possible the
passes between its current position and partitioning should not mess
with the schedule (mode switching might insert code I think).

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-02-05 Thread zhroma at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

Roman Zhuykov  changed:

   What|Removed |Added

 CC||abel at gcc dot gnu.org

--- Comment #2 from Roman Zhuykov  ---
So I decide to share my thoughts here to prevent silently dropping this to P2
before gcc-10 release.  Some of the following is already presented in meta-bug
(PR90040).

(In reply to Jakub Jelinek from comment #1)
> It tries to redirect a crossing jump from cold to hot partition to a new bb
> that is in cold partition.
> The generic code really doesn't know what it should remove to replace it
> with the fallthru edge.
Thank you for looking at it.  In 2018 Andrey helps me with looking at PR85408
and PR85426 and we together came to the same conclusion.  That time we decide
RTL folks will eventually look at those bugs and find some solution.  A year
later I've opened a meta-bug, and now in 2020 we still have those issues.

> cfg_layout_redirect_edge_and_branch has code to handle this case, but only
> if the 
> branch is simplejump:
> 4429if (e->flags & EDGE_CROSSING
> 4430&& BB_PARTITION (e->src) == BB_PARTITION (dest)
> 4431&& simplejump_p (BB_END (src)))
> 4432  {
> 4433if (dump_file)
> 4434  fprintf (dump_file,
> 4435   "Removing crossing jump while redirecting edge form %i 
> to %i\n",
> 4436   e->src->index, dest->index);
> 4437delete_insn (BB_END (src));
> 4438remove_barriers_from_footer (src);
> 4439e->flags |= EDGE_FALLTHRU;
> 4440  }
Yep, and this code is rather new, it was also added in 2018 half a year before
PR87475 fix.  I mentioned the quoted hunk in meta-bug discussion in [Appendix
1].  Although Honza created that patch as a fix for PR84058 which was open as
code quality issue, same hunk also solved an ICE in PR83886.

Bug 84058 comment 9 lefts that PR open to track enhancements, but hopefully ICE
 issues will also have some attention.

> So, I'd say something in the caller or a few callers up should have punted
> on it before, but no idea what.
Quoting bug 85408 comment 2: "Honza, thoughts on this?" :)

Here I will also desribe some alternative ways to fix the issue, from simplest
to hardest. IMHO all those ideas are ugly.

(1) We can prohibit usage of -freorder-blocks-and-partition and -fmodulo-sched
together.  Maybe it should issue a note and enable only the pass which goes
last in compile options.  This behaviour will be much more confusing when my
sms changes goes in, which will allow to use sms on x86, where partitioning is
enabled by default.

(2) We can move pass_sms before pass_partition_blocks.  Althougt this will
eventually prevent another entering/exiting cfg_layout mode, it is a bad idea. 
It was discussed previously (see meta-bug), probably sms will become less
accurate.  It's better to keep sms pass as closer as possible to register
allocation pass.

(3) As mentioned in meta-bug [Appendix 2] we probably have some latent issues
with entering cfg_layout.  Sms doesn't guard entering, while all other passes
which run after partitioning have some logic like this (hw-doloop.c):
  /* We can't enter cfglayout mode anymore if basic block partitioning
 already happened.  */
  if (do_reorder && !crtl->has_bb_partition)
  ...
SMS wants loop_optimizer_init, and it may call loop_version when succeeded, so
it actually can't proceed without entering cfg_layout.  If we add a guard like
that, we will stop sms to process any function where has_bb_partition is true. 
Mostly this have same problems as solution (1), but it will also silently
prioritize partitioning over sms instead of noting the user.

(4) We can add another pass_sms item in passes.def before pass_partition_blocks
and add some confusing logic: when paritioning is enabled, sms will run
earlier. Certainly, it will run as usual otherwise.

(5) Approach (4) can be extended if we somehow split pass_partition_blocks into
separate 'analyze' and 'change_cfg' phases. Than "earlier sms" should take
effect only when 'analyze' is successfull.  And certainly as "earlier sms" will
itself change the cfg, we than have to re-run 'analyze' again.

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-01-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-01-16 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-01-16
 CC||hubicka at gcc dot gnu.org,
   ||jakub at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Jakub Jelinek  ---
It tries to redirect a crossing jump from cold to hot partition to a new bb
that is in cold partition.
cfg_layout_redirect_edge_and_branch has code to handle this case, but only if
the 
branch is simplejump:
4429  if (e->flags & EDGE_CROSSING
4430  && BB_PARTITION (e->src) == BB_PARTITION (dest)
4431  && simplejump_p (BB_END (src)))
4432{
4433  if (dump_file)
4434fprintf (dump_file,
4435 "Removing crossing jump while redirecting edge form %i
to %i\n",
4436 e->src->index, dest->index);
4437  delete_insn (BB_END (src));
4438  remove_barriers_from_footer (src);
4439  e->flags |= EDGE_FALLTHRU;
4440}
which is not the case here:
(insn 577 350 578 27 (set (reg:DI 309)
(high:DI (label_ref:DI 220))) "pr71550.c":19:20 -1
 (insn_list:REG_LABEL_OPERAND 220 (nil)))
(insn 578 577 579 27 (set (reg:DI 308)
(lo_sum:DI (reg:DI 309)
(label_ref:DI 220))) "pr71550.c":19:20 -1
 (insn_list:REG_LABEL_OPERAND 220 (expr_list:REG_EQUAL (label_ref:DI 220)
(nil
(jump_insn/j 579 578 220 27 (set (pc)
(reg:DI 308)) "pr71550.c":19:20 -1
 (nil)
 -> 220)
The generic code really doesn't know what it should remove to replace it with
the fallthru edge.
So, I'd say something in the caller or a few callers up should have punted on
it before, but no idea what.

[Bug rtl-optimization/93264] [10 Regression] ICE in cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4522

2020-01-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.0