Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-23 Thread Kewen.Lin
on 2023/11/23 16:20, Richard Biener wrote:
> On Thu, Nov 23, 2023 at 4:02 AM Kewen.Lin  wrote:
>>
>> on 2023/11/22 18:25, Richard Biener wrote:
>>> On Wed, Nov 22, 2023 at 10:31 AM Kewen.Lin  wrote:

 on 2023/11/17 20:55, Alexander Monakov wrote:
>
> On Fri, 17 Nov 2023, Kewen.Lin wrote:
>>> I don't think you can run cleanup_cfg after sched_init. I would suggest
>>> to put it early in schedule_insns.
>>
>> Thanks for the suggestion, I placed it at the beginning of 
>> haifa_sched_init
>> instead, since schedule_insns invokes haifa_sched_init, although the
>> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
>> ahead but they are all "setup" functions, shouldn't affect or be affected
>> by this placement.
>
> I was worried because sched_init invokes df_analyze, and I'm not sure if
> cfg_cleanup can invalidate it.

 Thanks for further explaining!  By scanning cleanup_cfg, it seems that it
 considers df, like compact_blocks checks df, try_optimize_cfg invokes
 df_analyze etc., but I agree that moving cleanup_cfg before sched_init
 makes more sense.

>
>>> I suspect this may be caused by invoking cleanup_cfg too late.
>>
>> By looking into some failures, I found that although cleanup_cfg is 
>> executed
>> there would be still some empty blocks left, by analyzing a few failures 
>> there
>> are at least such cases:
>>   1. empty function body
>>   2. block holding a label for return.
>>   3. block without any successor.
>>   4. block which becomes empty after scheduling some other block.
>>   5. block which looks mergeable with its always successor but left.
>>   ...
>>
>> For 1,2, there is one single successor EXIT block, I think they don't 
>> affect
>> state transition, for 3, it's the same.  For 4, it depends on if we can 
>> have
>> the assumption this kind of empty block doesn't have the chance to have 
>> debug
>> insn (like associated debug insn should be moved along), I'm not sure.  
>> For 5,
>> a reduced test case is:
>
> Oh, I should have thought of cases like these, really sorry about the slip
> of attention, and thanks for showing a testcase for item 5. As Richard as
> saying in his response, cfg_cleanup cannot be a fix here. The thing to 
> check
> would be changing no_real_insns_p to always return false, and see if the
> situation looks recoverable (if it breaks bootstrap, regtest statistics of
> a non-bootstrapped compiler are still informative).

 As you suggested, I forced no_real_insns_p to return false all the time, 
 some
 issues got exposed, almost all of them are asserting NOTE_P insn shouldn't 
 be
 encountered in those places, so the adjustments for most of them are just 
 to
 consider NOTE_P or this kind of special block and so on.  One draft patch 
 is
 attached, it can be bootstrapped and regress-tested on ppc64{,le} and x86.
 btw, it's without the previous cfg_cleanup adjustment (hope it can get more
 empty blocks and expose more issues).  The draft isn't qualified for code
 review but I hope it can provide some information on what kinds of changes
 are needed for the proposal.  If this is the direction which we all agree 
 on,
 I'll further refine it and post a formal patch.  One thing I want to note 
 is
 that this patch disable one assertion below:

 diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
 index e5964f54ead..abd334864fb 100644
 --- a/gcc/sched-rgn.cc
 +++ b/gcc/sched-rgn.cc
 @@ -3219,7 +3219,7 @@ schedule_region (int rgn)
  }

/* Sanity check: verify that all region insns were scheduled.  */
 -  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
 +  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);

sched_finish_ready_list ();

 Some cases can cause this assertion to fail, it's due to the mismatch on
 to-be-scheduled and scheduled insn counts.  The reason why it happens is 
 that
 one block previously has only one INSN_P but while scheduling some other 
 blocks
 it gets moved as well then we ends up with an empty block so that the only
 NOTE_P insn was counted then, but since this block isn't empty initially 
 and
 NOTE_P gets skipped in a normal block, the count to-be-scheduled can't 
 count
 it in.  It can be fixed with special-casing this kind of block for counting
 like initially recording which block is empty and if a block isn't recorded
 before then fix up the count for it accordingly.  I'm not sure if someone 
 may
 have an argument that all the complication make this proposal beaten by
 previous special-casing debug insn approach, looking forward to more 
 comments.
>>>
>>> Just a comment that the NOTE_P thing is odd - do 

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-23 Thread Richard Biener
On Thu, Nov 23, 2023 at 4:02 AM Kewen.Lin  wrote:
>
> on 2023/11/22 18:25, Richard Biener wrote:
> > On Wed, Nov 22, 2023 at 10:31 AM Kewen.Lin  wrote:
> >>
> >> on 2023/11/17 20:55, Alexander Monakov wrote:
> >>>
> >>> On Fri, 17 Nov 2023, Kewen.Lin wrote:
> > I don't think you can run cleanup_cfg after sched_init. I would suggest
> > to put it early in schedule_insns.
> 
>  Thanks for the suggestion, I placed it at the beginning of 
>  haifa_sched_init
>  instead, since schedule_insns invokes haifa_sched_init, although the
>  calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
>  ahead but they are all "setup" functions, shouldn't affect or be affected
>  by this placement.
> >>>
> >>> I was worried because sched_init invokes df_analyze, and I'm not sure if
> >>> cfg_cleanup can invalidate it.
> >>
> >> Thanks for further explaining!  By scanning cleanup_cfg, it seems that it
> >> considers df, like compact_blocks checks df, try_optimize_cfg invokes
> >> df_analyze etc., but I agree that moving cleanup_cfg before sched_init
> >> makes more sense.
> >>
> >>>
> > I suspect this may be caused by invoking cleanup_cfg too late.
> 
>  By looking into some failures, I found that although cleanup_cfg is 
>  executed
>  there would be still some empty blocks left, by analyzing a few failures 
>  there
>  are at least such cases:
>    1. empty function body
>    2. block holding a label for return.
>    3. block without any successor.
>    4. block which becomes empty after scheduling some other block.
>    5. block which looks mergeable with its always successor but left.
>    ...
> 
>  For 1,2, there is one single successor EXIT block, I think they don't 
>  affect
>  state transition, for 3, it's the same.  For 4, it depends on if we can 
>  have
>  the assumption this kind of empty block doesn't have the chance to have 
>  debug
>  insn (like associated debug insn should be moved along), I'm not sure.  
>  For 5,
>  a reduced test case is:
> >>>
> >>> Oh, I should have thought of cases like these, really sorry about the slip
> >>> of attention, and thanks for showing a testcase for item 5. As Richard as
> >>> saying in his response, cfg_cleanup cannot be a fix here. The thing to 
> >>> check
> >>> would be changing no_real_insns_p to always return false, and see if the
> >>> situation looks recoverable (if it breaks bootstrap, regtest statistics of
> >>> a non-bootstrapped compiler are still informative).
> >>
> >> As you suggested, I forced no_real_insns_p to return false all the time, 
> >> some
> >> issues got exposed, almost all of them are asserting NOTE_P insn shouldn't 
> >> be
> >> encountered in those places, so the adjustments for most of them are just 
> >> to
> >> consider NOTE_P or this kind of special block and so on.  One draft patch 
> >> is
> >> attached, it can be bootstrapped and regress-tested on ppc64{,le} and x86.
> >> btw, it's without the previous cfg_cleanup adjustment (hope it can get more
> >> empty blocks and expose more issues).  The draft isn't qualified for code
> >> review but I hope it can provide some information on what kinds of changes
> >> are needed for the proposal.  If this is the direction which we all agree 
> >> on,
> >> I'll further refine it and post a formal patch.  One thing I want to note 
> >> is
> >> that this patch disable one assertion below:
> >>
> >> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> >> index e5964f54ead..abd334864fb 100644
> >> --- a/gcc/sched-rgn.cc
> >> +++ b/gcc/sched-rgn.cc
> >> @@ -3219,7 +3219,7 @@ schedule_region (int rgn)
> >>  }
> >>
> >>/* Sanity check: verify that all region insns were scheduled.  */
> >> -  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> >> +  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> >>
> >>sched_finish_ready_list ();
> >>
> >> Some cases can cause this assertion to fail, it's due to the mismatch on
> >> to-be-scheduled and scheduled insn counts.  The reason why it happens is 
> >> that
> >> one block previously has only one INSN_P but while scheduling some other 
> >> blocks
> >> it gets moved as well then we ends up with an empty block so that the only
> >> NOTE_P insn was counted then, but since this block isn't empty initially 
> >> and
> >> NOTE_P gets skipped in a normal block, the count to-be-scheduled can't 
> >> count
> >> it in.  It can be fixed with special-casing this kind of block for counting
> >> like initially recording which block is empty and if a block isn't recorded
> >> before then fix up the count for it accordingly.  I'm not sure if someone 
> >> may
> >> have an argument that all the complication make this proposal beaten by
> >> previous special-casing debug insn approach, looking forward to more 
> >> comments.
> >
> > Just a comment that the NOTE_P thing is odd - do we only ever have those for
> > otherwise empty 

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-22 Thread Kewen.Lin
on 2023/11/22 18:25, Richard Biener wrote:
> On Wed, Nov 22, 2023 at 10:31 AM Kewen.Lin  wrote:
>>
>> on 2023/11/17 20:55, Alexander Monakov wrote:
>>>
>>> On Fri, 17 Nov 2023, Kewen.Lin wrote:
> I don't think you can run cleanup_cfg after sched_init. I would suggest
> to put it early in schedule_insns.

 Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
 instead, since schedule_insns invokes haifa_sched_init, although the
 calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
 ahead but they are all "setup" functions, shouldn't affect or be affected
 by this placement.
>>>
>>> I was worried because sched_init invokes df_analyze, and I'm not sure if
>>> cfg_cleanup can invalidate it.
>>
>> Thanks for further explaining!  By scanning cleanup_cfg, it seems that it
>> considers df, like compact_blocks checks df, try_optimize_cfg invokes
>> df_analyze etc., but I agree that moving cleanup_cfg before sched_init
>> makes more sense.
>>
>>>
> I suspect this may be caused by invoking cleanup_cfg too late.

 By looking into some failures, I found that although cleanup_cfg is 
 executed
 there would be still some empty blocks left, by analyzing a few failures 
 there
 are at least such cases:
   1. empty function body
   2. block holding a label for return.
   3. block without any successor.
   4. block which becomes empty after scheduling some other block.
   5. block which looks mergeable with its always successor but left.
   ...

 For 1,2, there is one single successor EXIT block, I think they don't 
 affect
 state transition, for 3, it's the same.  For 4, it depends on if we can 
 have
 the assumption this kind of empty block doesn't have the chance to have 
 debug
 insn (like associated debug insn should be moved along), I'm not sure.  
 For 5,
 a reduced test case is:
>>>
>>> Oh, I should have thought of cases like these, really sorry about the slip
>>> of attention, and thanks for showing a testcase for item 5. As Richard as
>>> saying in his response, cfg_cleanup cannot be a fix here. The thing to check
>>> would be changing no_real_insns_p to always return false, and see if the
>>> situation looks recoverable (if it breaks bootstrap, regtest statistics of
>>> a non-bootstrapped compiler are still informative).
>>
>> As you suggested, I forced no_real_insns_p to return false all the time, some
>> issues got exposed, almost all of them are asserting NOTE_P insn shouldn't be
>> encountered in those places, so the adjustments for most of them are just to
>> consider NOTE_P or this kind of special block and so on.  One draft patch is
>> attached, it can be bootstrapped and regress-tested on ppc64{,le} and x86.
>> btw, it's without the previous cfg_cleanup adjustment (hope it can get more
>> empty blocks and expose more issues).  The draft isn't qualified for code
>> review but I hope it can provide some information on what kinds of changes
>> are needed for the proposal.  If this is the direction which we all agree on,
>> I'll further refine it and post a formal patch.  One thing I want to note is
>> that this patch disable one assertion below:
>>
>> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
>> index e5964f54ead..abd334864fb 100644
>> --- a/gcc/sched-rgn.cc
>> +++ b/gcc/sched-rgn.cc
>> @@ -3219,7 +3219,7 @@ schedule_region (int rgn)
>>  }
>>
>>/* Sanity check: verify that all region insns were scheduled.  */
>> -  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
>> +  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);
>>
>>sched_finish_ready_list ();
>>
>> Some cases can cause this assertion to fail, it's due to the mismatch on
>> to-be-scheduled and scheduled insn counts.  The reason why it happens is that
>> one block previously has only one INSN_P but while scheduling some other 
>> blocks
>> it gets moved as well then we ends up with an empty block so that the only
>> NOTE_P insn was counted then, but since this block isn't empty initially and
>> NOTE_P gets skipped in a normal block, the count to-be-scheduled can't count
>> it in.  It can be fixed with special-casing this kind of block for counting
>> like initially recording which block is empty and if a block isn't recorded
>> before then fix up the count for it accordingly.  I'm not sure if someone may
>> have an argument that all the complication make this proposal beaten by
>> previous special-casing debug insn approach, looking forward to more 
>> comments.
> 
> Just a comment that the NOTE_P thing is odd - do we only ever have those for
> otherwise empty BBs?  How are they skipped otherwise (and why does that not
> work for otherwise empty BBs)?

Yes, previously (bypassing empty BBs) there is no chance to encounter NOTE_P
when scheduling insns, as for notes in normal BBs, when setting up the head
and tail, some are skipped (like get_ebb_head_tail), and there are also 

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-22 Thread Richard Biener
On Wed, Nov 22, 2023 at 10:31 AM Kewen.Lin  wrote:
>
> on 2023/11/17 20:55, Alexander Monakov wrote:
> >
> > On Fri, 17 Nov 2023, Kewen.Lin wrote:
> >>> I don't think you can run cleanup_cfg after sched_init. I would suggest
> >>> to put it early in schedule_insns.
> >>
> >> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
> >> instead, since schedule_insns invokes haifa_sched_init, although the
> >> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
> >> ahead but they are all "setup" functions, shouldn't affect or be affected
> >> by this placement.
> >
> > I was worried because sched_init invokes df_analyze, and I'm not sure if
> > cfg_cleanup can invalidate it.
>
> Thanks for further explaining!  By scanning cleanup_cfg, it seems that it
> considers df, like compact_blocks checks df, try_optimize_cfg invokes
> df_analyze etc., but I agree that moving cleanup_cfg before sched_init
> makes more sense.
>
> >
> >>> I suspect this may be caused by invoking cleanup_cfg too late.
> >>
> >> By looking into some failures, I found that although cleanup_cfg is 
> >> executed
> >> there would be still some empty blocks left, by analyzing a few failures 
> >> there
> >> are at least such cases:
> >>   1. empty function body
> >>   2. block holding a label for return.
> >>   3. block without any successor.
> >>   4. block which becomes empty after scheduling some other block.
> >>   5. block which looks mergeable with its always successor but left.
> >>   ...
> >>
> >> For 1,2, there is one single successor EXIT block, I think they don't 
> >> affect
> >> state transition, for 3, it's the same.  For 4, it depends on if we can 
> >> have
> >> the assumption this kind of empty block doesn't have the chance to have 
> >> debug
> >> insn (like associated debug insn should be moved along), I'm not sure.  
> >> For 5,
> >> a reduced test case is:
> >
> > Oh, I should have thought of cases like these, really sorry about the slip
> > of attention, and thanks for showing a testcase for item 5. As Richard as
> > saying in his response, cfg_cleanup cannot be a fix here. The thing to check
> > would be changing no_real_insns_p to always return false, and see if the
> > situation looks recoverable (if it breaks bootstrap, regtest statistics of
> > a non-bootstrapped compiler are still informative).
>
> As you suggested, I forced no_real_insns_p to return false all the time, some
> issues got exposed, almost all of them are asserting NOTE_P insn shouldn't be
> encountered in those places, so the adjustments for most of them are just to
> consider NOTE_P or this kind of special block and so on.  One draft patch is
> attached, it can be bootstrapped and regress-tested on ppc64{,le} and x86.
> btw, it's without the previous cfg_cleanup adjustment (hope it can get more
> empty blocks and expose more issues).  The draft isn't qualified for code
> review but I hope it can provide some information on what kinds of changes
> are needed for the proposal.  If this is the direction which we all agree on,
> I'll further refine it and post a formal patch.  One thing I want to note is
> that this patch disable one assertion below:
>
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index e5964f54ead..abd334864fb 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -3219,7 +3219,7 @@ schedule_region (int rgn)
>  }
>
>/* Sanity check: verify that all region insns were scheduled.  */
> -  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> +  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);
>
>sched_finish_ready_list ();
>
> Some cases can cause this assertion to fail, it's due to the mismatch on
> to-be-scheduled and scheduled insn counts.  The reason why it happens is that
> one block previously has only one INSN_P but while scheduling some other 
> blocks
> it gets moved as well then we ends up with an empty block so that the only
> NOTE_P insn was counted then, but since this block isn't empty initially and
> NOTE_P gets skipped in a normal block, the count to-be-scheduled can't count
> it in.  It can be fixed with special-casing this kind of block for counting
> like initially recording which block is empty and if a block isn't recorded
> before then fix up the count for it accordingly.  I'm not sure if someone may
> have an argument that all the complication make this proposal beaten by
> previous special-casing debug insn approach, looking forward to more comments.

Just a comment that the NOTE_P thing is odd - do we only ever have those for
otherwise empty BBs?  How are they skipped otherwise (and why does that not
work for otherwise empty BBs)?

Richard.

> BR,
> Kewen
>


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-22 Thread Kewen.Lin
on 2023/11/17 20:55, Alexander Monakov wrote:
> 
> On Fri, 17 Nov 2023, Kewen.Lin wrote:
>>> I don't think you can run cleanup_cfg after sched_init. I would suggest
>>> to put it early in schedule_insns.
>>
>> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
>> instead, since schedule_insns invokes haifa_sched_init, although the
>> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
>> ahead but they are all "setup" functions, shouldn't affect or be affected
>> by this placement.
> 
> I was worried because sched_init invokes df_analyze, and I'm not sure if
> cfg_cleanup can invalidate it.

Thanks for further explaining!  By scanning cleanup_cfg, it seems that it
considers df, like compact_blocks checks df, try_optimize_cfg invokes
df_analyze etc., but I agree that moving cleanup_cfg before sched_init
makes more sense.

> 
>>> I suspect this may be caused by invoking cleanup_cfg too late.
>>
>> By looking into some failures, I found that although cleanup_cfg is executed
>> there would be still some empty blocks left, by analyzing a few failures 
>> there
>> are at least such cases:
>>   1. empty function body
>>   2. block holding a label for return.
>>   3. block without any successor.
>>   4. block which becomes empty after scheduling some other block.
>>   5. block which looks mergeable with its always successor but left.
>>   ...
>>
>> For 1,2, there is one single successor EXIT block, I think they don't affect
>> state transition, for 3, it's the same.  For 4, it depends on if we can have
>> the assumption this kind of empty block doesn't have the chance to have debug
>> insn (like associated debug insn should be moved along), I'm not sure.  For 
>> 5,
>> a reduced test case is:
> 
> Oh, I should have thought of cases like these, really sorry about the slip
> of attention, and thanks for showing a testcase for item 5. As Richard as
> saying in his response, cfg_cleanup cannot be a fix here. The thing to check
> would be changing no_real_insns_p to always return false, and see if the
> situation looks recoverable (if it breaks bootstrap, regtest statistics of
> a non-bootstrapped compiler are still informative).

As you suggested, I forced no_real_insns_p to return false all the time, some
issues got exposed, almost all of them are asserting NOTE_P insn shouldn't be
encountered in those places, so the adjustments for most of them are just to
consider NOTE_P or this kind of special block and so on.  One draft patch is
attached, it can be bootstrapped and regress-tested on ppc64{,le} and x86.
btw, it's without the previous cfg_cleanup adjustment (hope it can get more
empty blocks and expose more issues).  The draft isn't qualified for code
review but I hope it can provide some information on what kinds of changes
are needed for the proposal.  If this is the direction which we all agree on,
I'll further refine it and post a formal patch.  One thing I want to note is
that this patch disable one assertion below:

diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index e5964f54ead..abd334864fb 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -3219,7 +3219,7 @@ schedule_region (int rgn)
 }

   /* Sanity check: verify that all region insns were scheduled.  */
-  gcc_assert (sched_rgn_n_insns == rgn_n_insns);
+  // gcc_assert (sched_rgn_n_insns == rgn_n_insns);

   sched_finish_ready_list ();

Some cases can cause this assertion to fail, it's due to the mismatch on
to-be-scheduled and scheduled insn counts.  The reason why it happens is that
one block previously has only one INSN_P but while scheduling some other blocks
it gets moved as well then we ends up with an empty block so that the only
NOTE_P insn was counted then, but since this block isn't empty initially and
NOTE_P gets skipped in a normal block, the count to-be-scheduled can't count
it in.  It can be fixed with special-casing this kind of block for counting
like initially recording which block is empty and if a block isn't recorded
before then fix up the count for it accordingly.  I'm not sure if someone may
have an argument that all the complication make this proposal beaten by
previous special-casing debug insn approach, looking forward to more comments.

BR,
Kewen

From d350f411b23f6064a33a72a6ca7afc49b0ccea65 Mon Sep 17 00:00:00 2001
From: Kewen Lin 
Date: Wed, 22 Nov 2023 00:08:59 -0600
Subject: [PATCH] sched: Don't skip empty block in scheduling

att
---
 gcc/haifa-sched.cc | 166 +++--
 gcc/rtl.h  |   4 +-
 gcc/sched-rgn.cc   |   2 +-
 3 files changed, 103 insertions(+), 69 deletions(-)

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 8e8add709b3..62377d99162 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -1207,6 +1207,11 @@ recompute_todo_spec (rtx_insn *next, bool for_backtrack)
   int n_replace = 0;
   bool first_p = true;
 
+  /* We don't skip no_real_insns_p any more, so it's possible to
+ meet NOTE insn 

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-17 Thread Alexander Monakov


On Fri, 17 Nov 2023, Kewen.Lin wrote:
> > I don't think you can run cleanup_cfg after sched_init. I would suggest
> > to put it early in schedule_insns.
> 
> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
> instead, since schedule_insns invokes haifa_sched_init, although the
> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
> ahead but they are all "setup" functions, shouldn't affect or be affected
> by this placement.

I was worried because sched_init invokes df_analyze, and I'm not sure if
cfg_cleanup can invalidate it.

> > I suspect this may be caused by invoking cleanup_cfg too late.
> 
> By looking into some failures, I found that although cleanup_cfg is executed
> there would be still some empty blocks left, by analyzing a few failures there
> are at least such cases:
>   1. empty function body
>   2. block holding a label for return.
>   3. block without any successor.
>   4. block which becomes empty after scheduling some other block.
>   5. block which looks mergeable with its always successor but left.
>   ...
> 
> For 1,2, there is one single successor EXIT block, I think they don't affect
> state transition, for 3, it's the same.  For 4, it depends on if we can have
> the assumption this kind of empty block doesn't have the chance to have debug
> insn (like associated debug insn should be moved along), I'm not sure.  For 5,
> a reduced test case is:

Oh, I should have thought of cases like these, really sorry about the slip
of attention, and thanks for showing a testcase for item 5. As Richard as
saying in his response, cfg_cleanup cannot be a fix here. The thing to check
would be changing no_real_insns_p to always return false, and see if the
situation looks recoverable (if it breaks bootstrap, regtest statistics of
a non-bootstrapped compiler are still informative).

Alexander


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-17 Thread Richard Biener
On Fri, Nov 17, 2023 at 10:04 AM Kewen.Lin  wrote:
>
> on 2023/11/15 17:43, Alexander Monakov wrote:
> >
> > On Wed, 15 Nov 2023, Kewen.Lin wrote:
> >
>  And I suppose it would be OK to do that.  Empty BBs are usually removed 
>  by
>  CFG cleanup so the situation should only happen in rare corner cases 
>  where
>  the fix would be to actually run CFG cleanup ...
> >>>
> >>> Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
> >>> may be a preferable compromise for sched-rgn as well.
> >>
> >> Inspired by this discussion, I tested the attached patch 1 which is to run
> >> cleanup_cfg (0) first in haifa_sched_init, it's bootstrapped and
> >> regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.
> >
> > I don't think you can run cleanup_cfg after sched_init. I would suggest
> > to put it early in schedule_insns.
>
> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
> instead, since schedule_insns invokes haifa_sched_init, although the
> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
> ahead but they are all "setup" functions, shouldn't affect or be affected
> by this placement.
>
> >
> >> Then I assumed some of the current uses of no_real_insns_p won't encounter
> >> empty blocks any more, so made a patch 2 with some explicit assertions, but
> >> unfortunately I got ICEs during bootstrapping happens in function
> >> compute_priorities.  I'm going to investigate it further and post more
> >> findings, but just heads-up to ensure if this is on the right track.
> >
> > I suspect this may be caused by invoking cleanup_cfg too late.
>
> By looking into some failures, I found that although cleanup_cfg is executed
> there would be still some empty blocks left, by analyzing a few failures there
> are at least such cases:
>   1. empty function body
>   2. block holding a label for return.
>   3. block without any successor.
>   4. block which becomes empty after scheduling some other block.
>   5. block which looks mergeable with its always successor but left.
>   ...
>
> For 1,2, there is one single successor EXIT block, I think they don't affect
> state transition, for 3, it's the same.  For 4, it depends on if we can have
> the assumption this kind of empty block doesn't have the chance to have debug
> insn (like associated debug insn should be moved along), I'm not sure.  For 5,
> a reduced test case is:
>
> #include 
> namespace std {
> template <>
> basic_istream &
> basic_istream::getline (char_type *, streamsize, char_type) __try
> {
>   __streambuf_type *a = rdbuf ();
>   a->sgetc ();
>   while (traits_type::eq_int_type)
> ;
> }
> __catch (...) {}
> } // namespace std
>
> slim RTL:
>
> 1: NOTE_INSN_DELETED
> 7: NOTE_INSN_BASIC_BLOCK 2
> ...
>15: r137:CCUNS=cmp(r135:DI,r136:DI)
>   REG_DEAD r136:DI
>   REG_DEAD r135:DI
>16: pc={(ltu(r137:CCUNS,0))?L24:pc}
>   REG_DEAD r137:CCUNS
>   REG_BR_PROB 966367644
>17: NOTE_INSN_BASIC_BLOCK 3
>18: r138:DI=[r122:DI]
>19: r139:DI=[r138:DI+0x48]
>   REG_DEAD r138:DI
>20: %3:DI=r122:DI
>   REG_DEAD r122:DI
>21: %12:DI=r139:DI
>22: ctr:DI=r139:DI
>   REG_DEAD r139:DI
>23: %3:DI=call [ctr:DI] argc:0
>   REG_DEAD ctr:DI
>   REG_DEAD %12:DI
>   REG_UNUSED %3:DI
>   REG_EH_REGION 0x1
>   REG_CALL_DECL (nil)
>24: L24:
>25: NOTE_INSN_BASIC_BLOCK 4
>51: L51:
>48: NOTE_INSN_BASIC_BLOCK 5
>47: NOTE_INSN_DELETED
>52: pc=L51
>53: barrier
>39: L39:
>41: NOTE_INSN_BASIC_BLOCK 6
>32: %3:DI=call [`__cxa_begin_catch'] argc:0
>   REG_UNUSED %3:DI
>   REG_CALL_DECL `__cxa_begin_catch'
>   REG_EH_REGION 0
>33: call [`__cxa_end_catch'] argc:0
>   REG_CALL_DECL `__cxa_end_catch'
>34: barrier
>
> ;; basic block 4, loop depth 0, count 10631108 (estimated locally, freq 
> 1.), maybe hot
> ;;  prev block 3, next block 5, flags: (REACHABLE, RTL)
> ;;  pred:   3 [always]  count:1063111 (estimated locally, freq 0.1000) 
> (FALLTHRU)
> ;;  2 [90.0% (guessed)]  count:9567997 (estimated locally, freq 
> 0.9000)
> ;;  succ:   5 [always]  count:10631108 (estimated locally, freq 1.) 
> (FALLTHRU)
> ;; basic block 5, loop depth 0, count 1073741824 (estimated locally, freq 
> 101.), maybe hot
> ;;  prev block 4, next block 6, flags: (REACHABLE, RTL, MODIFIED)
> ;;  pred:   4 [always]  count:10631108 (estimated locally, freq 1.) 
> (FALLTHRU)
> ;;  5 [always]  count:1073741824 (estimated locally, freq 
> 101.)
> ;;  succ:   5 [always]  count:1073741824 (estimated locally, freq 
> 101.)
>
> It looks bb 4 can be merged with bb 5, a miss-opt?  There can be some other 
> cases
> similar to this, either it's miss-opt or not, it seems to affect our 
> assumption.
>
> With the limited findings above so far, I wonder if we still want to go with 
> this
> direction that running cleanup_cfg 

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-17 Thread Kewen.Lin
on 2023/11/15 17:43, Alexander Monakov wrote:
> 
> On Wed, 15 Nov 2023, Kewen.Lin wrote:
> 
 And I suppose it would be OK to do that.  Empty BBs are usually removed by
 CFG cleanup so the situation should only happen in rare corner cases where
 the fix would be to actually run CFG cleanup ...
>>>
>>> Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
>>> may be a preferable compromise for sched-rgn as well.
>>
>> Inspired by this discussion, I tested the attached patch 1 which is to run
>> cleanup_cfg (0) first in haifa_sched_init, it's bootstrapped and
>> regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.
> 
> I don't think you can run cleanup_cfg after sched_init. I would suggest
> to put it early in schedule_insns.

Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
instead, since schedule_insns invokes haifa_sched_init, although the
calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
ahead but they are all "setup" functions, shouldn't affect or be affected
by this placement.

> 
>> Then I assumed some of the current uses of no_real_insns_p won't encounter
>> empty blocks any more, so made a patch 2 with some explicit assertions, but
>> unfortunately I got ICEs during bootstrapping happens in function
>> compute_priorities.  I'm going to investigate it further and post more
>> findings, but just heads-up to ensure if this is on the right track.
> 
> I suspect this may be caused by invoking cleanup_cfg too late.

By looking into some failures, I found that although cleanup_cfg is executed
there would be still some empty blocks left, by analyzing a few failures there
are at least such cases:
  1. empty function body
  2. block holding a label for return.
  3. block without any successor.
  4. block which becomes empty after scheduling some other block.
  5. block which looks mergeable with its always successor but left.
  ...

For 1,2, there is one single successor EXIT block, I think they don't affect
state transition, for 3, it's the same.  For 4, it depends on if we can have
the assumption this kind of empty block doesn't have the chance to have debug
insn (like associated debug insn should be moved along), I'm not sure.  For 5,
a reduced test case is:

#include 
namespace std {
template <>
basic_istream &
basic_istream::getline (char_type *, streamsize, char_type) __try
{
  __streambuf_type *a = rdbuf ();
  a->sgetc ();
  while (traits_type::eq_int_type)
;
}
__catch (...) {}
} // namespace std

slim RTL:

1: NOTE_INSN_DELETED
7: NOTE_INSN_BASIC_BLOCK 2
...
   15: r137:CCUNS=cmp(r135:DI,r136:DI)
  REG_DEAD r136:DI
  REG_DEAD r135:DI
   16: pc={(ltu(r137:CCUNS,0))?L24:pc}
  REG_DEAD r137:CCUNS
  REG_BR_PROB 966367644
   17: NOTE_INSN_BASIC_BLOCK 3
   18: r138:DI=[r122:DI]
   19: r139:DI=[r138:DI+0x48]
  REG_DEAD r138:DI
   20: %3:DI=r122:DI
  REG_DEAD r122:DI
   21: %12:DI=r139:DI
   22: ctr:DI=r139:DI
  REG_DEAD r139:DI
   23: %3:DI=call [ctr:DI] argc:0
  REG_DEAD ctr:DI
  REG_DEAD %12:DI
  REG_UNUSED %3:DI
  REG_EH_REGION 0x1
  REG_CALL_DECL (nil)
   24: L24:
   25: NOTE_INSN_BASIC_BLOCK 4
   51: L51:
   48: NOTE_INSN_BASIC_BLOCK 5
   47: NOTE_INSN_DELETED
   52: pc=L51
   53: barrier
   39: L39:
   41: NOTE_INSN_BASIC_BLOCK 6
   32: %3:DI=call [`__cxa_begin_catch'] argc:0
  REG_UNUSED %3:DI
  REG_CALL_DECL `__cxa_begin_catch'
  REG_EH_REGION 0
   33: call [`__cxa_end_catch'] argc:0
  REG_CALL_DECL `__cxa_end_catch'
   34: barrier

;; basic block 4, loop depth 0, count 10631108 (estimated locally, freq 
1.), maybe hot
;;  prev block 3, next block 5, flags: (REACHABLE, RTL)
;;  pred:   3 [always]  count:1063111 (estimated locally, freq 0.1000) 
(FALLTHRU)
;;  2 [90.0% (guessed)]  count:9567997 (estimated locally, freq 
0.9000)
;;  succ:   5 [always]  count:10631108 (estimated locally, freq 1.) 
(FALLTHRU)
;; basic block 5, loop depth 0, count 1073741824 (estimated locally, freq 
101.), maybe hot
;;  prev block 4, next block 6, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:   4 [always]  count:10631108 (estimated locally, freq 1.) 
(FALLTHRU)
;;  5 [always]  count:1073741824 (estimated locally, freq 101.)
;;  succ:   5 [always]  count:1073741824 (estimated locally, freq 101.)

It looks bb 4 can be merged with bb 5, a miss-opt?  There can be some other 
cases
similar to this, either it's miss-opt or not, it seems to affect our assumption.

With the limited findings above so far, I wonder if we still want to go with 
this
direction that running cleanup_cfg first and make the assumption that there is 
no
such empty block which can cause debug and non-debug state inconsistency?
IMHO it seems risky to make it.  Thoughts?

BR,
Kewen


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-15 Thread Alexander Monakov


On Wed, 15 Nov 2023, Kewen.Lin wrote:

> >> And I suppose it would be OK to do that.  Empty BBs are usually removed by
> >> CFG cleanup so the situation should only happen in rare corner cases where
> >> the fix would be to actually run CFG cleanup ...
> > 
> > Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
> > may be a preferable compromise for sched-rgn as well.
> 
> Inspired by this discussion, I tested the attached patch 1 which is to run
> cleanup_cfg (0) first in haifa_sched_init, it's bootstrapped and
> regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.

I don't think you can run cleanup_cfg after sched_init. I would suggest
to put it early in schedule_insns.

> Then I assumed some of the current uses of no_real_insns_p won't encounter
> empty blocks any more, so made a patch 2 with some explicit assertions, but
> unfortunately I got ICEs during bootstrapping happens in function
> compute_priorities.  I'm going to investigate it further and post more
> findings, but just heads-up to ensure if this is on the right track.

I suspect this may be caused by invoking cleanup_cfg too late.

Alexander


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-15 Thread Kewen.Lin
Hi Alexander/Richard/Jeff,

Thanks for the insightful comments!

on 2023/11/10 22:41, Alexander Monakov wrote:
> 
> On Fri, 10 Nov 2023, Richard Biener wrote:
> 
>> On Fri, Nov 10, 2023 at 3:18 PM Alexander Monakov  wrote:
>>>
>>>
>>> On Fri, 10 Nov 2023, Richard Biener wrote:
>>>
> I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking 
> design:
> DEBUG_INSNs participate in dependency graph so that schedulers can remove 
> or
> mutate them as needed when moving real insns across them.

 Note that debug-only BBs do not exist - the BB would be there even without 
 debug
 insns!
>>>
>>> Yep, sorry, I misspoke when I earlier said
>>>
> and cause divergence when passing through a debug-only BB which would not 
> be
> present at all without -g.
>>>
>>> They are present in the region, but skipped via no_real_insns_p.
>>>
 So instead you have to handle BBs with just debug insns the same you
 handle a completely empty BB.
>>>
>>> Yeah. There would be no problem if the scheduler never used no_real_insns_p
>>> and handled empty and non-empty BBs the same way.
>>
>> And I suppose it would be OK to do that.  Empty BBs are usually removed by
>> CFG cleanup so the situation should only happen in rare corner cases where
>> the fix would be to actually run CFG cleanup ...
> 
> Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
> may be a preferable compromise for sched-rgn as well.
> 

Inspired by this discussion, I tested the attached patch 1 which is to run
cleanup_cfg (0) first in haifa_sched_init, it's bootstrapped and
regress-tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu.

Then I assumed some of the current uses of no_real_insns_p won't encounter
empty blocks any more, so made a patch 2 with some explicit assertions, but
unfortunately I got ICEs during bootstrapping happens in function
compute_priorities.  I'm going to investigate it further and post more
findings, but just heads-up to ensure if this is on the right track.

BR,
Kewen

From 7652655f278cfe0f6271c50aecb56e68e0877cc2 Mon Sep 17 00:00:00 2001
From: Kewen Lin 
Date: Tue, 14 Nov 2023 15:16:47 +0800
Subject: [PATCH] sched: cleanup cfg for empty blocks first in haifa_sched_init
 [PR108273]

PR108273 exposed the inconsistent states issue between
non-debug mode and debug mode, as the discussion in [1],
we can follow the current practice in sel_global_init,
to run cleanup_cfg (0) first to remove empty blocks.

This patch is to follow this direction and remove empty
blocks by cleanup_cfg (0) in haifa_sched_init which
affects sms, ebb and rgn schedulings.

PR rtl-optimization/108273

gcc/ChangeLog:

* haifa-sched.cc (haifa_sched_init): Call cleanup_cfg (0) to remove
empty blocks.
---
 gcc/haifa-sched.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 8e8add709b3..e348d1a2119 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -7375,6 +7375,12 @@ haifa_sched_init (void)
   sched_deps_info->generate_spec_deps = 1;
 }
 
+  /* Remove empty blocks to avoid some inconsistency like: we skip
+ empty block in scheduling but don't for empty block + only
+ debug_insn, it could result in different subsequent states
+ and unexpected insn sequence difference.  */
+  cleanup_cfg (0);
+
   /* Initialize luids, dependency caches, target and h_i_d for the
  whole function.  */
   {
-- 
2.39.1

From efe1ed8fb5b151c9c4819ff1fa9af579151e259c Mon Sep 17 00:00:00 2001
From: Kewen Lin 
Date: Tue, 14 Nov 2023 15:39:24 +0800
Subject: [PATCH] sched: Assert we don't have any chance to get empty blocks

att.

gcc/ChangeLog:

* sched-ebb.cc (schedule_ebb): Assert no empty blocks.
* sched-rgn.cc (compute_priorities): Likewise.
(schedule_region): Likewise.
---
 gcc/sched-ebb.cc | 5 -
 gcc/sched-rgn.cc | 7 ++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/sched-ebb.cc b/gcc/sched-ebb.cc
index 110fcdbca4d..9fdfd72b6fc 100644
--- a/gcc/sched-ebb.cc
+++ b/gcc/sched-ebb.cc
@@ -492,7 +492,10 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool 
modulo_scheduling)
   last_bb = BLOCK_FOR_INSN (tail);
 
   if (no_real_insns_p (head, tail))
-return BLOCK_FOR_INSN (tail);
+{
+  gcc_unreachable ();
+  return BLOCK_FOR_INSN (tail);
+}
 
   gcc_assert (INSN_P (head) && INSN_P (tail));
 
diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
index 1c8acf5068a..795c455872e 100644
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -3025,7 +3025,10 @@ compute_priorities (void)
   get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), , );
 
   if (no_real_insns_p (head, tail))
-   continue;
+   {
+ gcc_unreachable ();
+ continue;
+   }
 
   rgn_n_insns += set_priorities (head, tail);
 }
@@ -3160,6 +3163,7 @@ schedule_region (int rgn)
 
  if (no_real_insns_p (head, tail))
   

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-10 Thread Alexander Monakov

On Fri, 10 Nov 2023, Richard Biener wrote:

> On Fri, Nov 10, 2023 at 3:18 PM Alexander Monakov  wrote:
> >
> >
> > On Fri, 10 Nov 2023, Richard Biener wrote:
> >
> > > > I'm afraid ignoring debug-only BBs goes contrary to overall 
> > > > var-tracking design:
> > > > DEBUG_INSNs participate in dependency graph so that schedulers can 
> > > > remove or
> > > > mutate them as needed when moving real insns across them.
> > >
> > > Note that debug-only BBs do not exist - the BB would be there even 
> > > without debug
> > > insns!
> >
> > Yep, sorry, I misspoke when I earlier said
> >
> > >> and cause divergence when passing through a debug-only BB which would 
> > >> not be
> > >> present at all without -g.
> >
> > They are present in the region, but skipped via no_real_insns_p.
> >
> > > So instead you have to handle BBs with just debug insns the same you
> > > handle a completely empty BB.
> >
> > Yeah. There would be no problem if the scheduler never used no_real_insns_p
> > and handled empty and non-empty BBs the same way.
> 
> And I suppose it would be OK to do that.  Empty BBs are usually removed by
> CFG cleanup so the situation should only happen in rare corner cases where
> the fix would be to actually run CFG cleanup ...

Yeah, sel-sched invokes 'cfg_cleanup (0)' up front, and I suppose that
may be a preferable compromise for sched-rgn as well.

I'm afraid one does not simply remove all uses of no_real_insns_p from
sched-rgn, but would be happy to be wrong about that.

Alexander

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-10 Thread Richard Biener
On Fri, Nov 10, 2023 at 3:18 PM Alexander Monakov  wrote:
>
>
> On Fri, 10 Nov 2023, Richard Biener wrote:
>
> > > I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking 
> > > design:
> > > DEBUG_INSNs participate in dependency graph so that schedulers can remove 
> > > or
> > > mutate them as needed when moving real insns across them.
> >
> > Note that debug-only BBs do not exist - the BB would be there even without 
> > debug
> > insns!
>
> Yep, sorry, I misspoke when I earlier said
>
> >> and cause divergence when passing through a debug-only BB which would not 
> >> be
> >> present at all without -g.
>
> They are present in the region, but skipped via no_real_insns_p.
>
> > So instead you have to handle BBs with just debug insns the same you
> > handle a completely empty BB.
>
> Yeah. There would be no problem if the scheduler never used no_real_insns_p
> and handled empty and non-empty BBs the same way.

And I suppose it would be OK to do that.  Empty BBs are usually removed by
CFG cleanup so the situation should only happen in rare corner cases where
the fix would be to actually run CFG cleanup ...

Richard.

> Alexander


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-10 Thread Alexander Monakov


On Fri, 10 Nov 2023, Richard Biener wrote:

> > I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking 
> > design:
> > DEBUG_INSNs participate in dependency graph so that schedulers can remove or
> > mutate them as needed when moving real insns across them.
> 
> Note that debug-only BBs do not exist - the BB would be there even without 
> debug
> insns!

Yep, sorry, I misspoke when I earlier said

>> and cause divergence when passing through a debug-only BB which would not be
>> present at all without -g.

They are present in the region, but skipped via no_real_insns_p.

> So instead you have to handle BBs with just debug insns the same you
> handle a completely empty BB.

Yeah. There would be no problem if the scheduler never used no_real_insns_p
and handled empty and non-empty BBs the same way.

Alexander


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-10 Thread Richard Biener
On Fri, Nov 10, 2023 at 12:25 PM Alexander Monakov  wrote:
>
>
> On Thu, 9 Nov 2023, Jeff Law wrote:
>
> > > Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as normal
> > > operations.  When I started to work on this issue, initially I wanted to 
> > > try
> > > something similar to your idea #2, but when checking the APIs, I realized
> > > why not just skip the basic block with NOTEs and LABELs, DEBUG_INSNs as
> > > well.  IMHO there is no value to try to schedule this kind of BB (to be
> > > scheduled range), skipping it can save some resource allocation (like 
> > > block
> > > dependencies) and make it more efficient (not enter function 
> > > schedule_block
> > > etc.), from this perspective it seems an enhancement.  Does it sound
> > > reasonable to you?
> > It sounds reasonable, but only if doing so doesn't add significant
> > implementation complexity.  ie, the gains from doing less work here are 
> > likely
> > to be very marginal, so I'm more interested in clean, easy to maintain code.
>
> I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking 
> design:
> DEBUG_INSNs participate in dependency graph so that schedulers can remove or
> mutate them as needed when moving real insns across them.

Note that debug-only BBs do not exist - the BB would be there even without debug
insns!  So instead you have to handle BBs with just debug insns the same you
handle a completely empty BB.

> Cc'ing Alexandre Oliva who can correct me on that if necessary.
>
> Alexander


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-10 Thread Alexander Monakov


On Thu, 9 Nov 2023, Jeff Law wrote:

> > Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as normal
> > operations.  When I started to work on this issue, initially I wanted to try
> > something similar to your idea #2, but when checking the APIs, I realized
> > why not just skip the basic block with NOTEs and LABELs, DEBUG_INSNs as
> > well.  IMHO there is no value to try to schedule this kind of BB (to be
> > scheduled range), skipping it can save some resource allocation (like block
> > dependencies) and make it more efficient (not enter function schedule_block
> > etc.), from this perspective it seems an enhancement.  Does it sound
> > reasonable to you?
> It sounds reasonable, but only if doing so doesn't add significant
> implementation complexity.  ie, the gains from doing less work here are likely
> to be very marginal, so I'm more interested in clean, easy to maintain code.

I'm afraid ignoring debug-only BBs goes contrary to overall var-tracking design:
DEBUG_INSNs participate in dependency graph so that schedulers can remove or
mutate them as needed when moving real insns across them.

Cc'ing Alexandre Oliva who can correct me on that if necessary.

Alexander


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-09 Thread Jeff Law




On 11/9/23 18:57, Kewen.Lin wrote:

Hi Maxim and Alexander,

Thanks a lot for the review comments!

on 2023/11/10 01:40, Alexander Monakov wrote:


On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:


Hi Kewen,

Below are my comments.  I don't want to override Alexander's review, and if
the patch looks good to him, it's fine to ignore my concerns.

My main concern is that this adds a new entity -- forceful skipping of
DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
in behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
already quite a bit of logic in the scheduler to skip them _as part of normal
operation_.


Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as normal 
operations.
When I started to work on this issue, initially I wanted to try something 
similar
to your idea #2, but when checking the APIs, I realized why not just skip the 
basic
block with NOTEs and LABELs, DEBUG_INSNs as well.  IMHO there is no value to 
try to
schedule this kind of BB (to be scheduled range), skipping it can save some 
resource
allocation (like block dependencies) and make it more efficient (not enter 
function
schedule_block etc.), from this perspective it seems an enhancement.  Does it 
sound
reasonable to you?
It sounds reasonable, but only if doing so doesn't add significant 
implementation complexity.  ie, the gains from doing less work here are 
likely to be very marginal, so I'm more interested in clean, easy to 
maintain code.


Jeff


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-09 Thread Jeff Law




On 11/9/23 10:40, Alexander Monakov wrote:


On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:


Hi Kewen,

Below are my comments.  I don't want to override Alexander's review, and if
the patch looks good to him, it's fine to ignore my concerns.

My main concern is that this adds a new entity -- forceful skipping of
DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
in behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
already quite a bit of logic in the scheduler to skip them _as part of normal
operation_.


I agree with the concern. I hoped that solving the problem by skipping the BB
like the (bit-rotted) debug code needs to would be a minor surgery. As things
look now, it may be better to remove the non-working sched_block debug counter
entirely and implement a good solution for the problem at hand.
I wouldn't lose sleep over this -- if removing a debug counter from this 
code simplifies the implementation, that's fine with me.  I believe they 
were all added when significant work in the scheduler was more common.





Can we teach haifa-sched to emit RTX NOTEs with hashes of DFA states on BB
boundaries with -fcompare-debug is enabled? It should make the problem
readily detectable by -fcompare-debug even when scheduling did not diverge.

I like it.

jeff


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-09 Thread Kewen.Lin
Hi Maxim and Alexander,

Thanks a lot for the review comments!

on 2023/11/10 01:40, Alexander Monakov wrote:
> 
> On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:
> 
>> Hi Kewen,
>>
>> Below are my comments.  I don't want to override Alexander's review, and if
>> the patch looks good to him, it's fine to ignore my concerns.
>>
>> My main concern is that this adds a new entity -- forceful skipping of
>> DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
>> in behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
>> already quite a bit of logic in the scheduler to skip them _as part of normal
>> operation_.

Yeah, I noticed that the scheduler takes care of DEBUG_INSNs as normal 
operations.
When I started to work on this issue, initially I wanted to try something 
similar
to your idea #2, but when checking the APIs, I realized why not just skip the 
basic
block with NOTEs and LABELs, DEBUG_INSNs as well.  IMHO there is no value to 
try to
schedule this kind of BB (to be scheduled range), skipping it can save some 
resource
allocation (like block dependencies) and make it more efficient (not enter 
function
schedule_block etc.), from this perspective it seems an enhancement.  Does it 
sound
reasonable to you?

> 
> I agree with the concern. I hoped that solving the problem by skipping the BB
> like the (bit-rotted) debug code needs to would be a minor surgery. As things
> look now, it may be better to remove the non-working sched_block debug counter
> entirely and implement a good solution for the problem at hand.

OK, if debug counter sched_block is useless and can be removed, then the 
proposed
new skipping becomes the only actual need for the artificial resolve_forw_deps.

> 
>>
>> Would you please consider 2 ideas below.
>>
>> #1:
>> After a brief look, I'm guessing this part is causing the problem:
>> haifa-sched.cc :schedule_block():
>> === [1]
>>   /* Loop until all the insns in BB are scheduled.  */
>>   while ((*current_sched_info->schedule_more_p) ())
>> {
>>   perform_replacements_new_cycle ();
>>   do
>>  {
>>start_clock_var = clock_var;
>>
>>clock_var++;
>>
>>advance_one_cycle ();
> 
> As I understand, we have spurious calls to advance_one_cycle on basic block
> boundaries, which don't model the hardware (the CPU doesn't see BB boundaries)
> and cause divergence when passing through a debug-only BB which would not be
> present at all without -g.
> 
> Since EBBs and regions may not have jump targets in the middle, advancing
> a cycle on BB boundaries does not seem well motivated. Can we remove it?
> 
> Can we teach haifa-sched to emit RTX NOTEs with hashes of DFA states on BB
> boundaries with -fcompare-debug is enabled? It should make the problem
> readily detectable by -fcompare-debug even when scheduling did not diverge.

Good idea!  It would be easy to detect the inconsistent issue with such note.

BR,
Kewen


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-09 Thread Alexander Monakov


On Thu, 9 Nov 2023, Maxim Kuvyrkov wrote:

> Hi Kewen,
> 
> Below are my comments.  I don't want to override Alexander's review, and if
> the patch looks good to him, it's fine to ignore my concerns.
> 
> My main concern is that this adds a new entity -- forceful skipping of
> DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change
> in behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is
> already quite a bit of logic in the scheduler to skip them _as part of normal
> operation_.

I agree with the concern. I hoped that solving the problem by skipping the BB
like the (bit-rotted) debug code needs to would be a minor surgery. As things
look now, it may be better to remove the non-working sched_block debug counter
entirely and implement a good solution for the problem at hand.

> 
> Would you please consider 2 ideas below.
> 
> #1:
> After a brief look, I'm guessing this part is causing the problem:
> haifa-sched.cc :schedule_block():
> === [1]
>   /* Loop until all the insns in BB are scheduled.  */
>   while ((*current_sched_info->schedule_more_p) ())
> {
>   perform_replacements_new_cycle ();
>   do
>   {
> start_clock_var = clock_var;
> 
> clock_var++;
> 
> advance_one_cycle ();

As I understand, we have spurious calls to advance_one_cycle on basic block
boundaries, which don't model the hardware (the CPU doesn't see BB boundaries)
and cause divergence when passing through a debug-only BB which would not be
present at all without -g.

Since EBBs and regions may not have jump targets in the middle, advancing
a cycle on BB boundaries does not seem well motivated. Can we remove it?

Can we teach haifa-sched to emit RTX NOTEs with hashes of DFA states on BB
boundaries with -fcompare-debug is enabled? It should make the problem
readily detectable by -fcompare-debug even when scheduling did not diverge.

Alexander


Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-09 Thread Maxim Kuvyrkov
Hi Kewen,

Below are my comments.  I don't want to override Alexander's review, and if the 
patch looks good to him, it's fine to ignore my concerns.

My main concern is that this adds a new entity -- forceful skipping of 
DEBUG_INSN-only basic blocks -- to the scheduler for a somewhat minor change in 
behavior.  Unlike NOTEs and LABELs, DEBUG_INSNs are INSNS, and there is already 
quite a bit of logic in the scheduler to skip them _as part of normal 
operation_.

Would you please consider 2 ideas below.

#1:
After a brief look, I'm guessing this part is causing the problem:
haifa-sched.cc :schedule_block():
=== [1]
  /* Loop until all the insns in BB are scheduled.  */
  while ((*current_sched_info->schedule_more_p) ())
{
  perform_replacements_new_cycle ();
  do
{
  start_clock_var = clock_var;

  clock_var++;

  advance_one_cycle ();
===

and then in the nested loop we have
=== [2]
  /* We don't want md sched reorder to even see debug isns, so put
 them out right away.  */
  if (ready.n_ready && DEBUG_INSN_P (ready_element (, 0))
  && (*current_sched_info->schedule_more_p) ())
{
  while (ready.n_ready && DEBUG_INSN_P (ready_element (, 0)))
{
  rtx_insn *insn = ready_remove_first ();
  gcc_assert (DEBUG_INSN_P (insn));
  (*current_sched_info->begin_schedule_ready) (insn);
  scheduled_insns.safe_push (insn);
  last_scheduled_insn = insn;
  advance = schedule_insn (insn);
  gcc_assert (advance == 0);
  if (ready.n_ready > 0)
ready_sort ();
}
}
===
.  At the [1] point we already have sorted ready list, and I don't see any 
blockers to doing [2] before calling advance_one_cycle().

#2
Another approach, which might be even easier, is to save the state of DFA 
before the initial advance_one_cycle(), and then restore it if no real insns 
have been scheduled.

Kind regards,

--
Maxim Kuvyrkov
https://www.linaro.org


> On Nov 8, 2023, at 06:49, Kewen.Lin  wrote:
> 
> Hi,
> 
> Gentle ping this:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html
> 
> BR,
> Kewen
> 
> on 2023/10/25 10:45, Kewen.Lin wrote:
>> Hi,
>> 
>> This is almost a repost for v2 which was posted at[1] in March
>> excepting for:
>>  1) rebased from r14-4810 which is relatively up-to-date,
>> some conflicts on "int to bool" return type change have
>> been resolved;
>>  2) adjust commit log a bit;
>>  3) fix misspelled "articial" with "artificial" somewhere;
>> 
>> --
>> *v2 comments*:
>> 
>> By addressing Alexander's comments, against v1 this
>> patch v2 mainly:
>> 
>>  - Rename no_real_insns_p to no_real_nondebug_insns_p;
>>  - Introduce enum rgn_bb_deps_free_action for three
>>kinds of actions to free deps;
>>  - Change function free_deps_for_bb_no_real_insns_p to
>>resolve_forw_deps which only focuses on forward deps;
>>  - Extend the handlings to cover dbg-cnt sched_block,
>>add one test case for it;
>>  - Move free_trg_info call in schedule_region to an
>>appropriate place.
>> 
>> One thing I'm not sure about is the change in function
>> sched_rgn_local_finish, currently the invocation to
>> sched_rgn_local_free is guarded with !sel_sched_p (),
>> so I just follow it, but the initialization of those
>> structures (in sched_rgn_local_init) isn't guarded
>> with !sel_sched_p (), it looks odd.
>> 
>> --
>> 
>> As PR108273 shows, when there is one block which only has
>> NOTE_P and LABEL_P insns at non-debug mode while has some
>> extra DEBUG_INSN_P insns at debug mode, after scheduling
>> it, the DFA states would be different between debug mode
>> and non-debug mode.  Since at non-debug mode, the block
>> meets no_real_insns_p, it gets skipped; while at debug
>> mode, it gets scheduled, even it only has NOTE_P, LABEL_P
>> and DEBUG_INSN_P, the call of function advance_one_cycle
>> will change the DFA state.  PR108519 also shows this issue
>> can be exposed by some scheduler changes.
>> 
>> This patch is to change function no_real_insns_p to
>> function no_real_nondebug_insns_p by taking debug insn into
>> account, which make us not try to schedule for the block
>> having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
>> resulting in consistent DFA states between non-debug and
>> debug mode.
>> 
>> Changing no_real_insns_p to no_real_nondebug_insns_p caused
>> ICE when doing free_block_dependencies, the root cause is
>> that we create dependencies for debug insns, those
>> dependencies are expected to be resolved during scheduling
>> insns, but they get skipped after this change.
>> By checking the code, it looks it's reasonable to skip to
>> compute block dependences for no_real_nondebug_insns_p
>> blocks.  There is also another issue, which gets exposed
>> in SPEC2017 bmks build at 

Re: PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-08 Thread Richard Sandiford
"Kewen.Lin"  writes:
> Hi,
>
> Gentle ping this:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html

Sorry for the lack of review on this.  Personally, I've never looked
at this part of code base in detail, so I don't think I can do a proper
review.  I'll try to have a look in stage 3 if no one more qualified
beats me to it.

Thanks,
Richard

>
> BR,
> Kewen
>
> on 2023/10/25 10:45, Kewen.Lin wrote:
>> Hi,
>> 
>> This is almost a repost for v2 which was posted at[1] in March
>> excepting for:
>>   1) rebased from r14-4810 which is relatively up-to-date,
>>  some conflicts on "int to bool" return type change have
>>  been resolved;
>>   2) adjust commit log a bit;
>>   3) fix misspelled "articial" with "artificial" somewhere;
>> 
>> --
>> *v2 comments*:
>> 
>> By addressing Alexander's comments, against v1 this
>> patch v2 mainly:
>> 
>>   - Rename no_real_insns_p to no_real_nondebug_insns_p;
>>   - Introduce enum rgn_bb_deps_free_action for three
>> kinds of actions to free deps;
>>   - Change function free_deps_for_bb_no_real_insns_p to
>> resolve_forw_deps which only focuses on forward deps;
>>   - Extend the handlings to cover dbg-cnt sched_block,
>> add one test case for it;
>>   - Move free_trg_info call in schedule_region to an
>> appropriate place.
>> 
>> One thing I'm not sure about is the change in function
>> sched_rgn_local_finish, currently the invocation to
>> sched_rgn_local_free is guarded with !sel_sched_p (),
>> so I just follow it, but the initialization of those
>> structures (in sched_rgn_local_init) isn't guarded
>> with !sel_sched_p (), it looks odd.
>> 
>> --
>> 
>> As PR108273 shows, when there is one block which only has
>> NOTE_P and LABEL_P insns at non-debug mode while has some
>> extra DEBUG_INSN_P insns at debug mode, after scheduling
>> it, the DFA states would be different between debug mode
>> and non-debug mode.  Since at non-debug mode, the block
>> meets no_real_insns_p, it gets skipped; while at debug
>> mode, it gets scheduled, even it only has NOTE_P, LABEL_P
>> and DEBUG_INSN_P, the call of function advance_one_cycle
>> will change the DFA state.  PR108519 also shows this issue
>> can be exposed by some scheduler changes.
>> 
>> This patch is to change function no_real_insns_p to
>> function no_real_nondebug_insns_p by taking debug insn into
>> account, which make us not try to schedule for the block
>> having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
>> resulting in consistent DFA states between non-debug and
>> debug mode.
>> 
>> Changing no_real_insns_p to no_real_nondebug_insns_p caused
>> ICE when doing free_block_dependencies, the root cause is
>> that we create dependencies for debug insns, those
>> dependencies are expected to be resolved during scheduling
>> insns, but they get skipped after this change.
>> By checking the code, it looks it's reasonable to skip to
>> compute block dependences for no_real_nondebug_insns_p
>> blocks.  There is also another issue, which gets exposed
>> in SPEC2017 bmks build at option -O2 -g, is that we could
>> skip to schedule some block, which already gets dependency
>> graph built so has dependencies computed and rgn_n_insns
>> accumulated, then the later verification on if the graph
>> becomes exhausted by scheduling would fail as follow:
>> 
>>   /* Sanity check: verify that all region insns were
>>  scheduled.  */
>> gcc_assert (sched_rgn_n_insns == rgn_n_insns);
>> 
>> , and also some forward deps aren't resovled.
>> 
>> As Alexander pointed out, the current debug count handling
>> also suffers the similar issue, so this patch handles these
>> two cases together: one is for some block gets skipped by
>> !dbg_cnt (sched_block), the other is for some block which
>> is not no_real_nondebug_insns_p initially but becomes
>> no_real_nondebug_insns_p due to speculative scheduling.
>> 
>> This patch can be bootstrapped and regress-tested on
>> x86_64-redhat-linux, aarch64-linux-gnu and
>> powerpc64{,le}-linux-gnu.
>> 
>> I also verified this patch can pass SPEC2017 both intrate
>> and fprate bmks building at -g -O2/-O3.
>> 
>> Any thoughts?  Is it ok for trunk?
>> 
>> [1] v2: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614818.html
>> [2] v1: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614224.html
>> 
>> BR,
>> Kewen
>> -
>>  PR rtl-optimization/108273
>> 
>> gcc/ChangeLog:
>> 
>>  * haifa-sched.cc (no_real_insns_p): Rename to ...
>>  (no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
>>  * sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
>>  no_real_nondebug_insns_p.
>>  * sched-int.h (no_real_insns_p): Rename to ...
>>  (no_real_nondebug_insns_p): ... this.
>>  * sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
>>  (bb_deps_free_actions): New static variable.
>>  (compute_block_dependences): Skip for no_real_nondebug_insns_p.
>>  (resolve_forw_deps): New function.
>>  

PING^1 [PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-11-07 Thread Kewen.Lin
Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634201.html

BR,
Kewen

on 2023/10/25 10:45, Kewen.Lin wrote:
> Hi,
> 
> This is almost a repost for v2 which was posted at[1] in March
> excepting for:
>   1) rebased from r14-4810 which is relatively up-to-date,
>  some conflicts on "int to bool" return type change have
>  been resolved;
>   2) adjust commit log a bit;
>   3) fix misspelled "articial" with "artificial" somewhere;
> 
> --
> *v2 comments*:
> 
> By addressing Alexander's comments, against v1 this
> patch v2 mainly:
> 
>   - Rename no_real_insns_p to no_real_nondebug_insns_p;
>   - Introduce enum rgn_bb_deps_free_action for three
> kinds of actions to free deps;
>   - Change function free_deps_for_bb_no_real_insns_p to
> resolve_forw_deps which only focuses on forward deps;
>   - Extend the handlings to cover dbg-cnt sched_block,
> add one test case for it;
>   - Move free_trg_info call in schedule_region to an
> appropriate place.
> 
> One thing I'm not sure about is the change in function
> sched_rgn_local_finish, currently the invocation to
> sched_rgn_local_free is guarded with !sel_sched_p (),
> so I just follow it, but the initialization of those
> structures (in sched_rgn_local_init) isn't guarded
> with !sel_sched_p (), it looks odd.
> 
> --
> 
> As PR108273 shows, when there is one block which only has
> NOTE_P and LABEL_P insns at non-debug mode while has some
> extra DEBUG_INSN_P insns at debug mode, after scheduling
> it, the DFA states would be different between debug mode
> and non-debug mode.  Since at non-debug mode, the block
> meets no_real_insns_p, it gets skipped; while at debug
> mode, it gets scheduled, even it only has NOTE_P, LABEL_P
> and DEBUG_INSN_P, the call of function advance_one_cycle
> will change the DFA state.  PR108519 also shows this issue
> can be exposed by some scheduler changes.
> 
> This patch is to change function no_real_insns_p to
> function no_real_nondebug_insns_p by taking debug insn into
> account, which make us not try to schedule for the block
> having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
> resulting in consistent DFA states between non-debug and
> debug mode.
> 
> Changing no_real_insns_p to no_real_nondebug_insns_p caused
> ICE when doing free_block_dependencies, the root cause is
> that we create dependencies for debug insns, those
> dependencies are expected to be resolved during scheduling
> insns, but they get skipped after this change.
> By checking the code, it looks it's reasonable to skip to
> compute block dependences for no_real_nondebug_insns_p
> blocks.  There is also another issue, which gets exposed
> in SPEC2017 bmks build at option -O2 -g, is that we could
> skip to schedule some block, which already gets dependency
> graph built so has dependencies computed and rgn_n_insns
> accumulated, then the later verification on if the graph
> becomes exhausted by scheduling would fail as follow:
> 
>   /* Sanity check: verify that all region insns were
>  scheduled.  */
> gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> 
> , and also some forward deps aren't resovled.
> 
> As Alexander pointed out, the current debug count handling
> also suffers the similar issue, so this patch handles these
> two cases together: one is for some block gets skipped by
> !dbg_cnt (sched_block), the other is for some block which
> is not no_real_nondebug_insns_p initially but becomes
> no_real_nondebug_insns_p due to speculative scheduling.
> 
> This patch can be bootstrapped and regress-tested on
> x86_64-redhat-linux, aarch64-linux-gnu and
> powerpc64{,le}-linux-gnu.
> 
> I also verified this patch can pass SPEC2017 both intrate
> and fprate bmks building at -g -O2/-O3.
> 
> Any thoughts?  Is it ok for trunk?
> 
> [1] v2: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614818.html
> [2] v1: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614224.html
> 
> BR,
> Kewen
> -
>   PR rtl-optimization/108273
> 
> gcc/ChangeLog:
> 
>   * haifa-sched.cc (no_real_insns_p): Rename to ...
>   (no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
>   * sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
>   no_real_nondebug_insns_p.
>   * sched-int.h (no_real_insns_p): Rename to ...
>   (no_real_nondebug_insns_p): ... this.
>   * sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
>   (bb_deps_free_actions): New static variable.
>   (compute_block_dependences): Skip for no_real_nondebug_insns_p.
>   (resolve_forw_deps): New function.
>   (free_block_dependencies): Check bb_deps_free_actions and call
>   function resolve_forw_deps for RGN_BB_DEPS_FREE_ARTIFICIAL.
>   (compute_priorities): Replace no_real_insns_p with
>   no_real_nondebug_insns_p.
>   (schedule_region): Replace no_real_insns_p with
>   no_real_nondebug_insns_p, set RGN_BB_DEPS_FREE_ARTIFICIAL if the block
>   get 

[PATCH v3] sched: Change no_real_insns_p to no_real_nondebug_insns_p [PR108273]

2023-10-24 Thread Kewen.Lin
Hi,

This is almost a repost for v2 which was posted at[1] in March
excepting for:
  1) rebased from r14-4810 which is relatively up-to-date,
 some conflicts on "int to bool" return type change have
 been resolved;
  2) adjust commit log a bit;
  3) fix misspelled "articial" with "artificial" somewhere;

--
*v2 comments*:

By addressing Alexander's comments, against v1 this
patch v2 mainly:

  - Rename no_real_insns_p to no_real_nondebug_insns_p;
  - Introduce enum rgn_bb_deps_free_action for three
kinds of actions to free deps;
  - Change function free_deps_for_bb_no_real_insns_p to
resolve_forw_deps which only focuses on forward deps;
  - Extend the handlings to cover dbg-cnt sched_block,
add one test case for it;
  - Move free_trg_info call in schedule_region to an
appropriate place.

One thing I'm not sure about is the change in function
sched_rgn_local_finish, currently the invocation to
sched_rgn_local_free is guarded with !sel_sched_p (),
so I just follow it, but the initialization of those
structures (in sched_rgn_local_init) isn't guarded
with !sel_sched_p (), it looks odd.

--

As PR108273 shows, when there is one block which only has
NOTE_P and LABEL_P insns at non-debug mode while has some
extra DEBUG_INSN_P insns at debug mode, after scheduling
it, the DFA states would be different between debug mode
and non-debug mode.  Since at non-debug mode, the block
meets no_real_insns_p, it gets skipped; while at debug
mode, it gets scheduled, even it only has NOTE_P, LABEL_P
and DEBUG_INSN_P, the call of function advance_one_cycle
will change the DFA state.  PR108519 also shows this issue
can be exposed by some scheduler changes.

This patch is to change function no_real_insns_p to
function no_real_nondebug_insns_p by taking debug insn into
account, which make us not try to schedule for the block
having only NOTE_P, LABEL_P and DEBUG_INSN_P insns,
resulting in consistent DFA states between non-debug and
debug mode.

Changing no_real_insns_p to no_real_nondebug_insns_p caused
ICE when doing free_block_dependencies, the root cause is
that we create dependencies for debug insns, those
dependencies are expected to be resolved during scheduling
insns, but they get skipped after this change.
By checking the code, it looks it's reasonable to skip to
compute block dependences for no_real_nondebug_insns_p
blocks.  There is also another issue, which gets exposed
in SPEC2017 bmks build at option -O2 -g, is that we could
skip to schedule some block, which already gets dependency
graph built so has dependencies computed and rgn_n_insns
accumulated, then the later verification on if the graph
becomes exhausted by scheduling would fail as follow:

  /* Sanity check: verify that all region insns were
 scheduled.  */
gcc_assert (sched_rgn_n_insns == rgn_n_insns);

, and also some forward deps aren't resovled.

As Alexander pointed out, the current debug count handling
also suffers the similar issue, so this patch handles these
two cases together: one is for some block gets skipped by
!dbg_cnt (sched_block), the other is for some block which
is not no_real_nondebug_insns_p initially but becomes
no_real_nondebug_insns_p due to speculative scheduling.

This patch can be bootstrapped and regress-tested on
x86_64-redhat-linux, aarch64-linux-gnu and
powerpc64{,le}-linux-gnu.

I also verified this patch can pass SPEC2017 both intrate
and fprate bmks building at -g -O2/-O3.

Any thoughts?  Is it ok for trunk?

[1] v2: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614818.html
[2] v1: https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614224.html

BR,
Kewen
-
PR rtl-optimization/108273

gcc/ChangeLog:

* haifa-sched.cc (no_real_insns_p): Rename to ...
(no_real_nondebug_insns_p): ... this, and consider DEBUG_INSN_P insn.
* sched-ebb.cc (schedule_ebb): Replace no_real_insns_p with
no_real_nondebug_insns_p.
* sched-int.h (no_real_insns_p): Rename to ...
(no_real_nondebug_insns_p): ... this.
* sched-rgn.cc (enum rgn_bb_deps_free_action): New enum.
(bb_deps_free_actions): New static variable.
(compute_block_dependences): Skip for no_real_nondebug_insns_p.
(resolve_forw_deps): New function.
(free_block_dependencies): Check bb_deps_free_actions and call
function resolve_forw_deps for RGN_BB_DEPS_FREE_ARTIFICIAL.
(compute_priorities): Replace no_real_insns_p with
no_real_nondebug_insns_p.
(schedule_region): Replace no_real_insns_p with
no_real_nondebug_insns_p, set RGN_BB_DEPS_FREE_ARTIFICIAL if the block
get dependencies computed before but skipped now, fix up count
sched_rgn_n_insns for it too.  Call free_trg_info when the block
gets scheduled, and move sched_rgn_local_finish after the loop
of free_block_dependencies loop.
(sched_rgn_local_init): Allocate and compute bb_deps_free_actions.