Re: [PATCH] AArch64: Switch off early scheduling

2024-11-13 Thread Kyrylo Tkachov


> On 12 Nov 2024, at 18:55, Richard Sandiford  wrote:
> 
> Wilco Dijkstra  writes:
>> Hi,
>> 
>>>>> What do you think about disabling late scheduling as well?
>>>> 
>>>> I think this would definitely need separate consideration and evaluation 
>>>> given the above.
>>>> 
>>>> Another thing to consider is the macro fusion machinery. IIRC it works 
>>>> during scheduling so if we don’t run any scheduling we don’t get an 
>>>> opportunity to bring those instructions together?
>>>> 
>>>> That said, I’m not sure the scheduling actually tries to bring macro fused 
>>>> instructions together rather than simply avoiding moving them apart.
>>> 
>>> I will run the numbers, but if useful, late scheduling could be disabled
>>> separately from fusion scheduling. However, fusion really shouldn't be done
>>> as a scheduling hack - we should use fused RTL patterns like we do for GOT
>>> accesses and AES fusion.
>> 
>> I ran the numbers: late scheduling (using the ancient Cortex-A57 model) gives
>> a 0.23% speedup on SPECINT and 0.73% on SPECFP. I guess the gains come from
>> scheduling loads and other critical operations earlier, and perhaps from
>> improved dispatch due to increased interleaving of operations.
> 
> Thanks.  So yeah, I agree we should keep it.
> 
> I think it was common ground that the benefits are kind-of incidental
> (in that the scheduler really is trying to fill Cortex-A57 pipeline bubbles
> rather than find a good frontend mix for modern cores) and that it would
> be better to try to target the effect that we want directly.  But in terms
> of whether the switch we have should be on or off, I agree it should be on.
> 
> The patch is OK if there are no objections by Thursday morning UTC.

The patch is fine by me. Thanks for running the numbers.
Kyrill

> 
> Richard



Re: [PATCH] AArch64: Switch off early scheduling

2024-11-12 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Hi,
>
>>>> What do you think about disabling late scheduling as well?
>>>
>>> I think this would definitely need separate consideration and evaluation 
>>> given the above.
>>>
>>> Another thing to consider is the macro fusion machinery. IIRC it works 
>>> during scheduling so if we don’t run any scheduling we don’t get an 
>>> opportunity to bring those instructions together?
>>>
>>> That said, I’m not sure the scheduling actually tries to bring macro fused 
>>> instructions together rather than simply avoiding moving them apart.
>>
>> I will run the numbers, but if useful, late scheduling could be disabled
>> separately from fusion scheduling. However, fusion really shouldn't be done
>> as a scheduling hack - we should use fused RTL patterns like we do for GOT
>> accesses and AES fusion.
>
> I ran the numbers: late scheduling (using the ancient Cortex-A57 model) gives
> a 0.23% speedup on SPECINT and 0.73% on SPECFP. I guess the gains come from
> scheduling loads and other critical operations earlier, and perhaps from
> improved dispatch due to increased interleaving of operations.

Thanks.  So yeah, I agree we should keep it.

I think it was common ground that the benefits are kind-of incidental
(in that the scheduler really is trying to fill Cortex-A57 pipeline bubbles
rather than find a good frontend mix for modern cores) and that it would
be better to try to target the effect that we want directly.  But in terms
of whether the switch we have should be on or off, I agree it should be on.

The patch is OK if there are no objections by Thursday morning UTC.

Richard


Re: [PATCH] AArch64: Switch off early scheduling

2024-11-12 Thread Wilco Dijkstra
Hi,

>>> What do you think about disabling late scheduling as well?
>>
>> I think this would definitely need separate consideration and evaluation 
>> given the above.
>>
>> Another thing to consider is the macro fusion machinery. IIRC it works 
>> during scheduling so if we don’t run any scheduling we don’t get an 
>> opportunity to bring those instructions together?
>>
>> That said, I’m not sure the scheduling actually tries to bring macro fused 
>> instructions together rather than simply avoiding moving them apart.
>
> I will run the numbers, but if useful, late scheduling could be disabled
> separately from fusion scheduling. However, fusion really shouldn't be done
> as a scheduling hack - we should use fused RTL patterns like we do for GOT
> accesses and AES fusion.

I ran the numbers: late scheduling (using the ancient Cortex-A57 model) gives
a 0.23% speedup on SPECINT and 0.73% on SPECFP. I guess the gains come from
scheduling loads and other critical operations earlier, and perhaps from
improved dispatch due to increased interleaving of operations.

Cheers,
Wilco

Re: [PATCH] AArch64: Switch off early scheduling

2024-10-31 Thread Wilco Dijkstra
Hi Kyrill,

> I think the approach that I’d like to try is using the TARGET_SCHED_DISPATCH 
> hooks like x86 does for bdver1-4.
> That would try to exploit the dispatch constraints information in the SWOGs 
> rather than the instruction latency and throughput tables.
> That would still require some annotation of SVE patterns but it is 
> conceptually different metadata that we’d specify in the MD files.

Yes, trying to schedule for dispatch is likely better than traditional
scheduling on wide OoO pipelines.

Also, reducing register pressure in complex blocks may be useful as a separate
pass (without all the complexities of scheduling for a CPU model).

> Yeah, I’m okay with disabling the early scheduling (is 33% the worst-case 
> scenario though?
> It feels that if it was really taking that much in most code it would have 
> appeared in bugzilla as a compile-time hog)

No, it's the average when building SPECINT, so this includes linking and file
IO overheads...

>> What do you think about disabling late scheduling as well?
>
> I think this would definitely need separate consideration and evaluation 
> given the above.
>
> Another thing to consider is the macro fusion machinery. IIRC it works during 
> scheduling so if we don’t run any scheduling we don’t get an opportunity to 
> bring those instructions together?
> 
> That said, I’m not sure the scheduling actually tries to bring macro fused 
> instructions together rather than simply avoiding moving them apart.

I will run the numbers, but if useful, late scheduling could be disabled
separately from fusion scheduling. However, fusion really shouldn't be done
as a scheduling hack - we should use fused RTL patterns like we do for GOT
accesses and AES fusion.

Cheers,
Wilco

Re: [PATCH] AArch64: Switch off early scheduling

2024-10-31 Thread Andrew Pinski
On Thu, Oct 31, 2024 at 11:03 AM Wilco Dijkstra  wrote:
>
> Hi Andrew,
>
> > I suspect the following scheduling models could be removed due either
> > to hw never going to production or no longer being used by anyone:
> > thunderx3t110.md
> > falkor.md
> > saphira.md
>
> If you're planning to remove these, it would also be good to remove the
> falkor-tag-collision-avoidance.cc pass and saphira.h/qdf24xx.h tunings too.
> The question is whether it would be useful to keep supporting the old CPU
> names. I believe it's better to remove them with the rest of the tunings since
> nobody would be using them, and there is no mechanism to hide a CPU
> name or mark them as obsolescent.

I am going to keep the tunings for falkor/saphira for now; maybe for
GCC 16 I will propose removing them. And yes removing the
falkor-tag-collision-avoidance is definitely something I am going to
do since it is not well tested either.

Thanks,
Andrew Pinski

>
> Cheers,
> Wilco


Re: [PATCH] AArch64: Switch off early scheduling

2024-10-31 Thread Wilco Dijkstra
Hi Andrew,

> I suspect the following scheduling models could be removed due either
> to hw never going to production or no longer being used by anyone:
> thunderx3t110.md
> falkor.md
> saphira.md

If you're planning to remove these, it would also be good to remove the
falkor-tag-collision-avoidance.cc pass and saphira.h/qdf24xx.h tunings too.
The question is whether it would be useful to keep supporting the old CPU
names. I believe it's better to remove them with the rest of the tunings since
nobody would be using them, and there is no mechanism to hide a CPU
name or mark them as obsolescent.

Cheers,
Wilco

Re: [PATCH] AArch64: Switch off early scheduling

2024-10-31 Thread Andrew Pinski
On Thu, Oct 31, 2024 at 10:25 AM Kyrylo Tkachov  wrote:
>
>
>
> > On 31 Oct 2024, at 18:06, Richard Sandiford  
> > wrote:
> >
> > Wilco Dijkstra  writes:
> >> The early scheduler takes up ~33% of the total build time, yet it doesn't
> >> provide a meaningful performance gain.  This is partly because modern OoO
> >> cores need far less scheduling, partly because the scheduler tends to create
> >> many unnecessary spills by increasing register pressure.  Building
> >> applications 56% faster is far more useful than ~0.1% improvement on SPEC, so
> >> switch off early scheduling on AArch64.  Codesize reduces by ~0.2%.
> >>
> >> The combine_and_move pass runs if the scheduler is disabled and aggressively
> >> combines moves.  The movsf/df patterns allow all FP immediates since they
> >> rely on a split pattern; however, splits do not happen this late.  To fix
> >> this, use a more accurate check that blocks creation of literal loads during
> >> combine_and_move.  Fix various tests that depend on scheduling by explicitly
> >> adding -fschedule-insns.
> >>
> >> Passes bootstrap & regress, OK for commit?
> >
> > I'm in favour of this.  Obviously the numbers are what count, but
> > also from first principles:
> >
> > - I can't remember the last time a scheduling model was added to the port.
>
> thunderx3t110.md from 2020 is the most recent one IIRC.

Thunderx3 hardware never got outside of Marvell, and all of the folks
who worked on that core left Marvell that same year.
I suspect the following scheduling models could be removed due either
to hw never going to production or no longer being used by anyone:
thunderx3t110.md
falkor.md
saphira.md

The latter two are Qualcomm cores that have not been in use for a few
years now and are unrelated to the new Oryon-based cores.
I am going to propose a patch for those two in a little bit anyway, since
it is less confusing not to have them around any longer.

Thanks,
Andrew

>
>
> >
> > - We've (consciously) never added scheduling types for SVE.
>
> I’ve been wanting to revisit that throughout the year but whenever I work on 
> it it feels like indeed it’s not worth the trouble for big cores.
> I think the approach that I’d like to try is using the TARGET_SCHED_DISPATCH 
> hooks like x86 does for bdver1-4.
> That would try to exploit the dispatch constraints information in the SWOGs 
> rather than the instruction latency and throughput tables.
> That would still require some annotation of SVE patterns but it is 
> conceptually different metadata that we’d specify in the MD files.
>
> >
> > - It doesn't make logical sense to schedule for Neoverse V3 (say)
> >  as though it were a Cortex-A57.
>
> I seem to remember that having at least a coarse-grained breakdown of 
> instructions into GP, LS and FP/SIMD was useful.
> I had to write https://gcc.gnu.org/g:8bb9a5e66a150b73c97aeffee52b57147022a817 
> some time ago because UDOT was getting bad scheduling in some tight 
> arithmetic kernels
>
> >
> > So at this point, it seems better for scheduling to be opt-in rather
> > than opt-out.  (That is, we can switch to a tune-based default if
> > anyone does add a new scheduling model in future.)
>
> Yeah, I’m okay with disabling the early scheduling (is 33% the worst-case 
> scenario though? It feels that if it was really taking that much in most code 
> it would have appeared in bugzilla as a compile-time hog)
>
> >
> > Let's see what others think.
> >
> > Please split the md changes out into a separate pre-patch though.
> >
> > What do you think about disabling late scheduling as well?
> >
>
> I think this would definitely need separate consideration and evaluation 
> given the above.
> Another thing to consider is the macro fusion machinery. IIRC it works during 
> scheduling so if we don’t run any scheduling we don’t get an opportunity to 
> bring those instructions together?
> That said, I’m not sure the scheduling actually tries to bring macro fused 
> instructions together rather than simply avoiding moving them apart.
>
> Thanks,
> Kyrill
>
>
> > Thanks,
> > Richard
> >
> >> gcc/ChangeLog:
> >> * common/config/aarch64/aarch64-common.cc: Switch off fschedule_insns.
> >> * config/aarch64/aarch64.md (movhf_aarch64): Use aarch64_valid_fp_move.
> >> (movsf_aarch64): Likewise.
> >> (movdf_aarch64): Likewise.
> >> * config/aarch64/aarch64.cc (aarch64_valid_fp_move): New function.
> >> * config/aarch64/aarch64-protos.h (aarch64_valid_fp_move): Likewise.
> >>
> >> gcc/testsuite/ChangeLog:
> >> * testsuite/gcc.target/aarch64/ldp_aligned.c: Fix test.
> >> * testsuite/gcc.target/aarch64/ldp_always.c: Likewise.
> >> * testsuite/gcc.target/aarch64/ldp_stp_10.c: Add -fschedule-insns.
> >> * testsuite/gcc.target/aarch64/ldp_stp_12.c: Likewise.
> >> * testsuite/gcc.target/aarch64/ldp_stp_13.c: Remove test.
> >>* testsuite/gcc.target/aarch64/ldp_stp_21.c: Add -fschedule

Re: [PATCH] AArch64: Switch off early scheduling

2024-10-31 Thread Andrew Pinski
On Thu, Oct 31, 2024 at 10:07 AM Richard Sandiford
 wrote:
>
> Wilco Dijkstra  writes:
> > The early scheduler takes up ~33% of the total build time, yet it doesn't
> > provide a meaningful performance gain.  This is partly because modern OoO
> > cores need far less scheduling, partly because the scheduler tends to create
> > many unnecessary spills by increasing register pressure.  Building
> > applications 56% faster is far more useful than ~0.1% improvement on SPEC, so
> > switch off early scheduling on AArch64.  Codesize reduces by ~0.2%.
> >
> > The combine_and_move pass runs if the scheduler is disabled and aggressively
> > combines moves.  The movsf/df patterns allow all FP immediates since they
> > rely on a split pattern; however, splits do not happen this late.  To fix
> > this, use a more accurate check that blocks creation of literal loads during
> > combine_and_move.  Fix various tests that depend on scheduling by explicitly
> > adding -fschedule-insns.
> >
> > Passes bootstrap & regress, OK for commit?
>
> I'm in favour of this.  Obviously the numbers are what count, but
> also from first principles:
>
> - I can't remember the last time a scheduling model was added to the port.

We have one internally for oryon-1, but I have not yet had time to
benchmark with it vs. without it. I suspect it won't help enough
to even think about upstreaming it.
I think the last model added was tsv110.md in 2020.

>
> - We've (consciously) never added scheduling types for SVE.
>
> - It doesn't make logical sense to schedule for Neoverse V3 (say)
>   as though it were a Cortex-A57.
>
> So at this point, it seems better for scheduling to be opt-in rather
> than opt-out.  (That is, we can switch to a tune-based default if
> anyone does add a new scheduling model in future.)
>
> Let's see what others think.
>
> Please split the md changes out into a separate pre-patch though.
>
> What do you think about disabling late scheduling as well?

EBB scheduling (after register allocation) can actually help, since it
moves things before branches: branch prediction on modern hardware is
decent, but the HW sometimes gets confused.

Thanks,
Andrew Pinski


>
> Thanks,
> Richard
>
> > gcc/ChangeLog:
> > * common/config/aarch64/aarch64-common.cc: Switch off 
> > fschedule_insns.
> > * config/aarch64/aarch64.md (movhf_aarch64): Use 
> > aarch64_valid_fp_move.
> > (movsf_aarch64): Likewise.
> > (movdf_aarch64): Likewise.
> > * config/aarch64/aarch64.cc (aarch64_valid_fp_move): New function.
> > * config/aarch64/aarch64-protos.h (aarch64_valid_fp_move): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> > * testsuite/gcc.target/aarch64/ldp_aligned.c: Fix test.
> > * testsuite/gcc.target/aarch64/ldp_always.c: Likewise.
> > * testsuite/gcc.target/aarch64/ldp_stp_10.c: Add -fschedule-insns.
> > * testsuite/gcc.target/aarch64/ldp_stp_12.c: Likewise.
> > * testsuite/gcc.target/aarch64/ldp_stp_13.c: Remove test.
> > * testsuite/gcc.target/aarch64/ldp_stp_21.c: Add -fschedule-insns.
> > * testsuite/gcc.target/aarch64/ldp_stp_8.c: Likewise.
> > * testsuite/gcc.target/aarch64/ldp_vec_v2sf.c: Likewise.
> > * testsuite/gcc.target/aarch64/ldp_vec_v2si.c: Likewise.
> > * testsuite/gcc.target/aarch64/test_frame_16.c: Fix test.
> > * testsuite/gcc.target/aarch64/sve/vcond_12.c: Add -fschedule-insns.
> > * testsuite/gcc.target/aarch64/sve/acle/general/ldff1_3.c: Likewise.
> >
> > ---
> >
> > diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
> > index 2bfc597e333b6018970a9ee6e370a66b6d0960ef..845747e31e821c2f3970fd39ea70f046eddbe920 100644
> > --- a/gcc/common/config/aarch64/aarch64-common.cc
> > +++ b/gcc/common/config/aarch64/aarch64-common.cc
> > @@ -54,6 +54,8 @@ static const struct default_options aarch_option_optimization_table[] =
> >  { OPT_LEVELS_ALL, OPT_fomit_frame_pointer, NULL, 0 },
> >  /* Enable -fsched-pressure by default when optimizing.  */
> >  { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
> > +/* Disable early scheduling due to high compile-time overheads.  */
> > +{ OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
> >  /* Enable redundant extension instructions removal at -O2 and higher.  */
> >  { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> >  { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL },
> > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> > index 250c5b96a21ea1c969a0e77e420525eec90e4de4..b30329d7f85f5b962dca43cf12ca938898425874 100644
> > --- a/gcc/config/aarch64/aarch64-protos.h
> > +++ b/gcc/config/aarch64/aarch64-protos.h
> > @@ -758,6 +758,7 @@ bool aarch64_advsimd_struct_mode_p (machine_mode mode);
> >  opt_machine_mode aarch64_vq_mode (scalar_mode);
> >  opt_ma

Re: [PATCH] AArch64: Switch off early scheduling

2024-10-31 Thread Kyrylo Tkachov


> On 31 Oct 2024, at 18:06, Richard Sandiford  wrote:
> 
> Wilco Dijkstra  writes:
>> The early scheduler takes up ~33% of the total build time, yet it doesn't
>> provide a meaningful performance gain.  This is partly because modern OoO
>> cores need far less scheduling, partly because the scheduler tends to create
>> many unnecessary spills by increasing register pressure.  Building
>> applications 56% faster is far more useful than ~0.1% improvement on SPEC, so
>> switch off early scheduling on AArch64.  Codesize reduces by ~0.2%.
>> 
>> The combine_and_move pass runs if the scheduler is disabled and aggressively
>> combines moves.  The movsf/df patterns allow all FP immediates since they
>> rely on a split pattern; however, splits do not happen this late.  To fix
>> this, use a more accurate check that blocks creation of literal loads during
>> combine_and_move.  Fix various tests that depend on scheduling by explicitly
>> adding -fschedule-insns.
>> 
>> Passes bootstrap & regress, OK for commit?
> 
> I'm in favour of this.  Obviously the numbers are what count, but
> also from first principles:
> 
> - I can't remember the last time a scheduling model was added to the port.

thunderx3t110.md from 2020 is the most recent one IIRC.


> 
> - We've (consciously) never added scheduling types for SVE.

I’ve been wanting to revisit that throughout the year, but whenever I work on
it, it feels like it’s indeed not worth the trouble for big cores.
I think the approach that I’d like to try is using the TARGET_SCHED_DISPATCH
hooks like x86 does for bdver1-4.
That would try to exploit the dispatch constraints information in the SWOGs
rather than the instruction latency and throughput tables.
That would still require some annotation of SVE patterns, but it is
conceptually different metadata that we’d specify in the MD files.

> 
> - It doesn't make logical sense to schedule for Neoverse V3 (say)
>  as though it were a Cortex-A57.

I seem to remember that having at least a coarse-grained breakdown of
instructions into GP, LS and FP/SIMD was useful.
I had to write https://gcc.gnu.org/g:8bb9a5e66a150b73c97aeffee52b57147022a817
some time ago because UDOT was getting bad scheduling in some tight arithmetic
kernels.

> 
> So at this point, it seems better for scheduling to be opt-in rather
> than opt-out.  (That is, we can switch to a tune-based default if
> anyone does add a new scheduling model in future.)

Yeah, I’m okay with disabling the early scheduling (is 33% the worst-case 
scenario though? It feels that if it was really taking that much in most code 
it would have appeared in bugzilla as a compile-time hog)

> 
> Let's see what others think.
> 
> Please split the md changes out into a separate pre-patch though.
> 
> What do you think about disabling late scheduling as well?
> 

I think this would definitely need separate consideration and evaluation given 
the above.
Another thing to consider is the macro fusion machinery. IIRC it works during 
scheduling so if we don’t run any scheduling we don’t get an opportunity to 
bring those instructions together?
That said, I’m not sure the scheduling actually tries to bring macro fused 
instructions together rather than simply avoiding moving them apart.

Thanks,
Kyrill


> Thanks,
> Richard
> 
>> gcc/ChangeLog:
>> * common/config/aarch64/aarch64-common.cc: Switch off fschedule_insns.
>> * config/aarch64/aarch64.md (movhf_aarch64): Use aarch64_valid_fp_move.
>> (movsf_aarch64): Likewise.
>> (movdf_aarch64): Likewise.
>> * config/aarch64/aarch64.cc (aarch64_valid_fp_move): New function.
>> * config/aarch64/aarch64-protos.h (aarch64_valid_fp_move): Likewise.
>> 
>> gcc/testsuite/ChangeLog:
>> * testsuite/gcc.target/aarch64/ldp_aligned.c: Fix test.
>> * testsuite/gcc.target/aarch64/ldp_always.c: Likewise.
>> * testsuite/gcc.target/aarch64/ldp_stp_10.c: Add -fschedule-insns.
>> * testsuite/gcc.target/aarch64/ldp_stp_12.c: Likewise.
>> * testsuite/gcc.target/aarch64/ldp_stp_13.c: Remove test.
>> * testsuite/gcc.target/aarch64/ldp_stp_21.c: Add -fschedule-insns.
>> * testsuite/gcc.target/aarch64/ldp_stp_8.c: Likewise.
>> * testsuite/gcc.target/aarch64/ldp_vec_v2sf.c: Likewise.
>> * testsuite/gcc.target/aarch64/ldp_vec_v2si.c: Likewise.
>> * testsuite/gcc.target/aarch64/test_frame_16.c: Fix test.
>> * testsuite/gcc.target/aarch64/sve/vcond_12.c: Add -fschedule-insns.
>> * testsuite/gcc.target/aarch64/sve/acle/general/ldff1_3.c: Likewise.
>> 
>> ---
>> 
>> diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
>> index 2bfc597e333b6018970a9ee6e370a66b6d0960ef..845747e31e821c2f3970fd39ea70f046eddbe920 100644
>> --- a/gcc/common/config/aarch64/aarch64-common.cc
>> +++ b/gcc/common/config/aarch64/aarch64-common.cc
>> @@ -54,6 +54,8 @@ static const struct default_options 
>> aar

Re: [PATCH] AArch64: Switch off early scheduling

2024-10-31 Thread Richard Sandiford
Wilco Dijkstra  writes:
> The early scheduler takes up ~33% of the total build time, yet it doesn't
> provide a meaningful performance gain.  This is partly because modern OoO
> cores need far less scheduling, partly because the scheduler tends to create
> many unnecessary spills by increasing register pressure.  Building
> applications 56% faster is far more useful than ~0.1% improvement on SPEC, so
> switch off early scheduling on AArch64.  Codesize reduces by ~0.2%.
>
> The combine_and_move pass runs if the scheduler is disabled and aggressively
> combines moves.  The movsf/df patterns allow all FP immediates since they
> rely on a split pattern; however, splits do not happen this late.  To fix
> this, use a more accurate check that blocks creation of literal loads during
> combine_and_move.  Fix various tests that depend on scheduling by explicitly
> adding -fschedule-insns.
>
> Passes bootstrap & regress, OK for commit?

I'm in favour of this.  Obviously the numbers are what count, but
also from first principles:

- I can't remember the last time a scheduling model was added to the port.

- We've (consciously) never added scheduling types for SVE.

- It doesn't make logical sense to schedule for Neoverse V3 (say)
  as though it were a Cortex-A57.

So at this point, it seems better for scheduling to be opt-in rather
than opt-out.  (That is, we can switch to a tune-based default if
anyone does add a new scheduling model in future.)

Let's see what others think.

Please split the md changes out into a separate pre-patch though.

What do you think about disabling late scheduling as well?

Thanks,
Richard

> gcc/ChangeLog:
> * common/config/aarch64/aarch64-common.cc: Switch off fschedule_insns.
> * config/aarch64/aarch64.md (movhf_aarch64): Use 
> aarch64_valid_fp_move.
> (movsf_aarch64): Likewise.
> (movdf_aarch64): Likewise.
> * config/aarch64/aarch64.cc (aarch64_valid_fp_move): New function.
> * config/aarch64/aarch64-protos.h (aarch64_valid_fp_move): Likewise.
>
> gcc/testsuite/ChangeLog:
> * testsuite/gcc.target/aarch64/ldp_aligned.c: Fix test.
> * testsuite/gcc.target/aarch64/ldp_always.c: Likewise.
> * testsuite/gcc.target/aarch64/ldp_stp_10.c: Add -fschedule-insns.
> * testsuite/gcc.target/aarch64/ldp_stp_12.c: Likewise.
> * testsuite/gcc.target/aarch64/ldp_stp_13.c: Remove test.
> * testsuite/gcc.target/aarch64/ldp_stp_21.c: Add -fschedule-insns.
> * testsuite/gcc.target/aarch64/ldp_stp_8.c: Likewise.
> * testsuite/gcc.target/aarch64/ldp_vec_v2sf.c: Likewise.
> * testsuite/gcc.target/aarch64/ldp_vec_v2si.c: Likewise.
> * testsuite/gcc.target/aarch64/test_frame_16.c: Fix test.
> * testsuite/gcc.target/aarch64/sve/vcond_12.c: Add -fschedule-insns.
> * testsuite/gcc.target/aarch64/sve/acle/general/ldff1_3.c: Likewise.
>
> ---
>
> diff --git a/gcc/common/config/aarch64/aarch64-common.cc b/gcc/common/config/aarch64/aarch64-common.cc
> index 2bfc597e333b6018970a9ee6e370a66b6d0960ef..845747e31e821c2f3970fd39ea70f046eddbe920 100644
> --- a/gcc/common/config/aarch64/aarch64-common.cc
> +++ b/gcc/common/config/aarch64/aarch64-common.cc
> @@ -54,6 +54,8 @@ static const struct default_options aarch_option_optimization_table[] =
>  { OPT_LEVELS_ALL, OPT_fomit_frame_pointer, NULL, 0 },
>  /* Enable -fsched-pressure by default when optimizing.  */
>  { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
> +/* Disable early scheduling due to high compile-time overheads.  */
> +{ OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
>  /* Enable redundant extension instructions removal at -O2 and higher.  */
>  { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
>  { OPT_LEVELS_2_PLUS, OPT_mearly_ra_, NULL, AARCH64_EARLY_RA_ALL },
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 250c5b96a21ea1c969a0e77e420525eec90e4de4..b30329d7f85f5b962dca43cf12ca938898425874 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -758,6 +758,7 @@ bool aarch64_advsimd_struct_mode_p (machine_mode mode);
>  opt_machine_mode aarch64_vq_mode (scalar_mode);
>  opt_machine_mode aarch64_full_sve_mode (scalar_mode);
>  bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode);
> +bool aarch64_valid_fp_move (rtx, rtx, machine_mode);
>  bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
>  bool aarch64_const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT,
> HOST_WIDE_INT);
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 2647293f7cf020378dacc37b7bfbccc856573e44..965ec18412a6486e6ac4ff2e4a7d742bf61e5d75 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -11223,6 +11223,36 @@ aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode)
>