Re: [to-be-committed][RISC-V] Clamp long reservations to 7c

2026-01-09 Thread Jeffrey Law




On 1/9/2026 12:27 AM, Richard Biener wrote:

> Does it make sense to impose such clamping from extern as argument
> to genautomata?  That would leave the large reservations in the .md file
> for documentation purposes (I'd have added comments in places where
> you clamped at least for this reason)?

It conceptually makes sense and as Andrew notes there are other cases 
where we're clamping.


There are alternate approaches, like the one in the power port, where each
part's scheduling description often has multiple DFAs, with the
long-running reservations in one DFA and everything else in another.  When
the long-running reservations are in a distinct DFA, the explosion
problems aren't so bad.  The problem with that approach is you can't
model resource contention across the distinct DFAs, so if you're
focused on dispatch more than latency or other functional unit hazards, it
falls flat.
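
As a rough illustration of that layout, here is a minimal sketch of a
description that binds a long-running divider to its own automaton.  The
core name "foo" and every unit and reservation name below are
hypothetical, not taken from the power port or from any RISC-V
description; only the define_automaton, define_cpu_unit and
define_insn_reservation constructs themselves are real.

```
;; Hypothetical core "foo": two automata, one for the main pipe and
;; one that holds only the long-running divider.
(define_automaton "foo_main, foo_div")

;; Each unit is bound to exactly one automaton.
(define_cpu_unit "foo_pipe" "foo_main")
(define_cpu_unit "foo_divider" "foo_div")

;; Short reservation, lives entirely in foo_main.
(define_insn_reservation "foo_alu" 1
  (and (eq_attr "tune" "foo")
       (eq_attr "type" "arith"))
  "foo_pipe")

;; One issue cycle in the main pipe, then the divider is held for
;; 34 cycles; the long part of the reservation is in foo_div.
(define_insn_reservation "foo_idiv" 35
  (and (eq_attr "tune" "foo")
       (eq_attr "type" "idiv"))
  "foo_pipe, foo_divider*34")
```

The intent is that the 34-cycle reservation only grows the state space of
foo_div, at the price described above.  Whether that is enough for a given
core is not something measured in this thread.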


It's unclear how clamping those cases would behave, though I suspect it 
just doesn't matter in practice.


Jeff





Re: [to-be-committed][RISC-V] Clamp long reservations to 7c

2026-01-09 Thread Andrew Pinski
On Thu, Jan 8, 2026 at 11:28 PM Richard Biener wrote:
>
> On Fri, Jan 9, 2026 at 12:33 AM Jeffrey Law wrote:
> >
> > So I've been noticing the cycle time for a native build/test on the
> > Pioneer and BPI rising over the last many months.  I've suspected a pain
> > point is likely genautomata due to long reservations in the DFAs.
> > Trying to describe a 30+ cycle bubble in the pipeline just isn't useful
> > and causes the DFA to blow up.
> >
> > This is the time to build insn-automata.cc using an optimized genautomata
> > using my skylake server cross compiling to riscv64. The baseline is what
> > we have today.  Then I clamped the reservations (but not the latency) to
> > 7c.  7c is arbitrary, but known not to blow up the DFA.  I fixed the BPI
> > first, then the Andes 23 and so-on.
> >
> > Baseline     52s
> > BPI          52s
> > Andes-23     45s
> > Andes-25     16s
> > Andes-45     16s
> > Generic      15s
> > Mips-8700    15s
> > Sifive-7     13s
> > Final        13s
> >
> > That's a significant improvement, though I probably wouldn't go forward
> > with just that improvement.  It's less than a minute and skylake systems
> > aren't exactly new anymore...
> >
> > Let's try that with an unoptimized genautomata.  I often build that way
> > when debugging.
> >
> >
> > Baseline    343s
> > Final        79s
> >
> > So that's saving ~4m on my skylake server for a common build. Given I
> > use ccache, that 4m is often a significant amount of the build time.  So
> > this feels like a better motivating example.
> >
> > But I'm really after bringing down bootstrap cycle times on the BPI and
> > Pioneer.  So let's see what the BPI does.  For an optimized genautomata
> > we get (not testing all the intermediate steps):
> >
> > Baseline 310s
> > Final:   110s
> >
> > Not bad.  And if we look at unoptimized genautomata:
> >
> > Baseline:   2196s
> > Final:   553s
> >
> > Now we can see why bootstrap times have crept up meaningfully. That's
> > ~27 minutes out of a 9hr bootstrap time on the BPI (pure bootstrap, no
> > testing).  The effect is more pronounced on the Pioneer where the
> > improvement is 30+ minutes on a 4hr bootstrap time (each core is slower,
> > but there's 8x as many cores).
> >
> > Tested on riscv{32,64}-elf and bootstrapped on the Pioneer (regression
> > testing in progress).  I'll wait for pre-commit CI to do its thing.
>
> Does it make sense to impose such clamping from extern as argument
> to genautomata?  That would leave the large reservations in the .md file
> for documentation purposes (I'd have added comments in places where
> you clamped at least for this reason)?

Considering this is the 4th target that has done the clamping, I think
we should do the clamping in a generic way.
aarch64, mips and x86_64 were the other 3 I know of that have done the
clamping before.

Thanks,
Andrew


>
> Richard.
>
> >
> >
> > Jeff


Re: [to-be-committed][RISC-V] Clamp long reservations to 7c

2026-01-08 Thread Richard Biener
On Fri, Jan 9, 2026 at 12:33 AM Jeffrey Law wrote:
>
> So I've been noticing the cycle time for a native build/test on the
> Pioneer and BPI rising over the last many months.  I've suspected a pain
> point is likely genautomata due to long reservations in the DFAs.
> Trying to describe a 30+ cycle bubble in the pipeline just isn't useful
> and causes the DFA to blow up.
>
> This is the time to build insn-automata.cc using an optimized genautomata
> using my skylake server cross compiling to riscv64. The baseline is what
> we have today.  Then I clamped the reservations (but not the latency) to
> 7c.  7c is arbitrary, but known not to blow up the DFA.  I fixed the BPI
> first, then the Andes 23 and so-on.
>
> Baseline     52s
> BPI          52s
> Andes-23     45s
> Andes-25     16s
> Andes-45     16s
> Generic      15s
> Mips-8700    15s
> Sifive-7     13s
> Final        13s
>
> That's a significant improvement, though I probably wouldn't go forward
> with just that improvement.  It's less than a minute and skylake systems
> aren't exactly new anymore...
>
> Let's try that with an unoptimized genautomata.  I often build that way
> when debugging.
>
>
> Baseline    343s
> Final        79s
>
> So that's saving ~4m on my skylake server for a common build. Given I
> use ccache, that 4m is often a significant amount of the build time.  So
> this feels like a better motivating example.
>
> But I'm really after bringing down bootstrap cycle times on the BPI and
> Pioneer.  So let's see what the BPI does.  For an optimized genautomata
> we get (not testing all the intermediate steps):
>
> Baseline 310s
> Final:   110s
>
> Not bad.  And if we look at unoptimized genautomata:
>
> Baseline:   2196s
> Final:   553s
>
> Now we can see why bootstrap times have crept up meaningfully. That's
> ~27 minutes out of a 9hr bootstrap time on the BPI (pure bootstrap, no
> testing).  The effect is more pronounced on the Pioneer where the
> improvement is 30+ minutes on a 4hr bootstrap time (each core is slower,
> but there's 8x as many cores).
>
> Tested on riscv{32,64}-elf and bootstrapped on the Pioneer (regression
> testing in progress).  I'll wait for pre-commit CI to do its thing.

Does it make sense to impose such clamping from extern as argument
to genautomata?  That would leave the large reservations in the .md file
for documentation purposes (I'd have added comments in places where
you clamped at least for this reason)?
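
One way such in-place documentation could look, as a sketch only: the
comment wording below is illustrative and not part of the posted patch,
while the reservation and the numbers come from the andes-23-series.md
hunk of the patch.

```
;; The unclamped description reserved andes_23_mdu for 34 cycles.
;; The reservation is clamped to 6 (7c in total, counting the
;; andes_23_pipe_unify cycle) to keep the DFA small.  The latency
;; of 35 is unchanged.
(define_insn_reservation "andes_23_idivsi" 35
  (and (eq_attr "tune" "andes_23_series")
       (and (eq_attr "type" "idiv")
            (eq_attr "mode" "SI")))
  "andes_23_pipe_unify, andes_23_mdu*6")
```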

Richard.

>
>
> Jeff


[to-be-committed][RISC-V] Clamp long reservations to 7c

2026-01-08 Thread Jeffrey Law
So I've been noticing the cycle time for a native build/test on the 
Pioneer and BPI rising over the last many months.  I've suspected a pain 
point is likely genautomata due to long reservations in the DFAs.  
Trying to describe a 30+ cycle bubble in the pipeline just isn't useful 
and causes the DFA to blow up.


This is the time to build insn-automata.cc using an optimized genautomata
using my skylake server cross compiling to riscv64. The baseline is what 
we have today.  Then I clamped the reservations (but not the latency) to 
7c.  7c is arbitrary, but known not to blow up the DFA.  I fixed the BPI 
first, then the Andes 23 and so-on.


Baseline     52s
BPI          52s
Andes-23     45s
Andes-25     16s
Andes-45     16s
Generic      15s
Mips-8700    15s
Sifive-7     13s
Final        13s

That's a significant improvement, though I probably wouldn't go forward 
with just that improvement.  It's less than a minute and skylake systems 
aren't exactly new anymore...


Let's try that with an unoptimized genautomata.  I often build that way 
when debugging.



Baseline    343s
Final        79s

So that's saving ~4m on my skylake server for a common build. Given I 
use ccache, that 4m is often a significant amount of the build time.  So 
this feels like a better motivating example.


But I'm really after bringing down bootstrap cycle times on the BPI and 
Pioneer.  So let's see what the BPI does.  For an optimized genautomata 
we get (not testing all the intermediate steps):


Baseline:    310s
Final:       110s

Not bad.  And if we look at unoptimized genautomata:

Baseline:   2196s
Final:       553s

Now we can see why bootstrap times have crept up meaningfully. That's 
~27 minutes out of a 9hr bootstrap time on the BPI (pure bootstrap, no 
testing).  The effect is more pronounced on the Pioneer where the 
improvement is 30+ minutes on a 4hr bootstrap time (each core is slower, 
but there's 8x as many cores).


Tested on riscv{32,64}-elf and bootstrapped on the Pioneer (regression 
testing in progress).  I'll wait for pre-commit CI to do its thing.



Jeff









gcc/
* config/riscv/andes-23-series.md: Clamp reservations to 7c.
* config/riscv/andes-25-series.md: Likewise.
* config/riscv/andes-45-series.md: Likewise.
* config/riscv/generic.md: Likewise.
* config/riscv/mips-p8700.md: Likewise.
* config/riscv/sifive-7.md: Likewise.
* config/riscv/spacemit-x60.md: Likewise.

diff --git a/gcc/config/riscv/andes-23-series.md b/gcc/config/riscv/andes-23-series.md
index 8e19e05da17d..a1bb3235903f 100644
--- a/gcc/config/riscv/andes-23-series.md
+++ b/gcc/config/riscv/andes-23-series.md
@@ -72,13 +72,13 @@ (define_insn_reservation "andes_23_idivsi" 35
   (and (eq_attr "tune" "andes_23_series")
        (and (eq_attr "type" "idiv")
             (eq_attr "mode" "SI")))
-  "andes_23_pipe_unify, andes_23_mdu* 34")
+  "andes_23_pipe_unify, andes_23_mdu* 6")
 
 (define_insn_reservation "andes_23_idivdi" 35
   (and (eq_attr "tune" "andes_23_series")
        (and (eq_attr "type" "idiv")
             (eq_attr "mode" "DI")))
-  "andes_23_pipe_unify, andes_23_mdu* 34")
+  "andes_23_pipe_unify, andes_23_mdu* 6")
 
 (define_insn_reservation "andes_23_xfer" 1
   (and (eq_attr "tune" "andes_23_series")
@@ -103,12 +103,12 @@ (define_insn_reservation "andes_23_fpu_mac" 4
 (define_insn_reservation "andes_23_fpu_div" 33
   (and (eq_attr "tune" "andes_23_series")
        (eq_attr "type" "fdiv"))
-  "andes_23_pipe_unify, andes_23_fpu*33")
+  "andes_23_pipe_unify, andes_23_fpu*6")
 
 (define_insn_reservation "andes_23_fpu_sqrt" 33
   (and (eq_attr "tune" "andes_23_series")
        (eq_attr "type" "fsqrt"))
-  "andes_23_pipe_unify, andes_23_fpu*33")
+  "andes_23_pipe_unify, andes_23_fpu*6")
 
 (define_insn_reservation "andes_23_fpu_move" 2
   (and (eq_attr "tune" "andes_23_series")
diff --git a/gcc/config/riscv/andes-25-series.md b/gcc/config/riscv/andes-25-series.md
index ef1a926de864..bb22ffbc2467 100644
--- a/gcc/config/riscv/andes-25-series.md
+++ b/gcc/config/riscv/andes-25-series.md
@@ -88,13 +88,13 @@ (define_insn_reservation "andes_25_idivsi" 38
   (and (eq_attr "tune" "andes_25_series")
        (and (eq_attr "type" "idiv")
             (eq_attr "mode" "SI")))
-  "andes_25_pipe, andes_25_mdu * 34")
+  "andes_25_pipe, andes_25_mdu * 6")
 
 (define_insn_reservation "andes_25_idivdi" 70
   (and (eq_attr "tune" "andes_25_series")
        (and (eq_attr "type" "idiv")
             (eq_attr "mode" "DI")))
-  "andes_25_pipe, andes_25_mdu * 66")
+  "andes_25_pipe, andes_25_mdu * 6")
 
 (define_insn_reservation "andes_25_xfer" 1
   (and (eq_attr "tune" "andes_25_series")
@@ -119,12 +119,12 @@ (define_insn_reservation "andes_25_fpu_mac" 5
 (define_insn_reservation "andes_25_fpu_div" 33
   (and (eq_attr "tune" "andes_25_series")
        (eq_attr "type" "fdiv"))
-  "andes_25_fpu_arith, andes_25_fpu_eu * 27")
+  "andes_25_fpu_arith, andes_25_fpu_eu * 6")
 
 (define_insn_reservation "andes_2