On Thu, Nov 27, 2025 at 04:55:17PM +0200, Grygorii S wrote:
> Hi All,
> 
> On 21.06.25 17:14, Koichiro Den wrote:
> > When a running unit is about to be scheduled out due to a competing unit
> > with the highest remaining credit, the residual credit of the previous
> > unit is currently ignored in csched2_runtime() because it hasn't yet
> > been reinserted into the runqueue.
> > 
> > As a result, two equally weighted, busy units can often each be granted
> > almost the maximum possible runtime (i.e. consuming CSCHED2_CREDIT_INIT
> > in one shot) when only those two are active. In broad strokes two units
> > switch back and forth every 10ms (CSCHED2_MAX_TIMER). In contrast, when
> > more than two busy units are competing, such coarse runtime allocations
> > are rarely seen, since at least one active unit remains in the runqueue.
> > 
> > To ensure consistent behavior, have csched2_runtime() take into account
> > the previous unit's latest credit when it still can/wants to run.
> > 
> > Signed-off-by: Koichiro Den <[email protected]>
> > Reviewed-by: Juergen Gross <[email protected]>
> > ---
> >   xen/common/sched/credit2.c | 28 +++++++++++++++++++++-------
> >   1 file changed, 21 insertions(+), 7 deletions(-)
> > 
> 
> We observe a regression on ARM64 with this patch:
> commit ae648e9f8013 ("xen/credit2: factor in previous active unit's credit in
> csched2_runtime()")
> 
> general observation:
>  This commit increases Linux guest boot time by more than 5 times in some of
>  our credit2-specific tests.
>  Reverting it makes the issue go away.
> 
>  - normal log
>    (XEN) DOM1: [    6.496166] io scheduler bfq registered
>    ...
>    (XEN) DOM1: [    9.845108] Freeing unused kernel memory: 9216K
>    (XEN) DOM1: [    9.874792] Run /init as init process
>    (XEN) sched_smt_power_savings: disabled
>    (XEN) NOW=16800131328
> 
>  - failed log
>    (XEN) DOM1: [   37.281776] io scheduler bfq registered
>    (XEN) DOM1: [   61.856512] EINJ: ACPI disabled.
>    test: timed out
> 
> Run Details:
>  Platform: ARM64 (Device Tree)
>  Execution platform: qemu 6.0 (2 pCPU, 2G)
>  Boot: dom0less, 1 domain (2 vCPU)
>  Command line: "console=dtuart guest_loglvl=debug conswitch=ax"
> 
>  Dom0less cfg:
>     chosen {
>         xen,xen-bootargs = "console=dtuart guest_loglvl=debug conswitch=ax";
>         #size-cells = <0x00000002>;
>         #address-cells = <0x00000002>;
>         stdout-path = "/pl011@9000000";
>         kaslr-seed = <0x5a7b5649 0x9122e194>;
>         cpupool_0 {
>             cpupool-sched = "credit2";
>             cpupool-cpus = <0x00008001>;
>             compatible = "xen,cpupool";
>             phandle = <0xfffffffe>;
>         };
>         domU0 {
>             domain-cpupool = <0xfffffffe>;
>             vpl011;
>             cpus = <0x00000002>;
>             memory = <0x00000000 0x00040000>;
>             #size-cells = <0x00000002>;
>             #address-cells = <0x00000002>;
>             compatible = "xen,domain";
>             module@42E00000 {
>                 reg = <0x00000000 0x42e00000 0x00000000 0x000f1160>;
>                 compatible = "multiboot,ramdisk", "multiboot,module";
>             };
>             module@40400000 {
>                 bootargs = "console=ttyAMA0";
>                 reg = <0x00000000 0x40400000 0x00000000 0x02920000>;
>                 compatible = "multiboot,kernel", "multiboot,module";
>             };
>         };
>     };
> 
> Investigation:
>  It was narrowed down to a specific configuration with a cpupool assigned to
>  the domain (100% reproducible):
>  Host has 2 pCPU
>  Domain has 2 vCPU
>  cpupool_0 has 1 pCPU (cpu@1 credit2)
>  domain <- cpupool_0
> 
>  If the domain is assigned 1 vCPU: no issues.
>  If cpupool_0 is assigned 2 pCPUs: no issues (it seems a bit slower, but that
>  is within the margin of error).
> 
> I'd appreciate any help with this (or a revert :().
> 
> -- 
> Best regards,
> -grygorii
> 

Hi Grygorii,

Thank you for the detailed report. Could you please try increasing
ratelimit_us (the -r/--ratelimit_us option), for example to 5000 or 10000
microseconds, and see whether the long boot time disappears? That would
help determine whether the previous behaviour (before the patch) had simply
masked the effect of the default 1ms rate limit in your setup. In other
words, now that the patch is merged, you may need to set -r/--ratelimit_us
explicitly to an appropriate value larger than 1ms.
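
To illustrate the reasoning, here is a rough model of how the granted
runtime is clamped. It is only a sketch under stated assumptions, not the
actual csched2_runtime(): the helper name, the microsecond units, the 1:1
mapping between credit and time (equal weights), and the 500us minimum
timer are assumptions made for the example.

```c
#include <stdint.h>

/* Assumed constants, in microseconds; modelled on credit2's timer limits. */
#define MODEL_MIN_TIMER_US   500   /* assumption: minimum timeslice */
#define MODEL_MAX_TIMER_US 10000   /* 10ms, cf. CSCHED2_MAX_TIMER above */

/*
 * Hypothetical helper, NOT the real csched2_runtime().
 *
 * next_credit_us:  credit of the unit about to run
 * rival_credit_us: credit of the best competing unit the scheduler can
 *                  see (0 if it sees none)
 * ratelimit_us:    rate limit; 0 disables it
 */
static int64_t model_runtime_us(int64_t next_credit_us,
                                int64_t rival_credit_us,
                                int64_t ratelimit_us)
{
    int64_t min_time = MODEL_MIN_TIMER_US;
    int64_t t = next_credit_us;   /* by default, run until credit hits 0 */

    /* The rate limit raises the lower bound of the timeslice. */
    if (ratelimit_us > min_time)
        min_time = ratelimit_us;

    /* If a rival with positive credit is visible, run only until the
     * two credits are roughly equal. */
    if (rival_credit_us > 0)
        t = next_credit_us - rival_credit_us;

    /* Clamp to [min_time, MODEL_MAX_TIMER_US]. */
    if (t < min_time)
        t = min_time;
    else if (t > MODEL_MAX_TIMER_US)
        t = MODEL_MAX_TIMER_US;

    return t;
}
```

In this model, model_runtime_us(10000, 0, 1000) grants the full 10000us
when the rival's credit is not visible, whereas model_runtime_us(10000,
9500, 1000) falls back to the 1000us rate limit once the rival is counted.
Raising the limit to 5000 yields 5000us for the same inputs. In a dom0less
setup without xl, the Xen command-line option sched_ratelimit_us is, as far
as I know, the boot-time way to set this value.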

That said, this change touches long-standing credit2 behaviour, and we
probably should've discussed backward-compatibility more carefully. I'm
completely fine with reverting it if maintainers think that is the best
choice for now. (To be honest, I hadn't even realised that this had been
merged until receiving your email, since it only had a single Reviewed-by.)

Best regards,

-Koichiro
