On Fri, Jun 30, 2023 at 6:48 PM Julien Grall <jul...@xen.org> wrote:

> Hi George,
>
> On 30/06/2023 12:37, George Dunlap wrote:
> > The credit scheduler tries as hard as it can to ensure that it always
> > runs scheduling units with positive credit (PRI_TS_UNDER) before
> > running those with negative credit (PRI_TS_OVER).  If the next
> > runnable scheduling unit is of priority OVER, it will always run the
> > load balancer, which will scour the system looking for another
> > scheduling unit of the UNDER priority.
> >
> > Unfortunately, as the number of cores on a system has grown, the cost
> > of the work-stealing algorithm has dramatically increased; a recent
> > trace on a system with 128 cores showed this taking over 50
> > microseconds.
> >
> > Add a parameter, load_balance_ratelimit, to limit the frequency of
> > load balance operations on a given pcpu.  Default this to 1
> > millisecond.
> >
> > Invert the load balancing conditional to make it more clear, and line
> > up more closely with the comment above it.
> >
> > Overall it might be cleaner to have the last_load_balance checking
> > happen inside csched_load_balance(), but that would require either
> > passing both now and spc into the function, or looking them up again;
> > both of which seemed to be worse than simply checking and setting the
> > values before calling it.
> >
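
Just to make the "check and set the values before calling it" pattern
concrete, here is a rough standalone sketch (hypothetical names such as
pcpu_state and can_load_balance; this is a simplification for illustration,
not the code from the patch):

/* Illustrative sketch only -- simplified from the real scheduler code. */
#include <stdbool.h>
#include <stdint.h>

/* Microseconds to nanoseconds, loosely mirroring Xen's MICROSECS() helper. */
#define MICROSECS(us) ((uint64_t)(us) * 1000ULL)

/* Hypothetical per-pcpu state; the real structure carries much more. */
struct pcpu_state {
    uint64_t last_load_balance;   /* time (ns) of the last balance attempt */
};

/* 1ms default, as described in the commit message. */
static uint64_t load_balance_ratelimit_us = 1000;

/*
 * Return true if this pcpu may run the (expensive) work-stealing pass now,
 * recording the attempt if so.
 */
static bool can_load_balance(struct pcpu_state *spc, uint64_t now)
{
    if ( now - spc->last_load_balance <= MICROSECS(load_balance_ratelimit_us) )
        return false;

    spc->last_load_balance = now;
    return true;
}

The caller does the work-stealing pass only when this returns true, so a
burst of YIELD-induced reschedules can't trigger it more than once per
ratelimit interval.
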
> > Without this patch, on a system with a vcpu:pcpu ratio of 2:1, running
> > Windows guests (which will end up calling YIELD during spinlock
> > contention), this patch increased performance significantly.
>
> I don't understand this sentence. Did you intend to write
>
> "Without this patch, ..., the performance is significantly worse"?
>

Hmm, yes, this was bad editing.  The first clause was written when I was
expecting to include actual numbers, but when I looked at the internal
numbers I had available, they weren't easy to summarize.  The revised
sentence should have simply been:

"On a system with a vcpu:pcpu ratio of 2:1, running Windows guests (which
will end up calling YIELD during spinlock contention), this patch increased
performance significantly."



> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -1856,6 +1856,12 @@ By default, Xen will use the INVPCID instruction for TLB management if
> >   it is available.  This option can be used to cause Xen to fall back to
> >   older mechanisms, which are generally slower.
> >
> > +### load-balance-ratelimit
> > +> `= <integer>`
> > +
> > +The minimum interval between load balancing events on a given pcpu.
> > +At the moment only credit honors this parameter.
>
> I would suggest mentioning the default value, so a reader doesn't have to
> look it up in the code to find out.
>
> Also, AFAICT, there is a max value. I would mention it here too.
>

Ack
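
Something along these lines for v2, then (wording not final; the 1000us
default and the 1000000us cap are the values from the patch itself):

    ### load-balance-ratelimit
    > `= <integer>`

    > Default: `1000`

    The minimum interval, in microseconds, between load balancing events on
    a given pcpu, up to a maximum of 1000000 (1 second).  At the moment only
    credit honors this parameter.
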


> > +/*
> > + * Minimum delay, in microseconds, between load balance operations.
> > + * This prevents spending too much time doing load balancing, particularly
> > + * when the system has a high number of YIELDs due to spinlock priority inversion.
> > + */
> > +static unsigned int __read_mostly load_balance_ratelimit_us = CSCHED_DEFAULT_LOAD_BALANCE_RATELIMIT_US;
>
> AFAICT, load_balance_ratelimit_us is not updated after boot. So
> shouldn't the attribute be __ro_after_init?
>

Ack
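
So for v2 that line becomes, roughly:

static unsigned int __ro_after_init load_balance_ratelimit_us =
    CSCHED_DEFAULT_LOAD_BALANCE_RATELIMIT_US;
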


> > +#define XEN_SYSCTL_CSCHED_LB_RATE_MAX_US (1000000)
> > +    uint32_t load_balance_ratelimit_us;
>
> Shouldn't this change bump the sysctl interface version?
>

Er, yes.
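
I.e. something like the below in xen/include/public/sysctl.h (the version
numbers here are purely illustrative -- the actual bump depends on what's
in the tree when this goes in):

-#define XEN_SYSCTL_INTERFACE_VERSION 0x00000015
+#define XEN_SYSCTL_INTERFACE_VERSION 0x00000016
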

v2 on its way.

 -George
