On Fri, Jun 30, 2023 at 6:48 PM Julien Grall <jul...@xen.org> wrote:
> Hi George, > > On 30/06/2023 12:37, George Dunlap wrote: > > The credit scheduler tries as hard as it can to ensure that it always > > runs scheduling units with positive credit (PRI_TS_UNDER) before > > running those with negative credit (PRI_TS_OVER). If the next > > runnable scheduling unit is of priority OVER, it will always run the > > load balancer, which will scour the system looking for another > > scheduling unit of the UNDER priority. > > > > Unfortunately, as the number of cores on a system has grown, the cost > > of the work-stealing algorithm has dramatically increased; a recent > > trace on a system with 128 cores showed this taking over 50 > > microseconds. > > > > Add a parameter, load_balance_ratelimit, to limit the frequency of > > load balance operations on a given pcpu. Default this to 1 > > millisecond. > > > > Invert the load balancing conditional to make it more clear, and line > > up more closely with the comment above it. > > > > Overall it might be cleaner to have the last_load_balance checking > > happen inside csched_load_balance(), but that would require either > > passing both now and spc into the function, or looking them up again; > > both of which seemed to be worse than simply checking and setting the > > values before calling it. > > > > Without this patch, on a system with a vcpu:pcpu ratio of 2:1, running > > Windows guests (which will end up calling YIELD during spinlock > > contention), this patch increased performance significantly. > > I don't understand this sentence. Did you intende to write > > "Without this patch, ..., the performance are significantly worse"? > Hmm, yes this was bad editing. The first clause was written when I was expecting to include actual numbers; but when I looked at the internal numbers I had available, they weren't easy to summarize. The revised sentence should have simply been: "On a system with a vcpu:pcpu ratio of 2:1, running Windows guests (which will end up calling YIELD during spinlock contention), this patch increased performance significantly." > > --- a/docs/misc/xen-command-line.pandoc > > +++ b/docs/misc/xen-command-line.pandoc > > @@ -1856,6 +1856,12 @@ By default, Xen will use the INVPCID instruction > for TLB management if > > it is available. This option can be used to cause Xen to fall back to > > older mechanisms, which are generally slower. > > > > +### load-balance-ratelimit > > +> `= <integer>` > > + > > +The minimum interval between load balancing events on a given pcpu. > > +At the moment only credit honors this parameter. > > I would suggest to mention the default value. So a reader don't have to > look up in the code to find out. > > Also, AFAICT, there is max value. I would mention it here too. > Ack > > +/* > > + * Minimum delay, in microseconds, between load balance operations. > > + * This prevents spending too much time doing load balancing, > particularly > > + * when the system has a high number of YIELDs due to spinlock priority > inversion. > > + */ > > +static unsigned int __read_mostly load_balance_ratelimit_us = > CSCHED_DEFAULT_LOAD_BALANCE_RATELIMIT_US; > > AFAICT, load_balance_ratelimit_us is not updated after boot. So > shouldn't the attribute be __ro_after_init? > Ack > +#define XEN_SYSCTL_CSCHED_LB_RATE_MAX_US (1000000) > > + uint32_t load_balance_ratelimit_us; > > Shouldn't this change bump the sysctl interface version? > Er, yes. v2 on its way. -George