On Wed, 29 Sep 2021 10:22:39 +0100 Daniel P. Berrangé <berra...@redhat.com> wrote:
> On Wed, Sep 29, 2021 at 11:05:31AM +0200, Igor Mammedov wrote:
> > On Tue, 28 Sep 2021 00:47:01 +0800
> > Nan Wang <wangnan.li...@bytedance.com> wrote:
> >
> > > On 2021/9/27 11:16, David Hildenbrand wrote:
> > > > On 27.09.21 15:19, Nan Wang wrote:
> > > >> From: "wangnan.light" <wangnan.li...@bytedance.com>
> > > >>
> > > >> the default number of prealloc threads is 1, for huge memory backend
> > > >> file, single thread touch page is really slow.
> > > >> We can adjust thread number by prealloc-threads property, but if the
> > > >> default value updated to MachineState::smp::cpus may be better.
> > > >> For example, old version of qemu(prealloc-threads have not been
> > > >> introduced yet), the value of threads num is MachineState::smp::cpus,
> > > >> so if someone use the same commandline to start current version of qemu
> > > >> and old version of qemu which will lead to different behaviors.
> > > >
> > > > The introducing patch mentions:
> > > >
> > > > commit ffac16fab33bb42f17e47624985220c1fd864e9d
> > > > Author: Igor Mammedov <imamm...@redhat.com>
> > > > Date: Wed Feb 19 11:09:50 2020 -0500
> > > >
> > > >     hostmem: introduce "prealloc-threads" property
> > > >
> > > >     the property will allow user to specify number of threads to use
> > > >     in pre-allocation stage. It also will allow to reduce implicit
> > > >     hostmem dependency on current_machine.
> > > >     On object creation it will default to 1, but via machine
> > > >     compat property it will be updated to MachineState::smp::cpus
> > > >     to keep current behavior for hostmem and main RAM (which is
> > > >     now also hostmem based).
> > > >
> > > > So it looks like we want to do the latter via compat properties
> > > > eventually.
> > > >
> > > > However, I'd like to note that more prealloc threads might be good for
> > > > large backends, and might be bad for small backends. To me, it feels
> > > > like a workload that relies on this should really do this manually. So I
> > > > am still not sure if this is the right thing to do.
> > >
> > > Yes, I agree with you "more prealloc threads are good for large backends,
> > > and bad for small backends". But I think in most situations large backends
> > > always come with large vcpu numbers and small backends always with small vcpu
> > > numbers, because most users will not create a vm with large vcpu numbers
> > > with small memory.
> > >
> > > >
> > > > Note that qapi/qom.json:
> > > >
> > > > "@prealloc-threads: number of CPU threads to use for prealloc (default:
> > > > 1", so that doc would be wrong now.
> > > >
> > > > Why exactly can't workload that cares not simply set this manually?
> > > > Performance tuning smells like something to be done manually for a
> > > > specific workload.
> > > >
> > >
> > > It is a simple way that lets the workload set the prealloc threads manually.
> > > For example, for large backends it sets many prealloc threads, and sets 1
> > > prealloc thread manually for small backends. Yes, workload can
> > > `manually` set prealloc threads to 1, rather than use `default` value 1.
> > > So when workload wants to (or maybe just forgets to specify the
> > > prealloc-threads property) use the default value, I think
> > > MachineState::smp::cpus may be better than 1.
> >
> > as the commit mentioned by David states, it creates an implicit dependency
> > on Machine and we were working on getting rid of such dependencies
> > where it's possible.
> >
> > So if you have to change prealloc-threads to a larger number,
> > you can try to use specific machine compat properties to do it,
> > instead of pushing machine into generic backend code. But a 'good'
> > default for your workload doesn't guarantee it's a good one for
> > another.
> >
> > My preference is that the user (mgmt layer) should set the property
> > explicitly if it cares. It leads to a more stable VM config,
> > as opposed to using defaults which could change over time and
> > unexpectedly 'regress' such VMs, and can factor in host/workload
> > specific nuances without the need to change QEMU.
>
> Setting prealloc_threads to match vCPUs count feels like it is making
> an assumption that if we've allowed 4 vCPUs, it is OK for the prealloc
> to consume 4 host CPUs. This assumption could be tricky when QEMU is
> strictly pinned to host CPUs, as vCPU threads are pinned to some pCPUs
> but emulator threads might be pinned differently.
>
> Would there still be a performance advantage to prealloc_threads > 1,
> if all non-vCPU threads are pinned to the same single host CPU ?

I'd imagine it will only introduce unnecessary task contention.
The current conservative default (1) is the best we can do without
knowing the workload/host configuration, since it affects the host and
already running VMs less than a higher number of pre-allocation
threads would.

>
>
> Regards,
> Daniel
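
As a rough illustration of the "set it explicitly" approach argued for above, here is a minimal Python sketch of how a management layer might pick a prealloc-threads value and pass it on the QEMU command line. The helper names (choose_prealloc_threads, memory_backend_arg) and the 16 GiB-per-thread heuristic are hypothetical and are not something QEMU or libvirt provides; only the memory-backend-file properties used (id, size, mem-path, prealloc, prealloc-threads) are existing QEMU options. The thread count is capped by the number of host CPUs the emulator threads may run on, which reflects the pinning concern raised in this thread.

```python
import math


def choose_prealloc_threads(backend_gib, emulator_cpus, gib_per_thread=16):
    """Pick an explicit prealloc-threads value (hypothetical heuristic).

    backend_gib:    size of the memory backend in GiB
    emulator_cpus:  number of host CPUs the non-vCPU (emulator) threads
                    are allowed to run on
    gib_per_thread: assumed amount of memory one thread should touch;
                    an arbitrary illustrative value, tune per host
    """
    if backend_gib <= 0 or emulator_cpus < 1:
        raise ValueError("backend size and CPU count must be positive")
    # Larger backends ask for more threads, small ones stay at 1.
    wanted = max(1, math.ceil(backend_gib / gib_per_thread))
    # Never use more threads than host CPUs available to them,
    # otherwise the extra threads only contend with each other.
    return min(wanted, emulator_cpus)


def memory_backend_arg(backend_id, mem_path, backend_gib, threads):
    """Build the value for a '-object' option with an explicit thread count."""
    return (
        f"memory-backend-file,id={backend_id},size={backend_gib}G,"
        f"mem-path={mem_path},prealloc=on,prealloc-threads={threads}"
    )


if __name__ == "__main__":
    threads = choose_prealloc_threads(backend_gib=64, emulator_cpus=8)
    print("-object", memory_backend_arg("mem0", "/dev/hugepages", 64, threads))
```

With emulator threads pinned to a single host CPU, the sketch falls back to 1 thread regardless of backend size, and the resulting VM config no longer depends on whatever default QEMU happens to use.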