On Tue, Nov 04, 2025 at 08:33:05PM +0000, Jon Kohler wrote:
> 
> 
> > On Nov 3, 2025, at 4:14 PM, Daniel P. Berrangé <[email protected]> wrote:
> > 
> > On Mon, Nov 03, 2025 at 11:57:50AM -0700, Jon Kohler wrote:
> >> Increase MAX_MEM_PREALLOC_THREAD_COUNT from 16 to 32. This was last
> >> touched in 2017 [1] and, since then, physical machine sizes and the
> >> VMs therein have continued to get even bigger, both on average and
> >> on the extremes.
> >> 
> >> For very large VMs, using 16 threads to preallocate memory can be a
> >> non-trivial bottleneck during VM start-up and migration. Increasing
> >> this limit to 32 threads reduces the time taken for these operations.
> >> 
> >> Test results from a quad socket Intel 8490H (4x 60 cores) show a
> >> fairly linear gain of 50% with the 2x thread count increase.
> >> 
> >> ---------------------------------------------
> >> Idle Guest w/ 2M HugePages   | Start-up time
> >> ---------------------------------------------
> >> 240 vCPU, 7.5TB (16 threads) | 2m41.955s
> >> ---------------------------------------------
> >> 240 vCPU, 7.5TB (32 threads) | 1m19.404s
> >> ---------------------------------------------
> > 
> > If we're configuring a guest with 240 vCPUs, then this implies the
> > admin is expecting that the guest will consume up to 240 host CPUs
> > worth of compute time.
> > 
> > What is the purpose of limiting the number of prealloc threads to a
> > value that is an order of magnitude less than the number of vCPUs
> > the guest has been given ?
> 
> Daniel - thanks for the quick review and thoughts here.
> 
> I looked back through the original commits that led up to the current
> 16 thread max, and it wasn’t immediately clear to me why we clamped it
> at 16. Perhaps there was some other contention at the time.
> 
> > Have you measured what startup time would look like with 240 prealloc
> > threads ? Do we hit some scaling limit before that point making more
> > prealloc threads counter-productive ?
> 
> I have, and it isn’t wildly better; it comes down to about 50-ish
> seconds, as you start running into practical limitations on the speed
> of memory, as well as context switching if you’re doing other things
> on the host at the same time.
> 
> In playing around with some other values, here’s how they shake out:
> 32 threads:  1m19s
> 48 threads:  1m4s
> 64 threads:  59s
> …
> 240 threads: 50s
> 
> This also looks much less exciting when the amount of memory is
> smaller: I’m testing with 7.5TB, and anything smaller than that gets
> less and less fun from a speedup perspective.
> 
> Putting that all together, 32 seemed like a sane number with a solid
> speedup on fairly modern hardware.
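For readers following the thread: the cap under discussion is the
MAX_MEM_PREALLOC_THREAD_COUNT constant in QEMU's util/oslib-posix.c.
A minimal sketch of how it clamps the final thread count, paraphrased
from memory rather than the verbatim QEMU source (the helper name and
signature here may not match current code exactly):

#include <stddef.h>
#include <unistd.h>

#define MAX_MEM_PREALLOC_THREAD_COUNT 32  /* was 16 before this change */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Paraphrased clamp: the effective thread count is the smallest of
 * the online host CPUs, the compile-time cap, and the caller's
 * requested maximum, and never more than the number of pages.
 */
static int get_memset_num_threads(size_t numpages, int max_threads)
{
    long host_procs = sysconf(_SC_NPROCESSORS_ONLN);
    int ret = 1;

    if (host_procs > 0) {
        ret = MIN(MIN(host_procs, MAX_MEM_PREALLOC_THREAD_COUNT),
                  max_threads);
    }
    if ((size_t)ret > numpages) {
        ret = (int)numpages;
    }
    return ret;
}

The upshot is that even a 240-vCPU guest is held to the compile-time
cap, so the constant, not the vCPU count, dominates the start-up
numbers above.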
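And for anyone wanting to reproduce the diminishing-returns curve
without booting a full guest, a small standalone harness along these
lines (hypothetical, not part of QEMU; the file name, the 4 KiB touch
stride, and the anonymous mapping are all illustrative) should show
the same memory-bandwidth knee as the thread count grows:

/*
 * Hypothetical harness to measure how parallel first-touch of memory
 * scales with thread count.
 * Build:  gcc -O2 -pthread touch_bench.c -o touch_bench
 * Run:    ./touch_bench <threads> <total_mib>
 * Uses an anonymous mapping so it runs anywhere; add MAP_HUGETLB
 * (with hugepages reserved) to mimic the 2M-hugepage case above.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>

struct slice { char *base; size_t len; };

static void *touch_slice(void *arg)
{
    struct slice *s = arg;
    /* One write per 4 KiB page forces allocation, like prealloc. */
    for (size_t off = 0; off < s->len; off += 4096) {
        s->base[off] = 1;
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int nthreads = argc > 1 ? atoi(argv[1]) : 4;
    size_t total = (argc > 2 ? (size_t)atoll(argv[2]) : 1024) << 20;

    if (nthreads < 1) {
        nthreads = 1;
    }
    char *mem = mmap(NULL, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    pthread_t tids[nthreads];
    struct slice slices[nthreads];
    size_t chunk = total / nthreads;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++) {
        slices[i].base = mem + (size_t)i * chunk;
        slices[i].len = (i == nthreads - 1) ? total - (size_t)i * chunk
                                            : chunk;
        pthread_create(&tids[i], NULL, touch_slice, &slices[i]);
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d threads: %.3f s (%.1f GiB/s)\n",
           nthreads, secs, total / secs / (1 << 30));
    return 0;
}

Running it with 16, 32, 64, ... threads over a large mapping should
flatten out once the socket's memory bandwidth saturates, in line with
the 32-to-240-thread numbers quoted above.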
Yep, that's useful background; I've no objection to picking 32.

Perhaps worth putting a bit more of these details into the commit
message as background.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
