I'll share some data from my daily work.
My server has 128 CPUs and 256 GB of memory. With two world builds running (40,000+ tasks each), I hit OOM several times. I once got an OOM even with only a single world build. So I mounted an extra 256 GB of swap on that server.
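For reference, the extra swap Qi describes could be added roughly like this. The path and persistence step are illustrative, not taken from the mail, and all of these commands need root:

```
# Create and enable a 256 GB swap file (size from the mail; path is an example).
fallocate -l 256G /swapfile
chmod 600 /swapfile
mkswap /swapfile       # write the swap signature
swapon /swapfile       # enable it for the running system
# Optionally persist across reboots:
echo '/swapfile none swap sw 0 0' >> /etc/fstab
```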

I just noticed that the 64 cap has been in place since 2021, so all of these OOMs happened even with the cap. We'd better not raise this default cap unless it's proven to be really necessary.

Regards,
Qi


On 11/29/24 02:49, Jörg Sommer via lists.openembedded.org wrote:
> Ross Burton wrote on Thu, 28 Nov, 13:36 (+0000):
> > On 27 Nov 2024, at 16:33, Jörg Sommer via lists.openembedded.org
> > <[email protected]> wrote:
> > > From: Jörg Sommer <[email protected]>

> > > We have a system with 96 CPUs, and 128 are not uncommon. The cap of 64
> > > limits the number of parallel tasks make or ninja spawns, because the
> > > value goes into `PARALLEL_MAKE ?= "-j ${@oe.utils.cpu_count()}"`. If
> > > only a single recipe gets built (e.g. rust, due to dependencies), this
> > > leaves one third of our CPUs idle.
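The capping behaviour being discussed can be sketched in Python. This is an assumption modelled on the described behaviour of `oe.utils.cpu_count()`, not the exact OE-Core implementation; the parameter names are illustrative:

```python
import multiprocessing

def cpu_count(at_least=1, at_most=64):
    """Return the host CPU count, clamped to [at_least, at_most]."""
    return max(min(multiprocessing.cpu_count(), at_most), at_least)

# PARALLEL_MAKE defaults to "-j ${@oe.utils.cpu_count()}", so on a
# 96- or 128-core host the default still spawns at most 64 jobs.
parallel_make = "-j %d" % cpu_count()
```

This is why, without an explicit override, a 128-core machine never goes above `-j 64` for a single recipe.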
> > 192 seems like it was arbitrarily chosen as “more than your current
> > system”: if we’re doing that, then we should just remove the maximum cap
> > by reverting the commit that added it in the first place.
> >
> > The point of the default is to be reasonable, and in my benchmarking on a
> > system with 128 cores, going beyond 64 only gives you more chance of
> > OOMs, I/O contention, and other users of the presumably shared machine
> > being angry.
> How much RAM did this system have? Ours has 128 GB, and it's more or less
> empty during the build. Also, the NVMe RAID seldom shows any exhaustion,
> to the point that I can build at least two of our images in parallel in a
> tmpfs.
>
> I'm trying to record some graphs with systemd-bootchart. It's a bit
> tricky, because I have to split the recording. Do you know a better tool?
> When analysing a system, I watch with atop, but this doesn't give graphs.
> And our monitoring system contains too much information.

> > If you have a powerful server that only does a single build then you’re
> > welcome to set PARALLEL_MAKE = “-j128” in your build environment to take
> > full advantage of it.
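The override Ross suggests would go in the build's local configuration; the values here are examples matching the hardware in this thread, not recommendations:

```
# conf/local.conf -- override the capped default on a dedicated build server.
PARALLEL_MAKE = "-j 128"
# BB_NUMBER_THREADS controls how many BitBake tasks run concurrently and can
# be raised the same way if RAM and I/O can take it.
BB_NUMBER_THREADS = "128"
```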
> My other solution is to run three or more builds in parallel, but there
> are not often many images to build at the same time.
>
>
> Regards, Jörg



