Hello, I run a Spark cluster on YARN, and we have a bunch of client-mode applications that we use for interactive work. Whenever we start one of these, an application master (AM) container is started.
My understanding is that this AM is mostly an empty shell, used only to request further containers from YARN or to get status from it. Is that correct?

spark.yarn.am.cores is 1, so each AM gets one full vCore on the cluster. Because I use DominantResourceCalculator to take vCores into account for scheduling, this leaves a lot of CPU capacity unused overall, since every one of those AMs blocks one full vCore. With enough jobs, this adds up quickly.

I am trying to understand whether we can work around that: ideally by allocating fractional vCores (e.g., giving the AM 100 millicores), or by allocating no vCores at all to the AM (I am fine with the bit of oversubscription that would cause). Any ideas on how to avoid blocking so many YARN vCores just for the Spark AMs? Thanks!
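
P.S. To make the overhead concrete, here is a rough sketch of the arithmetic behind the concern above. The application count and cluster size are made-up illustration values, not measurements from our cluster, and the helper name am_vcore_overhead is hypothetical; only the one vCore per AM (spark.yarn.am.cores = 1) comes from our actual setup.

```python
def am_vcore_overhead(num_apps, am_cores=1, cluster_vcores=200):
    """Return (vCores held by AMs, fraction of cluster vCores they represent).

    num_apps       -- number of running client-mode applications (hypothetical)
    am_cores       -- value of spark.yarn.am.cores (1 in our setup)
    cluster_vcores -- total schedulable vCores in the cluster (hypothetical)
    """
    blocked = num_apps * am_cores
    return blocked, blocked / cluster_vcores


# Illustration only: 40 interactive sessions on a 200-vCore cluster.
blocked, share = am_vcore_overhead(num_apps=40, am_cores=1, cluster_vcores=200)
print(f"{blocked} vCores held by AMs ({share:.0%} of the cluster)")
```

With those assumed numbers, a fifth of the schedulable vCores would be reserved for AMs that are mostly idle, which is the kind of waste I would like to avoid.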