Hello,

I run a Spark cluster on YARN, and we have a bunch of client-mode applications 
we use for interactive work. Whenever we start one of these, an application 
master container is started.

My understanding is that this is mostly an empty shell, used to request further 
containers or get status from YARN. Is that correct?

spark.yarn.am.cores is 1, and that AM gets one full vCore on the cluster. 
Because I am using DominantResourceCalculator to take vCores into account for 
scheduling, this results in a lot of unused CPU capacity overall because all 
those AMs each block one full vCore. With enough jobs, this adds up quickly.
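
To illustrate the arithmetic, here is a rough sketch. The session count and cluster size are made-up numbers for illustration, not measurements from my cluster, and the helper name is hypothetical.

```python
def am_vcore_overhead(num_apps: int, am_cores: int, cluster_vcores: int) -> float:
    """Return the fraction of cluster vCores reserved by AMs alone."""
    if cluster_vcores <= 0:
        raise ValueError("cluster_vcores must be positive")
    return (num_apps * am_cores) / cluster_vcores


# Hypothetical example: 40 interactive client-mode sessions on a
# 200-vCore cluster, each AM requesting spark.yarn.am.cores = 1.
overhead = am_vcore_overhead(num_apps=40, am_cores=1, cluster_vcores=200)
print(f"{overhead:.0%} of cluster vCores are held by AMs")
```

With those assumed numbers, a fifth of the schedulable vCores would be reserved by containers that are mostly idle.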

I am trying to understand if we can work around that -- ideally, by allocating 
fractional vCores (e.g., give 100 millicores to the AM), or by allocating no 
vCores at all for the AM (I am fine with a bit of oversubscription because of 
that).

Any idea on how to avoid blocking so many YARN vCores just for the Spark AMs?

Thanks!
