Hi all, I am running Spark 2.3 on YARN, using HDP 2.6.
I am running a Spark job with dynamic resource allocation on YARN, with a minimum of 2 executors and a maximum of 6. The job reads data from Parquet files stored in S3 buckets and writes enriched data to Cassandra.

My question is: how does YARN decide on which nodes to launch containers? I have around 12 YARN nodes in the cluster, yet I repeatedly see a pattern of 3-4 containers for a particular job being launched on the same node. What is the best way to start debugging the reason for this?

Akshay Bhardwaj
+91-97111-33849
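For reference, a minimal sketch of the setup described above. The `spark.dynamicAllocation.*` and `spark.shuffle.service.enabled` properties are the standard Spark configuration names; the application file name is a placeholder:

```shell
# Sketch of the dynamic-allocation setup described above.
# "my_job.py" is a placeholder application, not from the original message.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=6 \
  --conf spark.shuffle.service.enabled=true \
  my_job.py

# One way to start debugging placement: the aggregated application logs
# list each container together with the host it ran on.
yarn logs -applicationId <application_id>
```

Container placement ultimately depends on the YARN scheduler in use (Capacity or Fair Scheduler on HDP) and on the locality preferences the Spark AM sends with its container requests, so the ResourceManager UI and scheduler logs are also worth checking.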