Hi all, I am running Spark 2.3 on YARN, using HDP 2.6.
I am running a Spark job with dynamic resource allocation on YARN, with a minimum of 2 executors and a maximum of 6. The job reads data from Parquet files stored in S3 buckets and writes enriched data to Cassandra.

My question is: how does YARN decide on which nodes to launch containers? I have around 12 YARN nodes in the cluster, yet I repeatedly see a pattern of 3-4 containers for a particular job being launched on the same node. What is the best way to start debugging the reason for this?

Akshay Bhardwaj
+91-97111-33849
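For reference, a minimal sketch of the setup described above. The `spark.dynamicAllocation.*` and `spark.shuffle.service.enabled` properties are the standard Spark configuration names; the application file name is a placeholder:

```shell
# Sketch of the dynamic-allocation setup described above.
# "my_job.py" is a placeholder application, not from the original message.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=6 \
  --conf spark.shuffle.service.enabled=true \
  my_job.py

# One way to start debugging placement: the aggregated application logs
# list each container together with the host it ran on.
yarn logs -applicationId <application_id>
```

Container placement ultimately depends on the YARN scheduler in use (Capacity or Fair Scheduler on HDP) and on the locality preferences the Spark AM sends with its container requests, so the ResourceManager UI and scheduler logs are also worth checking.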