Hi Hari, Thanks for this information.
Do you have any resources on, or could you explain, why YARN has this as the default behaviour? What would be the advantages of, or scenarios for, multiple assignments in a single heartbeat?

Regards
Akshay Bhardwaj
+91-97111-33849

On Mon, May 20, 2019 at 1:29 PM Hariharan <hariharan...@gmail.com> wrote:
> Hi Akshay,
>
> I believe HDP uses the capacity scheduler by default. In the capacity
> scheduler, assignment of multiple containers on the same node is
> determined by the option
> yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled,
> which is true by default. If you would like YARN to spread out the
> containers, you can set this to false.
>
> You can read about this and the associated parameters here:
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>
> ~ Hari
>
> On Mon, May 20, 2019 at 11:16 AM Akshay Bhardwaj
> <akshay.bhardwaj1...@gmail.com> wrote:
> >
> > Hi All,
> >
> > Just floating this email again. Grateful for any suggestions.
> >
> > Akshay Bhardwaj
> > +91-97111-33849
> >
> > On Mon, May 20, 2019 at 12:25 AM Akshay Bhardwaj
> > <akshay.bhardwaj1...@gmail.com> wrote:
> >>
> >> Hi All,
> >>
> >> I am running Spark 2.3 on YARN using HDP 2.6.
> >>
> >> I am running a Spark job using dynamic resource allocation on YARN,
> >> with a minimum of 2 executors and a maximum of 6. My job reads data
> >> from Parquet files on S3 buckets and stores some enriched data in
> >> Cassandra.
> >>
> >> My question is: how does YARN decide on which nodes to launch
> >> containers? I have around 12 YARN nodes running in the cluster, but I
> >> still see repeated patterns of 3-4 containers launched on the same
> >> node for a particular job.
> >>
> >> What is the best way to start debugging this?
> >>
> >> Akshay Bhardwaj
> >> +91-97111-33849
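
P.S. For anyone following along: if you do want YARN to spread containers out, the knob Hari mentions lives in capacity-scheduler.xml on the ResourceManager. A sketch of the relevant properties, based on the CapacityScheduler documentation linked above (the value for the second property is illustrative, not a recommendation, and that property may not exist in older Hadoop releases):

```xml
<!-- capacity-scheduler.xml -->

<!-- Disable multiple container assignments per node heartbeat, which
     encourages the scheduler to spread containers across nodes. -->
<property>
  <name>yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled</name>
  <value>false</value>
</property>

<!-- Alternatively, leave it enabled but cap how many containers may be
     assigned to a node in one heartbeat (illustrative value). -->
<property>
  <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments</name>
  <value>2</value>
</property>
```

After changing these, the scheduler configuration needs to be reloaded (e.g. `yarn rmadmin -refreshQueues`) or the ResourceManager restarted for the settings to take effect.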