Have you checked the yarn.scheduler.fair.assignmultiple and yarn.scheduler.fair.max.assign parameters in the ResourceManager configuration?
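In case it helps, a minimal yarn-site.xml sketch showing where those two Fair Scheduler properties go; the values are illustrative, not recommendations (assignmultiple defaults to false, and max.assign defaults to -1, i.e. unlimited):

```xml
<!-- yarn-site.xml (illustrative values) -->
<property>
  <name>yarn.scheduler.fair.assignmultiple</name>
  <!-- true lets the Fair Scheduler assign several containers on a single
       node heartbeat, which tends to pack one node before moving on -->
  <value>true</value>
</property>
<property>
  <name>yarn.scheduler.fair.max.assign</name>
  <!-- caps how many containers are assigned per heartbeat when
       assignmultiple is true; -1 (the default) means no cap -->
  <value>3</value>
</property>
```

Setting assignmultiple to false (or keeping it true with a small max.assign) spreads assignments across node heartbeats, which can counteract the bin-packing behavior described below.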
On Wed, Jan 9, 2019 at 9:49 AM Or Raz <r...@post.bgu.ac.il> wrote:
> How can I change/suggest a different allocation of containers to tasks in
> Hadoop? This concerns a native Hadoop (2.9.1) cluster on AWS.
>
> I am running a native Hadoop cluster (2.9.1) on AWS (with EC2, not EMR),
> and I want the scheduling/allocation of the containers (Mappers/Reducers)
> to be more balanced than it currently is. The RM seems to assign the
> Mappers in a bin-packing way (where the data resides), while the Reducers
> look more balanced. My setup has three machines with replication factor
> three (all the data is on every machine), and I run my jobs with
> mapreduce.job.reduce.slowstart.completedmaps=0 to start the shuffle as
> early as possible (it is vital for me that all the containers run
> concurrently; that is a must). Given the EC2 instances I chose and my
> YARN settings, I can run at most 93 containers (31 per machine).
>
> For example, if I want nine Reducers, then (93 - 9 - 1 = 83) 83 containers
> are left for the Mappers, and one is for the AM. I have played with the
> input split size (mapreduce.input.fileinputformat.split.minsize,
> mapreduce.input.fileinputformat.split.maxsize) to find the right balance
> where all of the machines do the same "work" in the map phase. But it
> seems the first 31 Mappers get allocated on one machine, the next 31 on
> the second, and the last 31 on the third. So I can try 87 Mappers, with
> 31 on Machine #1, 31 on Machine #2, and 25 on Machine #3, and the rest is
> left for the Reducers; but since Machines #1 and #2 are fully occupied,
> the Reducers must be placed on Machine #3. This way I get an almost
> balanced allocation of Mappers at the expense of an unbalanced Reducer
> allocation. And this is not what I want...
>
> # of mappers = size_input / split_size [Bytes]
>
> split_size = max(mapreduce.input.fileinputformat.split.minsize,
>                  min(mapreduce.input.fileinputformat.split.maxsize,
>                      dfs.blocksize))
>
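The quoted formula can be sketched in Python to check how a given min/max split setting translates into a mapper count. The input size, block size, and target of 83 mappers below are made-up illustrations, not measurements from the cluster described above:

```python
def split_size(min_size: int, max_size: int, block_size: int) -> int:
    # split_size = max(minsize, min(maxsize, dfs.blocksize))
    return max(min_size, min(max_size, block_size))

def num_mappers(input_bytes: int, min_size: int, max_size: int,
                block_size: int) -> int:
    # One mapper per split; the last split may be smaller, hence ceil.
    size = split_size(min_size, max_size, block_size)
    return -(-input_bytes // size)  # ceiling division

# Hypothetical job: 10 GiB of input, 128 MiB HDFS blocks.
input_bytes = 10 * 1024**3
block = 128 * 1024**2

# With effectively-unbounded maxsize and minsize=1, the split equals the
# block size, giving 10 GiB / 128 MiB = 80 mappers.
print(num_mappers(input_bytes, 1, 2**63 - 1, block))  # 80

# To hit a target such as the 83 map slots computed above (93 - 9 - 1),
# lower maxsize below the block size so splits shrink: 10 GiB / 83
# rounds up to 129366485 bytes.
print(num_mappers(input_bytes, 1, 129366485, block))  # 83
```

Note this only controls how many mappers exist, not where YARN places them; placement is driven by the scheduler (e.g. the Fair Scheduler assignment settings) and data locality.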