Have you checked the yarn.scheduler.fair.assignmultiple
and yarn.scheduler.fair.max.assign parameters in the ResourceManager
configuration?
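
With the Fair Scheduler, those two properties control how many containers
the RM hands out per node heartbeat; limiting them tends to spread
containers across the nodes instead of filling one node first. A minimal
sketch for yarn-site.xml (assuming you are actually running the Fair
Scheduler; the values here are illustrative):

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <!-- false = at most one container assigned per node heartbeat,
       so allocations rotate across nodes -->
  <name>yarn.scheduler.fair.assignmultiple</name>
  <value>false</value>
</property>
<property>
  <!-- only consulted when assignmultiple is true; caps how many
       containers a single heartbeat may assign -->
  <name>yarn.scheduler.fair.max.assign</name>
  <value>1</value>
</property>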

On Wed, Jan 9, 2019 at 9:49 AM Or Raz <r...@post.bgu.ac.il> wrote:

> How can I change/suggest a different allocation of containers to tasks in
> Hadoop? This concerns a native Hadoop (2.9.1) cluster on AWS.
>
> I am running a native Hadoop cluster (2.9.1) on AWS (with EC2, not EMR),
> and I want the scheduling/allocation of containers (Mappers/Reducers) to
> be more balanced than it currently is. It seems like the RM assigns the
> Mappers in a bin-packing fashion (where the data resides), while the
> Reducers look more balanced. My setup consists of three machines with
> replication factor three (all the data is on every machine), and I run my
> jobs with mapreduce.job.reduce.slowstart.completedmaps=0 to start the
> shuffle as early as possible (it is vital for me that all the containers
> run concurrently; this is a hard requirement). Also, given the EC2
> instances I have chosen and my YARN cluster settings, I can run at most
> 93 containers (31 per machine).
>
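> For reference, the slowstart setting is just this (set per job or in
> mapred-site.xml; 0.0 lets Reducers be scheduled before any Map has
> finished):
>
> <property>
>   <name>mapreduce.job.reduce.slowstart.completedmaps</name>
>   <value>0.0</value>
> </property>
>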
> For example, if I want nine Reducers, then (93-9-1=83) 83 containers are
> left for the Mappers and one for the AM. I have played with the input
> split size (mapreduce.input.fileinputformat.split.minsize,
> mapreduce.input.fileinputformat.split.maxsize) to find the right balance
> where all of the machines have the same "work" for the map phase. But it
> seems like the first 31 Mappers are allocated on one machine, the next 31
> on the second, and the last 31 on the third. Thus, I can try to use 87
> Mappers, with 31 of them on Machine #1, another 31 on Machine #2, and 25
> on Machine #3, leaving the rest for the Reducers; and since Machine #1
> and Machine #2 are fully occupied, the Reducers have to be placed on
> Machine #3. This way I get an almost balanced allocation of Mappers at
> the expense of an unbalanced Reducer allocation, which is not what I
> want...
>
> # of Mappers = input size / split size  (both in bytes)
>
> split size = max(mapreduce.input.fileinputformat.split.minsize,
>                  min(mapreduce.input.fileinputformat.split.maxsize,
>                      dfs.blocksize))
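>
> As a worked example with made-up numbers: for a 10 GB input and the
> default dfs.blocksize of 128 MB (minsize/maxsize left at their
> defaults), the split size is 128 MB, giving 10240/128 = 80 Mappers. To
> force 87 Mappers instead, set maxsize to about 10240/87 ≈ 118 MB
> (123731968 bytes), so that max(minsize, min(118 MB, 128 MB)) = 118 MB
> and the input yields 87 splits (the last one holds the remainder).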
>
