Hi Adam,

yarn.nodemanager.resource.memory-mb = 2370 MiB
yarn.nodemanager.resource.cpu-vcores = 2
yarn.resourcemanager.scheduler.class = org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
"Use CGroups for Resource Management" (yarn.nodemanager.linux-container-executor.resources-handler.class) is NOT checked.
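These numbers are consistent with the ResourceManager UI figures quoted below: 4 nodes x 2370 MiB = 9480 MiB, which is about 9.26 GB in total. Assuming the default 1024 MB per map container (mapreduce.map.memory.mb), memory alone already caps the cluster at 8 containers:

    floor(2370 MiB / 1024 MiB) = 2 containers per NodeManager
    2 containers x 4 nodes     = 8 containers = 1 AM + 7 map tasks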
Considering that I have 8 cores in my cluster, and not 16 as I thought at the beginning, starting more than 7 map tasks (plus the AM) should not give me any performance gain, as all the cores are already in use. Am I right? True, for hundreds or thousands of nodes a single coordination node might be a bottleneck. My deployments are not expected to exceed 32 or 64 nodes.

Pozdrawiam / Regards / Med venlig hilsen
Tomasz Guziałek

2014-07-09 16:01 GMT+02:00 Adam Kawa <[email protected]>:

> Hi Tomek,
>
> You have 9.26 GB on 4 nodes, which is 2.315 GB on average. What is your value of yarn.nodemanager.resource.memory-mb?
>
> You consume 1 GB of RAM per container (8 containers running = 8 GB of memory used). My idea is that, after running 8 containers (1 AM + 7 map tasks), you have only 315 MB of available memory left on each NodeManager. Therefore, when you request 1 GB to get a container for the 8th map task, there is no NodeManager that can give you a whole 1 GB (despite there being more than 1 GB of aggregated memory on the cluster).
>
> To verify this, please check the value of yarn.nodemanager.resource.memory-mb.
>
> Thanks,
> Adam
>
> PS1.
> Just out of curiosity, what are your values of:
> *yarn.nodemanager.resource.cpu-vcores* (isn't it 2?)
> *yarn.resourcemanager.scheduler.class* (I assume the Fair Scheduler, but just to confirm. Do you have any non-default settings in your scheduler's configuration that limit the resources per user?)
> *yarn.nodemanager.linux-container-executor.resources-handler.class*
>
> PS2.
> "I am comparing the M/R implementation with a custom one, where one node is dedicated to coordination and I utilize 4 slaves fully for computation."
>
> Note that this might not work on a larger scale, because the one node dedicated to coordination might become the bottleneck. This is one of the reasons why both YARN and the original MapReduce at Google run their coordination processes on the slave nodes.
>
> 2014-07-09 9:47 GMT+02:00 Tomasz Guziałek <[email protected]>:
>
>> Thank you for your assistance, Adam.
>>
>> Containers running | Memory used | Memory total | Memory reserved
>> 8                  | 8 GB        | 9.26 GB      | 0 B
>>
>> It seems you are right: the ApplicationMaster is occupying one slot, as I have 8 containers running but only 7 map tasks.
>>
>> Again, I revised my information about the m1.large instance on EC2. There are only 2 cores available per node, giving 4 compute units (the ECU units introduced by Amazon). So 8 slots at a time is expected. However, scheduling the AM on a slave node ruins my experiment: I am comparing the M/R implementation with a custom one, where one node is dedicated to coordination and I utilize 4 slaves fully for computation. This one core taken for the AM extends the execution time by a factor of 2. Does anyone have an idea how to get 8 map tasks running?
>>
>> Pozdrawiam / Regards / Med venlig hilsen
>> Tomasz Guziałek
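One way to get all 8 map tasks running concurrently, sketched here under the assumption (suggested by the 8 containers / 8 GB figures above) that every container, including the AM's, is currently requested at 1024 MiB: shrink the per-container requests so that three containers fit into each NodeManager's 2370 MiB. The 768 MiB value below is illustrative, not from the thread.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    Configuration mapReduceConfiguration = HBaseConfiguration.create();
    // 3 x 768 MiB = 2304 MiB fits under the 2370 MiB each NodeManager offers,
    // giving up to 12 containers on 4 nodes, enough for 1 AM + 8 map tasks.
    mapReduceConfiguration.set("mapreduce.map.memory.mb", "768");
    mapReduceConfiguration.set("yarn.app.mapreduce.am.resource.mb", "768");
    // Shrink the map JVM heap so it fits inside the smaller container.
    mapReduceConfiguration.set("mapreduce.map.java.opts", "-Xmx600m");

For the 768 MiB requests to take effect, the ResourceManager side must also be allowed to hand out sub-GB containers: with the defaults of yarn.scheduler.minimum-allocation-mb = 1024 and, under the Fair Scheduler, yarn.scheduler.increment-allocation-mb = 1024, such requests are rounded back up to 1 GB, so both would need to be lowered (e.g. to 256) in yarn-site.xml. Whether 9 containers on 8 physical cores actually runs faster is a separate question, as the core-count discussion above suggests.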
>> 2014-07-09 0:56 GMT+02:00 Adam Kawa <[email protected]>:
>>
>>> If you run an application (e.g. a MapReduce job) on a YARN cluster, first the ApplicationMaster is started on some slave node to coordinate the execution of all tasks within the job. The ApplicationMaster and the tasks that belong to its application run in containers controlled by the NodeManagers.
>>>
>>> Maybe you simply run 8 containers on your YARN cluster: 1 container consumed by the MapReduce AppMaster and 7 containers consumed by map tasks. But this does not seem to be the root cause of your problem, because according to your settings you should be able to run at most 16 containers.
>>>
>>> Another idea might be that you are bottlenecked by the amount of memory on the cluster (each container consumes memory) and, despite having vcores available, you cannot launch new tasks. When you go to the ResourceManager Web UI, do you see that you utilize the whole cluster memory?
>>>
>>> 2014-07-08 21:06 GMT+02:00 Tomasz Guziałek <[email protected]>:
>>>
>>>> I was not precise when describing my cluster. I have 4 slave nodes and a separate master node. The master has the ResourceManager role (along with the JobHistory role) and the rest have NodeManager roles. If this really is the ApplicationMaster, is it possible to schedule it on the master node? This single waiting map task is doubling my execution time.
>>>>
>>>> Pozdrawiam / Regards / Med venlig hilsen
>>>> Tomasz Guziałek
>>>>
>>>> 2014-07-08 18:42 GMT+02:00 Adam Kawa <[email protected]>:
>>>>
>>>>> Isn't your MapReduce AppMaster occupying one slot?
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> > On 8 jul 2014, at 13:01, Tomasz Guziałek <[email protected]> wrote:
>>>>> >
>>>>> > Hello all,
>>>>> >
>>>>> > I am running a 4-node CDH5 cluster on Amazon EC2. The instances used are m1.large, so I have 4 cores (2 cores x 2 units) per node. My HBase table has 8 regions, so I expected at least 8 (if not 16) mapper tasks to run simultaneously. However, only 7 are running and 1 is waiting for an empty slot. Why did this surprising number come up? I have checked that the regions are equally distributed across the region servers (2 per node).
>>>>> >
>>>>> > My properties in the job:
>>>>> > Configuration mapReduceConfiguration = HBaseConfiguration.create();
>>>>> > mapReduceConfiguration.set("hbase.client.max.perregion.tasks", "4");
>>>>> > mapReduceConfiguration.set("mapreduce.tasktracker.map.tasks.maximum", "16");
>>>>> >
>>>>> > My properties in the CDH:
>>>>> > yarn.scheduler.minimum-allocation-vcores = 1
>>>>> > yarn.scheduler.maximum-allocation-vcores = 4
>>>>> >
>>>>> > Am I missing some property? Please share your experience.
>>>>> >
>>>>> > Best regards
>>>>> > Tomasz
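A final note on the job properties in the original message: mapreduce.tasktracker.map.tasks.maximum is an MRv1 (TaskTracker) setting that YARN ignores, and hbase.client.max.perregion.tasks, as far as I can tell, throttles client writes per region rather than map-task scheduling, so neither of them can raise the number of concurrent maps here. A minimal sketch of the knobs that do govern it under YARN, with illustrative values that are not from this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    Configuration conf = HBaseConfiguration.create();
    // Under YARN a map task's footprint is its container request:
    conf.set("mapreduce.map.memory.mb", "1024"); // memory per map container
    conf.set("mapreduce.map.cpu.vcores", "1");   // vcores per map container
    // The concurrent-map ceiling is then, per node,
    //   floor(yarn.nodemanager.resource.memory-mb / mapreduce.map.memory.mb)
    // summed over all NodeManagers, minus one container for the AM.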
