Is there a reason you want that behavior? I'm not sure you can get it easily. Here's a link to the code that may come into play (depending on your configuration): https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java#L1372
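In case it helps, the capacity and maximum-capacity values discussed
below are cluster-wide percentages set in capacity-scheduler.xml, not
per-node guarantees. Here's a minimal sketch of the relevant properties
(property names per the Capacity Scheduler docs; queue names and values
taken from this thread -- I haven't seen your actual attachments, so
treat this as illustrative, not your real config):

<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>long,short</value>
  </property>
  <property>
    <!-- Guaranteed share: a percentage of total cluster resources,
         not a percentage of each individual node -->
    <name>yarn.scheduler.capacity.root.long.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.short.capacity</name>
    <value>30</value>
  </property>
  <property>
    <!-- Hard ceiling, also cluster-level; limits elasticity for the
         queue (values from the thread, for illustration only) -->
    <name>yarn.scheduler.capacity.root.long.maximum-capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.short.maximum-capacity</name>
    <value>30</value>
  </property>
</configuration>

As far as I can tell, the LeafQueue code linked above checks these
limits against the cluster as a whole when assigning containers, which
is why all of your small containers can land on a single node while the
queue is still under its 30% cluster-wide limit.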
On Thu, Nov 10, 2016 at 1:57 AM, Rafał Radecki <[email protected]> wrote:

> I have already used maximum-capacity for both queues (70 and 30) to limit
> their resource usage, but it seems that this mechanism does not work on
> the node level but rather on the cluster level.
> We have Samza tasks on the cluster and they run for a very long time, so
> we cannot depend on the elasticity mechanism.
>
> 2016-11-10 10:31 GMT+01:00 Bibinchundatt <[email protected]>:
>
>> Hi Rafał,
>>
>> Probably the following two options are what you can look into:
>>
>> 1. *Elasticity* - Free resources can be allocated to any queue beyond
>> its capacity. When there is demand for these resources from queues
>> running below capacity at a future point in time, as tasks scheduled on
>> these resources complete, they will be assigned to applications on
>> queues running below the capacity (pre-emption is not supported). This
>> ensures that resources are available in a predictable and elastic manner
>> to queues, thus preventing artificial silos of resources in the cluster,
>> which helps utilization.
>>
>> http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>>
>> yarn.scheduler.capacity.<queue-path>.maximum-capacity
>> Maximum queue capacity in percentage (%) as a float. This limits the
>> *elasticity* for applications in the queue. Defaults to -1 which
>> disables it.
>>
>> 2. Preemption of containers.
>>
>> Regards
>> Bibin
>>
>> *From:* Rafał Radecki [mailto:[email protected]]
>> *Sent:* 10 November 2016 17:26
>> *To:* Bibinchundatt
>> *Cc:* Ravi Prakash; user
>> *Subject:* Re: Yarn 2.7.3 - capacity scheduler container allocation to
>> nodes?
>>
>> We have 4 nodes and 4 large tasks (~30GB each); additionally we have
>> about 25 small tasks (~2GB each). All tasks can possibly be started in
>> random order.
>> On each node we have 50GB for yarn. So in case we start all 4 large
>> tasks at the beginning, they are correctly scheduled across all 4 nodes.
>> But in case we first start all the small tasks, they all go to the first
>> cluster node and there is no free capacity left on it. Then we try to
>> start the 4 large tasks, but we only have resources on the remaining 3
>> nodes available and cannot start one of the large tasks.
>>
>> BR,
>> Rafal.
>>
>> 2016-11-10 9:54 GMT+01:00 Bibinchundatt <[email protected]>:
>>
>> Hi Rafal!
>>
>> Is there a way to force yarn to use the configured thresholds above
>> (70% and 30%) per node?
>> - Currently we can't specify thresholds per node.
>>
>> As per your initial mail, YARN per node is ~50GB, meaning all nodes'
>> resources are the same. Is there any use case specifically for per-node
>> allocation based on percentage?
>>
>> *From:* Rafał Radecki [mailto:[email protected]]
>> *Sent:* 10 November 2016 14:59
>> *To:* Ravi Prakash
>> *Cc:* user
>> *Subject:* Re: Yarn 2.7.3 - capacity scheduler container allocation to
>> nodes?
>>
>> Hi Ravi.
>>
>> I did not specify labels this time ;) I just created two queues, as
>> visible in the configuration.
>> Overall the queues work, but the allocation of jobs is different than I
>> expected, as I wrote at the beginning.
>>
>> BR,
>> Rafal.
>>
>> 2016-11-10 2:48 GMT+01:00 Ravi Prakash <[email protected]>:
>>
>> Hi Rafal!
>>
>> Have you been able to launch the job successfully first without
>> configuring node-labels? Do you really need node-labels? How much total
>> memory do you have on the cluster?
>> Node labels are usually for specifying special capabilities of the
>> nodes (e.g. some nodes could have GPUs and your application could
>> request to be run only on the nodes which have GPUs).
>>
>> HTH
>> Ravi
>>
>> On Wed, Nov 9, 2016 at 5:37 AM, Rafał Radecki <[email protected]>
>> wrote:
>>
>> Hi All.
>>
>> I have a 4-node cluster on which I run yarn. I created 2 queues, "long"
>> and "short", the first with 70% resource allocation, the second with
>> 30%. Both queues are configured on all available nodes by default.
>>
>> My memory for yarn per node is ~50GB. Initially I thought that when I
>> run tasks in the "short" queue, yarn would allocate them across all
>> nodes using 30% of the memory on every node. So for example if I run
>> 20 tasks, 2GB each (40GB in total), in the "short" queue:
>> - the first ~7 will be scheduled on node1 (14GB total; 30% of the 50GB
>> available on this node for the "short" queue -> 15GB)
>> - the next ~7 tasks will be scheduled on node2
>> - the ~6 remaining tasks will be scheduled on node3
>> - yarn on node4 will not use any resources assigned to the "short"
>> queue.
>> But this seems not to be the case. At the moment I see that all tasks
>> are started on node1 and the other nodes have no tasks started.
>>
>> I attached my yarn-site.xml and capacity-scheduler.xml.
>>
>> Is there a way to force yarn to use the thresholds configured above
>> (70% and 30%) per node and not per cluster as a whole? I would like to
>> get a configuration in which on every node 30% is always available for
>> the "short" queue and 70% for the "long" queue, and in case any
>> resources are free for a particular queue they are not used by other
>> queues. Is it possible?
>>
>> BR,
>> Rafal.
