[
https://issues.apache.org/jira/browse/YARN-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158390#comment-16158390
]
kyungwan nam commented on YARN-7177:
------------------------------------
I have verified that, since YARN-3849 (included in hadoop-2.7.3),
ProportionalCapacityPreemptionPolicy no longer uses absoluteUsedCapacity.
So there is no preemption problem in hadoop-2.7.3 or later.
> AvailableMB, AvailableVCores in the QueueMetrics is not correct when there
> are nodes whose node-label is not default
> --------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-7177
> URL: https://issues.apache.org/jira/browse/YARN-7177
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.1
> Reporter: kyungwan nam
> Attachments: YARN-7177-branch-2.7.001.patch
>
>
> - default-node-label has total resource <memory:248832, vCores:144>
> - 'label1' node-label has total resource <memory:248832, vCores:144>
> - the 'large' and 'small' queues each have 50% of the
> default-node-label capacity.
> - the 'label1' queue has 100% of the 'label1' node-label capacity.
> - an application using <memory:48128, vCores:13> is submitted to the 'small' queue
> With this setup, AvailableMB and AvailableVCores are reported incorrectly, as follows.
> {code}
> {
> name: "Hadoop:service=ResourceManager,name=QueueMetrics,q0=root,q1=small",
> modelerType: "QueueMetrics,q0=root,q1=small",
> tag.Queue: "root.small",
> tag.Context: "yarn",
> tag.Hostname: "host1.com",
> running_0: 1,
> running_60: 0,
> running_300: 0,
> running_1440: 0,
> AppsSubmitted: 1,
> AppsRunning: 1,
> AppsPending: 0,
> AppsCompleted: 0,
> AppsKilled: 0,
> AppsFailed: 0,
> AllocatedMB: 48128,
> AllocatedVCores: 13,
> AllocatedContainers: 13,
> AggregateContainersAllocated: 17,
> AggregateContainersReleased: 4,
> AvailableMB: 200704,
> AvailableVCores: 131,
> PendingMB: 0,
> PendingVCores: 0,
> PendingContainers: 0,
> ReservedMB: 0,
> ReservedVCores: 0,
> ReservedContainers: 0,
> ActiveUsers: 0,
> ActiveApplications: 0
> },
> {code}
> I think they should be calculated based on the default-node-label, as follows.
> * AvailableMB = ( 248832 <default-node-label total resource> - 48128 <used
> resource> ) * 0.5 <small queue capacity>
> * AvailableVCores = ( 144 <default-node-label total resource> - 13 <used
> resource> ) * 0.5 <small queue capacity>
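To make the arithmetic above concrete, here is a minimal Java sketch. The class and method names are hypothetical and are not Hadoop APIs. `proposedAvailable` is the formula proposed in the description. `clusterWideAvailable` is only an assumed reading of how the reported values might arise (total across both node-labels, taken from the `cluster=<memory:497664, vCores:288>` log entry, times the queue capacity, minus the used resource); it is included because it happens to reproduce the reported 200704 and 131, not because the source confirms it.

```java
public class AvailableResourceSketch {
    // Formula proposed in the description: (label total - used) * queue capacity.
    static double proposedAvailable(double labelTotal, double used, double queueCapacity) {
        return (labelTotal - used) * queueCapacity;
    }

    // Assumption: one way the reported value could be produced,
    // using the cluster-wide total instead of the default-node-label total.
    static double clusterWideAvailable(double clusterTotal, double used, double queueCapacity) {
        return clusterTotal * queueCapacity - used;
    }

    public static void main(String[] args) {
        double smallCapacity = 0.5;

        // Memory (MB): default-node-label total 248832, cluster total 497664, used 48128.
        System.out.println(proposedAvailable(248832, 48128, smallCapacity));    // 100352.0
        System.out.println(clusterWideAvailable(497664, 48128, smallCapacity)); // 200704.0

        // vCores: default-node-label total 144, cluster total 288, used 13.
        System.out.println(proposedAvailable(144, 13, smallCapacity));          // 65.5
        System.out.println(clusterWideAvailable(288, 13, smallCapacity));       // 131.0
    }
}
```

The vCores result of the proposed formula is fractional (65.5); how a real metric would round it is not stated in the description.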
> There is another problem: absoluteUsedCapacity and usedCapacity are
> not correct in the log.
> {code}
> 2017-09-07 16:21:06,058 INFO capacity.LeafQueue
> (LeafQueue.java:releaseResource(1762)) - small used=<memory:48128, vCores:13>
> numContainers=13 user=test user-resources=<memory:48128, vCores:13>
> 2017-09-07 16:21:06,058 INFO capacity.LeafQueue
> (LeafQueue.java:completedContainer(1713)) - completedContainer
> container=Container: [ContainerId:
> container_e15_1504768325902_0001_01_000017, NodeId: host2.com:45454,
> NodeHttpAddress: host2.com:8042, Resource: <memory:4096, vCores:1>, Priority:
> 1073741826, Token: Token { kind: ContainerToken, service: 10.10.10.1:45454 },
> ] queue=small: capacity=0.5, absoluteCapacity=0.5,
> usedResources=<memory:48128, vCores:13>, usedCapacity=0.19341564,
> absoluteUsedCapacity=0.09670782, numApps=1, numContainers=13
> cluster=<memory:497664, vCores:288>
> {code}
> These values are calculated from the total resources across all node-labels.
> Likewise, they should be based on the default-node-label, as follows.
> * usedCapacity = 48128 <used resource> / ( 248832 <default-node-label total
> resource> * 0.5 <small queue capacity> ) = 0.38683127
> * absoluteUsedCapacity = 48128 <used resource> / 248832 <default-node-label
> total resource> = 0.19341563
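The two ratios above can be checked with a short Java sketch. The class and method names are hypothetical and are not taken from the Hadoop code base; the only inputs are the figures quoted in the description and the `cluster=<memory:497664, vCores:288>` total from the log. Passing the cluster-wide total as the base gives values close to the logged ones (0.19341564 and 0.09670782), while the default-node-label total gives the values the description expects.

```java
public class UsedCapacitySketch {
    // usedCapacity: used resource relative to the queue's share of the base resource.
    static double usedCapacity(double used, double baseTotal, double queueCapacity) {
        return used / (baseTotal * queueCapacity);
    }

    // absoluteUsedCapacity: used resource relative to the whole base resource.
    static double absoluteUsedCapacity(double used, double baseTotal) {
        return used / baseTotal;
    }

    public static void main(String[] args) {
        double usedMb = 48128;
        double defaultLabelTotalMb = 248832;
        double clusterTotalMb = 497664; // both node-labels combined
        double smallCapacity = 0.5;

        System.out.printf("default-node-label base: usedCapacity=%.8f absoluteUsedCapacity=%.8f%n",
                usedCapacity(usedMb, defaultLabelTotalMb, smallCapacity),
                absoluteUsedCapacity(usedMb, defaultLabelTotalMb));
        System.out.printf("cluster-wide base:       usedCapacity=%.8f absoluteUsedCapacity=%.8f%n",
                usedCapacity(usedMb, clusterTotalMb, smallCapacity),
                absoluteUsedCapacity(usedMb, clusterTotalMb));
    }
}
```

The sketch uses `double`, so the last printed digit may differ slightly from the `float` values in the log.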
> The incorrect values are confusing, but that is not all. Because
> absoluteUsedCapacity is used in ProportionalCapacityPreemptionPolicy, a wrong
> value can cause problems with preemption.
> {code}
> private TempQueue cloneQueues(CSQueue root, Resource clusterResources) {
>   TempQueue ret;
>   synchronized (root) {
>     String queueName = root.getQueueName();
>     float absUsed = root.getAbsoluteUsedCapacity();
>     float absCap = root.getAbsoluteCapacity();
>     float absMaxCap = root.getAbsoluteMaximumCapacity();
>     boolean preemptionDisabled = root.getPreemptionDisabled();
> {code}
> This problem does not seem to occur in hadoop-2.8 or later,
> but we need to fix it for hadoop-2.7.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)