[
https://issues.apache.org/jira/browse/YARN-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bibin A Chundatt updated YARN-3884:
-----------------------------------
Attachment: 0001-YARN-3884.patch
Attaching the patch. Please review.
> Capacity scheduler for DROP_RESERVATION event container status not updated
> --------------------------------------------------------------------------
>
> Key: YARN-3884
> URL: https://issues.apache.org/jira/browse/YARN-3884
> Project: Hadoop YARN
> Issue Type: Bug
> Environment: Suse11 Sp3
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-3884.patch, Apphistory Container Status.jpg,
> Elapsed Time.jpg
>
>
> Setup
> ===============
> 1 NM, 3072 MB memory and 16 cores each
> Steps to reproduce
> ===============
> 1. Submit apps to Queue 1 with 512 MB and 1 core
> 2. Submit apps to Queue 2 with 512 MB and 5 cores
> Many containers get reserved and unreserved in this case:
> {code}
> 2015-07-02 20:45:31,169 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_e24_1435849994778_0002_01_000013 Container Transitioned from NEW to
> RESERVED
> 2015-07-02 20:45:31,170 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Reserved container application=application_1435849994778_0002
> resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4,
> absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>,
> usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1,
> numContainers=5 usedCapacity=1.6410257 absoluteUsedCapacity=0.65625
> used=<memory:2560, vCores:21> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,170 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4,
> absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>,
> usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1,
> numContainers=6
> 2015-07-02 20:45:31,170 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> assignedContainer queue=root usedCapacity=0.96875
> absoluteUsedCapacity=0.96875 used=<memory:5632, vCores:31>
> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,191 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_e24_1435849994778_0001_01_000014 Container Transitioned from NEW to
> ALLOCATED
> 2015-07-02 20:45:31,191 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf
> OPERATION=AM Allocated Container TARGET=SchedulerApp
> RESULT=SUCCESS APPID=application_1435849994778_0001
> CONTAINERID=container_e24_1435849994778_0001_01_000014
> 2015-07-02 20:45:31,191 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
> Assigned container container_e24_1435849994778_0001_01_000014 of capacity
> <memory:512, vCores:1> on host host-10-19-92-117:64318, which has 6
> containers, <memory:3072, vCores:14> used and <memory:0, vCores:2> available
> after allocation
> 2015-07-02 20:45:31,191 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> assignedContainer application attempt=appattempt_1435849994778_0001_000001
> container=Container: [ContainerId:
> container_e24_1435849994778_0001_01_000014, NodeId: host-10-19-92-117:64318,
> NodeHttpAddress: host-10-19-92-117:65321, Resource: <memory:512, vCores:1>,
> Priority: 20, Token: null, ] queue=default: capacity=0.2,
> absoluteCapacity=0.2, usedResources=<memory:2560, vCores:5>,
> usedCapacity=2.0846906, absoluteUsedCapacity=0.41666666, numApps=1,
> numContainers=5 clusterResource=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,191 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> Re-sorting assigned queue: root.default stats: default: capacity=0.2,
> absoluteCapacity=0.2, usedResources=<memory:3072, vCores:6>,
> usedCapacity=2.5016286, absoluteUsedCapacity=0.5, numApps=1, numContainers=6
> 2015-07-02 20:45:31,191 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> assignedContainer queue=root usedCapacity=1.0 absoluteUsedCapacity=1.0
> used=<memory:6144, vCores:32> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:32,143 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_e24_1435849994778_0001_01_000014 Container Transitioned from
> ALLOCATED to ACQUIRED
> 2015-07-02 20:45:32,174 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Trying to fulfill reservation for application application_1435849994778_0002
> on node: host-10-19-92-143:64318
> 2015-07-02 20:45:32,174 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Reserved container application=application_1435849994778_0002
> resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4,
> absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>,
> usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1,
> numContainers=6 usedCapacity=2.0317461 absoluteUsedCapacity=0.8125
> used=<memory:3072, vCores:26> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:32,174 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Skipping scheduling since node host-10-19-92-143:64318 is reserved by
> application appattempt_1435849994778_0002_000001
> 2015-07-02 20:45:32,213 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_e24_1435849994778_0001_01_000014 Container Transitioned from
> ACQUIRED to RUNNING
> 2015-07-02 20:45:32,213 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Null container completed...
> 2015-07-02 20:45:33,178 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Trying to fulfill reservation for application application_1435849994778_0002
> on node: host-10-19-92-143:64318
> 2015-07-02 20:45:33,178 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Reserved container application=application_1435849994778_0002
> resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4,
> absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>,
> usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1,
> numContainers=6 usedCapacity=2.0317461 absoluteUsedCapacity=0.8125
> used=<memory:3072, vCores:26> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,178 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Skipping scheduling since node host-10-19-92-143:64318 is reserved by
> application appattempt_1435849994778_0002_000001
> 2015-07-02 20:45:33,704 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
> Application application_1435849994778_0002 unreserved on node host:
> host-10-19-92-143:64318 #containers=5 available=<memory:512, vCores:3>
> used=<memory:2560, vCores:13>, currently has 0 at priority 20;
> currentReservation <memory:0, vCores:0>
> 2015-07-02 20:45:33,704 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> QueueA used=<memory:2560, vCores:21> numContainers=5 user=dsperf
> user-resources=<memory:2560, vCores:21>
> 2015-07-02 20:45:33,710 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> completedContainer container=Container: [ContainerId:
> container_e24_1435849994778_0002_01_000013, NodeId: host-10-19-92-143:64318,
> NodeHttpAddress: host-10-19-92-143:65321, Resource: <memory:512, vCores:5>,
> Priority: 20, Token: null, ] queue=QueueA: capacity=0.4,
> absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>,
> usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1,
> numContainers=5 cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,710 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> completedContainer queue=root usedCapacity=0.9166667
> absoluteUsedCapacity=0.9166667 used=<memory:5632, vCores:27>
> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,711 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue:
> Re-sorting completed queue: root.QueueA stats: QueueA: capacity=0.4,
> absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>,
> usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1,
> numContainers=5
> 2015-07-02 20:45:33,711 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Application attempt appattempt_1435849994778_0002_000001 released container
> container_e24_1435849994778_0002_01_000013 on node: host:
> host-10-19-92-143:64318 #containers=5 available=<memory:512, vCores:3>
> used=<memory:2560, vCores:13> with event: KILL
> {code}
> *Impact:*
> In the application history server, the container status gets updated to
> -1000 (INVALID), but the end time is not updated, so the Elapsed Time keeps
> changing. Please check the attached snapshots.
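
To illustrate why a missing end time makes Elapsed Time keep changing, here is a minimal sketch. It assumes the history UI falls back to the current time when no finish time is recorded; that fallback is an assumption for illustration, not confirmed from the history server code. The helper name `elapsed_ms` and the start value are hypothetical. The 2541 ms figure is the gap between the RESERVED transition (20:45:31,169) and the completedContainer entry (20:45:33,710) in the log above.

```python
import time


def elapsed_ms(start_ms, finish_ms, now_ms=None):
    """Hypothetical helper: elapsed time as a UI might compute it.

    Assumption: a finish time <= 0 means "not recorded", in which case
    the current time is used as the end of the interval.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    end_ms = finish_ms if finish_ms > 0 else now_ms
    return max(end_ms - start_ms, 0)


start = 1000  # illustrative start timestamp in ms, not taken from the log

# Finish time never recorded: the result depends on when the page is loaded.
first_view = elapsed_ms(start, 0, now_ms=start + 5_000)
later_view = elapsed_ms(start, 0, now_ms=start + 60_000)

# Finish time recorded when the reservation is dropped: the result is stable.
fixed_view = elapsed_ms(start, start + 2_541, now_ms=start + 60_000)

print(first_view, later_view, fixed_view)
```

With the finish time unset, the two views report different values. Once a finish time is stored, the value no longer depends on when the page is loaded.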