[jira] [Commented] (YARN-3884) App History status not updated when RMContainer transitions from RESERVED to KILLED

Joep Rottinghuis (JIRA) Mon, 13 Mar 2017 09:50:07 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907825#comment-15907825
 ]


Joep Rottinghuis commented on YARN-3884:
----------------------------------------

While the load on the RM daemon to write to atsv2 is a valid point, and so is 
the point that schedulers do not _have_ to implement reservations with an 
actual container ID (this may be an internal implementation detail), there is 
another aspect to consider here.
If a reservation counts towards the fair share and/or capacity calculation, 
then it should also count towards the cost of an application in chargeback 
derived from atsv2 data. When talking to [~jlowe] on Friday he made a good 
point that a reservation should count towards the min and max usage 
calculations for a user, because it essentially blocks others from using it. In 
that sense it doesn't matter that the user reserves with the intend to combine 
the reservation with additional resources to eventually launch a container, or 
just hangs on to the reservation. The effect is the same from the perspective 
that other users cannot use these resources. From that perspective, reserved 
containers should be charged against the app.


> App History status not updated when RMContainer transitions from RESERVED to 
> KILLED
> -----------------------------------------------------------------------------------
>
>                 Key: YARN-3884
>                 URL: https://issues.apache.org/jira/browse/YARN-3884
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>         Environment: Suse11 Sp3
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>              Labels: oct16-easy
>         Attachments: 0001-YARN-3884.patch, Apphistory Container Status.jpg, 
> Elapsed Time.jpg, Test Result-Container status.jpg, YARN-3884.0002.patch, 
> YARN-3884.0003.patch, YARN-3884.0004.patch, YARN-3884.0005.patch, 
> YARN-3884.0006.patch, YARN-3884.0007.patch, YARN-3884.0008.patch
>
>
> Setup
> ===============
> 1 NM 3072 16 cores each
> Steps to reproduce
> ===============
> 1.Submit apps  to Queue 1 with 512 mb 1 core
> 2.Submit apps  to Queue 2 with 512 mb and 5 core
> lots of containers get reserved and unreserved in this case 
> {code}
> 2015-07-02 20:45:31,169 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0002_01_000013 Container Transitioned from NEW to 
> RESERVED
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> Reserved container  application=application_1435849994778_0002 
> resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, 
> usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, 
> numContainers=5 usedCapacity=1.6410257 absoluteUsedCapacity=0.65625 
> used=<memory:2560, vCores:21> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, 
> usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, 
> numContainers=6
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> assignedContainer queue=root usedCapacity=0.96875 
> absoluteUsedCapacity=0.96875 used=<memory:5632, vCores:31> 
> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0001_01_000014 Container Transitioned from NEW to 
> ALLOCATED
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=AM Allocated Container        TARGET=SchedulerApp     
> RESULT=SUCCESS  APPID=application_1435849994778_0001    
> CONTAINERID=container_e24_1435849994778_0001_01_000014
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
> Assigned container container_e24_1435849994778_0001_01_000014 of capacity 
> <memory:512, vCores:1> on host host-10-19-92-117:64318, which has 6 
> containers, <memory:3072, vCores:14> used and <memory:0, vCores:2> available 
> after allocation
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> assignedContainer application attempt=appattempt_1435849994778_0001_000001 
> container=Container: [ContainerId: 
> container_e24_1435849994778_0001_01_000014, NodeId: host-10-19-92-117:64318, 
> NodeHttpAddress: host-10-19-92-117:65321, Resource: <memory:512, vCores:1>, 
> Priority: 20, Token: null, ] queue=default: capacity=0.2, 
> absoluteCapacity=0.2, usedResources=<memory:2560, vCores:5>, 
> usedCapacity=2.0846906, absoluteUsedCapacity=0.41666666, numApps=1, 
> numContainers=5 clusterResource=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Re-sorting assigned queue: root.default stats: default: capacity=0.2, 
> absoluteCapacity=0.2, usedResources=<memory:3072, vCores:6>, 
> usedCapacity=2.5016286, absoluteUsedCapacity=0.5, numApps=1, numContainers=6
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> assignedContainer queue=root usedCapacity=1.0 absoluteUsedCapacity=1.0 
> used=<memory:6144, vCores:32> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:32,143 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0001_01_000014 Container Transitioned from 
> ALLOCATED to ACQUIRED
> 2015-07-02 20:45:32,174 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Trying to fulfill reservation for application application_1435849994778_0002 
> on node: host-10-19-92-143:64318
> 2015-07-02 20:45:32,174 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> Reserved container  application=application_1435849994778_0002 
> resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, 
> usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, 
> numContainers=6 usedCapacity=2.0317461 absoluteUsedCapacity=0.8125 
> used=<memory:3072, vCores:26> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:32,174 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Skipping scheduling since node host-10-19-92-143:64318 is reserved by 
> application appattempt_1435849994778_0002_000001
> 2015-07-02 20:45:32,213 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0001_01_000014 Container Transitioned from 
> ACQUIRED to RUNNING
> 2015-07-02 20:45:32,213 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Null container completed...
> 2015-07-02 20:45:33,178 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Trying to fulfill reservation for application application_1435849994778_0002 
> on node: host-10-19-92-143:64318
> 2015-07-02 20:45:33,178 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> Reserved container  application=application_1435849994778_0002 
> resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, 
> usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, 
> numContainers=6 usedCapacity=2.0317461 absoluteUsedCapacity=0.8125 
> used=<memory:3072, vCores:26> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,178 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Skipping scheduling since node host-10-19-92-143:64318 is reserved by 
> application appattempt_1435849994778_0002_000001
> 2015-07-02 20:45:33,704 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
>  Application application_1435849994778_0002 unreserved  on node host: 
> host-10-19-92-143:64318 #containers=5 available=<memory:512, vCores:3> 
> used=<memory:2560, vCores:13>, currently has 0 at priority 20; 
> currentReservation <memory:0, vCores:0>
> 2015-07-02 20:45:33,704 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> QueueA used=<memory:2560, vCores:21> numContainers=5 user=dsperf 
> user-resources=<memory:2560, vCores:21>
> 2015-07-02 20:45:33,710 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> completedContainer container=Container: [ContainerId: 
> container_e24_1435849994778_0002_01_000013, NodeId: host-10-19-92-143:64318, 
> NodeHttpAddress: host-10-19-92-143:65321, Resource: <memory:512, vCores:5>, 
> Priority: 20, Token: null, ] queue=QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, 
> usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, 
> numContainers=5 cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,710 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> completedContainer queue=root usedCapacity=0.9166667 
> absoluteUsedCapacity=0.9166667 used=<memory:5632, vCores:27> 
> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,711 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Re-sorting completed queue: root.QueueA stats: QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, 
> usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, 
> numContainers=5
> 2015-07-02 20:45:33,711 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application attempt appattempt_1435849994778_0002_000001 released container 
> container_e24_1435849994778_0002_01_000013 on node: host: 
> host-10-19-92-143:64318 #containers=5 available=<memory:512, vCores:3> 
> used=<memory:2560, vCores:13> with event: KILL
> {code}
> *Impact:*
> In application history server the status get updated to -1000 (INVALID)
> but the end time not updated so Elapsed Time always changes.
> Please check the snapshot attached



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (YARN-3884) App History status not updated when RMContainer transitions from RESERVED to KILLED

Reply via email to