[jira] [Commented] (YARN-3884) RMContainerImpl transition from RESERVED to KILL apphistory status not updated

Varun Saxena (JIRA) Wed, 23 Nov 2016 23:29:04 -0800

    [ https://issues.apache.org/jira/browse/YARN-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692477#comment-15692477 ]


Varun Saxena commented on YARN-3884:
------------------------------------

Thanks [~bibinchundatt] for the patch.
The log changes made here are not related to your fix. They could go in a 
separate JIRA, but they are trivial fixes, so I am fine with having them here 
as well.

A few nits:
# For the log added below in FiCaSchedulerApp, there should be a space before 
"container" on line 683. Also, the container information should come at the end 
or before the node, because the parts that follow it describe the reservation 
for the application. Maybe something like: Application xxx unreserved container 
xyz on node....
{code}
682             LOG.info("Application " + getApplicationId() + " unreserved "
683                 + " on node " + node + "container "
684                 + reservedContainer.getContainerId() + ", currently has "
685                 + reservedContainers.size() + " at priority "
686                 + schedulerKey.getPriority() + "; currentReservation "
687                 + this.attemptResourceUsage.getReserved() + " on node-label="
688                 + node.getPartition());
{code}
# For the other log in FiCaSchedulerApp too, put a comma before 
reservedContainer.
# In the test added, the method createReservation has a comment saying "Wait 
for allocation and reservation", but nothing is being waited for here. Maybe 
say something along the lines of "send 2 events which lead to a reservation".
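
As a rough sketch of the first nit, the message could be assembled in the order 
suggested above (application, container, node, then the reservation details). 
The class, method and parameter names below are hypothetical stand-ins for the 
values FiCaSchedulerApp already has at that point, and the sample values in 
main are only illustrative; this is not the code from the patch.

```java
// Hypothetical stand-in for the FiCaSchedulerApp log statement. All names and
// sample values here are assumptions for illustration, not taken from the patch.
public class UnreservedLogMessage {

    // Builds the message as "Application <app> unreserved container <id>
    // on node <node>, currently has ...", with a single space between parts.
    static String build(String applicationId, String containerId, String node,
                        int reservedCount, int priority, String reserved,
                        String partition) {
        return "Application " + applicationId + " unreserved container "
            + containerId + " on node " + node + ", currently has "
            + reservedCount + " at priority " + priority
            + "; currentReservation " + reserved
            + " on node-label=" + partition;
    }

    public static void main(String[] args) {
        // Sample values loosely modelled on the log excerpt in the issue.
        System.out.println(build(
            "application_1435849994778_0002",
            "container_e24_1435849994778_0002_01_000013",
            "host-10-19-92-143:64318",
            0, 20, "<memory:0, vCores:0>", ""));
    }
}
```

Keeping the container id right after "unreserved" means the rest of the line 
reads as reservation state for the application, which is the ordering asked for 
in the nit.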

Other than that, +1 from my side.

> RMContainerImpl transition from RESERVED to KILL apphistory status not updated
> ------------------------------------------------------------------------------
>
>                 Key: YARN-3884
>                 URL: https://issues.apache.org/jira/browse/YARN-3884
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>         Environment: Suse11 Sp3
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>              Labels: oct16-easy
>         Attachments: 0001-YARN-3884.patch, Apphistory Container Status.jpg, 
> Elapsed Time.jpg, Test Result-Container status.jpg, YARN-3884.0002.patch, 
> YARN-3884.0003.patch, YARN-3884.0004.patch
>
>
> Setup
> ===============
> 1 NM 3072 16 cores each
> Steps to reproduce
> ===============
> 1. Submit apps to Queue 1 with 512 MB and 1 core
> 2. Submit apps to Queue 2 with 512 MB and 5 cores
> Lots of containers get reserved and unreserved in this case.
> {code}
> 2015-07-02 20:45:31,169 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0002_01_000013 Container Transitioned from NEW to 
> RESERVED
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> Reserved container  application=application_1435849994778_0002 
> resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, 
> usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, 
> numContainers=5 usedCapacity=1.6410257 absoluteUsedCapacity=0.65625 
> used=<memory:2560, vCores:21> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, 
> usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, 
> numContainers=6
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> assignedContainer queue=root usedCapacity=0.96875 
> absoluteUsedCapacity=0.96875 used=<memory:5632, vCores:31> 
> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0001_01_000014 Container Transitioned from NEW to 
> ALLOCATED
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=AM Allocated Container        TARGET=SchedulerApp     
> RESULT=SUCCESS  APPID=application_1435849994778_0001    
> CONTAINERID=container_e24_1435849994778_0001_01_000014
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
> Assigned container container_e24_1435849994778_0001_01_000014 of capacity 
> <memory:512, vCores:1> on host host-10-19-92-117:64318, which has 6 
> containers, <memory:3072, vCores:14> used and <memory:0, vCores:2> available 
> after allocation
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> assignedContainer application attempt=appattempt_1435849994778_0001_000001 
> container=Container: [ContainerId: 
> container_e24_1435849994778_0001_01_000014, NodeId: host-10-19-92-117:64318, 
> NodeHttpAddress: host-10-19-92-117:65321, Resource: <memory:512, vCores:1>, 
> Priority: 20, Token: null, ] queue=default: capacity=0.2, 
> absoluteCapacity=0.2, usedResources=<memory:2560, vCores:5>, 
> usedCapacity=2.0846906, absoluteUsedCapacity=0.41666666, numApps=1, 
> numContainers=5 clusterResource=<memory:6144, vCores:32>
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Re-sorting assigned queue: root.default stats: default: capacity=0.2, 
> absoluteCapacity=0.2, usedResources=<memory:3072, vCores:6>, 
> usedCapacity=2.5016286, absoluteUsedCapacity=0.5, numApps=1, numContainers=6
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> assignedContainer queue=root usedCapacity=1.0 absoluteUsedCapacity=1.0 
> used=<memory:6144, vCores:32> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:32,143 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0001_01_000014 Container Transitioned from 
> ALLOCATED to ACQUIRED
> 2015-07-02 20:45:32,174 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Trying to fulfill reservation for application application_1435849994778_0002 
> on node: host-10-19-92-143:64318
> 2015-07-02 20:45:32,174 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> Reserved container  application=application_1435849994778_0002 
> resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, 
> usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, 
> numContainers=6 usedCapacity=2.0317461 absoluteUsedCapacity=0.8125 
> used=<memory:3072, vCores:26> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:32,174 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Skipping scheduling since node host-10-19-92-143:64318 is reserved by 
> application appattempt_1435849994778_0002_000001
> 2015-07-02 20:45:32,213 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0001_01_000014 Container Transitioned from 
> ACQUIRED to RUNNING
> 2015-07-02 20:45:32,213 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Null container completed...
> 2015-07-02 20:45:33,178 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Trying to fulfill reservation for application application_1435849994778_0002 
> on node: host-10-19-92-143:64318
> 2015-07-02 20:45:33,178 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> Reserved container  application=application_1435849994778_0002 
> resource=<memory:512, vCores:5> queue=QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:3072, vCores:26>, 
> usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, 
> numContainers=6 usedCapacity=2.0317461 absoluteUsedCapacity=0.8125 
> used=<memory:3072, vCores:26> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,178 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Skipping scheduling since node host-10-19-92-143:64318 is reserved by 
> application appattempt_1435849994778_0002_000001
> 2015-07-02 20:45:33,704 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
>  Application application_1435849994778_0002 unreserved  on node host: 
> host-10-19-92-143:64318 #containers=5 available=<memory:512, vCores:3> 
> used=<memory:2560, vCores:13>, currently has 0 at priority 20; 
> currentReservation <memory:0, vCores:0>
> 2015-07-02 20:45:33,704 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> QueueA used=<memory:2560, vCores:21> numContainers=5 user=dsperf 
> user-resources=<memory:2560, vCores:21>
> 2015-07-02 20:45:33,710 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> completedContainer container=Container: [ContainerId: 
> container_e24_1435849994778_0002_01_000013, NodeId: host-10-19-92-143:64318, 
> NodeHttpAddress: host-10-19-92-143:65321, Resource: <memory:512, vCores:5>, 
> Priority: 20, Token: null, ] queue=QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, 
> usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, 
> numContainers=5 cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,710 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> completedContainer queue=root usedCapacity=0.9166667 
> absoluteUsedCapacity=0.9166667 used=<memory:5632, vCores:27> 
> cluster=<memory:6144, vCores:32>
> 2015-07-02 20:45:33,711 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Re-sorting completed queue: root.QueueA stats: QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=<memory:2560, vCores:21>, 
> usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, 
> numContainers=5
> 2015-07-02 20:45:33,711 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Application attempt appattempt_1435849994778_0002_000001 released container 
> container_e24_1435849994778_0002_01_000013 on node: host: 
> host-10-19-92-143:64318 #containers=5 available=<memory:512, vCores:3> 
> used=<memory:2560, vCores:13> with event: KILL
> {code}
> *Impact:*
> In the application history server the status gets updated to -1000 (INVALID),
> but the end time is not updated, so Elapsed Time keeps changing.
> Please check the attached snapshots.




