[jira] [Comment Edited] (YARN-7672) hadoop-sls can not simulate huge scale of YARN
[ https://issues.apache.org/jira/browse/YARN-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298192#comment-16298192 ]

zhangshilong edited comment on YARN-7672 at 12/20/17 9:55 AM:
--
[~cxcw] I use two daemons deployed on two different hosts. I start 1000~5000 threads to simulate NMs/AMs, because I need to simulate 1 apps running on 1 NM nodes. One task uses 1 vcore and 2304 MB, and one NM has 50 vcores and 50*2304 MB of resources. The NM and AM simulators are all CPU-bound tasks, so cpu.load goes up to 100+ (on only 32 cores). And, as we know, the scheduler also uses one process for allocating resources.

was (Author: zsl2007):
[~cxcw] I use two daemons deployed on different two hosts. I start 1000~5000 threads to simulate NM/AM, because I need to simulate 1 apps running with 1 NM nodes. one task use 1 vcore and 2304 Mb. And one NM has 50 vcore and 50*2304 Mb resources. All of NM and AM simulators are all cpu type of task. So cpu.load will go up to 100+ (only 32 cores). And as we know, Scheduler will also use one process for allocating resources.

> hadoop-sls can not simulate huge scale of YARN
> --
>
> Key: YARN-7672
> URL: https://issues.apache.org/jira/browse/YARN-7672
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: zhangshilong
> Assignee: zhangshilong
> Attachments: YARN-7672.patch
>
> Our YARN cluster scales to nearly 10 thousand nodes. We need to do scheduler pressure tests.
> Using SLS, we start 2000+ threads to simulate NMs and AMs, but cpu.load goes very high, to 100+. I thought that would affect the performance evaluation of the scheduler.
> So I thought to separate the scheduler from the simulator:
> I start a real RM; then SLS registers nodes with the RM and submits apps to the RM using RM RPC.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7672) hadoop-sls can not simulate huge scale of YARN
[ https://issues.apache.org/jira/browse/YARN-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298192#comment-16298192 ]

zhangshilong commented on YARN-7672:
[~cxcw] I use two daemons deployed on two different hosts. I start 1000~5000 threads to simulate NMs/AMs, because I need to simulate 1 apps running on 1 NM nodes. One task uses 1 vcore and 2304 MB, and one NM has 50 vcores and 50*2304 MB of resources. The NM and AM simulators are all CPU-bound tasks, so cpu.load goes up to 100+ (on only 32 cores). And, as we know, the scheduler also uses one process for allocating resources.
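The setup described above can be pictured with a minimal sketch (plain Java; the class and method names are hypothetical and not part of SLS): thousands of per-node simulator threads, each running its own heartbeat loop, all competing for the same cores as the scheduler process.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in (not actual SLS code) for per-node simulator
// threads: each "NM" runs a heartbeat loop, so thousands of them compete
// for CPU with the scheduler process on the same hosts.
class NodeSimulatorSketch {
    static long runHeartbeats(int numNodes, int heartbeatsPerNode) {
        ExecutorService pool = Executors.newFixedThreadPool(numNodes);
        AtomicLong heartbeats = new AtomicLong();
        for (int i = 0; i < numNodes; i++) {
            pool.execute(() -> {
                for (int h = 0; h < heartbeatsPerNode; h++) {
                    // A real simulator would build and send a heartbeat RPC here.
                    heartbeats.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return heartbeats.get();
    }

    public static void main(String[] args) {
        // 1000 simulated NMs, 10 heartbeats each.
        System.out.println(runHeartbeats(1000, 10)); // 10000
    }
}
```

With one thread per simulated node, CPU load grows with cluster size, which is why moving the scheduler (a real RM) onto separate hardware keeps the measurement clean.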
[jira] [Updated] (YARN-7672) hadoop-sls can not simulate huge scale of YARN
[ https://issues.apache.org/jira/browse/YARN-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangshilong updated YARN-7672:
--
Attachment: YARN-7672.patch
[jira] [Updated] (YARN-7672) hadoop-sls can not simulate huge scale of YARN
[ https://issues.apache.org/jira/browse/YARN-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangshilong updated YARN-7672:
--
Description:
Our YARN cluster scales to nearly 10 thousand nodes. We need to do scheduler pressure tests.
Using SLS, we start 2000+ threads to simulate NMs and AMs, but cpu.load goes very high, to 100+. I thought that would affect the performance evaluation of the scheduler.
So I thought to separate the scheduler from the simulator:
I start a real RM; then SLS registers nodes with the RM and submits apps to the RM using RM RPC.

was:
Our YARN cluster scales to nearly 10 thousand nodes. We need to do scheduler pressure tests.
We start 2000+ threads to simulate NMs and AMs, so cpu.load goes very high, to 100+. I thought that would affect the performance evaluation of the scheduler.
So I thought to separate the scheduler from the simulator:
I start a real RM; then SLS registers nodes with the RM and submits apps to the RM using RM RPC.
[jira] [Created] (YARN-7672) hadoop-sls can not simulate huge scale of YARN
zhangshilong created YARN-7672:
--
Summary: hadoop-sls can not simulate huge scale of YARN
Key: YARN-7672
URL: https://issues.apache.org/jira/browse/YARN-7672
Project: Hadoop YARN
Issue Type: Improvement
Reporter: zhangshilong
Assignee: zhangshilong

Our YARN cluster scales to nearly 10 thousand nodes. We need to do scheduler pressure tests.
We start 2000+ threads to simulate NMs and AMs, so cpu.load goes very high, to 100+. I thought that would affect the performance evaluation of the scheduler.
So I thought to separate the scheduler from the simulator:
I start a real RM; then SLS registers nodes with the RM and submits apps to the RM using RM RPC.
[jira] [Comment Edited] (YARN-7214) duplicated container completed To AM
[ https://issues.apache.org/jira/browse/YARN-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171396#comment-16171396 ]

zhangshilong edited comment on YARN-7214 at 9/19/17 9:38 AM:
-
!screenshot-1.png!
Generally:
1. The NM completes a container (c) and reports it to the RM.
2. The RM sends c to the AM, telling the AM that c is completed.
3. The RM sends c to the NM, telling the NM that c can be removed from the NM.
If the RM restarts before step 3, c will be reported to the AM as a duplicated completed container.

was (Author: zsl2007):
!screenshot-1.png!
Generally:
1. The NM completes a container (c) and reports it to the RM.
2. The RM sends c to the AM, telling the AM that c is completed.
3. The RM sends c to the NM, telling the NM that c can be removed from the NM.
If the RM restarts before step 3, c will stay in the NM context forever. If the RM restarts again, c will be reported to the AM as a duplicated completed container.

> duplicated container completed To AM
>
> Key: YARN-7214
> URL: https://issues.apache.org/jira/browse/YARN-7214
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.1, 3.0.0-alpha3
> Environment: hadoop 2.7.1 with RM recovery and NM recovery enabled
> Reporter: zhangshilong
> Attachments: screenshot-1.png
>
> env: hadoop 2.7.1 with RM recovery and NM recovery enabled
> case: a Spark app (app1) is running at least one container (named c1) on NM1.
> 1. NM1 crashed, and the RM found NM1 expired after 10 minutes.
> 2. The RM removes all containers on NM1 (RMNodeImpl), and app1 receives a c1-completed message. But the RM cannot send c1 (to be removed) to NM1 because NM1 is lost.
> 3. NM1 restarts and registers with the RM (c1 is in the register request), but the RM finds NM1 is lost and will not handle containers from NM1.
> 4. NM1 does not heartbeat with c1 (c1 is not in the heartbeat request), so c1 is not removed from NM1's context.
> 5. The RM restarts and NM1 re-registers with the RM. Now c1 is handled and recovered, and the RM sends a c1-completed message to app1's AM. So app1 receives a duplicated c1.
> Once the Spark AM receives a container-completed message from the RM, it allocates one new container.
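The three-step handshake described above can be modeled as a toy state machine (plain Java with hypothetical names, not YARN code) to show why a missed step 3 produces a duplicated completion on the next re-registration:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the completed-container handshake from the comment above.
// If the RM restarts before step 3, the container stays in the NM context
// and is replayed on re-registration, so the AM sees it completed twice.
class CompletionAckModel {
    final Set<String> nmContext = new HashSet<>();      // completed containers the NM still tracks
    final List<String> amNotifications = new ArrayList<>();

    void nmReportsCompleted(String c) { nmContext.add(c); }  // step 1
    void rmNotifiesAm(String c) { amNotifications.add(c); }  // step 2
    void rmAcksNm(String c) { nmContext.remove(c); }         // step 3

    // On (re-)registration the NM resends everything left in its context,
    // and the RM forwards each one to the AM again.
    void nmReRegisters() {
        for (String c : nmContext) {
            rmNotifiesAm(c);
        }
    }

    public static void main(String[] args) {
        CompletionAckModel m = new CompletionAckModel();
        m.nmReportsCompleted("c1");  // step 1
        m.rmNotifiesAm("c1");        // step 2
        // RM restarts before step 3: rmAcksNm("c1") never happens.
        m.nmReRegisters();           // c1 is replayed
        System.out.println(m.amNotifications); // [c1, c1] -> duplicate completion
    }
}
```

Running the happy path (all three steps, then re-register) leaves `amNotifications` with a single entry; skipping step 3 yields the duplicate the bug report describes.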
[jira] [Commented] (YARN-7214) duplicated container completed To AM
[ https://issues.apache.org/jira/browse/YARN-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171396#comment-16171396 ]

zhangshilong commented on YARN-7214:
!screenshot-1.png!
Generally:
1. The NM completes a container (c) and reports it to the RM.
2. The RM sends c to the AM, telling the AM that c is completed.
3. The RM sends c to the NM, telling the NM that c can be removed from the NM.
If the RM restarts before step 3, c will stay in the NM context forever. If the RM restarts again, c will be reported to the AM as a duplicated completed container.
[jira] [Updated] (YARN-7214) duplicated container completed To AM
[ https://issues.apache.org/jira/browse/YARN-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangshilong updated YARN-7214:
--
Attachment: screenshot-1.png
[jira] [Commented] (YARN-7214) duplicated container completed To AM
[ https://issues.apache.org/jira/browse/YARN-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171322#comment-16171322 ]

zhangshilong commented on YARN-7214:
In my opinion, containers in recentlyStoppedContainers can be removed from the NMContext once the NM heartbeats normally with the RM.
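A minimal sketch of that idea, with hypothetical names (the real NM keeps recentlyStoppedContainers inside NodeStatusUpdaterImpl and is more involved): drop a completed container from the cache as soon as a normal heartbeat has reported it and the RM has acknowledged it, instead of waiting for the time-based eviction.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only -- field and method names are hypothetical,
// not the actual NodeManager API. Completed containers are tracked until
// a heartbeat response acknowledges them, then dropped immediately.
class StoppedContainerCache {
    // containerId -> expiry timestamp (the TTL fallback still applies)
    private final Map<String, Long> recentlyStopped = new LinkedHashMap<>();

    void addCompleted(String containerId, long expiryMillis) {
        recentlyStopped.putIfAbsent(containerId, expiryMillis);
    }

    // Called after a successful heartbeat response that acknowledged the
    // given containers: they no longer need to be tracked.
    void onHeartbeatAcked(Collection<String> ackedContainers) {
        recentlyStopped.keySet().removeAll(ackedContainers);
    }

    boolean isTracked(String containerId) {
        return recentlyStopped.containsKey(containerId);
    }
}
```

The point of the design is that the heartbeat acknowledgement, not just a timer, bounds how long a completed container can linger and later be replayed on re-registration.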
[jira] [Commented] (YARN-7214) duplicated container completed To AM
[ https://issues.apache.org/jira/browse/YARN-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171270#comment-16171270 ]

zhangshilong commented on YARN-7214:
3.
{code:java}
  public static class AddNodeTransition implements
      SingleArcTransition<RMNodeImpl, RMNodeEvent> {

    @Override
    public void transition(RMNodeImpl rmNode, RMNodeEvent event) {
      // Inform the scheduler
      RMNodeStartedEvent startEvent = (RMNodeStartedEvent) event;
      List<NMContainerStatus> containers = null;

      NodeId nodeId = rmNode.nodeId;
      RMNode previousRMNode =
          rmNode.context.getInactiveRMNodes().remove(nodeId);
      if (previousRMNode != null) {
        rmNode.updateMetricsForRejoinedNode(previousRMNode.getState());
      } else {
        NodeId unknownNodeId =
            NodesListManager.createUnknownNodeId(nodeId.getHost());
        previousRMNode =
            rmNode.context.getInactiveRMNodes().remove(unknownNodeId);
        if (previousRMNode != null) {
          ClusterMetrics.getMetrics().decrDecommisionedNMs();
        }
        // Increment activeNodes explicitly because this is a new node.
        ClusterMetrics.getMetrics().incrNumActiveNodes();
        containers = startEvent.getNMContainerStatuses();
        if (containers != null && !containers.isEmpty()) {
          for (NMContainerStatus container : containers) {
            if (container.getContainerState() == ContainerState.RUNNING
                || container.getContainerState() == ContainerState.SCHEDULED) {
              rmNode.launchedContainers.add(container.getContainerId());
            }
          }
        }
      }

      if (null != startEvent.getRunningApplications()) {
        for (ApplicationId appId : startEvent.getRunningApplications()) {
          handleRunningAppOnNode(rmNode, rmNode.context, appId, rmNode.nodeId);
        }
      }

      rmNode.context.getDispatcher().getEventHandler()
          .handle(new NodeAddedSchedulerEvent(rmNode, containers));
      rmNode.context.getDispatcher().getEventHandler().handle(
          new NodesListManagerEvent(
              NodesListManagerEventType.NODE_USABLE, rmNode));
    }
  }
{code}
4. In NodeStatusUpdaterImpl.java, before registering, getNMContainerStatuses() is called, so the completed container is put into recentlyStoppedContainers. In the register request, completed containers are sent to the RM.
{code:java}
  public void addCompletedContainer(ContainerId containerId) {
    synchronized (recentlyStoppedContainers) {
      removeVeryOldStoppedContainersFromCache();
      if (!recentlyStoppedContainers.containsKey(containerId)) {
        recentlyStoppedContainers.put(containerId,
            System.currentTimeMillis() + durationToTrackStoppedContainers);
      }
    }
  }
{code}
In a normal heartbeat, getContainerStatuses() is called, and a completed container is not put into containerStatuses because it is already in recentlyStoppedContainers. So the completed container is not sent to the RM.
{code:java}
  protected List<ContainerStatus> getContainerStatuses() throws IOException {
    List<ContainerStatus> containerStatuses = new ArrayList<>();
    for (Container container : this.context.getContainers().values()) {
      ContainerId containerId = container.getContainerId();
      ApplicationId applicationId = containerId.getApplicationAttemptId()
          .getApplicationId();
      org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus =
          container.cloneAndGetContainerStatus();
      if (containerStatus.getState() == ContainerState.COMPLETE) {
        if (isApplicationStopped(applicationId)) {
          if (LOG.isDebugEnabled()) {
            LOG.debug(applicationId + " is completing, " + " remove "
                + containerId + " from NM context.");
          }
          context.getContainers().remove(containerId);
          pendingCompletedContainers.put(containerId, containerStatus);
        } else {
          if (!isContainerRecentlyStopped(containerId)) {
            pendingCompletedContainers.put(containerId, containerStatus);
          }
        }
        // Adding to finished containers cache. Cache will keep it around at
        // least for #durationToTrackStoppedContainers duration. In the
        // subsequent call to stop container it will get removed from cache.
        addCompletedContainer(containerId);
      } else {
        containerStatuses.add(containerStatus);
      }
    }
    containerStatuses.addAll(pendingCompletedContainers.values());
    if (LOG.isDebugEnabled()) {
      LOG.debug("Sending out " + containerStatuses.size()
          + " container statuses: " + containerStatuses);
    }
    return containerStatuses;
  }
{code}
[jira] [Created] (YARN-7214) duplicated container completed To AM
zhangshilong created YARN-7214:
--
Summary: duplicated container completed To AM
Key: YARN-7214
URL: https://issues.apache.org/jira/browse/YARN-7214
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0-alpha3, 2.7.1
Environment: hadoop 2.7.1 with RM recovery and NM recovery enabled
Reporter: zhangshilong

env: hadoop 2.7.1 with RM recovery and NM recovery enabled
case: a Spark app (app1) is running at least one container (named c1) on NM1.
1. NM1 crashed, and the RM found NM1 expired after 10 minutes.
2. The RM removes all containers on NM1 (RMNodeImpl), and app1 receives a c1-completed message. But the RM cannot send c1 (to be removed) to NM1 because NM1 is lost.
3. NM1 restarts and registers with the RM (c1 is in the register request), but the RM finds NM1 is lost and will not handle containers from NM1.
4. NM1 does not heartbeat with c1 (c1 is not in the heartbeat request), so c1 is not removed from NM1's context.
5. The RM restarts and NM1 re-registers with the RM. Now c1 is handled and recovered, and the RM sends a c1-completed message to app1's AM. So app1 receives a duplicated c1.
Once the Spark AM receives a container-completed message from the RM, it allocates one new container.
[jira] [Comment Edited] (YARN-4090) Make Collections.sort() more efficient by caching resource usage
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137944#comment-16137944 ]

zhangshilong edited comment on YARN-4090 at 8/23/17 6:17 AM:
-
[~dan...@cloudera.com] [~yufeigu] Never mind. It is my fault.

was (Author: zsl2007):
[~dan...@cloudera.com] [~yufeigu] Never mind. It is my fault.
In patch v7, in FSAppAttempt.java, the function containerCompleted can be called by RM preemption, AM release, and NM release. RM preemption is considered in patch v7, but the AM and the NM may also release the same container. So, in my opinion:
{code:java}
    // Remove from the list of containers
    RMContainer removedContainer =
        liveContainers.remove(rmContainer.getContainerId());
    if (removedContainer != null) {
      this.fsQueue.decResourceUsage(removedContainer.getAllocatedResource());
    }
{code}

> Make Collections.sort() more efficient by caching resource usage
> --
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Reporter: Xianyin Xin
> Assignee: zhangshilong
> Attachments: sampling1.jpg, sampling2.jpg, YARN-4090.001.patch, YARN-4090.002.patch, YARN-4090.003.patch, YARN-4090.004.patch, YARN-4090.005.patch, YARN-4090.006.patch, YARN-4090.007.patch, YARN-4090-preview.patch, YARN-4090-TestResult.pdf
>
> Collections.sort() consumes too much time in a scheduling round.
[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient by caching resource usage
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137944#comment-16137944 ]

zhangshilong commented on YARN-4090:
[~dan...@cloudera.com] [~yufeigu] Never mind. It is my fault.
In patch v7, in FSAppAttempt.java, the function containerCompleted can be called by RM preemption, AM release, and NM release. RM preemption is considered in patch v7, but the AM and the NM may also release the same container. So, in my opinion:
{code:java}
    // Remove from the list of containers
    RMContainer removedContainer =
        liveContainers.remove(rmContainer.getContainerId());
    if (removedContainer != null) {
      this.fsQueue.decResourceUsage(removedContainer.getAllocatedResource());
    }
{code}
[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient by caching resource usage
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131643#comment-16131643 ]

zhangshilong commented on YARN-4090:
I am very sorry. I am still working on the YARN project, but my job keeps me busy, so I have had no time to finish the patch. I will try my best to finish it before 2017.10.1.
[jira] [Commented] (YARN-4752) FairScheduler should preempt for a ResourceRequest and all preempted containers should be on the same node
[ https://issues.apache.org/jira/browse/YARN-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885081#comment-15885081 ]

zhangshilong commented on YARN-4752:
[~kasha] I found one problem. In FSLeafQueue, I think an app's resourceUsage should not change during assignContainer, because FairShareComparator uses resourceUsage to sort the apps.
{code:java}
  private TreeSet<FSAppAttempt> fetchAppsWithDemand() {
    TreeSet<FSAppAttempt> pendingForResourceApps =
        new TreeSet<>(policy.getComparator());
    readLock.lock();
    try {
      for (FSAppAttempt app : runnableApps) {
        Resource pending = app.getAppAttemptResourceUsage().getPending();
        if (!pending.equals(none())) {
          pendingForResourceApps.add(app);
        }
      }
    } finally {
      readLock.unlock();
    }
    return pendingForResourceApps;
  }
{code}
But in FSPreemptionThread, run -> preemptContainers -> app.trackContainerForPreemption changes the app's preemptedResources without holding the FairScheduler lock. So getResourceUsage of the app can change while assignContainer runs in FSLeafQueue.
{code:java}
  @Override
  public Resource getResourceUsage() {
    /*
     * getResourcesToPreempt() returns zero, except when there are containers
     * to preempt. Avoid creating an object in the common case.
     */
    return getPreemptedResources().equals(Resources.none())
        ? getCurrentConsumption()
        : Resources.subtract(getCurrentConsumption(), getPreemptedResources());
  }
{code}

> FairScheduler should preempt for a ResourceRequest and all preempted containers should be on the same node
> --
>
> Key: YARN-4752
> URL: https://issues.apache.org/jira/browse/YARN-4752
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.8.0
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
> Fix For: 2.9.0, 3.0.0-alpha2
> Attachments: yarn-4752-1.patch, yarn-4752.2.patch, yarn-4752.3.patch, yarn-4752.4.patch, yarn-4752.4.patch, YARN-4752.FairSchedulerPreemptionOverhaul.pdf, yarn-6076-branch-2.1.patch
>
> A number of issues have been reported with respect to preemption in FairScheduler along the lines of:
> # FairScheduler preempts resources from nodes even if the resultant free resources cannot fit the incoming request.
> # Preemption doesn't preempt from sibling queues
> # Preemption doesn't preempt from sibling apps under the same queue that is over its fairshare
> # ...
> Filing this umbrella JIRA to group all the issues together and think of a comprehensive solution.
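The hazard of mutating the sort key while apps sit in a sorted structure can be shown with a standalone demo (plain Java, no YARN classes; the `App` class is a stand-in for `FSAppAttempt`): once one element's key changes, the tree's ordering invariant breaks and even unrelated, unchanged elements can no longer be found.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Standalone demo of the race described above: a TreeSet ordered by a
// mutable field. When the field changes while the element is in the set
// (as preemption changes an app's resource usage without the scheduler
// lock), the tree is not rebalanced and lookups silently fail -- even
// for elements that never changed.
class MutableSortKeyDemo {
    static class App {
        final String name;
        int usage; // stand-in for getResourceUsage()
        App(String name, int usage) { this.name = name; this.usage = usage; }
    }

    static boolean canStillFindUnchangedApp() {
        TreeSet<App> byUsage = new TreeSet<>(
            Comparator.comparingInt((App a) -> a.usage)
                      .thenComparing(a -> a.name));
        App app1 = new App("app1", 10);
        App app2 = new App("app2", 20);
        byUsage.add(app1);
        byUsage.add(app2);

        // Concurrent mutation of the sort key, as in trackContainerForPreemption:
        app1.usage = 30; // app1 should now sort after app2, but the tree is stale

        return byUsage.contains(app2);
    }

    public static void main(String[] args) {
        System.out.println(canStillFindUnchangedApp()); // false
    }
}
```

The search for `app2` compares 20 against the root's new key 30, walks left, and finds nothing, which is exactly why the comparator's inputs must stay stable for the duration of the sort.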
[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885072#comment-15885072 ]

zhangshilong commented on YARN-4090:
[~yufeigu] I found one problem while working on this issue. In FSLeafQueue, I think an app's resourceUsage should not change during assignContainer, because FairShareComparator uses resourceUsage to sort the apps.
{code:java}
  private TreeSet<FSAppAttempt> fetchAppsWithDemand() {
    TreeSet<FSAppAttempt> pendingForResourceApps =
        new TreeSet<>(policy.getComparator());
    readLock.lock();
    try {
      for (FSAppAttempt app : runnableApps) {
        Resource pending = app.getAppAttemptResourceUsage().getPending();
        if (!pending.equals(none())) {
          pendingForResourceApps.add(app);
        }
      }
    } finally {
      readLock.unlock();
    }
    return pendingForResourceApps;
  }
{code}
But in FSPreemptionThread, run -> preemptContainers -> app.trackContainerForPreemption changes the app's preemptedResources without holding the FairScheduler lock. So getResourceUsage of the app can change while assignContainer runs in FSLeafQueue.
{code:java}
  @Override
  public Resource getResourceUsage() {
    /*
     * getResourcesToPreempt() returns zero, except when there are containers
     * to preempt. Avoid creating an object in the common case.
     */
    return getPreemptedResources().equals(Resources.none())
        ? getCurrentConsumption()
        : Resources.subtract(getCurrentConsumption(), getPreemptedResources());
  }
{code}
[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863277#comment-15863277 ]

zhangshilong commented on YARN-4090:
Thanks [~yufeigu] for the reminder and the additional information. YARN-4691 is about the same issue with ResourceUsage; this JIRA will solve the problem mentioned in YARN-4691.
[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859019#comment-15859019 ]

zhangshilong commented on YARN-4090:
Thanks [~yufeigu]. When an application finishes, or its tasks finish, FSParentQueue and FSLeafQueue should update resourceUsage. Even on preemption, resourceUsage should be updated. In [~xinxianyin]'s patch YARN-4090.003.patch, preemption and task completion have been considered. When creating the patch file, one of my commits was left out by mistake. In my view, resourceUsage in FSParentQueue and FSLeafQueue should be updated on allocation, task completion, and preemption. From the QA messages I found that unit tests are needed, so I will add unit tests for calculating resourceUsage.
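The update-on-allocate/complete/preempt idea can be sketched as follows (illustrative only, not the actual patch: real FairScheduler code tracks full Resource objects under the scheduler lock, while this toy uses a single long):

```java
// Illustrative sketch of caching a queue's resource usage: instead of
// summing over all children on every Collections.sort(), the queue keeps
// a running total that is adjusted on allocation, task completion, and
// preemption, making getResourceUsage() O(1) for the comparator.
class CachedUsageQueue {
    // Cached aggregate; in real code this would be guarded by the scheduler lock.
    private long usedMemoryMb;

    void onAllocate(long mb) { usedMemoryMb += mb; }  // container granted
    void onComplete(long mb) { usedMemoryMb -= mb; }  // task finished
    void onPreempt(long mb)  { usedMemoryMb -= mb; }  // container preempted

    long getResourceUsage() {
        return usedMemoryMb; // no traversal of child queues/apps
    }

    public static void main(String[] args) {
        CachedUsageQueue q = new CachedUsageQueue();
        q.onAllocate(2304);
        q.onAllocate(2304);
        q.onComplete(2304);
        q.onPreempt(2304);
        System.out.println(q.getResourceUsage()); // 0
    }
}
```

The correctness burden shifts from the read path to the write path: every event that changes usage (allocate, complete, preempt) must adjust the cache exactly once, which is why the comment stresses covering all three paths with unit tests.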
[jira] [Updated] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong updated YARN-4090: --- Attachment: YARN-4090.006.patch > Make Collections.sort() more efficient in FSParentQueue.java > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: zhangshilong > Attachments: sampling1.jpg, sampling2.jpg, YARN-4090.001.patch, > YARN-4090.002.patch, YARN-4090.003.patch, YARN-4090.004.patch, > YARN-4090.005.patch, YARN-4090.006.patch, YARN-4090-preview.patch, > YARN-4090-TestResult.pdf > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855267#comment-15855267 ] zhangshilong commented on YARN-4090: So sorry for the whitespace issues; a new patch will be submitted. I think there is no need for more unit tests. What do you think? [~yufeigu] > Make Collections.sort() more efficient in FSParentQueue.java > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: zhangshilong > Attachments: sampling1.jpg, sampling2.jpg, YARN-4090.001.patch, > YARN-4090.002.patch, YARN-4090.003.patch, YARN-4090.004.patch, > YARN-4090.005.patch, YARN-4090-preview.patch, YARN-4090-TestResult.pdf > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong updated YARN-4090: --- Attachment: YARN-4090.005.patch > Make Collections.sort() more efficient in FSParentQueue.java > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: zhangshilong > Attachments: sampling1.jpg, sampling2.jpg, YARN-4090.001.patch, > YARN-4090.002.patch, YARN-4090.003.patch, YARN-4090.004.patch, > YARN-4090.005.patch, YARN-4090-preview.patch, YARN-4090-TestResult.pdf > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851288#comment-15851288 ] zhangshilong commented on YARN-4090: Thanks [~yufeigu]. I will submit the new patch as soon as possible. > Make Collections.sort() more efficient in FSParentQueue.java > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: zhangshilong > Attachments: sampling1.jpg, sampling2.jpg, YARN-4090.001.patch, > YARN-4090.002.patch, YARN-4090.003.patch, YARN-4090.004.patch, > YARN-4090-preview.patch, YARN-4090-TestResult.pdf > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5188) FairScheduler performance bug
[ https://issues.apache.org/jira/browse/YARN-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851286#comment-15851286 ] zhangshilong commented on YARN-5188: [~chenfolin] Good idea! And how does this patch perform? > FairScheduler performance bug > - > > Key: YARN-5188 > URL: https://issues.apache.org/jira/browse/YARN-5188 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.5.0 >Reporter: ChenFolin > Attachments: YARN-5188-1.patch > > > My Hadoop cluster has recently encountered a performance problem. Details as > follows. > There are two points which can cause this performance issue. > 1: applications are sorted before assigning a container in FSLeafQueue. TreeSet is not > the best choice. Why not keep the list ordered, and then use binary search to help > keep it ordered when an application's resource usage has changed? > 2: queue sort and assignContainerPreCheck compute the resource usage of all leaf queues. > Why not store the leaf queue usage in memory and update it > when a container is assigned or released? > >The efficiency of container assignment in the ResourceManager may fall > as the number of running and pending applications grows. In fact the > cluster has too many pending MB or pending vcores, while the cluster's > current utilization rate may be below 20%. >I checked the ResourceManager logs and found that every container assignment > may cost 5 ~ 10 ms, instead of the usual 0 ~ 1 ms. > >I used TestFairScheduler to reproduce the scene: > >Just one queue: root.default > 10240 apps.
> >assign container avg time: 6753.9 us ( 6.7539 ms) > apps sort time (FSLeafQueue : Collections.sort(runnableApps, > comparator); ): 4657.01 us ( 4.657 ms ) > compute LeafQueue resource usage : 905.171 us ( 0.905171 ms ) > > With just root.default, one assign-container op contains: ( one apps > sort op ) + 2 * ( compute leaf queue usage op ) >Based on the above, I think the assign-container op has > a performance problem. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
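Point 1 of the description above, keeping the application list ordered and repositioning a single app by binary search when its usage changes, can be sketched as follows. The names are illustrative, not FSLeafQueue's actual code; `Integer` usages stand in for apps ordered by a comparator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of "keep orderly + binary search" from the report
// above: when one app's usage changes, remove it and re-insert at the
// position found by binary search -- O(log n) search plus one shift,
// instead of re-sorting the whole list on every allocation.
class OrderedApps {
    private final List<Integer> usages = new ArrayList<>();

    void add(int usage) {
        insertSorted(usage);
    }

    // Re-position a single element after its usage changed.
    void update(int oldUsage, int newUsage) {
        usages.remove(Integer.valueOf(oldUsage));
        insertSorted(newUsage);
    }

    private void insertSorted(int usage) {
        int pos = Collections.binarySearch(usages, usage);
        // binarySearch returns (-(insertionPoint) - 1) for a missing key.
        if (pos < 0) pos = -pos - 1;
        usages.add(pos, usage);
    }

    List<Integer> snapshot() { return new ArrayList<>(usages); }
}
```

Compared with the measured 4.657 ms full `Collections.sort()` per assignment, only the one changed entry moves here, which is the efficiency the report is asking for.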
[jira] [Assigned] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong reassigned YARN-4090: -- Assignee: zhangshilong > Make Collections.sort() more efficient in FSParentQueue.java > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: zhangshilong > Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, > YARN-4090.001.patch, YARN-4090.002.patch, YARN-4090.003.patch, > YARN-4090.004.patch, sampling1.jpg, sampling2.jpg > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15787353#comment-15787353 ] zhangshilong commented on YARN-4090: No problem, and thank you very much for your patch; it was a great help to me. > Make Collections.sort() more efficient in FSParentQueue.java > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Xianyin Xin > Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, > YARN-4090.001.patch, YARN-4090.002.patch, YARN-4090.003.patch, > YARN-4090.004.patch, sampling1.jpg, sampling2.jpg > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong updated YARN-4090: --- Attachment: YARN-4090.004.patch fixes the branch-2.6 deadlock in FSParentQueue.java > Make Collections.sort() more efficient in FSParentQueue.java > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Xianyin Xin > Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, > YARN-4090.001.patch, YARN-4090.002.patch, YARN-4090.003.patch, > YARN-4090.004.patch, sampling1.jpg, sampling2.jpg > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15787258#comment-15787258 ] zhangshilong commented on YARN-4090: I see. In branch-2.6, FSParentQueue.java locks the FSQueue object. But the code changed in version 2.7.1:
{code:java}
@Override
public synchronized List<QueueUserACLInfo> getQueueUserAclInfo(
    UserGroupInformation user) {
  List<QueueUserACLInfo> userAcls = new ArrayList<QueueUserACLInfo>();

  // Add queue acls
  userAcls.add(getUserAclInfo(user));

  // Add children queue acls
  for (FSQueue child : childQueues) {
    userAcls.addAll(child.getQueueUserAclInfo(user));
  }
  return userAcls;
}
{code}
> Make Collections.sort() more efficient in FSParentQueue.java > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: Xianyin Xin > Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, > YARN-4090.001.patch, YARN-4090.002.patch, YARN-4090.003.patch, sampling1.jpg, > sampling2.jpg > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6045) apps/queues that have no pending containers will still affect the efficiency of scheduling
[ https://issues.apache.org/jira/browse/YARN-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong updated YARN-6045: --- Environment: yarn: 2.7.1 release jdk 1.7 kernel:2.6.32-431.20.3.el6 was: jdk 1.7 kernel:2.6.32-431.20.3.el6 > apps/queues that have no pending containers will still affect the efficiency > of scheduling > -- > > Key: YARN-6045 > URL: https://issues.apache.org/jira/browse/YARN-6045 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 > Environment: yarn: 2.7.1 release > jdk 1.7 > kernel:2.6.32-431.20.3.el6 >Reporter: zhangshilong >Assignee: zhangshilong > > Sorting queues/apps consumes a significant amount of time during a single > container allocation. > Each time a container is assigned, all queues / apps are sorted by hierarchy. > In practice, many queues / apps without pending containers do not need to > participate in the sort. > If apps / queues with no resource demand are excluded from sorting, > scheduling performance will increase a lot. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6045) apps/queues that have no pending containers will still affect the efficiency of scheduling
zhangshilong created YARN-6045: -- Summary: apps/queues that have no pending containers will still affect the efficiency of scheduling Key: YARN-6045 URL: https://issues.apache.org/jira/browse/YARN-6045 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.7.1 Environment: jdk 1.7 kernel:2.6.32-431.20.3.el6 Reporter: zhangshilong Assignee: zhangshilong Sorting queues/apps consumes a significant amount of time during a single container allocation. Each time a container is assigned, all queues / apps are sorted by hierarchy. In practice, many queues / apps without pending containers do not need to participate in the sort. If apps / queues with no resource demand are excluded from sorting, scheduling performance will increase a lot. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
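The proposal in this report, letting only apps/queues with pending demand participate in the per-allocation sort, can be sketched as follows. The `Schedulable` class here is illustrative, not YARN's actual interface.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the YARN-6045 idea: filter out schedulables with no
// pending demand before sorting, so entities that cannot use more resources
// never pay the O(n log n) cost. Names are illustrative.
class DemandAwareSort {
    static class Schedulable {
        final String name;
        final long pendingMb;  // demand not yet satisfied
        final long usedMb;

        Schedulable(String name, long pendingMb, long usedMb) {
            this.name = name;
            this.pendingMb = pendingMb;
            this.usedMb = usedMb;
        }
    }

    // Sort only the schedulables that can actually take another container.
    static List<Schedulable> sortCandidates(List<Schedulable> all) {
        List<Schedulable> candidates = new ArrayList<>();
        for (Schedulable s : all) {
            if (s.pendingMb > 0) {
                candidates.add(s);
            }
        }
        candidates.sort(Comparator.comparingLong((Schedulable s) -> s.usedMb));
        return candidates;
    }
}
```

When most queues are idle, the candidate list is far smaller than the full hierarchy, which is where the claimed speedup would come from.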
[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784417#comment-15784417 ] zhangshilong commented on YARN-4090: Would you please tell me which YARN version you used? In trunk, FairScheduler.getQueueUserAclInfo does not lock the FSQueue object; the FSQueue object is locked only in decResourceUsage and incrResourceUsage. FairScheduler:
{code:java}
@Override
public List<QueueUserACLInfo> getQueueUserAclInfo() {
  UserGroupInformation user;
  try {
    user = UserGroupInformation.getCurrentUser();
  } catch (IOException ioe) {
    return new ArrayList<QueueUserACLInfo>();
  }
  return queueMgr.getRootQueue().getQueueUserAclInfo(user);
}
{code}
FSParentQueue.java:
{code:java}
@Override
public List<QueueUserACLInfo> getQueueUserAclInfo(UserGroupInformation user) {
  List<QueueUserACLInfo> userAcls = new ArrayList<>();

  // Add queue acls
  userAcls.add(getUserAclInfo(user));

  // Add children queue acls
  readLock.lock();
  try {
    for (FSQueue child : childQueues) {
      userAcls.addAll(child.getQueueUserAclInfo(user));
    }
  } finally {
    readLock.unlock();
  }
  return userAcls;
}
{code}
> Make Collections.sort() more efficient in FSParentQueue.java > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: Xianyin Xin > Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, > YARN-4090.001.patch, YARN-4090.002.patch, YARN-4090.003.patch, sampling1.jpg, > sampling2.jpg > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
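The trunk code quoted above replaces a coarse synchronized method with a read lock. The difference can be illustrated with a self-contained sketch (not the YARN code): many readers may traverse the child list concurrently under the shared read lock, while a writer mutating the list takes the exclusive write lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the read/write-lock pattern used by trunk's
// FSParentQueue above: reads of childQueues can proceed in parallel,
// and only structural changes are serialized.
class ChildQueueList {
    private final List<String> childQueues = new ArrayList<>();
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

    // Writer path: exclusive access while mutating the list.
    void addChild(String name) {
        rwLock.writeLock().lock();
        try {
            childQueues.add(name);
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    // Reader path: shared access; concurrent snapshots do not block each other.
    List<String> snapshot() {
        rwLock.readLock().lock();
        try {
            return new ArrayList<>(childQueues);
        } finally {
            rwLock.readLock().unlock();
        }
    }
}
```

With a `synchronized` method, a slow recursive traversal holds the queue monitor and can contribute to the deadlock and contention discussed in this thread; the read lock avoids serializing independent readers.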
[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15784392#comment-15784392 ] zhangshilong commented on YARN-4090: [~xinxianyin] [~yufeigu] This optimization works very well in our environment; I hope to continue this issue. > Make Collections.sort() more efficient in FSParentQueue.java > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: Xianyin Xin > Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, > YARN-4090.001.patch, YARN-4090.002.patch, YARN-4090.003.patch, sampling1.jpg, > sampling2.jpg > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5969) FairShareComparator: Cache value of getResourceUsage for better performance
[ https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782248#comment-15782248 ] zhangshilong commented on YARN-5969: Thanks [~yufeigu] for the advice and review, and [~kasha] for the commit. Our YARN cluster has reached nearly 4000 nodes; FairScheduler has hit many performance problems at that scale, and I hope to submit more optimizations to the community. > FairShareComparator: Cache value of getResourceUsage for better performance > --- > > Key: YARN-5969 > URL: https://issues.apache.org/jira/browse/YARN-5969 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhangshilong >Assignee: zhangshilong > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: 20161206.patch, 20161222.patch, YARN-5969.patch, > apprunning_after.png, apprunning_before.png, > containerAllocatedDelta_before.png, containerAllocated_after.png, > pending_after.png, pending_before.png > > > in FairShareComparator class, the performance of function getResourceUsage() > is very poor. It will be executed above 100,000,000 times per second. > In our scene, It takes 20 seconds per minute. > A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
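The committed optimization reduces call counts of `getResourceUsage()` by caching. A hedged sketch of the general technique follows: snapshot each app's usage once before sorting, so the comparator reads a cached value instead of re-invoking the expensive getter on every one of the O(n log n) comparisons. Names are illustrative, not the actual YARN-5969 patch code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "cache value of getResourceUsage": one expensive
// call per app before the sort, instead of one per comparison.
class CachedUsageSort {
    static class App {
        final String name;
        private final long usage;
        int usageCalls = 0;  // counts invocations of the expensive getter

        App(String name, long usage) {
            this.name = name;
            this.usage = usage;
        }

        // Stands in for the expensive recursive usage computation.
        long getResourceUsage() {
            usageCalls++;
            return usage;
        }
    }

    static List<App> sortByUsage(List<App> apps) {
        // Exactly one getResourceUsage() call per app.
        Map<App, Long> cached = new HashMap<>();
        for (App a : apps) {
            cached.put(a, a.getResourceUsage());
        }
        List<App> sorted = new ArrayList<>(apps);
        // The comparator only reads the cached snapshot.
        sorted.sort(Comparator.comparing(cached::get));
        return sorted;
    }
}
```

The snapshot also keeps the comparator's view of each app stable for the duration of the sort, which avoids the inconsistent-comparison problems a live, mutating value can cause.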
[jira] [Commented] (YARN-5969) FairShareComparator getResourceUsage poor performance
[ https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15779799#comment-15779799 ] zhangshilong commented on YARN-5969: My fault, I mistook the git address: https://github.com/apache/hadoop-common.git. I submitted a new patch using the correct git address: https://github.com/apache/hadoop.git. > FairShareComparator getResourceUsage poor performance > - > > Key: YARN-5969 > URL: https://issues.apache.org/jira/browse/YARN-5969 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhangshilong >Assignee: zhangshilong > Attachments: 20161206.patch, 20161222.patch, YARN-5969.patch, > apprunning_after.png, apprunning_before.png, > containerAllocatedDelta_before.png, containerAllocated_after.png, > pending_after.png, pending_before.png > > > in FairShareComparator class, the performance of function getResourceUsage() > is very poor. It will be executed above 100,000,000 times per second. > In our scene, It takes 20 seconds per minute. > A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5969) FairShareComparator getResourceUsage poor performance
[ https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong updated YARN-5969: --- Attachment: YARN-5969.patch > FairShareComparator getResourceUsage poor performance > - > > Key: YARN-5969 > URL: https://issues.apache.org/jira/browse/YARN-5969 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhangshilong >Assignee: zhangshilong > Attachments: 20161206.patch, 20161222.patch, YARN-5969.patch, > apprunning_after.png, apprunning_before.png, > containerAllocatedDelta_before.png, containerAllocated_after.png, > pending_after.png, pending_before.png > > > in FairShareComparator class, the performance of function getResourceUsage() > is very poor. It will be executed above 100,000,000 times per second. > In our scene, It takes 20 seconds per minute. > A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5969) FairShareComparator getResourceUsage poor performance
[ https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong updated YARN-5969: --- Attachment: 20161222.patch > FairShareComparator getResourceUsage poor performance > - > > Key: YARN-5969 > URL: https://issues.apache.org/jira/browse/YARN-5969 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhangshilong >Assignee: zhangshilong > Attachments: 20161206.patch, 20161222.patch, apprunning_after.png, > apprunning_before.png, containerAllocatedDelta_before.png, > containerAllocated_after.png, pending_after.png, pending_before.png > > > in FairShareComparator class, the performance of function getResourceUsage() > is very poor. It will be executed above 100,000,000 times per second. > In our scene, It takes 20 seconds per minute. > A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5969) FairShareComparator getResourceUsage poor performance
[ https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769268#comment-15769268 ] zhangshilong commented on YARN-5969: Thanks yufei Gu for the reminder, I will improve my patch soon. > FairShareComparator getResourceUsage poor performance > - > > Key: YARN-5969 > URL: https://issues.apache.org/jira/browse/YARN-5969 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhangshilong >Assignee: zhangshilong > Attachments: 20161206.patch, apprunning_after.png, > apprunning_before.png, containerAllocatedDelta_before.png, > containerAllocated_after.png, pending_after.png, pending_before.png > > > in FairShareComparator class, the performance of function getResourceUsage() > is very poor. It will be executed above 100,000,000 times per second. > In our scene, It takes 20 seconds per minute. > A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5969) FairShareComparator getResourceUsage poor performance
[ https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766471#comment-15766471 ] zhangshilong edited comment on YARN-5969 at 12/21/16 8:34 AM: -- The ContainerAllocated picture shows container allocations per minute. After the patch, container allocation per minute improves by about 50%; obviously, the 500 apps finish faster after the patch. was (Author: zsl2007): ContainerAllocated picture means container allocation per minute. After patch, Container allocation per minute improves about 50%. > FairShareComparator getResourceUsage poor performance > - > > Key: YARN-5969 > URL: https://issues.apache.org/jira/browse/YARN-5969 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhangshilong >Assignee: zhangshilong > Attachments: 20161206.patch, apprunning_after.png, > apprunning_before.png, containerAllocatedDelta_before.png, > containerAllocated_after.png, pending_after.png, pending_before.png > > > in FairShareComparator class, the performance of function getResourceUsage() > is very poor. It will be executed above 100,000,000 times per second. > In our scene, It takes 20 seconds per minute. > A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5969) FairShareComparator getResourceUsage poor performance
[ https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766471#comment-15766471 ] zhangshilong commented on YARN-5969: The ContainerAllocated picture shows container allocations per minute. After the patch, container allocation per minute improves by about 50%. > FairShareComparator getResourceUsage poor performance > - > > Key: YARN-5969 > URL: https://issues.apache.org/jira/browse/YARN-5969 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhangshilong >Assignee: zhangshilong > Attachments: 20161206.patch, apprunning_after.png, > apprunning_before.png, containerAllocatedDelta_before.png, > containerAllocated_after.png, pending_after.png, pending_before.png > > > in FairShareComparator class, the performance of function getResourceUsage() > is very poor. It will be executed above 100,000,000 times per second. > In our scene, It takes 20 seconds per minute. > A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5969) FairShareComparator getResourceUsage poor performance
[ https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong updated YARN-5969: --- Attachment: containerAllocated_after.png apprunning_after.png > FairShareComparator getResourceUsage poor performance > - > > Key: YARN-5969 > URL: https://issues.apache.org/jira/browse/YARN-5969 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhangshilong >Assignee: zhangshilong > Attachments: 20161206.patch, apprunning_after.png, > apprunning_before.png, containerAllocatedDelta_before.png, > containerAllocated_after.png, pending_after.png, pending_before.png > > > in FairShareComparator class, the performance of function getResourceUsage() > is very poor. It will be executed above 100,000,000 times per second. > In our scene, It takes 20 seconds per minute. > A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5969) FairShareComparator getResourceUsage poor performance
[ https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong updated YARN-5969: --- Attachment: pending_before.png pending_after.png containerAllocatedDelta_before.png apprunning_before.png > FairShareComparator getResourceUsage poor performance > - > > Key: YARN-5969 > URL: https://issues.apache.org/jira/browse/YARN-5969 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhangshilong >Assignee: zhangshilong > Attachments: 20161206.patch, apprunning_before.png, > containerAllocatedDelta_before.png, pending_after.png, pending_before.png > > > in FairShareComparator class, the performance of function getResourceUsage() > is very poor. It will be executed above 100,000,000 times per second. > In our scene, It takes 20 seconds per minute. > A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5969) FairShareComparator getResourceUsage poor performance
[ https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766437#comment-15766437 ] zhangshilong commented on YARN-5969: Test case: 500 apps, 3000 NM nodes. Queue layout: 100 parent queues, 5 leaf queues per parent queue. The 500 apps were submitted to 155 leaf queues, so on average a queue contains 4 apps. All apps are MapReduce jobs; each job contains 325 mappers and 44 reducers, and every mapper/reducer simply sleeps 20 seconds. > FairShareComparator getResourceUsage poor performance > - > > Key: YARN-5969 > URL: https://issues.apache.org/jira/browse/YARN-5969 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhangshilong >Assignee: zhangshilong > Attachments: 20161206.patch > > > in FairShareComparator class, the performance of function getResourceUsage() > is very poor. It will be executed above 100,000,000 times per second. > In our scene, It takes 20 seconds per minute. > A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5969) FairShareComparator getResourceUsage poor performance
[ https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong updated YARN-5969: --- Description: in FairShareComparator class, the performance of function getResourceUsage() is very poor. It will be executed above 100,000,000 times per second. In our scene, It takes 20 seconds per minute. A simple solution is to reduce call counts of the function. was: in FairShareComparator.java, the performance of function getResourceUsage() is very poor. It will be executed above 100,000,000 times per second. In our scene, It takes 20 seconds per minute. A simple solution is to reduce call counts of the function. > FairShareComparator getResourceUsage poor performance > - > > Key: YARN-5969 > URL: https://issues.apache.org/jira/browse/YARN-5969 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhangshilong > Attachments: 20161206.patch > > > in FairShareComparator class, the performance of function getResourceUsage() > is very poor. It will be executed above 100,000,000 times per second. > In our scene, It takes 20 seconds per minute. > A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5969) FairShareComparator getResourceUsage poor performance
[ https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong updated YARN-5969: --- Description: in FairShareComparator.java, the performance of function getResourceUsage() is very poor. It will be executed above 100,000,000 times per second. In our scene, It takes 20 seconds per minute. A simple solution is to reduce call counts of the function. was: in FairShareComparator.java, the performance of function getResourceUsage() is very pool. It will be executed above 100,000,000 times per second. In our scene, It takes 20 seconds per minute. A simple solution is to reduce call counts of the function. > FairShareComparator getResourceUsage poor performance > - > > Key: YARN-5969 > URL: https://issues.apache.org/jira/browse/YARN-5969 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhangshilong > Attachments: 20161206.patch > > > in FairShareComparator.java, the performance of function getResourceUsage() > is very poor. It will be executed above 100,000,000 times per second. > In our scene, It takes 20 seconds per minute. > A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5969) FairShareComparator getResourceUsage poor performance
[ https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong updated YARN-5969: --- Summary: FairShareComparator getResourceUsage poor performance (was: FairShareComparator getResourceUsage pool performance) > FairShareComparator getResourceUsage poor performance > - > > Key: YARN-5969 > URL: https://issues.apache.org/jira/browse/YARN-5969 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhangshilong > Attachments: 20161206.patch > > > in FairShareComparator.java, the performance of function getResourceUsage() > is very pool. It will be executed above 100,000,000 times per second. > In our scene, It takes 20 seconds per minute. > A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5969) FairShareComparator getResourceUsage pool performance
[ https://issues.apache.org/jira/browse/YARN-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong updated YARN-5969: --- Attachment: 20161206.patch > FairShareComparator getResourceUsage pool performance > - > > Key: YARN-5969 > URL: https://issues.apache.org/jira/browse/YARN-5969 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: zhangshilong > Attachments: 20161206.patch > > > in FairShareComparator.java, the performance of function getResourceUsage() > is very pool. It will be executed above 100,000,000 times per second. > In our scene, It takes 20 seconds per minute. > A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5969) FairShareComparator getResourceUsage pool performance
zhangshilong created YARN-5969: -- Summary: FairShareComparator getResourceUsage pool performance Key: YARN-5969 URL: https://issues.apache.org/jira/browse/YARN-5969 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.7.1 Reporter: zhangshilong in FairShareComparator.java, the performance of function getResourceUsage() is very pool. It will be executed above 100,000,000 times per second. In our scene, It takes 20 seconds per minute. A simple solution is to reduce call counts of the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4327) RM can not renew TIMELINE_DELEGATION_TOKEN in secure clusters
[ https://issues.apache.org/jira/browse/YARN-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069121#comment-15069121 ] zhangshilong commented on YARN-4327: yeah,I tried yarn.timeline-service.http-authentication.type=kerberos. So jobs could be submitted, but users can not access application history from webapp. > RM can not renew TIMELINE_DELEGATION_TOKEN in secure clusters > -- > > Key: YARN-4327 > URL: https://issues.apache.org/jira/browse/YARN-4327 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, timelineserver >Affects Versions: 2.7.1 > Environment: hadoop 2.7.1hdfs,yarn, mrhistoryserver, ATS all use > kerberos security. > conf like this: > > hadoop.security.authorization > true > Is service-level authorization enabled? > > > hadoop.security.authentication > kerberos > Possible values are simple (no authentication), and kerberos > > >Reporter: zhangshilong > > bin hadoop 2.7.1 > ATS conf like this: > > yarn.timeline-service.http-authentication.type > simple > > > yarn.timeline-service.http-authentication.kerberos.principal > HTTP/_h...@xxx.com > > > yarn.timeline-service.http-authentication.kerberos.keytab > /etc/hadoop/keytabs/xxx.keytab > > > yarn.timeline-service.principal > xxx/_h...@xxx.com > > > yarn.timeline-service.keytab > /etc/hadoop/keytabs/xxx.keytab > > > yarn.timeline-service.best-effort > true > > > yarn.timeline-service.enabled > true > > > I'd like to allow everyone to access ATS from HTTP as RM,HDFS. > client can submit job to RM and add TIMELINE_DELEGATION_TOKEN to AM > Context, but RM can not renew TIMELINE_DELEGATION_TOKEN and make application > to failure. > RM logs: > 2015-11-03 11:58:38,191 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. 
> java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, > Service: 10.12.38.4:8188, Ident: (owner=yarn-test, renewer=yarn-test, > realUser=, issueDate=1446523118046, maxDate=1447127918046, sequenceNumber=9, > masterKeyId=2) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:439) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:847) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:828) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: HTTP status [500], message [Null user] > at > org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:287) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:212) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:396) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:378) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:451) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:183) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:466) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:400) > at > org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTok
[jira] [Commented] (YARN-4325) purge app state from NM state-store should be independent of log aggregation
[ https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997932#comment-14997932 ] zhangshilong commented on YARN-4325: If the HDFS permissions are right, is there any other problem? And if yarn.log-aggregation-enable is set to false, does NM recovery still work well? > purge app state from NM state-store should be independent of log aggregation > > > Key: YARN-4325 > URL: https://issues.apache.org/jira/browse/YARN-4325 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > > From a long running cluster, we found tens of thousands of stale apps still > be recovered in NM restart recovery. The reason is some wrong configuration > setting to log aggregation so the end of log aggregation events are not > received so stale apps are not purged properly. We should make sure the > removal of app state to be independent of log aggregation life cycle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4327) RM can not renew TIMELINE_DELEGATION_TOKEN in secure clusters
[ https://issues.apache.org/jira/browse/YARN-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong updated YARN-4327: --- Summary: RM can not renew TIMELINE_DELEGATION_TOKEN in secure clusters (was: RM can not renew TIMELINE_DELEGATION_TOKEN in securt clusters) > RM can not renew TIMELINE_DELEGATION_TOKEN in secure clusters > -- > > Key: YARN-4327 > URL: https://issues.apache.org/jira/browse/YARN-4327 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, timelineserver >Affects Versions: 2.7.1 > Environment: hadoop 2.7.1hdfs,yarn, mrhistoryserver, ATS all use > kerberos security. > conf like this: > > hadoop.security.authorization > true > Is service-level authorization enabled? > > > hadoop.security.authentication > kerberos > Possible values are simple (no authentication), and kerberos > > >Reporter: zhangshilong > > bin hadoop 2.7.1 > ATS conf like this: > > yarn.timeline-service.http-authentication.type > simple > > > yarn.timeline-service.http-authentication.kerberos.principal > HTTP/_h...@xxx.com > > > yarn.timeline-service.http-authentication.kerberos.keytab > /etc/hadoop/keytabs/xxx.keytab > > > yarn.timeline-service.principal > xxx/_h...@xxx.com > > > yarn.timeline-service.keytab > /etc/hadoop/keytabs/xxx.keytab > > > yarn.timeline-service.best-effort > true > > > yarn.timeline-service.enabled > true > > > I'd like to allow everyone to access ATS from HTTP as RM,HDFS. > client can submit job to RM and add TIMELINE_DELEGATION_TOKEN to AM > Context, but RM can not renew TIMELINE_DELEGATION_TOKEN and make application > to failure. > RM logs: > 2015-11-03 11:58:38,191 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. 
> java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, > Service: 10.12.38.4:8188, Ident: (owner=yarn-test, renewer=yarn-test, > realUser=, issueDate=1446523118046, maxDate=1447127918046, sequenceNumber=9, > masterKeyId=2) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:439) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:847) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:828) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: HTTP status [500], message [Null user] > at > org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:287) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:212) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:396) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:378) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:451) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:183) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:466) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:400) > at > org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81) > at org.apache.hadoop.security.token.
[jira] [Commented] (YARN-4327) RM can not renew TIMELINE_DELEGATION_TOKEN in securt clusters
[ https://issues.apache.org/jira/browse/YARN-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987054#comment-14987054 ] zhangshilong commented on YARN-4327: yarn.timeline-service.http-authentication.type simple, ATS will use PseudoAuthenticationHandler, and generate a token like u=null&t=null. This make ATS causes java.lang.IllegalArgumentException. > RM can not renew TIMELINE_DELEGATION_TOKEN in securt clusters > -- > > Key: YARN-4327 > URL: https://issues.apache.org/jira/browse/YARN-4327 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, timelineserver >Affects Versions: 2.7.1 > Environment: hadoop 2.7.1hdfs,yarn, mrhistoryserver, ATS all use > kerberos security. > conf like this: > > hadoop.security.authorization > true > Is service-level authorization enabled? > > > hadoop.security.authentication > kerberos > Possible values are simple (no authentication), and kerberos > > >Reporter: zhangshilong > > in hadoop 2.7.1 > ATS conf like this: > > yarn.timeline-service.http-authentication.type > simple > > > yarn.timeline-service.http-authentication.kerberos.principal > HTTP/_h...@xxx.com > > > yarn.timeline-service.http-authentication.kerberos.keytab > /etc/hadoop/keytabs/xxx.keytab > > > yarn.timeline-service.principal > xxx/_h...@xxx.com > > > yarn.timeline-service.keytab > /etc/hadoop/keytabs/xxx.keytab > > > yarn.timeline-service.best-effort > true > > > yarn.timeline-service.enabled > true > > > I'd like to allow everyone to access ATS from HTTP as RM,HDFS. > client can submit job to RM and add TIMELINE_DELEGATION_TOKEN to AM > Context, but RM can not renew TIMELINE_DELEGATION_TOKEN and make application > to failure. > RM logs: > 2015-11-03 11:58:38,191 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. 
> java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, > Service: 10.12.38.4:8188, Ident: (owner=yarn-test, renewer=yarn-test, > realUser=, issueDate=1446523118046, maxDate=1447127918046, sequenceNumber=9, > masterKeyId=2) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:439) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:847) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:828) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: HTTP status [500], message [Null user] > at > org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:287) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:212) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:396) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:378) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:451) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:183) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:466) > at > org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:400) > at > org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdent
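The "u=null" behaviour described in the comment above can be pictured with a toy model. This is a hypothetical simplification, not Hadoop's actual PseudoAuthenticationHandler code: with simple (pseudo) auth, the caller's identity comes solely from the user.name query parameter, and the RM's programmatic token-renewal request carries no such parameter, so the server is left with no user at all — matching the "HTTP status [500], message [Null user]" line in the RM log.

```java
import java.util.Map;

// Hypothetical, heavily simplified model of pseudo ("simple") HTTP
// authentication. Class and method names here are illustrative only;
// the real handler is org.apache.hadoop.security.authentication.server
// .PseudoAuthenticationHandler.
public class PseudoAuthSketch {

    // Returns the authenticated user, or fails the way the RM log shows:
    // a request that carries no user.name parameter has no user at all.
    public static String authenticate(Map<String, String> queryParams,
                                      boolean anonymousAllowed) {
        String user = queryParams.get("user.name");
        if (user == null && !anonymousAllowed) {
            // Mirrors the "HTTP status [500], message [Null user]" failure:
            // the delegation-token endpoint rejects an absent user.
            throw new IllegalArgumentException("Null user");
        }
        return user;
    }

    public static void main(String[] args) {
        // A browser hitting the web UI can pass ?user.name=alice ...
        System.out.println(authenticate(Map.of("user.name", "alice"), false));
        // ... but the RM's renewDelegationToken call sends no such
        // parameter, which is why renewal fails even though plain HTTP
        // access to ATS works.
    }
}
```

This also explains the asymmetry reported in the thread: job submission and web browsing succeed, while only the RM-side token renewal dies.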
[jira] [Updated] (YARN-4327) RM can not renew TIMELINE_DELEGATION_TOKEN in securt clusters
[ https://issues.apache.org/jira/browse/YARN-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangshilong updated YARN-4327: --- Description: in hadoop 2.7.1 ATS conf like this: yarn.timeline-service.http-authentication.type simple yarn.timeline-service.http-authentication.kerberos.principal HTTP/_h...@xxx.com yarn.timeline-service.http-authentication.kerberos.keytab /etc/hadoop/keytabs/xxx.keytab yarn.timeline-service.principal xxx/_h...@xxx.com yarn.timeline-service.keytab /etc/hadoop/keytabs/xxx.keytab yarn.timeline-service.best-effort true yarn.timeline-service.enabled true I'd like to allow everyone to access ATS from HTTP as RM,HDFS. client can submit job to RM and add TIMELINE_DELEGATION_TOKEN to AM Context, but RM can not renew TIMELINE_DELEGATION_TOKEN and make application to failure. RM logs: 2015-11-03 11:58:38,191 WARN org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: Unable to add the application to the delegation token renewer. 
java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 10.12.38.4:8188, Ident: (owner=yarn-test, renewer=yarn-test, realUser=, issueDate=1446523118046, maxDate=1447127918046, sequenceNumber=9, masterKeyId=2) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:439) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:847) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:828) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: HTTP status [500], message [Null user] at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:287) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:212) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:396) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:378) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:451) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:183) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:466) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:400) at org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81) at org.apache.hadoop.security.token.Token.renew(Token.java:377) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:543) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:540) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:538) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:437) ... 6 more ATS logs: 2015-11-03 14:47:45,407 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Creating password for identifier: owner=yarn-test, renewer=yarn-test, realUser=, issueDate=1446533265407, maxDate=144
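The ATS property list in the description above was flattened when the mail was archived; restored to standard yarn-site.xml markup, with property names and values exactly as reported (the obfuscated principal and keytab values are kept as-is):

```xml
<property>
  <name>yarn.timeline-service.http-authentication.type</name>
  <value>simple</value>
</property>
<property>
  <name>yarn.timeline-service.http-authentication.kerberos.principal</name>
  <value>HTTP/_h...@xxx.com</value>
</property>
<property>
  <name>yarn.timeline-service.http-authentication.kerberos.keytab</name>
  <value>/etc/hadoop/keytabs/xxx.keytab</value>
</property>
<property>
  <name>yarn.timeline-service.principal</name>
  <value>xxx/_h...@xxx.com</value>
</property>
<property>
  <name>yarn.timeline-service.keytab</name>
  <value>/etc/hadoop/keytabs/xxx.keytab</value>
</property>
<property>
  <name>yarn.timeline-service.best-effort</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
```

Note the mixed setup this produces: the RPC-level yarn.timeline-service.principal/keytab are Kerberized, while the HTTP layer is left on simple auth — the combination the reporter says breaks delegation-token renewal.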
[jira] [Created] (YARN-4327) RM can not renew TIMELINE_DELEGATION_TOKEN in securt clusters
zhangshilong created YARN-4327: -- Summary: RM can not renew TIMELINE_DELEGATION_TOKEN in securt clusters Key: YARN-4327 URL: https://issues.apache.org/jira/browse/YARN-4327 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.7.1 Environment: hadoop 2.7.1hdfs,yarn, mrhistoryserver, ATS all use kerberos security. conf like this: hadoop.security.authorization true Is service-level authorization enabled? hadoop.security.authentication kerberos Possible values are simple (no authentication), and kerberos Reporter: zhangshilong in hadoop 2.7.1 ATS conf like this: yarn.timeline-service.http-authentication.type simple yarn.timeline-service.http-authentication.kerberos.principal HTTP/_h...@xxx.com yarn.timeline-service.http-authentication.kerberos.keytab /etc/hadoop/keytabs/xxx.keytab yarn.timeline-service.principal xxx/_h...@xxx.com yarn.timeline-service.keytab /etc/hadoop/keytabs/xxx.keytab yarn.timeline-service.best-effort true yarn.timeline-service.enabled true I'd like to allow everyone to access ATS from HTTP as RM,HDFS. client can submit job to RM and add TIMELINE_DELEGATION_TOKEN to AM Context, but RM can not renew TIMELINE_DELEGATION_TOKEN and make application to failure. RM logs: 2015-11-03 11:58:38,191 WARN org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: Unable to add the application to the delegation token renewer. 
java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 10.12.38.4:8188, Ident: (owner=yarn-test, renewer=yarn-test, realUser=, issueDate=1446523118046, maxDate=1447127918046, sequenceNumber=9, masterKeyId=2) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:439) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:847) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:828) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: HTTP status [500], message [Null user] at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:287) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:212) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:396) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:378) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:451) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:183) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:466) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:400) at org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81) at org.apache.hadoop.security.token.Token.renew(Token.java:377) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:543) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:540) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)