[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api module
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642625#comment-14642625 ]

Hudson commented on YARN-3958:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #999 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/999/])
YARN-3958. TestYarnConfigurationFields should be moved to hadoop-yarn-api module. Contributed by Varun Saxena. (aajisaka: rev 42d4e0ae99d162fde52902cb86e29f2c82a084c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java

TestYarnConfigurationFields should be moved to hadoop-yarn-api module
----------------------------------------------------------------------

Key: YARN-3958
URL: https://issues.apache.org/jira/browse/YARN-3958
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch

Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, this test will not necessarily be run when somebody changes that file. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
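For context, the kind of cross-check this test performs can be sketched as below. This is a minimal, hypothetical illustration rather than the actual TestYarnConfigurationFields code; the "yarn." prefix heuristic and the class name ConfigFieldCheck are assumptions.
{code}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ConfigFieldCheck {
  public static void main(String[] args) throws Exception {
    // Load only the packaged defaults, without site overrides.
    Configuration defaults = new Configuration(false);
    defaults.addResource("yarn-default.xml");

    // Walk the public static String constants of YarnConfiguration; those that
    // hold property names should each have an entry in yarn-default.xml.
    for (Field f : YarnConfiguration.class.getFields()) {
      if (f.getType() != String.class || !Modifier.isStatic(f.getModifiers())) {
        continue;
      }
      String key = (String) f.get(null);
      if (key != null && key.startsWith("yarn.") && defaults.get(key) == null) {
        System.out.println("Missing from yarn-default.xml: " + key);
      }
    }
  }
}
{code}
Because the test now lives in hadoop-yarn-api alongside YarnConfiguration, any change to that class forces this check to run in the same module's test phase.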
[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api module
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642615#comment-14642615 ]

Hudson commented on YARN-3958:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #269 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/269/])
YARN-3958. TestYarnConfigurationFields should be moved to hadoop-yarn-api module. Contributed by Varun Saxena. (aajisaka: rev 42d4e0ae99d162fde52902cb86e29f2c82a084c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java

TestYarnConfigurationFields should be moved to hadoop-yarn-api module
----------------------------------------------------------------------

Key: YARN-3958
URL: https://issues.apache.org/jira/browse/YARN-3958
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch

Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, this test will not necessarily be run when somebody changes that file. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2856) Application recovery throws InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
[ https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642803#comment-14642803 ]

Sangjin Lee commented on YARN-2856:
--

The patch applies to 2.6.0 cleanly.

Application recovery throws InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
------------------------------------------------------------------------------------------------------

Key: YARN-2856
URL: https://issues.apache.org/jira/browse/YARN-2856
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Priority: Critical
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: YARN-2856.1.patch, YARN-2856.patch

It is observed that recovering an application whose attempt's final state is KILLED throws the below exception, and the application remains in ACCEPTED state forever.
{code}
2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't handle this event at current state | org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673)
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:745)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
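To see why recovery dies here: YARN's StateMachineFactory throws InvalidStateTransitonException for any (state, event) pair that was never registered, so the shape of the fix is to register a transition for ATTEMPT_KILLED in the ACCEPTED state. The self-contained toy below illustrates the mechanism using the real org.apache.hadoop.yarn.state API; it is a sketch, not the RMAppImpl code from the patch.
{code}
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class MiniApp {
  enum State { ACCEPTED, KILLED }
  enum EventType { ATTEMPT_KILLED }

  private static final StateMachineFactory<MiniApp, State, EventType, EventType>
      FACTORY =
          new StateMachineFactory<MiniApp, State, EventType, EventType>(State.ACCEPTED)
              // Without this registration, delivering ATTEMPT_KILLED while in
              // ACCEPTED throws InvalidStateTransitonException -- the exact
              // failure recovery hits above. With it, the app moves to KILLED.
              .addTransition(State.ACCEPTED, State.KILLED, EventType.ATTEMPT_KILLED)
              .installTopology();

  private final StateMachine<State, EventType, EventType> stateMachine =
      FACTORY.make(this);

  public static void main(String[] args) {
    MiniApp app = new MiniApp();
    app.stateMachine.doTransition(EventType.ATTEMPT_KILLED, EventType.ATTEMPT_KILLED);
    System.out.println(app.stateMachine.getCurrentState()); // prints KILLED
  }
}
{code}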
[jira] [Updated] (YARN-2856) Application recovery throws InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
[ https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-2856:
--
Labels: 2.6.1-candidate (was: )

Application recovery throws InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
------------------------------------------------------------------------------------------------------

Key: YARN-2856
URL: https://issues.apache.org/jira/browse/YARN-2856
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Priority: Critical
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: YARN-2856.1.patch, YARN-2856.patch

It is observed that recovering an application whose attempt's final state is KILLED throws the below exception, and the application remains in ACCEPTED state forever.
{code}
2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't handle this event at current state | org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673)
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:745)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3850) NM fails to read files from full disks which can lead to container logs being lost and other issues
[ https://issues.apache.org/jira/browse/YARN-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642816#comment-14642816 ]

Sangjin Lee commented on YARN-3850:
--

The merge to 2.6.0 is straightforward.

NM fails to read files from full disks which can lead to container logs being lost and other issues
----------------------------------------------------------------------------------------------------

Key: YARN-3850
URL: https://issues.apache.org/jira/browse/YARN-3850
Project: Hadoop YARN
Issue Type: Bug
Components: log-aggregation, nodemanager
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Blocker
Labels: 2.6.1-candidate
Fix For: 2.7.1
Attachments: YARN-3850.01.patch, YARN-3850.02.patch

*Container logs* can be lost if a disk has become full (~90% full). When an application finishes, we upload logs after aggregation by calling {{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn checks the eligible directories via {{LocalDirsHandlerService#getLogDirs}}, which in the disk-full case returns nothing. So none of the container logs are aggregated and uploaded. But on application finish, we also call {{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the application directory, which contains the container logs, because it calls {{LocalDirsHandlerService#getLogDirsForCleanup}}, which returns the full disks as well. So we are left with neither the aggregated logs for the app nor the individual container logs.
In addition to this, there are 2 more issues:
# {{ContainerLogsUtil#getContainerLogDirs}} does not consider full disks, so the NM will fail to serve up logs from full disks via its web interfaces.
# {{RecoveredContainerLaunch#locatePidFile}} also does not consider full disks, so it is possible that on container recovery the PID file is not found.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
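The core mismatch can be sketched as follows. Here {{dirsHandler}} stands for the NM's LocalDirsHandlerService instance, and the sketch uses only the two methods named in the description; the surrounding variable names are illustrative.
{code}
// Aggregation consults the healthy dirs, so a full disk contributes nothing:
List<String> dirsForUpload = dirsHandler.getLogDirs();            // full disks excluded
// Cleanup consults the cleanup list, which still includes full disks:
List<String> dirsForCleanup = dirsHandler.getLogDirsForCleanup(); // full disks included
// Net effect when a disk is full: uploadLogsForContainers() finds no logs to
// aggregate, while the post-aggregation cleanup still deletes the app's log dirs.
{code}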
[jira] [Commented] (YARN-3949) ensure timely flush of timeline writes
[ https://issues.apache.org/jira/browse/YARN-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642839#comment-14642839 ]

Sangjin Lee commented on YARN-3949:
--

Thanks folks for reviewing and committing the patch!

ensure timely flush of timeline writes
--------------------------------------

Key: YARN-3949
URL: https://issues.apache.org/jira/browse/YARN-3949
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Fix For: YARN-2928
Attachments: YARN-3949-YARN-2928.001.patch, YARN-3949-YARN-2928.002.patch, YARN-3949-YARN-2928.002.patch, YARN-3949-YARN-2928.003.patch, YARN-3949-YARN-2928.004.patch, YARN-3949-YARN-2928.004.patch

Currently, flushing of timeline writes is not really handled. For example, {{HBaseTimelineWriterImpl}} relies on HBase's {{BufferedMutator}} to batch and write puts asynchronously. However, {{BufferedMutator}} may not flush them to HBase unless the internal buffer fills up. We need flush functionality to ensure that data is written in a reasonably timely manner, and to be able to ensure some critical writes are done synchronously (e.g. key lifecycle events).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
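The BufferedMutator behavior described above is the standard HBase client API: mutations sit in a client-side buffer until it fills, so a writer that wants timely visibility must call flush() itself (periodically or on critical events). A minimal sketch of the remedy; the table and column names here are illustrative, not the actual timeline schema.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelyWrite {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         BufferedMutator mutator =
             conn.getBufferedMutator(TableName.valueOf("timeline.entity"))) {
      Put put = new Put(Bytes.toBytes("rowKey"));
      put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("created"),
          Bytes.toBytes(System.currentTimeMillis()));
      mutator.mutate(put); // buffered client-side; not yet visible to readers
      mutator.flush();     // force the buffered puts out now
    }
  }
}
{code}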
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642838#comment-14642838 ]

Rohith Sharma K S commented on YARN-3543:
--

[~xgong], would you have a look at the updated patch, please?

ApplicationReport should be able to tell whether the Application is AM managed or not.
---------------------------------------------------------------------------------------

Key: YARN-3543
URL: https://issues.apache.org/jira/browse/YARN-3543
Project: Hadoop YARN
Issue Type: Improvement
Components: api
Affects Versions: 2.6.0
Reporter: Spandan Dutta
Assignee: Rohith Sharma K S
Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, 0005-YARN-3543.patch, 0006-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG

Currently we can know whether the application submitted by the user is AM managed from the applicationSubmissionContext. This can only be done at the time the user submits the job. We should have access to this info from the ApplicationReport as well, so that we can check whether an app is AM managed or not at any time.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2905) AggregatedLogsBlock page can infinitely loop if the aggregated log file is corrupted
[ https://issues.apache.org/jira/browse/YARN-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642804#comment-14642804 ]

Sangjin Lee commented on YARN-2905:
--

The patch applies to 2.6.0 cleanly.

AggregatedLogsBlock page can infinitely loop if the aggregated log file is corrupted
-------------------------------------------------------------------------------------

Key: YARN-2905
URL: https://issues.apache.org/jira/browse/YARN-2905
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Jason Lowe
Assignee: Varun Saxena
Priority: Blocker
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: YARN-2905.patch

If the AggregatedLogsBlock page tries to serve up a portion of a log file that has been corrupted (e.g.: like the case that was fixed by YARN-2724) then it can spin forever trying to seek to the targeted log segment.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api module
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642855#comment-14642855 ]

Hudson commented on YARN-3958:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #266 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/266/])
YARN-3958. TestYarnConfigurationFields should be moved to hadoop-yarn-api module. Contributed by Varun Saxena. (aajisaka: rev 42d4e0ae99d162fde52902cb86e29f2c82a084c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml

TestYarnConfigurationFields should be moved to hadoop-yarn-api module
----------------------------------------------------------------------

Key: YARN-3958
URL: https://issues.apache.org/jira/browse/YARN-3958
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch

Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, this test will not necessarily be run when somebody changes that file. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3948) Display Application Priority in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-3948:
--
Attachment: 0003-YARN-3948.patch

Uploading a patch after fixing test failures.

Display Application Priority in RM Web UI
-----------------------------------------

Key: YARN-3948
URL: https://issues.apache.org/jira/browse/YARN-3948
Project: Hadoop YARN
Issue Type: Sub-task
Components: webapp
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G
Attachments: 0001-YARN-3948.patch, 0002-YARN-3948.patch, 0003-YARN-3948.patch, ApplicationPage.png, ClusterPage.png

Application Priority can be displayed in the RM Web UI Application page.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-3238:
--
Labels: 2.6.1-candidate (was: )

Connection timeouts to nodemanagers are retried at multiple levels
-------------------------------------------------------------------

Key: YARN-3238
URL: https://issues.apache.org/jira/browse/YARN-3238
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: YARN-3238.001.patch

The IPC layer will retry connection timeouts automatically (see Client.java), but we are also retrying them with YARN's RetryPolicy put in place when the NM proxy is created. This causes a two-level retry mechanism where the IPC layer has already retried quite a few times (45 by default) for each YARN RetryPolicy error that is retried. The end result is that NM clients can wait a very, very long time for the connection to finally fail.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api module
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642811#comment-14642811 ]

Hudson commented on YARN-3958:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #258 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/258/])
YARN-3958. TestYarnConfigurationFields should be moved to hadoop-yarn-api module. Contributed by Varun Saxena. (aajisaka: rev 42d4e0ae99d162fde52902cb86e29f2c82a084c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/CHANGES.txt

TestYarnConfigurationFields should be moved to hadoop-yarn-api module
----------------------------------------------------------------------

Key: YARN-3958
URL: https://issues.apache.org/jira/browse/YARN-3958
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch

Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, this test will not necessarily be run when somebody changes that file. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-3222:
--
Labels: 2.6.1-candidate (was: )

RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
------------------------------------------------------------------------------------

Key: YARN-3222
URL: https://issues.apache.org/jira/browse/YARN-3222
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Priority: Critical
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch

When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the scheduler via the events node_added, node_removed, or node_resource_update. These events should be delivered in sequential order, i.e. the node_added event followed by the node_resource_update event. But if the node is reconnected with a different HTTP port, the order of scheduler events is node_removed -- node_resource_update -- node_added, which causes the scheduler to not find the node, throw an NPE, and the RM to exit. The node_resource_update event should always be triggered via RMNodeEventType.RESOURCE_UPDATE.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3369) Missing NullPointer check in AppSchedulingInfo causes RM to die
[ https://issues.apache.org/jira/browse/YARN-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642814#comment-14642814 ]

Sangjin Lee commented on YARN-3369:
--

The patch applies to 2.6.0 cleanly.

Missing NullPointer check in AppSchedulingInfo causes RM to die
----------------------------------------------------------------

Key: YARN-3369
URL: https://issues.apache.org/jira/browse/YARN-3369
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Giovanni Matteo Fumarola
Assignee: Brahma Reddy Battula
Priority: Blocker
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: YARN-3369-003.patch, YARN-3369.2.patch, YARN-3369.patch

In AppSchedulingInfo.java the method checkForDeactivation() has these 2 consecutive lines:
{code}
ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
if (request.getNumContainers() > 0) {
{code}
The first line calls getResourceRequest, and it can return null:
{code}
synchronized public ResourceRequest getResourceRequest(
    Priority priority, String resourceName) {
  Map<String, ResourceRequest> nodeRequests = requests.get(priority);
  return (nodeRequests == null) ? null : nodeRequests.get(resourceName);
}
{code}
The second line dereferences the pointer directly without a check. If the pointer is null, the RM dies.
{quote}
2015-03-17 14:14:04,757 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.checkForDeactivation(AppSchedulingInfo.java:383)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.decrementOutstanding(AppSchedulingInfo.java:375)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocateOffSwitch(AppSchedulingInfo.java:360)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:270)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.allocate(FiCaSchedulerApp.java:142)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1559)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1384)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1263)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:816)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:588)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:449)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1017)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1059)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:739)
at java.lang.Thread.run(Thread.java:722)
{color:red}*2015-03-17 14:14:04,758 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..*{color}
{quote}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
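The straightforward shape of the fix is a null check before the dereference. A sketch only; the committed patch may structure the check differently:
{code}
ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
if (request != null && request.getNumContainers() > 0) {
  // ... existing deactivation logic ...
}
{code}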
[jira] [Commented] (YARN-3238) Connection timeouts to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642805#comment-14642805 ]

Sangjin Lee commented on YARN-3238:
--

The patch applies to 2.6.0 cleanly.

Connection timeouts to nodemanagers are retried at multiple levels
-------------------------------------------------------------------

Key: YARN-3238
URL: https://issues.apache.org/jira/browse/YARN-3238
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: YARN-3238.001.patch

The IPC layer will retry connection timeouts automatically (see Client.java), but we are also retrying them with YARN's RetryPolicy put in place when the NM proxy is created. This causes a two-level retry mechanism where the IPC layer has already retried quite a few times (45 by default) for each YARN RetryPolicy error that is retried. The end result is that NM clients can wait a very, very long time for the connection to finally fail.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
Karthik Kambatla created YARN-3980:
--

Summary: Plumb resource-utilization info in node heartbeat through to the scheduler
Key: YARN-3980
URL: https://issues.apache.org/jira/browse/YARN-3980
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager, scheduler
Affects Versions: 2.7.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

YARN-1012 and YARN-3534 collect resource utilization information for all containers and the node respectively and send it to the RM on node heartbeat. We should plumb it through to the scheduler so the scheduler can make use of it.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api module
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642770#comment-14642770 ]

Hudson commented on YARN-3958:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2196 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2196/])
YARN-3958. TestYarnConfigurationFields should be moved to hadoop-yarn-api module. Contributed by Varun Saxena. (aajisaka: rev 42d4e0ae99d162fde52902cb86e29f2c82a084c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java

TestYarnConfigurationFields should be moved to hadoop-yarn-api module
----------------------------------------------------------------------

Key: YARN-3958
URL: https://issues.apache.org/jira/browse/YARN-3958
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch

Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, this test will not necessarily be run when somebody changes that file. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642813#comment-14642813 ]

Sangjin Lee commented on YARN-3222:
--

The merge to 2.6.0 is straightforward.

RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
------------------------------------------------------------------------------------

Key: YARN-3222
URL: https://issues.apache.org/jira/browse/YARN-3222
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Priority: Critical
Labels: 2.6.1-candidate
Fix For: 2.7.0
Attachments: 0001-YARN-3222.patch, 0002-YARN-3222.patch, 0003-YARN-3222.patch, 0004-YARN-3222.patch, 0005-YARN-3222.patch

When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the scheduler via the events node_added, node_removed, or node_resource_update. These events should be delivered in sequential order, i.e. the node_added event followed by the node_resource_update event. But if the node is reconnected with a different HTTP port, the order of scheduler events is node_removed -- node_resource_update -- node_added, which causes the scheduler to not find the node, throw an NPE, and the RM to exit. The node_resource_update event should always be triggered via RMNodeEventType.RESOURCE_UPDATE.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642860#comment-14642860 ]

Sangjin Lee commented on YARN-3908:
--

Thanks for the update, [~vrushalic]. Are folks OK with this going in as is and making further changes to the event schema in a separate ticket? Let me know, and if everyone is fine, I'll merge this patch later today.

Bugs in HBaseTimelineWriterImpl
-------------------------------

Key: YARN-3908
URL: https://issues.apache.org/jira/browse/YARN-3908
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Vrushali C
Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch

1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all.
2. The event#timestamp is also not persisted.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api module
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642886#comment-14642886 ]

Hudson commented on YARN-3958:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2215 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2215/])
YARN-3958. TestYarnConfigurationFields should be moved to hadoop-yarn-api module. Contributed by Varun Saxena. (aajisaka: rev 42d4e0ae99d162fde52902cb86e29f2c82a084c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java
* hadoop-yarn-project/CHANGES.txt

TestYarnConfigurationFields should be moved to hadoop-yarn-api module
----------------------------------------------------------------------

Key: YARN-3958
URL: https://issues.apache.org/jira/browse/YARN-3958
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch

Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, this test will not necessarily be run when somebody changes that file. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643059#comment-14643059 ]

Sangjin Lee commented on YARN-3814:
--

Agreed. It would be the client's responsibility to encode them correctly. Also, the server (e.g. Jetty) may decode them properly so that the Hadoop code need not be concerned about this.

REST API implementation for getting raw entities in TimelineReader
-------------------------------------------------------------------

Key: YARN-3814
URL: https://issues.apache.org/jira/browse/YARN-3814
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena
Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3965) Add startup timestamp for nodemanager
[ https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643085#comment-14643085 ]

zhihai xu commented on YARN-3965:
--

thanks for the explanation, [~zhiguohong]!
bq. One option is to make nmStartupTime a non-static field of NMContext. But I doubt it is worth making a simple thing complicated. BTW, the startup timestamp of ResourceManager is also static.
Thanks for the information. Since the timestamp of ResourceManager is also static, I am OK to use static. In practice, there is only one active RM and many active NMs. Adding nmStartupTime to NMContext is not bad, as it can provide more information about the NM in the context. So either static or non-static is OK to me. Let's see what other people's opinions are. If we use static, can we add an API to get the timestamp for the NM, similar to ResourceManager.getClusterTimeStamp for the RM?
bq. It's final so you don't need to worry about that
Yes, you are right. I missed the final keyword.

Add startup timestamp for nodemanager
-------------------------------------

Key: YARN-3965
URL: https://issues.apache.org/jira/browse/YARN-3965
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
Attachments: YARN-3965-2.patch, YARN-3965.patch

We have a startup timestamp for the RM already, but not for the NM. Sometimes a cluster operator modifies the configuration of all nodes and kicks off a command to restart all NMs, and then finds it hard to check whether all NMs have restarted. Actually, there are always some NMs that didn't restart as expected, which leads to errors later due to inconsistent configuration. If we had a startup timestamp for the NM, the operator could easily fetch it via the NM web service, find out which NMs didn't restart, and take manual action for them.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
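For reference, the static approach under discussion would look roughly like the sketch below, mirroring ResourceManager.getClusterTimeStamp(); the getter name getNMStartupTime is an assumption, not the committed API.
{code}
// Inside NodeManager (sketch): captured once when the class loads, i.e. at
// process startup, and exposed via a static getter like the RM's cluster timestamp.
private static final long nmStartupTime = System.currentTimeMillis();

public static long getNMStartupTime() {
  return nmStartupTime;
}
{code}
The NM web service can then include this value in its info response, letting the operator compare it against the time the restart command was issued.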
[jira] [Commented] (YARN-1644) RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643099#comment-14643099 ]

MENG DING commented on YARN-1644:
--

bq. NM re-registration can still happen between the time the increase action is accepted and the time it's added into increasedContainers. Even startContainer has the same problem: a newly started container may fall into this tiny window such that the RM won't recover it.
Yes, you are right that startContainer would have the same problem. So to make it clear, RM restart/NM re-registration can happen in the following scenarios:
* 1. Container resource increase is already completed. In this case, NM re-registration can send the correct (increased) container size (through the containerStatus object) for RM recovery.
* 2. The container to be increased has been added into increasedContainers, but the resource is not yet updated. In this case, NM re-registration can send the correct container size through both the containerStatus and increasedContainers objects for RM recovery.
* 3. The increase action is accepted, but the container to be increased has not been added into increasedContainers. In this case, the resource view between the NM and the RM diverges. The same issue applies to startContainers.
I don't have a solution for scenario 3 yet, but I think the chance of it happening is very small, especially with the {{blockNewContainerRequests}} and matching-RM-identifier logic in place. Maybe we can log a separate JIRA for scenario 3 and fix it for both container increase and container launch?

RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing
------------------------------------------------------------------------------------------

Key: YARN-1644
URL: https://issues.apache.org/jira/browse/YARN-1644
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Wangda Tan
Assignee: MENG DING
Attachments: YARN-1644-YARN-1197.4.patch, YARN-1644-YARN-1197.5.patch, YARN-1644.1.patch, YARN-1644.2.patch, YARN-1644.3.patch, yarn-1644.1.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643048#comment-14643048 ]

Li Lu commented on YARN-3908:
--

Hi [~sjlee0], I'm OK with checking this in and addressing the event schema problem in a separate JIRA.

Bugs in HBaseTimelineWriterImpl
-------------------------------

Key: YARN-3908
URL: https://issues.apache.org/jira/browse/YARN-3908
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Vrushali C
Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch

1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all.
2. The event#timestamp is also not persisted.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643155#comment-14643155 ]

Naganarasimha G R commented on YARN-3963:
--

Hi [~bibinchundatt], in distributed mode only the node-to-label mapping is done by the individual NMs, so there should not be any impact in distributed mode.

AddNodeLabel on duplicate label addition shows success
------------------------------------------------------

Key: YARN-3963
URL: https://issues.apache.org/jira/browse/YARN-3963
Project: Hadoop YARN
Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch

Currently, as per the code in {{CommonNodeLabelManager#addToClusterNodeLabels}}, when we add the same node label again, no event is fired, so no update is done.
{noformat}
./yarn rmadmin -addToClusterNodeLabels x
./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)"
./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)"
{noformat}
All of these commands report success when applied again through the CLI.
{code}
2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true]
2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS
2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false]
2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS
{code}
Also, since changing exclusive=true to false is not supported, reporting success is misleading.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3981) support timeline clients not associated with an application
Sangjin Lee created YARN-3981:
--

Summary: support timeline clients not associated with an application
Key: YARN-3981
URL: https://issues.apache.org/jira/browse/YARN-3981
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee

In the current v.2 design, all timeline writes must belong in a flow/application context (cluster + user + flow + flow run + application). But there are use cases that require writing data outside the context of an application. One such example is a higher level client (e.g. tez client or hive/oozie/cascading client) writing flow-level data that spans multiple applications. We need to find a way to support them.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643141#comment-14643141 ]

Naganarasimha G R commented on YARN-3963:
--

Hi [~leftnoteasy], how will the user know, if he is not informed, that the last operation failed? Are there any limitations if we throw an exception? I feel the user needs to be informed; if not, he will assume that the last operation succeeded.

AddNodeLabel on duplicate label addition shows success
------------------------------------------------------

Key: YARN-3963
URL: https://issues.apache.org/jira/browse/YARN-3963
Project: Hadoop YARN
Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch

Currently, as per the code in {{CommonNodeLabelManager#addToClusterNodeLabels}}, when we add the same node label again, no event is fired, so no update is done.
{noformat}
./yarn rmadmin -addToClusterNodeLabels x
./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)"
./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)"
{noformat}
All of these commands report success when applied again through the CLI.
{code}
2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true]
2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS
2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false]
2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS
{code}
Also, since changing exclusive=true to false is not supported, reporting success is misleading.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api module
[ https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642952#comment-14642952 ]

Varun Saxena commented on YARN-3958:
--

Thanks [~ajisakaa] for the review and commit.

TestYarnConfigurationFields should be moved to hadoop-yarn-api module
----------------------------------------------------------------------

Key: YARN-3958
URL: https://issues.apache.org/jira/browse/YARN-3958
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Fix For: 2.8.0
Attachments: YARN-3958.01.patch, YARN-3958.02.patch, YARN-3958.03.patch

Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The test checks whether all the configurations declared in YarnConfiguration exist in yarn-default.xml. But as YarnConfiguration is in hadoop-yarn-api, this test will not necessarily be run when somebody changes that file. So if the developer forgets to update yarn-default.xml and the patch is committed, it will lead to unnecessary test failures after the commit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3981) support timeline clients not associated with an application
[ https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643006#comment-14643006 ]

Sangjin Lee commented on YARN-3981:
--

Some of us had an offline discussion on this. There are some major challenges in supporting this in the v.2 design. First, obviously they may lack an application-specific context, as they can span multiple YARN apps. Second, even if we solved the problem of the context, these clients are likely off-cluster, and they need a way to write to the cluster. Ideas such as a separate dedicated timeline writer just for these have been discussed, but their scalability is problematic at best.

One idea that was suggested involves creating a specialized YARN application that can act as a proxy for these off-cluster clients. For example, suppose you started a tez client that can start multiple YARN apps. It can also start a special dedicated (flow-level) timeline client. This client would launch a special YARN app under the covers whose app master and its associated timeline writer can serve as the proxy for timeline data the client may write. When this special timeline client shuts down, it would tear down the associated YARN app also.

If we go this route, we would write the YARN app itself so that the app master listens for requests coming from the client and proxies them to the timeline writer. We would also write the timeline client piece so that it manages the YARN app as well as sending the write requests to the app master.

support timeline clients not associated with an application
------------------------------------------------------------

Key: YARN-3981
URL: https://issues.apache.org/jira/browse/YARN-3981
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee

In the current v.2 design, all timeline writes must belong in a flow/application context (cluster + user + flow + flow run + application). But there are use cases that require writing data outside the context of an application. One such example is a higher level client (e.g. tez client or hive/oozie/cascading client) writing flow-level data that spans multiple applications. We need to find a way to support them.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643105#comment-14643105 ]

Wangda Tan commented on YARN-3971:
--

[~bibinchundatt], thanks for working on this. I think one simpler solution is to leverage {{AbstractService#getServiceState}} and do the remove-label check only when the state == STARTED. Thoughts?

Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
---------------------------------------------------------------------------------------

Key: YARN-3971
URL: https://issues.apache.org/jira/browse/YARN-3971
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 0003-YARN-3971.patch

Steps to reproduce:
# Create labels x, y
# Delete labels x, y
# Create labels x, y and add the capacity scheduler xml for labels x and y too
# Restart RM
Both RMs will become standby, since the below exception is thrown on {{FileSystemNodeLabelsStore#recover}}:
{code}
2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label
java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label
	at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104)
	at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118)
	at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221)
	at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232)
	at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312)
	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832)
	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
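Sketched, the suggestion is to gate the queue check on the real {{AbstractService#getServiceState()}} so that label removals replayed from the store during recovery skip it. The surrounding method body below is illustrative; the actual patch may differ.
{code}
// In RMNodeLabelsManager (sketch):
@Override
public void removeFromClusterNodeLabels(Collection<String> labelsToRemove)
    throws IOException {
  if (getServiceState() == Service.STATE.STARTED) {
    // Enforce "label still used by a queue" only during normal operation,
    // not while replaying the store contents at startup.
    checkRemoveFromClusterNodeLabelsOfQueue(labelsToRemove);
  }
  super.removeFromClusterNodeLabels(labelsToRemove);
}
{code}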
[jira] [Commented] (YARN-3948) Display Application Priority in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643055#comment-14643055 ]

Hadoop QA commented on YARN-3948:
--

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 20m 49s | Pre-patch trunk has 6 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 20s | The applied patch generated 1 new checkstyle issues (total was 15, now 16). |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 6m 28s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 6m 58s | Tests passed in hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests | 52m 28s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 112m 49s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-common |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12747352/0003-YARN-3948.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 2196e39 |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8678/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8678/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8678/console |

This message was automatically generated.

Display Application Priority in RM Web UI
-----------------------------------------

Key: YARN-3948
URL: https://issues.apache.org/jira/browse/YARN-3948
Project: Hadoop YARN
Issue Type: Sub-task
Components: webapp
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G
Attachments: 0001-YARN-3948.patch, 0002-YARN-3948.patch, 0003-YARN-3948.patch, ApplicationPage.png, ClusterPage.png

Application Priority can be displayed in the RM Web UI Application page.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3981) support timeline clients not associated with an application
[ https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen reassigned YARN-3981:
--
Assignee: Zhijie Shen

support timeline clients not associated with an application
------------------------------------------------------------

Key: YARN-3981
URL: https://issues.apache.org/jira/browse/YARN-3981
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Zhijie Shen

In the current v.2 design, all timeline writes must belong in a flow/application context (cluster + user + flow + flow run + application). But there are use cases that require writing data outside the context of an application. One such example is a higher level client (e.g. tez client or hive/oozie/cascading client) writing flow-level data that spans multiple applications. We need to find a way to support them.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3981) support timeline clients not associated with an application
[ https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643056#comment-14643056 ]

Zhijie Shen commented on YARN-3981:
--

Thanks for filing the jira. I'm going to pick this up.

support timeline clients not associated with an application
------------------------------------------------------------

Key: YARN-3981
URL: https://issues.apache.org/jira/browse/YARN-3981
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee

In the current v.2 design, all timeline writes must belong in a flow/application context (cluster + user + flow + flow run + application). But there are use cases that require writing data outside the context of an application. One such example is a higher level client (e.g. tez client or hive/oozie/cascading client) writing flow-level data that spans multiple applications. We need to find a way to support them.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643116#comment-14643116 ]

Wangda Tan commented on YARN-3963:
--

I think we shouldn't throw an exception in this case. It would be better to print a WARN message at the client side when trying to add existing labels, but currently YARN lacks a channel to pass such diagnostic information back. I would prefer to print a WARN message at the service side and keep the behavior unchanged. Thoughts? [~sunilg], [~bibinchundatt].

AddNodeLabel on duplicate label addition shows success
------------------------------------------------------

Key: YARN-3963
URL: https://issues.apache.org/jira/browse/YARN-3963
Project: Hadoop YARN
Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor
Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch

Currently, as per the code in {{CommonNodeLabelManager#addToClusterNodeLabels}}, when we add the same node label again, no event is fired, so no update is done.
{noformat}
./yarn rmadmin -addToClusterNodeLabels x
./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)"
./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)"
{noformat}
All of these commands report success when applied again through the CLI.
{code}
2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true]
2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS
2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false]
2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS
{code}
Also, since changing exclusive=true to false is not supported, reporting success is misleading.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
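A sketch of the service-side WARN being proposed; the placement and the variable names (labelsToAdd, existingLabels) are illustrative, not the actual CommonNodeLabelsManager fields.
{code}
// Inside addToClusterNodeLabels (sketch): warn on duplicates, keep behavior unchanged.
for (String label : labelsToAdd) {
  if (existingLabels.contains(label)) {
    LOG.warn("Node label '" + label + "' already exists with the same settings;"
        + " ignoring the duplicate addToClusterNodeLabels request.");
  }
}
{code}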
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643113#comment-14643113 ] Zhijie Shen commented on YARN-3908: --- [~vrushalic], thanks for fixing the problem. W.r.t. the column key, shall we use: {code} e!eventId?eventTimestamp?eventInfoKey : eventInfoValue {code} Imagine we have two KILL events: one on TS1 and the other on TS2. IMHO, we want to scan through the two events' columns one-by-one instead of in an interleaved manner. This will make it much easier for the reader to parse multiple events and encapsulate them one after the other. It will be more useful in the future if we want to retrieve just part of the events of a big job (e.g. within a given time window or the most recent events). Thoughts? Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
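To make the proposed layout concrete, here is a minimal sketch of assembling and parsing such a column qualifier, assuming the {{!}} and {{?}} separators shown in the comment; the class and method names are illustrative, not the actual HBaseTimelineWriterImpl API:
{code}
// Illustrative only: assembles an event column qualifier in the
// e!eventId?eventTimestamp?eventInfoKey layout discussed above.
// Separators and names are assumptions taken from the comment.
public final class EventColumnKeySketch {
  private static final String PREFIX = "e!";
  private static final char SEPARATOR = '?';

  static String qualifier(String eventId, long eventTimestamp, String infoKey) {
    // Grouping by eventId first keeps all columns of one event adjacent,
    // so a reader can reassemble each event without interleaving.
    return PREFIX + eventId + SEPARATOR + eventTimestamp + SEPARATOR + infoKey;
  }

  static String[] parse(String qualifier) {
    // Strips the "e!" prefix and splits back into the three components.
    return qualifier.substring(PREFIX.length()).split("\\?", 3);
  }
}
{code}
A real implementation would encode the qualifier as bytes (with a fixed-width or inverted timestamp) so that HBase's lexicographic column ordering matches the intended scan order.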
[jira] [Commented] (YARN-3957) FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500
[ https://issues.apache.org/jira/browse/YARN-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643110#comment-14643110 ] Anubhav Dhoot commented on YARN-3957: - Thanks [~kasha] for the review and commit FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500 Key: YARN-3957 URL: https://issues.apache.org/jira/browse/YARN-3957 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.8.0 Attachments: YARN-3957.001.patch, YARN-3957.002.patch There is an NPE causing the web page at http://localhost:23188/cluster/scheduler to return a 500. This seems to be because YARN-2336 sets null for childQueues, and getChildQueues then hits the NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643204#comment-14643204 ] Naganarasimha G R commented on YARN-3963: - This approach should be fine! AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch Currently, as per the code in {{CommonNodeLabelsManager#addToClusterNodeLabels}}, when we add the same node label again no event is fired, so no update is done. {noformat} ./yarn rmadmin -addToClusterNodeLabels x ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)" ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)" {noformat} All these commands report success when applied again through the CLI {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS {code} Also, since changing exclusive=true to false is not supported, reporting success is misleading -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643215#comment-14643215 ] Wangda Tan commented on YARN-3873: -- [~sunilg], thanks for working on this. I looked at the main logic of the patch; I haven't looked at the tests yet. Two major comments: 1) There's no need to deprecate getApplicationComparator; it's not a public API, so we should simply remove it. 2) Can we assume pendingApplication's ordering policy is equal to activeApplication's ordering policy? I think we can assume this at least for now. For example, fair/priority/fifo. This assumption avoids a separate configuration for pending-application-ordering-policy. And we don't need to create a getActiveIterator in OrderingPolicy; we can reuse getAssignmentIterator. Thoughts? pendingApplications in LeafQueue should also use OrderingPolicy --- Key: YARN-3873 URL: https://issues.apache.org/jira/browse/YARN-3873 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch, 0003-YARN-3873.patch, 0004-YARN-3873.patch Currently *pendingApplications* in LeafQueue is using {{applicationComparator}} from CapacityScheduler. This can be changed so that pendingApplications uses the OrderingPolicy configured at the queue level (Fifo/Fair as configured). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
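As a rough sketch of the suggestion above, under the stated assumption that pending and active applications share the queue's single configured ordering policy; the interface below mirrors the method names mentioned in the comment and is not the committed YARN-3873 code:
{code}
// Sketch only: both pending and active applications are kept in
// instances of the queue's configured OrderingPolicy (FIFO/fair), so no
// separate pending-application-ordering-policy needs to be configured,
// and getAssignmentIterator is reused instead of a new getActiveIterator.
import java.util.Iterator;

class LeafQueueOrderingSketch<S> {
  interface OrderingPolicy<S> {
    void addSchedulableEntity(S s);
    boolean removeSchedulableEntity(S s);
    Iterator<S> getAssignmentIterator();
  }

  private final OrderingPolicy<S> pendingApps;
  private final OrderingPolicy<S> activeApps;

  LeafQueueOrderingSketch(OrderingPolicy<S> pending, OrderingPolicy<S> active) {
    this.pendingApps = pending;
    this.activeApps = active;
  }

  void activateApplication(S app) {
    // Moving an application from pending to active preserves whatever
    // order the shared policy dictates for both collections.
    if (pendingApps.removeSchedulableEntity(app)) {
      activeApps.addSchedulableEntity(app);
    }
  }

  Iterator<S> pendingInPolicyOrder() {
    return pendingApps.getAssignmentIterator();
  }
}
{code}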
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643261#comment-14643261 ] Naganarasimha G R commented on YARN-3963: - Also can we check for all labels and print one message ? AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch, 0003-YARN-3963.patch Currently as per the code in {{CommonNodeLabelManager#addToClusterNodeLabels}} when we add same nodelabel again event will not be fired so no updation is done. {noformat} ./yarn rmadmin –addToClusterNodeLabels x ./yarn rmadmin –addToClusterNodeLabels “x(exclusive=true)” ./yarn rmadmin –addToClusterNodeLabels “x(exclusive=false)” {noformat} All these commands will give success when applied again through CLI {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabelsTARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabelsTARGET=AdminService RESULT=SUCCESS {code} Also since exclusive=true to false is not supported success is misleading -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3971: --- Attachment: 0004-YARN-3971.patch [~leftnoteasy] Thank you for the review. I agree with you. Updated the patch as per your comments and also updated the test case. Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery -- Key: YARN-3971 URL: https://issues.apache.org/jira/browse/YARN-3971 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 0003-YARN-3971.patch, 0004-YARN-3971.patch Steps to reproduce: # Create labels x,y # Delete labels x,y # Create labels x,y and add capacity scheduler xml entries for labels x and y too # Restart RM Both RMs become standby, since the exception below is thrown from {{FileSystemNodeLabelsStore#recover}} {code} 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) {code} --
This message was sent by Atlassian JIRA (v6.3.4#6332)
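The recovery-skip direction discussed in this JIRA could be sketched as follows; the boolean recovery flag and all names here are hypothetical, illustrating the fix described above rather than the actual 0004 patch:
{code}
// Hypothetical sketch: skip the queue-usage check when labels are
// removed while replaying the node-labels store during recovery.
import java.io.IOException;
import java.util.Collection;

class RMNodeLabelsManagerSketch {
  void removeFromClusterNodeLabels(Collection<String> labels, boolean isRecovery)
      throws IOException {
    if (!isRecovery) {
      // Only validate queue references for admin-initiated removals;
      // during FileSystemNodeLabelsStore#recover the edit log is simply
      // being replayed, so an intermediate remove must succeed.
      checkRemoveFromClusterNodeLabelsOfQueue(labels);
    }
    internalRemoveLabels(labels);
  }

  void checkRemoveFromClusterNodeLabelsOfQueue(Collection<String> labels)
      throws IOException { /* throws if any queue still uses a label */ }

  void internalRemoveLabels(Collection<String> labels) { /* update state */ }
}
{code}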
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643283#comment-14643283 ] Hadoop QA commented on YARN-3963: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 55s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 10m 42s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 3s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 27s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 36s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 55s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 2m 20s | Tests passed in hadoop-yarn-common. | | | | 49m 19s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747398/0003-YARN-3963.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f36835f | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8679/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8679/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8679/console | This message was automatically generated. AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch, 0003-YARN-3963.patch Currently, as per the code in {{CommonNodeLabelsManager#addToClusterNodeLabels}}, when we add the same node label again no event is fired, so no update is done.
{noformat} ./yarn rmadmin -addToClusterNodeLabels x ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)" ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)" {noformat} All these commands report success when applied again through the CLI {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS {code} Also, since changing exclusive=true to false is not supported, reporting success is misleading -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643188#comment-14643188 ] Vrushali C commented on YARN-3908: -- Hi Zhijie, thanks, that is a good point. But if we put the event timestamp first, we have no way of querying for a particular event key unless we know the exact timestamp, and knowing the exact time is probably almost impossible. Imagine that there is another event that occurs between the two kill events, so it has a timestamp between kill1 and kill2. Now we still have to fetch all those and filter them out. So placing the timestamp first does not help in this case. But if we have the event key first, the columns will be placed together and the event timestamps will be stored in chronological order (using the Long.MAX_VALUE - ts value). So the first one fetched for the kill event would be the latest for that event key. thanks Vrushali Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
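A minimal sketch of the reversed-timestamp trick mentioned above ({{Long.MAX_VALUE - ts}}), which makes the most recent event sort first; the class and method names are illustrative, not the writer's actual API:
{code}
// Storing Long.MAX_VALUE - ts makes lexicographic/byte ordering of the
// column keys return the most recent event first when scanning.
public final class ReversedTimestampSketch {
  static long invert(long millis) {
    return Long.MAX_VALUE - millis;
  }

  static long restore(long inverted) {
    return Long.MAX_VALUE - inverted;
  }

  public static void main(String[] args) {
    long kill1 = 1_437_595_017_779L;  // earlier KILL event
    long kill2 = 1_437_595_026_431L;  // later KILL event
    // After inversion the later event sorts first, so a scan over the
    // event-id-grouped columns yields the latest occurrence first.
    assert invert(kill2) < invert(kill1);
  }
}
{code}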
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643267#comment-14643267 ] Zhijie Shen commented on YARN-3908: --- Okay, it's a fair point. It seems that the key design significantly depends on how we want to operate on the events. The current key design is most friendly for checking whether there exist events that match a given event ID and some given info key (and its value). But if you want to fetch everything that belongs to an event (our query needs to do this, as an event is implicitly an atomic unit for now), it seems inevitable to scan through all the columns that have the given event ID (correct me if I'm wrong :-). If so, there seems to be little gain from this key design, while it complicates the event encapsulation logic. And after rethinking the current query we need to support (YARN-3051), I want to amend my suggestion. It seems more reasonable to use {{e!eventTimestamp?eventId?eventInfoKey}}, such that we can natively scan through the events of one entity one-by-one and return them in chronological order. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
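For comparison, a sketch of the amended, timestamp-first layout {{e!eventTimestamp?eventId?eventInfoKey}}; again, the separators and names are taken from the comment text, not from a committed patch:
{code}
// Sketch of the timestamp-first qualifier proposed above. With the
// timestamp leading, a plain column scan of one entity row returns
// whole events grouped together and in time order.
public final class TimestampFirstEventKeySketch {
  static String qualifier(long eventTimestamp, String eventId, String infoKey) {
    // All columns of one (timestamp, eventId) pair stay adjacent, so the
    // reader can emit complete events one-by-one while scanning.
    return "e!" + eventTimestamp + '?' + eventId + '?' + infoKey;
  }
}
{code}
Note that a real implementation would encode the timestamp as fixed-width (and possibly inverted) bytes rather than a decimal string, so that HBase's byte ordering matches numeric/chronological ordering.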
[jira] [Updated] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3963: --- Attachment: 0003-YARN-3963.patch [~leftnoteasy] and [~Naganarasimha] Thanks for the review and comments. I have updated the patch as per the comments. The exception is now thrown only when attributes are changed. AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch, 0003-YARN-3963.patch Currently, as per the code in {{CommonNodeLabelsManager#addToClusterNodeLabels}}, when we add the same node label again no event is fired, so no update is done. {noformat} ./yarn rmadmin -addToClusterNodeLabels x ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)" ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)" {noformat} All these commands report success when applied again through the CLI {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS {code} Also, since changing exclusive=true to false is not supported, reporting success is misleading -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643180#comment-14643180 ] Wangda Tan commented on YARN-3963: -- Thanks for the comment, [~Naganarasimha]. I think we can throw an exception only if the added node label has a different attribute. If the node label has the same name and attributes, it should simply be ignored. AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch Currently, as per the code in {{CommonNodeLabelsManager#addToClusterNodeLabels}}, when we add the same node label again no event is fired, so no update is done. {noformat} ./yarn rmadmin -addToClusterNodeLabels x ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)" ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)" {noformat} All these commands report success when applied again through the CLI {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS {code} Also, since changing exclusive=true to false is not supported, reporting success is misleading -- This message was sent by Atlassian JIRA (v6.3.4#6332)
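The behavior being agreed on here can be sketched roughly as below: an exact duplicate is ignored (possibly with a WARN), while re-adding a label with a different exclusivity attribute fails. All class and field names are hypothetical:
{code}
// Hypothetical sketch of the duplicate-label check discussed above,
// not the actual CommonNodeLabelsManager code.
import java.io.IOException;
import java.util.Map;

class NodeLabelAddSketch {
  static final class Label {
    final String name;
    final boolean exclusive;
    Label(String name, boolean exclusive) {
      this.name = name;
      this.exclusive = exclusive;
    }
  }

  private final Map<String, Label> existing;

  NodeLabelAddSketch(Map<String, Label> existing) { this.existing = existing; }

  void addLabel(Label label) throws IOException {
    Label old = existing.get(label.name);
    if (old != null) {
      if (old.exclusive != label.exclusive) {
        // Attribute changed: exclusivity cannot be modified in place.
        throw new IOException("Exclusivity cannot be modified for an existing label: "
            + label.name + "(" + old.exclusive + ")");
      }
      // Same name and attributes: ignore (or log a WARN) instead of failing.
      return;
    }
    existing.put(label.name, label);
  }
}
{code}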
[jira] [Commented] (YARN-3852) Add docker container support to container-executor
[ https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643217#comment-14643217 ] Varun Vasudev commented on YARN-3852: - Committed to trunk and branch-2. Thanks [~ashahab]! Add docker container support to container-executor --- Key: YARN-3852 URL: https://issues.apache.org/jira/browse/YARN-3852 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Abin Shahab Fix For: 2.8.0 Attachments: YARN-3852-1.patch, YARN-3852-2.patch, YARN-3852-3.patch, YARN-3852.patch For security reasons, we need to ensure that access to the docker daemon and the ability to run docker containers is restricted to privileged users (i.e. users running applications should not have direct access to docker). In order to ensure the node manager can run docker commands, we need to add docker support to the container-executor binary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3852) Add docker container support to container-executor
[ https://issues.apache.org/jira/browse/YARN-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643225#comment-14643225 ] Hudson commented on YARN-3852: -- SUCCESS: Integrated in Hadoop-trunk-Commit #8227 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8227/]) YARN-3852. Add docker container support to container-executor. Contributed by Abin Shahab. (vvasudev: rev f36835ff9b878fa20fe58a30f9d1e8c47702d6d2) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.h * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h Add docker container support to container-executor --- Key: YARN-3852 URL: https://issues.apache.org/jira/browse/YARN-3852 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Abin Shahab Fix For: 2.8.0 Attachments: YARN-3852-1.patch, YARN-3852-2.patch, YARN-3852-3.patch, YARN-3852.patch For security reasons, we need to ensure that access to the docker daemon and the ability to run docker containers is restricted to privileged users (i.e. users running applications should not have direct access to docker). In order to ensure the node manager can run docker commands, we need to add docker support to the container-executor binary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643246#comment-14643246 ] Naganarasimha G R commented on YARN-3963: - Hi [~bibinchundatt], for ??throw new IOException("Label=" + label.getName() + "(" + rmNodeLabel.getIsExclusive() + ") already added");?? the message would be better as {{throw new IOException("Exclusivity cannot be modified for an existing label : " + label.getName() + "(" + rmNodeLabel.getIsExclusive() + ")");}} In the test case, for ??Assert.fail("IOException not thrown should have on adding same labels");?? I think a better message would be {{"IOException is expected when exclusivity is modified"}} AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch, 0003-YARN-3963.patch Currently, as per the code in {{CommonNodeLabelsManager#addToClusterNodeLabels}}, when we add the same node label again no event is fired, so no update is done. {noformat} ./yarn rmadmin -addToClusterNodeLabels x ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)" ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)" {noformat} All these commands report success when applied again through the CLI {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS {code} Also, since changing exclusive=true to false is not supported, reporting success is misleading -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3963: --- Attachment: 0004-YARN-3963.patch [~Naganarasimha] Thank you for your review comments. [~leftnoteasy] Updated the patch handling all comments. AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch, 0003-YARN-3963.patch, 0004-YARN-3963.patch Currently, as per the code in {{CommonNodeLabelsManager#addToClusterNodeLabels}}, when we add the same node label again no event is fired, so no update is done. {noformat} ./yarn rmadmin -addToClusterNodeLabels x ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)" ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)" {noformat} All these commands report success when applied again through the CLI {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS {code} Also, since changing exclusive=true to false is not supported, reporting success is misleading -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3736) Persist the Plan information, ie. accepted reservations to the RMStateStore for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643328#comment-14643328 ] Anubhav Dhoot commented on YARN-3736: - bq. It may be better to store ReservationId object rather than the name as we are parsing the name back in most places Since the reservationId is used as the key/path where the ReservationAllocationState is stored, we need to convert the reservationId to string format. We are not actually storing the reservationId as part of the protobuf payload anymore. Persist the Plan information, ie. accepted reservations to the RMStateStore for failover Key: YARN-3736 URL: https://issues.apache.org/jira/browse/YARN-3736 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Anubhav Dhoot Attachments: YARN-3736.001.patch, YARN-3736.001.patch, YARN-3736.002.patch We need to persist the current state of the plan, i.e. the accepted ReservationAllocations and corresponding RLESparseResourceAllocations, to the RMStateStore so that we can recover them on RM failover. This involves making all the reservation system data structures protobuf friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
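A simplified illustration of that point: the ReservationId round-trips through its string form as the store key/path instead of being duplicated in the protobuf payload. The path layout below is an assumption for illustration, not the actual RMStateStore schema:
{code}
// Sketch: derive the store path from ReservationId.toString() and parse
// it back during recovery. The path layout is illustrative only.
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ReservationId;

class ReservationStoreKeySketch {
  static String storePath(String planName, ReservationId id) {
    // e.g. reservation-system/<planName>/reservation_<clusterTs>_<seq>
    return "reservation-system/" + planName + "/" + id.toString();
  }

  static ReservationId idFromPath(String path) throws IOException {
    String name = path.substring(path.lastIndexOf('/') + 1);
    return ReservationId.parseReservationId(name);
  }
}
{code}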
[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643431#comment-14643431 ] Hadoop QA commented on YARN-3971: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 46s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 26s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 9s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 19s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 53m 31s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 94m 24s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747403/0004-YARN-3971.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f36835f | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8680/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8680/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8680/console | This message was automatically generated. Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery -- Key: YARN-3971 URL: https://issues.apache.org/jira/browse/YARN-3971 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 0003-YARN-3971.patch, 0004-YARN-3971.patch Steps to reproduce: # Create labels x,y # Delete labels x,y # Create labels x,y and add capacity scheduler xml entries for labels x and y too # Restart RM Both RMs become standby, since the exception below is thrown from {{FileSystemNodeLabelsStore#recover}} {code} 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label java.io.IOException: Cannot remove label=x, because queue=a1 is using this label.
Please remove label on queue before remove the label at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643446#comment-14643446 ] Vrushali C commented on YARN-3908: -- Yes, let's get the current patch in and continue the discussion on the event schema. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3736) Persist the Plan information, ie. accepted reservations to the RMStateStore for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3736: Attachment: YARN-3736.002.patch Removed ReservationId from ReservationAllocationStateProto and updated LevelDBRMStateStore to use batch writes. Persist the Plan information, ie. accepted reservations to the RMStateStore for failover Key: YARN-3736 URL: https://issues.apache.org/jira/browse/YARN-3736 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Anubhav Dhoot Attachments: YARN-3736.001.patch, YARN-3736.001.patch, YARN-3736.002.patch We need to persist the current state of the plan, i.e. the accepted ReservationAllocations and corresponding RLESparseResourceAllocations, to the RMStateStore so that we can recover them on RM failover. This involves making all the reservation system data structures protobuf friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3736) Persist the Plan information, ie. accepted reservations to the RMStateStore for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643314#comment-14643314 ] Anubhav Dhoot commented on YARN-3736: - Thanks [~subru] for the review. Rebased the patch and addressed all feedback except for ZKRMStateStore. For deleting a reservation, in the first step we remove the reservation and then check whether the plan state is empty because of this removal; if so, we remove the plan state as well. Because of these two steps we cannot use a single transaction for both. So we either use 2 different transactions or leave it as is. I have left it as is for now; let me know if you feel we still need to use 2 SafeTransactions or do something else. Persist the Plan information, ie. accepted reservations to the RMStateStore for failover Key: YARN-3736 URL: https://issues.apache.org/jira/browse/YARN-3736 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Anubhav Dhoot Attachments: YARN-3736.001.patch, YARN-3736.001.patch We need to persist the current state of the plan, i.e. the accepted ReservationAllocations and corresponding RLESparseResourceAllocations, to the RMStateStore so that we can recover them on RM failover. This involves making all the reservation system data structures protobuf friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3851) Add support for container runtimes in YARN
[ https://issues.apache.org/jira/browse/YARN-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana resolved YARN-3851. - Resolution: Fixed Release Note: Support for this was added as part of YARN-3853. It wasn't straightforward (or very useful) to split the patches up, so a single patch was submitted. Add support for container runtimes in YARN --- Key: YARN-3851 URL: https://issues.apache.org/jira/browse/YARN-3851 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Sidharta Seethana We need the ability to support different container types within the same executor. Container runtimes are lower-level implementations for supporting specific container engines (e.g. docker). These are meant to be independent of executors themselves - a given executor (e.g. LinuxContainerExecutor) could potentially switch between different container runtimes depending on what a client/application is requesting. An executor continues to provide higher-level functionality that could be specific to an operating system - for example, LinuxContainerExecutor continues to handle cgroups, users, diagnostic events etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishore Chaliparambil updated YARN-2884: Attachment: YARN-2884-V2.patch Attached a new patch that addresses review comments from [~subru]. Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino Assignee: Kishore Chaliparambil Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start, the AM is forced (via tokens and configuration) to direct all its requests to a new service running on the NM that provides a proxy to the central RM. This gives us a place to: 1) perform distributed scheduling decisions, 2) throttle misbehaving AMs, and 3) mask access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
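To illustrate the idea (not the YARN-2884 patch itself), a minimal forwarding shim around ApplicationMasterProtocol shows where such a proxy could intercept AM traffic:
{code}
// Conceptual sketch only: a pass-through wrapper where throttling,
// distributed scheduling, or federation routing could be inserted.
import java.io.IOException;
import org.apache.hadoop.yarn.api.ApplicationMasterProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.exceptions.YarnException;

class AllocateForwardingSketch {
  private final ApplicationMasterProtocol realRM;

  AllocateForwardingSketch(ApplicationMasterProtocol realRM) {
    this.realRM = realRM;
  }

  AllocateResponse allocate(AllocateRequest request)
      throws YarnException, IOException {
    // Intercept point: inspect or modify the ask list here (throttle a
    // misbehaving AM, satisfy some requests locally, pick an RM in a
    // federation) before forwarding to the central RM.
    return realRM.allocate(request);
  }
}
{code}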
[jira] [Commented] (YARN-3983) Make CapacityScheduler to easier extend application allocation logic
[ https://issues.apache.org/jira/browse/YARN-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643448#comment-14643448 ] Wangda Tan commented on YARN-3983: -- Started working on this JIRA, will upload a patch for review shortly. Make CapacityScheduler to easier extend application allocation logic Key: YARN-3983 URL: https://issues.apache.org/jira/browse/YARN-3983 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan While working on YARN-1651 (resource allocation for increasing container), I found it is very hard to extend the existing CapacityScheduler resource allocation logic to support different types of resource allocation. For example, there are a lot of differences between increasing a container and allocating a container: - Increasing a container doesn't need to check locality delay. - Increasing a container doesn't need to build/modify a resource request tree (ANY-RACK/HOST). - Increasing a container doesn't need to check allocation/reservation starvation (see {{shouldAllocOrReserveNewContainer}}). - After increasing a container is approved by the scheduler, it needs to update an existing container token instead of creating a new container. And there are lots of similarities when allocating different types of resources: - User-limit/queue-limit will be enforced for both of them. - Both of them need resource reservation logic (maybe continuous reservation looking is needed for both of them). The purpose of this JIRA is to make it easier to extend the CapacityScheduler resource allocation logic to support different types of resource allocation, to make common code reusable, and to improve code organization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
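A hypothetical sketch of the kind of extension point this JIRA argues for: shared limit checks in a common entry point, with the type-specific steps (locality delay, request tree, token handling) pushed behind an abstract method. All names here are illustrative, not the eventual YARN-3983 code:
{code}
// Sketch: common pre-checks for every allocation type, with the
// "new container" vs. "increase container" differences isolated in
// a subclass hook.
abstract class AbstractContainerAllocatorSketch {
  enum Outcome { ALLOCATED, RESERVED, SKIPPED }

  // Shared entry point: enforce user/queue limits, then delegate.
  final Outcome allocate(SchedulingRequest req) {
    if (!withinUserAndQueueLimits(req)) {
      return Outcome.SKIPPED;
    }
    return doAllocate(req);  // type-specific allocation step
  }

  abstract Outcome doAllocate(SchedulingRequest req);

  boolean withinUserAndQueueLimits(SchedulingRequest req) {
    return true;  // placeholder for the shared limit logic
  }

  static final class SchedulingRequest { /* ask details */ }
}
{code}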
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643470#comment-14643470 ] Hadoop QA commented on YARN-2884: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747414/YARN-2884-V2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3e6fce9 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8684/console | This message was automatically generated. Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino Assignee: Kishore Chaliparambil Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start, the AM is forced (via tokens and configuration) to direct all its requests to a new service running on the NM that provides a proxy to the central RM. This gives us a place to: 1) perform distributed scheduling decisions, 2) throttle misbehaving AMs, and 3) mask access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643347#comment-14643347 ] Junping Du commented on YARN-3045: -- bq. Well was aware that priority was not to differentiate the containers but for the events of it, but i thought you mentioned for the purpose of better querying rather than the purpose of writing it. Better querying is one purpose, but writing them under different policies could also be a consideration here. We may not be able to afford flushing every event in a large-scale cluster, so we may choose to ignore/cache some unimportant ones. bq. I have not gone through the writer code completely but is there any caching which you want to flush if the event priority is high? Also was thinking whether we need to change the Writer/Collector API to mention the criticality of the event being published? We already have a new flush() API for the writer that was checked in with YARN-3949. Please refer to some of the discussions there for details. You are right that we are lacking an API to respect this priority/policy in the whole data flow for writing. I will file another JIRA to track this. bq. So from NM side we want to publish events for ApplicationEntity and ContainerEntity, but based on the title of this jira i thought scope of this jira is to handle only ContainerEntities from NM side, is it better to handle events related Application entities specific to a given NM in another Jira? but i can try to ensure required foundation is done in NM side in this JIRA as part of your other comments, Thoughts? I am fine with separating events other than container events into a separate JIRA if it is really necessary. In the common case, the JIRA title shouldn't bind the implementation: at proposal time the goal is not as concrete as when the JIRA is being implemented, so we can fix/adjust it later. Anyway, I would support the scope (container events + foundation work) you proposed here if you are comfortable with it. bq. Also event has just id but NM related Application events will have the same event ID in different NM's so would it be something like INIT_APPLICATION_NODE_ID? That's a good question. My initial thinking is we could need something like a NodemanagerEntity to store application events, resource localization events, log aggregation handling events, configuration, etc. However, I would like to hear your and other folks' ideas on this as well. bq. +1 for this thought, had the same initial hitch as in future if we add more events than unnecessary create event and methods in publisher, but for the initial version thought will have approach similar to RM and ATSV1. But i feel better to handle now than refactor later on. But i can think of couple of approaches here. Yes, all three approaches seem to work here. IMO, the 2nd approach (hooking into the existing event dispatcher) looks simpler and more straightforward. bq. Was not clear about the comment, IIRC Zhijie in the meeting also mentioned that i am handling removing threaded model of publishing container metrics statistics as part of this jira. May be i am missing some other jira which you are already working on, may be can you enlighten me about it? I was thinking you were encapsulating metrics with TimelineEvent, but actually you are not. So no worries about my previous comments on this.
[Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
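The write-policy idea in the discussion above (flush critical lifecycle events through immediately, buffer or shed unimportant ones) could look roughly like this; it is entirely hypothetical, since a priority-aware collector/writer API did not exist at this point:
{code}
// Hypothetical sketch of priority-aware event publishing on the NM side.
import java.util.ArrayDeque;
import java.util.Queue;

class PriorityAwareEventBufferSketch<E> {
  private final Queue<E> buffer = new ArrayDeque<>();
  private final int maxBuffered;

  PriorityAwareEventBufferSketch(int maxBuffered) {
    this.maxBuffered = maxBuffered;
  }

  void publish(E event, boolean critical) {
    if (critical) {
      flushBuffered();
      write(event);  // write-through, followed by an explicit flush()
      flush();
      return;
    }
    if (buffer.size() >= maxBuffered) {
      buffer.poll();  // shed the oldest unimportant event under load
    }
    buffer.add(event);
  }

  private void flushBuffered() {
    while (!buffer.isEmpty()) {
      write(buffer.poll());
    }
  }

  void write(E event) { /* hand the event to the timeline writer */ }
  void flush() { /* writer.flush(), per the API added in YARN-3949 */ }
}
{code}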
[jira] [Updated] (YARN-3982) container-executor parsing of container-executor.cfg broken
[ https://issues.apache.org/jira/browse/YARN-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3982: Attachment: YARN-3982.001.patch Patch with fix attached. container-executor parsing of container-executor.cfg broken --- Key: YARN-3982 URL: https://issues.apache.org/jira/browse/YARN-3982 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: YARN-3982.001.patch After YARN-2194, the container-executor parsing of container-executor.cfg is broken. The test-container-executor binary is also failing and has been failing for quite a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643388#comment-14643388 ] Li Lu commented on YARN-3816: - bq. I'm still very confused by the usage of the word aggregate. In this patch, aggregate really means accumulating values of a metric along the time dimension, which is completely different than the notion of aggregation we have used all along. The aggregation has always been about rolling up values from children to parents. I have a similar concern with regard to the dimensions of aggregation, too. If I understand the problem correctly, we have two dimensions in a flow/user-level aggregation: one dimension for all entities belonging to this flow/user, another for time. If we aggregate in the flow/user dimension, one typical problem we will hit is aligning times. Suppose entities E1 and E2 both belong to flow F1. In an aggregation, we would like to aggregate E1 and E2. However, if a metric M is a time series, how do we align the times in E1.M and E2.M? Normally the two time series may have slightly different sample times, so I believe we need to decide the semantics for this situation. [Aggregation] App-level Aggregation for YARN system metrics --- Key: YARN-3816 URL: https://issues.apache.org/jira/browse/YARN-3816 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch We need application-level aggregation of Timeline data: - To present end users aggregated states for each application, including: resource (CPU, Memory) consumption across all containers, number of containers launched/completed/failed, etc. We need this for apps while they are running as well as when they are done. - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to show details of states at the framework level. - Aggregation at other levels (Flow/User/Queue) can be more efficient when based on application-level aggregations rather than raw entity-level data, as many fewer rows need to be scanned (filtering out non-aggregated entities like events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
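The alignment question can be made concrete with a small sketch: bucket each entity's samples to a common interval before combining, so slightly different sample times from E1.M and E2.M land in the same bucket. The bucket width and sum semantics are assumptions, not anything decided on this JIRA:
{code}
// Sketch: align two metric time series by rounding sample times down to
// a shared bucket boundary, then combine values per bucket.
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class TimeSeriesAlignSketch {
  static NavigableMap<Long, Long> alignAndSum(
      Map<Long, Long> e1Metric, Map<Long, Long> e2Metric, long bucketMillis) {
    NavigableMap<Long, Long> out = new TreeMap<>();
    accumulate(out, e1Metric, bucketMillis);
    accumulate(out, e2Metric, bucketMillis);
    return out;
  }

  private static void accumulate(
      NavigableMap<Long, Long> out, Map<Long, Long> series, long bucketMillis) {
    for (Map.Entry<Long, Long> sample : series.entrySet()) {
      // Round each sample time down to its bucket boundary so slightly
      // different sample times from E1 and E2 fall into the same bucket.
      long bucket = (sample.getKey() / bucketMillis) * bucketMillis;
      out.merge(bucket, sample.getValue(), Long::sum);
    }
  }
}
{code}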
[jira] [Updated] (YARN-3982) container-executor parsing of container-executor.cfg broken in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3982: Summary: container-executor parsing of container-executor.cfg broken in trunk and branch-2 (was: container-executor parsing of container-executor.cfg broken) container-executor parsing of container-executor.cfg broken in trunk and branch-2 - Key: YARN-3982 URL: https://issues.apache.org/jira/browse/YARN-3982 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: YARN-3982.001.patch After YARN-2194, the container-executor parsing of container-executor.cfg is broken. The test-container-executor binary is also failing and has been failing for quite a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643421#comment-14643421 ] Li Lu commented on YARN-3908: - Just a quick check about the current status of this JIRA. Are we still planning to merge it in ASAP, do we want to fix the row key of timeline events with one more draft, or do we plan to fully resolve the timeline event problems before we merge it in (if fixing the row key does not fully resolve them)? I'd like to know our plan for this JIRA so that I can fine-tune my patch for YARN-3904 accordingly. Thanks! Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3950) Add unique SHELL_ID environment variable to DistributedShell
[ https://issues.apache.org/jira/browse/YARN-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643452#comment-14643452 ] Jason Lowe commented on YARN-3950: -- +1 lgtm. Will commit tomorrow if there are no objections. Add unique SHELL_ID environment variable to DistributedShell Key: YARN-3950 URL: https://issues.apache.org/jira/browse/YARN-3950 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-3950.001.patch, YARN-3950.002.patch As discussed in [this comment|https://issues.apache.org/jira/browse/MAPREDUCE-6415?focusedCommentId=14636027page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14636027], it would be useful to have a monotonically increasing and independent ID of some kind that is unique per shell in the distributed shell program. We can do that by adding a SHELL_ID env var. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
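A minimal sketch of the SHELL_ID idea: a monotonically increasing counter in the DistributedShell AM, exported into each shell's launch environment so every shell instance gets a unique, stable ID. Names are illustrative, not the committed patch:
{code}
// Sketch only: each launched shell container observes a distinct SHELL_ID.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class ShellIdSketch {
  private final AtomicInteger shellIdCounter = new AtomicInteger(1);

  Map<String, String> buildShellEnv() {
    Map<String, String> env = new HashMap<>();
    // Monotonically increasing and independent of container IDs.
    env.put("SHELL_ID", String.valueOf(shellIdCounter.getAndIncrement()));
    return env;
  }
}
{code}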
[jira] [Created] (YARN-3982) container-executor parsing of container-executor.cfg broken
Varun Vasudev created YARN-3982: --- Summary: container-executor parsing of container-executor.cfg broken Key: YARN-3982 URL: https://issues.apache.org/jira/browse/YARN-3982 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker After YARN-2194, the container-executor parsing of container-executor.cfg is broken. The test-container-executor binary is also failing and has been failing for quite a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3851) Add support for container runtimes in YARN
[ https://issues.apache.org/jira/browse/YARN-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643382#comment-14643382 ] Allen Wittenauer commented on YARN-3851: What is a user supposed to do with that release note? Add support for container runtimes in YARN --- Key: YARN-3851 URL: https://issues.apache.org/jira/browse/YARN-3851 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Sidharta Seethana We need the ability to support different container types within the same executor. Container runtimes are lower-level implementations for supporting specific container engines (e.g. docker). These are meant to be independent of executors themselves - a given executor (e.g. LinuxContainerExecutor) could potentially switch between different container runtimes depending on what a client/application is requesting. An executor continues to provide higher-level functionality that could be specific to an operating system - for example, LinuxContainerExecutor continues to handle cgroups, users, diagnostic events etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3851) Add support for container runtimes in YARN
[ https://issues.apache.org/jira/browse/YARN-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643396#comment-14643396 ] Sidharta Seethana commented on YARN-3851: - hi [~aw] , I apologize - that wasn't meant to be a release note, just a comment that the patch for this is included in YARN-3853 . I'll remove the release note. thanks, -Sidharta Add support for container runtimes in YARN --- Key: YARN-3851 URL: https://issues.apache.org/jira/browse/YARN-3851 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Sidharta Seethana We need the ability to support different container types within the same executor. Container runtimes are lower-level implementations for supporting specific container engines (e.g docker). These are meant to be independent of executors themselves - a given executor (e.g LinuxContainerExecutor) could potentially switch between different container runtimes depending on what a client/application is requesting. An executor continues to provide higher level functionality that could be specific to an operating system - for example, LinuxContainerExecutor continues to handle cgroups, users, diagnostic events etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3851) Add support for container runtimes in YARN
[ https://issues.apache.org/jira/browse/YARN-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3851: Release Note: (was: Support for this was added as part of YARN-3853 . It wasn't straightforward (or very useful) to split the patches up, so a single patch was submitted. ) Add support for container runtimes in YARN --- Key: YARN-3851 URL: https://issues.apache.org/jira/browse/YARN-3851 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Sidharta Seethana We need the ability to support different container types within the same executor. Container runtimes are lower-level implementations for supporting specific container engines (e.g docker). These are meant to be independent of executors themselves - a given executor (e.g LinuxContainerExecutor) could potentially switch between different container runtimes depending on what a client/application is requesting. An executor continues to provide higher level functionality that could be specific to an operating system - for example, LinuxContainerExecutor continues to handle cgroups, users, diagnostic events etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3963) AddNodeLabel on duplicate label addition shows success
[ https://issues.apache.org/jira/browse/YARN-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643400#comment-14643400 ] Hadoop QA commented on YARN-3963: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 32s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 3s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 56s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 56s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 21s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 36s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 55s | Tests passed in hadoop-yarn-common. | | | | 42m 20s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747408/0004-YARN-3963.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f36835f | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8681/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8681/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8681/console | This message was automatically generated. AddNodeLabel on duplicate label addition shows success --- Key: YARN-3963 URL: https://issues.apache.org/jira/browse/YARN-3963 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-3963.patch, 0002-YARN-3963.patch, 0003-YARN-3963.patch, 0004-YARN-3963.patch Currently, as per the code in {{CommonNodeLabelsManager#addToClusterNodeLabels}}, when we add the same node label again no event is fired, so no update is done.
{noformat} ./yarn rmadmin -addToClusterNodeLabels x ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=true)" ./yarn rmadmin -addToClusterNodeLabels "x(exclusive=false)" {noformat} All these commands report success even when applied again through the CLI: {code} 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=true] 2015-07-22 21:16:57,779 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=10.19.92.117 OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: Add labels: [z:exclusivity=false] 2015-07-22 21:17:06,431 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf IP=IP OPERATION=addToClusterNodeLabels TARGET=AdminService RESULT=SUCCESS {code} Also, since changing exclusive=true to false is not supported, reporting success is misleading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
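A minimal sketch of the fix direction the report implies: surface the duplicate instead of silently logging SUCCESS. The names below are illustrative, not taken from the attached patches.
{code}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hedged sketch: reject a duplicate label addition rather than reporting success.
class AddLabelSketch {
  private final Set<String> existingLabels = new HashSet<String>();

  void addToClusterNodeLabels(String label) throws IOException {
    // Set.add returns false when the label is already present
    if (!existingLabels.add(label)) {
      throw new IOException("Node label '" + label
          + "' already exists; exclusivity cannot be changed after creation");
    }
  }
}
{code}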
[jira] [Commented] (YARN-3851) Add support for container runtimes in YARN
[ https://issues.apache.org/jira/browse/YARN-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643398#comment-14643398 ] Sidharta Seethana commented on YARN-3851: - Moved note from the release note : Support for this was added as part of YARN-3853 . It wasn't straightforward (or very useful) to split the patches up, so a single patch was submitted. Add support for container runtimes in YARN --- Key: YARN-3851 URL: https://issues.apache.org/jira/browse/YARN-3851 Project: Hadoop YARN Issue Type: Sub-task Components: yarn Reporter: Sidharta Seethana Assignee: Sidharta Seethana We need the ability to support different container types within the same executor. Container runtimes are lower-level implementations for supporting specific container engines (e.g docker). These are meant to be independent of executors themselves - a given executor (e.g LinuxContainerExecutor) could potentially switch between different container runtimes depending on what a client/application is requesting. An executor continues to provide higher level functionality that could be specific to an operating system - for example, LinuxContainerExecutor continues to handle cgroups, users, diagnostic events etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3983) Make CapacityScheduler to easier extend application allocation logic
Wangda Tan created YARN-3983: Summary: Make CapacityScheduler to easier extend application allocation logic Key: YARN-3983 URL: https://issues.apache.org/jira/browse/YARN-3983 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan While working on YARN-1651 (resource allocation for increasing container), I found it is very hard to extend the existing CapacityScheduler resource allocation logic to support different types of resource allocation. For example, there are a lot of differences between increasing a container and allocating a container: - Increasing a container doesn't need to check locality delay. - Increasing a container doesn't need to build/modify a resource request tree (ANY-RACK/HOST). - Increasing a container doesn't need to check allocation/reservation starvation (see {{shouldAllocOrReserveNewContainer}}). - After a container increase is approved by the scheduler, it needs to update an existing container token instead of creating a new container. And there are lots of similarities when allocating different types of resources: - User-limit/queue-limit will be enforced for both of them. - Both of them need resource reservation logic. (Maybe continuous reservation looking is needed for both of them.) The purpose of this JIRA is to make it easier to extend the CapacityScheduler resource allocation logic to support different types of resource allocation, to make common code reusable, and to improve code organization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
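An illustrative-only sketch of the kind of separation this JIRA argues for: a shared allocation skeleton with the type-specific steps overridden. None of these names come from an actual patch.
{code}
// Hedged sketch: common limit checks are shared, while locality-delay handling
// and the commit step (new token vs. updated token) vary per allocation type.
abstract class CSAllocatorSketch {
  final boolean allocate() {
    if (!checkUserAndQueueLimits()) {
      return false;                      // shared: user-limit / queue-limit
    }
    if (needsLocalityDelayCheck() && !localityDelaySatisfied()) {
      return false;                      // new containers only
    }
    commit();                            // create token vs. update token
    return true;
  }

  abstract boolean checkUserAndQueueLimits();
  abstract boolean needsLocalityDelayCheck();
  abstract boolean localityDelaySatisfied();
  abstract void commit();
}
{code}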
[jira] [Commented] (YARN-3853) Add docker container runtime support to LinuxContainterExecutor
[ https://issues.apache.org/jira/browse/YARN-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643444#comment-14643444 ] Hudson commented on YARN-3853: -- SUCCESS: Integrated in Hadoop-trunk-Commit #8228 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8228/]) YARN-3853. Add docker container runtime support to LinuxContainterExecutor. Contributed by Sidharta Seethana. (vvasudev: rev 3e6fce91a471b4a5099de109582e7c6417e8a822) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntimeContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntimeConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerStartContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntime.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DelegatingLinuxContainerRuntime.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerCommand.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DefaultLinuxContainerRuntime.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/LinuxContainerRuntimeConstants.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperationExecutor.java * 
hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerLivenessContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerSignalContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/LinuxContainerRuntime.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/privileged/PrivilegedOperationException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java *
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643476#comment-14643476 ] Zhijie Shen commented on YARN-3908: --- Sure, as most folks are comfortable with the latest patch, let's get this in. I'll file a separate jira to track the discussion about event column key. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, entity#info map is not stored at all. 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643337#comment-14643337 ] Sangjin Lee commented on YARN-3816: --- Thanks [~djp] for updating the POC patch and providing answers to the questions I had. I've looked over the new patch and also gone through your answers. Some follow-up thoughts and observations are below. (1) I think there is some confusion on the types of metrics in relation to this. Here is how I look at the metric types. See if it squares with your understanding. There are basically *2 independent* dimensions of metric types: - single value vs. time series - counter vs. gauge Single value vs. time series purely concerns *storage*. It only determines whether only the latest value is stored or the entire time series values are stored (subject to TTL). On the other hand, the counter vs. gauge dimension deals with *what type of mathematical functions/operations* apply to them. Counters are metrics that are time-cumulative in their nature, and are always monotonically increasing with time (e.g. HDFS bytes written). Gauges can fluctuate up and down over time (e.g. CPU usage). The time integral that's being done in this patch applies only to gauges. It does not make sense for counters. These are two independent dimensions in principle. For example, a gauge can be a single value. A counter can be a time series. Regardless of whether they are always useful, they are possible in principle. I propose to introduce the second dimension to the metrics explicitly. This second dimension nearly maps to toAggregate (and/or the REP/SUM distinction) in your patch. But I think it's probably better to introduce the metric types explicitly as another enum or by subclassing {{TimelineMetric}}. Let me know what you think. (2) I'm still very confused by the usage of the word aggregate. In this patch, aggregate really means accumulating values of a metric along the time dimension, which is completely different than the notion of aggregation we have used all along. The aggregation has always been about rolling up values from children to parents. Can we choose a different word to describe this aspect of accumulating values along the time dimension, and avoid using aggregation for this? Accumulate? Cumulative? Any suggestion? On a related note, {quote} However, in practice, there are cases that some aggregated metrics has both properties, like area value here - we do need its cumulative values and also could be interested in getting values within a given time interval. Isn't it? {quote} My statement was that a time-integral (or accumulation along the time dimension) does not make sense for counters. For example, consider HDFS bytes written. The time accumulation is already built into it (see (1)). If you further accumulate this along the time dimension, it becomes quadratic (doubly integrated) in time. I don't see how that can be useful. Another way to see this is that a counter is basically a time integral of another gauge. For example, the HDFS bytes written counter (in the unit of bytes) is a time integral of HDFS bytes written per time (in the unit of bytes/sec). If I misunderstood what you meant, could you kindly clarify it? (3) {quote} No. Nothing get changed on the design since our last discussions. The average and max is also important but I just haven't get bandwidth to add in poc stage as adding existing things could be more straight-forward. I will add it later. 
{quote} The average/max we discussed in the offline discussion is actually very similar to the aggregated (accumulated) metrics here. The only difference is that the average is further divided by the duration. Otherwise, it's basically the same derived property. It would be good to do one or the other, but not both. I would suggest that we do only one of them. I think it would be OK to do this and not the average/max of the previous discussion. I'd like to hear what others think about this. (4) Can we introduce a configuration that disables this time accumulation feature? As we discussed, some may not want to have this feature enabled and are perfectly happy with simple aggregation (from children to parents). It would be good to isolate this part and be able to enable/disable it. [Aggregation] App-level Aggregation for YARN system metrics --- Key: YARN-3816 URL: https://issues.apache.org/jira/browse/YARN-3816 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch We need application level aggregation of Timeline data: - To present end user
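A minimal sketch of the two independent dimensions described in point (1) of the comment above; the type and field names are hypothetical, not from any posted patch.
{code}
// Which math applies (counters are time-cumulative; gauges fluctuate) vs.
// what gets stored (latest value only vs. the full series, subject to TTL).
enum MetricKind { COUNTER, GAUGE }
enum MetricStorage { SINGLE_VALUE, TIME_SERIES }

class MetricTypeSketch {
  MetricKind kind;        // a time integral only makes sense for GAUGE
  MetricStorage storage;  // independent of kind: any combination is possible
}
{code}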
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643424#comment-14643424 ] Sangjin Lee commented on YARN-3908: --- +1 for committing this patch and having the event schema discussion in a different JIRA. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, entity#info map is not stored at all. 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643463#comment-14643463 ] Vrushali C commented on YARN-3816: -- bq. If I understand the problem correctly, we have two dimensions in a flow/user level aggregation: one dimension for all entities belong to this flow/user, another dimension for time. Ah not quite. The time dimension goes with flow/user/queue. For example, we will aggregate user-level stats over a time period like daily or weekly. Similarly for flows: flows are aggregated over one day or one week in hRaven. Ditto for users and queues. So let's say, for simplicity, user1 ran a wordcount MapReduce job three times on Monday and a sleep job two times on Monday. Now the daily aggregation table for user1 will have the sum of each counter metric on that day, that is {code} M1 for user1 on Monday = M1 from wordcount.run1 on Monday + M1 from wordcount.run2 on Monday + M1 from wordcount.run3 on Monday + M1 from sleep.run1 on Monday + M1 from sleep.run2 on Monday {code} Now, for flows on Monday: {code} M1 for wordcount on Monday = M1 from wordcount.run1 on Monday + M1 from wordcount.run2 on Monday + M1 from wordcount.run3 on Monday M1 for sleep on Monday = M1 from sleep.run1 on Monday + M1 from sleep.run2 on Monday {code} For time series, we need to decide what aggregation means. One option is to normalize the values to minute-level granularity, i.e. add up the values per minute across runs. Anything that occurred within a minute is then assigned to the top of that minute: e.g. something happening at 2 min 10 seconds is treated as having occurred at 2 min. That way we can sum up across flows/users/runs, etc. [Aggregation] App-level Aggregation for YARN system metrics --- Key: YARN-3816 URL: https://issues.apache.org/jira/browse/YARN-3816 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch We need application level aggregation of Timeline data: - To present end users aggregated states for each application, including: resource (CPU, Memory) consumption across all containers, number of containers launched/completed/failed, etc. We need this for apps while they are running as well as when they are done. - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to show details of states at the framework level. - Other levels of aggregation (Flow/User/Queue) can be done more efficiently from application-level aggregations than from raw entity-level data, as far fewer rows need to be scanned (after filtering out non-aggregated entities, like events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
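A hedged sketch of the minute-level normalization described above: each timestamp is truncated to the top of its minute so values from different runs land in the same bucket and can be summed. Illustrative only.
{code}
class MinuteBucketSketch {
  private static final long MINUTE_MS = 60_000L;

  // e.g. 2 min 10 s -> 2 min
  static long toMinuteBucket(long timestampMs) {
    return (timestampMs / MINUTE_MS) * MINUTE_MS;
  }
}
{code}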
[jira] [Resolved] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3908. --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: YARN-2928 Committed the patch to branch YARN-2928. Thanks for the patch, Vrushali and Sangjin, as well as other folks for contributing your thoughts. Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Fix For: YARN-2928 Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, entity#info map is not stored at all. 2 event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3982) container-executor parsing of container-executor.cfg broken in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643602#comment-14643602 ] Xuan Gong commented on YARN-3982: - This doc comment is confusing. {code} /** * Function to return an array of values for a key. * Value delimiter is assumed to be a '%'. */ char ** get_values(const char * key) { {code} Which value delimiter is really used here: '%' or ','? container-executor parsing of container-executor.cfg broken in trunk and branch-2 - Key: YARN-3982 URL: https://issues.apache.org/jira/browse/YARN-3982 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: YARN-3982.001.patch After YARN-2194, the container-executor parsing of container-executor.cfg is broken. The test-container-executor binary is also failing and has been failing for quite a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3982) container-executor parsing of container-executor.cfg broken in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643492#comment-14643492 ] Hadoop QA commented on YARN-3982: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 23s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | yarn tests | 6m 5s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 21m 21s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747417/YARN-3982.001.patch | | Optional Tests | javac unit | | git revision | trunk / 3e6fce9 | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8683/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8683/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8683/console | This message was automatically generated. container-executor parsing of container-executor.cfg broken in trunk and branch-2 - Key: YARN-3982 URL: https://issues.apache.org/jira/browse/YARN-3982 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: YARN-3982.001.patch After YARN-2194, the container-executor parsing of container-executor.cfg is broken. The test-container-executor binary is also failing and has been failing for quite a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3736) Persist the Plan information, ie. accepted reservations to the RMStateStore for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643606#comment-14643606 ] Subru Krishnan commented on YARN-3736: -- Thanks [~adhoot] for responding to my comments; what you say makes sense. The latest patch LGTM. Looks like there are minor test-patch issues (unused imports after the update, etc.). Can you address those? Also, can you open a JIRA for storing/updating reservation state in the RMStateStore from the Plan? Persist the Plan information, ie. accepted reservations to the RMStateStore for failover Key: YARN-3736 URL: https://issues.apache.org/jira/browse/YARN-3736 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Anubhav Dhoot Attachments: YARN-3736.001.patch, YARN-3736.001.patch, YARN-3736.002.patch We need to persist the current state of the plan, i.e. the accepted ReservationAllocations and corresponding RLESparseResourceAllocations, to the RMStateStore so that we can recover them on RM failover. This involves making all the reservation system data structures protobuf friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3904: Attachment: YARN-3904-YARN-2928.004.patch Uploading the 004 version of the patch. This patch addresses the following two major issues: # Rebuild the current Phoenix writer into an offline aggregation writer. Specifically, the writer writes info and metric data into the newly created Phoenix offline aggregation table. # Simplify the writer interface by using TimelineCollectorContext. In this way both normal writers and offline aggregation writers can use the same interface to write data. One thing pending discussion is the {{aggregation}} method. I feel this method is a little bit outdated. Could anyone remind me of the assumed use case for it? Is it intended for real-time aggregations only? Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support to aggregation writers. Offline aggregation writers typically has less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
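An illustrative approximation of the refactoring described above: both normal and offline aggregation writers accept the same context object, which simply carries whatever contextual fields are available. The names approximate, but are not copied from, the patch.
{code}
// Hedged sketch: a shared context lets one writer interface serve both paths.
class CollectorContextSketch {
  String clusterId, userId, flowName, appId;
  Long flowRunId; // may be absent for offline aggregation writers
}

interface AggregationWriterSketch {
  void write(CollectorContextSketch ctx, Object entityData);
}
{code}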
[jira] [Created] (YARN-3984) Rethink event column key issue
Zhijie Shen created YARN-3984: - Summary: Rethink event column key issue Key: YARN-3984 URL: https://issues.apache.org/jira/browse/YARN-3984 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Fix For: YARN-2928 Currently, the event column key is event_id?info_key?timestamp, which is not so friendly for fetching all the events of an entity and sorting them in chronological order. IMHO, timestamp?event_id?info_key may be a better key schema. I'm opening this JIRA to continue the discussion, which started in the comments on YARN-3908. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
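A small sketch contrasting the two key layouts under discussion; '!' is a stand-in separator (the separator byte appears as '?' in the JIRA text, likely an encoding artifact):
{code}
class EventKeySketch {
  static String currentKey(String eventId, String infoKey, long ts) {
    return eventId + "!" + infoKey + "!" + ts;  // groups rows by event id
  }

  static String proposedKey(String eventId, String infoKey, long ts) {
    return ts + "!" + eventId + "!" + infoKey;  // sorts rows chronologically
  }
}
{code}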
[jira] [Updated] (YARN-3982) container-executor parsing of container-executor.cfg broken in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3982: Attachment: YARN-3982.002.patch bq. Which value delimiter is really used here: '%' or ','? Sorry about the goof-up in the comment. It should be ','. Uploaded a new patch with the fix. container-executor parsing of container-executor.cfg broken in trunk and branch-2 - Key: YARN-3982 URL: https://issues.apache.org/jira/browse/YARN-3982 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: YARN-3982.001.patch, YARN-3982.002.patch After YARN-2194, the container-executor parsing of container-executor.cfg is broken. The test-container-executor binary is also failing and has been failing for quite a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishore Chaliparambil updated YARN-2884: Attachment: YARN-2884-V3.patch Uploading a new version of the patch Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino Assignee: Kishore Chaliparambil Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, YARN-2884-V3.patch We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start, the AM is forced (via tokens and configuration) to direct all its requests to a new service running on the NM that provides a proxy to the central RM. This gives us a place to: 1) perform distributed scheduling decisions, 2) throttle misbehaving AMs, and 3) mask access to a federation of RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643504#comment-14643504 ] Li Lu commented on YARN-3816: - Thanks for the clarification [~vrushalic]! Yes, the problem is with time series metrics. I think your approach works here, but maybe we'd like to change the scale of the round-ups according to the scale of the aggregation? For example, if we aggregate the data for one whole day, we can merge the data within the same minute. If we aggregate the data over a week, maybe we can merge the data within the same hour? [Aggregation] App-level Aggregation for YARN system metrics --- Key: YARN-3816 URL: https://issues.apache.org/jira/browse/YARN-3816 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch We need application level aggregation of Timeline data: - To present end users aggregated states for each application, including: resource (CPU, Memory) consumption across all containers, number of containers launched/completed/failed, etc. We need this for apps while they are running as well as when they are done. - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to show details of states at the framework level. - Other levels of aggregation (Flow/User/Queue) can be done more efficiently from application-level aggregations than from raw entity-level data, as far fewer rows need to be scanned (after filtering out non-aggregated entities, like events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
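A hypothetical illustration of the suggestion above: derive the merge granularity from the aggregation window instead of hard-coding one minute. The names and thresholds are illustrative.
{code}
class GranularitySketch {
  private static final long HOUR_MS = 3_600_000L;
  private static final long DAY_MS = 24 * HOUR_MS;

  // daily aggregation -> minute buckets; longer windows -> hour buckets
  static long mergeGranularityMs(long aggregationWindowMs) {
    return aggregationWindowMs <= DAY_MS ? 60_000L : HOUR_MS;
  }
}
{code}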
[jira] [Commented] (YARN-3982) container-executor parsing of container-executor.cfg broken in trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643528#comment-14643528 ] Varun Vasudev commented on YARN-3982: - Test failure is unrelated to the patch. container-executor parsing of container-executor.cfg broken in trunk and branch-2 - Key: YARN-3982 URL: https://issues.apache.org/jira/browse/YARN-3982 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.8.0 Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Blocker Attachments: YARN-3982.001.patch After YARN-2194, the container-executor parsing of container-executor.cfg is broken. The test-container-executor binary is also failing and has been failing for quite a while. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3736) Persist the Plan information, ie. accepted reservations to the RMStateStore for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643565#comment-14643565 ] Hadoop QA commented on YARN-3736: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 10s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 52s | The applied patch generated 5 new checkstyle issues (total was 96, now 100). | | {color:green}+1{color} | whitespace | 1m 20s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 1m 32s | The patch appears to introduce 4 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 52m 42s | Tests passed in hadoop-yarn-server-resourcemanager. | | | | 92m 39s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747409/YARN-3736.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3e6fce9 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8682/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8682/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8682/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8682/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8682/console | This message was automatically generated. Persist the Plan information, ie. accepted reservations to the RMStateStore for failover Key: YARN-3736 URL: https://issues.apache.org/jira/browse/YARN-3736 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, fairscheduler, resourcemanager Reporter: Subru Krishnan Assignee: Anubhav Dhoot Attachments: YARN-3736.001.patch, YARN-3736.001.patch, YARN-3736.002.patch We need to persist the current state of the plan, i.e. the accepted ReservationAllocations corresponding RLESpareseResourceAllocations to the RMStateStore so that we can recover them on RM failover. This involves making all the reservation system data structures protobuf friendly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643650#comment-14643650 ] Dheeren Beborrtha commented on YARN-2918: - This is a major issue and a big inconvenience. Can this be backported to Hadoop 2.6.0? Don't fail RM if queue's configured labels are not existed in cluster-node-labels - Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Sharma K S Assignee: Wangda Tan Fix For: 2.8.0, 2.7.1 Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch Currently, if an admin sets up labels on queues ({{queue-path.accessible-node-labels = ...}}) and a label is not added to the RM, the queue's initialization will fail and the RM will fail too: {noformat} 2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager ... Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) {noformat} This is not a good user experience; we should stop failing the RM so that the admin can configure queues/labels in the following steps: - Configure queue (with label) - Start RM - Add labels to RM - Submit applications Today the admin has to: - Configure queue (without label) - Start RM - Add labels to RM - Refresh queue's config (with label) - Submit applications -- This message was sent by Atlassian JIRA (v6.3.4#6332)
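For reference, the queue-label configuration referred to above lives in capacity-scheduler.xml. Assuming a queue a1 directly under root (the queue name is taken from the stack trace; the label value is illustrative), it looks like this:
{code}
<property>
  <name>yarn.scheduler.capacity.root.a1.accessible-node-labels</name>
  <value>x</value>
</property>
{code}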
[jira] [Commented] (YARN-3846) RM Web UI queue filter is not working
[ https://issues.apache.org/jira/browse/YARN-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643660#comment-14643660 ] Hudson commented on YARN-3846: -- FAILURE: Integrated in Hadoop-trunk-Commit #8229 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8229/]) YARN-3846. RM Web UI queue filter is not working for sub queue. Contributed by Mohammad Shahid Khan (jianhe: rev 3572ebd738aa5fa8b0906d75fb12cc6cbb991573) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/CHANGES.txt RM Web UI queue filter is not working - Key: YARN-3846 URL: https://issues.apache.org/jira/browse/YARN-3846 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 3.0.0, 2.8.0 Reporter: Mohammad Shahid Khan Assignee: Mohammad Shahid Khan Labels: PatchAvailable Fix For: 2.8.0 Attachments: YARN-3846.patch, scheduler queue issue.png, scheduler queue positive behavior.png Clicking on the root queue shows all applications, but clicking on a leaf queue does not filter the applications for the clicked queue. The regular expression seems to be wrong: {code} q = '^' + q.substr(q.lastIndexOf(':') + 2) + '$';, {code} For example: 1. Suppose the queue name is b; the above expression will substr at index 1, since q.lastIndexOf(':') = -1 and -1 + 2 = 1, which is wrong; it should look at index 0. 2. If the queue name is ab.x, it will parse it to .x, but it should be x. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
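A hedged sketch of the off-by-one described in the report, shown in Java for illustration (the real code builds a JavaScript filter string in CapacitySchedulerPage.java). The guard falls back to index 0 when no ':' is present:
{code}
class QueueFilterSketch {
  static String filterExpr(String q) {
    int sep = q.lastIndexOf(':');
    // unguarded: -1 + 2 == 1 would wrongly skip the first character of "b"
    int start = (sep >= 0) ? sep + 2 : 0;
    return "^" + q.substring(start) + "$";
  }
}
{code}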
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643666#comment-14643666 ] Hadoop QA commented on YARN-3904: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 20s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 7m 54s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 2s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 26s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 48s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 40m 14s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747454/YARN-3904-YARN-2928.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / df0ec47 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8687/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8687/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8687/console | This message was automatically generated. Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support to aggregation writers. Offline aggregation writers typically has less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3887) Support for changing Application priority during runtime
[ https://issues.apache.org/jira/browse/YARN-3887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643701#comment-14643701 ] Jian He commented on YARN-3887: --- Thanks Sunil! Some comments on the patch: - Do you plan to do the client-side changes as part of this JIRA? - RMAppUpdatePriorityEvent - RMApp may receive this event in many states other than the RUNNING state. In that case, the state machine will throw InvalidEventException. I think we do not need to send an event to RMApp; all it does is get the application submission context and set the priority. This can be done in ClientRMService. Similarly, the event to the state-store can be sent directly from ClientRMService. - CapacityScheduler#updateApplicationPriority does not need to be synchronized? Support for changing Application priority during runtime Key: YARN-3887 URL: https://issues.apache.org/jira/browse/YARN-3887 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3887.patch After YARN-2003, this adds support to change the priority of an application after submission. This ticket will handle the server-side implementation for the same. A new RMAppEvent will be created to handle this, and it will be common to all schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643736#comment-14643736 ] Bibin A Chundatt commented on YARN-3971: The test case failure is not related to this patch. Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery -- Key: YARN-3971 URL: https://issues.apache.org/jira/browse/YARN-3971 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 0003-YARN-3971.patch, 0004-YARN-3971.patch Steps to reproduce: # Create labels x,y # Delete labels x,y # Create labels x,y and add capacity scheduler xml for labels x and y too # Restart RM Both RMs will become standby, since the exception below is thrown from {{FileSystemNodeLabelsStore#recover}}: {code} 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label java.io.IOException: Cannot remove label=x, because queue=a1 is using this label. Please remove label on queue before remove the label at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) at org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) at org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
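A minimal sketch of the direction in the JIRA title: bypass the queue-usage check when labels are removed as part of store recovery. The flag and method split are illustrative, not necessarily how the attached patches do it.
{code}
import java.io.IOException;
import java.util.Collection;

abstract class LabelRecoverySketch {
  void removeFromClusterNodeLabels(Collection<String> labels,
      boolean duringRecovery) throws IOException {
    if (!duringRecovery) {
      // only enforce "label in use by a queue" for live admin requests
      checkRemoveFromClusterNodeLabelsOfQueue(labels);
    }
    internalRemoveLabels(labels);
  }

  abstract void checkRemoveFromClusterNodeLabelsOfQueue(
      Collection<String> labels) throws IOException;

  abstract void internalRemoveLabels(Collection<String> labels);
}
{code}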
[jira] [Commented] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
[ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643740#comment-14643740 ] Hadoop QA commented on YARN-3978: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 41s | Pre-patch trunk has 6 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 50s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 48s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:red}-1{color} | checkstyle | 2m 4s | The applied patch generated 2 new checkstyle issues (total was 39, now 41). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 6s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 52m 35s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 98m 24s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestChildQueueOrder | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations | | | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler | | | hadoop.yarn.server.resourcemanager.TestClientRMService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12747458/YARN-3978.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3e6fce9 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8688/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8688/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt https://builds.apache.org/job/PreCommit-YARN-Build/8688/artifact/patchprocess/diffcheckstylehadoop-yarn-server-common.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8688/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8688/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8688/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8688/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8688/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8688/console | This message was automatically generated. Configurably turn off the saving of container info in Generic AHS - Key: YARN-3978 URL: https://issues.apache.org/jira/browse/YARN-3978 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver, yarn Reporter: Eric Payne Assignee: Eric Payne Attachments: YARN-3978.001.patch Depending on how each application's metadata is stored, one week's worth of data stored in the Generic Application History Server's database can grow to be almost a terabyte of local disk space. In order to alleviate this, I suggest that there is a need for a configuration option to turn off saving of non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C reassigned YARN-3984:
--------------------------------
Assignee: Vrushali C

Rethink event column key issue
------------------------------
Key: YARN-3984
URL: https://issues.apache.org/jira/browse/YARN-3984
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Vrushali C
Fix For: YARN-2928

Currently, the event column key is event_id?info_key?timestamp, which is not friendly for fetching all the events of an entity and sorting them in chronological order. IMHO, timestamp?event_id?info_key may be a better key schema. I am opening this jira to continue the discussion, which started in the comments on YARN-3908.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3736) Add RMStateStore apis to store and load accepted reservations for failover
[ https://issues.apache.org/jira/browse/YARN-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643749#comment-14643749 ] Hadoop QA commented on YARN-3736:
---------------------------------
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 3s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. |
| {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 42s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 1m 42s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 1m 30s | The patch appears to introduce 4 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 52m 24s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | | 92m 9s | |
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12747459/YARN-3736.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3572ebd |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8689/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8689/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8689/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8689/console |

This message was automatically generated.

Add RMStateStore apis to store and load accepted reservations for failover
---------------------------------------------------------------------------
Key: YARN-3736
URL: https://issues.apache.org/jira/browse/YARN-3736
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Anubhav Dhoot
Attachments: YARN-3736.001.patch, YARN-3736.001.patch, YARN-3736.002.patch, YARN-3736.003.patch

We need to persist the current state of the plan, i.e. the accepted ReservationAllocations and the corresponding RLESparseResourceAllocations, to the RMStateStore so that we can recover them on RM failover. This involves making all the reservation system data structures protobuf friendly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
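[Editor's note: to show the shape of what the YARN-3736 description asks for, here is a hedged Java sketch of store/load entry points. The method names and the byte[]-of-serialized-protobuf representation are assumptions for illustration and may not match the committed API.]
{code}
// Hypothetical API sketch; names and types are illustrative only.
import java.util.HashMap;
import java.util.Map;

public abstract class ReservationStateStoreSketch {

  /** Persist one accepted reservation of a plan, keyed by reservation id. */
  public abstract void storeReservationAllocation(
      String planName, String reservationId, byte[] serializedAllocation)
      throws Exception;

  /** Remove a reservation, e.g. when it is deleted or expires. */
  public abstract void removeReservationAllocation(
      String planName, String reservationId) throws Exception;

  /**
   * On RM failover, load plan name -> (reservation id -> serialized
   * allocation) so the in-memory Plan and RLESparseResourceAllocation
   * structures can be rebuilt from the store.
   */
  public Map<String, Map<String, byte[]>> loadReservationState()
      throws Exception {
    return new HashMap<>();
  }
}
{code}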
[jira] [Commented] (YARN-3984) Rethink event column key issue
[ https://issues.apache.org/jira/browse/YARN-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643758#comment-14643758 ] Vrushali C commented on YARN-3984:
----------------------------------
I can take this up. Please feel free to reassign it, or if someone else wants it, let me know on the jira and we can redistribute it.

To add to my previous comment, let's take an example. Say the event id is KILLED and it occurs 3 times for whatever reason. Now let's say:
- at ts1, for key DIAGNOSTICS, the value is xyz
- at ts1, for key SOMETHING ELSE, the value is something
- at ts2, for key DIAGNOSTICS, the value is abc
- at ts3, for key DIAGNOSTICS, the value is pqr
- at ts3, for key SOMETHING ELSE, the value is something even more

where ts1 < ts2 < ts3, so ts3 is the most recent timestamp. Now, which of these queries is the most commonly required?
1. For this application, what is the diagnostic message for the most recent KILLED event? Or all of the diagnostics in the KILLED event?
2. For this application, what are the most recent key(s) in the KILLED event?
3. For this application, what are the keys (and values) that occurred between ts2 and ts3 for the KILLED event?

If we think #2 and #3 are the most commonly run queries, then we can go with timestamp before key. If we think #1 is the most commonly run query, then we can go with key before timestamp. If we choose timestamp before key, then we can never pull back the value for a given event and key without fetching all keys in that event for all timestamps. If we choose key before timestamp, we can't easily pull back the most recently occurring key within an event. In either case, we can't know which event was the most recent in the application. For example, the INITED event record will be stored before the KILLED event record, since "I" < "K" and HBase sorts keys lexicographically. So if we are interested in knowing which event itself occurred most recently, we need to fetch all events (along with event keys and timestamps), sort by timestamp, and then return the most recent event.

Rethink event column key issue
------------------------------
Key: YARN-3984
URL: https://issues.apache.org/jira/browse/YARN-3984
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Vrushali C
Fix For: YARN-2928

Currently, the event column key is event_id?info_key?timestamp, which is not friendly for fetching all the events of an entity and sorting them in chronological order. IMHO, timestamp?event_id?info_key may be a better key schema. I am opening this jira to continue the discussion, which started in the comments on YARN-3908.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
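[Editor's note: the trade-off discussed above can be replayed with a toy model. A TreeMap stands in for HBase's lexicographically sorted row keys, '!' stands in for the real separator (which the archive has garbled to '?'), and the inverted-timestamp encoding is an assumption about how "newest first" ordering is commonly achieved, not necessarily what YARN-3984 settled on.]
{code}
// Toy model of the two candidate row-key layouts; assumptions as noted above.
import java.util.TreeMap;

public class EventKeyOrderSketch {

  // Inverted, zero-padded timestamp: larger (newer) ts sorts earlier.
  static String ts(long t) {
    return String.format("%019d", Long.MAX_VALUE - t);
  }

  public static void main(String[] args) {
    // Layout A: event_id!info_key!timestamp
    TreeMap<String, String> keyFirst = new TreeMap<>();
    keyFirst.put("INITED!DIAGNOSTICS!" + ts(0), "started");
    keyFirst.put("KILLED!DIAGNOSTICS!" + ts(1), "xyz");
    keyFirst.put("KILLED!DIAGNOSTICS!" + ts(2), "abc");
    keyFirst.put("KILLED!DIAGNOSTICS!" + ts(3), "pqr");
    // Query #1 is a cheap prefix scan; the first hit is the newest value.
    System.out.println(
        keyFirst.subMap("KILLED!DIAGNOSTICS!", "KILLED!DIAGNOSTICS!~"));
    // But "which event happened last?" needs a full scan: INITED sorts
    // before KILLED purely because "I" < "K".
    System.out.println(keyFirst.firstKey());

    // Layout B: timestamp!event_id!info_key
    TreeMap<String, String> timeFirst = new TreeMap<>();
    timeFirst.put(ts(0) + "!INITED!DIAGNOSTICS", "started");
    timeFirst.put(ts(1) + "!KILLED!DIAGNOSTICS", "xyz");
    timeFirst.put(ts(2) + "!KILLED!DIAGNOSTICS", "abc");
    timeFirst.put(ts(3) + "!KILLED!DIAGNOSTICS", "pqr");
    // The newest event overall is simply the first row...
    System.out.println(timeFirst.firstEntry());
    // ...but fetching DIAGNOSTICS of KILLED must touch every timestamp.
  }
}
{code}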
[jira] [Commented] (YARN-3846) RM Web UI queue filter is not working
[ https://issues.apache.org/jira/browse/YARN-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643632#comment-14643632 ] Jian He commented on YARN-3846:
-------------------------------
Patch looks good to me, committing.

RM Web UI queue filter is not working
-------------------------------------
Key: YARN-3846
URL: https://issues.apache.org/jira/browse/YARN-3846
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 3.0.0, 2.8.0
Reporter: Mohammad Shahid Khan
Assignee: Mohammad Shahid Khan
Labels: PatchAvailable
Attachments: YARN-3846.patch, scheduler queue issue.png, scheduler queue positive behavior.png

Clicking on the root queue shows all applications, but clicking on a leaf queue does not filter the applications down to the clicked queue. The regular expression seems to be wrong:
{code}
q = '^' + q.substr(q.lastIndexOf(':') + 2) + '$';
{code}
For example:
1. Suppose the queue name is "b". The above expression will take the substring starting at index 1, because q.lastIndexOf(':') = -1 and -1 + 2 = 1, which is wrong; it should start at index 0.
2. If the queue name is "ab.x", it parses it to ".x", but it should be "x".

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
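[Editor's note: the snippet quoted in the report is JavaScript from the RM web UI; the Java sketch below replays the same indexing logic so the off-by-index behavior is easy to reproduce. The "fixed" variant (anchoring on the last '.' segment) is one plausible correction, not necessarily the committed patch.]
{code}
// Reproduction of the indexing bug; the fix shown is illustrative only.
public class QueueFilterRegexSketch {

  // Buggy logic from the description: when ':' is absent, lastIndexOf
  // returns -1 and -1 + 2 = 1, so the first character is silently dropped.
  static String buggy(String q) {
    return "^" + q.substring(q.lastIndexOf(':') + 2) + "$";
  }

  // One plausible fix: anchor on the last queue-path segment after '.'.
  static String fixed(String q) {
    return "^" + q.substring(q.lastIndexOf('.') + 1) + "$";
  }

  public static void main(String[] args) {
    System.out.println(buggy("b"));    // "^$"    -- queue name lost entirely
    System.out.println(buggy("ab.x")); // "^b.x$" -- leading char dropped
    System.out.println(fixed("b"));    // "^b$"
    System.out.println(fixed("ab.x")); // "^x$"
  }
}
{code}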
[jira] [Updated] (YARN-2918) Don't fail RM if queue's configured labels are not existed in cluster-node-labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2918:
-----------------------------
Labels: 2.6.1-candidate (was: )

Don't fail RM if queue's configured labels are not existed in cluster-node-labels
----------------------------------------------------------------------------------
Key: YARN-2918
URL: https://issues.apache.org/jira/browse/YARN-2918
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Rohith Sharma K S
Assignee: Wangda Tan
Labels: 2.6.1-candidate
Fix For: 2.8.0, 2.7.1
Attachments: YARN-2918.1.patch, YARN-2918.2.patch, YARN-2918.3.patch

Currently, if the admin sets up labels on queues ({{queue-path.accessible-node-labels = ...}}) and a label has not been added to the RM, queue initialization fails and the RM fails too:
{noformat}
2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
...
Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check.
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
{noformat}
This is not a good user experience; we should stop failing the RM so that the admin can configure queues/labels in the following steps:
- Configure queue (with label)
- Start RM
- Add labels to RM
- Submit applications

Now the admin has to:
- Configure queue (without label)
- Start RM
- Add labels to RM
- Refresh queue's config (with label)
- Submit applications

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
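[Editor's note: the change being argued for is small in shape: turn the hard failure in the label check into a warning so startup order stops mattering. A minimal Java sketch of that idea follows; only the old throwing behavior is taken from the stack trace above, while the relaxed variant and method shape are assumptions for illustration.]
{code}
// Illustration only: relaxes the check that the stack trace shows throwing.
import java.io.IOException;
import java.util.Set;

public class QueueLabelCheckSketch {

  static void checkQueueLabels(Set<String> queueLabels,
      Set<String> clusterNodeLabels) throws IOException {
    for (String label : queueLabels) {
      if (!clusterNodeLabels.contains(label)) {
        // Old behavior (SchedulerUtils.checkIfLabelInClusterNodeLabels):
        //   throw new IOException(
        //       "NodeLabelManager doesn't include label = " + label);
        // which aborts queue init and takes the whole RM down.
        System.err.println("WARN: queue label '" + label
            + "' is not in cluster-node-labels yet; continuing startup"
            + " so the admin can add it afterwards.");
      }
    }
  }
}
{code}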
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643727#comment-14643727 ] Hadoop QA commented on YARN-2884:
---------------------------------
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 21m 22s | Pre-patch trunk has 6 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. |
| {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 25s | The applied patch generated 2 new checkstyle issues (total was 237, now 238). |
| {color:red}-1{color} | checkstyle | 3m 11s | The applied patch generated 2 new checkstyle issues (total was 0, now 2). |
| {color:red}-1{color} | whitespace | 0m 2s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 6m 51s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests | 6m 19s | Tests passed in hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests | 52m 22s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | | 113m 9s | |
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12747453/YARN-2884-V3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3e6fce9 |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/diffcheckstylehadoop-yarn-server-common.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/whitespace.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8686/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8686/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8686/console |

This message was automatically generated.

Proxying all AM-RM communications
---------------------------------
Key: YARN-2884
URL: https://issues.apache.org/jira/browse/YARN-2884
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager, resourcemanager
Reporter: Carlo Curino
Assignee: Kishore Chaliparambil
Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, YARN-2884-V3.patch

We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start, the AM is forced (via tokens and configuration) to direct all its requests to a new service
[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643729#comment-14643729 ] Vrushali C commented on YARN-3816:
----------------------------------
Yes, the granularity of aggregation could drive the rollup granularity as well. Or we could use hourly timestamps for daily as well as weekly aggregations. We have the app-level timeseries in more detail, and in the UI design it should be possible to navigate from flow/user/queue to the individual apps in that aggregation to see detailed timeseries trends.

[Aggregation] App-level Aggregation for YARN system metrics
------------------------------------------------------------
Key: YARN-3816
URL: https://issues.apache.org/jira/browse/YARN-3816
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Junping Du
Assignee: Junping Du
Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch

We need application-level aggregation of Timeline data:
- To present end users aggregated state for each application, including resource (CPU, memory) consumption across all containers, number of containers launched/completed/failed, etc. We need this for apps while they are running as well as when they are done.
- Framework-specific metrics, e.g. HDFS_BYTES_READ, should also be aggregated to show framework-level detail.
- Aggregation at other levels (Flow/User/Queue) can be done more efficiently on top of application-level aggregations rather than on raw entity-level data, since far fewer rows need to be scanned (filtering out non-aggregated entities such as events and configurations).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
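[Editor's note: read concretely, app-level aggregation as described in the first bullet is a fold of per-container metrics into one record per application, which the flow/user/queue rollups then consume. A minimal Java sketch under that assumption, using plain maps in place of real timeline entities and illustrative metric names:]
{code}
// Toy aggregation: sum each metric across all containers of one app.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AppAggregationSketch {

  // Each input map holds one container's metrics, e.g. CPU, MEMORY,
  // HDFS_BYTES_READ; the output is the app-level record that higher-level
  // (flow/user/queue) rollups can scan instead of raw entity rows.
  public static Map<String, Long> aggregate(
      List<Map<String, Long>> containerMetrics) {
    Map<String, Long> appLevel = new HashMap<>();
    for (Map<String, Long> container : containerMetrics) {
      for (Map.Entry<String, Long> m : container.entrySet()) {
        appLevel.merge(m.getKey(), m.getValue(), Long::sum);
      }
    }
    return appLevel;
  }
}
{code}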