[jira] [Commented] (YARN-4029) Update LogAggregationStatus to store on finish
[ https://issues.apache.org/jira/browse/YARN-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696649#comment-14696649 ] Sunil G commented on YARN-4029:
---
Ah! By mistake I assigned it to myself. Reassigned to [~bibinchundatt]. Thank you.

Update LogAggregationStatus to store on finish
Key: YARN-4029
URL: https://issues.apache.org/jira/browse/YARN-4029
Project: Hadoop YARN
Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Attachments: Image.jpg

Currently the log aggregation status is not updated in the state store, so after an RM restart it shows NOT_START.

Steps to reproduce:
1. Submit a MapReduce application.
2. Wait for it to complete.
3. Once the application is completed, switch the RM.

The *Log Aggregation Status* changes from SUCCESS to NOT_START.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4029) Update LogAggregationStatus to store on finish
[ https://issues.apache.org/jira/browse/YARN-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4029:
---
Assignee: Bibin A Chundatt (was: Sunil G)

Update LogAggregationStatus to store on finish
Key: YARN-4029
URL: https://issues.apache.org/jira/browse/YARN-4029
Project: Hadoop YARN
Issue Type: Bug
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Attachments: Image.jpg

Currently the log aggregation status is not updated in the state store, so after an RM restart it shows NOT_START.

Steps to reproduce:
1. Submit a MapReduce application.
2. Wait for it to complete.
3. Once the application is completed, switch the RM.

The *Log Aggregation Status* changes from SUCCESS to NOT_START.
[jira] [Commented] (YARN-4014) Support user cli interface for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696594#comment-14696594 ] Sunil G commented on YARN-4014:
---
Hi [~rohithsharma],
Thank you. Overall the patch looks good. Some minor nits:
- In ApplicationClientProtocolPBServiceImpl, you may try something like the below:
{code}
} catch (YarnException | IOException e) {
  throw new ServiceException(e);
}
{code}
- In ClientRMService:
{code}
+    if (EnumSet.of(RMAppState.NEW, RMAppState.NEW_SAVING, RMAppState.FAILED,
+        RMAppState.FINAL_SAVING, RMAppState.FINISHING, RMAppState.FINISHED,
+        RMAppState.KILLED, RMAppState.KILLING, RMAppState.FAILED).contains(
+        application.getState()))
{code}
Could we have a lookup method for this rather than checking it directly? Maybe other APIs can use it too.
- In testUpdateApplicationPriorityRequest, could we also pass an invalid AppID and check that error condition?
- While printing the message:
{code}
+    pw.println(" -appId <Application ID>  ApplicationId can be used with any other");
+    pw.println("                          sub commands in future. Currently it is");
+    pw.println("                          used along only with -set-priority");
{code}
"-set-priority" can be changed to "-updatePriority".

Support user cli interface for Application Priority
Key: YARN-4014
URL: https://issues.apache.org/jira/browse/YARN-4014
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch

Track the changes for the user-RM client protocol, i.e. ApplicationClientProtocol, changes and discussions in this jira.
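[Editor's note] The two nits above (the multi-catch and a lookup method for the terminal-state check) can be sketched roughly as follows. This is only an illustration: `AppState` here is a stand-in for YARN's `RMAppState`, and `isAppInCompletedStates` is a hypothetical helper name, not anything in the patch.

```java
import java.util.EnumSet;

public class AppStateLookup {
    // Stand-in enum for RMAppState; the real class lives in the resourcemanager module.
    enum AppState {
        NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING,
        FINAL_SAVING, FINISHING, FINISHED, FAILED, KILLING, KILLED
    }

    // The lookup method suggested in the review: one shared definition of the
    // states in which a priority update is not allowed, reusable by other APIs.
    static final EnumSet<AppState> COMPLETED_APP_STATES = EnumSet.of(
        AppState.NEW, AppState.NEW_SAVING, AppState.FINAL_SAVING,
        AppState.FINISHING, AppState.FINISHED, AppState.FAILED,
        AppState.KILLING, AppState.KILLED);

    static boolean isAppInCompletedStates(AppState state) {
        return COMPLETED_APP_STATES.contains(state);
    }

    public static void main(String[] args) {
        // Multi-catch as suggested for ApplicationClientProtocolPBServiceImpl:
        // one catch clause covering both exception types.
        try {
            if (isAppInCompletedStates(AppState.RUNNING)) {
                throw new java.io.IOException("unexpected terminal state");
            }
        } catch (RuntimeException | java.io.IOException e) {
            // In the real service this would be: throw new ServiceException(e);
            throw new IllegalStateException(e);
        }
        System.out.println(isAppInCompletedStates(AppState.FINISHED));
    }
}
```

A single `EnumSet` constant also makes the duplicated `RMAppState.FAILED` in the quoted patch harder to reintroduce, since the set is defined once.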
[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697063#comment-14697063 ] Hudson commented on YARN-3987:
---
FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java

am container complete msg ack to NM once RM receive it
Key: YARN-3987
URL: https://issues.apache.org/jira/browse/YARN-3987
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: sandflee
Assignee: sandflee
Fix For: 2.8.0
Attachments: YARN-3987.001.patch, YARN-3987.002.patch

In our cluster we set max-am-attempts to a very large number, and unfortunately our AM crashed after launch, leaving too many completed (AM) containers in the NM. A completed container is removed from the NM and NMStateStore only once its completion is passed to the AM; if the AM could not be launched, the completed AM container is never cleaned up and may eat up NM heap memory.
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697043#comment-14697043 ] Hudson commented on YARN-4047:
---
FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
YARN-4047. ClientRMService getApplications has high scheduler lock contention. Contributed by Jason Lowe (jianhe: rev 7a445fcfabcf9c6aae219051f65d3f6cb8feb87c)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java

ClientRMService getApplications has high scheduler lock contention
Key: YARN-4047
URL: https://issues.apache.org/jira/browse/YARN-4047
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
Labels: 2.6.1-candidate
Fix For: 2.7.2
Attachments: YARN-4047.001.patch

The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which grabs the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc.
[jira] [Commented] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697038#comment-14697038 ] Hudson commented on YARN-4005:
---
FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
YARN-4005. Completed container whose app is finished is possibly not removed from NMStateStore. Contributed by Jun Gong (jianhe: rev 38aed1a94ed7b6da62e2445b5610bc02b1cddeeb)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java

Completed container whose app is finished is not removed from NMStateStore
Key: YARN-4005
URL: https://issues.apache.org/jira/browse/YARN-4005
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jun Gong
Assignee: Jun Gong
Fix For: 2.8.0
Attachments: YARN-4005.01.patch

If a container is completed and its corresponding app is finished, the NM only removes it from its context and does not add it to 'recentlyStoppedContainers' when calling 'getContainerStatuses'. The NM will then never remove it from the NMStateStore.
[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697040#comment-14697040 ] Hudson commented on YARN-3987:
---
FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/])
YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java

am container complete msg ack to NM once RM receive it
Key: YARN-3987
URL: https://issues.apache.org/jira/browse/YARN-3987
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: sandflee
Assignee: sandflee
Fix For: 2.8.0
Attachments: YARN-3987.001.patch, YARN-3987.002.patch

In our cluster we set max-am-attempts to a very large number, and unfortunately our AM crashed after launch, leaving too many completed (AM) containers in the NM. A completed container is removed from the NM and NMStateStore only once its completion is passed to the AM; if the AM could not be launched, the completed AM container is never cleaned up and may eat up NM heap memory.
[jira] [Commented] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697061#comment-14697061 ] Hudson commented on YARN-4005:
---
FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
YARN-4005. Completed container whose app is finished is possibly not removed from NMStateStore. Contributed by Jun Gong (jianhe: rev 38aed1a94ed7b6da62e2445b5610bc02b1cddeeb)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java

Completed container whose app is finished is not removed from NMStateStore
Key: YARN-4005
URL: https://issues.apache.org/jira/browse/YARN-4005
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jun Gong
Assignee: Jun Gong
Fix For: 2.8.0
Attachments: YARN-4005.01.patch

If a container is completed and its corresponding app is finished, the NM only removes it from its context and does not add it to 'recentlyStoppedContainers' when calling 'getContainerStatuses'. The NM will then never remove it from the NMStateStore.
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697066#comment-14697066 ] Hudson commented on YARN-4047:
---
FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/])
YARN-4047. ClientRMService getApplications has high scheduler lock contention. Contributed by Jason Lowe (jianhe: rev 7a445fcfabcf9c6aae219051f65d3f6cb8feb87c)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java

ClientRMService getApplications has high scheduler lock contention
Key: YARN-4047
URL: https://issues.apache.org/jira/browse/YARN-4047
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
Labels: 2.6.1-candidate
Fix For: 2.7.2
Attachments: YARN-4047.001.patch

The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which grabs the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc.
[jira] [Commented] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697118#comment-14697118 ] Hudson commented on YARN-4005:
---
FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
YARN-4005. Completed container whose app is finished is possibly not removed from NMStateStore. Contributed by Jun Gong (jianhe: rev 38aed1a94ed7b6da62e2445b5610bc02b1cddeeb)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java

Completed container whose app is finished is not removed from NMStateStore
Key: YARN-4005
URL: https://issues.apache.org/jira/browse/YARN-4005
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jun Gong
Assignee: Jun Gong
Fix For: 2.8.0
Attachments: YARN-4005.01.patch

If a container is completed and its corresponding app is finished, the NM only removes it from its context and does not add it to 'recentlyStoppedContainers' when calling 'getContainerStatuses'. The NM will then never remove it from the NMStateStore.
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697123#comment-14697123 ] Hudson commented on YARN-4047:
---
FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
YARN-4047. ClientRMService getApplications has high scheduler lock contention. Contributed by Jason Lowe (jianhe: rev 7a445fcfabcf9c6aae219051f65d3f6cb8feb87c)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* hadoop-yarn-project/CHANGES.txt

ClientRMService getApplications has high scheduler lock contention
Key: YARN-4047
URL: https://issues.apache.org/jira/browse/YARN-4047
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
Labels: 2.6.1-candidate
Fix For: 2.7.2
Attachments: YARN-4047.001.patch

The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which grabs the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc.
[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697120#comment-14697120 ] Hudson commented on YARN-3987:
---
FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/])
YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java

am container complete msg ack to NM once RM receive it
Key: YARN-3987
URL: https://issues.apache.org/jira/browse/YARN-3987
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: sandflee
Assignee: sandflee
Fix For: 2.8.0
Attachments: YARN-3987.001.patch, YARN-3987.002.patch

In our cluster we set max-am-attempts to a very large number, and unfortunately our AM crashed after launch, leaving too many completed (AM) containers in the NM. A completed container is removed from the NM and NMStateStore only once its completion is passed to the AM; if the AM could not be launched, the completed AM container is never cleaned up and may eat up NM heap memory.
[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697142#comment-14697142 ] Hudson commented on YARN-3987:
---
FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/CHANGES.txt

am container complete msg ack to NM once RM receive it
Key: YARN-3987
URL: https://issues.apache.org/jira/browse/YARN-3987
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: sandflee
Assignee: sandflee
Fix For: 2.8.0
Attachments: YARN-3987.001.patch, YARN-3987.002.patch

In our cluster we set max-am-attempts to a very large number, and unfortunately our AM crashed after launch, leaving too many completed (AM) containers in the NM. A completed container is removed from the NM and NMStateStore only once its completion is passed to the AM; if the AM could not be launched, the completed AM container is never cleaned up and may eat up NM heap memory.
[jira] [Commented] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697140#comment-14697140 ] Hudson commented on YARN-4005:
---
FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
YARN-4005. Completed container whose app is finished is possibly not removed from NMStateStore. Contributed by Jun Gong (jianhe: rev 38aed1a94ed7b6da62e2445b5610bc02b1cddeeb)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* hadoop-yarn-project/CHANGES.txt

Completed container whose app is finished is not removed from NMStateStore
Key: YARN-4005
URL: https://issues.apache.org/jira/browse/YARN-4005
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jun Gong
Assignee: Jun Gong
Fix For: 2.8.0
Attachments: YARN-4005.01.patch

If a container is completed and its corresponding app is finished, the NM only removes it from its context and does not add it to 'recentlyStoppedContainers' when calling 'getContainerStatuses'. The NM will then never remove it from the NMStateStore.
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697145#comment-14697145 ] Hudson commented on YARN-4047:
---
FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/])
YARN-4047. ClientRMService getApplications has high scheduler lock contention. Contributed by Jason Lowe (jianhe: rev 7a445fcfabcf9c6aae219051f65d3f6cb8feb87c)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* hadoop-yarn-project/CHANGES.txt

ClientRMService getApplications has high scheduler lock contention
Key: YARN-4047
URL: https://issues.apache.org/jira/browse/YARN-4047
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
Labels: 2.6.1-candidate
Fix For: 2.7.2
Attachments: YARN-4047.001.patch

The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which grabs the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc.
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697314#comment-14697314 ] Ming Ma commented on YARN-221:
---
The unit test failures aren't related; the tests pass on the local machine.

Another thing Xuan and I discussed is how other frameworks on YARN, such as MR and Tez, can use this feature; for example, whether they need to make config and/or code changes to allow framework applications to specify the policy on a per-application basis. There are several approaches:
* Have MR define its own configurations for these policies. Make a code change in YarnRunner to retrieve these configurations and set the values on the ASC. That means Tez needs to do the same thing.
* Define some common YARN configurations such as yarn.logaggregation.policy.class. YarnRunner still needs to retrieve these configurations and set the values on the ASC, but at least MR and Tez can share the same configuration names.
* Define some common YARN configurations such as yarn.logaggregation.policy.class, and have YarnClientImpl take care of fixing up the ASC based on the configurations. That way, no code change is required at the MR or Tez layer.

Eventually, we prefer to go with the first approach, which is what other existing MR properties use. If we want to define some common YARN properties used by different YARN applications, we can have a separate jira for it.

NM should provide a way for AM to tell it not to aggregate logs.
Key: YARN-221
URL: https://issues.apache.org/jira/browse/YARN-221
Project: Hadoop YARN
Issue Type: Sub-task
Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch

The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow the NM to not aggregate logs in some cases and avoid connecting to the NN at all.
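[Editor's note] All three approaches in the comment above boil down to reading a policy class name from configuration and setting it on the ApplicationSubmissionContext (ASC). A minimal sketch of that fixup, under stated assumptions: `SubmissionContext` is a stand-in for the real ASC, `applyLogAggregationPolicy` and the default policy name are illustrative, and only the `yarn.logaggregation.policy.class` key comes from the comment itself.

```java
import java.util.Properties;

// Hypothetical stand-in for ApplicationSubmissionContext: holds the
// log-aggregation policy class name chosen for one application.
class SubmissionContext {
    String logAggregationPolicyClass; // null means "not set by the framework"
}

public class PolicyFixup {
    // Common YARN configuration key proposed in the comment.
    static final String POLICY_KEY = "yarn.logaggregation.policy.class";

    // The client-side fixup (approach 3): fill in the policy from common
    // configuration only when the framework did not set one, so MR and Tez
    // would need no code changes of their own.
    static void applyLogAggregationPolicy(SubmissionContext ctx, Properties conf) {
        if (ctx.logAggregationPolicyClass == null) {
            ctx.logAggregationPolicyClass =
                conf.getProperty(POLICY_KEY, "AllContainerLogAggregationPolicy");
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(POLICY_KEY, "FailedContainerLogAggregationPolicy");

        SubmissionContext ctx = new SubmissionContext();
        applyLogAggregationPolicy(ctx, conf);
        System.out.println(ctx.logAggregationPolicyClass);
    }
}
```

Approach 1 differs only in where this logic lives (YarnRunner, with MR-specific key names) rather than in its shape.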
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697369#comment-14697369 ] Karthik Kambatla commented on YARN-1680:
---
Any updates here? We would like to get this or YARN-3446 in soon.

availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
Key: YARN-1680
URL: https://issues.apache.org/jira/browse/YARN-1680
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Affects Versions: 2.2.0, 2.3.0
Environment: SuSE 11 SP2 + Hadoop-2.3
Reporter: Rohith Sharma K S
Assignee: Tan, Wangda
Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch

There are 4 NodeManagers with 8GB each; total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks got killed), so the MRAppMaster blacklisted it. All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers because the headroom used in the reducer preemption calculation includes the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns still counts that free memory as part of the cluster).
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697449#comment-14697449 ] Varun Saxena commented on YARN-4053:
---
Also, for floating point metrics, the query can be in integral form, which can create issues too. We should clearly document that the query should also be in decimal representation for such metrics. That is, checking for a condition like {{m1 > 40}} should mean the query from the client has the filter {{m1 > 40.0}} in the REST API, so that it is interpreted as a floating point number.

Change the way metric values are stored in HBase Storage
Key: YARN-4053
URL: https://issues.apache.org/jira/browse/YARN-4053
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena

Currently the HBase implementation uses GenericObjectMapper to convert and store values in the backend HBase storage. This converts everything into a string representation (ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how we are going to encode and decode metric values and store them in HBase.
[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4024:
---
Issue Type: Improvement (was: Bug)

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
Key: YARN-4024
URL: https://issues.apache.org/jira/browse/YARN-4024
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Wangda Tan
Assignee: Hong Zhiguo

Currently, the YARN RM NodesListManager resolves the IP address every time a node heartbeats. When the DNS server becomes slow, NM heartbeats are blocked and cannot make progress.
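[Editor's note] A common way to avoid re-resolving on every heartbeat is to memoize the resolution result per hostname. The sketch below is only an illustration of that idea, not the patch attached to this JIRA; the resolver function is injected so the cache logic can be shown without real DNS lookups.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Memoizing resolver sketch: a slow DNS server is consulted at most once
// per host rather than on every NM heartbeat.
public class CachingResolver {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> resolver;

    public CachingResolver(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    // computeIfAbsent calls the (possibly slow) resolver only on a cache miss.
    public String resolve(String hostname) {
        return cache.computeIfAbsent(hostname, resolver);
    }
}
```

A production version would also need invalidation, e.g. when a node re-registers with a new address or when entries age out; the sketch omits that.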
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697531#comment-14697531 ] Vrushali C commented on YARN-4025:
---
Hmm, yes, I think some more comments there might help (I should have included them in the earlier patch).

Deal with byte representations of Longs in writer code
Key: YARN-4025
URL: https://issues.apache.org/jira/browse/YARN-4025
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Vrushali C
Assignee: Sangjin Lee
Attachments: YARN-4025-YARN-2928.001.patch, YARN-4025-YARN-2928.002.patch

Timestamps are being stored as Longs in HBase by the HBaseTimelineWriterImpl code. There are some places in the code with conversions from Long to byte[] to String for easier argument passing between function calls; these values then end up being converted back to byte[] while storing in HBase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some API changes (the store function) as well as a few more function calls: a getColumnQualifier that accepts a pre-encoded byte array, in addition to the existing API which accepts a String, and a ColumnHelper that returns a byte[] column name instead of a String one. Filing this jira to track these changes.
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697549#comment-14697549 ] Vrushali C commented on YARN-3904:
---
A couple more things that came to mind. We need not change the patch for just these, but I wanted to say what's on my mind.
- Do we want to provide a dropTable API? I think we should not. In a production situation, this can be a costly mistake if someone is testing their code on the cluster. A drop table should be a very manual command, so that one is aware that they are running it.
- Are the '?' and ',' special characters in this line? If so, we don't have to change this right now, but maybe next time this code is being looked at we could make them into a constant:
{code}
String sql = "UPSERT INTO " + info.getTableName() + " ("
    + StringUtils.join(info.getPrimaryKeyList(), ",")
    + ", created_time, modified_time, metric_names) VALUES ("
    + StringUtils.repeat("?,", info.getPrimaryKeyList().length)
    + "?, ?, ?)";
{code}
The patch looks good overall. Thanks [~gtCarrera9].

Refactor timelineservice.storage to add support to online and offline aggregation writers
Key: YARN-3904
URL: https://issues.apache.org/jira/browse/YARN-3904
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu
Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, YARN-3904-YARN-2928.008.patch

Now that we have finished the design for time-based aggregation, we can adapt our existing Phoenix storage into the storage for the aggregated data. In this JIRA, I'm proposing to refactor the writers to add support for aggregation writers. Offline aggregation writers typically have less contextual information; we can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces.
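[Editor's note] The reviewer's suggestion above, hoisting the '?' and ',' placeholder text into named constants, can be sketched as follows. This uses only the JDK (`String.join`/`String.repeat` instead of commons-lang `StringUtils`), and the table and column names are illustrative, not the actual Phoenix schema.

```java
import java.util.List;

public class UpsertSqlBuilder {
    // JDBC positional parameter marker and separator, kept as named constants
    // per the review comment instead of being inlined in the SQL string.
    static final String PLACEHOLDER = "?";
    static final String SEPARATOR = ",";

    static String buildUpsertSql(String tableName, List<String> primaryKeys) {
        String keyColumns = String.join(SEPARATOR, primaryKeys);
        // One "?" per primary-key column plus three for the fixed columns.
        String placeholders =
            (PLACEHOLDER + SEPARATOR).repeat(primaryKeys.size())
            + PLACEHOLDER + ", " + PLACEHOLDER + ", " + PLACEHOLDER;
        return "UPSERT INTO " + tableName + " (" + keyColumns
            + ", created_time, modified_time, metric_names) VALUES ("
            + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(
            buildUpsertSql("flow_activity", List.of("cluster", "user", "flow")));
    }
}
```

The generated string would then be passed to `Connection.prepareStatement`, where '?' is the positional-parameter marker, which is why treating it as a named constant rather than a scattered literal helps readability.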
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697556#comment-14697556 ] Varun Saxena commented on YARN-4053:
---
bq. What kind of metrics do you have in mind that will have floating point numbers?
There was some plan for reporting cluster-level metrics in the future too, a few of which would be floating point as well; refer to the JSON in YARN-3881. I also remember some discussion during the aggregation design regarding storing averages. Are we planning to calculate them on the fly instead?
Moreover, TimelineMetric stores the metric value as a {{java.lang.Number}}. This means we are saying a metric can store a floating point value as well. As we have no control over systems outside YARN (say, Tez), if they use ATS and publish a metric of floating point type, I guess we should be able to handle it. Thoughts?
If it has been decided that metrics can only be integral values, then it's fine; we won't have to take care of it. Let me know.
Also, another key point we need to decide is whether we only support values up to signed longs (8 bytes).

Change the way metric values are stored in HBase Storage
Key: YARN-4053
URL: https://issues.apache.org/jira/browse/YARN-4053
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena

Currently the HBase implementation uses GenericObjectMapper to convert and store values in the backend HBase storage. This converts everything into a string representation (ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how we are going to encode and decode metric values and store them in HBase.
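[Editor's note] The core of why a string encoding "does not quite serve our use case for metrics" can be shown with plain Java: HBase compares stored bytes lexicographically as unsigned values, and a decimal-string encoding disagrees with numeric order, while a fixed-width big-endian encoding agrees with it (for non-negative values). This is only a sketch of the problem, not the encoding the JIRA eventually adopts.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class MetricEncoding {
    // UTF-8 decimal-string encoding (what a GenericObjectMapper-style string
    // representation effectively produces for a number).
    static byte[] asString(long v) {
        return Long.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    // Fixed-width big-endian encoding: for non-negative longs, unsigned
    // lexicographic byte order matches numeric order.
    static byte[] asFixedWidth(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Unsigned lexicographic comparison, the order HBase applies to bytes.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // As strings, "9" sorts after "40" because '9' > '4' byte-wise.
        System.out.println(compareBytes(asString(9), asString(40)) > 0);
        // As fixed-width bytes, 9 correctly sorts before 40.
        System.out.println(compareBytes(asFixedWidth(9), asFixedWidth(40)) < 0);
    }
}
```

The open questions in the comment (floating point values, values wider than a signed long) are about extending such a fixed-width scheme beyond non-negative longs, which this sketch does not attempt.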
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697534#comment-14697534 ] Vrushali C commented on YARN-4025: -- I changed it from '?' to '='. Sangjin was also wondering whether we should change it or not (read the earlier comments in the jira). I think it might be good to change it now, since '?' is a wildcard character, and using a non-wildcard character makes reading easier while testing and helps the reader code as well. It's a very small change, so I thought another jira for this would be overkill. Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Sangjin Lee Attachments: YARN-4025-YARN-2928.001.patch, YARN-4025-YARN-2928.002.patch Timestamps are being stored as Longs in hbase by the HBaseTimelineWriterImpl code. There seem to be some places in the code where there are conversions between Long to byte[] to String for easier argument passing between function calls. Then these values end up being converted back to byte[] while storing in hbase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some api changes (store function) as well in adding a few more function calls like getColumnQualifier which accepts a pre-encoded byte array. It will be in addition to the existing api which accepts a String and the ColumnHelper to return a byte[] column name instead of a String one. Filing jira to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697571#comment-14697571 ] Varun Saxena commented on YARN-4053: Tez may not be publishing any floating point metric as of now. I am not too sure about everything they publish, so probably there is no use case as of now. But if we do not support floating point numbers, then we should clearly document that we only support integral values, and do the conversion in the writer if any floating point value comes in. Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697537#comment-14697537 ] Vrushali C commented on YARN-3904: -- A very minor comment.. I think there is a typo in PHEONIX_OFFLINE_STORAGE_CONN_STR_DEFAULT variable name in YarnConfiguration.java Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, YARN-3904-YARN-2928.008.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support to aggregation writers. Offline aggregation writers typically has less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697545#comment-14697545 ] Li Lu commented on YARN-4025: - Oh sorry I missed that line... That looks fine. Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Sangjin Lee Attachments: YARN-4025-YARN-2928.001.patch, YARN-4025-YARN-2928.002.patch Timestamps are being stored as Longs in hbase by the HBaseTimelineWriterImpl code. There seem to be some places in the code where there are conversions between Long to byte[] to String for easier argument passing between function calls. Then these values end up being converted back to byte[] while storing in hbase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some api changes (store function) as well in adding a few more function calls like getColumnQualifier which accepts a pre-encoded byte array. It will be in addition to the existing api which accepts a String and the ColumnHelper to return a byte[] column name instead of a String one. Filing jira to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697654#comment-14697654 ] Varun Saxena commented on YARN-3862: [~gtCarrera9], these 2 JIRAs were raised separately to address the following areas : # Enhance the already supported filters (YARN-3863) to filter out rows of data, by adding support for OR in addition to AND, and relational ops for metrics. The scope of this JIRA is pretty clear. # Restrict the amount of data retrieved (from columns) in this JIRA. Here, we actually wanted to have a discussion on what all we need to support: regex, prefix match, etc. Also whether we want to retrieve metrics by time windows as well. I am open to realigning these JIRAs and distributing the work along the lines of the workflow you mentioned above. My only concern with settling on a filter object model first, though, is that we may take a lot of time deciding how it should cover all the scenarios, because support for additional filters may come up during further discussion. Let's do whatever the consensus is. Decide which contents to retrieve and send back in response in TimelineReader - Key: YARN-3862 URL: https://issues.apache.org/jira/browse/YARN-3862 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3862-YARN-2928.wip.01.patch Currently, we will retrieve all the contents of the field if that field is specified in the query API. In case of configs and metrics, this can become a lot of data even though the user doesn't need it. So we need to provide a way to query only a set of configs or metrics. As a comma separated list of configs/metrics to be returned will be quite cumbersome to specify, we have to support either of the following options : # Prefix match # Regex # Group the configs/metrics and query that group. 
We also need a facility to specify a metric time window to return metrics in that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697583#comment-14697583 ] Vrushali C commented on YARN-4053: -- Hmm, good points. I think all metrics should be stored as the same type; otherwise we have to deal with knowing which metric is of which type, and would need to store metadata to know how to read it back. Storing it as an ASCII value is not good; we need to be able to query for things like less than, greater than, etc. My vote is for going with Longs for all metrics right now, unless there is a very strong use case where only decimals will do. We truncate (cast down) decimals to long if we receive any, so 99.9 becomes 99. I realize this is restrictive, but my thinking is that instead of trying to do everything for this current ATS release, let's go with Longs and see if we really need decimal precision. If we do, we can revisit and modify to accept more data types. cc [~jrottinghuis] Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
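The cast-down semantics Vrushali describes match Java's narrowing conversion, which rounds toward zero; a tiny illustrative sketch, not taken from any patch in this thread:

```java
public class TruncateDemo {
    // Casting a double down to long drops the fractional part
    // (rounding toward zero), so 99.9 stores as 99 and -99.9 as -99.
    static long truncate(double metricValue) {
        return (long) metricValue;
    }
}
```

One consequence worth noting: truncation (unlike rounding) biases aggregated sums downward when many small fractional values are dropped.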
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697663#comment-14697663 ] Varun Saxena commented on YARN-3862: bq. My feeling is that the concept of timeline filter may become a part of our object model, so that client users can easily communicate? Do we want to expose it to the client? The suggestion sounds good. That wasn't the plan, but if everyone agrees, let's have it that way. bq. are we treating our timeline filters as pure-data objects (models) Yes, as of now I am treating them as pure data objects. That is why, instead of using polymorphism and converting the filter to an HBase Filter via a conversion method in the filter class(es), I kept the conversion in a util class. The intention was to decouple filters from the storage implementation. bq. is it easy, or possible, for us to implement a paging filter? Will look into it. Decide which contents to retrieve and send back in response in TimelineReader - Key: YARN-3862 URL: https://issues.apache.org/jira/browse/YARN-3862 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3862-YARN-2928.wip.01.patch Currently, we will retrieve all the contents of the field if that field is specified in the query API. In case of configs and metrics, this can become a lot of data even though the user doesn't need it. So we need to provide a way to query only a set of configs or metrics. As a comma separated list of configs/metrics to be returned will be quite cumbersome to specify, we have to support either of the following options : # Prefix match # Regex # Group the configs/metrics and query that group. We also need a facility to specify a metric time window to return metrics in that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
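A minimal sketch of what such pure-data filter objects could look like (class, enum, and field names here are hypothetical illustrations, not taken from the WIP patch): the model carries no HBase types, so a storage-specific utility can later walk the tree and emit HBase Filter objects.

```java
import java.util.Arrays;
import java.util.List;

// Marker base type for all pure-data timeline filters.
abstract class TimelineFilter { }

// A relational comparison on a single metric, e.g. "m1 > 4".
class TimelineCompareFilter extends TimelineFilter {
    enum Op { LESS_THAN, EQUAL, GREATER_THAN }
    final String metricId;
    final Op op;
    final long value;
    TimelineCompareFilter(String metricId, Op op, long value) {
        this.metricId = metricId;
        this.op = op;
        this.value = value;
    }
}

// A boolean combination of filters; OR is the YARN-3863 enhancement
// on top of the existing AND semantics.
class TimelineFilterList extends TimelineFilter {
    enum Operator { AND, OR }
    final Operator operator;
    final List<TimelineFilter> filters;
    TimelineFilterList(Operator operator, TimelineFilter... filters) {
        this.operator = operator;
        this.filters = Arrays.asList(filters);
    }
}
```

Keeping conversion logic out of these classes, in a separate converter utility, is what preserves the decoupling from the storage implementation discussed here.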
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697667#comment-14697667 ] Varun Saxena commented on YARN-3862: BTW, I did not upload a WIP patch for YARN-3863 due to the issue raised in YARN-4053. Decide which contents to retrieve and send back in response in TimelineReader - Key: YARN-3862 URL: https://issues.apache.org/jira/browse/YARN-3862 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3862-YARN-2928.wip.01.patch Currently, we will retrieve all the contents of the field if that field is specified in the query API. In case of configs and metrics, this can become a lot of data even though the user doesn't need it. So we need to provide a way to query only a set of configs or metrics. As a comma separated list of configs/metrics to be returned will be quite cumbersome to specify, we have to support either of the following options : # Prefix match # Regex # Group the configs/metrics and query that group. We also need a facility to specify a metric time window to return metrics in that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697341#comment-14697341 ] Wangda Tan commented on YARN-4024: -- [~zhiguohong], the DNS cache is a global parameter for a JVM, correct? IMHO, we shouldn't use the global parameter, because the RM may need to get the latest IP address from DNS for other purposes. For example, the RM needs to get the latest address when NMs are registering (and also reconnecting), but it may not need it while NMs are running. Thoughts? YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Hong Zhiguo Currently, the YARN RM NodesListManager resolves the IP address every time a node heartbeats. When the DNS server becomes slow, NM heartbeats will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
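For context, and assuming the standard JDK resolver, the JVM-global knob in question is the {{networkaddress.cache.ttl}} security property: one process-wide value governs every InetAddress lookup, which is exactly why tuning it for heartbeats would also affect registration-time lookups.

```java
import java.security.Security;

public class DnsCacheTtlDemo {
    public static void main(String[] args) {
        // networkaddress.cache.ttl is a java.security property, set once
        // per JVM: there is no per-call or per-subsystem override, so a
        // longer TTL for NM heartbeat lookups would equally delay the RM
        // seeing a new IP at NM (re)registration.
        Security.setProperty("networkaddress.cache.ttl", "30"); // seconds
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```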
[jira] [Updated] (YARN-4029) Update LogAggregationStatus to store on finish
[ https://issues.apache.org/jira/browse/YARN-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4029: --- Attachment: 0001-YARN-4029.patch Attaching initial patch. Please do review. Update LogAggregationStatus to store on finish -- Key: YARN-4029 URL: https://issues.apache.org/jira/browse/YARN-4029 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Attachments: 0001-YARN-4029.patch, Image.jpg Currently the log aggregation status is not getting updated to Store. When RM is restarted will show NOT_START. Steps to reproduce 1.Submit mapreduce application 2.Wait for completion 3.Once application is completed switch RM *Log Aggregation Status* are changing *Log Aggregation Status* from SUCCESS to NOT_START -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4053: --- Description: Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. was: Currently HBase implementation uses GenericObjectMapper is used to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697443#comment-14697443 ] Varun Saxena commented on YARN-4053: Storing metric values (which are numbers) as strings is fine if we want to check them for equality. But we have to support all relational operations for metrics, and that is where the string representation doesn't work. This is because HBase filters currently use lexicographic comparison. This means that with the current mechanism for storing metric values, a value of 4000 will be judged as smaller than 60. Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
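The mis-ordering is easy to reproduce with plain String comparison, which mirrors the byte-wise lexicographic comparison HBase's default binary comparator performs:

```java
public class LexOrderDemo {
    public static void main(String[] args) {
        // Byte-wise comparison decides at the first differing character:
        // '4' < '6', so the string "4000" sorts before "60" even though
        // 4000 > 60 numerically -- the problem described above.
        System.out.println("4000".compareTo("60") < 0); // true: "4000" sorts first
        System.out.println(4000L < 60L);                // false numerically
    }
}
```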
[jira] [Created] (YARN-4053) Change the way metric values are stored in HBase Storage
Varun Saxena created YARN-4053: -- Summary: Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697500#comment-14697500 ] Vrushali C commented on YARN-4053: -- I think metric values should be stored (and read back) as Longs. What kind of metrics do you have in mind that will have floating point numbers? Any percentages that we want to store? I don't think we really need that level of precision. Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697445#comment-14697445 ] Varun Saxena commented on YARN-4053: So to resolve this we need some other way of storing metric values. The options are as follows: # Keep the current way of storing metric values, and write a custom filter to match the values. But this would need the new filter to be deployed on all region servers, so this solution may not be feasible. If we do not want to do this, then for lexicographic comparison to work, the sizes of the byte arrays being compared should be equal. # Store values as primitive types, i.e., a long as 8 bytes, an integer as 4 bytes, and so on. But this can create problems in lexicographic comparison too. Say metric m1 is stored as a long, but a query to the reader is of the form {{m1 > 4}}. As 4 will be interpreted as an Integer, we would try to compare 4 bytes against 8 bytes. The solution is to store every integral value as a long (8 bytes) and every floating point value as a double; the same approach can be used while matching on the reader side. # But the above solution may not work if we want to support BigInteger and BigDecimal values (i.e., numerical values wider than 8 bytes). Although 8 bytes should be enough, aggregated values may exceed 8 bytes. In this case, we can decide up to how many bytes we need to support; 16 bytes, or for that matter even 12 bytes, should be more than enough for all realistic scenarios. While encoding, we can pad with zeroes in front if the number is shorter than that width. # Another option is to continue supporting the string representation and restrict the maximum number of digits we support before and after the decimal point, say 30 digits before the decimal point and 3 after. We can pad the remaining bytes with zeroes while storing so that comparison can be done. 
Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
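Option 2's fixed-width idea can be sketched with plain JDK calls. One subtlety: big-endian two's-complement bytes only sort correctly under unsigned lexicographic comparison if the sign bit is flipped first, so that negative values order before positive ones. This is an illustrative sketch of the technique, not the implementation that was committed:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class FixedWidthMetricEncoding {
    // Widen every integral metric value to 8 big-endian bytes, flipping
    // the sign bit so that unsigned lexicographic byte comparison (the
    // comparison HBase filters perform) agrees with numeric order,
    // negatives included.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES)
                         .putLong(value ^ Long.MIN_VALUE)
                         .array();
    }

    // Lexicographic unsigned comparison over the encoded bytes.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

Widening reader-side literals the same way (a query's 4 becomes 8 bytes before comparison) also removes the 4-byte-vs-8-byte mismatch described in option 2.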
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697411#comment-14697411 ] Wangda Tan commented on YARN-1680: -- [~kasha], sorry, I don't have a chance to take this, so I am unassigning myself. I suggest we finish MAPREDUCE-6302 first (the approach of MAPREDUCE-6302 looks good to me) to resolve such deadlock issues. The availableResources calculation can be improved after that. Thoughts? availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Sharma K S Assignee: Tan, Wangda Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) becomes unstable (3 maps got killed), and the MRAppMaster blacklists the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, the headroom includes the blacklisted nodes' memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes but returns an availableResource that includes the cluster's free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1680: - Assignee: (was: Tan, Wangda) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Sharma K S Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) becomes unstable (3 maps got killed), and the MRAppMaster blacklists the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, the headroom includes the blacklisted nodes' memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes but returns an availableResource that includes the cluster's free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697454#comment-14697454 ] Varun Saxena commented on YARN-4053: cc [~sjlee0], [~djp], [~zjshen], [~vinodkv]. Thoughts? I will implement one of the options above depending on whatever the consensus is. Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697816#comment-14697816 ] Vrushali C commented on YARN-3904: -- +1, yes we can move ahead. I am quite curious, though: how is the accessibility being restricted? The method has no specifier, so that means it is package-level visible, no? Also, the @Private and @VisibleForTesting annotations are only annotations; they don't really affect the private/public accessibility of the function. Or am I mistaken? That said, let's go ahead with the patch; my question is only for discussion purposes. Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, YARN-3904-YARN-2928.008.patch, YARN-3904-YARN-2928.009.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support to aggregation writers. Offline aggregation writers typically has less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697840#comment-14697840 ] Li Lu commented on YARN-3862: - bq. That is why, instead of using polymorphism and converting the filter to an HBase Filter via a conversion method in the filter class(es), I kept the conversion in a util class. The intention was to decouple filters from the storage implementation. I agree with this approach. Meanwhile, we may also want to restrict the scope of the util class: instead of putting the logic in TimelineReaderUtils, feel free to add something like HBaseFilterConverter to model the filter conversion logic. Decide which contents to retrieve and send back in response in TimelineReader - Key: YARN-3862 URL: https://issues.apache.org/jira/browse/YARN-3862 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3862-YARN-2928.wip.01.patch Currently, we will retrieve all the contents of the field if that field is specified in the query API. In case of configs and metrics, this can become a lot of data even though the user doesn't need it. So we need to provide a way to query only a set of configs or metrics. As a comma separated list of configs/metrics to be returned will be quite cumbersome to specify, we have to support either of the following options : # Prefix match # Regex # Group the configs/metrics and query that group. We also need a facility to specify a metric time window to return metrics in that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697967#comment-14697967 ] Naganarasimha G R commented on YARN-4053: - [~vrushalic], how about double? I feel it would be better, as it takes the same size as a long (8 bytes) and supports decimals too. Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently HBase implementation uses GenericObjectMapper to convert and store values in backend HBase storage. This converts everything into a string representation(ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how are we going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3904: Attachment: YARN-3904-YARN-2928.009.patch Fixed the typo raised by [~vrushalic]. Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, YARN-3904-YARN-2928.008.patch, YARN-3904-YARN-2928.009.patch After we finished the design for time-based aggregation, we can adapt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support for aggregation writers. Offline aggregation writers typically have less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697761#comment-14697761 ] Jian He commented on YARN-4014: --- some comments on my side: - updateApplicationPriority has two RPC calls, one to get the appReport and the other to update the priority. Can we make it one call? We can make updateApplicationPriority throw an ApplicationNotRunningException and let the client catch the exception and print an “Application not running” msg. - I missed two things in YARN-3887, would you mind fixing those here? -- the updateApplicationStateSynchronously should not send the APP_UPDATE_SAVED events, and so RMAppImpl should not need to handle this event as changed in this patch. -- CapacityScheduler#updateApplicationPriority should not be synchronized. It’ll cause problems if we hold the capacity scheduler lock while accessing the state-store. Support user cli interface in for Application Priority -- Key: YARN-4014 URL: https://issues.apache.org/jira/browse/YARN-4014 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch Track the changes for the user-RM client protocol (i.e. ApplicationClientProtocol) and related discussions in this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
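Jian He's first suggestion — collapse the two RPCs into one and signal the not-running case with an exception the client simply catches and prints — might look as follows. The exception type and method signature here are placeholders for illustration, not the actual YARN API:

```java
// Hypothetical stand-in for the exception Jian He proposes.
class ApplicationNotRunningException extends Exception {
    ApplicationNotRunningException(String msg) { super(msg); }
}

public class PriorityUpdateSketch {
    // Single RPC: the server checks the app state itself and throws,
    // instead of the client first fetching an ApplicationReport.
    static void updateApplicationPriority(String appId, int priority, boolean running)
            throws ApplicationNotRunningException {
        if (!running) {
            throw new ApplicationNotRunningException(
                "Application " + appId + " is not running");
        }
        // ... persist the new priority to the state store ...
    }

    public static void main(String[] args) {
        try {
            updateApplicationPriority("application_1_0001", 10, false);
        } catch (ApplicationNotRunningException e) {
            // Client side: catch and print, as suggested in the comment.
            System.out.println(e.getMessage());
        }
    }
}
```

This halves the round trips for the common case and keeps the state check atomic on the server, avoiding a race between the report fetch and the update.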
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697845#comment-14697845 ] Li Lu commented on YARN-3862: - Oh and, BTW, I think it's pretty much fine on the code side, so please feel free to proceed with this JIRA as planned. Thanks! Decide which contents to retrieve and send back in response in TimelineReader - Key: YARN-3862 URL: https://issues.apache.org/jira/browse/YARN-3862 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3862-YARN-2928.wip.01.patch Currently, we will retrieve all the contents of the field if that field is specified in the query API. In case of configs and metrics, this can become a lot of data even though the user doesn't need it. So we need to provide a way to query only a set of configs or metrics. As a comma-separated list of configs/metrics to be returned will be quite cumbersome to specify, we have to support one of the following options : # Prefix match # Regex # Group the configs/metrics and query that group. We also need a facility to specify a metric time window to return metrics in that window. This may be useful for plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Attachment: YARN-3458-7.patch Merging to trunk with the newest resource monitoring structure CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Assignee: Inigo Goiri Priority: Minor Labels: BB2015-05-TBR, containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
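The core of the YARN-3458 proposal — treating 1 jiffy as 1 ms on Windows and deriving a usage percentage from the delta of cumulative CPU time over the wall-clock sampling interval — reduces to arithmetic along these lines. This is a simplified sketch of the idea, not the actual CpuTimeTracker code:

```java
public class CpuPercentSketch {
    // Cumulative CPU time is tracked in milliseconds (1 jiffy = 1 ms on Windows).
    // Returns percent of one core used between two samples, or -1 if unavailable.
    static float cpuUsagePercent(long prevCpuMs, long curCpuMs,
                                 long prevSampleMs, long curSampleMs) {
        long wall = curSampleMs - prevSampleMs;
        if (wall <= 0) {
            return -1f; // no elapsed wall-clock time yet; usage unavailable
        }
        return 100f * (curCpuMs - prevCpuMs) / wall;
    }

    public static void main(String[] args) {
        // 500 ms of CPU consumed over a 1000 ms window -> 50% of one core.
        System.out.println(cpuUsagePercent(1000, 1500, 0, 1000));
    }
}
```

Values above 100 are possible on multi-core hosts, since a process tree can consume more than one core's worth of CPU time per wall-clock second.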
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697970#comment-14697970 ] Hadoop QA commented on YARN-3458: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 19s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 26s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 1m 56s | Tests failed in hadoop-yarn-common. 
| | | | 39m 3s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.util.TestRackResolver | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750623/YARN-3458-7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / dc7a061 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8847/artifact/patchprocess/whitespace.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8847/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8847/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8847/console | This message was automatically generated. CPU resource monitoring in Windows -- Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Assignee: Inigo Goiri Priority: Minor Labels: BB2015-05-TBR, containers, metrics, windows Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch Original Estimate: 168h Remaining Estimate: 168h The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1469#comment-1469 ] Li Lu commented on YARN-4053: - Hi [~varun_saxena], I agree this is a valid issue. Before we get deeply involved in it, I'm wondering whether it is blocking any of our ongoing tasks to finish our planned POC of the reader and web UI? Change the way metric values are stored in HBase Storage Key: YARN-4053 URL: https://issues.apache.org/jira/browse/YARN-4053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Currently the HBase implementation uses GenericObjectMapper to convert and store values in the backend HBase storage. This converts everything into a string representation (ASCII/UTF-8 encoded byte array). While this is fine in most cases, it does not quite serve our use case for metrics. So we need to decide how we are going to encode and decode metric values and store them in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697818#comment-14697818 ] Hadoop QA commented on YARN-3904: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 7s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 51s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 11s | The applied patch generated 1 new checkstyle issues (total was 214, now 214). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 42s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 20s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 25s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 43m 0s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750594/YARN-3904-YARN-2928.009.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / f40c735 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8846/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8846/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8846/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8846/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8846/console | This message was automatically generated. Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, YARN-3904-YARN-2928.008.patch, YARN-3904-YARN-2928.009.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support to aggregation writers. Offline aggregation writers typically has less contextual information. We can distinguish these writers by special naming. 
We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697758#comment-14697758 ] Li Lu commented on YARN-3904: - Thanks [~vrushalic]! I agree we should not make a public dropTable API. Actually, in my code I'm restricting the accessibility of this method to test only. About the special characters, the commas and question marks are used for prepared SQL statements in JDBC, which should be quite stable by now. But I agree that we should clean up the SQL statements when we touch this part in the future. For now, if it's fine with all of us, maybe we can put this in and move forward with the offline aggregation implementations? Thanks! Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, YARN-3904-YARN-2928.008.patch, YARN-3904-YARN-2928.009.patch After we finished the design for time-based aggregation, we can adapt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support for aggregation writers. Offline aggregation writers typically have less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
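The point about commas and question marks being standard JDBC syntax is worth illustrating: generated statements use one '?' per column, and the driver binds values separately, so no escaping of the values themselves is needed. A toy generator with a made-up table and column names (not the actual Phoenix writer code):

```java
import java.util.StringJoiner;

public class UpsertSqlSketch {
    // Build an UPSERT statement with one '?' placeholder per column.
    // JDBC binds the actual values via PreparedStatement.setXxx() later,
    // so commas or quotes inside values need no special handling.
    static String buildUpsert(String table, String... columns) {
        StringJoiner cols = new StringJoiner(", ");
        StringJoiner marks = new StringJoiner(", ");
        for (String c : columns) {
            cols.add(c);
            marks.add("?");
        }
        return "UPSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildUpsert("aggregated_metrics",
            "id", "created_time", "metrics"));
    }
}
```

Only the table and column identifiers are concatenated into the SQL text; all data flows through the placeholders, which is what makes the generated statements stable.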
[jira] [Commented] (YARN-3986) getTransferredContainers in AbstractYarnScheduler should be present in YarnScheduler interface instead
[ https://issues.apache.org/jira/browse/YARN-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697801#comment-14697801 ] Jian He commented on YARN-3986: --- the proposal makes sense to me, thanks ! getTransferredContainers in AbstractYarnScheduler should be present in YarnScheduler interface instead -- Key: YARN-3986 URL: https://issues.apache.org/jira/browse/YARN-3986 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3986.01.patch, YARN-3986.02.patch Currently getTransferredContainers is present in {{AbstractYarnScheduler}}. *But in ApplicationMasterService, while registering AM, we are calling this method by typecasting it to AbstractYarnScheduler, which is incorrect.* This method should be moved to YarnScheduler. Because if a custom scheduler is to be added, it will implement YarnScheduler, not AbstractYarnScheduler. As ApplicationMasterService is calling getTransferredContainers by typecasting it to AbstractYarnScheduler, it is imposing an indirect dependency on AbstractYarnScheduler for any pluggable custom scheduler. We can move the method to YarnScheduler and leave the definition in AbstractYarnScheduler as it is. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
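The refactoring proposed in YARN-3986 — declare the method on the interface so ApplicationMasterService needs no downcast, while the shared definition stays in the abstract base — is the standard Java pattern below. The types are simplified stand-ins for the YARN classes, with container objects reduced to strings:

```java
import java.util.Collections;
import java.util.List;

interface YarnSchedulerSketch {
    // Declared on the interface: callers program against the interface,
    // so no typecast to the abstract base class is needed.
    List<String> getTransferredContainers(String appAttemptId);
}

abstract class AbstractYarnSchedulerSketch implements YarnSchedulerSketch {
    // The shared definition is left in the abstract base, as proposed.
    @Override
    public List<String> getTransferredContainers(String appAttemptId) {
        return Collections.emptyList();
    }
}

// A pluggable custom scheduler only has to satisfy the interface;
// extending the abstract base is optional.
class CustomScheduler extends AbstractYarnSchedulerSketch { }

public class SchedulerInterfaceSketch {
    public static void main(String[] args) {
        YarnSchedulerSketch scheduler = new CustomScheduler(); // no downcast
        System.out.println(scheduler.getTransferredContainers("attempt_1").size());
    }
}
```

A scheduler that implements only YarnScheduler (without the abstract base) would then still satisfy every caller, which is exactly the indirect dependency the issue wants removed.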
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697952#comment-14697952 ] Naganarasimha G R commented on YARN-3045: - [~djp] [~sjlee0], Seems like the patch is failing on the new YARN-2928 branch... will rebase and upload a new one. [Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, YARN-3045-YARN-2928.009.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Refactor timelineservice.storage to add support to online and offline aggregation writers
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697824#comment-14697824 ] Li Lu commented on YARN-3904: - Oh, right now the test is using this utility method, so it has to be default. We're adding the annotations to avoid adding it to any public javadocs or API lists. This is also an agreement among the reviewers. I agree it's not quite enough, and I'm considering moving this dangerous part to the test component in the offline aggregator JIRA. Refactor timelineservice.storage to add support to online and offline aggregation writers - Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch, YARN-3904-YARN-2928.004.patch, YARN-3904-YARN-2928.005.patch, YARN-3904-YARN-2928.006.patch, YARN-3904-YARN-2928.007.patch, YARN-3904-YARN-2928.008.patch, YARN-3904-YARN-2928.009.patch After we finished the design for time-based aggregation, we can adapt our existing Phoenix storage into the storage of the aggregated data. In this JIRA, I'm proposing to refactor writers to add support for aggregation writers. Offline aggregation writers typically have less contextual information. We can distinguish these writers by special naming. We can also use CollectorContexts to model all contextual information and use it in our writer interfaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Decide which contents to retrieve and send back in response in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697829#comment-14697829 ] Li Lu commented on YARN-3862: - I'm not worried that having a filter object model will slow everything down. Sure, we may not cover everything in the first draft, or even in the first JIRA. However, if we know we're on the right track, we're making progress. If we realize any use case limitations we can always fix them later, but at this early stage let's first have the right framework and get our planned goals done. Decide which contents to retrieve and send back in response in TimelineReader - Key: YARN-3862 URL: https://issues.apache.org/jira/browse/YARN-3862 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3862-YARN-2928.wip.01.patch Currently, we will retrieve all the contents of the field if that field is specified in the query API. In case of configs and metrics, this can become a lot of data even though the user doesn't need it. So we need to provide a way to query only a set of configs or metrics. As a comma-separated list of configs/metrics to be returned will be quite cumbersome to specify, we have to support one of the following options : # Prefix match # Regex # Group the configs/metrics and query that group. We also need a facility to specify a metric time window to return metrics in that window. This may be useful for plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698126#comment-14698126 ] Hadoop QA commented on YARN-3045: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 21s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. | | {color:red}-1{color} | javac | 7m 58s | The applied patch generated 3 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 57s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 49s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 49s | The patch appears to introduce 4 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 9m 20s | Tests passed in hadoop-yarn-applications-distributedshell. | | {color:green}+1{color} | yarn tests | 6m 9s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 57m 28s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-nodemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750640/YARN-3045-YARN-2928.010.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | YARN-2928 / f40c735 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/diffJavacWarnings.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8848/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8848/console | This message was automatically generated. 
[Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, YARN-3045-YARN-2928.009.patch, YARN-3045-YARN-2928.010.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3045: Attachment: YARN-3045-YARN-2928.010.patch rebased the patch [Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, YARN-3045-YARN-2928.009.patch, YARN-3045-YARN-2928.010.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696825#comment-14696825 ] Hudson commented on YARN-3987: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #287 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/287/]) YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/CHANGES.txt am container complete msg ack to NM once RM receive it -- Key: YARN-3987 URL: https://issues.apache.org/jira/browse/YARN-3987 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: sandflee Assignee: sandflee Fix For: 2.8.0 Attachments: YARN-3987.001.patch, YARN-3987.002.patch In our cluster we set max-am-attempts to a very large number, and unfortunately our AM crashed after launch, leaving too many completed containers (AM containers) in the NM. A completed container is removed from the NM and NMStateStore only after the container-complete event is passed to the AM, but if the AM couldn't be launched, the completed AM container couldn't be cleaned up and may eat up NM heap memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696823#comment-14696823 ] Hudson commented on YARN-4005: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #287 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/287/]) YARN-4005. Completed container whose app is finished is possibly not removed from NMStateStore. Contributed by Jun Gong (jianhe: rev 38aed1a94ed7b6da62e2445b5610bc02b1cddeeb) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java Completed container whose app is finished is not removed from NMStateStore -- Key: YARN-4005 URL: https://issues.apache.org/jira/browse/YARN-4005 Project: Hadoop YARN Issue Type: Bug Reporter: Jun Gong Assignee: Jun Gong Fix For: 2.8.0 Attachments: YARN-4005.01.patch If a container is completed and its corresponding app is finished, NM only removes it from its context and does not add it to 'recentlyStoppedContainers' when calling 'getContainerStatuses'. Then NM will not remove it from NMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696828#comment-14696828 ] Hudson commented on YARN-4047: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #287 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/287/]) YARN-4047. ClientRMService getApplications has high scheduler lock contention. Contributed by Jason Lowe (jianhe: rev 7a445fcfabcf9c6aae219051f65d3f6cb8feb87c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java ClientRMService getApplications has high scheduler lock contention -- Key: YARN-4047 URL: https://issues.apache.org/jira/browse/YARN-4047 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Labels: 2.6.1-candidate Fix For: 2.7.2 Attachments: YARN-4047.001.patch The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which will grab the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696842#comment-14696842 ] Hudson commented on YARN-4047:
--
FAILURE: Integrated in Hadoop-Yarn-trunk #1017 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1017/])
YARN-4047. ClientRMService getApplications has high scheduler lock contention. Contributed by Jason Lowe (jianhe: rev 7a445fcfabcf9c6aae219051f65d3f6cb8feb87c)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java

ClientRMService getApplications has high scheduler lock contention
--
Key: YARN-4047
URL: https://issues.apache.org/jira/browse/YARN-4047
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
Labels: 2.6.1-candidate
Fix For: 2.7.2
Attachments: YARN-4047.001.patch

The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which grabs the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc.
--
[jira] [Commented] (YARN-3987) am container complete msg ack to NM once RM receive it
[ https://issues.apache.org/jira/browse/YARN-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696839#comment-14696839 ] Hudson commented on YARN-3987:
--
FAILURE: Integrated in Hadoop-Yarn-trunk #1017 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1017/])
YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/CHANGES.txt

am container complete msg ack to NM once RM receive it
--
Key: YARN-3987
URL: https://issues.apache.org/jira/browse/YARN-3987
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: sandflee
Assignee: sandflee
Fix For: 2.8.0
Attachments: YARN-3987.001.patch, YARN-3987.002.patch

In our cluster we set max-am-attempts to a very large number, and unfortunately our AM crashed right after launch, leaving too many completed (AM) containers in the NM. A completed container is removed from the NM and the NMStateStore only once its completion has been passed to the AM; but if the AM cannot be launched, the completed AM containers are never cleaned up and may eat up NM heap memory.
--
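The description above is an acknowledgement protocol in miniature: the NM must hold a completed container until its completion is confirmed as delivered, and the fix has the RM acknowledge the AM container as soon as the attempt finishes, instead of waiting for an AM that may never run again. A minimal sketch of that hold-until-ack shape, with illustrative names rather than Hadoop's actual classes:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the ack protocol behind YARN-3987 -- names are assumptions,
// not Hadoop's. The NM may only forget a completed container once it is
// acknowledged; without an ack, the pending list grows with every failed
// AM attempt and eats heap.
public class CompletedContainerAckDemo {
    final List<String> pendingCompleted = new ArrayList<>();

    // A container finished; it must be retained until acknowledged.
    void containerCompleted(String containerId) {
        pendingCompleted.add(containerId);
    }

    // With the fix, the RM acks the AM container when the attempt finishes,
    // so this fires even when no new AM ever launches to consume the event.
    void onRmAck(List<String> acked) {
        pendingCompleted.removeAll(acked);
    }
}
```

The key design point is who sends the ack: tying it to the RM's own attempt-finished transition, rather than to delivery to a (possibly never-relaunched) AM, bounds the NM's retained state.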
[jira] [Commented] (YARN-4005) Completed container whose app is finished is not removed from NMStateStore
[ https://issues.apache.org/jira/browse/YARN-4005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696837#comment-14696837 ] Hudson commented on YARN-4005:
--
FAILURE: Integrated in Hadoop-Yarn-trunk #1017 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1017/])
YARN-4005. Completed container whose app is finished is possibly not removed from NMStateStore. Contributed by Jun Gong (jianhe: rev 38aed1a94ed7b6da62e2445b5610bc02b1cddeeb)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java

Completed container whose app is finished is not removed from NMStateStore
--
Key: YARN-4005
URL: https://issues.apache.org/jira/browse/YARN-4005
Project: Hadoop YARN
Issue Type: Bug
Reporter: Jun Gong
Assignee: Jun Gong
Fix For: 2.8.0
Attachments: YARN-4005.01.patch

If a container is completed and its corresponding app is finished, the NM only removes it from its context and does not add it to 'recentlyStoppedContainers' when calling 'getContainerStatuses'. As a result, the NM never removes it from the NMStateStore.
--
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696957#comment-14696957 ] Eric Payne commented on YARN-4014:
--
{code}
+pw.println( -appId Application ID ApplicationId can be used with any other);
+pw.println( sub commands in future. Currently it is);
+pw.println( used along only with -set-priority);
...
+ ApplicationId can be used with any other sub commands in future.
+
+ Currently it is used along only with -set-priority);
{code}
This is a minor point, but in these two places I would simply state something like the following: {{ID of the affected application.}} That way, when the switch is used in the future by other sub-commands, the developer doesn't have to remember to change these statements.

Support user cli interface in for Application Priority
--
Key: YARN-4014
URL: https://issues.apache.org/jira/browse/YARN-4014
Project: Hadoop YARN
Issue Type: Sub-task
Components: client, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch

Track the user-RM client protocol (i.e. ApplicationClientProtocol) changes and discussions in this jira.
--