[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584366#comment-13584366 ]

Hitesh Shah commented on YARN-193:

@Zhijie, distributedshell is an example application and is therefore meant to show how to write a well-behaved application: one that checks what the limits are and adjusts its requests accordingly. IMO, for applications that do not respect the limits, instead of reducing their requested resources to the max value, we should throw an error, since we cannot be sure whether the app really needs that much and whether it will actually work if we reduce the amount to the max value. Does that make sense?

Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

    Key: YARN-193
    URL: https://issues.apache.org/jira/browse/YARN-193
    Project: Hadoop YARN
    Issue Type: Bug
    Components: resourcemanager
    Affects Versions: 2.0.2-alpha, 3.0.0
    Reporter: Hitesh Shah
    Assignee: Hitesh Shah
    Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.4.patch, YARN-193.5.patch
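A minimal sketch of the behaviour argued for here, i.e. rejecting a request that exceeds the maximum allocation instead of silently capping it. All names below are illustrative; this is not the actual YARN scheduler code or any attached patch.

{code}
public class RequestValidatorSketch {
  static class InvalidResourceRequestException extends RuntimeException {
    InvalidResourceRequestException(String msg) { super(msg); }
  }

  // Throw when the ask exceeds the maximum allocation; otherwise round up to a
  // multiple of the minimum allocation (the usual normalization step).
  static int validateAndNormalizeMemory(int requestedMb, int minAllocMb, int maxAllocMb) {
    if (requestedMb > maxAllocMb) {
      throw new InvalidResourceRequestException(
          "Requested " + requestedMb + " MB exceeds maximum allocation of " + maxAllocMb + " MB");
    }
    int multiples = (requestedMb + minAllocMb - 1) / minAllocMb;
    return Math.max(multiples, 1) * minAllocMb;
  }

  public static void main(String[] args) {
    System.out.println(validateAndNormalizeMemory(1500, 1024, 8192)); // 2048
    System.out.println(validateAndNormalizeMemory(9000, 1024, 8192)); // throws
  }
}
{code}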
[jira] [Created] (YARN-415) Capture memory utilization at the app-level for chargeback
Kendall Thrapp created YARN-415:

Summary: Capture memory utilization at the app-level for chargeback
Key: YARN-415
URL: https://issues.apache.org/jira/browse/YARN-415
Project: Hadoop YARN
Issue Type: New Feature
Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp

For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it.

(reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n)

It'd be nice to have this at the app level instead of the job level because:
1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server).
2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).

This new metric should be available both through the RM UI and RM Web Services REST API.
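The proposed metric is just a sum over containers of reserved memory multiplied by container lifetime. A small sketch of that arithmetic, using a hypothetical helper class rather than any RM API:

{code}
import java.util.List;

public class MemorySecondsSketch {
  static class ContainerUsage {
    final long reservedMb;       // memory reserved for the container
    final long lifetimeSeconds;  // how long the container held that reservation
    ContainerUsage(long reservedMb, long lifetimeSeconds) {
      this.reservedMb = reservedMb;
      this.lifetimeSeconds = lifetimeSeconds;
    }
  }

  // MB-seconds = sum over containers of (reserved MB * lifetime in seconds)
  static long memorySeconds(List<ContainerUsage> containers) {
    long total = 0;
    for (ContainerUsage c : containers) {
      total += c.reservedMb * c.lifetimeSeconds;
    }
    return total;
  }

  public static void main(String[] args) {
    List<ContainerUsage> app = List.of(
        new ContainerUsage(2048, 600),   // 2 GB reserved for 10 minutes
        new ContainerUsage(1024, 120));  // 1 GB reserved for 2 minutes
    System.out.println(memorySeconds(app) + " MB-seconds"); // 1351680 MB-seconds
  }
}
{code}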
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roger Hoover updated YARN-412:

Attachment: YARN-412.patch

FifoScheduler incorrectly checking for node locality

    Key: YARN-412
    URL: https://issues.apache.org/jira/browse/YARN-412
    Project: Hadoop YARN
    Issue Type: Bug
    Components: scheduler
    Reporter: Roger Hoover
    Priority: Minor
    Labels: patch
    Attachments: YARN-412.patch, YARN-412.patch

In the FifoScheduler, the assignNodeLocalContainers method checks whether the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect: it should be checking the hostname instead. The offending line of code is 455:

application.getResourceRequest(priority, node.getRMNode().getNodeAddress());

Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234). In the CapacityScheduler, the lookup is done using the hostname. See LeafQueue.assignNodeLocalContainers, line 1129:

application.getResourceRequest(priority, node.getHostName());

Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler: even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, the bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node-local.
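To make the mismatch concrete, here is a tiny self-contained illustration using a plain map rather than the actual scheduler classes: a request table keyed by hostname can never be matched when it is probed with the node address.

{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: outstanding requests are keyed by hostname, so probing the
// table with "hostname:port" (the node address) never finds a node-local match.
public class LocalityLookupSketch {
  public static void main(String[] args) {
    Map<String, Integer> outstandingRequests = new HashMap<>();
    outstandingRequests.put("host1.foo.com", 3);              // keyed by hostname

    String nodeAddress = "host1.foo.com:1234";                // hostname + command port
    String hostName = "host1.foo.com";

    System.out.println(outstandingRequests.get(nodeAddress)); // null -> locality missed
    System.out.println(outstandingRequests.get(hostName));    // 3    -> correct lookup
  }
}
{code}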
[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584473#comment-13584473 ]

Hadoop QA commented on YARN-412:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570495/YARN-412.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 tests included appear to have a timeout.{color}
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/417//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/417//console

This message is automatically generated.
[jira] [Updated] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI
[ https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-376:

Attachment: YARN-376.patch

Patch that adds a new interface to RMNode so ResourceTrackingService can atomically get-and-clear the list of containers and applications to clean up.

Apps that have completed can appear as RUNNING on the NM UI

    Key: YARN-376
    URL: https://issues.apache.org/jira/browse/YARN-376
    Project: Hadoop YARN
    Issue Type: Bug
    Components: resourcemanager
    Affects Versions: 2.0.3-alpha, 0.23.6
    Reporter: Jason Lowe
    Attachments: YARN-376.patch

On a busy cluster we've noticed a growing number of applications that appear as RUNNING on nodemanager web pages even though the applications have long since finished. Looking at the NM logs, it appears the RM never told the nodemanager that the application had finished. This is also reflected in a jstack of the NM process, since many more log aggregation threads are running than one would expect from the number of actively running applications.
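A minimal sketch of the atomic get-and-clear idea the patch description mentions; the class and method names below are illustrative and do not reflect the actual RMNode interface added by the patch.

{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not the RMNode interface from the patch.
public class CleanupQueueSketch {
  private final List<String> containersToClean = new ArrayList<>();

  public synchronized void add(String containerId) {
    containersToClean.add(containerId);
  }

  // Return the pending cleanup list and clear it in one atomic step, so entries
  // added between a separate "get" and "clear" cannot be silently dropped.
  public synchronized List<String> pullContainersToClean() {
    List<String> snapshot = new ArrayList<>(containersToClean);
    containersToClean.clear();
    return snapshot;
  }

  public static void main(String[] args) {
    CleanupQueueSketch queue = new CleanupQueueSketch();
    queue.add("container_1_0001_01_000001");
    System.out.println(queue.pullContainersToClean()); // [container_1_0001_01_000001]
    System.out.println(queue.pullContainersToClean()); // []
  }
}
{code}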
[jira] [Updated] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI
[ https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-376:

Priority: Blocker (was: Major)

Increasing to Blocker as this race can lead to lost logs, since the NM will not aggregate the logs until it thinks the application has completed. In addition, each leaked application has a corresponding log aggregation thread in the NM, and eventually the NM will be unable to create new threads.
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584667#comment-13584667 ]

Arun C Murthy commented on YARN-415:

+1 for the thought!
[jira] [Commented] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI
[ https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584679#comment-13584679 ]

Hadoop QA commented on YARN-376:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570527/YARN-376.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:red}-1 one of tests included doesn't have a timeout.{color}
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/418//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/418//console

This message is automatically generated.
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584703#comment-13584703 ]

Alejandro Abdelnur commented on YARN-392:

I'd like to restate the problem. Making things a bit more high level, the end goal is for an AM to give certain hints to the RM scheduler on how it plans to use requested resources. Hints are just that, 'hints'. They may not be taken into consideration, and AMs must not rely on hints to be able to work properly. It is fine if the RM scheduler ignores hints completely (because it is too busy or because it does not understand them). An RM scheduler that understands a hint may use it to make more optimal allocation decisions and may give AMs a speed boost. Another thing to keep in mind is that hints won't complicate the RM logic, as the RM's only involvement is passing them to the scheduler. Examples of hints are: gang scheduling, desired locality, desired multi-locality, resource-fulfillment timeout, future resource allocation. I can understand the worries about going task centric, but I think the hints approach is a bit different. Being able to specify hints will enable experimentation with scheduling features without requiring protocol changes. If we find that a hint is a good feature to support at the scheduler API level, we may eventually add it to the protocol/API. The changes in the protocol/API would be as simple as having an extra String field in resource requests and resource allocations, to indicate hints (on requests) and to receive the hints taken into consideration (on allocations). Thoughts?

Make it possible to schedule to specific nodes without dropping locality

    Key: YARN-392
    URL: https://issues.apache.org/jira/browse/YARN-392
    Project: Hadoop YARN
    Issue Type: Sub-task
    Reporter: Bikas Saha
    Assignee: Sandy Ryza
    Attachments: YARN-392.patch

Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app.
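As a rough illustration of the "extra String field" idea in the comment above, here is a tiny sketch; the class and field names are hypothetical and this is not the YARN ResourceRequest class or protocol.

{code}
// Hypothetical sketch only: not the YARN ResourceRequest class or wire format.
public class HintedResourceRequest {
  private final String hostName;
  private final int memoryMb;
  private final int numContainers;
  // Opaque hint the scheduler may honor or ignore, e.g. "gang" or "strict-locality".
  private final String schedulingHint;

  public HintedResourceRequest(String hostName, int memoryMb, int numContainers, String schedulingHint) {
    this.hostName = hostName;
    this.memoryMb = memoryMb;
    this.numContainers = numContainers;
    this.schedulingHint = schedulingHint;
  }

  public String getSchedulingHint() { return schedulingHint; }

  public static void main(String[] args) {
    HintedResourceRequest req = new HintedResourceRequest("host1.foo.com", 1024, 3, "strict-locality");
    // A scheduler that does not understand the hint can simply ignore it.
    System.out.println("hint = " + req.getSchedulingHint());
  }
}
{code}

An opaque string keeps the wire format stable while schedulers experiment; per the comment, the allocation response could carry back which hints were actually taken into consideration.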
[jira] [Commented] (YARN-24) Nodemanager fails to start if log aggregation enabled and namenode unavailable
[ https://issues.apache.org/jira/browse/YARN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584711#comment-13584711 ]

Sandy Ryza commented on YARN-24:

I encountered this when trying to start an NM and a namenode at the same time. The NM shut down because the namenode was in safe mode. Having the NM die in this way introduces a dependency on the order in which services are started. Log aggregation is checked each time an app is run on a node, and the app is immediately killed if a log folder cannot be used for it. Thus, merely removing the NM killing itself on startup doesn't introduce any correctness issues. The worst that could happen is that time could be wasted by scheduling more containers on a node we already know has connection issues to the namenode.

Attached a patch that removes the NM killing itself on startup. At initApp time, if verifyAndCreateRemoteLogDir has not been successfully completed, it is called again, and the app is failed if it fails. If initApp fails five consecutive times, the NM sets its status to unhealthy.

I agree that if an NM loses its ability to connect to the namenode after an app has started, it would be good for the NM to report that it wasn't able to write its logs, but my opinion is that that is a more difficult issue and does not need to be tied to this change.

Nodemanager fails to start if log aggregation enabled and namenode unavailable

    Key: YARN-24
    URL: https://issues.apache.org/jira/browse/YARN-24
    Project: Hadoop YARN
    Issue Type: Bug
    Components: nodemanager
    Affects Versions: 0.23.3, 2.0.0-alpha
    Reporter: Jason Lowe
    Attachments: YARN-24.patch

If log aggregation is enabled and the namenode is currently unavailable, the nodemanager fails to start up.
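A rough sketch of the retry behaviour described in the comment above. The five-failure threshold comes from the comment; every name and the overall structure are assumptions, not the attached patch.

{code}
// Illustrative sketch: retry remote log dir creation at app init and flag the
// node unhealthy after repeated failures, instead of dying at NM startup.
public class RemoteLogDirGuard {
  private static final int MAX_CONSECUTIVE_FAILURES = 5;
  private int consecutiveFailures = 0;
  private boolean healthy = true;

  interface DirCreator { void verifyAndCreateRemoteLogDir() throws Exception; }

  // Called when an application is initialized on the node.
  public boolean initAppLogDir(DirCreator creator) {
    try {
      creator.verifyAndCreateRemoteLogDir();
      consecutiveFailures = 0;
      return true;                       // app init can proceed
    } catch (Exception e) {
      consecutiveFailures++;
      if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
        healthy = false;                 // report node as unhealthy
      }
      return false;                      // fail this app, keep the NM up
    }
  }

  public boolean isHealthy() { return healthy; }

  public static void main(String[] args) {
    RemoteLogDirGuard guard = new RemoteLogDirGuard();
    DirCreator alwaysFails = () -> { throw new Exception("namenode in safe mode"); };
    for (int i = 0; i < 5; i++) {
      guard.initAppLogDir(alwaysFails);
    }
    System.out.println("healthy = " + guard.isHealthy()); // healthy = false
  }
}
{code}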
[jira] [Updated] (YARN-24) Nodemanager fails to start if log aggregation enabled and namenode unavailable
[ https://issues.apache.org/jira/browse/YARN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-24:

Attachment: YARN-24.patch
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584738#comment-13584738 ]

Andy Rhee commented on YARN-415:

+1 This feature is way overdue!
[jira] [Updated] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI
[ https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-376:

Attachment: YARN-376.patch

Updated patch so the test has a timeout.
[jira] [Commented] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI
[ https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584770#comment-13584770 ]

Hadoop QA commented on YARN-376:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570543/YARN-376.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 tests included appear to have a timeout.{color}
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/420//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/420//console

This message is automatically generated.
[jira] [Updated] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler
[ https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-365:

Attachment: YARN-365.8.patch

Each NM heartbeat should not generate an event for the Scheduler

    Key: YARN-365
    URL: https://issues.apache.org/jira/browse/YARN-365
    Project: Hadoop YARN
    Issue Type: Sub-task
    Components: resourcemanager, scheduler
    Affects Versions: 0.23.5
    Reporter: Siddharth Seth
    Assignee: Xuan Gong
    Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch, YARN-365.2.patch, YARN-365.3.patch, YARN-365.4.patch, YARN-365.5.patch, YARN-365.6.patch, YARN-365.7.patch, YARN-365.8.patch

Follow up from YARN-275 https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt
[jira] [Updated] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler
[ https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-365:

Attachment: (was: YARN-365.8.patch)
[jira] [Updated] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler
[ https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-365:

Attachment: YARN-365.8.patch
[jira] [Updated] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler
[ https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-365:

Attachment: YARN-365.8.patch
[jira] [Updated] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler
[ https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-365:

Attachment: (was: YARN-365.8.patch)
[jira] [Commented] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler
[ https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584896#comment-13584896 ]

Hadoop QA commented on YARN-365:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570570/YARN-365.8.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files.
{color:red}-1 one of tests included doesn't have a timeout.{color}
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/422//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/422//console

This message is automatically generated.
[jira] [Commented] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler
[ https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584906#comment-13584906 ]

Xuan Gong commented on YARN-365:

The one test file that does not include a timeout is MockNode.java, which is under the test directory.
[jira] [Created] (YARN-416) Limiting the Diagnostic related data on Application Overview Page
omkar vinit joshi created YARN-416:

Summary: Limiting the Diagnostic related data on Application Overview Page
Key: YARN-416
URL: https://issues.apache.org/jira/browse/YARN-416
Project: Hadoop YARN
Issue Type: Bug
Reporter: omkar vinit joshi

On the Application Overview page, diagnostic data is printed as-is, without any limit on its total size. There should be some way to control the final diagnostic data (in terms of lines or total characters). One way this could be done is by adding configuration parameters or by adding some control on the front end.
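One possible shape of the suggested control, sketched with a hypothetical character limit rather than any existing YARN configuration property:

{code}
// Hypothetical sketch: cap diagnostics at a configurable number of characters
// before rendering them on the application overview page.
public class DiagnosticsTruncator {
  static String truncate(String diagnostics, int maxChars) {
    if (diagnostics == null || diagnostics.length() <= maxChars) {
      return diagnostics;
    }
    return diagnostics.substring(0, maxChars) + " ... (truncated)";
  }

  public static void main(String[] args) {
    String longDiag = "Container killed on request. Exit code is 143. ".repeat(100);
    System.out.println(truncate(longDiag, 200));
  }
}
{code}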
[jira] [Commented] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown.But if both RM and NM are started and then after if RM is going down,NM is retrying for the RM
[ https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585000#comment-13585000 ]

Hadoop QA commented on YARN-196:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570598/YARN-196.5.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 tests included appear to have a timeout.{color}
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/423//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/423//console

This message is automatically generated.

Nodemanager if started before starting Resource manager is getting shutdown. But if both RM and NM are started and then after if RM is going down, NM is retrying for the RM.

    Key: YARN-196
    URL: https://issues.apache.org/jira/browse/YARN-196
    Project: Hadoop YARN
    Issue Type: Bug
    Components: nodemanager
    Affects Versions: 3.0.0, 2.0.0-alpha
    Reporter: Ramgopal N
    Assignee: Xuan Gong
    Attachments: MAPREDUCE-3676.patch, YARN-196.1.patch, YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch, YARN-196.5.patch

If NM is started before starting the RM, NM is shutting down with the following error
{code}
ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
Caused by: java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
	at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
	... 3 more
Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
	at $Proxy23.registerNodeManager(Unknown Source)
	at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
	... 5 more
Caused by: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
	at org.apache.hadoop.ipc.Client.call(Client.java:1141)
	at org.apache.hadoop.ipc.Client.call(Client.java:1100)
	at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
	... 7 more
Caused by: java.net.ConnectException: Connection refused
	at