[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777171#comment-13777171 ]

Hadoop QA commented on YARN-311:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12604962/YARN-311-v7.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2012//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2012//console

This message is automatically generated.
> Dynamic node resource configuration: core scheduler changes
> -----------------------------------------------------------
>
>                 Key: YARN-311
>                 URL: https://issues.apache.org/jira/browse/YARN-311
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch
>
>
> As the first step, we make the resource change on the RM side and will expose admin APIs (admin protocol, CLI, REST and JMX API) later. This jira contains only the scheduler changes.
> The flow for updating a node's resource and making resource scheduling aware of it is:
> 1. A resource update arrives through the admin API to the RM and takes effect on RMNodeImpl.
> 2. When the next NM status heartbeat comes in, the RMNode's resource change is detected and the delta resource is added to the SchedulerNode's availableResource before actual scheduling happens.
> 3. The scheduler allocates resources according to the new availableResource in the SchedulerNode.
> For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
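The three-step flow in the YARN-311 description above can be sketched in miniature. `Resource` and `SchedulerNodeSketch` below are simplified stand-ins invented for illustration, not the real RMNodeImpl/SchedulerNode classes; the point is only that the heartbeat applies the delta between the old and new totals, so resources already allocated to running containers are left intact.

```java
// Minimal sketch of the delta-resource flow described above.
// These classes are simplified stand-ins, not the real YARN API.
class Resource {
    int memoryMB;
    Resource(int memoryMB) { this.memoryMB = memoryMB; }
}

class SchedulerNodeSketch {
    Resource totalCapability;   // what the scheduler believes the node has
    Resource availableResource; // headroom for new containers

    SchedulerNodeSketch(int totalMB, int availableMB) {
        totalCapability = new Resource(totalMB);
        availableResource = new Resource(availableMB);
    }

    // Step 2: on the next NM heartbeat, fold in the delta between the
    // admin-updated total and the scheduler's current view.
    void applyResourceUpdate(Resource newTotal) {
        int deltaMB = newTotal.memoryMB - totalCapability.memoryMB;
        // Adding the delta (rather than overwriting with the new total)
        // preserves the resources already allocated to containers.
        availableResource.memoryMB += deltaMB;
        totalCapability = newTotal;
    }
}
```

For example, a node with 8192 MB total and 2048 MB free (6144 MB allocated) that is bumped to 12288 MB ends up with 6144 MB free; the 6144 MB of allocations is untouched.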
[jira] [Created] (YARN-1235) Regulate the case of applicationType
Zhijie Shen created YARN-1235:
---------------------------------

             Summary: Regulate the case of applicationType
                 Key: YARN-1235
                 URL: https://issues.apache.org/jira/browse/YARN-1235
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Zhijie Shen
            Assignee: Zhijie Shen

In YARN-1001, when filtering applications, we ignore the case of the applicationType. However, RMClientService#getApplications doesn't. Moreover, it is not documented whether ApplicationClientProtocol ignores the case of applicationType. IMHO, we need to:
1. Modify RMClientService#getApplications to ignore the case of applicationType when filtering applications.
2. Add javadoc to ApplicationClientProtocol#submitApplication and getApplications saying that applicationType is case insensitive.
3. Probably, on submitApplication, "normalize" the applicationType to lower case.
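Items 1-3 above amount to one normalization rule applied in two places. A rough sketch follows; `AppTypeFilter`, `normalizeType`, and `matchesFilter` are hypothetical names invented for illustration, not YARN API.

```java
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper illustrating the proposal: normalize applicationType
// to lower case at submission (item 3) so filtering (item 1) is trivially
// case-insensitive.
final class AppTypeFilter {
    // Item 3: the normalization applied on submitApplication.
    static String normalizeType(String applicationType) {
        return applicationType == null
            ? null
            : applicationType.toLowerCase(Locale.ENGLISH);
    }

    // Item 1: filter comparison that ignores case.
    static boolean matchesFilter(Set<String> requestedTypes, String appType) {
        if (requestedTypes == null || requestedTypes.isEmpty()) {
            return true; // no filter means everything matches
        }
        Set<String> lower = requestedTypes.stream()
            .map(AppTypeFilter::normalizeType)
            .collect(Collectors.toSet());
        return lower.contains(normalizeType(appType));
    }
}
```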
[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated YARN-311:
----------------------------

    Attachment: YARN-311-v7.patch

Sync up patch with trunk in v7 patch.

> Dynamic node resource configuration: core scheduler changes
> -----------------------------------------------------------
>
>                 Key: YARN-311
>                 URL: https://issues.apache.org/jira/browse/YARN-311
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch
>
>
> As the first step, we make the resource change on the RM side and will expose admin APIs (admin protocol, CLI, REST and JMX API) later. This jira contains only the scheduler changes.
> The flow for updating a node's resource and making resource scheduling aware of it is:
> 1. A resource update arrives through the admin API to the RM and takes effect on RMNodeImpl.
> 2. When the next NM status heartbeat comes in, the RMNode's resource change is detected and the delta resource is added to the SchedulerNode's availableResource before actual scheduling happens.
> 3. The scheduler allocates resources according to the new availableResource in the SchedulerNode.
> For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291.
[jira] [Assigned] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj K reassigned YARN-1028:
-------------------------------

    Assignee: Karthik Kambatla  (was: Devaraj K)

> Add FailoverProxyProvider like capability to RMProxy
> ----------------------------------------------------
>
>                 Key: YARN-1028
>                 URL: https://issues.apache.org/jira/browse/YARN-1028
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>
> The RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways.
[jira] [Commented] (YARN-1215) Yarn URL should include userinfo
[ https://issues.apache.org/jira/browse/YARN-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777124#comment-13777124 ]

Hadoop QA commented on YARN-1215:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12604945/YARN-1215-trunk.2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2011//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2011//console

This message is automatically generated.

> Yarn URL should include userinfo
> --------------------------------
>
>                 Key: YARN-1215
>                 URL: https://issues.apache.org/jira/browse/YARN-1215
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>         Attachments: YARN-1215-trunk.2.patch, YARN-1215-trunk.patch
>
>
> In the {{org.apache.hadoop.yarn.api.records.URL}} class, we don't have a userinfo as part of the URL. When converting a {{java.net.URI}} object into the YARN URL object in the {{ConverterUtils.getYarnUrlFromURI()}} method, we set the uri host as the url host. If the uri has a userinfo part, the userinfo is discarded. This leads to information loss if the original uri has a userinfo, e.g. foo://username:passw...@example.com will be converted to foo://example.com and the username/password information is lost during the conversion.
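The information loss described above comes from rebuilding the URL from the URI's host alone. A userinfo-preserving conversion could look roughly like the sketch below; `UrlConversionSketch.buildUrlString` is an illustrative helper (and the `user:pass` example a made-up credential), not the actual ConverterUtils code or the attached patch.

```java
import java.net.URI;

// Sketch of a userinfo-preserving URI-to-string conversion, illustrating
// the kind of fix YARN-1215 is after. Not the real ConverterUtils code.
final class UrlConversionSketch {
    static String buildUrlString(URI uri) {
        StringBuilder sb = new StringBuilder(uri.getScheme()).append("://");
        if (uri.getUserInfo() != null) {
            // Keep the userinfo instead of silently dropping it.
            sb.append(uri.getUserInfo()).append('@');
        }
        sb.append(uri.getHost());
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        sb.append(uri.getPath());
        return sb.toString();
    }
}
```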
[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777082#comment-13777082 ]

Ravi Prakash commented on YARN-90:
----------------------------------

Hi nijel! Welcome to the community and thanks for your contribution. A few comments:
1. Nit: Some lines are over 80 characters long.
2. numFailures is never incremented any more when a directory fails, so getNumFailures() would return the wrong result.
Could you please also tell us how you tested the patch? There seem to be a lot of unit tests that use LocalDirsHandlerService. Did you run them all and ensure that they still pass? Thanks again

> NodeManager should identify failed disks becoming good back again
> -----------------------------------------------------------------
>
>                 Key: YARN-90
>                 URL: https://issues.apache.org/jira/browse/YARN-90
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ravi Gummadi
>         Attachments: YARN-90.patch
>
>
> MAPREDUCE-3121 made the NodeManager identify disk failures. But once a disk goes down, it is marked as failed forever. To reuse that disk (after it becomes good), the NodeManager needs a restart. This JIRA is to improve the NodeManager to reuse good disks (which could have been bad some time back).
[jira] [Updated] (YARN-1128) FifoPolicy.computeShares throws NPE on empty list of Schedulables
[ https://issues.apache.org/jira/browse/YARN-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-1128:
-----------------------------

    Hadoop Flags: Reviewed

Committed to trunk, branch-2, and branch-2.1-beta

> FifoPolicy.computeShares throws NPE on empty list of Schedulables
> -----------------------------------------------------------------
>
>                 Key: YARN-1128
>                 URL: https://issues.apache.org/jira/browse/YARN-1128
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.1.0-beta
>            Reporter: Sandy Ryza
>            Assignee: Karthik Kambatla
>             Fix For: 2.1.2-beta
>
>         Attachments: yarn-1128-1.patch
>
>
> FifoPolicy gives all of a queue's share to the earliest-scheduled application.
> {code}
>     Schedulable earliest = null;
>     for (Schedulable schedulable : schedulables) {
>       if (earliest == null ||
>           schedulable.getStartTime() < earliest.getStartTime()) {
>         earliest = schedulable;
>       }
>     }
>     earliest.setFairShare(Resources.clone(totalResources));
> {code}
> If the queue has no schedulables in it, earliest will be left null, leading to an NPE on the last line.
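A minimal guarded version of the loop quoted in the YARN-1128 description follows. `Schedulable` is reduced here to the two methods the loop touches (the real interface is larger), shares are modeled as plain `long`s, and `FifoShareSketch`/`FixedSchedulable` are illustrative names, not the committed patch.

```java
import java.util.List;

// Simplified Schedulable: only what the quoted loop needs.
interface Schedulable {
    long getStartTime();
    void setFairShare(long share);
}

final class FifoShareSketch {
    // Same selection logic as the quoted FifoPolicy loop, with the
    // null guard that the empty-schedulables case requires.
    static void computeShares(List<Schedulable> schedulables,
                              long totalResources) {
        Schedulable earliest = null;
        for (Schedulable schedulable : schedulables) {
            if (earliest == null
                || schedulable.getStartTime() < earliest.getStartTime()) {
                earliest = schedulable;
            }
        }
        // The fix: an empty list leaves earliest null, so return
        // instead of dereferencing it and throwing an NPE.
        if (earliest != null) {
            earliest.setFairShare(totalResources);
        }
    }
}

// Tiny concrete Schedulable for demonstration.
final class FixedSchedulable implements Schedulable {
    final long start;
    long fairShare;
    FixedSchedulable(long start) { this.start = start; }
    public long getStartTime() { return start; }
    public void setFairShare(long share) { this.fairShare = share; }
}
```

With the guard, an empty queue is simply a no-op, and a non-empty queue still gives the whole share to the earliest-scheduled application.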
[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777071#comment-13777071 ]

Sandy Ryza commented on YARN-1089:
----------------------------------

As was requested, I posted a summary of the proposal on YARN-1024. In case it's not clear from the summary, here's the problem we're trying to solve: we want jobs to be portable between clusters. CPU is not a fluid resource in the way memory is. The number of cores on a machine is just as important as its total processing power when scheduling tasks.

Imagine a cluster where every node has powerful CPUs with many cores. One type of task that will be run on the cluster saturates a full CPU, but another type of task contains two threads, each of which can saturate only half a CPU. If we have a single dimension for CPU requests, these tasks will request an equal amount. What happens if we then move those tasks to a cluster with CPUs whose cores are half as fast? The first task will run half as fast, and the second task will run in the same amount of time. It's in the first task's interest to request only half as many CPU resources on that cluster.

I'm also afraid of things getting complicated, but I can't think of anything better that doesn't require the meaning of a virtual core to vary widely from cluster to cluster.

> Add YARN compute units alongside virtual cores
> ----------------------------------------------
>
>                 Key: YARN-1089
>                 URL: https://issues.apache.org/jira/browse/YARN-1089
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api
>    Affects Versions: 2.1.0-beta
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-1089-1.patch, YARN-1089.patch
>
>
> Based on discussion in YARN-1024, we will add YARN compute units as a resource for requesting and scheduling CPU processing power.
[jira] [Commented] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777069#comment-13777069 ]

Hadoop QA commented on YARN-899:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12604938/YARN-899.8.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2010//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2010//console

This message is automatically generated.

> Get queue administration ACLs working
> -------------------------------------
>
>                 Key: YARN-899
>                 URL: https://issues.apache.org/jira/browse/YARN-899
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.1.0-beta
>            Reporter: Sandy Ryza
>            Assignee: Xuan Gong
>         Attachments: YARN-899.1.patch, YARN-899.2.patch, YARN-899.3.patch, YARN-899.4.patch, YARN-899.5.patch, YARN-899.5.patch, YARN-899.6.patch, YARN-899.7.patch, YARN-899.8.patch
>
>
> The Capacity Scheduler documents the yarn.scheduler.capacity.root..acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1.
[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777068#comment-13777068 ]

Junping Du commented on YARN-311:
---------------------------------

Hi [~acmurthy], it has been a while. Any chance to review this patch?

> Dynamic node resource configuration: core scheduler changes
> -----------------------------------------------------------
>
>                 Key: YARN-311
>                 URL: https://issues.apache.org/jira/browse/YARN-311
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, YARN-311-v6.patch
>
>
> As the first step, we make the resource change on the RM side and will expose admin APIs (admin protocol, CLI, REST and JMX API) later. This jira contains only the scheduler changes.
> The flow for updating a node's resource and making resource scheduling aware of it is:
> 1. A resource update arrives through the admin API to the RM and takes effect on RMNodeImpl.
> 2. When the next NM status heartbeat comes in, the RMNode's resource change is detected and the delta resource is added to the SchedulerNode's availableResource before actual scheduling happens.
> 3. The scheduler allocates resources according to the new availableResource in the SchedulerNode.
> For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291.
[jira] [Updated] (YARN-1215) Yarn URL should include userinfo
[ https://issues.apache.org/jira/browse/YARN-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuan Liu updated YARN-1215:
----------------------------

    Attachment: YARN-1215-trunk.2.patch

Attaching a new patch that adds a userInfo field to org.apache.hadoop.yarn.api.records.URL. This appends an optional field to the existing .proto file, which is allowed according to the compatibility guide at: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility

> Yarn URL should include userinfo
> --------------------------------
>
>                 Key: YARN-1215
>                 URL: https://issues.apache.org/jira/browse/YARN-1215
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>         Attachments: YARN-1215-trunk.2.patch, YARN-1215-trunk.patch
>
>
> In the {{org.apache.hadoop.yarn.api.records.URL}} class, we don't have a userinfo as part of the URL. When converting a {{java.net.URI}} object into the YARN URL object in the {{ConverterUtils.getYarnUrlFromURI()}} method, we set the uri host as the url host. If the uri has a userinfo part, the userinfo is discarded. This leads to information loss if the original uri has a userinfo, e.g. foo://username:passw...@example.com will be converted to foo://example.com and the username/password information is lost during the conversion.
[jira] [Commented] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777056#comment-13777056 ]

Hudson commented on YARN-1214:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #4464 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4464/])
YARN-1214. Register ClientToken MasterKey in SecretManager after it is saved (Jian He via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1526078)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenSecretManagerInRM.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java

> Register ClientToken MasterKey in SecretManager after it is saved
> -----------------------------------------------------------------
>
>                 Key: YARN-1214
>                 URL: https://issues.apache.org/jira/browse/YARN-1214
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Jian He
>            Assignee: Jian He
>            Priority: Critical
>             Fix For: 2.1.2-beta
>
>         Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch
>
>
> Currently, the app attempt ClientToken master key is registered before it is saved. This can cause a problem: if the client gets the token and the RM crashes before the master key is saved, the RM cannot reload the master key after it restarts, because it was never saved. As a result, the client is holding an invalid token.
> We can register the client token master key after it is saved in the store.
[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777051#comment-13777051 ]

Bikas Saha commented on YARN-1089:
----------------------------------

At this point, I am not seeing the benefit of creating yet another cpu-related configuration. While I am not against useful configurations, it's already hard to configure YARN. Like Vinod and others said, can a summary of the discussions made elsewhere be placed here?

> Add YARN compute units alongside virtual cores
> ----------------------------------------------
>
>                 Key: YARN-1089
>                 URL: https://issues.apache.org/jira/browse/YARN-1089
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api
>    Affects Versions: 2.1.0-beta
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-1089-1.patch, YARN-1089.patch
>
>
> Based on discussion in YARN-1024, we will add YARN compute units as a resource for requesting and scheduling CPU processing power.
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777044#comment-13777044 ]

Jian He commented on YARN-674:
------------------------------

Is this related to the ClientRMService.renewDelegationToken method?

> Slow or failing DelegationToken renewals on submission itself make RM unavailable
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-674
>                 URL: https://issues.apache.org/jira/browse/YARN-674
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>
> This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as the RM may run out of RPC handlers due to blocked client submissions.
[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777034#comment-13777034 ]

Sandy Ryza commented on YARN-1089:
----------------------------------

I'm ok with waiting until 2.3. In case it's not clear, the consequence of this is that until then it will be impossible to place more tasks on a node than its number of virtual cores, which is essentially its number of physical cores. I think we should make YARN-976, documenting the meaning of vcores, a blocker for 2.2.

> Add YARN compute units alongside virtual cores
> ----------------------------------------------
>
>                 Key: YARN-1089
>                 URL: https://issues.apache.org/jira/browse/YARN-1089
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api
>    Affects Versions: 2.1.0-beta
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-1089-1.patch, YARN-1089.patch
>
>
> Based on discussion in YARN-1024, we will add YARN compute units as a resource for requesting and scheduling CPU processing power.
[jira] [Updated] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-899:
---------------------------

    Attachment: YARN-899.8.patch

> Get queue administration ACLs working
> -------------------------------------
>
>                 Key: YARN-899
>                 URL: https://issues.apache.org/jira/browse/YARN-899
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.1.0-beta
>            Reporter: Sandy Ryza
>            Assignee: Xuan Gong
>         Attachments: YARN-899.1.patch, YARN-899.2.patch, YARN-899.3.patch, YARN-899.4.patch, YARN-899.5.patch, YARN-899.5.patch, YARN-899.6.patch, YARN-899.7.patch, YARN-899.8.patch
>
>
> The Capacity Scheduler documents the yarn.scheduler.capacity.root..acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1.
[jira] [Updated] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-899:
---------------------------

    Attachment: YARN-899.7.patch

Create patch based on the latest trunk.

> Get queue administration ACLs working
> -------------------------------------
>
>                 Key: YARN-899
>                 URL: https://issues.apache.org/jira/browse/YARN-899
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 2.1.0-beta
>            Reporter: Sandy Ryza
>            Assignee: Xuan Gong
>         Attachments: YARN-899.1.patch, YARN-899.2.patch, YARN-899.3.patch, YARN-899.4.patch, YARN-899.5.patch, YARN-899.5.patch, YARN-899.6.patch, YARN-899.7.patch
>
>
> The Capacity Scheduler documents the yarn.scheduler.capacity.root..acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1.
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777023#comment-13777023 ]

Hadoop QA commented on YARN-1229:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12604922/YARN-1229.6.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2008//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2008//console

This message is automatically generated.
> Shell$ExitCodeException could happen if AM fails to start > - > > Key: YARN-1229 > URL: https://issues.apache.org/jira/browse/YARN-1229 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.1-beta >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.1.2-beta > > Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, > YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch > > > I run sleep job. If AM fails to start, this exception could occur: > 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with > state FAILED due to: Application application_1379673267098_0020 failed 1 > times due to AM Container for appattempt_1379673267098_0020_01 exited > with exitCode: 1 due to: Exception from container-launch: > org.apache.hadoop.util.Shell$ExitCodeException: > /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: > line 12: export: > `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= > ': not a valid identifier > at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) > at org.apache.hadoop.util.Shell.run(Shell.java:379) > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > .Failing this 
attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
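For context on the failure quoted above: the NodeManager writes each auxiliary service into launch_container.sh as `export NM_AUX_SERVICE_<serviceName>=<value>`, and POSIX shell identifiers may contain only letters, digits, and underscores (and may not start with a digit). The dot in `mapreduce.shuffle` therefore makes the generated export line invalid, which is exactly what bash reports. A minimal sketch of that naming constraint follows; the class name, method name, and regex here are illustrative, not the actual patch code:

```java
import java.util.regex.Pattern;

public class AuxServiceNameCheck {
    // POSIX shell identifiers: letters, digits, and underscores only,
    // and the first character may not be a digit. A dot in the aux service
    // name breaks the generated "export NM_AUX_SERVICE_<name>=..." line.
    public static final Pattern VALID_NAME =
        Pattern.compile("^[A-Za-z_][A-Za-z0-9_]*$");

    public static boolean isValidAuxServiceName(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidAuxServiceName("mapreduce.shuffle")); // false
        System.out.println(isValidAuxServiceName("mapreduce_shuffle")); // true
    }
}
```

This is why the fix renames the shuffle service to `mapreduce_shuffle` and rejects invalid names up front instead of letting the container launch fail later.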
[jira] [Commented] (YARN-1232) Configuration support for RM HA
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777021#comment-13777021 ] Hadoop QA commented on YARN-1232: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604931/yarn-1232-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.cli.TestYarnCLI org.apache.hadoop.yarn.client.TestGetGroups org.apache.hadoop.yarn.client.api.impl.TestYarnClient org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.api.impl.TestNMClient org.apache.hadoop.yarn.conf.TestYarnConfiguration org.apache.hadoop.yarn.logaggregation.TestLogDumper org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue org.apache.hadoop.yarn.server.resourcemanager.recovery.TestRMStateStore org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.TestRMHA org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueParsing org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestChildQueueOrder org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService 
org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions org.apache.hadoop.yarn.server.resourcemanager.TestRMNodeTransitions org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerEventLog org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.applicationmasterservice.TestApplicationMasterService org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestParentQueue org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestQueueMetrics org.apache.hadoop.yarn.server.resource
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777009#comment-13777009 ] Hudson commented on YARN-1229: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4463 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4463/]) YARN-1229. Define constraints on Auxiliary Service names. Change ShuffleHandler service name from mapreduce.shuffle to mapreduce_shuffle. Contributed by Xuan Gong. (sseth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1526065) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/SingleCluster.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/INSTALL * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestAuxServices.java
[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster
[ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1234: Fix Version/s: 2.1.2-beta > Container localizer logs are not created in secured cluster > > > Key: YARN-1234 > URL: https://issues.apache.org/jira/browse/YARN-1234 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Fix For: 2.1.2-beta > > > When we run ContainerLocalizer in a secured cluster we potentially do not > create any log file to track log messages. A log file would help in > identifying ContainerLocalization issues in a secured cluster.
[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster
[ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1234: Component/s: nodemanager
[jira] [Moved] (YARN-1234) Container localizer logs are not created in secured cluster
[ https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi moved MAPREDUCE-5532 to YARN-1234: Key: YARN-1234 (was: MAPREDUCE-5532) Project: Hadoop YARN (was: Hadoop Map/Reduce)
[jira] [Updated] (YARN-1232) Configuration support for RM HA
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1232: --- Attachment: yarn-1232-2.patch Patch that adds descriptions, adds tests for HAUtil, and is to be applied on trunk. > Configuration support for RM HA > --- > > Key: YARN-1232 > URL: https://issues.apache.org/jira/browse/YARN-1232 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Labels: ha > Attachments: yarn-1232-1.patch, yarn-1232-2.patch > > > We should augment the configuration to allow users to specify two RMs and the > individual RPC addresses for them. This blocks > ConfiguredFailoverProxyProvider.
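To make the YARN-1232 proposal concrete, the configuration would need to enumerate the RM instances and give each one its own RPC address. The fragment below sketches that shape only; the property names are hypothetical illustrations of the intent, not necessarily the keys this patch settles on:

```xml
<!-- Hypothetical yarn-site.xml fragment: two RMs, each with its own RPC address -->
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm1</name>
  <value>rm1.example.com:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm2</name>
  <value>rm2.example.com:8032</value>
</property>
```

A failover proxy provider can then iterate over the listed ids and resolve the per-id address for whichever RM is currently active, which is why this configuration work blocks ConfiguredFailoverProxyProvider.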
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776979#comment-13776979 ] Siddharth Seth commented on YARN-1229: -- +1. Committing.
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776975#comment-13776975 ] Hadoop QA commented on YARN-1229: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604917/YARN-1229.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2007//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2007//console This message is automatically generated. 
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229: Attachment: YARN-1229.6.patch fix documentation
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776945#comment-13776945 ] Siddharth Seth commented on YARN-1229: -- Patch looks good. Missed this earlier, but there are several references to mapreduce.shuffle in the documentation which need to be updated. Also, since it's being updated, can you make the Pattern final? Thanks
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776934#comment-13776934 ] Hadoop QA commented on YARN-1229: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604911/YARN-1229.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2006//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2006//console This message is automatically generated. 
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229: Attachment: YARN-1229.5.patch 1. Change the service name to mapreduce_shuffle. 2. Use a regex to check auxName.
[jira] [Updated] (YARN-1168) Cannot run "echo \"Hello World\""
[ https://issues.apache.org/jira/browse/YARN-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1168: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta > Cannot run "echo \"Hello World\"" > - > > Key: YARN-1168 > URL: https://issues.apache.org/jira/browse/YARN-1168 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Tassapol Athiapinya >Priority: Critical > Fix For: 2.1.2-beta > > > Run > $ ssh localhost "echo \"Hello World\"" > with bash does succeed. Hello World is shown in stdout. > Run distributed shell with similar echo command. That is either > $ /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client > -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.*.jar > -shell_command echo -shell_args "\"Hello World\"" > or > $ /usr/bin/yarn org.apache.hadoop.yarn.applications.distributedshell.Client > -jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell-2.*.jar > -shell_command echo -shell_args "Hello World" > {code:title=yarn logs -- only hello is shown} > LogType: stdout > LogLength: 6 > Log Contents: > hello > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
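The YARN-1168 symptom above (only the first word of a quoted argument surviving) is consistent with the argument string being re-split on whitespace somewhere between the distributed-shell client and the launched container, after the user's quoting has already been applied. The sketch below illustrates that failure class in isolation; it is not the actual distributed-shell code path:

```java
import java.util.Arrays;

public class NaiveArgSplit {
    // Re-splitting an already-quoted argument string on whitespace discards
    // the grouping the quotes were meant to preserve: one intended argument
    // becomes two tokens, each carrying a stray quote character.
    public static String[] naiveSplit(String shellArgs) {
        return shellArgs.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // The user intended a single argument: Hello World
        String[] parts = naiveSplit("\"Hello World\"");
        System.out.println(parts.length);          // 2
        System.out.println(Arrays.toString(parts));
    }
}
```

Once the grouping is lost, the launched `echo` no longer receives the original single argument, and downstream handling can end up emitting only the first word, matching the `hello`-only stdout in the report.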
[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1149: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta > NM throws InvalidStateTransitonException: Invalid event: > APPLICATION_LOG_HANDLING_FINISHED at RUNNING > - > > Key: YARN-1149 > URL: https://issues.apache.org/jira/browse/YARN-1149 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ramya Sunil >Assignee: Xuan Gong > Fix For: 2.1.2-beta > > Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, > YARN-1149.4.patch > > > When nodemanager receives a kill signal when an application has finished > execution but log aggregation has not kicked in, > InvalidStateTransitonException: Invalid event: > APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown > {noformat} > 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just > finished : application_1377459190746_0118 > 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate > log-file for app application_1377459190746_0118 at > /app-logs/foo/logs/application_1377459190746_0118/_45454.tmp > 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService > (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation > to complete for application_1377459190746_0118 > 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for > container container_1377459190746_0118_01_04. 
Current good log dirs are > /tmp/yarn/local > 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl > (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate > log-file for app application_1377459190746_0118 > 2013-08-25 20:45:00,925 WARN application.Application > (ApplicationImpl.java:handle(427)) - Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > APPLICATION_LOG_HANDLING_FINISHED at RUNNING > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) > at java.lang.Thread.run(Thread.java:662) > 2013-08-25 20:45:00,926 INFO application.Application > (ApplicationImpl.java:handle(430)) - Application > application_1377459190746_0118 transitioned from RUNNING to null > 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl > (ContainersMonitorImpl.java:run(463)) - > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > is interrupted. Exiting. 
> 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping > server on 8040 > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
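A generic way to avoid this class of failure is for the state machine to tolerate (or explicitly route) late events instead of throwing. The following is a hypothetical, self-contained Java sketch of that idea — it does not use Hadoop's actual StateMachineFactory API, and the states/events are simplified stand-ins:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a state machine that treats unexpected events as
// no-ops, so a late APPLICATION_LOG_HANDLING_FINISHED-style event delivered
// while the application is still RUNNING cannot crash the dispatcher.
public class TolerantStateMachine {
    enum State { RUNNING, FINISHED }
    enum Event { FINISH, LOG_HANDLING_FINISHED }

    private State current = State.RUNNING;
    // Map of (state, event) -> next state; any pair absent here is ignored.
    private final Map<String, State> transitions = new HashMap<>();

    TolerantStateMachine() {
        transitions.put(key(State.RUNNING, Event.FINISH), State.FINISHED);
        transitions.put(key(State.FINISHED, Event.LOG_HANDLING_FINISHED), State.FINISHED);
    }

    private static String key(State s, Event e) { return s + "/" + e; }

    State handle(Event e) {
        State next = transitions.get(key(current, e));
        if (next == null) {
            // Unexpected event at this state: log-and-ignore rather than throw.
            return current;
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        TolerantStateMachine sm = new TolerantStateMachine();
        // A late log-handling event while RUNNING is ignored, not fatal.
        System.out.println(sm.handle(Event.LOG_HANDLING_FINISHED)); // RUNNING
        System.out.println(sm.handle(Event.FINISH));                // FINISHED
    }
}
```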
[jira] [Updated] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1157: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta > ResourceManager UI has invalid tracking URL link for distributed shell > application > -- > > Key: YARN-1157 > URL: https://issues.apache.org/jira/browse/YARN-1157 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong > Fix For: 2.1.2-beta > > Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, > YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch > > > Submit YARN distributed shell application. Goto ResourceManager Web UI. The > application definitely appears. In Tracking UI column, there will be history > link. Click on that link. Instead of showing application master web UI, HTTP > error 500 would appear. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1142: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta > MiniYARNCluster web ui does not work properly > - > > Key: YARN-1142 > URL: https://issues.apache.org/jira/browse/YARN-1142 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Alejandro Abdelnur > Fix For: 2.1.2-beta > > > When going to the RM http port, the NM web ui is displayed. It seems there is > a singleton somewhere that breaks things when RM & NMs run in the same > process. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1131) $ yarn logs should return a message that log aggregation is in progress if the YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1131: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta > $ yarn logs should return a message log aggregation is during progress if > YARN application is running > - > > Key: YARN-1131 > URL: https://issues.apache.org/jira/browse/YARN-1131 > Project: Hadoop YARN > Issue Type: Improvement > Components: client >Reporter: Tassapol Athiapinya >Assignee: Junping Du >Priority: Minor > Fix For: 2.1.2-beta > > > In the case when log aggregation is enabled, if a user submits MapReduce job > and runs $ yarn logs -applicationId while the YARN application is > running, the command will return no message and return user back to shell. It > is nice to tell the user that log aggregation is in progress. > {code} > -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 > -bash-4.1$ > {code} > At the same time, if invalid application ID is given, YARN CLI should say > that the application ID is incorrect rather than throwing > NoSuchElementException. > {code} > $ /usr/bin/yarn logs -applicationId application_0 > Exception in thread "main" java.util.NoSuchElementException > at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) > at > org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) > at > org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) > at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) > at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
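Validating the application ID format up front would let the CLI print a friendly message instead of surfacing the NoSuchElementException. A hypothetical sketch of such a pre-check (not the actual LogDumper/ConverterUtils code):

```java
// Hypothetical sketch of up-front validation for "yarn logs -applicationId":
// a well-formed ID has the shape application_<clusterTimestamp>_<sequence>.
public class AppIdCheck {
    static boolean isValidAppId(String id) {
        return id != null && id.matches("application_\\d+_\\d+");
    }

    public static void main(String[] args) {
        System.out.println(isValidAppId("application_1377900193583_0002")); // true
        System.out.println(isValidAppId("application_0"));                  // false
    }
}
```

With this guard, `application_0` would be rejected with a usage error before the parser ever runs.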
[jira] [Updated] (YARN-1158) ResourceManager UI has application stdout missing if application stdout is not in the same directory as AppMaster stdout
[ https://issues.apache.org/jira/browse/YARN-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1158: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta > ResourceManager UI has application stdout missing if application stdout is > not in the same directory as AppMaster stdout > > > Key: YARN-1158 > URL: https://issues.apache.org/jira/browse/YARN-1158 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Tassapol Athiapinya > Fix For: 2.1.2-beta > > > Configure yarn-site.xml's yarn.nodemanager.local-dirs to multiple > directories. Turn on log aggregation. Run distributed shell application. If > an application writes AppMaster.stdout in one directory and stdout in another > directory. Goto ResourceManager web UI. Open up container logs. Only > AppMaster.stdout would appear. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1121) RMStateStore should flush all pending store events before closing
[ https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1121: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta > RMStateStore should flush all pending store events before closing > - > > Key: YARN-1121 > URL: https://issues.apache.org/jira/browse/YARN-1121 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Bikas Saha > Fix For: 2.1.2-beta > > > on serviceStop it should wait for all internal pending events to drain before > stopping. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
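The drain-before-stop behavior requested here can be sketched with a single-threaded executor: `shutdown()` stops accepting new events while `awaitTermination` blocks until everything already queued has run. This is a minimal illustration of the pattern, not RMStateStore's actual dispatcher:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: on serviceStop, shutdown() lets already-queued store
// events finish, and awaitTermination blocks until the queue is drained.
public class DrainOnStop {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService store = Executors.newSingleThreadExecutor();
        AtomicInteger stored = new AtomicInteger();
        for (int i = 0; i < 5; i++) {
            store.submit(stored::incrementAndGet); // pending "store" events
        }
        store.shutdown();                              // stop accepting new events...
        store.awaitTermination(10, TimeUnit.SECONDS);  // ...but drain pending ones
        System.out.println("stored=" + stored.get());  // stored=5
    }
}
```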
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776915#comment-13776915 ] Siddharth Seth commented on YARN-1229: -- Took a quick look. - Can you please rename MapreduceShuffle to mapreduce_shuffle (closer to the old name) - The check can be regex based, rather than walking through all the characters. - Include an empty check along with the null check > Shell$ExitCodeException could happen if AM fails to start > - > > Key: YARN-1229 > URL: https://issues.apache.org/jira/browse/YARN-1229 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.1-beta >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.1.2-beta > > Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, > YARN-1229.4.patch > > > I run sleep job. If AM fails to start, this exception could occur: > 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with > state FAILED due to: Application application_1379673267098_0020 failed 1 > times due to AM Container for appattempt_1379673267098_0020_01 exited > with exitCode: 1 due to: Exception from container-launch: > org.apache.hadoop.util.Shell$ExitCodeException: > /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: > line 12: export: > `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= > ': not a valid identifier > at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) > at org.apache.hadoop.util.Shell.run(Shell.java:379) > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1167: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta > Submitted distributed shell application shows appMasterHost = empty > --- > > Key: YARN-1167 > URL: https://issues.apache.org/jira/browse/YARN-1167 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Tassapol Athiapinya > Fix For: 2.1.2-beta > > > Submit distributed shell application. Once the application turns to be > RUNNING state, app master host should not be empty. In reality, it is empty. > ==console logs== > distributedshell.Client: Got application report from ASM for, appId=12, > clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, > appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, > distributedFinalState=UNDEFINED, -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1022) Unnecessary INFO logs in AMRMClientAsync
[ https://issues.apache.org/jira/browse/YARN-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1022: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta > Unnecessary INFO logs in AMRMClientAsync > > > Key: YARN-1022 > URL: https://issues.apache.org/jira/browse/YARN-1022 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bikas Saha >Priority: Minor > Labels: newbie > Fix For: 2.1.2-beta > > > Logs like the following should be debug or else every legitimate stop causes > unnecessary exception traces in the logs. > 464 2013-08-03 20:01:34,459 INFO [AMRM Heartbeater thread] > org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl: > Heartbeater interrupted > 465 java.lang.InterruptedException: sleep interrupted > 466 at java.lang.Thread.sleep(Native Method) > 467 at > org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:249) > 468 2013-08-03 20:01:34,460 INFO [AMRM Callback Handler Thread] > org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl: > Interrupted while waiting for queue > 469 java.lang.InterruptedException > 470 at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer. > java:1961) > 471 at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1996) > 472 at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) > 473 at > org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:275) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
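The suggested change is simply to log the expected interrupt-during-stop at DEBUG rather than INFO with a stack trace. A small stand-alone sketch of the pattern (the `Logger` here is a stand-in for commons-logging/slf4j, not the real API usage):

```java
// Sketch of the suggested change: an interrupt that arrives during a
// legitimate stop is expected, so log it at DEBUG instead of INFO.
public class HeartbeatLoop {
    interface Logger { void debug(String msg); }

    public static void main(String[] args) throws Exception {
        StringBuilder log = new StringBuilder();
        Logger LOG = log::append; // stand-in logger capturing debug output
        Thread heartbeater = new Thread(() -> {
            try {
                Thread.sleep(60_000); // heartbeat interval
            } catch (InterruptedException e) {
                // Previously logged at INFO with a full stack trace on every stop.
                LOG.debug("Heartbeater interrupted");
            }
        });
        heartbeater.start();
        heartbeater.interrupt(); // a legitimate stop
        heartbeater.join();
        System.out.println(log);
    }
}
```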
[jira] [Updated] (YARN-1053) Diagnostic message from ContainerExitEvent is ignored in ContainerImpl
[ https://issues.apache.org/jira/browse/YARN-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1053: Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta > Diagnostic message from ContainerExitEvent is ignored in ContainerImpl > -- > > Key: YARN-1053 > URL: https://issues.apache.org/jira/browse/YARN-1053 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Labels: newbie > Fix For: 2.3.0, 2.1.2-beta > > Attachments: YARN-1053.20130809.patch > > > If the container launch fails then we send ContainerExitEvent. This event > contains exitCode and diagnostic message. Today we are ignoring diagnostic > message while handling this event inside ContainerImpl. Fixing it as it is > useful in diagnosing the failure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229: Attachment: YARN-1229.4.patch Allow _ as valid character in auxServiceName, and disallow auxServiceName starting at number > Shell$ExitCodeException could happen if AM fails to start > - > > Key: YARN-1229 > URL: https://issues.apache.org/jira/browse/YARN-1229 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.1-beta >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.1.2-beta > > Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, > YARN-1229.4.patch > > > I run sleep job. If AM fails to start, this exception could occur: > 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with > state FAILED due to: Application application_1379673267098_0020 failed 1 > times due to AM Container for appattempt_1379673267098_0020_01 exited > with exitCode: 1 due to: Exception from container-launch: > org.apache.hadoop.util.Shell$ExitCodeException: > /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: > line 12: export: > `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= > ': not a valid identifier > at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) > at org.apache.hadoop.util.Shell.run(Shell.java:379) > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1214: - Priority: Critical (was: Major) > Register ClientToken MasterKey in SecretManager after it is saved > - > > Key: YARN-1214 > URL: https://issues.apache.org/jira/browse/YARN-1214 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He >Priority: Critical > Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, > YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch > > > Currently, app attempt ClientToken master key is registered before it is > saved. This can cause problem that before the master key is saved, client > gets the token and RM also crashes, RM cannot reloads the master key back > after it restarts as it is not saved. As a result, client is holding an > invalid token. > We can register the client token master key after it is saved in the store. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776872#comment-13776872 ] Chris Nauroth commented on YARN-1229: - Agreed on underscores. Various resources indicate that {{[a-zA-Z_]+[a-zA-Z0-9_]*}} is a good format that we can expect to work cross-platform. > Shell$ExitCodeException could happen if AM fails to start > - > > Key: YARN-1229 > URL: https://issues.apache.org/jira/browse/YARN-1229 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.1-beta >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.1.2-beta > > Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch > > > I run sleep job. If AM fails to start, this exception could occur: > 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with > state FAILED due to: Application application_1379673267098_0020 failed 1 > times due to AM Container for appattempt_1379673267098_0020_01 exited > with exitCode: 1 due to: Exception from container-launch: > org.apache.hadoop.util.Shell$ExitCodeException: > /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: > line 12: export: > `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= > ': not a valid identifier > at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) > at org.apache.hadoop.util.Shell.run(Shell.java:379) > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at 
java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
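The `[a-zA-Z_]+[a-zA-Z0-9_]*` format suggested above maps directly onto `String.matches`. A sketch of what the regex-based check could look like (illustrative only, not the patch itself):

```java
// Sketch of the regex check discussed above: aux service names must start
// with a letter or underscore and contain only [a-zA-Z0-9_], so they remain
// valid shell identifiers when exported as NM_AUX_SERVICE_<name>.
public class AuxNameCheck {
    static boolean isValidAuxName(String name) {
        return name != null && !name.isEmpty()
            && name.matches("[a-zA-Z_]+[a-zA-Z0-9_]*");
    }

    public static void main(String[] args) {
        System.out.println(isValidAuxName("mapreduce_shuffle")); // true
        System.out.println(isValidAuxName("mapreduce.shuffle")); // false: '.' breaks export
        System.out.println(isValidAuxName("9shuffle"));          // false: leading digit
    }
}
```

This is exactly why `mapreduce.shuffle` produced the `not a valid identifier` failure in launch_container.sh: POSIX shells reject variable names containing a dot.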
[jira] [Updated] (YARN-1128) FifoPolicy.computeShares throws NPE on empty list of Schedulables
[ https://issues.apache.org/jira/browse/YARN-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1128: - Fix Version/s: 2.1.2-beta > FifoPolicy.computeShares throws NPE on empty list of Schedulables > - > > Key: YARN-1128 > URL: https://issues.apache.org/jira/browse/YARN-1128 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Karthik Kambatla > Fix For: 2.1.2-beta > > Attachments: yarn-1128-1.patch > > > FifoPolicy gives all of a queue's share to the earliest-scheduled application. > {code} > Schedulable earliest = null; > for (Schedulable schedulable : schedulables) { > if (earliest == null || > schedulable.getStartTime() < earliest.getStartTime()) { > earliest = schedulable; > } > } > earliest.setFairShare(Resources.clone(totalResources)); > {code} > If the queue has no schedulables in it, earliest will be left null, leading > to an NPE on the last line. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
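The fix direction is an early return when the queue holds no schedulables, so `earliest` can never be dereferenced while null. A simplified Java sketch using start times in place of the `Schedulable` type:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the NPE fix: bail out when there is nothing to
// schedule, so `earliest` can never be dereferenced while null.
public class FifoShare {
    static Long earliestStart(List<Long> startTimes) {
        if (startTimes.isEmpty()) {
            return null; // empty queue: nothing receives the fair share
        }
        Long earliest = null;
        for (Long t : startTimes) {
            if (earliest == null || t < earliest) {
                earliest = t;
            }
        }
        return earliest; // guaranteed non-null here
    }

    public static void main(String[] args) {
        System.out.println(earliestStart(List.of(30L, 10L, 20L)));  // 10
        System.out.println(earliestStart(Collections.emptyList())); // null
    }
}
```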
[jira] [Updated] (YARN-1203) Application Manager UI does not appear with Https enabled
[ https://issues.apache.org/jira/browse/YARN-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1203: -- Fix Version/s: 2.1.2-beta > Application Manager UI does not appear with Https enabled > - > > Key: YARN-1203 > URL: https://issues.apache.org/jira/browse/YARN-1203 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Omkar Vinit Joshi > Fix For: 2.1.2-beta > > Attachments: YARN-1203.20131017.1.patch, YARN-1203.20131017.2.patch, > YARN-1203.20131017.3.patch, YARN-1203.20131018.1.patch, > YARN-1203.20131018.2.patch, YARN-1203.20131019.1.patch > > > Need to add support to disable 'hadoop.ssl.enabled' for MR jobs. > A job should be able to run on http protocol by setting 'hadoop.ssl.enabled' > property at job level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776865#comment-13776865 ] Siddharth Seth commented on YARN-1229: -- Just looked at the patch, it'd be nice to include underscores as well - provides for a separator in the allowed character set. > Shell$ExitCodeException could happen if AM fails to start > - > > Key: YARN-1229 > URL: https://issues.apache.org/jira/browse/YARN-1229 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.1-beta >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.1.2-beta > > Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch > > > I run sleep job. If AM fails to start, this exception could occur: > 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with > state FAILED due to: Application application_1379673267098_0020 failed 1 > times due to AM Container for appattempt_1379673267098_0020_01 exited > with exitCode: 1 due to: Exception from container-launch: > org.apache.hadoop.util.Shell$ExitCodeException: > /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: > line 12: export: > `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= > ': not a valid identifier > at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) > at org.apache.hadoop.util.Shell.run(Shell.java:379) > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at 
java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1204) Need to add https port related property in Yarn
[ https://issues.apache.org/jira/browse/YARN-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1204: -- Fix Version/s: 2.1.2-beta > Need to add https port related property in Yarn > --- > > Key: YARN-1204 > URL: https://issues.apache.org/jira/browse/YARN-1204 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Omkar Vinit Joshi > Fix For: 2.1.2-beta > > Attachments: YARN-1204.20131018.1.patch, YARN-1204.20131020.1.patch, > YARN-1204.20131020.2.patch, YARN-1204.20131020.3.patch, > YARN-1204.20131020.4.patch, YARN-1204.20131023.1.patch > > > There is no yarn property available to configure https port for Resource > manager, nodemanager and history server. Currently, Yarn services uses the > port defined for http [defined by > 'mapreduce.jobhistory.webapp.address','yarn.nodemanager.webapp.address', > 'yarn.resourcemanager.webapp.address'] for running services on https protocol. > Yarn should have list of property to assign https port for RM, NM and JHS. > It can be like below. > yarn.nodemanager.webapp.https.address > yarn.resourcemanager.webapp.https.address > mapreduce.jobhistory.webapp.https.address -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
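Using the property names proposed in the description, the configuration could look like the fragment below. The port values here are illustrative examples only:

```xml
<!-- Illustrative yarn-site.xml / mapred-site.xml fragment based on the
     property names proposed above; addresses and ports are examples. -->
<property>
  <name>yarn.resourcemanager.webapp.https.address</name>
  <value>0.0.0.0:8090</value>
</property>
<property>
  <name>yarn.nodemanager.webapp.https.address</name>
  <value>0.0.0.0:8044</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.https.address</name>
  <value>0.0.0.0:19890</value>
</property>
```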
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1229: - Fix Version/s: (was: 2.1.1-beta) 2.1.2-beta > Shell$ExitCodeException could happen if AM fails to start > - > > Key: YARN-1229 > URL: https://issues.apache.org/jira/browse/YARN-1229 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.1-beta >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.1.2-beta > > Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch > > > I run sleep job. If AM fails to start, this exception could occur: > 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with > state FAILED due to: Application application_1379673267098_0020 failed 1 > times due to AM Container for appattempt_1379673267098_0020_01 exited > with exitCode: 1 due to: Exception from container-launch: > org.apache.hadoop.util.Shell$ExitCodeException: > /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: > line 12: export: > `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= > ': not a valid identifier > at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) > at org.apache.hadoop.util.Shell.run(Shell.java:379) > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
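The launch failure above stems from POSIX shell identifier rules: `export` rejects variable names containing characters outside `[A-Za-z0-9_]`, so an environment variable name derived from the aux-service id `mapreduce.shuffle` cannot be exported. A minimal sketch of one possible sanitization follows; the class and method names are illustrative, not the actual YARN-1229 patch.

```java
// Illustrative sketch: map an aux-service-derived env var name to a valid
// POSIX shell identifier by replacing every disallowed character with '_'.
// (Hypothetical helper; not the actual YARN-1229 fix.)
public class EnvNameSanitizer {
    public static String toValidIdentifier(String name) {
        // POSIX names match [A-Za-z_][A-Za-z0-9_]*; dots and other symbols are invalid.
        return name.replaceAll("[^A-Za-z0-9_]", "_");
    }

    public static void main(String[] args) {
        // prints NM_AUX_SERVICE_mapreduce_shuffle
        System.out.println(toValidIdentifier("NM_AUX_SERVICE_mapreduce.shuffle"));
    }
}
```

Any such mapping must be applied consistently on both the writing side (launch script generation) and the reading side (the AM looking the variable up), otherwise the value becomes unreachable under its new name.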
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776852#comment-13776852 ] Vinod Kumar Vavilapalli commented on YARN-1229: --- *sigh* more incompatible changes. Thought for a while if we can do it in a compatible manner, but doesn't seem like there is any way. Looked at the patch, +1 for the changes. Let's get it in asap. > Shell$ExitCodeException could happen if AM fails to start > - > > Key: YARN-1229 > URL: https://issues.apache.org/jira/browse/YARN-1229 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.1-beta >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.1.1-beta > > Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch > > > I run sleep job. If AM fails to start, this exception could occur: > 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with > state FAILED due to: Application application_1379673267098_0020 failed 1 > times due to AM Container for appattempt_1379673267098_0020_01 exited > with exitCode: 1 due to: Exception from container-launch: > org.apache.hadoop.util.Shell$ExitCodeException: > /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: > line 12: export: > `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= > ': not a valid identifier > at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) > at org.apache.hadoop.util.Shell.run(Shell.java:379) > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) > at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > .Failing this attempt.. Failing the application.
[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol
[ https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776847#comment-13776847 ] Carlo Curino commented on YARN-624: --- Hi guys, I would like to quantify the typical waste of resources while "hoarding" containers towards a gang for Giraph or Storm. Does anyone have an intuition/measure of the typical time delay and container slot-time wasted while hoarding containers, before the useful part of the computation starts? Thanks. > Support gang scheduling in the AM RM protocol > - > > Key: YARN-624 > URL: https://issues.apache.org/jira/browse/YARN-624 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, scheduler >Affects Versions: 2.0.4-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > Per discussion on YARN-392 and elsewhere, gang scheduling, in which a > scheduler runs a set of tasks when they can all be run at the same time, > would be a useful feature for YARN schedulers to support. > Currently, AMs can approximate this by holding on to containers until they > get all the ones they need. However, this lends itself to deadlocks when > different AMs are waiting on the same containers.
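One way to frame the quantity Carlo asks about: if a gang can only start once its last container arrives, the slot-time wasted by hoarding is the sum, over all containers, of the gap between that container's allocation time and the last allocation. A back-of-envelope sketch under that simple model (illustrative, not YARN code):

```java
// Back-of-envelope sketch: container slot-time wasted while hoarding a gang.
// Model: the gang starts when the last container is allocated; every earlier
// container sits idle from its own allocation until then.
public class HoardingWaste {
    public static long wastedSlotMillis(long[] allocationTimesMillis) {
        long gangStart = Long.MIN_VALUE;
        for (long t : allocationTimesMillis) gangStart = Math.max(gangStart, t);
        long waste = 0;
        for (long t : allocationTimesMillis) waste += gangStart - t; // idle time per container
        return waste;
    }
}
```

For containers allocated at t = 0s, 1s and 3s, the gang starts at 3s and the wasted slot-time is 3s + 2s + 0s = 5s; the measurement Carlo asks for would replace these hypothetical timestamps with real allocation traces.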
[jira] [Commented] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776843#comment-13776843 ] Hadoop QA commented on YARN-1214: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604886/YARN-1214.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2005//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2005//console This message is automatically generated. > Register ClientToken MasterKey in SecretManager after it is saved > - > > Key: YARN-1214 > URL: https://issues.apache.org/jira/browse/YARN-1214 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, > YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch > > > Currently, app attempt ClientToken master key is registered before it is > saved. 
This can cause a problem: before the master key is saved, the client > may get the token and the RM may crash; the RM then cannot reload the master key > after it restarts because the key was never saved. As a result, the client is holding an > invalid token. > We can register the client token master key after it is saved in the store.
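The fix described above is an ordering constraint: persist the master key before registering it, so that any token a client could possibly hold corresponds to a key the restarted RM can reload. A minimal sketch with stand-in Store and SecretManager types (hypothetical names, not the RM's actual classes):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the ordering fix: the client-token master key is registered in the
// secret manager only AFTER it has been saved to the store, so an RM restart
// can always reload any key a client may already hold. Stand-in types only.
public class KeyRegistrationOrder {
    static class Store {
        final Map<String, byte[]> saved = new HashMap<>();
        void save(String attemptId, byte[] key) { saved.put(attemptId, key); }
    }

    static class SecretManager {
        final Map<String, byte[]> registered = new HashMap<>();
        void register(String attemptId, byte[] key) { registered.put(attemptId, key); }
    }

    public static void startAttempt(String attemptId, byte[] masterKey,
                                    Store store, SecretManager sm) {
        store.save(attemptId, masterKey);   // 1. durably persist the key first
        sm.register(attemptId, masterKey);  // 2. only then make tokens verifiable
    }
}
```

With the original ordering reversed, a crash between the two steps leaves a registered key that the restarted RM cannot reload, which is exactly the invalid-token window the JIRA describes.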
[jira] [Updated] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1214: -- Attachment: YARN-1214.6.patch Patch rebased. > Register ClientToken MasterKey in SecretManager after it is saved > - > > Key: YARN-1214 > URL: https://issues.apache.org/jira/browse/YARN-1214 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, > YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.6.patch, YARN-1214.patch > > > Currently, the app attempt ClientToken master key is registered before it is > saved. This can cause a problem: before the master key is saved, the client > may get the token and the RM may crash; the RM then cannot reload the master key > after it restarts because the key was never saved. As a result, the client is holding an > invalid token. > We can register the client token master key after it is saved in the store.
[jira] [Commented] (YARN-1214) Register ClientToken MasterKey in SecretManager after it is saved
[ https://issues.apache.org/jira/browse/YARN-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776802#comment-13776802 ] Bikas Saha commented on YARN-1214: -- +1 > Register ClientToken MasterKey in SecretManager after it is saved > - > > Key: YARN-1214 > URL: https://issues.apache.org/jira/browse/YARN-1214 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-1214.1.patch, YARN-1214.2.patch, YARN-1214.3.patch, > YARN-1214.4.patch, YARN-1214.5.patch, YARN-1214.patch > > > Currently, the app attempt ClientToken master key is registered before it is > saved. This can cause a problem: before the master key is saved, the client > may get the token and the RM may crash; the RM then cannot reload the master key > after it restarts because the key was never saved. As a result, the client is holding an > invalid token. > We can register the client token master key after it is saved in the store.
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776765#comment-13776765 ] Bikas Saha commented on YARN-1229: -- Looks good to me. > Shell$ExitCodeException could happen if AM fails to start > - > > Key: YARN-1229 > URL: https://issues.apache.org/jira/browse/YARN-1229 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.1-beta >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.1.1-beta > > Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch > > > I run sleep job. If AM fails to start, this exception could occur: > 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with > state FAILED due to: Application application_1379673267098_0020 failed 1 > times due to AM Container for appattempt_1379673267098_0020_01 exited > with exitCode: 1 due to: Exception from container-launch: > org.apache.hadoop.util.Shell$ExitCodeException: > /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: > line 12: export: > `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= > ': not a valid identifier > at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) > at org.apache.hadoop.util.Shell.run(Shell.java:379) > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > .Failing this attempt.. Failing the application.
[jira] [Commented] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776684#comment-13776684 ] Jian He commented on YARN-1157: --- Tests look much cleaner; thanks for the update. Patch looks good, +1. > ResourceManager UI has invalid tracking URL link for distributed shell > application > -- > > Key: YARN-1157 > URL: https://issues.apache.org/jira/browse/YARN-1157 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong > Fix For: 2.1.1-beta > > Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, > YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch > > > Submit a YARN distributed shell application and go to the ResourceManager web UI. The > application definitely appears. In the Tracking UI column, there will be a history > link. Click on that link. Instead of showing the application master web UI, an HTTP > error 500 appears.
[jira] [Commented] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776663#comment-13776663 ] Hadoop QA commented on YARN-1157: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604859/YARN-1157.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2004//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2004//console This message is automatically generated. 
> ResourceManager UI has invalid tracking URL link for distributed shell > application > -- > > Key: YARN-1157 > URL: https://issues.apache.org/jira/browse/YARN-1157 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong > Fix For: 2.1.1-beta > > Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, > YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch > > > Submit a YARN distributed shell application and go to the ResourceManager web UI. The > application definitely appears. In the Tracking UI column, there will be a history > link. Click on that link. Instead of showing the application master web UI, an HTTP > error 500 appears.
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776653#comment-13776653 ] Karthik Kambatla commented on YARN-1068: Thanks [~bikassaha], I agree with most of your points. bq. AdminService does not use the HAServiceProtocolServerSideTranslatorPB pattern The reason for this is our attempt to reuse most of the common code - protos and client implementations. bq. Having thought about this, it seems to me that this jira is actually blocked by YARN-986. To fix the admin support in its entirety, I agree that we need YARN-1232 and YARN-986. That said, for ease of development, I would propose splitting the admin support into two parts (JIRAs): basic support (this JIRA) to go in first to help testing of YARN-1232 and YARN-986, and complete admin support that adds the remaining parts. Otherwise, we would need to apply this patch over those other JIRAs to test. Thoughts? > Add admin support for HA operations > --- > > Key: YARN-1068 > URL: https://issues.apache.org/jira/browse/YARN-1068 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Labels: ha > Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, > yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, > yarn-1068-prelim.patch > > > Support HA admin operations to facilitate transitioning the RM to Active and > Standby states.
[jira] [Assigned] (YARN-986) YARN should have a ClusterId/ServiceId
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-986: Assignee: Karthik Kambatla (was: Vinod Kumar Vavilapalli) Sure, here you go. > YARN should have a ClusterId/ServiceId > -- > > Key: YARN-986 > URL: https://issues.apache.org/jira/browse/YARN-986 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Karthik Kambatla > > This needs to be done to support non-IP-based failover of the RM. Once the > server sets the token service address to be this generic ClusterId/ServiceId, > clients can translate it to the appropriate final IP and then be able to select > tokens via TokenSelectors. > Some workarounds for other related issues were put in place in YARN-945.
[jira] [Updated] (YARN-986) YARN should have a ClusterId/ServiceId
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-986: Summary: YARN should have a ClusterId/ServiceId (was: YARN should have a ClusterId/ServiceId that should be used to set the service address for tokens) > YARN should have a ClusterId/ServiceId > -- > > Key: YARN-986 > URL: https://issues.apache.org/jira/browse/YARN-986 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > > This needs to be done to support non-IP-based failover of the RM. Once the > server sets the token service address to be this generic ClusterId/ServiceId, > clients can translate it to the appropriate final IP and then be able to select > tokens via TokenSelectors. > Some workarounds for other related issues were put in place in YARN-945.
[jira] [Commented] (YARN-986) YARN should have a ClusterId/ServiceId
[ https://issues.apache.org/jira/browse/YARN-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776637#comment-13776637 ] Bikas Saha commented on YARN-986: - This should be used to set the service address for tokens. This would also be needed to pick up the correct configs for HA scenarios. > YARN should have a ClusterId/ServiceId > -- > > Key: YARN-986 > URL: https://issues.apache.org/jira/browse/YARN-986 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > > This needs to be done to support non-IP-based failover of the RM. Once the > server sets the token service address to be this generic ClusterId/ServiceId, > clients can translate it to the appropriate final IP and then be able to select > tokens via TokenSelectors. > Some workarounds for other related issues were put in place in YARN-945.
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776634#comment-13776634 ] Bikas Saha commented on YARN-1068: -- It would be educative to compare the HAAdmin server start code with an existing RM admin server like the AdminService. I notice 2 things. 1) AdminService does not use the HAServiceProtocolServerSideTranslatorPB pattern 2) AdminService does something with HADOOP_SECURITY_AUTHORIZATION which is missing in HAAdminService. This probably defines who has access to perform the admin operations. We will likely need that for the HAAdmin, right? Having thought about this, it seems to me that this jira is actually blocked by YARN-986. Without a concept of a logical name, how can we expect the CLI etc. to find the correct RM address from configuration? The client conf files would be expected to have entries for all RM instances and we would need to be able to issue admin commands to any one of them. So we need to be able to address them via a logical name, right? So the current approach that picks the RM_HA_ADMIN_SERVICE address does not seem like a viable solution. Similarly, server conf files would need to tell the server what its logical name is so that it can pick instance-specific configurations. This is precisely why we have the HAAdmin.resolveTarget() method. Again, it would be educative to look at NNHAServiceTarget for the client side and the constructor for NameNode where it uses the logical name to translate and rewrite server-side conf.
> Add admin support for HA operations > --- > > Key: YARN-1068 > URL: https://issues.apache.org/jira/browse/YARN-1068 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Labels: ha > Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, > yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, > yarn-1068-prelim.patch > > > Support HA admin operations to facilitate transitioning the RM to Active and > Standby states.
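Bikas's point about logical names can be sketched concretely: given a logical cluster id and an RM id, the client derives an instance-specific configuration key to find each RM's admin address, in the spirit of HDFS's NNHAServiceTarget. The config key pattern below is invented for illustration and is not YARN's actual key layout:

```java
import java.util.Map;

// Hypothetical sketch of resolving an RM admin address from a logical name,
// analogous to NNHAServiceTarget. The key pattern here is illustrative only.
public class RMAdminTargetResolver {
    public static String resolveAdminAddress(Map<String, String> conf,
                                             String clusterId, String rmId) {
        // e.g. "yarn.resourcemanager.admin.address.<clusterId>.<rmId>" (invented pattern)
        String key = "yarn.resourcemanager.admin.address." + clusterId + "." + rmId;
        String addr = conf.get(key);
        if (addr == null) {
            throw new IllegalArgumentException("No admin address configured under " + key);
        }
        return addr;
    }
}
```

The same lookup lets the server side discover its own instance-specific settings: given its logical name, it selects the per-instance keys and rewrites its effective configuration, which is the NameNode-constructor pattern Bikas points to.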
[jira] [Updated] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1157: Attachment: YARN-1157.6.patch Adding more comments in RegisterApplicationMasterRequest and FinishApplicationMasterRequest. > ResourceManager UI has invalid tracking URL link for distributed shell > application > -- > > Key: YARN-1157 > URL: https://issues.apache.org/jira/browse/YARN-1157 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong > Fix For: 2.1.1-beta > > Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, > YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch, YARN-1157.6.patch > > > Submit a YARN distributed shell application and go to the ResourceManager web UI. The > application definitely appears. In the Tracking UI column, there will be a history > link. Click on that link. Instead of showing the application master web UI, an HTTP > error 500 appears.
[jira] [Commented] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776629#comment-13776629 ] Hadoop QA commented on YARN-1157: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604851/YARN-1157.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2003//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2003//console This message is automatically generated. 
> ResourceManager UI has invalid tracking URL link for distributed shell > application > -- > > Key: YARN-1157 > URL: https://issues.apache.org/jira/browse/YARN-1157 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong > Fix For: 2.1.1-beta > > Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, > YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch > > > Submit a YARN distributed shell application and go to the ResourceManager web UI. The > application definitely appears. In the Tracking UI column, there will be a history > link. Click on that link. Instead of showing the application master web UI, an HTTP > error 500 appears.
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776628#comment-13776628 ] Alejandro Abdelnur commented on YARN-1021: -- +1 > Yarn Scheduler Load Simulator > - > > Key: YARN-1021 > URL: https://issues.apache.org/jira/browse/YARN-1021 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf > > > The Yarn Scheduler is a fertile area of interest with different > implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, > several optimizations have also been made to improve scheduler performance for > different scenarios and workloads. Each scheduler algorithm has its own set of > features, and drives scheduling decisions by many factors, such as fairness, > capacity guarantee, resource availability, etc. It is very important to > evaluate a scheduler algorithm well before we deploy it in a production > cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling > algorithm. Evaluating in a real cluster is always time- and cost-consuming, > and it is also very hard to find a large-enough cluster. Hence, a simulator > which can predict how well a scheduler algorithm performs for some specific workload > would be quite useful. > We want to build a Scheduler Load Simulator to simulate large-scale Yarn > clusters and application loads in a single machine. This would be invaluable > in furthering Yarn by providing a tool for researchers and developers to > prototype new scheduler features and predict their behavior and performance > with a reasonable amount of confidence, thereby aiding rapid innovation. > The simulator will exercise the real Yarn ResourceManager, removing the > network factor by simulating NodeManagers and ApplicationMasters via handling > and dispatching NM/AM heartbeat events from within the same JVM. > To keep track of scheduler behavior and performance, a scheduler wrapper > will wrap the real scheduler. > The simulator will produce real-time metrics while executing, including: > * Resource usage for the whole cluster and each queue, which can be utilized to > configure cluster and queue capacity. > * The detailed application execution trace (recorded in relation to simulated > time), which can be analyzed to understand/validate the scheduler behavior > (individual jobs' turnaround time, throughput, fairness, capacity guarantee, > etc.). > * Several key metrics of the scheduler algorithm, such as the time cost of each > scheduler operation (allocate, handle, etc.), which can be utilized by Hadoop > developers to find hot spots and scalability limits. > The simulator will provide real-time charts showing the behavior of the > scheduler and its performance. > A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing > how to use the simulator to simulate the Fair Scheduler and Capacity Scheduler.
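The "scheduler wrapper" idea above, measuring the time cost of each scheduler operation without changing the scheduler itself, can be sketched as a simple decorator. The interface and class names here are invented for illustration, not the simulator's actual classes:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of YARN-1021's scheduler-wrapper idea: decorate a
// scheduler so each operation's latency is recorded for later analysis.
// Interface and names are hypothetical, not the actual simulator code.
public class TimedScheduler {
    interface Scheduler { void allocate(); }

    private final Scheduler inner;
    final List<Long> allocateNanos = new ArrayList<>(); // recorded latencies

    TimedScheduler(Scheduler inner) { this.inner = inner; }

    public void allocate() {
        long start = System.nanoTime();
        inner.allocate();                         // delegate to the real scheduler
        allocateNanos.add(System.nanoTime() - start);
    }
}
```

Because the wrapper only delegates and timestamps, the real scheduler's behavior is unchanged, which is what makes the collected allocate/handle latencies trustworthy as a measure of the algorithm itself.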
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776598#comment-13776598 ] Hadoop QA commented on YARN-1229: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604849/YARN-1229.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2002//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2002//console This message is automatically generated. 
> Shell$ExitCodeException could happen if AM fails to start > - > > Key: YARN-1229 > URL: https://issues.apache.org/jira/browse/YARN-1229 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.1-beta >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.1.1-beta > > Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch > > > I run sleep job. If AM fails to start, this exception could occur: > 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with > state FAILED due to: Application application_1379673267098_0020 failed 1 > times due to AM Container for appattempt_1379673267098_0020_01 exited > with exitCode: 1 due to: Exception from container-launch: > org.apache.hadoop.util.Shell$ExitCodeException: > /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: > line 12: export: > `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= > ': not a valid identifier > at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) > at org.apache.hadoop.util.Shell.run(Shell.java:379) > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > .Failing this attempt.. Failing the application. 
[jira] [Updated] (YARN-1157) ResourceManager UI has invalid tracking URL link for distributed shell application
[ https://issues.apache.org/jira/browse/YARN-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1157: Attachment: YARN-1157.5.patch Created the patch based on the latest trunk > ResourceManager UI has invalid tracking URL link for distributed shell > application > -- > > Key: YARN-1157 > URL: https://issues.apache.org/jira/browse/YARN-1157 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong > Fix For: 2.1.1-beta > > Attachments: YARN-1157.1.patch, YARN-1157.2.patch, YARN-1157.2.patch, > YARN-1157.3.patch, YARN-1157.4.patch, YARN-1157.5.patch > > > Submit a YARN distributed shell application and go to the ResourceManager Web UI. The application appears. In the Tracking UI column there will be a history link. Clicking on that link shows HTTP error 500 instead of the application master web UI.
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229: Attachment: YARN-1229.3.patch Changed the NM_AUX_SERVICE prefix to NodeManagerAuxService to eliminate the "_"
[jira] [Created] (YARN-1233) NodeManager doesn't renew krb5 creds
Allen Wittenauer created YARN-1233: -- Summary: NodeManager doesn't renew krb5 creds Key: YARN-1233 URL: https://issues.apache.org/jira/browse/YARN-1233 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Allen Wittenauer In 2.1.0-beta-rc1 (sorry, haven't upgraded yet) the NM is not renewing krb5 TGTs after they expire.
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776571#comment-13776571 ] Hadoop QA commented on YARN-1068: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604842/yarn-1068-7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2000//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2000//console This message is automatically generated. 
> Add admin support for HA operations > --- > > Key: YARN-1068 > URL: https://issues.apache.org/jira/browse/YARN-1068 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Labels: ha > Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, > yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, > yarn-1068-prelim.patch > > > Support HA admin operations to facilitate transitioning the RM to Active and > Standby states.
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776554#comment-13776554 ] Hadoop QA commented on YARN-1229: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604841/YARN-1229.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2001//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2001//console This message is automatically generated. 
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776546#comment-13776546 ] Bikas Saha commented on YARN-1229: -- base32 encoding is a good idea if we don't want to break compatibility. It basically boils down to that. Xuan, the AuxServiceHelper is still using the NM_AUX_SERVICE prefix, which has "_" in it.
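The failure and the fixes discussed above come down to POSIX shell identifier rules: the exported name `NM_AUX_SERVICE_mapreduce.shuffle` is invalid because of the dot. A minimal sketch of the check, with hypothetical helper names (this is not the actual YARN code; base32-encoding the service name, as Bikas suggests, is one collision-free alternative to the naive substitution shown here):

```java
public class EnvVarNames {
    // POSIX shell identifiers: letters, digits, and underscore only,
    // and the first character must not be a digit.
    static boolean isValidShellIdentifier(String name) {
        return name.matches("[A-Za-z_][A-Za-z0-9_]*");
    }

    // Naive workaround: replace every disallowed character with '_'.
    // (Distinct service names could collide after this substitution,
    // which is why an encoding such as base32 may be preferable.)
    static String sanitize(String name) {
        return name.replaceAll("[^A-Za-z0-9_]", "_");
    }

    public static void main(String[] args) {
        String raw = "NM_AUX_SERVICE_mapreduce.shuffle"; // name from the stack trace
        System.out.println(isValidShellIdentifier(raw)); // false: '.' is not allowed
        System.out.println(sanitize(raw));               // NM_AUX_SERVICE_mapreduce_shuffle
    }
}
```

Any name that fails this check will make bash reject the `export` line in launch_container.sh, which is exactly the "not a valid identifier" error in the report.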
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776529#comment-13776529 ] Xuan Gong commented on YARN-1229: - Ran the full YARN test suite; all YARN tests pass. Ran the full MAPREDUCE test suite; some tests in the mapred package have a timeout issue, which I do not think is caused by this patch.
[jira] [Updated] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1068: --- Attachment: yarn-1068-7.patch Thanks [~tucu00]. Updated patch to address the comment.
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229: Attachment: YARN-1229.2.patch Add a test case
[jira] [Commented] (YARN-1204) Need to add https port related property in Yarn
[ https://issues.apache.org/jira/browse/YARN-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776515#comment-13776515 ] Hudson commented on YARN-1204: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4462 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4462/]) YARN-1204. Added separate configuration properties for https for RM and NM without which servers enabled with https will also start on http ports. Contributed by Omkar Vinit Joshi. MAPREDUCE-5523. Added separate configuration properties for https for JHS without which even when https is enabled, it starts on http port itself. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525947) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/WebAppUtil.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/jobhistory/JHAdminConfig.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/v2/MiniMRYarnCluster.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/amfilter/AmFilterInitializer.java > Need to add https port related property in Yarn > --- > > Key: YARN-1204 > URL: https://issues.apache.org/jira/browse/YARN-1204 > Project: Hadoop YARN > Issue Type: Bug 
>Reporter: Yesha Vora >Assignee: Omkar Vinit Joshi > Attachments: YARN-1204.20131018.1.patch, YARN-1204.20131020.1.patch, > YARN-1204.20131020.2.patch, YARN-1204.20131020.3.patch, > YARN-1204.20131020.4.patch, YARN-1204.20131023.1.patch > > > There is no yarn property available to configure https port for Resource > manager, nodemanager and history server. Currently, Yarn services uses the > port defined for http [defined by > 'mapreduce.jobhistory.webapp.address','yarn.nodemanager.webapp.address', > 'yarn.resourcemanager.webapp.address'] for running services on https protocol. > Yarn should have list of property to assign https port for RM, NM and JHS. > It can be like below. > yarn.nodemanager.webapp.https.address > yarn.resourcemanager.webapp.https.address > mapreduce.jobhistory.webapp.https.address --
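For reference, the properties proposed in this issue could be configured as follows. The property names are the ones listed in the issue description; the host:port values below are examples only, not defaults:

```xml
<!-- yarn-site.xml: separate https addresses for RM and NM web UIs -->
<property>
  <name>yarn.resourcemanager.webapp.https.address</name>
  <value>rm.example.com:8090</value> <!-- example value -->
</property>
<property>
  <name>yarn.nodemanager.webapp.https.address</name>
  <value>nm.example.com:8044</value> <!-- example value -->
</property>
<!-- mapred-site.xml: https address for the job history server -->
<property>
  <name>mapreduce.jobhistory.webapp.https.address</name>
  <value>jhs.example.com:19890</value> <!-- example value -->
</property>
```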
[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy
[ https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776509#comment-13776509 ] Karthik Kambatla commented on YARN-1028: Using the configs introduced in YARN-1232, we should be able to retry alternate RMs by setting {{yarn.resourcemanager.ha.nodes.id}}. [~devaraj.k], I hope it is okay if I take this up. > Add FailoverProxyProvider like capability to RMProxy > > > Key: YARN-1028 > URL: https://issues.apache.org/jira/browse/YARN-1028 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Devaraj K > > RMProxy layer currently abstracts RM discovery and implements it by looking > up service information from configuration. Motivated by HDFS and using > existing classes from Common, we can add failover proxy providers that may > provide RM discovery in extensible ways.
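The retry-alternate-RMs idea in the comment above can be sketched as a round-robin selector over the configured RM ids. This is illustrative only, not the actual FailoverProxyProvider implementation; the class and method names are assumptions:

```java
import java.util.List;

// Hypothetical sketch: pick which RM id to contact on each retry attempt,
// cycling through the configured list (e.g. ids that would come from a
// setting like yarn.resourcemanager.ha.nodes.id).
public class RmFailoverSketch {
    static String nextRmId(List<String> rmIds, int attempt) {
        // Wrap around so that after the last RM fails we retry the first.
        return rmIds.get(attempt % rmIds.size());
    }

    public static void main(String[] args) {
        List<String> rmIds = List.of("rm1", "rm2");
        System.out.println(nextRmId(rmIds, 0)); // rm1
        System.out.println(nextRmId(rmIds, 1)); // rm2
        System.out.println(nextRmId(rmIds, 2)); // rm1 again
    }
}
```

A real provider would wrap each id's RPC address in a proxy and fail over on connection exceptions, as HDFS's ConfiguredFailoverProxyProvider does.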
[jira] [Commented] (YARN-1232) Configuration support for RM HA
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776508#comment-13776508 ] Karthik Kambatla commented on YARN-1232: Will post another patch that describes these configs in yarn-default.xml. Don't think we can have default values for these though. > Configuration support for RM HA > --- > > Key: YARN-1232 > URL: https://issues.apache.org/jira/browse/YARN-1232 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Labels: ha > Attachments: yarn-1232-1.patch > > > We should augment the configuration to allow users to specify two RMs and the > individual RPC addresses for them. This blocks > ConfiguredFailoverProxyProvider.
[jira] [Updated] (YARN-1232) Configuration support for RM HA
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1232: --- Attachment: yarn-1232-1.patch Patch that adds the configs to YarnConfiguration and hooks them up to RM startup and RMProxy implementation through HAUtil.
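A configuration along the lines discussed in this issue might look like the sketch below. Only {{yarn.resourcemanager.ha.nodes.id}} is mentioned in this thread; the per-RM address keys and all values are assumptions for illustration, and the final key names may differ:

```xml
<!-- Illustrative sketch only; key names were still under discussion in this JIRA. -->
<property>
  <name>yarn.resourcemanager.ha.nodes.id</name>
  <value>rm1,rm2</value> <!-- logical ids for the two RMs -->
</property>
<property>
  <name>yarn.resourcemanager.address.rm1</name> <!-- hypothetical per-RM key -->
  <value>rm1.example.com:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm2</name> <!-- hypothetical per-RM key -->
  <value>rm2.example.com:8032</value>
</property>
```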
[jira] [Created] (YARN-1232) Configuration support for RM HA
Karthik Kambatla created YARN-1232: -- Summary: Configuration support for RM HA Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla We should augment the configuration to allow users to specify two RMs and the individual RPC addresses for them. This blocks ConfiguredFailoverProxyProvider.
[jira] [Updated] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1089: Target Version/s: 2.3.0 (was: 2.1.1-beta) > Add YARN compute units alongside virtual cores > -- > > Key: YARN-1089 > URL: https://issues.apache.org/jira/browse/YARN-1089 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-1089-1.patch, YARN-1089.patch > > > Based on discussion in YARN-1024, we will add YARN compute units as a > resource for requesting and scheduling CPU processing power.
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776502#comment-13776502 ] Alejandro Abdelnur commented on YARN-1068: -- One nit: in the RMHAProtocolService, {{serviceStop()}} should be symmetric with the start, in the sense that it should do the {{if (haEnabled)}} check to stop the HAAdmin server (instead of doing this check in the HAAdmin service itself).
[jira] [Commented] (YARN-1089) Add YARN compute units alongside virtual cores
[ https://issues.apache.org/jira/browse/YARN-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776503#comment-13776503 ] Arun C Murthy commented on YARN-1089: - I don't think we should put this in branch-2.1 or target this for hadoop-2.2. This is a major new feature which can be implemented in a compatible manner - let's target this for 2.3.0.
[jira] [Updated] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1229: Attachment: YARN-1229.1.patch The attached patch changes mapreduce.shuffle to MapreduceShuffle. It also enforces the check (service name should contain only a-zA-Z0-9) in AuxServices. > Shell$ExitCodeException could happen if AM fails to start > - > > Key: YARN-1229 > URL: https://issues.apache.org/jira/browse/YARN-1229 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.1.1-beta >Reporter: Tassapol Athiapinya >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.1.1-beta > > Attachments: YARN-1229.1.patch > > > I run sleep job. If AM fails to start, this exception could occur: > 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with > state FAILED due to: Application application_1379673267098_0020 failed 1 > times due to AM Container for appattempt_1379673267098_0020_01 exited > with exitCode: 1 due to: Exception from container-launch: > org.apache.hadoop.util.Shell$ExitCodeException: > /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: > line 12: export: > `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= > ': not a valid identifier > at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) > at org.apache.hadoop.util.Shell.run(Shell.java:379) > at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > .Failing this attempt.. Failing the application.
[jira] [Commented] (YARN-1204) Need to add https port related property in Yarn
[ https://issues.apache.org/jira/browse/YARN-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776485#comment-13776485 ] Vinod Kumar Vavilapalli commented on YARN-1204: --- The latest patch looks good to me. +1. Checking this in. > Need to add https port related property in Yarn > --- > > Key: YARN-1204 > URL: https://issues.apache.org/jira/browse/YARN-1204 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Omkar Vinit Joshi > Attachments: YARN-1204.20131018.1.patch, YARN-1204.20131020.1.patch, > YARN-1204.20131020.2.patch, YARN-1204.20131020.3.patch, > YARN-1204.20131020.4.patch, YARN-1204.20131023.1.patch > > > There is no Yarn property available to configure the https port for the Resource > Manager, NodeManager and history server. Currently, Yarn services use the > port defined for http [defined by > 'mapreduce.jobhistory.webapp.address','yarn.nodemanager.webapp.address', > 'yarn.resourcemanager.webapp.address'] for running services over the https protocol. > Yarn should have a list of properties to assign https ports for the RM, NM and JHS. > It can be like below. > yarn.nodemanager.webapp.https.address > yarn.resourcemanager.webapp.https.address > mapreduce.jobhistory.webapp.https.address
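For illustration, the three proposed property names from the description could be configured in the relevant *-site.xml files like this. The property names are taken directly from the issue; the host:port values below are placeholders, not established defaults:

```xml
<!-- yarn-site.xml: https endpoints for RM and NM (placeholder ports) -->
<property>
  <name>yarn.resourcemanager.webapp.https.address</name>
  <value>0.0.0.0:8090</value>
</property>
<property>
  <name>yarn.nodemanager.webapp.https.address</name>
  <value>0.0.0.0:8044</value>
</property>

<!-- mapred-site.xml: https endpoint for the job history server (placeholder port) -->
<property>
  <name>mapreduce.jobhistory.webapp.https.address</name>
  <value>0.0.0.0:19890</value>
</property>
```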
[jira] [Commented] (YARN-1068) Add admin support for HA operations
[ https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776484#comment-13776484 ] Karthik Kambatla commented on YARN-1068: [~bikassaha], when you get a chance, can you review the latest patch?
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776441#comment-13776441 ] Hadoop QA commented on YARN-1021: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604818/YARN-1021.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1999//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1999//console This message is automatically generated. 
> Yarn Scheduler Load Simulator > - > > Key: YARN-1021 > URL: https://issues.apache.org/jira/browse/YARN-1021 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, > YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf > > > The Yarn Scheduler is a fertile area of interest with different > implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, > several optimizations have also been made to improve scheduler performance for > different scenarios and workloads. Each scheduler algorithm has its own set of > features, and drives scheduling decisions by many factors, such as fairness, > capacity guarantee, resource availability, etc. It is very important to > evaluate a scheduler algorithm thoroughly before we deploy it in a production > cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling > algorithm. Evaluating in a real cluster is always time- and cost-consuming, > and it is also very hard to find a large-enough cluster. Hence, a simulator > which can predict how well a scheduler algorithm performs for some specific workload > would be quite useful. > We want to build a Scheduler Load Simulator to simulate large-scale Yarn > clusters and application loads on a single machine. This would be invaluable > in furthering Yarn by providing a tool for researchers and developers to > prototype new scheduler features and predict their behavior and performance > with a reasonable amount of confidence, thereby aiding rapid innovation. 
> The simulator will exercise the real Yarn ResourceManager, removing the > network factor by simulating NodeManagers and ApplicationMasters via handling > and dispatching NM/AM heartbeat events from within the same JVM. > To keep track of scheduler behavior and performance, a scheduler wrapper > will wrap the real scheduler. > The simulator will produce real time metrics while executing, including: > * Resource usage for the whole cluster and each queue, which can be utilized to > configure the cluster's and each queue's capacity. > * The detailed application execution trace (recorded in relation to simulated > time), which can be analyzed to understand/validate the scheduler behavior > (individual jobs' turnaround time, throughput, fairness, capacity guarantee, > etc). > * Several key metrics of the scheduler algorithm, such as the time cost of each > scheduler operation (allocate, handle, etc), which can be utilized by Hadoop > developers to find hot spots and scalability limits. > The simulator will provide real time charts showing the behavior of the > scheduler and its performance. > A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing > how to use the simulator to simulate the Fair Scheduler and Capacity Scheduler.
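The "scheduler wrapper" idea described above — wrapping the real scheduler to record the time cost of each operation — can be sketched roughly as follows. All names here are illustrative stand-ins, not the SLS API; the point is only the timing-decorator pattern:

```java
import java.util.function.Supplier;

// Hypothetical sketch of a scheduler wrapper that times each wrapped
// operation (e.g. allocate, handle) and reports its duration.
public class TimingWrapper {
    // Run the given body, print how long it took, and return its result.
    static <T> T timed(String op, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            System.out.println(op + " took " + (System.nanoTime() - start) + " ns");
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real scheduler.allocate(...) call.
        int containers = timed("allocate", () -> 42);
        System.out.println("allocated: " + containers);
    }
}
```

A real wrapper would implement the scheduler interface and delegate every method this way, feeding the measured durations into the metrics output.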
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021: -- Attachment: (was: YARN-1021.pdf)
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021: -- Attachment: YARN-1021.pdf
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021: -- Attachment: YARN-1021.patch
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776367#comment-13776367 ] Alejandro Abdelnur commented on YARN-1021: -- [~ywskycn], we shouldn't use /tmp as that does not get cleaned up by the build; instead we should use a temp subdir under target/, easily done by: {code} File dir = new File("target", UUID.randomUUID().toString()); dir.mkdirs(); {code} Also, the documentation should have, in the appendix, a complete/simple example of an sls JSON input file as a reference.
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776364#comment-13776364 ] Wei Yan commented on YARN-1021: --- Updated the patch according to [~tucu00]'s latest comments. Also let the simulator support two types of inputs: (1) rumen traces, so users can feed their rumen traces directly to the simulator; (2) the simulator's own trace format (sls), which is much simpler and lets users easily generate various workloads. The simulator also has a tool to help users convert rumen traces to sls traces.
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776357#comment-13776357 ] Hadoop QA commented on YARN-1021: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604801/YARN-1021.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1998//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1998//console This message is automatically generated. 
> Yarn Scheduler Load Simulator
> -----------------------------
>
> Key: YARN-1021
> URL: https://issues.apache.org/jira/browse/YARN-1021
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: scheduler
> Reporter: Wei Yan
> Assignee: Wei Yan
> Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf
>
> The Yarn Scheduler is a fertile area of interest, with different implementations such as the Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations have been made to improve scheduler performance for different scenarios and workloads. Each scheduler algorithm has its own set of features and drives scheduling decisions by many factors, such as fairness, capacity guarantees, resource availability, etc. It is very important to evaluate a scheduler algorithm thoroughly before deploying it in a production cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling algorithm: evaluating in a real cluster is time-consuming and costly, and it is also very hard to find a large-enough cluster. Hence, a simulator that can predict how well a scheduler algorithm performs for a specific workload would be quite useful.
> We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads on a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation.
> The simulator will exercise the real Yarn ResourceManager, removing the network factor by simulating NodeManagers and ApplicationMasters and by handling and dispatching NM/AM heartbeat events from within the same JVM.
> To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler.
> The simulator will produce real-time metrics while executing, including:
> * Resource usage for the whole cluster and for each queue, which can be used to configure cluster and queue capacities.
> * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate scheduler behavior (individual job turnaround time, throughput, fairness, capacity guarantees, etc.).
> * Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc.), which Hadoop developers can use to find hot spots and scalability limits.
> The simulator will also provide real-time charts showing the behavior and performance of the scheduler.
> A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator with the Fair Scheduler and the Capacity Scheduler.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
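The "scheduler wrapper" described above can be sketched as a thin delegating layer that times each operation. This is a minimal illustration only; the interface, class, and method names here are hypothetical stand-ins, not the actual SLS classes.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical minimal scheduler interface, standing in for the real
// YARN scheduler API for the purpose of this sketch.
interface Scheduler {
    void allocate();
}

// The wrapper delegates to the real scheduler and records the time cost
// of each operation, which is the kind of metric the simulator reports.
class TimingSchedulerWrapper implements Scheduler {
    private final Scheduler real;
    private long lastAllocateNanos;

    TimingSchedulerWrapper(Scheduler real) {
        this.real = real;
    }

    @Override
    public void allocate() {
        long start = System.nanoTime();
        real.allocate();                                  // delegate to the wrapped scheduler
        lastAllocateNanos = System.nanoTime() - start;    // record the cost of this call
    }

    long lastAllocateMillis() {
        return TimeUnit.NANOSECONDS.toMillis(lastAllocateNanos);
    }
}

public class WrapperDemo {
    public static void main(String[] args) {
        // A fake "slow" scheduler whose allocate() takes a few milliseconds.
        Scheduler slow = () -> {
            try { Thread.sleep(5); } catch (InterruptedException ignored) { }
        };
        TimingSchedulerWrapper wrapper = new TimingSchedulerWrapper(slow);
        wrapper.allocate();
        System.out.println("allocate took ~" + wrapper.lastAllocateMillis() + " ms");
    }
}
```

The same pattern extends naturally to the other operations the description mentions (handle, etc.): each wrapper method times its delegate and feeds the measurement to a metrics sink.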
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776350#comment-13776350 ] Chris Nauroth commented on YARN-1229:

BTW, if we use {{[a-zA-Z_]+[a-zA-Z0-9_]*}}, then that will be compatible with Windows too. It looks like Windows actually allows many more characters than that, but I think it makes sense to stick to a minimal set that we expect to work cross-platform.

> Shell$ExitCodeException could happen if AM fails to start
> ---------------------------------------------------------
>
> Key: YARN-1229
> URL: https://issues.apache.org/jira/browse/YARN-1229
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.1.1-beta
> Reporter: Tassapol Athiapinya
> Assignee: Xuan Gong
> Priority: Blocker
> Fix For: 2.1.1-beta
>
> I run a sleep job. If the AM fails to start, this exception can occur:
> 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException:
> /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
> ': not a valid identifier
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
> at org.apache.hadoop.util.Shell.run(Shell.java:379)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> .Failing this attempt.. Failing the application.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
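The pattern proposed in the comment above can be exercised directly with java.util.regex. The dot in `NM_AUX_SERVICE_mapreduce.shuffle` is exactly what makes bash's `export` reject the name as "not a valid identifier", and the same character fails the proposed pattern. The class and method names below are hypothetical, for illustration only.

```java
import java.util.regex.Pattern;

public class EnvNameCheck {
    // The conservative cross-platform pattern proposed in the comment:
    // a leading letter or underscore, then letters, digits, or underscores.
    static final Pattern VALID = Pattern.compile("[a-zA-Z_]+[a-zA-Z0-9_]*");

    static boolean isValid(String name) {
        // matches() requires the whole string to match, so any stray
        // character (such as '.') makes validation fail.
        return VALID.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("NM_AUX_SERVICE_mapreduce.shuffle")); // false: '.' is not allowed
        System.out.println(isValid("NM_AUX_SERVICE_mapreduce_shuffle")); // true: underscores only
    }
}
```

Rejecting (or sanitizing) such names before they reach the generated launch_container.sh would avoid the Shell$ExitCodeException shown in the description.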
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021: Attachment: YARN-1021.patch

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1231) Fix test cases that will hit max-am-used-resources-percent limit after YARN-276
[ https://issues.apache.org/jira/browse/YARN-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776302#comment-13776302 ] Hadoop QA commented on YARN-1231:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604791/YARN-1231.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1997//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1997//console

This message is automatically generated.

> Fix test cases that will hit max-am-used-resources-percent limit after YARN-276
> -------------------------------------------------------------------------------
>
> Key: YARN-1231
> URL: https://issues.apache.org/jira/browse/YARN-1231
> Project: Hadoop YARN
> Issue Type: Task
> Affects Versions: 2.1.1-beta
> Reporter: Nemon Lou
> Assignee: Nemon Lou
> Labels: test
> Attachments: YARN-1231.patch
>
> Use a separate jira to fix YARN's test cases that will fail by hitting the max-am-used-resources-percent limit after YARN-276.

-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1231) Fix test cases that will hit max-am-used-resources-percent limit after YARN-276
[ https://issues.apache.org/jira/browse/YARN-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated YARN-1231: Attachment: YARN-1231.patch

A patch fixing test cases in hadoop-yarn-server-resourcemanager project.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776173#comment-13776173 ] Hadoop QA commented on YARN-1021:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604767/YARN-1021.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist: org.apache.hadoop.yarn.sls.TestSLSRunner
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1996//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1996//console

This message is automatically generated.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1021: Attachment: YARN-1021.patch

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776142#comment-13776142 ] Hadoop QA commented on YARN-1021:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604747/YARN-1021.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files.
{color:red}-1 javac{color}. The applied patch generated 1149 javac compiler warnings (more than the trunk's current 1145 warnings).
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-assemblies hadoop-tools/hadoop-sls hadoop-tools/hadoop-tools-dist.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1995//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1995//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1995//console

This message is automatically generated.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira