[jira] [Commented] (YARN-1578) Fix how to handle ApplicationHistory about the container
[ https://issues.apache.org/jira/browse/YARN-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866471#comment-13866471 ] Hadoop QA commented on YARN-1578: -

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622140/YARN-1578.patch against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2837//console

This message is automatically generated.

Fix how to handle ApplicationHistory about the container Key: YARN-1578 URL: https://issues.apache.org/jira/browse/YARN-1578 Project: Hadoop YARN Issue Type: Bug Affects Versions: YARN-321 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: YARN-1578.patch, screenshot.png

I ran a PiEstimator job on a Hadoop cluster with YARN-321 applied. After the job ended, accessing the HistoryServer web UI returned a 500 error, and the HistoryServer daemon log showed the following:
{code}
2014-01-09 13:31:12,227 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /applicationhistory/appattempt/appattempt_1389146249925_0008_01
java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
(snip...)
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.mergeContainerHistoryData(FileSystemApplicationHistoryStore.java:696)
  at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getContainers(FileSystemApplicationHistoryStore.java:429)
  at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationHistoryManagerImpl.java:201)
  at org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:110)
(snip...)
{code}
From the ApplicationHistory file, I confirmed there was a container that had never finished. According to the ResourceManager daemon log, the ResourceManager reserved this container but did not allocate it. Therefore, ApplicationHistory needs to change how it handles containers that are reserved but never allocated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
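The NPE above fires when the history store merges container data with a finish record that was never written. Below is a minimal sketch of the kind of null guard that would avoid the crash; the method name comes from the stack trace, but the data classes and fields are simplified assumptions, not the actual YARN-1578 patch.
{code}
// Sketch only: guard the merge against a missing finish record. The data
// classes here are simplified stand-ins for the YARN-321 branch records.
final class MergeSketch {
  static final class ContainerFinishData {
    long finishTime;
    String diagnostics;
    int exitStatus;
  }

  static final class ContainerHistoryData {
    long finishTime;
    String diagnostics;
    int exitStatus;
  }

  static void mergeContainerHistoryData(ContainerHistoryData history,
      ContainerFinishData finish) {
    if (finish == null) {
      // A container that was reserved but never allocated has no finish
      // record; keep the entry partially filled instead of throwing an NPE
      // that turns into a 500 on the web UI.
      return;
    }
    history.finishTime = finish.finishTime;
    history.diagnostics = finish.diagnostics;
    history.exitStatus = finish.exitStatus;
  }
}
{code}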
[jira] [Updated] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1577: - Target Version/s: 2.4.0 Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Jian He Assignee: Jian He Priority: Blocker Today the unmanaged AM client waits for the app state to be Accepted before launching the AM. This is broken since YARN-1493 changed the RM to start the attempt only after the application is Accepted. We may need to introduce an attempt state report that the client can rely on to query the attempt state and decide when to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
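To make the breakage concrete, here is an illustrative sketch (not code from the JIRA) of the wait loop an unmanaged AM client performs with the stock YarnClient API; after YARN-1493, reaching ACCEPTED no longer guarantees that an attempt, and therefore its AM credentials, exists.
{code}
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

// Illustrative sketch of the assumption that YARN-1493 broke: the client
// polls the app state and treats ACCEPTED as "the attempt exists, launch
// the unmanaged AM now".
final class UnmanagedAMWait {
  static ApplicationReport waitForAccepted(YarnClient client,
      ApplicationId appId) throws Exception {
    EnumSet<YarnApplicationState> submissionStates = EnumSet.of(
        YarnApplicationState.NEW, YarnApplicationState.NEW_SAVING,
        YarnApplicationState.SUBMITTED);
    ApplicationReport report = client.getApplicationReport(appId);
    while (submissionStates.contains(report.getYarnApplicationState())) {
      Thread.sleep(100);
      report = client.getApplicationReport(appId);
    }
    // ACCEPTED here no longer implies an attempt exists; an attempt-level
    // report (as proposed above) would be the reliable signal.
    return report;
  }
}
{code}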
[jira] [Updated] (YARN-1470) Add audience annotation to MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1470: -- Assignee: (was: Chen He) Add audience annotation to MiniYARNCluster -- Key: YARN-1470 URL: https://issues.apache.org/jira/browse/YARN-1470 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Sandy Ryza Labels: newbie We should make it clear whether this is a public interface. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1575) Public localizer crashes with Localized unkown resource
[ https://issues.apache.org/jira/browse/YARN-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1575: - Attachment: YARN-1575.patch YARN-1575.branch-0.23.patch Attaching a blunt way to solve the race condition, which is to synchronize the queueing and update of {{pending}}. This basically defeats the point of {{pending}} being a ConcurrentHashMap, so I updated it to a synchronized map since some unit tests are accessing it asynchronously. For 0.23 we are already synchronizing {{attempts}}, so I piggy-backed the synchronization on that variable. Public localizer crashes with Localized unkown resource - Key: YARN-1575 URL: https://issues.apache.org/jira/browse/YARN-1575 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 0.23.10, 2.2.0 Reporter: Jason Lowe Priority: Critical Attachments: YARN-1575.branch-0.23.patch, YARN-1575.patch The public localizer can crash with the error:
{noformat}
2014-01-08 14:11:43,212 [Thread-467] ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
2014-01-08 14:11:43,212 [Thread-467] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
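For readers not following the patch, here is a compressed sketch of the race and the fix described above, with simplified types; it is not the patch verbatim, and the real code lives in ResourceLocalizationService's public localizer.
{code}
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Sketch of the approach described above: queue the download and record it
// in `pending` under one lock, so the thread that processes completed
// Futures can never see a Future that is missing from the map.
class PublicLocalizerSketch {
  private final ExecutorService threadPool;
  // A synchronized map instead of a ConcurrentHashMap, since unit tests (and
  // the completion thread) access it while requests are being queued.
  private final Map<Future<Path>, String> pending =
      Collections.synchronizedMap(new HashMap<Future<Path>, String>());

  PublicLocalizerSketch(ExecutorService threadPool) {
    this.threadPool = threadPool;
  }

  void addResource(String resource, Callable<Path> download) {
    synchronized (pending) {
      // Submit and record atomically; without this, the completion handler
      // can win the race and report the "Localized unkonwn resource" error.
      pending.put(threadPool.submit(download), resource);
    }
  }

  void handleCompleted(Future<Path> done) {
    String resource = pending.remove(done);
    if (resource == null) {
      throw new IllegalStateException("Localized unknown resource to " + done);
    }
  }
}
{code}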
[jira] [Created] (YARN-1580) Documentation error regarding container-allocation.expiry-interval-ms
German Florez-Larrahondo created YARN-1580: -- Summary: Documentation error regarding container-allocation.expiry-interval-ms Key: YARN-1580 URL: https://issues.apache.org/jira/browse/YARN-1580 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.2.0 Environment: CentOS 6.4 Reporter: German Florez-Larrahondo Priority: Trivial While trying to control settings related to the expiration of tokens for long-running jobs, based on the documentation (http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml) I attempted to increase values for yarn.rm.container-allocation.expiry-interval-ms without luck. Looking at code like YarnConfiguration.java, I noticed that in recent versions all these kinds of settings now have the prefix yarn.resourcemanager.rm as opposed to yarn.rm. So for this specific case the setting of interest is yarn.resourcemanager.rm.container-allocation.expiry-interval-ms. I suppose there are other documentation errors similar to this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
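Assuming the reporter's reading of YarnConfiguration.java is right, the working override would look like this in yarn-site.xml (the value shown is an arbitrary example, not a recommendation):
{code}
<!-- yarn-site.xml: example override using the renamed prefix; 900000 ms is
     an arbitrary example value. -->
<property>
  <name>yarn.resourcemanager.rm.container-allocation.expiry-interval-ms</name>
  <value>900000</value>
</property>
{code}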
[jira] [Created] (YARN-1581) Fair Scheduler: containers that get reserved create container token too early
German Florez-Larrahondo created YARN-1581: -- Summary: Fair Scheduler: containers that get reserved create container token too early Key: YARN-1581 URL: https://issues.apache.org/jira/browse/YARN-1581 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: CentOS 6.4 Reporter: German Florez-Larrahondo Priority: Minor When using the FairScheduler with some slow servers and long running jobs I hit (what I believe is) the same issue reported on https://issues.apache.org/jira/browse/YARN-180 , which talked specifically about the CapacityScheduler: In my case, with the FairScheduler I am seeing those corner cases when the NodeManager is finally ready to start a container that was reserved but the token had already expired. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1575) Public localizer crashes with Localized unkown resource
[ https://issues.apache.org/jira/browse/YARN-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866767#comment-13866767 ] Hadoop QA commented on YARN-1575: -

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622200/YARN-1575.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2838//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2838//console

This message is automatically generated.

Public localizer crashes with Localized unkown resource - Key: YARN-1575 URL: https://issues.apache.org/jira/browse/YARN-1575 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 0.23.10, 2.2.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Attachments: YARN-1575.branch-0.23.patch, YARN-1575.patch The public localizer can crash with the error:
{noformat}
2014-01-08 14:11:43,212 [Thread-467] ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
2014-01-08 14:11:43,212 [Thread-467] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1581) Fair Scheduler: containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] German Florez-Larrahondo updated YARN-1581: --- Description: When using the FairScheduler with long running jobs and over-subscribed servers I hit (what I believe is) the same issue reported on https://issues.apache.org/jira/browse/YARN-180 , which talked specifically about the CapacityScheduler: In my case, with the FairScheduler I am seeing those corner cases when the NodeManager is finally ready to start a container that was reserved but the token had already expired. was: When using the FairScheduler with some slow servers and long running jobs I hit (what I believe is) the same issue reported on https://issues.apache.org/jira/browse/YARN-180 , which talked specifically about the CapacityScheduler: In my case, with the FairScheduler I am seeing those corner cases when the NodeManager is finally ready to start a container that was reserved but the token had already expired. Fair Scheduler: containers that get reserved create container token too early Key: YARN-1581 URL: https://issues.apache.org/jira/browse/YARN-1581 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: CentOS 6.4 Reporter: German Florez-Larrahondo Priority: Minor When using the FairScheduler with long running jobs and over-subscribed servers I hit (what I believe is) the same issue reported on https://issues.apache.org/jira/browse/YARN-180 , which talked specifically about the CapacityScheduler: In my case, with the FairScheduler I am seeing those corner cases when the NodeManager is finally ready to start a container that was reserved but the token had already expired. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1581) Fair Scheduler: containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] German Florez-Larrahondo updated YARN-1581: --- Attachment: ProcessInfo construction failing.jpg Fair Scheduler: containers that get reserved create container token too early Key: YARN-1581 URL: https://issues.apache.org/jira/browse/YARN-1581 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: CentOS 6.4 Reporter: German Florez-Larrahondo Priority: Minor Attachments: ProcessInfo construction failing.jpg When using the FairScheduler with long running jobs and over-subscribed servers I hit (what I believe is) the same issue reported on https://issues.apache.org/jira/browse/YARN-180 , which talked specifically about the CapacityScheduler: In my case, with the FairScheduler I am seeing those corner cases when the NodeManager is finally ready to start a container that was reserved but the token had already expired. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1581) Fair Scheduler: containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] German Florez-Larrahondo updated YARN-1581: --- Attachment: (was: ProcessInfo construction failing.jpg) Fair Scheduler: containers that get reserved create container token too early Key: YARN-1581 URL: https://issues.apache.org/jira/browse/YARN-1581 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: CentOS 6.4 Reporter: German Florez-Larrahondo Priority: Minor When using the FairScheduler with long running jobs and over-subscribed servers I hit (what I believe is) the same issue reported on https://issues.apache.org/jira/browse/YARN-180 , which talked specifically about the CapacityScheduler: In my case, with the FairScheduler I am seeing those corner cases when the NodeManager is finally ready to start a container that was reserved but the token had already expired. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1413) [YARN-321] AHS WebUI should serve aggregated logs as well
[ https://issues.apache.org/jira/browse/YARN-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866844#comment-13866844 ] Zhijie Shen commented on YARN-1413: --- Checked RMContainerImpl again. It's wrong to set logURL to the aggregated log link in LaunchedTransition. It should be kept unchanged, pointing to the NM webpage that shows the log of the running container. The logURL should be updated to the aggregated log link only when the container is finished. See my //TODO comments in FinishedTransition. [YARN-321] AHS WebUI should serve aggregated logs as well -- Key: YARN-1413 URL: https://issues.apache.org/jira/browse/YARN-1413 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Mayank Bansal Fix For: YARN-321 Attachments: YARN-1413-1.patch, YARN-1413-2.patch, YARN-1413-3.patch, YARN-1413-4.patch, YARN-1413-5.patch, YARN-1413-6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
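A standalone sketch of the rule described in the comment above: which log URL a container record should expose, depending on whether the container has finished. The URL formats are illustrative placeholders, not the exact paths used by the NM or AHS web apps.
{code}
import org.apache.hadoop.yarn.api.records.ContainerState;

// Sketch: running containers point at the NM web page; finished containers
// point at the aggregated logs. URL shapes are illustrative only.
final class ContainerLogUrlSketch {
  static String logUrlFor(ContainerState state, String nmHttpAddress,
      String containerId, String user) {
    if (state == ContainerState.COMPLETE) {
      // Only after the container finishes is the aggregated log the right
      // target (see the //TODO in FinishedTransition mentioned above).
      return "http://<ahs-address>/applicationhistory/logs/" + containerId;
    }
    // While running, keep pointing at the NM page for the live container.
    return "http://" + nmHttpAddress + "/node/containerlogs/" + containerId
        + "/" + user;
  }
}
{code}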
[jira] [Commented] (YARN-1138) yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows
[ https://issues.apache.org/jira/browse/YARN-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866848#comment-13866848 ] Chris Nauroth commented on YARN-1138: - Hi, Chuan. The patch looks good. It's a similar approach to MAPREDUCE-5442. Two minor things:
# There is some indentation by 3 spaces instead of 2 around the property in yarn-default.xml. The indentation was already off in the current code, but would you mind fixing it as part of this patch?
# It's no longer possible for someone to look at yarn-default.xml and see the default classpath. In MAPREDUCE-5442, we worked around this by adding extra documentation in the description field. Would you mind doing the same here? Here is what we ended up with for {{mapreduce.application.classpath}} in mapred-default.xml:
{code}
<property>
  <description>CLASSPATH for MR applications. A comma-separated list of
  CLASSPATH entries. If mapreduce.application.framework is set then this
  must specify the appropriate classpath for that archive, and the name of
  the archive must be present in the classpath.
  When this value is empty, the following default CLASSPATH for MR
  applications would be used.
  For Linux:
  $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
  $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*.
  For Windows:
  %HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,
  %HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*.
  </description>
  <name>mapreduce.application.classpath</name>
  <value></value>
</property>
{code}
yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows --- Key: YARN-1138 URL: https://issues.apache.org/jira/browse/YARN-1138 Project: Hadoop YARN Issue Type: Bug Reporter: Yingda Chen Assignee: Chuan Liu Fix For: 3.0.0 Attachments: YARN-1138.patch yarn-default.xml has the yarn.application.classpath entry set to $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*. It does not work on Windows, which needs to be fixed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
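For YARN, the analogous yarn-default.xml entry could read along these lines; this is only a sketch of the requested documentation, and the final wording and default list are whatever the committed patch uses:
{code}
<property>
  <description>CLASSPATH for YARN applications. A comma-separated list of
  CLASSPATH entries. When this value is empty, the following default
  CLASSPATH for YARN applications would be used.
  For Linux:
  $HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, ...
  For Windows:
  %HADOOP_CONF_DIR%, %HADOOP_COMMON_HOME%/share/hadoop/common/*, ...
  </description>
  <name>yarn.application.classpath</name>
  <value></value>
</property>
{code}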
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866845#comment-13866845 ] Alejandro Abdelnur commented on YARN-888: - [~vinodvk], While having dependencies in non-leaf POMs may reduce the size of leaf POMs, it drags in non-required dependencies (unless you only put in non-leaf POMs dependencies that are common to all the leaf modules). Yes, IntelliJ seems to get funny with dependencies in non-leaf modules. That is one of the motivations (agreed it is an IntelliJ issue; on the other hand the change does not affect the project build at all and allows IntelliJ users to build/debug from the IDE out of the box without doing funny voodoo). The other motivation, and IMO the more important one, is to clean up the dependencies that modules like yarn-api and yarn-client have, restricting them to what is used on the client side. Using the dependency:tree and dependency:analyze plugins I’ve reduced the 3rd party JARs required by the clients significantly. As [~ste...@apache.org] pointed out there is much more work we should do in this direction; this is a first non-intrusive baby step. To give you an idea, before this patch *hadoop-yarn-api* reports as required dependencies by itself:
{code}
+- org.slf4j:slf4j-api:jar:1.7.5:compile
+- org.slf4j:slf4j-log4j12:jar:1.7.5:compile
|  \- log4j:log4j:jar:1.2.17:compile
+- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
|  +- tomcat:jasper-compiler:jar:5.5.23:test
+- com.google.inject.extensions:guice-servlet:jar:3.0:compile
+- io.netty:netty:jar:3.6.2.Final:compile
+- com.google.protobuf:protobuf-java:jar:2.5.0:compile
+- commons-io:commons-io:jar:2.4:compile
+- com.google.inject:guice:jar:3.0:compile
|  +- javax.inject:javax.inject:jar:1:compile
|  \- aopalliance:aopalliance:jar:1.0:compile
+- com.sun.jersey:jersey-server:jar:1.9:compile
|  +- asm:asm:jar:3.2:compile
|  \- com.sun.jersey:jersey-core:jar:1.9:compile
+- com.sun.jersey:jersey-json:jar:1.9:compile
|  +- org.codehaus.jettison:jettison:jar:1.1:compile
|  |  \- stax:stax-api:jar:1.0.1:compile
|  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
|  |  \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
|  |     \- javax.activation:activation:jar:1.1:compile
|  +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
|  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
|  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.8:compile (version managed from 1.8.3)
|  \- org.codehaus.jackson:jackson-xc:jar:1.8.8:compile (version managed from 1.8.3)
\- com.sun.jersey.contribs:jersey-guice:jar:1.9:compile
{code}
With the patch, the required dependencies by itself are down to:
{code}
+- commons-lang:commons-lang:jar:2.6:compile
+- com.google.guava:guava:jar:11.0.2:compile
|  \- com.google.code.findbugs:jsr305:jar:1.3.9:compile
+- commons-logging:commons-logging:jar:1.1.3:compile
+- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
\- com.google.protobuf:protobuf-java:jar:2.5.0:compile
{code}
Does this address your concerns? clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules like in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
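For concreteness, the convention being proposed looks roughly like this (artifact list illustrative; versions stay managed by the parent's dependencyManagement): the intermediate 'pom' module declares no dependencies, and each leaf module lists exactly what it uses.
{code}
<!-- Leaf module (e.g., hadoop-yarn-api) declares its own minimal set;
     the intermediate 'pom' module has no <dependencies> section at all. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-annotations</artifactId>
  </dependency>
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
  </dependency>
  <dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
  </dependency>
</dependencies>
{code}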
[jira] [Comment Edited] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866845#comment-13866845 ] Alejandro Abdelnur edited comment on YARN-888 at 1/9/14 5:50 PM: - [~vinodkv], While having dependencies in non-leaf POMs may reduce the size of leaf POMs, it drags in non-required dependencies (unless you only put in non-leaf POMs dependencies that are common to all the leaf modules). Yes, IntelliJ seems to get funny with dependencies in non-leaf modules. That is one of the motivations (agreed it is an IntelliJ issue; on the other hand the change does not affect the project build at all and allows IntelliJ users to build/debug from the IDE out of the box without doing funny voodoo). The other motivation, and IMO the more important one, is to clean up the dependencies that modules like yarn-api and yarn-client have, restricting them to what is used on the client side. Using the dependency:tree and dependency:analyze plugins I’ve reduced the 3rd party JARs required by the clients significantly. As [~ste...@apache.org] pointed out there is much more work we should do in this direction; this is a first non-intrusive baby step. To give you an idea, before this patch *hadoop-yarn-api* reports as required dependencies by itself:
{code}
+- org.slf4j:slf4j-api:jar:1.7.5:compile
+- org.slf4j:slf4j-log4j12:jar:1.7.5:compile
|  \- log4j:log4j:jar:1.2.17:compile
+- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
|  +- tomcat:jasper-compiler:jar:5.5.23:test
+- com.google.inject.extensions:guice-servlet:jar:3.0:compile
+- io.netty:netty:jar:3.6.2.Final:compile
+- com.google.protobuf:protobuf-java:jar:2.5.0:compile
+- commons-io:commons-io:jar:2.4:compile
+- com.google.inject:guice:jar:3.0:compile
|  +- javax.inject:javax.inject:jar:1:compile
|  \- aopalliance:aopalliance:jar:1.0:compile
+- com.sun.jersey:jersey-server:jar:1.9:compile
|  +- asm:asm:jar:3.2:compile
|  \- com.sun.jersey:jersey-core:jar:1.9:compile
+- com.sun.jersey:jersey-json:jar:1.9:compile
|  +- org.codehaus.jettison:jettison:jar:1.1:compile
|  |  \- stax:stax-api:jar:1.0.1:compile
|  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
|  |  \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
|  |     \- javax.activation:activation:jar:1.1:compile
|  +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
|  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
|  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.8:compile (version managed from 1.8.3)
|  \- org.codehaus.jackson:jackson-xc:jar:1.8.8:compile (version managed from 1.8.3)
\- com.sun.jersey.contribs:jersey-guice:jar:1.9:compile
{code}
With the patch, the required dependencies by itself are down to:
{code}
+- commons-lang:commons-lang:jar:2.6:compile
+- com.google.guava:guava:jar:11.0.2:compile
|  \- com.google.code.findbugs:jsr305:jar:1.3.9:compile
+- commons-logging:commons-logging:jar:1.1.3:compile
+- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
\- com.google.protobuf:protobuf-java:jar:2.5.0:compile
{code}
Does this address your concerns?

was (Author: tucu00): [~vinodvk], While having dependencies in non-leaf POMs may reduce the size of leaf POMs, it drags in non-required dependencies (unless you only put in non-leaf POMs dependencies that are common to all the leaf modules). Yes, IntelliJ seems to get funny with dependencies in non-leaf modules. That is one of the motivations (agreed it is an IntelliJ issue; on the other hand the change does not affect the project build at all and allows IntelliJ users to build/debug from the IDE out of the box without doing funny voodoo). The other motivation, and IMO the more important one, is to clean up the dependencies that modules like yarn-api and yarn-client have, restricting them to what is used on the client side. Using the dependency:tree and dependency:analyze plugins I’ve reduced the 3rd party JARs required by the clients significantly. As [~ste...@apache.org] pointed out there is much more work we should do in this direction; this is a first non-intrusive baby step. To give you an idea, before this patch *hadoop-yarn-api* reports as required dependencies by itself:
{code}
+- org.slf4j:slf4j-api:jar:1.7.5:compile
+- org.slf4j:slf4j-log4j12:jar:1.7.5:compile
|  \- log4j:log4j:jar:1.2.17:compile
+- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
|  +- tomcat:jasper-compiler:jar:5.5.23:test
+- com.google.inject.extensions:guice-servlet:jar:3.0:compile
+- io.netty:netty:jar:3.6.2.Final:compile
+- com.google.protobuf:protobuf-java:jar:2.5.0:compile
+- commons-io:commons-io:jar:2.4:compile
+- com.google.inject:guice:jar:3.0:compile
|  +- javax.inject:javax.inject:jar:1:compile
|  \- aopalliance:aopalliance:jar:1.0:compile
+- com.sun.jersey:jersey-server:jar:1.9:compile
|  +- asm:asm:jar:3.2:compile
|  \- com.sun.jersey:jersey-core:jar:1.9:compile
+-
[jira] [Updated] (YARN-1574) When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once
[ https://issues.apache.org/jira/browse/YARN-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1574: Attachment: YARN-1574.4.patch When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once Key: YARN-1574 URL: https://issues.apache.org/jira/browse/YARN-1574 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1574.1.patch, YARN-1574.1.patch, YARN-1574.2.patch, YARN-1574.2.patch, YARN-1574.3.patch, YARN-1574.4.patch Currently, we have moved the rmDispatcher out of ActiveService, but we still register the event dispatchers, such as schedulerDispatcher and RMAppEventDispatcher, when we initialize the ActiveService. Almost every time we transition the RM from Active to Standby, we need to initialize the ActiveService. That means we register the same event dispatchers again, which causes the same event to be handled several times. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1574) When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once
[ https://issues.apache.org/jira/browse/YARN-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866855#comment-13866855 ] Xuan Gong commented on YARN-1574: - Thanks for the review. The new patch addresses all the latest comments. When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once Key: YARN-1574 URL: https://issues.apache.org/jira/browse/YARN-1574 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1574.1.patch, YARN-1574.1.patch, YARN-1574.2.patch, YARN-1574.2.patch, YARN-1574.3.patch, YARN-1574.4.patch Currently, we have moved the rmDispatcher out of ActiveService, but we still register the event dispatchers, such as schedulerDispatcher and RMAppEventDispatcher, when we initialize the ActiveService. Almost every time we transition the RM from Active to Standby, we need to initialize the ActiveService. That means we register the same event dispatchers again, which causes the same event to be handled several times. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
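The invariant the patch needs to enforce can be shown with a simplified dispatcher; this is a sketch, not YARN's AsyncDispatcher: re-initializing the active services must not register a second handler for an event type that already has one.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the required invariant: one handler per event type, no matter
// how many Active<->Standby transitions re-run service initialization.
final class OnceOnlyDispatcher {
  interface EventHandler<T> {
    void handle(T event);
  }

  private final Map<Class<?>, EventHandler<?>> handlers =
      new ConcurrentHashMap<Class<?>, EventHandler<?>>();

  <T> void register(Class<T> eventType, EventHandler<T> handler) {
    // putIfAbsent keeps the first registration; a second call from a later
    // transition is a no-op instead of a duplicate handler.
    handlers.putIfAbsent(eventType, handler);
  }

  @SuppressWarnings("unchecked")
  <T> void dispatch(Class<T> eventType, T event) {
    EventHandler<T> handler = (EventHandler<T>) handlers.get(eventType);
    if (handler != null) {
      handler.handle(event); // exactly one handler, so one handling per event
    }
  }
}
{code}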
[jira] [Commented] (YARN-1567) In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload
[ https://issues.apache.org/jira/browse/YARN-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866877#comment-13866877 ] Sandy Ryza commented on YARN-1567: -- The Fair Scheduler does not have a notion of stopped queues. While there is a lot in queue definitions and behavior that we can consolidate, I don't think that we should use consolidation as the sole basis for stopping new features. Is there a reason that the way the Capacity Scheduler works is fundamentally incompatible with changing queues from leaf to parent? Queues have a lot in common once they exist, but the way queues are configured, loaded, and managed may differ to the point of being irreconcilable between the Capacity Scheduler and the Fair Scheduler. Also, I realized I left out the motivation for this. The use case behind this is the following: Sally gets a leaf queue created for her division, SallysQueue. As usage starts to ramp up, her teams start complaining that their workloads are stepping on each other's toes. She would like to be able to divide up the resources allocated to her division between her teams without restarting the RM. In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload - Key: YARN-1567 URL: https://issues.apache.org/jira/browse/YARN-1567 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1567-1.patch, YARN-1567.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
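In allocation-file terms, the reload being asked for is just this (queue names from the example above; standard Fair Scheduler nested-queue syntax): SallysQueue starts as a leaf, then the file is edited and reloaded, without an RM restart, to make it a parent.
{code}
<!-- Before: SallysQueue is a leaf. After editing and reload, it becomes a
     parent of per-team subqueues. -->
<allocations>
  <queue name="SallysQueue">
    <queue name="teamA"/>
    <queue name="teamB"/>
  </queue>
</allocations>
{code}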
[jira] [Commented] (YARN-1576) Allow setting RPC timeout in ApplicationClientProtocolPBClientImpl
[ https://issues.apache.org/jira/browse/YARN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866889#comment-13866889 ] Sandy Ryza commented on YARN-1576: -- This tracks the YARN changes that will be necessary for MAPREDUCE-5707. Allow setting RPC timeout in ApplicationClientProtocolPBClientImpl -- Key: YARN-1576 URL: https://issues.apache.org/jira/browse/YARN-1576 Project: Hadoop YARN Issue Type: Improvement Components: client Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866923#comment-13866923 ] Vinod Kumar Vavilapalli commented on YARN-888: -- Makes sense now. I think I understand the main trouble point. I think this is the one:
{quote}
unless you only put in non-leaf POMs dependencies that are common to all the leaf modules.
{quote}
Today we put any dependency needed by *any* leaf module in the non-leaf module. That clearly adds a lot of burden on leaves that don't need those libs. Am I understanding that correctly? If so, yeah, let's go for it. At the end of the day, the deps that users see need to be as clean as possible. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules like in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866925#comment-13866925 ] Vinod Kumar Vavilapalli commented on YARN-888: -- Further, maybe we should also do a hybrid by putting *only* common stuff in the non-leaf modules and anything else in the leaves? clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules like in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866945#comment-13866945 ] Alejandro Abdelnur commented on YARN-888: - You got it. On the hybrid approach, it is quite cumbersome as you would have to verify that all child modules use the common dependency being added. IMO, leaving the non-leaf modules slim will be much easier to handle. Plus, we solve the problem for IntelliJ IDE users. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules like in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (YARN-915) Apps Completed metrics on web UI is not correct after RM restart
[ https://issues.apache.org/jira/browse/YARN-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-915. -- Resolution: Duplicate Apps Completed metrics on web UI is not correct after RM restart Key: YARN-915 URL: https://issues.apache.org/jira/browse/YARN-915 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: screen shot.png Negative Apps Completed metrics can show on the web UI. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-915) Apps Completed metrics on web UI is not correct after RM restart
[ https://issues.apache.org/jira/browse/YARN-915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866972#comment-13866972 ] Jian He commented on YARN-915: -- Closed as a duplicate of YARN-1166. Apps Completed metrics on web UI is not correct after RM restart Key: YARN-915 URL: https://issues.apache.org/jira/browse/YARN-915 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: screen shot.png Negative Apps Completed metrics can show on the web UI. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1138) yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows
[ https://issues.apache.org/jira/browse/YARN-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated YARN-1138: Attachment: YARN-1138.2.patch Sounds good! Here is a new patch that addresses the two issues. yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows --- Key: YARN-1138 URL: https://issues.apache.org/jira/browse/YARN-1138 Project: Hadoop YARN Issue Type: Bug Reporter: Yingda Chen Assignee: Chuan Liu Fix For: 3.0.0 Attachments: YARN-1138.2.patch, YARN-1138.patch yarn-default.xml has the yarn.application.classpath entry set to $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*. It does not work on Windows, which needs to be fixed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1138) yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows
[ https://issues.apache.org/jira/browse/YARN-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-1138: Component/s: api Target Version/s: 3.0.0, 2.3.0 Affects Version/s: 2.2.0 Fix Version/s: (was: 3.0.0) Hadoop Flags: Reviewed +1 for the patch, pending a Jenkins run with the new version. I plan to commit this later today. yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows --- Key: YARN-1138 URL: https://issues.apache.org/jira/browse/YARN-1138 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.2.0 Reporter: Yingda Chen Assignee: Chuan Liu Attachments: YARN-1138.2.patch, YARN-1138.patch yarn-default.xml has the yarn.application.classpath entry set to $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*. It does not work on Windows, which needs to be fixed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-45: Issue Type: Bug (was: Sub-task) Parent: (was: YARN-397) Scheduler feedback to AM to release containers -- Key: YARN-45 URL: https://issues.apache.org/jira/browse/YARN-45 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Chris Douglas Assignee: Carlo Curino Fix For: 2.1.0-beta Attachments: YARN-45.1.patch, YARN-45.patch, YARN-45.patch, YARN-45.patch, YARN-45.patch, YARN-45.patch, YARN-45.patch, YARN-45_design_thoughts.pdf The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed, or reserved, to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers. [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-567: - Issue Type: Bug (was: Sub-task) Parent: (was: YARN-397) RM changes to support preemption for FairScheduler and CapacityScheduler Key: YARN-567 URL: https://issues.apache.org/jira/browse/YARN-567 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.1.0-beta Attachments: YARN-567.patch, YARN-567.patch A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties. FairScheduler and CapacityScheduler take opposite stances on how to achieve this. The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work but risking keeping the cluster underutilized or having jobs wait to obtain their rightful capacity. By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), and a scheduler that can issue preemption requests (discussed in the separate JIRAs YARN-568 and YARN-569). The changes we track with this JIRA are common to FairScheduler and CapacityScheduler, and are mostly the propagation of preemption decisions through the ApplicationMastersService. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-567: - Issue Type: Sub-task (was: Bug) Parent: YARN-45 RM changes to support preemption for FairScheduler and CapacityScheduler Key: YARN-567 URL: https://issues.apache.org/jira/browse/YARN-567 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.1.0-beta Attachments: YARN-567.patch, YARN-567.patch A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties. FairScheduler and CapacityScheduler take opposite stances on how to achieve this. The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work but risking keeping the cluster underutilized or having jobs wait to obtain their rightful capacity. By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), and a scheduler that can issue preemption requests (discussed in the separate JIRAs YARN-568 and YARN-569). The changes we track with this JIRA are common to FairScheduler and CapacityScheduler, and are mostly the propagation of preemption decisions through the ApplicationMastersService. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-569: - Issue Type: Bug (was: Sub-task) Parent: (was: YARN-397) CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.1.0-beta Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, YARN-569.1.patch, YARN-569.10.patch, YARN-569.11.patch, YARN-569.2.patch, YARN-569.3.patch, YARN-569.4.patch, YARN-569.5.patch, YARN-569.6.patch, YARN-569.8.patch, YARN-569.9.patch, YARN-569.patch, YARN-569.patch, preemption.2.patch There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a Capacity Monitor, which can be run optionally as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similarly to equivalent functionality in the fair scheduler) runs on intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine if preemption is needed and how best to edit the current schedule to improve capacity, and generates events that produce four possible actions:
# Container de-reservations
# Resource-based preemptions
# Container-based preemptions
# Container killing
The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that due to the lag in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations.

- Preemption policy (ProportionalCapacityPreemptionPolicy): -

Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows:
# it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*)
# if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**)
# it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round)
# it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order)
# it removes reservations from the most recently assigned app until the amount of resources to reclaim is obtained, or until no more reservations exist
# (if not enough) it issues preemptions for containers from the same applications (reverse chronological order, last assigned container first), again as needed, or until no containers except the AM container are left
# (if not enough) it moves on to unreserve and preempt from the next application.
# containers that have been asked to preempt are tracked across executions. If a container has been among the ones to be preempted for more than a certain time, it is moved to a list of containers to be forcibly killed.

Notes: (*) at the moment, in order to avoid double-counting of the requests, we only look at the ANY part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. (**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among queues (that want some) as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue and the function runs to a fixed point.

Tunables of the ProportionalCapacityPreemptionPolicy:
# observe-only mode (i.e., log the actions it would take, but behave as read-only)
# how frequently to run the policy
# how long to wait between preemption and kill of a container
# which fraction of the containers I would like to obtain should I preempt
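A compressed sketch of the policy loop described above, with the scheduler types simplified away (the real policy operates on the CapacityScheduler's queue and application structures):
{code}
import java.util.List;

// Sketch of the ProportionalCapacityPreemptionPolicy run loop: snapshot the
// queues, compute the ideal balanced assignment, then reclaim from
// over-capacity queues with per-round bounds.
interface QueueSnapshot {
  double current();    // currently assigned capacity
  double guaranteed(); // configured guaranteed capacity
  double pending();    // pending requests (ANY part only, per note (*) above)
}

abstract class PreemptionPolicySketch {
  void editSchedule(List<QueueSnapshot> queues, double maxPerRound) {
    double[] ideal = computeIdealAssignment(queues); // step 2 above
    for (int i = 0; i < queues.size(); i++) {
      double over = queues.get(i).current() - ideal[i];
      if (over <= 0) {
        continue; // at or below its ideal share, nothing to reclaim
      }
      // Bound how much we reclaim per round (step 3), then apply steps 4-8:
      // last FIFO app first, unreserve before preempting, newest containers
      // first, spare the AM container, escalate to kill only after a wait.
      reclaimFromQueue(i, Math.min(over, maxPerRound));
    }
  }

  abstract double[] computeIdealAssignment(List<QueueSnapshot> queues);

  abstract void reclaimFromQueue(int queueIndex, double amount);
}
{code}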
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-569: - Issue Type: Sub-task (was: Bug) Parent: YARN-45 CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.1.0-beta Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, YARN-569.1.patch, YARN-569.10.patch, YARN-569.11.patch, YARN-569.2.patch, YARN-569.3.patch, YARN-569.4.patch, YARN-569.5.patch, YARN-569.6.patch, YARN-569.8.patch, YARN-569.9.patch, YARN-569.patch, YARN-569.patch, preemption.2.patch There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a Capacity Monitor, which can be run optionally as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similarly to equivalent functionality in the fair scheduler) runs on intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine if preemption is needed and how best to edit the current schedule to improve capacity, and generates events that produce four possible actions:
# Container de-reservations
# Resource-based preemptions
# Container-based preemptions
# Container killing
The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that due to the lag in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations.

- Preemption policy (ProportionalCapacityPreemptionPolicy): -

Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows:
# it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*)
# if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**)
# it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round)
# it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order)
# it removes reservations from the most recently assigned app until the amount of resources to reclaim is obtained, or until no more reservations exist
# (if not enough) it issues preemptions for containers from the same applications (reverse chronological order, last assigned container first), again as needed, or until no containers except the AM container are left
# (if not enough) it moves on to unreserve and preempt from the next application.
# containers that have been asked to preempt are tracked across executions. If a container has been among the ones to be preempted for more than a certain time, it is moved to a list of containers to be forcibly killed.

Notes: (*) at the moment, in order to avoid double-counting of the requests, we only look at the ANY part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. (**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among queues (that want some) as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue and the function runs to a fixed point.

Tunables of the ProportionalCapacityPreemptionPolicy:
# observe-only mode (i.e., log the actions it would take, but behave as read-only)
# how frequently to run the policy
# how long to wait between preemption and kill of a container
# which fraction of the containers I would like to obtain should I preempt (has to
[jira] [Updated] (YARN-568) FairScheduler: support for work-preserving preemption
[ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-568: - Issue Type: Sub-task (was: Improvement) Parent: YARN-45 FairScheduler: support for work-preserving preemption -- Key: YARN-568 URL: https://issues.apache.org/jira/browse/YARN-568 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.1.0-beta Attachments: YARN-568-1.patch, YARN-568-2.patch, YARN-568-2.patch, YARN-568.patch, YARN-568.patch In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allow us to run preemption checking more often, but kill less often (proper tuning to be investigated). Depends on YARN-567 and YARN-45; related to YARN-569. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-780) Expose preemption warnings in AMRMClient
[ https://issues.apache.org/jira/browse/YARN-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-780: - Issue Type: Sub-task (was: Improvement) Parent: YARN-45 Expose preemption warnings in AMRMClient Key: YARN-780 URL: https://issues.apache.org/jira/browse/YARN-780 Project: Hadoop YARN Issue Type: Sub-task Components: api, client Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza When the scheduler gives feedback on containers that need to be released/will be preempted, this should be passed on to users of AMRMClient and AMRMClientAsync -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-650) User guide for preemption
[ https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-650: - Issue Type: Sub-task (was: Task) Parent: YARN-45 User guide for preemption - Key: YARN-650 URL: https://issues.apache.org/jira/browse/YARN-650 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Chris Douglas Priority: Minor Fix For: 2.4.0 Attachments: Y650-0.patch YARN-45 added a protocol for the RM to ask for resources back. The docs on writing YARN applications should include a section on how to interpret this message. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cindy Li updated YARN-1525: --- Assignee: Cindy Li (was: Xuan Gong) Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li When failover happens, web UI should redirect to the current active rm. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.6.patch The new patch adds a test for a reserved container being killed by the previous attempt. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866999#comment-13866999 ] Hadoop QA commented on YARN-1490: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622248/YARN-1490.6.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2841//console This message is automatically generated. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867004#comment-13867004 ] Cindy Li commented on YARN-1525: I've been thinking about the best way to find the current active RM for a standby RM. Considering two options now: one is to establish a connection to ZooKeeper; the other is to query the active/standby state exposed by YARN-1033. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li When failover happens, the web UI should redirect to the current active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867002#comment-13867002 ] Karthik Kambatla commented on YARN-1525: Thinking out loud: Given that YARN-1482 enables the webapp even in Standby mode, I think there is merit in not always redirecting to the Active. How about we redirect only when querying applications/nodes etc., and have the About section not redirect? Also, if it is not too much trouble, I would prefer getting YARN-1033 in first so we can verify we aren't messing with that functionality either. I should have a patch later today. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li When failover happens, the web UI should redirect to the current active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867022#comment-13867022 ] Cindy Li commented on YARN-1525: @Karthik Kambatla, I'm proposing the following. From a user's perspective, it would be convenient to be able to access the current active RM's URL. If a user types in the URL of a standby RM (which may have been the active one before failover), YARN-1033 will show that it is a standby node. This JIRA (YARN-1525) will then use that information to redirect to the current active RM's webpage, showing a message such as "redirecting to the current active RM: active RM's URL", so the user can go to that URL directly next time. This seems more convenient for users. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li When failover happens, the web UI should redirect to the current active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
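A rough sketch of the redirect idea under discussion, keeping state-exposing pages local per Karthik's point. StandbyRedirectFilter, isStandby(), and activeRMWebAddress() are hypothetical names; the real lookup would come from ZooKeeper or the YARN-1033 state, and the actual patch may be structured differently:
{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class StandbyRedirectFilter implements Filter {
  @Override public void init(FilterConfig conf) {}
  @Override public void destroy() {}

  @Override
  public void doFilter(ServletRequest req, ServletResponse res,
      FilterChain chain) throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    HttpServletResponse httpRes = (HttpServletResponse) res;
    // Keep the About page local so the standby still answers HA-state
    // queries (YARN-1033) instead of bouncing everything to the Active.
    if (isStandby() && !httpReq.getRequestURI().startsWith("/cluster/cluster")) {
      httpRes.sendRedirect(activeRMWebAddress() + httpReq.getRequestURI());
      return;
    }
    chain.doFilter(req, res);
  }

  private boolean isStandby() { /* query the RM HA state */ return false; }
  private String activeRMWebAddress() { return "http://active-rm:8088"; }
}
{code}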
[jira] [Commented] (YARN-650) User guide for preemption
[ https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867023#comment-13867023 ] Hadoop QA commented on YARN-650: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12582234/Y650-0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2842//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2842//console This message is automatically generated. User guide for preemption - Key: YARN-650 URL: https://issues.apache.org/jira/browse/YARN-650 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Chris Douglas Priority: Minor Fix For: 2.4.0 Attachments: Y650-0.patch YARN-45 added a protocol for the RM to ask for resources back. The docs on writing YARN applications should include a section on how to interpret this message. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1138) yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows
[ https://issues.apache.org/jira/browse/YARN-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867108#comment-13867108 ] Hadoop QA commented on YARN-1138: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622244/YARN-1138.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2840//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2840//console This message is automatically generated. yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows --- Key: YARN-1138 URL: https://issues.apache.org/jira/browse/YARN-1138 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.2.0 Reporter: Yingda Chen Assignee: Chuan Liu Attachments: YARN-1138.2.patch, YARN-1138.patch yarn-default.xml has the yarn.application.classpath entry set to $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*. It does not work on Windows, which needs to be fixed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1574) When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once
[ https://issues.apache.org/jira/browse/YARN-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867169#comment-13867169 ] Karthik Kambatla commented on YARN-1574: The test looks much cleaner now. I should have thought of this earlier - we should remove the dispatcher added to the RM's serviceList (through addIfService) and add the new one as well. Otherwise, we could be creating a memory leak. Not sure if we could add a unit test for this, but it would be nice to make sure there is no such leak using jmap or something. I also noticed CompositeService#removeService is broken. I am okay with fixing that too in the same JIRA or a different JIRA. Either way, our tests should probably cover that too. When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once Key: YARN-1574 URL: https://issues.apache.org/jira/browse/YARN-1574 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1574.1.patch, YARN-1574.1.patch, YARN-1574.2.patch, YARN-1574.2.patch, YARN-1574.3.patch, YARN-1574.4.patch Currently, we move rmDispatcher out of ActiveService. But we still register the event dispatchers, such as schedulerDispatcher and RMAppEventDispatcher, when we initiate the ActiveService. Almost every time we transition the RM from Active to Standby, we need to re-initiate the ActiveService. That means we will register the same event dispatcher again, which will cause the same event to be handled several times. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
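The double-registration problem can be illustrated with a toy dispatcher. This is only a sketch of the invariant being discussed (at most one handler per event type across transitions), not the actual AsyncDispatcher or the patch:
{code}
import java.util.HashMap;
import java.util.Map;

class DispatcherSketch {
  interface Handler { void handle(Object event); }
  private final Map<Class<?>, Handler> handlers = new HashMap<Class<?>, Handler>();

  // Registering twice for the same event type replaces the old handler
  // instead of adding a second one, avoiding duplicate event delivery
  // after an Active -> Standby -> Active cycle.
  void register(Class<?> eventType, Handler handler) {
    handlers.put(eventType, handler);
  }

  void dispatch(Object event) {
    Handler h = handlers.get(event.getClass());
    if (h != null) {
      h.handle(event); // each event is handled exactly once
    }
  }
}
{code}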
[jira] [Commented] (YARN-1041) Protocol changes for RM to bind and notify a restarted AM of existing containers
[ https://issues.apache.org/jira/browse/YARN-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867184#comment-13867184 ] Vinod Kumar Vavilapalli commented on YARN-1041: --- Knowing well this is an early patch, quick review comments: - Document RegisterApplicationMasterResponse.getRunningContainers() with its semantics. Particularly about allocated, acquired, and reserved containers from the previous AM being killed. - yarn_service_protos.proto: running_containers -> containers_from_previous_attempt? Running or not may (or may not) change depending on the implementation. If we do this, we should change the name everywhere. - YarnScheduler.getAllRunningContainers(): Rename and also deal with an app-attempt instead of an application? Because that is how the scheduler looks at containers - per attempt. - Per [YARN-1490 comment|https://issues.apache.org/jira/browse/YARN-1490?focusedCommentId=13866267page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13866267], we need to take care of NMTokens too. The code duplication in schedulers is a bigger problem that needs to be addressed. But it has to start somewhere. I am +1 for starting with an AbstractYarnScheduler. Protocol changes for RM to bind and notify a restarted AM of existing containers Key: YARN-1041 URL: https://issues.apache.org/jira/browse/YARN-1041 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Jian He Attachments: YARN-1041.1.patch For long-lived containers we don't want the AM to be a SPOF. When the RM restarts a (failed) AM, it should be given the list of containers it had already been allocated. The AM should then be able to contact the NMs to get details on them. NMs would also need to do any binding of the containers needed to handle a moved/restarted AM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
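For reference, here is roughly how an AM might consume the field under the rename suggested above. getContainersFromPreviousAttempts() is the name floated in the comment and rebind() is a hypothetical application hook, so the committed API may well differ:
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.NodeId;

public class AmReconnect {
  /** Re-adopt containers surviving from the previous attempt. */
  public void reconnect(RegisterApplicationMasterResponse resp) {
    List<Container> previous = resp.getContainersFromPreviousAttempts();
    for (Container container : previous) {
      // Per the review note above, the AM also needs fresh NMTokens
      // before it can talk to the NodeManagers hosting these containers.
      rebind(container.getNodeId(), container.getId());
    }
  }

  private void rebind(NodeId node, ContainerId id) { /* app-specific */ }
}
{code}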
[jira] [Created] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
Thomas Graves created YARN-1582: --- Summary: Capacity Scheduler: add a maximum-allocation-mb setting per queue Key: YARN-1582 URL: https://issues.apache.org/jira/browse/YARN-1582 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.2.0, 0.23.10, 3.0.0 Reporter: Thomas Graves Assignee: Thomas Graves We want to allow certain queues to use larger container sizes while limiting other queues to smaller container sizes. Setting it per queue will help prevent abuse, help limit the impact of reservations, and allow changes in the maximum container size to be rolled out more easily. One reason this is needed is that more application types are becoming available on YARN, and certain applications require more memory to run efficiently. While we want to allow for that, we don't want other applications to abuse it and start requesting bigger containers than they really need. Note that we could have this based on application type, but that might not be totally accurate either since, for example, you might want to allow certain users on MapReduce to use larger containers, while limiting other users of MapReduce to smaller containers. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
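Assuming the Capacity Scheduler's usual per-queue property convention, the setting might be used as below. The per-queue key shown is hypothetical until the patch settles it:
{code}
import org.apache.hadoop.conf.Configuration;

public class QueueMaxAllocationExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The cluster-wide ceiling stays in place...
    conf.setInt("yarn.scheduler.maximum-allocation-mb", 16384);
    // ...while individual queues can be capped lower. These per-queue keys
    // are illustrative, following the root.<queue> naming convention.
    conf.setInt("yarn.scheduler.capacity.root.default.maximum-allocation-mb", 4096);
    conf.setInt("yarn.scheduler.capacity.root.largemem.maximum-allocation-mb", 16384);
    System.out.println("default queue cap: "
        + conf.getInt("yarn.scheduler.capacity.root.default.maximum-allocation-mb", -1));
  }
}
{code}
With such a layout, a memory-hungry application type can be steered to a queue that permits large containers, while the rest of the cluster keeps a tighter per-container cap.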
[jira] [Updated] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1033: --- Summary: Expose RM active/standby state to Web UI and REST API (was: Expose RM active/standby state to web UI and metrics) Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Cluster metrics also need this state for monitoring. Standby RM web services shall refuse client requests unless querying for RM state. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (YARN-1581) Fair Scheduler: containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-1581. --- Resolution: Duplicate Duplicate of YARN-1417. Fair Scheduler: containers that get reserved create container token too early Key: YARN-1581 URL: https://issues.apache.org/jira/browse/YARN-1581 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: CentOS 6.4 Reporter: German Florez-Larrahondo Priority: Minor When using the FairScheduler with long-running jobs and over-subscribed servers, I hit (what I believe is) the same issue reported in https://issues.apache.org/jira/browse/YARN-180 , which talked specifically about the CapacityScheduler. In my case, with the FairScheduler, I am seeing those corner cases where the NodeManager is finally ready to start a container that was reserved but the token had already expired. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1033: --- Attachment: yarn-1033-1.patch Straightforward patch that adds the HA state information to the ClusterInfo REST response and the Web UI. Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1033: --- Description: Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. (was: Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Cluster metrics also need this state for monitoring. Standby RM web services shall refuse client requests unless querying for RM state.) Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1567) In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload
[ https://issues.apache.org/jira/browse/YARN-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867219#comment-13867219 ] Vinod Kumar Vavilapalli commented on YARN-1567: --- bq. I don't think that we should use conciliation as a sole basis for stopping new features I get the use case. I didn't say that we should stop this. If we are doing this for FS, I was mentioning that we should do it for CS too, assuming the validity of the use-case. bq. The Fair Scheduler does not have a notion of stopped queues. bq. Queues have a lot in common once they exist, but the way queues are configured, loaded, and managed may be different to the point of irreconcilable between the Capacity Scheduler and Fair Scheduler. We've done the reconciliation in Hadoop-1, though not in the 1.x line; it happened in 0.21/0.22. So I can see how it can be done in Hadoop-2 also. In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload - Key: YARN-1567 URL: https://issues.apache.org/jira/browse/YARN-1567 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1567-1.patch, YARN-1567.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1579) ActiveRMInfoProto fields should be optional
[ https://issues.apache.org/jira/browse/YARN-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1579: --- Attachment: yarn-1579-1.patch Trivial patch that makes the fields optional. ActiveRMInfoProto fields should be optional --- Key: YARN-1579 URL: https://issues.apache.org/jira/browse/YARN-1579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1579-1.patch Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867248#comment-13867248 ] Vinod Kumar Vavilapalli commented on YARN-888: -- Yeah. I can see that. Either way, it is a convention. Both are equally hard/easy to enforce. Maybe we should put comments in the pom files about the agreed-upon convention - the one that the last patch implements. I can give it a try on my setup in the meanwhile. Tx. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules like in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867254#comment-13867254 ] Karthik Kambatla commented on YARN-1033: The posted patch doesn't add the information to JMX. It is slightly more involved, and I thought we could do it when we need it. Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1567) In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload
[ https://issues.apache.org/jira/browse/YARN-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867262#comment-13867262 ] Sandy Ryza commented on YARN-1567: -- bq. Didn't say that we should stop this. If we are doing this for FS, I was mentioning that we should do it for CS too assuming the validity of the use-case. Ah, sorry, misunderstood. In that case, I totally agree we should add this in a way that keeps the semantics consistent across the schedulers. I just read up on the meaning of QueueState in the Capacity Scheduler. Looks like we should be able to add the same to the Fair Scheduler. I think queues should need to be empty, not necessarily stopped, to do a leaf-parent change. Like the leaf-parent change, changing a queue from empty to stopped occurs on scheduler configuration reload. I see no reason that we should require two reloads to turn an empty leaf queue into a parent. We should of course ensure that we are synchronizing properly so that an app does not end up getting submitted to the queue that is now a parent. In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload - Key: YARN-1567 URL: https://issues.apache.org/jira/browse/YARN-1567 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1567-1.patch, YARN-1567.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
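As a concrete illustration of the reload scenario (allocation-file contents shown as Java strings for brevity; the queue names are made up):
{code}
public class AllocFileExample {
  // Before the reload: "research" is an empty leaf queue.
  static final String BEFORE =
      "<allocations>\n"
      + "  <queue name=\"research\"/>\n"
      + "</allocations>\n";

  // After a single reload: "research" becomes a parent with a child leaf.
  static final String AFTER =
      "<allocations>\n"
      + "  <queue name=\"research\">\n"
      + "    <queue name=\"genomics\"/>\n"
      + "  </queue>\n"
      + "</allocations>\n";
}
{code}
The synchronization point in the comment above is exactly the window between the reload taking effect and an in-flight submission still targeting "research" as a leaf.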
[jira] [Commented] (YARN-1579) ActiveRMInfoProto fields should be optional
[ https://issues.apache.org/jira/browse/YARN-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867265#comment-13867265 ] Sandy Ryza commented on YARN-1579: -- +1 ActiveRMInfoProto fields should be optional --- Key: YARN-1579 URL: https://issues.apache.org/jira/browse/YARN-1579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1579-1.patch Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1579) ActiveRMInfoProto fields should be optional
[ https://issues.apache.org/jira/browse/YARN-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867267#comment-13867267 ] Hadoop QA commented on YARN-1579: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622280/yarn-1579-1.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2844//console This message is automatically generated. ActiveRMInfoProto fields should be optional --- Key: YARN-1579 URL: https://issues.apache.org/jira/browse/YARN-1579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1579-1.patch Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867266#comment-13867266 ] Hadoop QA commented on YARN-1033: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622276/yarn-1033-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2843//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2843//console This message is automatically generated. Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1583) Ineffective state check in FairSchedulerAppsBlock#render()
Ted Yu created YARN-1583: Summary: Ineffective state check in FairSchedulerAppsBlock#render() Key: YARN-1583 URL: https://issues.apache.org/jira/browse/YARN-1583 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Priority: Minor Starting line 90: {code} for (RMApp app : apps.values()) { if (reqAppStates != null && !reqAppStates.contains(app.getState())) { {code} reqAppStates is of type YarnApplicationState. app.getState() returns RMAppState. These are two different enum types. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
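The comparison needs to happen in a single enum domain. A minimal sketch of a fix, assuming the RMApp#createApplicationState() helper (which maps the internal RMAppState to the public YarnApplicationState) is usable at this point in the block:
{code}
for (RMApp app : apps.values()) {
  if (reqAppStates != null
      && !reqAppStates.contains(app.createApplicationState())) {
    continue; // the app's public-facing state was not requested
  }
  // ... render the table row for this app ...
}
{code}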
[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867280#comment-13867280 ] Karthik Kambatla commented on YARN-1033: Deployed a pseudo-dist cluster, verified on rm-address/cluster/cluster (About page) and rm-address/ws/v1/cluster/info (REST API) that the haState is reflected as I toggle between Active and Standby states. Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
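The REST half of that check can be scripted. A small sketch, assuming a pseudo-distributed RM on localhost:8088; the haState field name follows the patch, and the exact JSON shape may differ:
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ClusterInfoCheck {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:8088/ws/v1/cluster/info");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()));
    String line;
    while ((line = in.readLine()) != null) {
      // Expect something like: {"clusterInfo":{...,"haState":"ACTIVE",...}}
      System.out.println(line);
    }
    in.close();
  }
}
{code}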
[jira] [Updated] (YARN-1579) ActiveRMInfoProto fields should be optional
[ https://issues.apache.org/jira/browse/YARN-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1579: --- Attachment: yarn-1579-1.patch Uploading the same patch again. ActiveRMInfoProto fields should be optional --- Key: YARN-1579 URL: https://issues.apache.org/jira/browse/YARN-1579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1579-1.patch, yarn-1579-1.patch Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867281#comment-13867281 ] Sandy Ryza commented on YARN-1033: -- +1 Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867283#comment-13867283 ] Karthik Kambatla commented on YARN-1033: Thanks Sandy. I'll commit this tomorrow morning PT if there are no objections. Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867293#comment-13867293 ] Sandy Ryza commented on YARN-1496: -- Thought ChangeApplicationQueue would be better because it's shorter, but I see what you're saying. MoveApplicationAcrossQueues sounds fine to me. Uploaded a new patch with MoveApplicationAcrossQueues and marked the APIs as Unstable. Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496-3.patch, YARN-1496-4.patch, YARN-1496-5.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1496: - Attachment: YARN-1496-5.patch Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496-3.patch, YARN-1496-4.patch, YARN-1496-5.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867294#comment-13867294 ] Karthik Kambatla commented on YARN-1525: I am sorry if I came across wrong; I do see merit in redirecting to the Active. I think we should redirect in a way that the HA state information exposed by YARN-1033 continues to be available on both Active and Standby RMs. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li When failover happens, the web UI should redirect to the current active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.7.patch RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867315#comment-13867315 ] Junping Du commented on YARN-1506: -- bq. Patch looks good to me mostly, Bikas Saha/ Vinod Kumar Vavilapalli you may also want to take a look. [~bikassaha] and [~vinodkv], would you help to review it? Thanks! Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867323#comment-13867323 ] Hadoop QA commented on YARN-1490: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622297/YARN-1490.7.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2845//console This message is automatically generated. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867329#comment-13867329 ] Hadoop QA commented on YARN-1496: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622295/YARN-1496-5.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2847//console This message is automatically generated. Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496-3.patch, YARN-1496-4.patch, YARN-1496-5.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1579) ActiveRMInfoProto fields should be optional
[ https://issues.apache.org/jira/browse/YARN-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867336#comment-13867336 ] Hadoop QA commented on YARN-1579: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622293/yarn-1579-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2846//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2846//console This message is automatically generated. ActiveRMInfoProto fields should be optional --- Key: YARN-1579 URL: https://issues.apache.org/jira/browse/YARN-1579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1579-1.patch, yarn-1579-1.patch Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867355#comment-13867355 ] Bikas Saha commented on YARN-1506: -- I will get to this by the weekend. Sorry for the delay Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1579) ActiveRMInfoProto fields should be optional
[ https://issues.apache.org/jira/browse/YARN-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1579: --- Priority: Trivial (was: Major) ActiveRMInfoProto fields should be optional --- Key: YARN-1579 URL: https://issues.apache.org/jira/browse/YARN-1579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Attachments: yarn-1579-1.patch, yarn-1579-1.patch Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867362#comment-13867362 ] Karthik Kambatla commented on YARN-1399: [~vinodkv], [~sandyr] - do we agree on setting the default scope to ALL and having Oozie set it explicitly to OWN? Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music services, etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1579) ActiveRMInfoProto fields should be optional
[ https://issues.apache.org/jira/browse/YARN-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867376#comment-13867376 ] Hudson commented on YARN-1579: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4982 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4982/]) YARN-1579. ActiveRMInfoProto fields should be optional (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1557001) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto ActiveRMInfoProto fields should be optional --- Key: YARN-1579 URL: https://issues.apache.org/jira/browse/YARN-1579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Fix For: 2.4.0 Attachments: yarn-1579-1.patch, yarn-1579-1.patch Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867386#comment-13867386 ] Jian He commented on YARN-1490: --- - The new patch gets rid of the local flag transferStateFromPreviousAttempt inside RMAppAttemptImpl and SchedulerApplication, and notifies transferContainersFromPreviousAttempt through AppAddedSchedulerEvent and keepContainersAcrossAttempts through AppRemovedSchedulerEvent. - Similarly, RMAppAttempt notifies RMApp to transfer the state through an event. - Fixed a bug in RMAppAttemptImpl.BaseFinalTransition: it missed setting the flag to false in RMAppFailedAttemptEvent if this is the last attempt or an unmanaged AM. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
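The shape of carrying that decision on the scheduler event itself, rather than as mutable state on the attempt, is roughly the following. The names mirror the comment, but this is an illustrative sketch, not the patch:
{code}
class AppAddedSchedulerEventSketch {
  private final String applicationId;
  private final boolean transferContainersFromPreviousAttempt;

  AppAddedSchedulerEventSketch(String applicationId,
      boolean transferContainersFromPreviousAttempt) {
    this.applicationId = applicationId;
    this.transferContainersFromPreviousAttempt =
        transferContainersFromPreviousAttempt;
  }

  // The scheduler reads the flag while handling the event, so no separate
  // mutable flag on RMAppAttemptImpl/SchedulerApplication is needed.
  boolean shouldTransferContainers() {
    return transferContainersFromPreviousAttempt;
  }
}
{code}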
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.9.patch Fixed one test name in TestRMAppAttemptTransitions. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1496: - Attachment: YARN-1496-6.patch Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496-3.patch, YARN-1496-4.patch, YARN-1496-5.patch, YARN-1496-6.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867408#comment-13867408 ] Hadoop QA commented on YARN-1490: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622310/YARN-1490.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2848//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2848//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2848//console This message is automatically generated. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867412#comment-13867412 ] Vinod Kumar Vavilapalli commented on YARN-888: -- Playing with the patch. hadoop-yarn-project's pom.xml has some deps. So this indeed looks like the hybrid approach? I ran dependency:analyze, seeing this for hadoop-yarn-api {code} [INFO] --- maven-dependency-plugin:2.2:analyze (default-cli) @ hadoop-yarn-api --- [WARNING] Unused declared dependencies found: [WARNING]org.apache.hadoop:hadoop-common:jar:3.0.0-SNAPSHOT:provided [WARNING]org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile {code} I guess this is what you meant by the following in the pom files: {code} <!-- 'mvn dependency:analyze' fails to detect use of this dependency --> {code} If the dependency plugin is broken like this, we will have to depend (no pun intended) on something else for correctness. We should set up a single-node cluster at least to ensure that all is well. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules like in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867413#comment-13867413 ] Nemon Lou commented on YARN-1033: - Thanks, Karthik Kambatla. You are really efficient. +1 (non-binding). Agree that the HA state in JMX can be added later in another JIRA when needed. Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.10.patch Fixed the findbugs warning. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.10.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867420#comment-13867420 ] Cindy Li commented on YARN-1525: Ok. I'll try to accommodate the change in YARN-1033. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li When failover happens, the web UI should redirect to the current active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867424#comment-13867424 ] Junping Du commented on YARN-1506: -- Thank you, Bikas! Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.11.patch RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.10.patch, YARN-1490.11.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867434#comment-13867434 ] Hadoop QA commented on YARN-1490: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622324/YARN-1490.11.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2852//console This message is automatically generated. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.10.patch, YARN-1490.11.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers, some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867440#comment-13867440 ] Hadoop QA commented on YARN-1490: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622320/YARN-1490.10.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2851//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2851//console This message is automatically generated. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.10.patch, YARN-1490.11.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers, some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1583) Ineffective state check in FairSchedulerAppsBlock#render()
[ https://issues.apache.org/jira/browse/YARN-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867466#comment-13867466 ] Bangtao Zhou commented on YARN-1583: Line #84 {code}reqAppStates = new HashSet<RMAppState>(appStateStrings.length);{code} so what do you mean by "reqAppStates is of type YarnApplicationState"? Ineffective state check in FairSchedulerAppsBlock#render() -- Key: YARN-1583 URL: https://issues.apache.org/jira/browse/YARN-1583 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Priority: Minor Starting line 90: {code} for (RMApp app : apps.values()) { if (reqAppStates != null && !reqAppStates.contains(app.getState())) { {code} reqAppStates is of type YarnApplicationState. app.getState() returns RMAppState. These are two different enum types. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
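The type mismatch under discussion is easy to reproduce in isolation: Set.contains accepts any Object, so the check compiles, but equality between constants of two different enum classes is always false. A self-contained toy with stand-in enums (not the real YarnApplicationState/RMAppState) shows this:
{code}
import java.util.EnumSet;
import java.util.Set;

public class CrossEnumContains {
  enum YarnState { RUNNING, FINISHED } // stand-in for YarnApplicationState
  enum RMState   { RUNNING, FINISHED } // stand-in for RMAppState

  public static void main(String[] args) {
    Set<YarnState> reqAppStates = EnumSet.of(YarnState.RUNNING);
    // Compiles without error because contains() takes Object, but it can
    // never be true: an RMState constant is never equal to a YarnState
    // constant, even when the two have the same name.
    System.out.println(reqAppStates.contains(RMState.RUNNING)); // always false
    // A working filter has to convert one enum to the other before the
    // contains() check.
  }
}
{code}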
[jira] [Commented] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867465#comment-13867465 ] Hadoop QA commented on YARN-1496: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622315/YARN-1496-6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2850//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2850//console This message is automatically generated. Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496-3.patch, YARN-1496-4.patch, YARN-1496-5.patch, YARN-1496-6.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1583) Ineffective state check in FairSchedulerAppsBlock#render()
[ https://issues.apache.org/jira/browse/YARN-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867472#comment-13867472 ] Ted Yu commented on YARN-1583: -- I was looking at this class in trunk. I don't see the above code snippet. Which branch are you checking? Ineffective state check in FairSchedulerAppsBlock#render() -- Key: YARN-1583 URL: https://issues.apache.org/jira/browse/YARN-1583 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Priority: Minor Starting line 90: {code} for (RMApp app : apps.values()) { if (reqAppStates != null && !reqAppStates.contains(app.getState())) { {code} reqAppStates is of type YarnApplicationState. app.getState() returns RMAppState. These are two different enum types. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1574) When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once
[ https://issues.apache.org/jira/browse/YARN-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867473#comment-13867473 ] Xuan Gong commented on YARN-1574: - bq. I should have thought of this earlier - we should remove the dispatcher added to the RM's serviceList (through addIfService) and add the new one as well. Otherwise, we could be creating a memory leak. Not sure if we could add a unit test for this, but would be nice to make sure there is no such leak using jmap or something. Fixed. bq. I also noticed CompositeService#removeService is broken. I am okay with fixing that too in the same JIRA or a different JIRA. Either way, our tests should probably cover that too. Fixed. When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once Key: YARN-1574 URL: https://issues.apache.org/jira/browse/YARN-1574 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1574.1.patch, YARN-1574.1.patch, YARN-1574.2.patch, YARN-1574.2.patch, YARN-1574.3.patch, YARN-1574.4.patch, YARN-1574.5.patch Currently, we move rmDispatcher out of ActiveService, but we still register the event dispatchers, such as schedulerDispatcher and RMAppEventDispatcher, when we initiate the ActiveService. Almost every time we transition the RM from Active to Standby, we need to re-initiate the ActiveService. That means we register the same event dispatcher again, which causes the same event to be handled several times. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
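To see why the remove step matters, consider a toy composite service (invented names, not the RM's actual classes): if each Active-to-Standby cycle only adds a freshly created dispatcher, the old dispatcher and its registered handlers stay in the service list forever.
{code}
import java.util.ArrayList;
import java.util.List;

public class ToyCompositeService {
  private final List<Object> services = new ArrayList<>();
  private Object dispatcher;

  void addService(Object s) {
    services.add(s);
  }

  boolean removeService(Object s) {
    return services.remove(s);
  }

  // Called on every Active -> Standby -> Active cycle.
  void resetDispatcher() {
    if (dispatcher != null) {
      // Without this line, the services list grows by one dispatcher per
      // cycle and the stale dispatcher keeps handling (duplicating) events.
      removeService(dispatcher);
    }
    dispatcher = new Object(); // stand-in for a fresh AsyncDispatcher
    addService(dispatcher);
  }
}
{code}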
[jira] [Updated] (YARN-1574) When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once
[ https://issues.apache.org/jira/browse/YARN-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1574: Attachment: YARN-1574.5.patch When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once Key: YARN-1574 URL: https://issues.apache.org/jira/browse/YARN-1574 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1574.1.patch, YARN-1574.1.patch, YARN-1574.2.patch, YARN-1574.2.patch, YARN-1574.3.patch, YARN-1574.4.patch, YARN-1574.5.patch Currently, we move rmDispatcher out of ActiveService, but we still register the event dispatchers, such as schedulerDispatcher and RMAppEventDispatcher, when we initiate the ActiveService. Almost every time we transition the RM from Active to Standby, we need to re-initiate the ActiveService. That means we register the same event dispatcher again, which causes the same event to be handled several times. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867484#comment-13867484 ] Xuan Gong commented on YARN-1033: - +1 LGTM Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both the active and standby RM shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
[ https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1166: -- Attachment: YARN-1166.8.patch Thanks Vinod and Jian for your review. I uploaded a new patch. bq. One comment other than Jian's. runAppAttempt API can just take in an ApplicationID? Similarly, finishAppAttempt can just take appId and user. Refactored the code accordingly. bq. @zhijie, last time I checked YARN-915 should be caused by this. If so, can you add a unit test for the restart scenario. thanks! A test case for RM restart has been added. YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, YARN-1166.5.patch, YARN-1166.6.patch, YARN-1166.7.patch, YARN-1166.8.patch, YARN-1166.patch Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter' - meaning Ganglia will use the slope to provide deltas between time points. To be consistent, the AppsFailed metric should also be of type 'counter'. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
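For context, the gauge-versus-counter distinction maps onto Hadoop's metrics2 library roughly as follows. The class below mirrors the @Metric field pattern used by QueueMetrics but is only an illustrative sketch, not the attached patch:
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterInt;

@Metrics(context = "yarn")
public class ExampleQueueMetrics {

  // Before: a MutableGaugeInt, which reports the raw point-in-time value and
  // is inconsistent with the other cumulative app metrics:
  // @Metric("# of apps failed") MutableGaugeInt appsFailed;

  // After: a counter, matching AppsSubmitted/AppsCompleted/AppsKilled, so
  // monitoring tools like Ganglia can compute deltas from the slope.
  @Metric("# of apps failed") MutableCounterInt appsFailed;

  // Note: in a real metrics source, the metrics system instantiates the
  // annotated fields when the source is registered with
  // DefaultMetricsSystem; this sketch only shows the field-type change.
  public void appFailed() {
    appsFailed.incr(); // counters only ever increase
  }
}
{code}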
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867488#comment-13867488 ] Alejandro Abdelnur commented on YARN-888: - [~vinodkv], thx for taking the time to play with the patch. bq. hadoop-yarn-project's pom.xml has some deps ... This was an oversight on my end, as I traversed the parent poms starting from the leaves, and the yarn modules do not have hadoop-yarn-project as their parent. This means the dependencies there were not being used. I'm attaching a new patch removing the dependencies section from hadoop-yarn-project. Thanks for catching that. bq. I guess this was what you meant by Correct. However, I wouldn't say the plugin is broken, but it has limitations (it cannot detect usage of classes loaded via reflection, it cannot detect use of constants for primitive types and Strings, etc.). bq. We should set up a single node cluster at least to ensure that all is well. The produced TARBALL has the exact same set of JAR files, so I would not expect this to be an issue. However, just to be safe, I just did a build with the patch, started a minicluster, and ran a couple of example jobs. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
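The constants limitation mentioned above comes from the Java language itself: javac inlines compile-time constants (primitives and Strings), so a bytecode-based dependency analyzer never sees a reference to the defining class. A minimal demonstration:
{code}
// javac copies the value 8080 into User.class at compile time; a tool that
// inspects User's bytecode finds no reference to the Constants class at all,
// so it cannot tell that User depends on the module defining Constants.
public class User {
  public static void main(String[] args) {
    System.out.println("port=" + Constants.DEFAULT_PORT);
  }
}

class Constants {
  // A compile-time constant: static final primitive with a constant initializer.
  public static final int DEFAULT_PORT = 8080;
}
{code}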
[jira] [Commented] (YARN-1574) When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once
[ https://issues.apache.org/jira/browse/YARN-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867497#comment-13867497 ] Hadoop QA commented on YARN-1574: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622337/YARN-1574.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2853//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2853//console This message is automatically generated. When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once Key: YARN-1574 URL: https://issues.apache.org/jira/browse/YARN-1574 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1574.1.patch, YARN-1574.1.patch, YARN-1574.2.patch, YARN-1574.2.patch, YARN-1574.3.patch, YARN-1574.4.patch, YARN-1574.5.patch Currently, we move rmDispatcher out of ActiveService, but we still register the event dispatchers, such as schedulerDispatcher and RMAppEventDispatcher, when we initiate the ActiveService. Almost every time we transition the RM from Active to Standby, we need to re-initiate the ActiveService. That means we register the same event dispatcher again, which causes the same event to be handled several times. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867514#comment-13867514 ] Hadoop QA commented on YARN-888: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622339/YARN-888.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2855//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2855//console This message is automatically generated. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
[ https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867502#comment-13867502 ] Hadoop QA commented on YARN-1166: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622338/YARN-1166.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2854//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2854//console This message is automatically generated. YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, YARN-1166.5.patch, YARN-1166.6.patch, YARN-1166.7.patch, YARN-1166.8.patch, YARN-1166.patch Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter' - meaning Ganglia will use the slope to provide deltas between time points. To be consistent, the AppsFailed metric should also be of type 'counter'. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1583) Ineffective state check in FairSchedulerAppsBlock#render()
[ https://issues.apache.org/jira/browse/YARN-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867556#comment-13867556 ] Bangtao Zhou commented on YARN-1583: I was checking **branch-2.2.0**, and I just checked **trunk** a moment ago; the problem you mentioned does indeed exist. It was my carelessness. Ineffective state check in FairSchedulerAppsBlock#render() -- Key: YARN-1583 URL: https://issues.apache.org/jira/browse/YARN-1583 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Priority: Minor Starting line 90: {code} for (RMApp app : apps.values()) { if (reqAppStates != null && !reqAppStates.contains(app.getState())) { {code} reqAppStates is of type YarnApplicationState. app.getState() returns RMAppState. These are two different enum types. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.11.patch Submitting the same patch again to kick Jenkins. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.10.patch, YARN-1490.11.patch, YARN-1490.11.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers, some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2-step client APIs like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867591#comment-13867591 ] Xuan Gong commented on YARN-1410: - Posting a patch that outlines the approach. Thanks to the AtMostOnce and Idempotent annotations, we only need a few code changes.
* As in my previous comment, we will make the RM accept the appId in the context. When failover happens, we re-use the old applicationId (assigned by the previous active RM) to submit the application to the current active RM.
* Use the AtMostOnce and Idempotent annotations.
** As we discussed in YARN-1521, submitApplication and getNewApplication cannot be idempotent. To make those functions retriable, we can add the AtMostOnce annotation. And getApplicationReport can be marked as idempotent.
** I would like to add the related annotations for these three APIs here, because I think this is part of the work for this ticket.
* This is how application submission works: YarnClient#SubmitApplication +call+ ClientRMService#SubmitApplication +call+ RMAppManager#SubmitApplication +create+ RMApp and submit the START event +ReturnBack+ ClientRMService#SubmitApplication +ReturnBack+ YarnClient#SubmitApplication +CheckingAppStatus+ getApplicationReport +END+
** The failover may happen in any step or between any steps.
** If failover happens:
*** between the time that YarnClient#SubmitApplication starts and the time that ClientRMService#SubmitApplication is called: the YarnClient will find the next active RM and continue from where it left off.
*** between the time that ClientRMService#SubmitApplication starts and the time that RMAppManager#SubmitApplication is called: we will restart ClientRMService#SubmitApplication (re-run it from the first line). At this point the application has not been saved in ZooKeeper yet, so it is safe to restart ClientRMService#SubmitApplication.
*** between the time that RMAppManager#SubmitApplication starts and the time that the RMApp has been created and the START event has been submitted: we do the same thing as in the previous case.
*** after the RMApp has been created and the START event has been sent out: if failover happens here, there are several different cases (see the sketch following this message):
**** after the YarnClient got the SubmitApplicationResponse, but the state of the RMApp has not been saved in ZooKeeper yet: when we try getApplicationReport, we will get an ApplicationNotFoundException. What I am doing here is catching this exception and calling YarnClient#SubmitApplication again.
**** after the YarnClient got the SubmitApplicationResponse, and the state of the RMApp has been saved in ZooKeeper: we do not need to do anything.
**** before the YarnClient got the SubmitApplicationResponse, and the state of the RMApp has not been saved in ZooKeeper yet: we restart ClientRMService#SubmitApplication from the very beginning.
**** before the YarnClient got the SubmitApplicationResponse, but the state of the RMApp has been saved in ZooKeeper: this is the trickiest case. If failover happens here, we will re-run ClientRMService#SubmitApplication from the very beginning. It will try to re-submit the application with the old applicationId, but since we have already saved this application in ZooKeeper, we will get an "application with id already exists" exception, which is *not* what we want.
For the last corner case, [~bikassaha], [~kkambatl], any suggestions?
Handle client failover during 2-step client APIs like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating an appId and 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create the app id), the new RM may reject the app submission, resulting in unexpected failure on the client side. The same may happen for other 2-step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
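A rough sketch of the retry loop the comment above describes, using placeholder types rather than the real YarnClient API: submit, confirm via the application report, and resubmit with the same id if the new active RM answers with ApplicationNotFoundException.
{code}
public class SubmitWithFailoverRetry {

  // Placeholder client interface; the real YarnClient methods have different
  // signatures and types.
  interface YarnClientStub {
    void submitApplication(String appId) throws Exception;
    String getApplicationReport(String appId) throws Exception; // returns app state
  }

  // Placeholder for the exception the RM throws when it has never seen the app.
  static class ApplicationNotFoundException extends Exception {
  }

  static String submitAndConfirm(YarnClientStub client, String appId) throws Exception {
    while (true) {
      // Resubmitting with the same appId is safe precisely in the case the
      // comment describes: the new active RM never saw the app, so there is
      // no duplicate to collide with.
      client.submitApplication(appId);
      try {
        // If the new active RM recovered the app from the state store, this
        // succeeds and the submission is confirmed.
        return client.getApplicationReport(appId);
      } catch (ApplicationNotFoundException e) {
        // Failover happened before the app reached the state store: the new
        // RM has never heard of appId, so loop and resubmit with the same id.
      }
    }
  }
}
{code}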
[jira] [Updated] (YARN-1410) Handle client failover during 2-step client APIs like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1410: Attachment: YARN-1410-outline.patch Handle client failover during 2-step client APIs like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating an appId and 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create the app id), the new RM may reject the app submission, resulting in unexpected failure on the client side. The same may happen for other 2-step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)