[jira] [Updated] (YARN-1260) RM_HOME link breaks when webapp.https.address related properties are not specified
[ https://issues.apache.org/jira/browse/YARN-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1260: -- Priority: Major (was: Blocker) I agree it breaks setup, but it is no blocker as there is a clear work around. RM_HOME link breaks when webapp.https.address related properties are not specified -- Key: YARN-1260 URL: https://issues.apache.org/jira/browse/YARN-1260 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta, 2.1.2-beta Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Attachments: YARN-1260.20131030.1.patch This issue happens in multiple node cluster where resource manager and node manager are running on different machines. Steps to reproduce: 1) set yarn.resourcemanager.hostname = resourcemanager host in yarn-site.xml 2) set hadoop.ssl.enabled = true in core-site.xml 3) Do not specify below property in yarn-site.xml yarn.nodemanager.webapp.https.address and yarn.resourcemanager.webapp.https.address Here, the default value of above two property will be considered. 4) Go to nodemanager web UI https://nodemanager host:8044/node 5) Click on RM_HOME link This link redirects to https://nodemanager host:8090/cluster instead https://resourcemanager host:8090/cluster -- This message was sent by Atlassian JIRA (v6.1#6144)
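For reference, a minimal sketch (not part of the attached patch) of the workaround: explicitly setting the two https webapp addresses instead of relying on defaults. It assumes the standard YarnConfiguration class is on the classpath; "rmhost" and "nmhost" are placeholder hostnames, and the ports are the ones quoted in the report.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class HttpsWebappAddressCheck {
  public static void main(String[] args) {
    // Loads yarn-default.xml and yarn-site.xml from the classpath.
    Configuration conf = new YarnConfiguration();
    // Workaround: set the https webapp addresses explicitly so the RM_HOME
    // link is not built from the nodemanager's own default address.
    // "rmhost" and "nmhost" are placeholder hostnames.
    conf.set("yarn.resourcemanager.webapp.https.address", "rmhost:8090");
    conf.set("yarn.nodemanager.webapp.https.address", "nmhost:8044");
    System.out.println(conf.get("yarn.resourcemanager.webapp.https.address"));
    System.out.println(conf.get("yarn.nodemanager.webapp.https.address"));
  }
}
{code}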
[jira] [Commented] (YARN-1260) RM_HOME link breaks when webapp.https.address related properties are not specified
[ https://issues.apache.org/jira/browse/YARN-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783689#comment-13783689 ] Vinod Kumar Vavilapalli commented on YARN-1260: --- Patch is trivial and looks good anyways. Checking this in. RM_HOME link breaks when webapp.https.address related properties are not specified -- Key: YARN-1260 URL: https://issues.apache.org/jira/browse/YARN-1260 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta, 2.1.2-beta Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Attachments: YARN-1260.20131030.1.patch This issue happens in multiple node cluster where resource manager and node manager are running on different machines. Steps to reproduce: 1) set yarn.resourcemanager.hostname = resourcemanager host in yarn-site.xml 2) set hadoop.ssl.enabled = true in core-site.xml 3) Do not specify below property in yarn-site.xml yarn.nodemanager.webapp.https.address and yarn.resourcemanager.webapp.https.address Here, the default value of above two property will be considered. 4) Go to nodemanager web UI https://nodemanager host:8044/node 5) Click on RM_HOME link This link redirects to https://nodemanager host:8090/cluster instead https://resourcemanager host:8090/cluster -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783702#comment-13783702 ] Bikas Saha commented on YARN-445: - How does the Windows JVM handle ctrl-break? How would we emulate a ctrl-c signal that would trigger the JVM shutdown hook? Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Attachments: YARN-445.patch It would be nice if an ApplicationMaster could send signals to contaniers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
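To illustrate the shutdown-hook part of the question, a small self-contained sketch (unrelated to the attached patch): a registered hook runs when the JVM receives SIGINT (ctrl-c) or SIGTERM, but not on SIGKILL, which is why the choice of emulated signal matters.
{code}
public class ShutdownHookDemo {
  public static void main(String[] args) throws InterruptedException {
    // Hooks registered here run on SIGINT (ctrl-c) or SIGTERM, giving the
    // process a chance to clean up; a SIGKILL bypasses them entirely.
    Runtime.getRuntime().addShutdownHook(
        new Thread(() -> System.out.println("shutdown hook invoked")));
    // Send ctrl-c or "kill -TERM <pid>" while this sleeps to see the hook fire.
    Thread.sleep(60_000);
  }
}
{code}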
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783708#comment-13783708 ] Bikas Saha commented on YARN-1197: -- Still thinking through the RM-NM interactions. The request for change should probably be a new object that is basically a map of (containerId, Resource) where Resource is the new value for the existing containerId. Not quite sure how we would use the new container token for a running container since it's only used in start container. If we wait for RM to sync with NM about the increased resources then it might be too slow since this happens on a heartbeat and the heartbeat interval can be in the order of seconds. An alternative would be a new NM API to allow AMs to increase resources and this would be signed with the new container token. But this would burden the AMs by requiring them to make that additional call. There could be a race between a new container token coming in with increased resources for an acquired container and the old container token being used by the NMClient to launch the container (in case the AM decides to launch the smaller container while it was waiting for an increase). Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
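As an illustration of the request object described above, a hypothetical sketch (the class name and methods are not an actual YARN API, just a stand-in for the (containerId, Resource) map Bikas mentions):
{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical request object: a map of containerId -> new Resource value.
public class ContainerResourceChangeRequest {
  private final Map<ContainerId, Resource> targetResources = new HashMap<>();

  // Record the desired new size for an existing container.
  public void addChange(ContainerId containerId, Resource newResource) {
    targetResources.put(containerId, newResource);
  }

  public Map<ContainerId, Resource> getTargetResources() {
    return targetResources;
  }
}
{code}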
[jira] [Updated] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1232: --- Attachment: yarn-1232-6.patch Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-658: - Target Version/s: 2.1.2-beta Command to kill a YARN application does not work with newer Ubuntu versions --- Key: YARN-658 URL: https://issues.apache.org/jira/browse/YARN-658 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 2.0.4-alpha Reporter: David Yan Attachments: AppMaster.stderr, yarn-david-nodemanager-david-ubuntu.out, yarn-david-resourcemanager-david-ubuntu.out After issuing a KillApplicationRequest, the application keeps running on the system even though the state is changed to KILLED. It happens on both Ubuntu 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783714#comment-13783714 ] Karthik Kambatla commented on YARN-1232: [~bikassaha], thanks. Updated the patch to address all your comments except one. Changed the two configs to {{yarn.resourcemanager.ha.rm-ids}} and {{yarn.resourcemanager.ha.id}} - the reason for including ha in the latter is because the RM's id is relevant only when HA is enabled. For the tokens themselves, a logical-name is more apt and is best added by the JIRA that handles the token-related logic (may be, YARN-986). Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783715#comment-13783715 ] Karthik Kambatla commented on YARN-1232: bq. In HA enabled scenarios we need the explicit rm ids. How are we handling rm-id in non-HA setups or in existing clusters where no rm-id is being set currently? This RM id is used only when the HA is enabled. Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783724#comment-13783724 ] Hadoop QA commented on YARN-1232: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606267/yarn-1232-6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2060//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2060//console This message is automatically generated. Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1010) FairScheduler: decouple container scheduling from nodemanager heartbeats
[ https://issues.apache.org/jira/browse/YARN-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783813#comment-13783813 ] Hudson commented on YARN-1010: -- FAILURE: Integrated in Hadoop-Yarn-trunk #350 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/350/]) YARN-1010. FairScheduler: decouple container scheduling from nodemanager heartbeats. (Wei Yan via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528192) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: decouple container scheduling from nodemanager heartbeats Key: YARN-1010 URL: https://issues.apache.org/jira/browse/YARN-1010 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Wei Yan Priority: Critical Fix For: 2.3.0 Attachments: YARN-1010.patch, YARN-1010.patch Currently scheduling for a node is done when a node heartbeats. For large cluster where the heartbeat interval is set to several seconds this delays scheduling of incoming allocations significantly. We could have a continuous loop scanning all nodes and doing scheduling. If there is availability AMs will get the allocation in the next heartbeat after the one that placed the request. -- This message was sent by Atlassian JIRA (v6.1#6144)
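The "continuous loop scanning all nodes" idea reads roughly like the following sketch (purely illustrative; the node representation and attemptScheduling are stand-ins, not the actual FairScheduler code from the patch):
{code}
import java.util.List;

// Illustrative continuous-scheduling loop decoupled from NM heartbeats.
public abstract class ContinuousSchedulingLoop implements Runnable {
  private final List<String> nodeIds;  // stand-in for the scheduler's node set
  private final long intervalMs;       // loop interval, independent of heartbeats
  private volatile boolean running = true;

  protected ContinuousSchedulingLoop(List<String> nodeIds, long intervalMs) {
    this.nodeIds = nodeIds;
    this.intervalMs = intervalMs;
  }

  @Override
  public void run() {
    while (running) {
      for (String nodeId : nodeIds) {
        attemptScheduling(nodeId);     // assign containers if the node has headroom
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        running = false;
      }
    }
  }

  public void stop() {
    running = false;
  }

  // Stand-in for the per-node scheduling attempt.
  protected abstract void attemptScheduling(String nodeId);
}
{code}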
[jira] [Commented] (YARN-1215) Yarn URL should include userinfo
[ https://issues.apache.org/jira/browse/YARN-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783815#comment-13783815 ] Hudson commented on YARN-1215: -- FAILURE: Integrated in Hadoop-Yarn-trunk #350 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/350/]) YARN-1215. Correct CHANGES.txt. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528239) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt YARN-1215. Yarn URL should include userinfo. Contributed by Chuan Liu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528233) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/URL.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/URLPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestConverterUtils.java Yarn URL should include userinfo Key: YARN-1215 URL: https://issues.apache.org/jira/browse/YARN-1215 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 3.0.0 Reporter: Chuan Liu Assignee: Chuan Liu Fix For: 3.0.0, 2.1.2-beta Attachments: YARN-1215-trunk.2.patch, YARN-1215-trunk.patch In the {{org.apache.hadoop.yarn.api.records.URL}} class, we don't have an userinfo as part of the URL. When converting a {{java.net.URI}} object into the YARN URL object in {{ConverterUtils.getYarnUrlFromURI()}} method, we will set uri host as the url host. If the uri has a userinfo part, the userinfo is discarded. This will lead to information loss if the original uri has the userinfo, e.g. foo://username:passw...@example.com will be converted to foo://example.com and username/password information is lost during the conversion. -- This message was sent by Atlassian JIRA (v6.1#6144)
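The userinfo component being dropped is the standard java.net.URI field shown below (a minimal sketch with placeholder credentials, unrelated to the patch):
{code}
import java.net.URI;

public class UserInfoDemo {
  public static void main(String[] args) {
    // Placeholder credentials for illustration only.
    URI uri = URI.create("foo://username:password@example.com/path");
    // Copying only the host into the YARN URL record loses the userinfo part.
    System.out.println("host:     " + uri.getHost());      // example.com
    System.out.println("userinfo: " + uri.getUserInfo());  // username:password
  }
}
{code}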
[jira] [Commented] (YARN-1228) Clean up Fair Scheduler configuration loading
[ https://issues.apache.org/jira/browse/YARN-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783816#comment-13783816 ] Hudson commented on YARN-1228: -- FAILURE: Integrated in Hadoop-Yarn-trunk #350 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/350/]) YARN-1228. Clean up Fair Scheduler configuration loading. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528201) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/test-fair-scheduler.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm Clean up Fair Scheduler configuration loading - Key: YARN-1228 URL: https://issues.apache.org/jira/browse/YARN-1228 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.2-beta Attachments: YARN-1228-1.patch, YARN-1228-2.patch, YARN-1228.patch Currently the Fair Scheduler is configured in two ways * An allocations file that has a different format than the standard Hadoop configuration file, which makes it easier to specify hierarchical objects like queues and their properties. * With properties like yarn.scheduler.fair.max.assign that are specified in the standard Hadoop configuration format. The standard and default way of configuring it is to use fair-scheduler.xml as the allocations file and to put the yarn.scheduler properties in yarn-site.xml. It is also possible to specify a different file as the allocations file, and to place the yarn.scheduler properties in fair-scheduler.xml, which will be interpreted as in the standard Hadoop configuration format. This flexibility is both confusing and unnecessary. Additionally, the allocation file is loaded as fair-scheduler.xml from the classpath if it is not specified, but is loaded as a File if it is. This causes two problems 1. We see different behavior when not setting the yarn.scheduler.fair.allocation.file, and setting it to fair-scheduler.xml, which is its default. 2. Classloaders may choose to cache resources, which can break the reload logic when yarn.scheduler.fair.allocation.file is not specified. We should never allow the yarn.scheduler properties to go into fair-scheduler.xml. And we should always load the allocations file as a file, not as a resource on the classpath. To preserve existing behavior and allow loading files from the classpath, we can look for files on the classpath, but strip of their scheme and interpret them as Files. -- This message was sent by Atlassian JIRA (v6.1#6144)
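The "look for files on the classpath, but strip off their scheme and interpret them as Files" idea could look roughly like this sketch (the helper name and fallback behavior are assumptions, not the committed code):
{code}
import java.io.File;
import java.net.URL;

public class AllocationFileResolver {
  // Sketch: absolute paths are used directly; otherwise the name is looked up
  // on the classpath, its URL scheme is stripped, and the result is treated as
  // a plain File so modification-based reloading keeps working.
  public static File resolveAllocationFile(String configuredValue) {
    File file = new File(configuredValue);
    if (file.isAbsolute()) {
      return file;
    }
    URL url = Thread.currentThread().getContextClassLoader()
        .getResource(configuredValue);
    if (url == null || !"file".equals(url.getProtocol())) {
      return null; // not found, or not a plain file:// resource
    }
    return new File(url.getPath());
  }

  public static void main(String[] args) {
    System.out.println(resolveAllocationFile("fair-scheduler.xml"));
  }
}
{code}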
[jira] [Commented] (YARN-1262) TestApplicationCleanup relies on all containers assigned in a single heartbeat
[ https://issues.apache.org/jira/browse/YARN-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783828#comment-13783828 ] Hudson commented on YARN-1262: -- FAILURE: Integrated in Hadoop-Yarn-trunk #350 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/350/]) YARN-1262. TestApplicationCleanup relies on all containers assigned in a single heartbeat (Karthik Kambatla via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528243) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java TestApplicationCleanup relies on all containers assigned in a single heartbeat -- Key: YARN-1262 URL: https://issues.apache.org/jira/browse/YARN-1262 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Karthik Kambatla Fix For: 2.1.2-beta Attachments: yarn-1262-1.patch TestApplicationCleanup submits container requests and waits for allocations to come in. It only sends a single node heartbeat to the node, expecting multiple containers to be assigned on this heartbeat, which not all schedulers do by default. This is causing the test to fail when run with the Fair Scheduler. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1260) RM_HOME link breaks when webapp.https.address related properties are not specified
[ https://issues.apache.org/jira/browse/YARN-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783830#comment-13783830 ] Hudson commented on YARN-1260: -- FAILURE: Integrated in Hadoop-Yarn-trunk #350 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/350/]) YARN-1260. Added webapp.http.address to yarn-default.xml so that default install with https enabled doesn't have broken link on NM UI. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528312) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml RM_HOME link breaks when webapp.https.address related properties are not specified -- Key: YARN-1260 URL: https://issues.apache.org/jira/browse/YARN-1260 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta, 2.1.2-beta Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Fix For: 2.1.2-beta Attachments: YARN-1260.20131030.1.patch This issue happens in multiple node cluster where resource manager and node manager are running on different machines. Steps to reproduce: 1) set yarn.resourcemanager.hostname = resourcemanager host in yarn-site.xml 2) set hadoop.ssl.enabled = true in core-site.xml 3) Do not specify below property in yarn-site.xml yarn.nodemanager.webapp.https.address and yarn.resourcemanager.webapp.https.address Here, the default value of above two property will be considered. 4) Go to nodemanager web UI https://nodemanager host:8044/node 5) Click on RM_HOME link This link redirects to https://nodemanager host:8090/cluster instead https://resourcemanager host:8090/cluster -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784012#comment-13784012 ] Wangda Tan commented on YARN-1197: -- To [~tucu00], I think the heap size change (Xmx, etc.) of a running JVM-based container is not directly related to this topic. If a user wants to change a JVM-based container's size, he/she may use a watcher process to launch the worker process in a container, and relaunch the worker process with different JVM parameters if needed. In short, if we cannot solve this on the language side, we can solve it on the application side. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784061#comment-13784061 ] Wangda Tan commented on YARN-1197: -- To [~bikassaha], Thanks for your comments; see my thoughts below. {quote} Still thinking through the RM-NM interactions. The request for change should probably be a new object that is basically a map of (containerId, Resource) where Resource is the new value for the existing containerId. Not quite sure how we would use the new container token for a running container since it's only used in start container. {quote} Agreed, we need to update the interface of YarnScheduler.allocate to accept this as a parameter if we make the change request independent. And as you mentioned below, we can use the new token to update the NM's resource monitoring limits for containers. {quote} If we wait for RM to sync with NM about the increased resources then it might be too slow since this happens on a heartbeat and the heartbeat interval can be in the order of seconds. An alternative would be a new NM API to allow AMs to increase resources and this would be signed with the new container token. But this would burden the AMs by requiring them to make that additional call. {quote} Agreed, this is much more timely than going through RM-NM communication. Yes, changing container size has a cost for both the AM and the NM, but the AM should be disciplined enough not to do this too frequently. {quote} There could be a race between a new container token coming in with increased resources for an acquired container and the old container token being used by the NMClient to launch the container (in case the AM decides to launch the smaller container while it was waiting for an increase). {quote} Hmmm... thanks for the reminder, this is really a problem. Another issue I found is that the AM may lie to the RM/NM about resource usage: the AM can 1) allocate a big container and launch it, 2) ask to decrease the container, so the RM releases the resource in the corresponding node/application, 3) but not tell the NM about the decrease, so the container can still use the resource that was just released. I don't have a good idea for solving this problem yet. Hope to get more ideas from you about this; I will think it through as well. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-677: - Summary: Increase coverage to FairScheduler (was: Add test methods in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784086#comment-13784086 ] Jonathan Eagles commented on YARN-677: -- +1. Thanks for the coverage addition for this component. Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784104#comment-13784104 ] Hudson commented on YARN-677: - SUCCESS: Integrated in Hadoop-trunk-Commit #4516 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4516/]) YARN-677. Increase coverage to FairScheduler (Vadim Bondarev and Dennis Y via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528524) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Fix For: 3.0.0, 2.3.0 Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784134#comment-13784134 ] Bikas Saha commented on YARN-1232: -- Will there be different rm id's for ha and non-ha cases? Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784149#comment-13784149 ] Jonathan Eagles commented on YARN-465: -- I haven't looked too closely at this, but I see a setAccessible call. This is the same technique that powermock uses to access fields, which has been a disallowed testing technique in the Hadoop stack. The reason is that it usually points to an improvement that should be made to the class under test. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is an issue in branch-0.23. The patch does not create the .keep file. To fix it, run the commands: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message was sent by Atlassian JIRA (v6.1#6144)
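For readers unfamiliar with the pattern being flagged, a minimal example of reflective field access via setAccessible (a generic demo class, not code from the attached patches):
{code}
import java.lang.reflect.Field;

public class SetAccessibleExample {
  private String internalState = "original";

  public static void main(String[] args) throws Exception {
    SetAccessibleExample target = new SetAccessibleExample();
    // The discouraged pattern: a test reaching into a private field instead of
    // the class under test exposing a proper seam (getter, package-private hook).
    Field field = SetAccessibleExample.class.getDeclaredField("internalState");
    field.setAccessible(true);
    field.set(target, "overridden");
    System.out.println(field.get(target)); // prints "overridden"
  }
}
{code}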
[jira] [Updated] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1167: Attachment: YARN-1167.2.patch Added a test case Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch Submit distributed shell application. Once the application turns to be RUNNING state, app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784186#comment-13784186 ] David Yan commented on YARN-658: Vinod, Thanks again for looking into this. My custom AM is not trapping any system signal, and I don't have yarn.nodemanager.sleep-delay-before-sigkill.ms set. The thing is, the exact same code works with Ubuntu 12.04, but not 12.10 or 13.04. Command to kill a YARN application does not work with newer Ubuntu versions --- Key: YARN-658 URL: https://issues.apache.org/jira/browse/YARN-658 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 2.0.4-alpha Reporter: David Yan Attachments: AppMaster.stderr, yarn-david-nodemanager-david-ubuntu.out, yarn-david-resourcemanager-david-ubuntu.out After issuing a KillApplicationRequest, the application keeps running on the system even though the state is changed to KILLED. It happens on both Ubuntu 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784199#comment-13784199 ] Andrey Klochkov commented on YARN-445: -- Bikas, on Windows JVM prints full thread dump on ctrl+break. I think ctrl+c may be emulated in the same way and used in place of TERM on Windows, via the same signalContainers API. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Attachments: YARN-445.patch It would be nice if an ApplicationMaster could send signals to contaniers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784194#comment-13784194 ] Hadoop QA commented on YARN-1167: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606390/YARN-1167.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2061//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2061//console This message is automatically generated. Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch Submit distributed shell application. Once the application turns to be RUNNING state, app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-445: - Attachment: YARN-445--n2.patch fixing javadoc warnings and the failed test Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Attachments: YARN-445--n2.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to contaniers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784222#comment-13784222 ] Sandy Ryza commented on YARN-1197: -- It seems to me that for the reasons Bikas mentioned and for consistency with the way container launch is done, the AM should be the one who tells the NM to do the resize. If resources are released, then the NM would tell the RM about the newly free space on its next heartbeat after the resize has completed. Only then would the scheduler consider those resources available. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784282#comment-13784282 ] Robert Parker commented on YARN-658: looking at https://launchpad.net/ubuntu/+source/procps it appears that 13.10 will still deploy procps-3.3.3 and the /bin/kill parameter parsing bug is not fixed until 3.3.4. Command to kill a YARN application does not work with newer Ubuntu versions --- Key: YARN-658 URL: https://issues.apache.org/jira/browse/YARN-658 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 2.0.4-alpha Reporter: David Yan Attachments: AppMaster.stderr, yarn-david-nodemanager-david-ubuntu.out, yarn-david-resourcemanager-david-ubuntu.out After issuing a KillApplicationRequest, the application keeps running on the system even though the state is changed to KILLED. It happens on both Ubuntu 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784285#comment-13784285 ] Bikas Saha commented on YARN-1232: -- Correct. My question is whether there will be 2 different config items to specify the logical name for the RM: in the HA case it's ha.id and in the non-HA case it's rm.id? Or should this jira just use rm.id and not ha.id? Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784284#comment-13784284 ] Hadoop QA commented on YARN-445: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606399/YARN-445--n2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2062//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2062//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2062//console This message is automatically generated. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Attachments: YARN-445--n2.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to contaniers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784292#comment-13784292 ] Andrey Klochkov commented on YARN-445: -- As I understand this Findbugs warning should be ignored as it's complaining about a valid type cast. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Attachments: YARN-445--n2.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to contaniers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-621: --- Attachment: YARN-621.20131001.1.patch RM triggers web auth failure before first job - Key: YARN-621 URL: https://issues.apache.org/jira/browse/YARN-621 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Assignee: Omkar Vinit Joshi Priority: Critical Attachments: YARN-621.20131001.1.patch On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784307#comment-13784307 ] Omkar Vinit Joshi commented on YARN-621: attaching patch...fixed path specs.. RM triggers web auth failure before first job - Key: YARN-621 URL: https://issues.apache.org/jira/browse/YARN-621 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Assignee: Omkar Vinit Joshi Priority: Critical Attachments: YARN-621.20131001.1.patch On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784337#comment-13784337 ] Hadoop QA commented on YARN-621: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606419/YARN-621.20131001.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2063//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2063//console This message is automatically generated. RM triggers web auth failure before first job - Key: YARN-621 URL: https://issues.apache.org/jira/browse/YARN-621 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Assignee: Omkar Vinit Joshi Priority: Critical Attachments: YARN-621.20131001.1.patch On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784366#comment-13784366 ] Karthik Kambatla commented on YARN-1232: I am confused. For Case 1, we need the RMs to have separate ids. For Case 2, it looks like we need the RMs to have the *same* logical name. If this is correct, we need two configs, no? e.g. RM1 and RM2 have ha.id set to rm1, rm2 respectively, and logical name yarn.resourcemanager.clusterid set to yarn-cluster-foo-bar for both RMs. Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
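A small sketch of how the two kinds of keys discussed here would be read (the key names are the ones proposed in this thread and the values are placeholders; nothing here is a finalized API):
{code}
import org.apache.hadoop.conf.Configuration;

public class RmHaConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    // Proposed keys from this discussion, with placeholder values.
    conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");  // all RMs in the cluster
    conf.set("yarn.resourcemanager.ha.id", "rm1");          // which id "this" RM is
    conf.set("yarn.resourcemanager.clusterid", "yarn-cluster-foo-bar");

    for (String rmId : conf.getStrings("yarn.resourcemanager.ha.rm-ids")) {
      // Per-RM RPC addresses would be looked up by suffixing the id
      // (illustrative pattern only, e.g. yarn.resourcemanager.address.<rm-id>).
      System.out.println("configured RM id: " + rmId);
    }
    System.out.println("cluster id: " + conf.get("yarn.resourcemanager.clusterid"));
  }
}
{code}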
[jira] [Commented] (YARN-1141) Updating resource requests should be decoupled with updating blacklist
[ https://issues.apache.org/jira/browse/YARN-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784404#comment-13784404 ] Bikas Saha commented on YARN-1141: -- +1 Looks good except for the following. This test probably needs to be reversed right?
{code}
+// Verify the blacklist can be updated independent of requesting containers
+cs.allocate(appAttemptId, Collections.<ResourceRequest>emptyList(),
+Collections.<ContainerId>emptyList(), null,
+Collections.singletonList(host));
+Assert.assertFalse(cs.getApplication(appAttemptId).isBlacklisted(host));
+cs.allocate(appAttemptId, Collections.<ResourceRequest>emptyList(),
+Collections.<ContainerId>emptyList(),
+Collections.singletonList(host), null);
+Assert.assertTrue(cs.getApplication(appAttemptId).isBlacklisted(host));
{code}
Updating resource requests should be decoupled with updating blacklist -- Key: YARN-1141 URL: https://issues.apache.org/jira/browse/YARN-1141 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1141.1.patch, YARN-1141.2.patch Currently, in CapacityScheduler and FifoScheduler, blacklist is updated together with resource requests, only when the incoming resource requests are not empty. Therefore, when the incoming resource requests are empty, the blacklist will not be updated even when blacklist additions and removals are not empty. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1141) Updating resource requests should be decoupled with updating blacklist
[ https://issues.apache.org/jira/browse/YARN-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1141: -- Attachment: YARN-1141.3.patch Updated the test according to Bikas' comment Updating resource requests should be decoupled with updating blacklist -- Key: YARN-1141 URL: https://issues.apache.org/jira/browse/YARN-1141 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1141.1.patch, YARN-1141.2.patch, YARN-1141.3.patch Currently, in CapacityScheduler and FifoScheduler, blacklist is updated together with resource requests, only when the incoming resource requests are not empty. Therefore, when the incoming resource requests are empty, the blacklist will not be updated even when blacklist additions and removals are not empty. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784465#comment-13784465 ] Alejandro Abdelnur commented on YARN-1232: -- IMO they are different: * the HA ids are to differentiate the RM instances within the same cluster. This is used by the failover logic only; a user does not need to be aware of them when creating applications. * the clusterID is to differentiate different clusters. This is used by the user to indicate against which cluster they want to work. Because of all the configurations a client needs to be aware of (as Bikas, Karthik and I discussed during the YARN meetup last Friday), an easy way to handle this would be for a client to specify the yarn-site.xml of the cluster he wants to connect to, or to have nested configurations in a single yarn-site.xml, one per cluster. If we do something like that, then the ids being introduced here will only be used for HA. Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1141) Updating resource requests should be decoupled with updating blacklist
[ https://issues.apache.org/jira/browse/YARN-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784466#comment-13784466 ] Hadoop QA commented on YARN-1141: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606443/YARN-1141.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2064//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2064//console This message is automatically generated. Updating resource requests should be decoupled with updating blacklist -- Key: YARN-1141 URL: https://issues.apache.org/jira/browse/YARN-1141 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.1.2-beta Attachments: YARN-1141.1.patch, YARN-1141.2.patch, YARN-1141.3.patch Currently, in CapacityScheduler and FifoScheduler, blacklist is updated together with resource requests, only when the incoming resource requests are not empty. Therefore, when the incoming resource requests are empty, the blacklist will not be updated even when blacklist additions and removals are not empty. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1141) Updating resource requests should be decoupled with updating blacklist
[ https://issues.apache.org/jira/browse/YARN-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784480#comment-13784480 ] Hudson commented on YARN-1141: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4520 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4520/]) YARN-1141. Updating resource requests should be decoupled with updating blacklist (Zhijie Shen via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528632) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java Updating resource requests should be decoupled with updating blacklist -- Key: YARN-1141 URL: https://issues.apache.org/jira/browse/YARN-1141 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.1.2-beta Attachments: YARN-1141.1.patch, YARN-1141.2.patch, YARN-1141.3.patch Currently, in CapacityScheduler and FifoScheduler, blacklist is updated together with resource requests, only when the incoming resource requests are not empty. Therefore, when the incoming resource requests are empty, the blacklist will not be updated even when blacklist additions and removals are not empty. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect
Sandy Ryza created YARN-1265: Summary: Fair Scheduler chokes on unhealthy node reconnect Key: YARN-1265 URL: https://issues.apache.org/jira/browse/YARN-1265 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Only nodes in the RUNNING state are tracked by schedulers. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's in the RUNNING state. The FairScheduler doesn't guard against this. I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it. -- This message was sent by Atlassian JIRA (v6.1#6144)
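A minimal sketch of the guard described above (an assumed shape, not the patch that gets attached later): in RMNodeImpl.ReconnectNodeTransition, only tell the scheduler to remove the node if the node was actually RUNNING, since schedulers only track RUNNING nodes.
{code}
// Hedged sketch only: skip the scheduler-side removal for nodes that were not
// RUNNING when they reconnected.
if (rmNode.getState() == NodeState.RUNNING) {
  rmNode.context.getDispatcher().getEventHandler().handle(
      new NodeRemovedSchedulerEvent(rmNode));
}
{code}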
[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784489#comment-13784489 ] Ravi Prakash commented on YARN-465: --- Thanks Aleksey for your contribution. Could you please also update the patch? TestWebAppProxyServlet.java could not be compiled with this patch. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is an issue in branch-0.23: the patch does not create the .keep file. To fix it, run these commands: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784485#comment-13784485 ] David Yan commented on YARN-658: I see. Will there be a workaround fix in YARN to get around this problem? I would imagine more and more users will try to run YARN on Ubuntu 12.10, 13.04 and 13.10. Command to kill a YARN application does not work with newer Ubuntu versions --- Key: YARN-658 URL: https://issues.apache.org/jira/browse/YARN-658 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 2.0.4-alpha Reporter: David Yan Attachments: AppMaster.stderr, yarn-david-nodemanager-david-ubuntu.out, yarn-david-resourcemanager-david-ubuntu.out After issuing a KillApplicationRequest, the application keeps running on the system even though the state is changed to KILLED. It happens on both Ubuntu 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784503#comment-13784503 ] Bikas Saha commented on YARN-867: - Probably we can ignore the error here since the container has already failed.
{code}
// From LOCALIZATION_FAILED State
.addTransition(ContainerState.LOCALIZATION_FAILED,
@@ -180,6 +184,9 @@ public ContainerImpl(Configuration conf, Dispatcher dispatcher,
.addTransition(ContainerState.LOCALIZATION_FAILED,
ContainerState.LOCALIZATION_FAILED,
ContainerEventType.RESOURCE_FAILED)
+.addTransition(ContainerState.LOCALIZATION_FAILED, ContainerState.EXITED_WITH_FAILURE,
+ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
+new ExitedWithFailureTransition(false))
{code}
Probably have 1 try/catch instead of multiple. Can we rename AUXSERVICE_FAIL to AUXSERVICE_ERROR since the service probably hasn't failed? TestAuxService needs an addition for the new code. TestContainer - the new test can be made simpler by not mocking AuxServiceHandler and instead sending the failed event directly, like it's done for other tests there. In AuxService.handle(APPLICATION_INIT) and other places like that, where the service does not exist, we should fail too. Zhijie, we should err on the side of caution here and fail the container. If we see real use cases where failure can be ignored then we can make that improvement. Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
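For context on the isolation being reviewed, a hedged sketch of the general idea (a sketch under assumptions, not the attached YARN-867 patches): contain exceptions thrown by an aux service inside a single try/catch so they never escape into the NM's async dispatcher.
{code}
// Hedged sketch: one try/catch around the per-service callback; report the
// failure for that service only instead of letting it kill the NodeManager.
try {
  service.initializeApplication(initData);   // e.g. handling APPLICATION_INIT
} catch (Throwable t) {
  LOG.error("Aux service threw while handling APPLICATION_INIT", t);
  // signal the failure back (e.g. fail the container) rather than rethrowing
}
{code}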
[jira] [Commented] (YARN-425) coverage fix for yarn api
[ https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784517#comment-13784517 ] Hudson commented on YARN-425: - SUCCESS: Integrated in Hadoop-trunk-Commit #4521 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4521/]) YARN-425. coverage fix for yarn api (Aleksey Gorshkov via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528641) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceManagerAdministrationProtocolPBClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestYarnApiClasses.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources/config-with-security.xml coverage fix for yarn api - Key: YARN-425 URL: https://issues.apache.org/jira/browse/YARN-425 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Fix For: 3.0.0, 2.3.0 Attachments: YARN-425-branch-0.23-d.patch, YARN-425-branch-0.23.patch, YARN-425-branch-0.23-v1.patch, YARN-425-branch-2-b.patch, YARN-425-branch-2-c.patch, YARN-425-branch-2.patch, YARN-425-branch-2-v1.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, YARN-425-trunk-c.patch, YARN-425-trunk-d.patch, YARN-425-trunk.patch, YARN-425-trunk-v1.patch, YARN-425-trunk-v2.patch coverage fix for yarn api patch YARN-425-trunk-a.patch for trunk patch YARN-425-branch-2.patch for branch-2 patch YARN-425-branch-0.23.patch for branch-0.23 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1266) Adding ApplicationHistoryProtocolPBService to make web apps to work
Mayank Bansal created YARN-1266: --- Summary: Adding ApplicationHistoryProtocolPBService to make web apps to work Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Maybe we should include AHS classes as well (for developer usage) in yarn and yarn.cmd -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1266) Adding ApplicationHistoryProtocolPBService to make web apps to work
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1266: Description: Adding ApplicationHistoryProtocolPBService to make web apps to work and changing yarn to run AHS as a separate process Adding ApplicationHistoryProtocolPBService to make web apps to work --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Adding ApplicationHistoryProtocolPBService to make web apps to work and changing yarn to run AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1266) Adding ApplicationHistoryProtocolPBService to make web apps to work
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1266: Description: (was: Maybe we should include AHS classes as well (for developer usage) in yarn and yarn.cmd) Adding ApplicationHistoryProtocolPBService to make web apps to work --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1266) Adding ApplicationHistoryProtocolPBService to make web apps to work
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1266: Attachment: YARN-1266-1.patch Attaching patch. Thanks, Mayank Adding ApplicationHistoryProtocolPBService to make web apps to work --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1266-1.patch Adding ApplicationHistoryProtocolPBService to make web apps to work and changing yarn to run AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784550#comment-13784550 ] Robert Parker commented on YARN-658: The problem is reproducible in mvn test -Dtest=org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown I have confirmed that kill version 3.3.8 does not have this problem by installing it ahead of /bin/kill and running the above test (no code change required). I had a half-baked patch that I will dig up, clean up and post. The patch precedes the -PID with a ' -- ' . I had hoped Ubuntu would fix this sooner but that sadly does not appear to be the case. Command to kill a YARN application does not work with newer Ubuntu versions --- Key: YARN-658 URL: https://issues.apache.org/jira/browse/YARN-658 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 2.0.4-alpha Reporter: David Yan Attachments: AppMaster.stderr, yarn-david-nodemanager-david-ubuntu.out, yarn-david-resourcemanager-david-ubuntu.out After issuing a KillApplicationRequest, the application keeps running on the system even though the state is changed to KILLED. It happens on both Ubuntu 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message was sent by Atlassian JIRA (v6.1#6144)
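A hedged illustration of the fix Robert describes (not the committed patch): newer procps builds of kill parse a leading "-PID" argument (a process group) as an option unless it is preceded by "--", so the signal command needs the separator.
{code}
// Sketch only: build the command as "kill -<signal> -- -<pid>" so the process
// group argument is not misread as an option by newer kill implementations.
String[] signalProcessGroupCmd = { "kill", "-" + signal, "--", "-" + pid };
{code}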
[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1149: Attachment: YARN-1149.5.patch Use the readLock and WriteLock to solve the synchronization issue. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl 
is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
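For context, a minimal sketch of the read/write-lock pattern the comment above refers to (a sketch under assumptions, not the YARN-1149.5 patch itself): state reads take the read lock and state-machine transitions take the write lock, so a late APPLICATION_LOG_HANDLING_FINISHED cannot race with concurrent readers of the application state.
{code}
// Hedged sketch: guard the ApplicationImpl state machine with a
// ReentrantReadWriteLock instead of leaving reads and transitions unsynchronized.
private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

public ApplicationState getApplicationState() {
  rwLock.readLock().lock();
  try {
    return stateMachine.getCurrentState();
  } finally {
    rwLock.readLock().unlock();
  }
}

public void handle(ApplicationEvent event) {
  rwLock.writeLock().lock();
  try {
    stateMachine.doTransition(event.getType(), event);
  } catch (InvalidStateTransitonException e) {
    LOG.warn("Can't handle this event at current state", e);
  } finally {
    rwLock.writeLock().unlock();
  }
}
{code}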
[jira] [Commented] (YARN-1266) Adding ApplicationHistoryProtocolPBService to make web apps to work
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784559#comment-13784559 ] Hadoop QA commented on YARN-1266: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606466/YARN-1266-1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2067//console This message is automatically generated. Adding ApplicationHistoryProtocolPBService to make web apps to work --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1266-1.patch Adding ApplicationHistoryProtocolPBService to make web apps to work and changing yarn to run AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-876) Node resource is added twice when node comes back from unhealthy to healthy
[ https://issues.apache.org/jira/browse/YARN-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-876: Component/s: (was: nodemanager) resourcemanager Node resource is added twice when node comes back from unhealthy to healthy --- Key: YARN-876 URL: https://issues.apache.org/jira/browse/YARN-876 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: PengZhang Assignee: PengZhang Fix For: 2.1.2-beta Attachments: YARN-876.patch When an unhealthy node restarts, its resource may be added twice in the scheduler. The first time is at the node's reconnection, while the node's final state is still UNHEALTHY. The second time is at the node's update, when the node's state changes from UNHEALTHY to HEALTHY. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-876) Node resource is added twice when node comes back from unhealthy to healthy
[ https://issues.apache.org/jira/browse/YARN-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784586#comment-13784586 ] Hudson commented on YARN-876: - SUCCESS: Integrated in Hadoop-trunk-Commit #4522 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4522/]) YARN-876. Node resource is added twice when node comes back from unhealthy. (Peng Zhang via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528660) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java Node resource is added twice when node comes back from unhealthy to healthy --- Key: YARN-876 URL: https://issues.apache.org/jira/browse/YARN-876 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: PengZhang Assignee: PengZhang Fix For: 2.1.2-beta Attachments: YARN-876.patch When an unhealthy restarts, its resource maybe added twice in scheduler. First time is at node's reconnection, while node's final state is still UNHEALTHY. And second time is at node's update, while node's state changing from UNHEALTHY to HEALTHY. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784589#comment-13784589 ] Hadoop QA commented on YARN-1149: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606467/YARN-1149.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2066//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2066//console This message is automatically generated. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. 
Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662)
[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784613#comment-13784613 ] Siddharth Seth commented on YARN-1131: -- [~djp], if you don't mind, I'd like to take this over - would be good to get it into the next release. $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Junping Du Priority: Minor Fix For: 2.1.2-beta In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
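On the second part of the report, a hedged sketch of the CLI-side validation being asked for (an assumption about the shape of the fix, not Siddharth's eventual patch): catch the parse failure and print a readable message instead of letting the NoSuchElementException escape from ConverterUtils.
{code}
// Sketch only: validate the application id before doing any log lookups.
ApplicationId appId;
try {
  appId = ConverterUtils.toApplicationId(appIdStr);
} catch (Exception e) {
  System.err.println("Invalid ApplicationId specified: " + appIdStr);
  return -1;
}
{code}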
[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784620#comment-13784620 ] Junping Du commented on YARN-1131: -- Sure. Sid, I just reassign this JIRA to you. Please feel free to start the work. Thanks! $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Fix For: 2.1.2-beta In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1131: - Assignee: Siddharth Seth (was: Junping Du) $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Fix For: 2.1.2-beta In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-890: - Attachment: YARN-890.2.patch [~zjshen] [~xgong] Mind taking a look? I had a different approach in mind to fixing the issue. The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch, YARN-890.2.patch From the yarn-site.xml, I see following values-
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
However the resourcemanager UI shows total memory as 5MB -- This message was sent by Atlassian JIRA (v6.1#6144)
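For what it's worth, the arithmetic that presumably produces the misleading "5" (my assumption from the values above, not taken from either patch): 4192 MB is not a whole number of GB, so a display that rounds up to whole GB shows 5.
{code}
// Arithmetic only: ceil(4192 / 1024) = 5, even though 4192 MB is ~4.09 GB.
long mb = 4192;
long displayedGb = (mb + 1023) / 1024;   // = 5
{code}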
[jira] [Reopened] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reopened YARN-677: - Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Fix For: 3.0.0, 2.3.0 Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1199) Make NM/RM Versions Available
[ https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784661#comment-13784661 ] Hadoop QA commented on YARN-1199: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606437/YARN-1199.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.TestJobCleanup org.apache.hadoop.yarn.sls.TestSLSRunner The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.v2.TestUberAM {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2065//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2065//console This message is automatically generated. Make NM/RM Versions Available - Key: YARN-1199 URL: https://issues.apache.org/jira/browse/YARN-1199 Project: Hadoop YARN Issue Type: Improvement Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch, YARN-1199.patch Now as we have the NM and RM Versions available, we can display the YARN version of nodes running in the cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1213) Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1213: - Attachment: YARN-1213.patch Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect
[ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1265: - Attachment: YARN-1265.patch Fair Scheduler chokes on unhealthy node reconnect - Key: YARN-1265 URL: https://issues.apache.org/jira/browse/YARN-1265 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1265.patch Only nodes in the RUNNING state are tracked by schedulers. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's in the RUNNING state. The FairScheduler doesn't guard against this. I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1267) Refactor cgroup logic out of LCE into a standalone binary
Alejandro Abdelnur created YARN-1267: Summary: Refactor cgroup logic out of LCE into a standalone binary Key: YARN-1267 URL: https://issues.apache.org/jira/browse/YARN-1267 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.2-beta Reporter: Alejandro Abdelnur Assignee: Roman Shaposhnik Fix For: 2.3.0 As discussed in YARN-1253, we should consider decoupling cgroups handling from the LCE. YARN-3 initially had a proposal on how this could be done; we should see if any of that makes sense in the current state of things. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1253) Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode
[ https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784667#comment-13784667 ] Alejandro Abdelnur commented on YARN-1253: -- Create YARN-1267 to refactor and decouple cgroups from LCE. Thinking a bit, I agree with Arun about leaving this JIRA out of branch-2.1-beta, only trunk/branch-2. I've reviewed and tested the patch already, I'll wait till Friday noon for comments from other reviewers before committing. Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode - Key: YARN-1253 URL: https://issues.apache.org/jira/browse/YARN-1253 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Roman Shaposhnik Priority: Blocker Attachments: YARN-1253.patch.txt When using cgroups we require LCE to be configured in the cluster to start containers. When LCE starts containers as the user that submitted the job. While this works correctly in a secure setup, in an un-secure setup this presents a couple issues: * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes * Because users can impersonate other users, any user would have access to any local file of other users Particularly, the second issue is not desirable as a user could get access to ssh keys of other users in the nodes or if there are NFS mounts, get to other users data outside of the cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect
[ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784669#comment-13784669 ] Sandy Ryza commented on YARN-1265: -- Attached patch removes the guard against nodes not being in the nodes map in CapacityScheduler.removeNode. With the guard removed and without the other changes, TestResourceTrackerService.testReconnect fails. It also fails without the changes when setting the Fair Scheduler as the default scheduler. With the changes, it passes. Fair Scheduler chokes on unhealthy node reconnect - Key: YARN-1265 URL: https://issues.apache.org/jira/browse/YARN-1265 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1265.patch Only nodes in the RUNNING state are tracked by schedulers. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's in the RUNNING state. The FairScheduler doesn't guard against this. I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1213) Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784671#comment-13784671 ] Alejandro Abdelnur commented on YARN-1213: -- +1 pending jenkins Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (YARN-1253) Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode
[ https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784667#comment-13784667 ] Alejandro Abdelnur edited comment on YARN-1253 at 10/3/13 12:18 AM: Created YARN-1267 to refactor and decouple cgroups from LCE. Thinking a bit, I agree with Arun about leaving this JIRA out of branch-2.1-beta, only trunk/branch-2. I've reviewed and tested the patch already, I'll wait till Friday noon for comments from other reviewers before committing. was (Author: tucu00): Create YARN-1267 to refactor and decouple cgroups from LCE. Thinking a bit, I agree with Arun about leaving this JIRA out of branch-2.1-beta, only trunk/branch-2. I've reviewed and tested the patch already, I'll wait till Friday noon for comments from other reviewers before committing. Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode - Key: YARN-1253 URL: https://issues.apache.org/jira/browse/YARN-1253 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Roman Shaposhnik Priority: Blocker Attachments: YARN-1253.patch.txt When using cgroups we require LCE to be configured in the cluster to start containers. When LCE starts containers as the user that submitted the job. While this works correctly in a secure setup, in an un-secure setup this presents a couple issues: * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes * Because users can impersonate other users, any user would have access to any local file of other users Particularly, the second issue is not desirable as a user could get access to ssh keys of other users in the nodes or if there are NFS mounts, get to other users data outside of the cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784684#comment-13784684 ] Hadoop QA commented on YARN-890: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606479/YARN-890.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2068//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2068//console This message is automatically generated. The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch, YARN-890.2.patch From the yarn-site.xml, I see following values- property nameyarn.nodemanager.resource.memory-mb/name value4192/value /property property nameyarn.scheduler.maximum-allocation-mb/name value4192/value /property property nameyarn.scheduler.minimum-allocation-mb/name value1024/value /property However the resourcemanager UI shows total memory as 5MB -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1213) Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784691#comment-13784691 ] Hadoop QA commented on YARN-1213: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606494/YARN-1213.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2070//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2070//console This message is automatically generated. Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect
[ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784692#comment-13784692 ] Hadoop QA commented on YARN-1265: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606495/YARN-1265.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2069//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2069//console This message is automatically generated. Fair Scheduler chokes on unhealthy node reconnect - Key: YARN-1265 URL: https://issues.apache.org/jira/browse/YARN-1265 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1265.patch Only nodes in the RUNNING state are tracked by schedulers. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's in the RUNNING state. The FairScheduler doesn't guard against this. I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784694#comment-13784694 ] Omkar Vinit Joshi commented on YARN-621: Today the filter gets called twice every time because we have defined the AuthenticationFilter filtering URLs as:
* /cluster
* /cluster/*
* /ws
* /ws/*
So if we specify http://localhost:8088/cluster then the AuthenticationFilter will be called twice. However if we request http://localhost:8088/cluster/cluster then it will be called once. We know the related ticket HADOOP-8830 (error due to the AuthenticationFilter getting called twice when the request does not contain the hadoop.auth cookie). Also we cannot remove the code below because it is required inside WebApp.serverPathSpecs - WebApp.configureServlets() {code}webapp.addServePathSpec(basePath);{code} All of these calls really matter only for the first request; after that, once the cookie is set, it doesn't matter. We should definitely fix HADOOP-8830. RM triggers web auth failure before first job - Key: YARN-621 URL: https://issues.apache.org/jira/browse/YARN-621 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Assignee: Omkar Vinit Joshi Priority: Critical Attachments: YARN-621.20131001.1.patch On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1268) TestFairScheduler.testContinuousScheduling is flaky
Sandy Ryza created YARN-1268: Summary: TestFairScheduler.testContinuousScheduling is flaky Key: YARN-1268 URL: https://issues.apache.org/jira/browse/YARN-1268 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza It looks like there's a timeout in it that's causing it to be flaky. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1213) Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784695#comment-13784695 ] Sandy Ryza commented on YARN-1213: -- Test failure is unrelated - TestFairScheduler.testContinuousScheduling appears to be flaky. Filed YARN-1268 for this. Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1219) FSDownload changes file suffix making FileUtil.unTar() throw exception
[ https://issues.apache.org/jira/browse/YARN-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784708#comment-13784708 ] Omkar Vinit Joshi commented on YARN-1219: - bq. I didn't see anywhere in code to treat the .tmp file differently. If you know please let me know. If the original author only used a suffix to make sure the name is different than the original file name, it doesn't seem to be worth it to add unnecessary and error-prone rename operations just to keep the temporary file name suffix.
No, we are not adding new rename operations, just moving them around, from unpack() to here. Ideally that rename code should have been present here only. I remember we had a bug to remove that .tmp file. But I think it is fine, we can go ahead with this patch, as it will not break anything else. FSDownload changes file suffix making FileUtil.unTar() throw exception -- Key: YARN-1219 URL: https://issues.apache.org/jira/browse/YARN-1219 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.1.1-beta, 2.1.2-beta Reporter: shanyu zhao Assignee: shanyu zhao Fix For: 2.1.2-beta Attachments: YARN-1219.patch While running a Hive join operation on Yarn, I saw the exception described below. This is caused by FSDownload copying the files into a temp file and changing the suffix to .tmp before unpacking it. In unpack(), it uses FileUtil.unTar() which will determine if the file is gzipped by looking at the file suffix:
{code}
boolean gzipped = inFile.toString().endsWith("gz");
{code}
To fix this problem, we can remove the .tmp in the temp file name. Here is the detailed exception:
org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:240)
at org.apache.hadoop.fs.FileUtil.unTarUsingJava(FileUtil.java:676)
at org.apache.hadoop.fs.FileUtil.unTar(FileUtil.java:625)
at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:203)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:287)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
-- This message was sent by Atlassian JIRA (v6.1#6144)
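To spell out the failure mode in the description above (a simplified illustration under my own assumptions, not code from the patch): the gzip decision is made from the file name, so renaming the downloaded archive with a .tmp suffix defeats it.
{code}
// Illustration only: the suffix check that drives unTar()'s behavior.
boolean gzipped    = "foo.tar.gz".endsWith("gz");      // true  -> gunzip, then untar
boolean gzippedTmp = "foo.tar.gz.tmp".endsWith("gz");  // false -> plain untar fails
{code}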
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784717#comment-13784717 ] Xuan Gong commented on YARN-867:
bq. Probably have 1 try catch instead of multiple.
Fixed. Used only one big try-catch block.
bq. Can we rename AUXSERVICE_FAIL to AUXSERVICE_ERROR since the service probably hasnt failed.
Done.
bq. TestAuxService needs an addition for the new code.
Added a new test case in TestAuxService.
bq. TestContainer - new test can be made simpler by not mocking AuxServiceHandler and instead sending the failed event directly like its done for other tests there.
Fixed.
bq. In AuxService.handle(APPLICATION_INIT) and other places like that, where the service does not exist then we should fail too.
Done.
bq. Probably we can ignore the error here since the container has already failed.
I think we still need this transition. The container can go to ContainerState.LOCALIZATION_FAILED from the NEW state, and AuxService is triggered to do the APPLICATION_INIT at that time. If there is any exception, we will send a ContainerExitEvent with ContainerEventType.CONTAINER_EXITED_WITH_FAILURE to the Container, and it is very possible that the container will start to process this event when it is in the LOCALIZATION_FAILED state. So we should handle it.
Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
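For readers following the review, the isolation being discussed amounts to wrapping each aux-service callback in a single catch-all and turning failures into an error notification instead of letting them reach the NM dispatcher. A minimal sketch with simplified stand-in types (the real patch works against AuxServices, AuxServicesEvent and the Container state machine, so the names and shapes here are illustrative only):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AuxServiceIsolationSketch {

  /** Stand-in for the NM aux-service callback interface. */
  interface AuxiliaryService {
    void initializeApplication(String appId, byte[] serviceData) throws Exception;
  }

  private final Map<String, AuxiliaryService> services = new ConcurrentHashMap<>();

  /** Handles an APPLICATION_INIT-style event without letting a bad service kill the caller. */
  public void handleApplicationInit(String serviceName, String appId, byte[] data) {
    try {
      AuxiliaryService service = services.get(serviceName);
      if (service == null) {
        // Unknown service: fail this init explicitly instead of silently ignoring it.
        throw new IllegalStateException("No aux service registered under " + serviceName);
      }
      service.initializeApplication(appId, data);
    } catch (Throwable t) {
      // One big try-catch: translate any failure (not just IOException) into an error
      // notification for the container/application rather than crashing the dispatcher.
      notifyAuxServiceError(serviceName, appId, t);
    }
  }

  private void notifyAuxServiceError(String serviceName, String appId, Throwable cause) {
    System.err.println("Aux service " + serviceName + " failed for " + appId + ": " + cause);
  }
}
{code}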
[jira] [Updated] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-867: --- Attachment: YARN-867.5.patch Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1213) Restore banning submitting to undeclared pools in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1213: - Summary: Restore banning submitting to undeclared pools in the Fair Scheduler (was: Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler) Restore banning submitting to undeclared pools in the Fair Scheduler Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1213) Restore config to bad submitting to undeclared pools in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1213: - Summary: Restore config to bad submitting to undeclared pools in the Fair Scheduler (was: Restore banning submitting to undeclared pools in the Fair Scheduler) Restore config to bad submitting to undeclared pools in the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1213) Restore config to ban submitting to undeclared pools in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1213: - Summary: Restore config to ban submitting to undeclared pools in the Fair Scheduler (was: Restore config to bad submitting to undeclared pools in the Fair Scheduler) Restore config to ban submitting to undeclared pools in the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1213) Restore config to ban submitting to undeclared pools in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784726#comment-13784726 ] Sandy Ryza commented on YARN-1213: -- I just committed this to trunk, branch-2, and branch-2.1-beta Restore config to ban submitting to undeclared pools in the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.2-beta Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
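As context for the change, the restored behavior boils down to a single boolean switch consulted when an app is placed into a queue. A rough sketch of that check, assuming the property is named yarn.scheduler.fair.allow-undeclared-pools and defaults to true (the actual logic lives in FairSchedulerConfiguration and QueueManager; this is not that code):
{code}
import org.apache.hadoop.conf.Configuration;

public class UndeclaredPoolCheckSketch {
  // Assumed property name and default; adjust to whatever the committed patch uses.
  static final String ALLOW_UNDECLARED_POOLS = "yarn.scheduler.fair.allow-undeclared-pools";

  /** May an app be placed into a queue that is not declared in the allocations file? */
  static boolean mayUseUndeclaredQueue(Configuration conf, boolean queueIsDeclared) {
    return queueIsDeclared || conf.getBoolean(ALLOW_UNDECLARED_POOLS, true);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean(ALLOW_UNDECLARED_POOLS, false);
    // With the switch off, a submission to an undeclared pool would be rejected
    // (or redirected to the default queue) instead of creating the pool on the fly.
    System.out.println(mayUseUndeclaredQueue(conf, false)); // false
  }
}
{code}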
[jira] [Updated] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-1131: - Attachment: YARN-1131.1.txt Changes in the patch Adds a YARN application status check based on the ApplicationId, to log a correct message if the application is running. If an application is not found in the RM - the CLI tool will continue to search for the files on hdfs (RM not running, or RM restarted). Fixes the exception in case of an invalid applicationId. There's still a case, right after an app completes, but before aggregation is complete where an empty output is returned. That should be a separate jira though. $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Improvement Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Fix For: 2.1.2-beta Attachments: YARN-1131.1.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
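The application-status check described in the patch notes can be approximated with the public YarnClient API: look the application up first and only read aggregated logs once it has reached a terminal state. A sketch along those lines (an illustration of the idea, not the attached patch; an application missing from the RM would surface as an exception here, at which point the tool can fall back to searching HDFS):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class LogsCliStateCheckSketch {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      ApplicationId appId = ConverterUtils.toApplicationId(args[0]);
      ApplicationReport report = client.getApplicationReport(appId);
      YarnApplicationState state = report.getYarnApplicationState();
      if (state != YarnApplicationState.FINISHED
          && state != YarnApplicationState.FAILED
          && state != YarnApplicationState.KILLED) {
        System.err.println("Application " + args[0]
            + " is still running; aggregated logs are not available yet.");
        return;
      }
      // ...fall through to the existing aggregated-log reader here...
    } finally {
      client.stop();
    }
  }
}
{code}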
[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1149: Attachment: YARN-1149.6.patch add container into failedContainers if try to launch it in the NM shut down process NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
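The stack trace above is YARN's generic state machine rejecting an event for which the current state has no registered transition. The toy example below shows the pattern with made-up states and events (it is not the YARN-1149 fix itself): registering an explicit transition for the late "log handling finished" style event makes it harmless instead of fatal.
{code}
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class LateEventSketch {
  enum MyState { RUNNING, FINISHED }
  enum MyEvent { FINISH, LOG_HANDLING_FINISHED }

  // Without the second transition, LOG_HANDLING_FINISHED at RUNNING would throw
  // InvalidStateTransitonException, exactly like the ApplicationImpl trace above.
  private static final StateMachineFactory<LateEventSketch, MyState, MyEvent, MyEvent> FACTORY =
      new StateMachineFactory<LateEventSketch, MyState, MyEvent, MyEvent>(MyState.RUNNING)
          .addTransition(MyState.RUNNING, MyState.FINISHED, MyEvent.FINISH)
          // Explicitly tolerate the late event instead of letting it escape as an error.
          .addTransition(MyState.RUNNING, MyState.RUNNING, MyEvent.LOG_HANDLING_FINISHED)
          .installTopology();

  public static void main(String[] args) throws Exception {
    StateMachine<MyState, MyEvent, MyEvent> sm = FACTORY.make(new LateEventSketch());
    sm.doTransition(MyEvent.LOG_HANDLING_FINISHED, MyEvent.LOG_HANDLING_FINISHED);
    System.out.println(sm.getCurrentState()); // still RUNNING, no exception
  }
}
{code}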
[jira] [Commented] (YARN-1213) Restore config to ban submitting to undeclared pools in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784730#comment-13784730 ] Hudson commented on YARN-1213: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4524 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4524/]) YARN-1213. Restore config to ban submitting to undeclared pools in the Fair Scheduler. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528696) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Restore config to ban submitting to undeclared pools in the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.2-beta Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784732#comment-13784732 ] Hadoop QA commented on YARN-867: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606498/YARN-867.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2071//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2071//console This message is automatically generated. Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1268) TestFairScheduler.testContinuousScheduling is flaky
[ https://issues.apache.org/jira/browse/YARN-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan reassigned YARN-1268: - Assignee: Wei Yan TestFairScheduler.testContinuousScheduling is flaky --- Key: YARN-1268 URL: https://issues.apache.org/jira/browse/YARN-1268 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Wei Yan It looks like there's a timeout in it that's causing it to be flaky. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1131: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-431 $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1131: -- Fix Version/s: (was: 2.1.2-beta) $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784742#comment-13784742 ] Hadoop QA commented on YARN-1149: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606502/YARN-1149.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2073//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2073//console This message is automatically generated. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. 
Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784745#comment-13784745 ] Bikas Saha commented on YARN-867: - Why is this check needed?
{code}
+  private void handleAuxServiceFail(AuxServicesEvent event, Throwable th) {
+    if (event.getType() instanceof AuxServicesEventType) {
+      Container container = event.getContainer();
{code}
If container has already failed then why do we need to change state again? the container has already failed.
{code}
+.addTransition(ContainerState.LOCALIZATION_FAILED, ContainerState.EXITED_WITH_FAILURE,
+    ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
+    new ExitedWithFailureTransition(false))
{code}
{code}
+.addTransition(ContainerState.CONTAINER_CLEANEDUP_AFTER_KILL,
+    ContainerState.EXITED_WITH_FAILURE,
+    ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
+    new ExitedWithFailureTransition(false))
{code}
Why is CONTAINER_EXITED_WITH_FAILURE not being handled while container state is localized/running? Why are extra events being ignored in addition to ContainerEventType.CONTAINER_EXITED_WITH_FAILURE?
{code}
+    ContainerState.EXITED_WITH_FAILURE,
+    EnumSet.of(
+        ContainerEventType.KILL_CONTAINER,
+        ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
+        ContainerEventType.RESOURCE_LOCALIZED,
+        ContainerEventType.RESOURCE_FAILED,
+        ContainerEventType.CONTAINER_LAUNCHED,
+        ContainerEventType.CONTAINER_EXITED_WITH_SUCCESS,
+        ContainerEventType.CONTAINER_KILLED_ON_REQUEST))
{code}
{code}
+.addTransition(ContainerState.DONE, ContainerState.DONE,
+    EnumSet.of(
+        ContainerEventType.RESOURCE_LOCALIZED,
+        ContainerEventType.CONTAINER_LAUNCHED,
+        ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
+        ContainerEventType.CONTAINER_RESOURCES_CLEANEDUP,
+        ContainerEventType.CONTAINER_EXITED_WITH_SUCCESS,
+        ContainerEventType.CONTAINER_KILLED_ON_REQUEST))
{code}
Can you please check if ExitedWithFailureTransition(true) needs to be called in places where the patch is adding ExitedWithFailureTransition(false). Is cleanup required? Do the new tests fail without the changes?
Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784748#comment-13784748 ] Hadoop QA commented on YARN-1131: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606503/YARN-1131.1.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.client.cli.TestLogsCLI {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2072//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2072//console This message is automatically generated. $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-465: - Attachment: YARN-465-branch-2--n3.patch YARN-465-trunk--n3.patch Attaching updated patches. setAccessible usage is removed. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2--n3.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk--n3.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is issue in branch-0.23 . Patch does not creating .keep file. For fix it need to run commands: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784752#comment-13784752 ] Hadoop QA commented on YARN-465: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606508/YARN-465-branch-2--n3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2074//console This message is automatically generated. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2--n3.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk--n3.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is issue in branch-0.23 . Patch does not creating .keep file. For fix it need to run commands: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784785#comment-13784785 ] Hitesh Shah commented on YARN-1149: --- Comments: - use proper javadoc notation for Reason enum - CMgrCompletedContainersEvent should have final member vars - NodeStatusUpdaterImpl.java: use !appsToCleanup.isEmpty() instead appsToCleanup.size() != 0 - ContainerManagerImpl#cleanUpApplications - shouldn't an invalid event type trigger a fatal error? NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. 
Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784803#comment-13784803 ] Wangda Tan commented on YARN-1197: -- The method mentioned by [~sandyr] can solve the cheating problem on the AM side. The AM doesn't need to tell the RM to decrease the container size at all; it just tells the NM and lets the NM report the decrease to the RM via heartbeat. The only cost is an order-of-seconds latency before the decrease is reflected on the scheduler side, which shouldn't be a big problem. As for the race problem mentioned by [~bikassaha], I think the behavior is reasonable: when an AM requests more resources for a container, it never knows when the RM will grant them, so the AM may need to launch the container with the smaller resource first. That is not harmful to either the scheduler or the NM (using less resource is not a problem). Once the AM gets the allocated resource, it can tell the NM and the child process to increase the memory quota. Do you agree with this? Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot merge several containers on one node into a bigger container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
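Purely as an illustration of the ordering being proposed (none of these types or methods exist in YARN; they only encode the sequence under discussion): decreases go AM to NM immediately, with the RM learning from the NM heartbeat, while increases wait for the RM grant before the NM and the child process are told.
{code}
/** Hypothetical stubs that only illustrate the ordering discussed above. */
public class ContainerResizeFlowSketch {

  interface NodeManagerStub { void applyNewLimit(String containerId, int memoryMb); }
  interface ResourceManagerStub { int requestIncrease(String containerId, int memoryMb); }

  private final NodeManagerStub nm;
  private final ResourceManagerStub rm;

  ContainerResizeFlowSketch(NodeManagerStub nm, ResourceManagerStub rm) {
    this.nm = nm;
    this.rm = rm;
  }

  /** Decrease: the AM talks to the NM only; the RM learns the smaller size from the next NM heartbeat. */
  void decrease(String containerId, int newMemoryMb) {
    nm.applyNewLimit(containerId, newMemoryMb);
    // Order-of-seconds lag before the scheduler sees the freed resource is accepted as harmless.
  }

  /** Increase: the AM runs with the smaller allocation until the RM grant arrives. */
  void increase(String containerId, int requestedMemoryMb) {
    int grantedMb = rm.requestIncrease(containerId, requestedMemoryMb); // may take arbitrarily long
    nm.applyNewLimit(containerId, grantedMb);                           // only applied after the grant
  }
}
{code}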
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784808#comment-13784808 ] Xuan Gong commented on YARN-1149: - bq.ContainerManagerImpl#cleanUpApplications - shouldn't an invalid event type trigger a fatal error? Why ? NodeManagerEventType is the enum type. We can not subclass it. How can the event type be invalid ? If in the future, we add more event type into NodeManagerEventType, we should also add the method to handle it. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. 
Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784811#comment-13784811 ] Zhijie Shen commented on YARN-890: -- +1 for the new approach. We shouldn't round up the available resource The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch, YARN-890.2.patch From the yarn-site.xml, I see following values- property nameyarn.nodemanager.resource.memory-mb/name value4192/value /property property nameyarn.scheduler.maximum-allocation-mb/name value4192/value /property property nameyarn.scheduler.minimum-allocation-mb/name value1024/value /property However the resourcemanager UI shows total memory as 5MB -- This message was sent by Atlassian JIRA (v6.1#6144)
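The number in the report is consistent with allocation-increment rounding: 4192 MB rounded up to the next multiple of the 1024 MB minimum allocation is 5120 MB, i.e. the 5 shown on the UI (presumably GB, despite the MB wording in the description). A quick worked check of that arithmetic:
{code}
public class MemoryRoundupDemo {
  public static void main(String[] args) {
    int nodeMemoryMb = 4192;    // yarn.nodemanager.resource.memory-mb from the report
    int minAllocMb = 1024;      // yarn.scheduler.minimum-allocation-mb

    // Round up to the next multiple of the minimum allocation.
    int roundedMb = ((nodeMemoryMb + minAllocMb - 1) / minAllocMb) * minAllocMb;

    System.out.println(roundedMb);          // 5120
    System.out.println(roundedMb / 1024.0); // 5.0 -> the "5" shown on the RM UI
  }
}
{code}
Reporting the configured value, rather than rounding the available resource up, avoids the misleading display, which is the direction the new approach takes.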
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784823#comment-13784823 ] Hitesh Shah commented on YARN-1149: --- [~xgong] My comment was in reference to: + default: +LOG.warn(Invalid eventType: + eventType); +} NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784836#comment-13784836 ] Hitesh Shah commented on YARN-1131: --- Comments: - why not use Option.setRequired for the applicationId param - this will allow removal of the appIdStr == null check. - typo in function name dumpAContainersLogs or is it meant to read dump a container's logs? Maybe just dumpContainerLogs? - containerIdStr and nodeAddressStr could be parsed for correct format to error out earlier before invoking the actual log reader functionality. - is a YarnApplicationState check enough to guarantee that the user receives the correct error message in case logs are tried to be retrieved when log aggregration is still in process just after the app completes? - missing test for when container id specified but node address is not ( and vice versa ) ? $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784841#comment-13784841 ] Siddharth Seth commented on YARN-1131: -- Thanks for the review. bq. why not use Option.setRequired for the applicationId param - this will allow removal of the appIdStr == null check. Will look into using this. bq. is a YarnApplicationState check enough to guarantee that the user receives the correct error message in case logs are tried to be retrieved when log aggregration is still in process just after the app completes? Had mentioned this in my last comment. Not targeting for this jira. bq. There's still a case, right after an app completes, but before aggregation is complete where an empty output is returned. That should be a separate jira though. bq. typo in function name dumpAContainersLogs or is it meant to read dump a container's logs? Maybe just dumpContainerLogs? I believe it was meant to be this. The diff, unfortunately, is a lot bigger than it should be, since the files had to be moved between packages. bq. containerIdStr and nodeAddressStr could be parsed for correct format to error out earlier before invoking the actual log reader functionality. bq. missing test for when container id specified but node address is not ( and vice versa ) ? Only targeting the specific issue mentioned in the jira. I'm sure there's more - but applicationId is likely to be the most common case. The rest can be a single or multiple separate jiras. $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
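On the Option.setRequired suggestion: with Apache Commons CLI (which the log CLI already uses for argument parsing), marking the option required makes the parser itself reject a missing -applicationId, so the manual appIdStr == null check becomes unnecessary. A small standalone sketch, not the actual LogDumper/LogsCLI code:
{code}
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.GnuParser;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

public class RequiredOptionSketch {
  public static void main(String[] args) {
    Option appIdOpt = new Option("applicationId", true, "ApplicationId of the application whose logs to fetch");
    appIdOpt.setRequired(true);          // the parser now enforces presence; no null check needed
    Options opts = new Options();
    opts.addOption(appIdOpt);
    try {
      CommandLine cmd = new GnuParser().parse(opts, args);
      System.out.println("applicationId = " + cmd.getOptionValue("applicationId"));
    } catch (ParseException e) {
      // e.g. a "Missing required option" message instead of a null value propagating further down
      System.err.println(e.getMessage());
    }
  }
}
{code}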