[jira] [Commented] (YARN-170) NodeManager stop() gets called twice on shutdown
[ https://issues.apache.org/jira/browse/YARN-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546776#comment-13546776 ] Hudson commented on YARN-170: - Integrated in Hadoop-Yarn-trunk #90 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/90/]) YARN-170. Change NodeManager stop to be reentrant. Contributed by Sandy Ryza. (Revision 1429796) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429796 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java NodeManager stop() gets called twice on shutdown Key: YARN-170 URL: https://issues.apache.org/jira/browse/YARN-170 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.0.3-alpha Attachments: YARN-170-1.patch, YARN-170-20130107.txt, YARN-170-2.patch, YARN-170-3.patch, YARN-170.patch The stop method in the NodeManager gets called twice when the NodeManager is shut down via the shutdown hook. The first is the stop that gets called directly by the shutdown hook. The second occurs when the NodeStatusUpdaterImpl is stopped. The NodeManager responds to the NodeStatusUpdaterImpl stop stateChanged event by stopping itself. This is so that the NodeStatusUpdaterImpl can notify the NodeManager to stop, by stopping itself in response to a request from the ResourceManager. This could be avoided if the NodeStatusUpdaterImpl were to stop the NodeManager by calling its stop method directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
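The fix referenced above makes NodeManager stop safe to call more than once. Below is a minimal, hypothetical sketch of such a reentrancy guard; the class and helper names are illustrative and are not the committed NodeManager code.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: a stop() that tolerates being called twice, once from the
// JVM shutdown hook and once from the NodeStatusUpdater-triggered shutdown path.
public class ReentrantStopSketch {
  private final AtomicBoolean stopped = new AtomicBoolean(false);

  public void stop() {
    // Only the first caller flips the flag and performs the real shutdown;
    // any later caller returns immediately instead of stopping services twice.
    if (!stopped.compareAndSet(false, true)) {
      return;
    }
    stopServices();
  }

  private void stopServices() {
    // Stop sub-services here (NodeStatusUpdater, ContainerManager, ...).
  }

  public static void main(String[] args) {
    ReentrantStopSketch nm = new ReentrantStopSketch();
    Runtime.getRuntime().addShutdownHook(new Thread(nm::stop));
    nm.stop(); // direct stop; the shutdown hook's later call becomes a no-op
  }
}
{code}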
[jira] [Moved] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli moved MAPREDUCE-3061 to YARN-321: - Tags: (was: mrv2, history server) Component/s: (was: mrv2) Fix Version/s: (was: 0.24.0) Target Version/s: (was: 0.24.0) Affects Version/s: (was: 0.23.0) Key: YARN-321 (was: MAPREDUCE-3061) Project: Hadoop YARN (was: Hadoop Map/Reduce) Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) trusted servers (where T is the number of application types and V is the number of application versions) is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific applications/versions can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for their specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-2) Enhance CS to schedule accounting for both memory and cpu cores
[ https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-2: - Attachment: YARN-2.patch Enhance CS to schedule accounting for both memory and cpu cores --- Key: YARN-2 URL: https://issues.apache.org/jira/browse/YARN-2 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, scheduler Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 2.0.3-alpha Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, YARN-2-help.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
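For context, the change lets a resource request carry CPU alongside memory. A rough sketch of an application-side request sized on both dimensions is below; it assumes the Resource/ResourceRequest/Priority newInstance factory helpers available in later 2.x releases rather than the exact API surface at the time of this patch.

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Illustrative only: ask for containers sized by both memory (MB) and virtual
// cores, so the CapacityScheduler can account for each dimension.
public class CpuAndMemoryRequestSketch {
  public static ResourceRequest buildRequest() {
    Resource capability = Resource.newInstance(2048, 4); // 2 GB, 4 vcores
    return ResourceRequest.newInstance(
        Priority.newInstance(1),  // request priority
        ResourceRequest.ANY,      // no locality constraint
        capability,
        10);                      // number of containers
  }
}
{code}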
[jira] [Commented] (YARN-2) Enhance CS to schedule accounting for both memory and cpu cores
[ https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546824#comment-13546824 ] Arun C Murthy commented on YARN-2: -- Fixed TestRMWebServicesCapacitySched (had to fix the test) - any final comments? I think it's good to go, for now I'll commit after jenkins okays it since it's getting harder to maintain this largish patch. We can fix nits etc. post-commit. Thanks. Enhance CS to schedule accounting for both memory and cpu cores --- Key: YARN-2 URL: https://issues.apache.org/jira/browse/YARN-2 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, scheduler Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 2.0.3-alpha Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, YARN-2-help.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-2) Enhance CS to schedule accounting for both memory and cpu cores
[ https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546840#comment-13546840 ] Hadoop QA commented on YARN-2: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12563741/YARN-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 22 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/323//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/323//console This message is automatically generated. Enhance CS to schedule accounting for both memory and cpu cores --- Key: YARN-2 URL: https://issues.apache.org/jira/browse/YARN-2 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, scheduler Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 2.0.3-alpha Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, YARN-2-help.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-253) Container launch may fail if no files were localized
[ https://issues.apache.org/jira/browse/YARN-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546845#comment-13546845 ] Hadoop QA commented on YARN-253: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12563743/YARN-253-20130108.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/324//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/324//console This message is automatically generated. Container launch may fail if no files were localized Key: YARN-253 URL: https://issues.apache.org/jira/browse/YARN-253 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.2-alpha Reporter: Tom White Assignee: Tom White Priority: Critical Attachments: YARN-253-20130108.txt, YARN-253.patch, YARN-253.patch, YARN-253-test.patch This can be demonstrated with DistributedShell. The containers running the shell do not have any files to localize (if there is no shell script to copy) so if they run on a different NM to the AM (which does localize files), then they will fail since the appcache directory does not exist. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
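The underlying failure is simply a missing per-application directory under appcache when localization created nothing. A hedged sketch of the defensive fix, creating the container working directory unconditionally before launch, is below; the paths and helper name are illustrative and not the actual DefaultContainerExecutor change.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Illustrative only: make sure appcache/<appId>/<containerId> exists even when
// no resources were localized for this container.
public class EnsureContainerDirsSketch {
  public static void ensureWorkDir(FileContext lfs, Path appCacheDir,
      String appId, String containerId) throws IOException {
    Path containerDir = new Path(new Path(appCacheDir, appId), containerId);
    // createParent=true also creates the application directory if the
    // localizer never had a reason to create it.
    lfs.mkdir(containerDir, new FsPermission((short) 0710), true);
  }
}
{code}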
[jira] [Commented] (YARN-170) NodeManager stop() gets called twice on shutdown
[ https://issues.apache.org/jira/browse/YARN-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546866#comment-13546866 ] Hudson commented on YARN-170: - Integrated in Hadoop-Hdfs-trunk #1279 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1279/]) YARN-170. Change NodeManager stop to be reentrant. Contributed by Sandy Ryza. (Revision 1429796) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429796 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManagerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java NodeManager stop() gets called twice on shutdown Key: YARN-170 URL: https://issues.apache.org/jira/browse/YARN-170 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.0.3-alpha Attachments: YARN-170-1.patch, YARN-170-20130107.txt, YARN-170-2.patch, YARN-170-3.patch, YARN-170.patch The stop method in the NodeManager gets called twice when the NodeManager is shut down via the shutdown hook. The first is the stop that gets called directly by the shutdown hook. The second occurs when the NodeStatusUpdaterImpl is stopped. The NodeManager responds to the NodeStatusUpdaterImpl stop stateChanged event by stopping itself. This is so that the NodeStatusUpdaterImpl can notify the NodeManager to stop, by stopping itself in response to a request from the ResourceManager. This could be avoided if the NodeStatusUpdaterImpl were to stop the NodeManager by calling its stop method directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-253) Container launch may fail if no files were localized
[ https://issues.apache.org/jira/browse/YARN-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546875#comment-13546875 ] Hudson commented on YARN-253: - Integrated in Hadoop-trunk-Commit #3190 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3190/]) YARN-253. Fixed container-launch to not fail when there are no local resources to localize. Contributed by Tom White. (Revision 1430269) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1430269 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java Container launch may fail if no files were localized Key: YARN-253 URL: https://issues.apache.org/jira/browse/YARN-253 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.2-alpha Reporter: Tom White Assignee: Tom White Priority: Critical Fix For: 2.0.3-alpha Attachments: YARN-253-20130108.txt, YARN-253.patch, YARN-253.patch, YARN-253-test.patch This can be demonstrated with DistributedShell. The containers running the shell do not have any files to localize (if there is no shell script to copy) so if they run on a different NM to the AM (which does localize files), then they will fail since the appcache directory does not exist. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-323) Yarn CLI commands prints classpath
Nishan Shetty created YARN-323: -- Summary: Yarn CLI commands prints classpath Key: YARN-323 URL: https://issues.apache.org/jira/browse/YARN-323 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.1-alpha Reporter: Nishan Shetty Priority: Minor Execute ./yarn commands. They will print the classpath to the console. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-318) sendSignal in DefaultContainerExecutor causes invalid options error
[ https://issues.apache.org/jira/browse/YARN-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyunsik Choi resolved YARN-318. --- Resolution: Not A Problem I didn't suspect a bug in kill because it is a very common utility, and it worked when I executed the command 'kill -0 -12127' in a shell. However, according to this bug report (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=688731), it is a bug in procps-ng. If we don't give the path of the kill command, bash invokes its built-in 'kill' command, which is why it seemed to work. sendSignal in DefaultContainerExecutor causes invalid options error - Key: YARN-318 URL: https://issues.apache.org/jira/browse/YARN-318 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.2-alpha Environment: * OS: MintOS 14 ** MintOS 14 is based on Ubuntu 12.10, so this problem may also occur on Ubuntu 12.10. * procps version: procps-ng 3.3.3 * OpenJDK version: 7u9-2.3.3 * Hadoop version: 2.0.2-alpha Reporter: Hyunsik Choi In line 238 of DefaultContainerExecutor, the sendSignal method causes an error when ContainerManagerImpl tries to kill a container. The command passed to ShellCommandExecutor in sendSignal() was kill -0 -12127. The following message is copied from the detailMessage of the Exception. {noformat} kill: invalid option -- '1' Usage: kill [options] pid [...] Options: pid [...] send signal to every pid listed -signal, -s, --signal signal specify the signal to be sent -l, --list=[signal] list all signal names, or convert one to a name -L, --table list all signal names in a nice table -h, --help display this help and exit -V, --version output version information and exit For more details see kill(1). {noformat} I investigated a little bit on this problem. I've found that sendSignal works well with the traditional procps (http://procps.sourceforge.net/), whereas it causes such an error with the procps-ng (https://fedoraproject.org/wiki/Features/procps-ng) used in MintOS 14. As you know, the 'kill' command is included in the procps package in most Linux distributions. When I simply change the 'kill' binary to the traditional one, stopContainer works well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
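One way to sidestep the parsing difference is to terminate option parsing with "--" before the negative process-group id (or to go through bash, whose built-in kill accepts it). The sketch below is illustrative and is not necessarily how the NodeManager code was ultimately changed.

{code:java}
import java.io.IOException;
import org.apache.hadoop.util.Shell.ShellCommandExecutor;

// Illustrative only: send signal 0 to process group 12127 in a form both the
// traditional procps kill and procps-ng accept. The "--" ends option parsing,
// so "-12127" is read as a (negative, i.e. process-group) pid operand.
public class SignalProcessGroupSketch {
  public static void checkAlive(String pgrpId) throws IOException {
    String[] cmd = { "bash", "-c", "kill -0 -- -" + pgrpId };
    ShellCommandExecutor shexec = new ShellCommandExecutor(cmd);
    shexec.execute(); // throws an ExitCodeException if the group is gone
  }
}
{code}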
[jira] [Commented] (YARN-320) RM should always be able to renew its own tokens
[ https://issues.apache.org/jira/browse/YARN-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546988#comment-13546988 ] Daryn Sharp commented on YARN-320: -- Admittedly, extracting the renewer is a bit dubious. It was the simplest way to satisfy the ADTSM checks w/o making changes to the core that would have a larger impact. In 1.x I tried to make the JT not do a loopback RPC to itself, but I don't recall why it was dinged. With the current design I'm not sure there's a good way for the token to get access to the RM's secret manager, but yes, that would be ideal. Thanks, I'll wrap up the patch. RM should always be able to renew its own tokens Key: YARN-320 URL: https://issues.apache.org/jira/browse/YARN-320 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-320.branch-23.patch YARN-280 introduced fast-fail for job submissions with bad tokens. Unfortunately, other stack components like oozie and customers are acquiring RM tokens with a hardcoded dummy renewer value. These jobs would fail after 24 hours because the RM token couldn't be renewed, but fast-fail is failing them immediately. The RM should always be able to renew its own tokens submitted with a job. The renewer field may continue to specify an external user who can renew. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
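A rough sketch of the short-circuit under discussion: before the usual renewer check, treat a token issued by this RM as always renewable by the RM itself. The names below (issuedByThisRM, rmAddress) are hypothetical; the actual patch works through the delegation-token secret manager's checks.

{code:java}
import java.io.IOException;
import org.apache.hadoop.io.Text;

// Illustrative only: a token this RM issued should be renewable by the RM
// regardless of the (possibly bogus) renewer string the submitter put in it.
public class RmTokenRenewCheckSketch {
  private final Text rmAddress; // the identity this RM renews as

  public RmTokenRenewCheckSketch(Text rmAddress) {
    this.rmAddress = rmAddress;
  }

  public void checkRenewer(Text tokenRenewer, boolean issuedByThisRM)
      throws IOException {
    if (issuedByThisRM) {
      return; // the RM can always renew its own tokens
    }
    if (!rmAddress.equals(tokenRenewer)) {
      throw new IOException("Token with renewer " + tokenRenewer
          + " may not be renewed by " + rmAddress);
    }
  }
}
{code}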
[jira] [Updated] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-309: --- Attachment: YARN-309.2.patch Make RM provide heartbeat interval to NM Key: YARN-309 URL: https://issues.apache.org/jira/browse/YARN-309 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-309.1.patch, YARN-309.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-320) RM should always be able to renew its own tokens
[ https://issues.apache.org/jira/browse/YARN-320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated YARN-320: - Attachment: YARN-320.branch-23.patch Add unit tests, trunk patch forthcoming. RM should always be able to renew its own tokens Key: YARN-320 URL: https://issues.apache.org/jira/browse/YARN-320 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-320.branch-23.patch, YARN-320.branch-23.patch YARN-280 introduced fast-fail for job submissions with bad tokens. Unfortunately, other stack components like oozie and customers are acquiring RM tokens with a hardcoded dummy renewer value. These jobs would fail after 24 hours because the RM token couldn't be renewed, but fast-fail is failing them immediately. The RM should always be able to renew its own tokens submitted with a job. The renewer field may continue to specify an external user who can renew. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-324) Provide way to preserve container directories
[ https://issues.apache.org/jira/browse/YARN-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lohit Vijayarenu updated YARN-324: -- Summary: Provide way to preserve container directories (was: Provide way to preserve ) Provide way to preserve container directories - Key: YARN-324 URL: https://issues.apache.org/jira/browse/YARN-324 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu There should be a way to preserve container directories (along with filecache/appcache) for offline debugging. As of today, if a container completes (either success or failure) it gets cleaned up. In case of failure it becomes very hard to find out what the cause of the failure was. Having the ability to preserve container directories would enable one to log into the machine and debug failures further. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-324) Provide way to preserve container directories
[ https://issues.apache.org/jira/browse/YARN-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547191#comment-13547191 ] Jason Lowe commented on YARN-324: - The nodemanager currently supports this via the yarn.nodemanager.delete.debug-delay-sec property. Is that sufficient to meet your needs or were you thinking of something different? Provide way to preserve container directories - Key: YARN-324 URL: https://issues.apache.org/jira/browse/YARN-324 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu There should be a way to preserve container directories (along with filecache/appcache) for offline debugging. As of today, if a container completes (either success or failure) it gets cleaned up. In case of failure it becomes very hard to find out what the cause of the failure was. Having the ability to preserve container directories would enable one to log into the machine and debug failures further. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
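For reference, the property Jason mentions can be set in yarn-site.xml or programmatically; a minimal programmatic example with an arbitrary one-hour delay is below.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative only: keep container/application directories around for an hour
// after completion so they can be inspected, instead of deleting them right away.
public class DebugDelayConfigSketch {
  public static Configuration withDebugDelay() {
    Configuration conf = new Configuration();
    conf.setInt("yarn.nodemanager.delete.debug-delay-sec", 3600);
    return conf;
  }
}
{code}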
[jira] [Updated] (YARN-320) RM should always be able to renew its own tokens
[ https://issues.apache.org/jira/browse/YARN-320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated YARN-320: - Attachment: YARN-320.patch Patch for trunk and branch-2. RM should always be able to renew its own tokens Key: YARN-320 URL: https://issues.apache.org/jira/browse/YARN-320 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-320.branch-23.patch, YARN-320.branch-23.patch, YARN-320.patch YARN-280 introduced fast-fail for job submissions with bad tokens. Unfortunately, other stack components like oozie and customers are acquiring RM tokens with a hardcoded dummy renewer value. These jobs would fail after 24 hours because the RM token couldn't be renewed, but fast-fail is failing them immediately. The RM should always be able to renew its own tokens submitted with a job. The renewer field may continue to specify an external user who can renew. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-320) RM should always be able to renew its own tokens
[ https://issues.apache.org/jira/browse/YARN-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547246#comment-13547246 ] Hadoop QA commented on YARN-320: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12563814/YARN-320.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/325//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/325//console This message is automatically generated. RM should always be able to renew its own tokens Key: YARN-320 URL: https://issues.apache.org/jira/browse/YARN-320 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: YARN-320.branch-23.patch, YARN-320.branch-23.patch, YARN-320.patch YARN-280 introduced fast-fail for job submissions with bad tokens. Unfortunately, other stack components like oozie and customers are acquiring RM tokens with a hardcoded dummy renewer value. These jobs would fail after 24 hours because the RM token couldn't be renewed, but fast-fail is failing them immediately. The RM should always be able to renew its own tokens submitted with a job. The renewer field may continue to specify an external user who can renew. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547248#comment-13547248 ] Hadoop QA commented on YARN-193: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12563808/YARN-193.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/326//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/326//console This message is automatically generated. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Hitesh Shah Assignee: Hitesh Shah Priority: Critical Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.4.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
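A hedged sketch of the missing step the summary describes: normalization should round a request up to the scheduler's minimum increment but also account for the configured maximum allocation. The method below is illustrative, not the actual Scheduler.normalizeRequest code, which may reject oversized requests instead of clamping them.

{code:java}
// Illustrative only: normalize a memory ask (MB) against scheduler limits.
// Rounds up to a multiple of the minimum allocation, then clamps to the
// maximum allocation instead of silently letting oversized asks through.
public class NormalizeRequestSketch {
  public static int normalizeMemory(int requested, int minMB, int maxMB) {
    int normalized = Math.max(requested, minMB);
    // round up to the next multiple of the minimum increment
    normalized = ((normalized + minMB - 1) / minMB) * minMB;
    // the part YARN-193 is about: respect the maximum-allocation limit
    return Math.min(normalized, maxMB);
  }
}
{code}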
[jira] [Updated] (YARN-142) Change YARN APIs to throw IOException
[ https://issues.apache.org/jira/browse/YARN-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-142: --- Attachment: YARN-142.3.patch Change YARN APIs to throw IOException - Key: YARN-142 URL: https://issues.apache.org/jira/browse/YARN-142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Critical Attachments: YARN-142.1.patch, YARN-142.2.patch, YARN-142.3.patch Ref: MAPREDUCE-4067 All YARN APIs currently throw YarnRemoteException. 1) This cannot be extended in it's current form. 2) The RPC layer can throw IOExceptions. These end up showing up as UndeclaredThrowableExceptions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
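A rough sketch of the kind of signature change being discussed: protocol methods declaring IOException alongside a YARN-level exception, so RPC failures no longer surface as UndeclaredThrowableException. The interface and exception names here are illustrative, not the actual YARN protocol definitions.

{code:java}
import java.io.IOException;

// Illustrative only: an RPC-facing API that declares IOException so transport
// failures propagate as themselves instead of UndeclaredThrowableException.
public interface ExampleClientProtocolSketch {

  // Stand-in for an extensible application-level exception.
  class ExampleYarnException extends Exception {
    public ExampleYarnException(String msg) { super(msg); }
  }

  // Both the YARN-level exception and plain IOException are declared, so the
  // RPC proxy has a checked type to throw for transport problems.
  String getApplicationReport(String applicationId)
      throws ExampleYarnException, IOException;
}
{code}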
[jira] [Commented] (YARN-2) Enhance CS to schedule accounting for both memory and cpu cores
[ https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547662#comment-13547662 ] Hudson commented on YARN-2: --- Integrated in Hadoop-trunk-Commit #3200 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3200/]) MAPREDUCE-4520. Added support for MapReduce applications to request for CPU cores along-with memory post YARN-2. Contributed by Arun C. Murthy. (Revision 1430688) Result = SUCCESS acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1430688 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java Enhance CS to schedule accounting for both memory and cpu cores --- Key: YARN-2 URL: https://issues.apache.org/jira/browse/YARN-2 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, scheduler Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 2.0.3-alpha Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, YARN-2-help.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-2) Enhance CS to schedule accounting for both memory and cpu cores
[ https://issues.apache.org/jira/browse/YARN-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547664#comment-13547664 ] caolong commented on YARN-2: Great patch! What are we planning to do about the FairScheduler? Enhance CS to schedule accounting for both memory and cpu cores --- Key: YARN-2 URL: https://issues.apache.org/jira/browse/YARN-2 Project: Hadoop YARN Issue Type: New Feature Components: capacityscheduler, scheduler Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 2.0.3-alpha Attachments: MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327.patch, MAPREDUCE-4327-v2.patch, MAPREDUCE-4327-v3.patch, MAPREDUCE-4327-v4.patch, MAPREDUCE-4327-v5.patch, YARN-2-help.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch, YARN-2.patch With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-325) RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing
[ https://issues.apache.org/jira/browse/YARN-325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy reassigned YARN-325: -- Assignee: Arun C Murthy RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing - Key: YARN-325 URL: https://issues.apache.org/jira/browse/YARN-325 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.2-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Arun C Murthy Priority: Critical If a client calls getQueueInfo on a parent queue (e.g.: the root queue) and containers are completing then the RM can deadlock. getQueueInfo() locks the ParentQueue and then calls the child queues' getQueueInfo() methods in turn. However when a container completes, it locks the LeafQueue then calls back into the ParentQueue. When the two mix, it's a recipe for deadlock. Stacktrace to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
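The lock-ordering inversion described above can be shown with two toy classes standing in for ParentQueue and LeafQueue; the sketch is illustrative, not the scheduler code. One thread takes the parent lock and then needs the child (getQueueInfo), while the other takes the child lock and then needs the parent (container completion).

{code:java}
// Illustrative only: getQueueInfo() locks Parent then Child, while
// completedContainer() locks Child then Parent: a classic deadlock recipe.
public class QueueDeadlockSketch {
  static class Parent {
    Child child;
    synchronized String getQueueInfo() {      // holds Parent...
      return child.getQueueInfo();            // ...then needs Child
    }
    synchronized void childCompleted() { }    // needs Parent
  }

  static class Child {
    Parent parent;
    synchronized String getQueueInfo() { return "child info"; }
    synchronized void completedContainer() {  // holds Child...
      parent.childCompleted();                // ...then needs Parent
    }
  }

  public static void main(String[] args) {
    Parent p = new Parent();
    Child c = new Child();
    p.child = c;
    c.parent = p;
    new Thread(p::getQueueInfo).start();        // parent -> child
    new Thread(c::completedContainer).start();  // child -> parent, can deadlock
  }
}
{code}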
[jira] [Updated] (YARN-325) RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing
[ https://issues.apache.org/jira/browse/YARN-325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-325: --- Priority: Blocker (was: Critical) RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing - Key: YARN-325 URL: https://issues.apache.org/jira/browse/YARN-325 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.2-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Arun C Murthy Priority: Blocker If a client calls getQueueInfo on a parent queue (e.g.: the root queue) and containers are completing then the RM can deadlock. getQueueInfo() locks the ParentQueue and then calls the child queues' getQueueInfo() methods in turn. However when a container completes, it locks the LeafQueue then calls back into the ParentQueue. When the two mix, it's a recipe for deadlock. Stacktrace to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-325) RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing
[ https://issues.apache.org/jira/browse/YARN-325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-325: --- Attachment: YARN-325.patch Illustrative patch, need to fix unit-tests yet. RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing - Key: YARN-325 URL: https://issues.apache.org/jira/browse/YARN-325 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.2-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Arun C Murthy Priority: Blocker Attachments: YARN-325.patch If a client calls getQueueInfo on a parent queue (e.g.: the root queue) and containers are completing then the RM can deadlock. getQueueInfo() locks the ParentQueue and then calls the child queues' getQueueInfo() methods in turn. However when a container completes, it locks the LeafQueue then calls back into the ParentQueue. When the two mix, it's a recipe for deadlock. Stacktrace to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-325) RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing
[ https://issues.apache.org/jira/browse/YARN-325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-325: --- Attachment: YARN-325.patch Added unit-tests. RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing - Key: YARN-325 URL: https://issues.apache.org/jira/browse/YARN-325 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.2-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Arun C Murthy Priority: Blocker Attachments: YARN-325.patch, YARN-325.patch If a client calls getQueueInfo on a parent queue (e.g.: the root queue) and containers are completing then the RM can deadlock. getQueueInfo() locks the ParentQueue and then calls the child queues' getQueueInfo() methods in turn. However when a container completes, it locks the LeafQueue then calls back into the ParentQueue. When the two mix, it's a recipe for deadlock. Stacktrace to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-325) RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing
[ https://issues.apache.org/jira/browse/YARN-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547734#comment-13547734 ] Hadoop QA commented on YARN-325: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12563898/YARN-325.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/328//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/328//console This message is automatically generated. RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing - Key: YARN-325 URL: https://issues.apache.org/jira/browse/YARN-325 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.2-alpha, 0.23.5 Reporter: Jason Lowe Assignee: Arun C Murthy Priority: Blocker Attachments: YARN-325.patch, YARN-325.patch If a client calls getQueueInfo on a parent queue (e.g.: the root queue) and containers are completing then the RM can deadlock. getQueueInfo() locks the ParentQueue and then calls the child queues' getQueueInfo() methods in turn. However when a container completes, it locks the LeafQueue then calls back into the ParentQueue. When the two mix, it's a recipe for deadlock. Stacktrace to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira