[jira] [Commented] (YARN-285) RM should be able to provide a tracking link for apps that have already been purged
[ https://issues.apache.org/jira/browse/YARN-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13537754#comment-13537754 ] Hadoop QA commented on YARN-285: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12562055/YARN-285.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/244//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/244//console This message is automatically generated. RM should be able to provide a tracking link for apps that have already been purged --- Key: YARN-285 URL: https://issues.apache.org/jira/browse/YARN-285 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Derek Dagit Attachments: YARN-285-branch-0.23.patch, YARN-285.patch As applications complete, the RM tracks their IDs in a completed list. 
This list is routinely truncated to limit the total number of applications remembered by the RM. When a user clicks the History link for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links that persist so that users are not frustrated by broken links. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
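The purge behaviour described in YARN-285 can be illustrated with a small model. This is only a sketch under stated assumptions: the class CompletedApps, its methods, and the fallback base URL are hypothetical names invented for illustration, and the fallback strategy shown is one possible approach, not necessarily what the attached patch does.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of a bounded list of completed applications. */
class CompletedApps {
    private final int maxCompleted;
    private final Deque<String> order = new ArrayDeque<>();
    private final Map<String, String> trackingUrls = new HashMap<>();

    CompletedApps(int maxCompleted) {
        this.maxCompleted = maxCompleted;
    }

    /** Record a completed app and purge the oldest entries beyond the limit. */
    void complete(String appId, String trackingUrl) {
        order.addLast(appId);
        trackingUrls.put(appId, trackingUrl);
        while (order.size() > maxCompleted) {
            trackingUrls.remove(order.removeFirst());
        }
    }

    /** The stored link; empty once the app has been purged. */
    Optional<String> storedLink(String appId) {
        return Optional.ofNullable(trackingUrls.get(appId));
    }

    /** Assumed fallback: derive a link from the app ID when the app is gone. */
    String trackingLink(String appId, String fallbackBase) {
        return storedLink(appId).orElse(fallbackBase + "/" + appId);
    }
}
```

With a limit of 2, completing three applications purges the first one; storedLink then returns empty for it, which corresponds to the error users see, while trackingLink still yields a usable URL.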
[jira] [Commented] (YARN-223) Change processTree interface to work better with native code
[ https://issues.apache.org/jira/browse/YARN-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13537823#comment-13537823 ] Hudson commented on YARN-223: - Integrated in Hadoop-Hdfs-0.23-Build #470 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/470/]) YARN-223. Change processTree interface to work better with native code (Radim Kolar via tgraves) (Revision 1424590) Result = UNSTABLE tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1424590 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRConfig.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/LinuxResourceCalculatorPlugin.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ProcfsBasedProcessTree.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/util/TestProcfsBasedProcessTree.java * /hadoop/common/branches/branch-0.23/hadoop-tools/hadoop-gridmix/src/test/java/org/apache/hadoop/mapred/gridmix/TestResourceUsageEmulators.java * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java * 
/hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorPlugin.java * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestProcfsBasedProcessTree.java * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestResourceCalculatorProcessTree.java * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java Change processTree interface to work better with native code Key: YARN-223 URL: https://issues.apache.org/jira/browse/YARN-223 Project: Hadoop YARN Issue Type: Bug Reporter: Radim Kolar Assignee: Radim Kolar Priority: Critical Fix For: 3.0.0, 2.0.3-alpha, 0.23.6 Attachments: pstree-update4.txt, pstree-update6.txt, pstree-update6.txt The problem is that every update of processTree requires a new object. This is undesirable when working with a processTree implementation in native code. Replace ProcessTree.getProcessTree() with updateProcessTree(). No new object allocation is needed, and it simplifies application code a bit.
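The interface change described in YARN-223 can be sketched as follows. SimpleProcessTree is a hypothetical stand-in, not the real Hadoop class; only the method names getProcessTree() and updateProcessTree() come from the issue text, and the fields and parameters are assumptions made for illustration.

```java
/** Hypothetical stand-in for a process tree; not the real Hadoop class. */
class SimpleProcessTree {
    private final String rootPid;
    private long cumulativeRssBytes;
    private int updates;

    SimpleProcessTree(String rootPid) {
        this.rootPid = rootPid;
    }

    /** Old style: every refresh allocates and returns a fresh object. */
    SimpleProcessTree getProcessTree(long sampledRssBytes) {
        SimpleProcessTree fresh = new SimpleProcessTree(rootPid);
        fresh.cumulativeRssBytes = sampledRssBytes;
        fresh.updates = updates + 1;
        return fresh;
    }

    /** New style: refresh this object's state in place, no allocation. */
    void updateProcessTree(long sampledRssBytes) {
        cumulativeRssBytes = sampledRssBytes;
        updates++;
    }

    long getCumulativeRssBytes() {
        return cumulativeRssBytes;
    }

    int getUpdates() {
        return updates;
    }
}
```

With the in-place variant, a caller keeps one reference for the lifetime of the monitored container, which is presumably what makes a native-backed implementation easier to manage.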
[jira] [Commented] (YARN-279) Generalize RM token management
[ https://issues.apache.org/jira/browse/YARN-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537875#comment-13537875 ] Alejandro Abdelnur commented on YARN-279: - Regarding pushing the token management down to the AM: why not use the '6 months passport validity rule'? In other words, require on submission that any delegation token being used must last at least until the AM is up and running, else things would fail. Once the AM is up and running, it is the AM's responsibility to renew it. Generalize RM token management -- Key: YARN-279 URL: https://issues.apache.org/jira/browse/YARN-279 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.0.0, 2.0.0-alpha, 0.23.5 Reporter: Daryn Sharp Token renewal/cancelation in the RM presents challenges to support arbitrary tokens. The RM's CLASSPATH is currently required to have token renewer classes and all associated classes for the project's client. The logistics of having installs on the RM of all hadoop projects that submit jobs - just to support client connections to renew/cancel tokens - are untenable.
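The "passport validity" idea in the comment above amounts to a minimum-remaining-lifetime check at submission time. A minimal sketch, assuming a hypothetical helper TokenValidityRule and a caller-chosen required window (neither appears in the thread):

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical check: a token must outlive a required window from now. */
class TokenValidityRule {
    static boolean hasEnoughValidity(Instant now, Instant expiry, Duration required) {
        // Accept only if the token is still valid at now + required.
        return !expiry.isBefore(now.plus(required));
    }
}
```

The sketch leaves open how to choose the required window, since the time between submission and AM launch is not known in advance.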
[jira] [Commented] (YARN-223) Change processTree interface to work better with native code
[ https://issues.apache.org/jira/browse/YARN-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13537883#comment-13537883 ] Hudson commented on YARN-223: - Integrated in Hadoop-Mapreduce-trunk #1291 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1291/]) YARN-223. Update process tree instead of getting new process trees. (Radim Kolar via llu) (Revision 1424244) Result = SUCCESS llu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1424244 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRConfig.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/LinuxResourceCalculatorPlugin.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ProcfsBasedProcessTree.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/util/TestProcfsBasedProcessTree.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LinuxResourceCalculatorPlugin.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorPlugin.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestProcfsBasedProcessTree.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestResourceCalculatorProcessTree.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java Change processTree interface to work better with native code Key: YARN-223 URL: https://issues.apache.org/jira/browse/YARN-223 Project: Hadoop YARN Issue Type: Bug Reporter: Radim Kolar Assignee: Radim Kolar Priority: Critical Fix For: 3.0.0, 2.0.3-alpha, 0.23.6 Attachments: pstree-update4.txt, pstree-update6.txt, pstree-update6.txt The problem is that every update of processTree requires a new object. This is undesirable when working with a processTree implementation in native code. Replace ProcessTree.getProcessTree() with updateProcessTree(). No new object allocation is needed, and it simplifies application code a bit.
[jira] [Commented] (YARN-278) Fair scheduler maxRunningApps config causes no apps to make progress
[ https://issues.apache.org/jira/browse/YARN-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538023#comment-13538023 ] Alejandro Abdelnur commented on YARN-278: - +1 Fair scheduler maxRunningApps config causes no apps to make progress Key: YARN-278 URL: https://issues.apache.org/jira/browse/YARN-278 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-278.patch, YARN-278.patch This occurs because the scheduler erroneously chooses apps to offer resources to that are not runnable, then later decides they are not runnable, and doesn't try to give the resources to anyone else.
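The failure mode described in YARN-278 can be modeled in a few lines. This is a simplified sketch, not the FSLeafQueue code: SchedApp, Assigner, and both methods are hypothetical names, and the real scheduler orders and filters apps in ways not shown here.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical app record: a name plus whether the app may run now. */
class SchedApp {
    final String name;
    final boolean runnable;

    SchedApp(String name, boolean runnable) {
        this.name = name;
        this.runnable = runnable;
    }
}

class Assigner {
    /** Buggy shape: pick the first app, then give up if it is not runnable. */
    static Optional<String> assignBuggy(List<SchedApp> apps) {
        if (apps.isEmpty()) {
            return Optional.empty();
        }
        SchedApp first = apps.get(0);
        return first.runnable ? Optional.of(first.name) : Optional.empty();
    }

    /** Fixed shape: skip non-runnable apps and offer to the next candidate. */
    static Optional<String> assignFixed(List<SchedApp> apps) {
        for (SchedApp app : apps) {
            if (app.runnable) {
                return Optional.of(app.name);
            }
        }
        return Optional.empty();
    }
}
```

If the first app in the list is over its maxRunningApps limit, the buggy shape leaves the resources unassigned even though a later app could use them.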
[jira] [Commented] (YARN-282) Fair scheduler web UI double counts Apps Submitted
[ https://issues.apache.org/jira/browse/YARN-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538032#comment-13538032 ] Alejandro Abdelnur commented on YARN-282: - +1 Fair scheduler web UI double counts Apps Submitted -- Key: YARN-282 URL: https://issues.apache.org/jira/browse/YARN-282 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-282.patch Each app submitted is reported twice under Apps Submitted.
[jira] [Commented] (YARN-283) Fair scheduler fails to get queue info without root prefix
[ https://issues.apache.org/jira/browse/YARN-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538070#comment-13538070 ] Alejandro Abdelnur commented on YARN-283: - +1 Fair scheduler fails to get queue info without root prefix -- Key: YARN-283 URL: https://issues.apache.org/jira/browse/YARN-283 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-283.patch If queue1 exists, and a client calls mapred queue -info queue1, an exception is thrown. If they use root.queue1, it works correctly.
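One way to make both spellings resolve to the same queue is to normalize names before lookup. The sketch below assumes that approach; QueueNames and normalize are hypothetical names, and the attached patch may solve the problem differently.

```java
/** Hypothetical helper that adds the "root." prefix when it is missing. */
class QueueNames {
    static final String ROOT = "root";

    static String normalize(String name) {
        if (name.equals(ROOT) || name.startsWith(ROOT + ".")) {
            return name; // already fully qualified
        }
        return ROOT + "." + name;
    }
}
```

Under this assumption, a lookup for queue1 and a lookup for root.queue1 both use the key root.queue1.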
[jira] [Commented] (YARN-286) Add a YARN ApplicationClassLoader
[ https://issues.apache.org/jira/browse/YARN-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13538106#comment-13538106 ] Hadoop QA commented on YARN-286: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12562081/YARN-286.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/245//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/245//console This message is automatically generated. Add a YARN ApplicationClassLoader - Key: YARN-286 URL: https://issues.apache.org/jira/browse/YARN-286 Project: Hadoop YARN Issue Type: New Feature Components: applications Affects Versions: 2.0.2-alpha Reporter: Tom White Assignee: Tom White Fix For: 2.0.3-alpha Attachments: YARN-286.patch Add a classloader that provides webapp-style class isolation for use by applications. This is the YARN part of MAPREDUCE-1700 (which was already developed in that JIRA). 
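"Webapp-style class isolation" in YARN-286 usually means that application classes are looked up before the parent loader, except for a set of system classes that always come from the parent. The following is a minimal sketch of that general technique, assuming a caller-supplied list of system package prefixes; ChildFirstClassLoader is a hypothetical name and this is not the class from the patch.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical child-first loader with a parent-first list of prefixes. */
class ChildFirstClassLoader extends URLClassLoader {
    private final String[] systemPrefixes;

    ChildFirstClassLoader(URL[] urls, ClassLoader parent, String... systemPrefixes) {
        super(urls, parent);
        this.systemPrefixes = systemPrefixes;
    }

    private boolean isSystemClass(String name) {
        for (String prefix : systemPrefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isSystemClass(name)) {
                try {
                    c = findClass(name); // look in the application's own URLs first
                } catch (ClassNotFoundException e) {
                    // not in the application's URLs; fall back to the parent below
                }
            }
            if (c == null) {
                c = super.loadClass(name, false); // standard parent-first delegation
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Classes matching a system prefix such as "java." are never searched for in the application's URLs, so an application jar cannot shadow them.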
[jira] [Commented] (YARN-279) Generalize RM token management
[ https://issues.apache.org/jira/browse/YARN-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538123#comment-13538123 ] Daryn Sharp commented on YARN-279: -- The problem with the passport idea is: how would you determine upon submission if the tokens will be valid until the AM is launched? It's non-deterministic how long the RM will take to launch the job after submission. You also have the problem that the YARN framework may need to use the tokens itself (i.e., log aggregation), so it can't purely be the responsibility of the AM to renew/cancel. Snapping back, the larger problem is that renewing tokens requires the jar for that project to be in the CLASSPATH. I don't think that's a reasonable requirement for either the RM or AM, since neither is in the position to know what jars are required for external token-based systems that will be accessed by the task. Generalize RM token management -- Key: YARN-279 URL: https://issues.apache.org/jira/browse/YARN-279 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.0.0, 2.0.0-alpha, 0.23.5 Reporter: Daryn Sharp Token renewal/cancelation in the RM presents challenges to support arbitrary tokens. The RM's CLASSPATH is currently required to have token renewer classes and all associated classes for the project's client. The logistics of having installs on the RM of all hadoop projects that submit jobs - just to support client connections to renew/cancel tokens - are untenable.
[jira] [Commented] (YARN-271) Fair scheduler hits IllegalStateException trying to reserve different apps on same node
[ https://issues.apache.org/jira/browse/YARN-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13538126#comment-13538126 ] Hudson commented on YARN-271: - Integrated in Hadoop-trunk-Commit #3147 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3147/]) YARN-271. Fair scheduler hits IllegalStateException trying to reserve different apps on same node. Contributed by Sandy Ryza. (Revision 1424945) Result = SUCCESS tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1424945 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Fair scheduler hits IllegalStateException trying to reserve different apps on same node --- Key: YARN-271 URL: https://issues.apache.org/jira/browse/YARN-271 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.0.3-alpha Attachments: YARN-271-1.patch, YARN-271.patch After the fair scheduler reserves a container on a node, it doesn't check for reservations it just made when trying to make more reservations during the same heartbeat. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
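The YARN-271 description suggests a missing check before a second reservation is attempted within one heartbeat. The model below is a sketch under the assumption, inferred from the issue title, that a node holds at most one reservation at a time; NodeReservation and its methods are hypothetical names, not the FairScheduler code.

```java
/** Hypothetical node that can hold a reservation for a single app. */
class NodeReservation {
    private String reservedAppId;

    /** Unchecked path: a conflicting reservation is an illegal state. */
    void reserveUnchecked(String appId) {
        if (reservedAppId != null && !reservedAppId.equals(appId)) {
            throw new IllegalStateException(
                "Node already reserved for " + reservedAppId + ", cannot reserve for " + appId);
        }
        reservedAppId = appId;
    }

    /** Checked path: consult the existing reservation before reserving. */
    boolean tryReserve(String appId) {
        if (reservedAppId != null && !reservedAppId.equals(appId)) {
            return false; // node was already reserved earlier in this heartbeat
        }
        reservedAppId = appId;
        return true;
    }
}
```

The checked path turns the exception into a normal "skip this node" outcome for the second app.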
[jira] [Commented] (YARN-211) Allow definition of max-active-applications per queue
[ https://issues.apache.org/jira/browse/YARN-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538135#comment-13538135 ] Thomas Graves commented on YARN-211: We shouldn't be changing the name to Running apps, especially in places like the web services. This breaks backwards compatibility. Can you clarify your comment, because maximum active applications is used in the documentation for the number of running + queued applications? Yes, there can be time between when an application is made active and when the AM gets started (waiting for a container). Is that what you are referring to? I don't think this has changed with your patch, correct? I believe the main reason it was a percent is that AMs could potentially use different amounts of memory per AM (YARN-276 related). So I could have 2 AMs that would run fine if using the minimum allocation size, but if one needs, say, 1G and the other uses 20G, then your queue is full and everything gets deadlocked. Hence in that case it would be better to use a % of resources. I can understand your scenario though. I would say if it was just one you could set the percent really low (like 0.0001) because it always gives you at least one, but if you have cases where you want 2 or 3, that doesn't work so well. You also need to update the capacity scheduler docs about the config. Allow definition of max-active-applications per queue - Key: YARN-211 URL: https://issues.apache.org/jira/browse/YARN-211 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Radim Kolar Assignee: Radim Kolar Attachments: capacity-maxactive.txt, max-running.txt In some cases, automatic max-active is not enough, especially if you need fewer active tasks in a given queue.
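The percent-based limit discussed above can be illustrated with a small calculation. The formula is an assumption made for illustration (slots of minimum-allocation size, scaled by the AM resource percent, with a floor of one); only the "always gives you at least one" behaviour comes from the comment, and MaxActiveApps with its parameters is a hypothetical name.

```java
/** Hypothetical derivation of a max-active-applications limit from a percent. */
class MaxActiveApps {
    static int fromPercent(long queueMemoryMb, long minAllocationMb, double amResourcePercent) {
        long slots = queueMemoryMb / minAllocationMb; // AMs that fit at minimum size
        // Assumed rule: round up, and never go below one active application.
        return Math.max((int) Math.ceil(slots * amResourcePercent), 1);
    }
}
```

For a 102400 MB queue with a 1024 MB minimum allocation, a percent of 0.0001 yields the floor of 1, while getting exactly 2 or 3 requires picking a percent that depends on the queue size, which is the awkwardness the comment points out.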
[jira] [Commented] (YARN-280) RM does not reject app submission with invalid tokens
[ https://issues.apache.org/jira/browse/YARN-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13538142#comment-13538142 ] Hadoop QA commented on YARN-280: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12561953/YARN-280.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/246//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/246//console This message is automatically generated. RM does not reject app submission with invalid tokens - Key: YARN-280 URL: https://issues.apache.org/jira/browse/YARN-280 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.0.0-alpha, 0.23.5 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: YARN-280.patch The RM will launch an app with invalid tokens. The tasks will languish with failed connection retries, followed by task reattempts, followed by app reattempts. 
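Rejecting a submission with invalid tokens, as YARN-280 asks for, can be sketched as an up-front validation step. The sketch assumes that validity is checked by attempting a renewal of each token at submission time; RenewableToken and SubmissionValidator are hypothetical names and do not mirror the DelegationTokenRenewer API.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical token abstraction: renew() fails for an invalid token. */
interface RenewableToken {
    long renew() throws IOException;
}

class SubmissionValidator {
    /** Try to renew every token; the first failure aborts validation. */
    static void validate(List<RenewableToken> tokens) throws IOException {
        for (RenewableToken token : tokens) {
            token.renew();
        }
    }

    /** Accept the submission only if all tokens could be renewed. */
    static boolean accept(List<RenewableToken> tokens) {
        try {
            validate(tokens);
            return true;
        } catch (IOException e) {
            return false; // reject now instead of letting tasks retry and fail later
        }
    }
}
```

Failing at submission gives the user an immediate error rather than the cycle of connection retries, task reattempts, and app reattempts described in the issue.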
[jira] [Commented] (YARN-279) Generalize RM token management
[ https://issues.apache.org/jira/browse/YARN-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538182#comment-13538182 ] Alejandro Abdelnur commented on YARN-279: - bq. how would you determine upon submission if the tokens will be valid until the AM is launched? Well, a reasonable time (that is the point of the 6 months); if they are not valid when the AM starts, the AM should fail. bq. Snapping back, Agreed that we should have a single piece of code to handle all token logic. And it should be in commons for all to use. Generalize RM token management -- Key: YARN-279 URL: https://issues.apache.org/jira/browse/YARN-279 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.0.0, 2.0.0-alpha, 0.23.5 Reporter: Daryn Sharp Token renewal/cancelation in the RM presents challenges to support arbitrary tokens. The RM's CLASSPATH is currently required to have token renewer classes and all associated classes for the project's client. The logistics of having installs on the RM of all hadoop projects that submit jobs - just to support client connections to renew/cancel tokens - are untenable.
[jira] [Commented] (YARN-278) Fair scheduler maxRunningApps config causes no apps to make progress
[ https://issues.apache.org/jira/browse/YARN-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13538183#comment-13538183 ] Hudson commented on YARN-278: - Integrated in Hadoop-trunk-Commit #3148 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3148/]) YARN-278. Fair scheduler maxRunningApps config causes no apps to make progress. (sandyr via tucu) (Revision 1424989) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1424989 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Fair scheduler maxRunningApps config causes no apps to make progress Key: YARN-278 URL: https://issues.apache.org/jira/browse/YARN-278 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.0.3-alpha Attachments: YARN-278.patch, YARN-278.patch This occurs because the scheduler erroneously chooses apps to offer resources to that are not runnable, then later decides they are not runnable, and doesn't try to give the resources to anyone else. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-282) Fair scheduler web UI double counts Apps Submitted
[ https://issues.apache.org/jira/browse/YARN-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13538184#comment-13538184 ] Hudson commented on YARN-282: - Integrated in Hadoop-trunk-Commit #3148 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3148/]) YARN-282. Fair scheduler web UI double counts Apps Submitted. (sandyr via tucu) (Revision 1424995) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1424995 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java Fair scheduler web UI double counts Apps Submitted -- Key: YARN-282 URL: https://issues.apache.org/jira/browse/YARN-282 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.0.3-alpha Attachments: YARN-282.patch Each app submitted is reported twice under Apps Submitted -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-272) Fair scheduler log messages try to print objects without overridden toString methods
[ https://issues.apache.org/jira/browse/YARN-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13538186#comment-13538186 ] Hudson commented on YARN-272: - Integrated in Hadoop-trunk-Commit #3148 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3148/]) YARN-272. Fair scheduler log messages try to print objects without overridden toString methods. (sandyr via tucu) (Revision 1424984) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1424984 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/PriorityPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java Fair scheduler log messages try to print objects without overridden toString methods Key: YARN-272 URL: https://issues.apache.org/jira/browse/YARN-272 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.0.3-alpha Attachments: YARN-272-1.patch, YARN-272-1.patch, YARN-272.patch A lot of junk gets printed out like this: 2012-12-11 17:31:52,998 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: Application application_1355270529654_0003 reserved container org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl@324f0f97 on node host: c1416.hal.cloudera.com:46356 #containers=7 available=0 used=8192, currently has 4 at priority org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl@33; currentReservation 4096 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
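The junk in the log line above (for example RMContainerImpl@324f0f97) is what Java's default Object.toString() produces: the class name, an at sign, and the hash code in hex. A small sketch of the difference, using hypothetical classes rather than the real PriorityPBImpl:

```java
/** Hypothetical record type that relies on the default Object.toString(). */
class PriorityWithoutToString {
    final int priority;

    PriorityWithoutToString(int priority) {
        this.priority = priority;
    }
}

/** Same data, but with toString() overridden for readable log output. */
class PriorityWithToString {
    final int priority;

    PriorityWithToString(int priority) {
        this.priority = priority;
    }

    @Override
    public String toString() {
        return "{Priority: " + priority + "}";
    }
}
```

String concatenation in a log statement calls toString() implicitly, so overriding it on the logged types is enough to make messages like the one above readable. The output format chosen here is an illustration, not necessarily the format used in the patch.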
[jira] [Commented] (YARN-279) Generalize RM token management
[ https://issues.apache.org/jira/browse/YARN-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538199#comment-13538199 ] Daryn Sharp commented on YARN-279: -- I'd have concerns that a 6 month expiration provides a huge window of vulnerability for stolen tokens. Initial expiration is a tangential issue to how to ensure renewer code is available. However, I've actually got a patch up on YARN-280 for job submission to fail if submitted tokens are invalid. Another problem with a 6 month expiration is that token issuers often hold tokens in memory, plus token requesters often do a poor job (as in never) of canceling tokens. That would cause some serious memory bloat... Generalize RM token management -- Key: YARN-279 URL: https://issues.apache.org/jira/browse/YARN-279 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.0.0, 2.0.0-alpha, 0.23.5 Reporter: Daryn Sharp Token renewal/cancelation in the RM presents challenges to support arbitrary tokens. The RM's CLASSPATH is currently required to have token renewer classes and all associated classes for the project's client. The logistics of having installs on the RM of all hadoop projects that submit jobs - just to support client connections to renew/cancel tokens - are untenable.
[jira] [Commented] (YARN-280) RM does not reject app submission with invalid tokens
[ https://issues.apache.org/jira/browse/YARN-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538294#comment-13538294 ] Thomas Graves commented on YARN-280: +1. Thanks Daryn! RM does not reject app submission with invalid tokens - Key: YARN-280 URL: https://issues.apache.org/jira/browse/YARN-280 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.0.0-alpha, 0.23.5 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: YARN-280.patch The RM will launch an app with invalid tokens. The tasks will languish with failed connection retries, followed by task reattempts, followed by app reattempts.
[jira] [Commented] (YARN-280) RM does not reject app submission with invalid tokens
[ https://issues.apache.org/jira/browse/YARN-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538304#comment-13538304 ] Hudson commented on YARN-280: - Integrated in Hadoop-trunk-Commit #3150 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3150/]) YARN-280. RM does not reject app submission with invalid tokens (Daryn Sharp via tgraves) (Revision 1425079) Result = SUCCESS tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1425079 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java RM does not reject app submission with invalid tokens - Key: YARN-280 URL: https://issues.apache.org/jira/browse/YARN-280 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.0.0-alpha, 0.23.5 Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 3.0.0, 2.0.3-alpha, 0.23.6 Attachments: YARN-280.patch The RM will launch an app with invalid tokens. The tasks will languish with failed connection retries, followed by task reattempts, followed by app reattempts.
[jira] [Commented] (YARN-286) Add a YARN ApplicationClassLoader
[ https://issues.apache.org/jira/browse/YARN-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538350#comment-13538350 ] Bikas Saha commented on YARN-286: - Patch does not show any consumers for this functionality? Add a YARN ApplicationClassLoader - Key: YARN-286 URL: https://issues.apache.org/jira/browse/YARN-286 Project: Hadoop YARN Issue Type: New Feature Components: applications Affects Versions: 2.0.2-alpha Reporter: Tom White Assignee: Tom White Fix For: 2.0.3-alpha Attachments: YARN-286.patch Add a classloader that provides webapp-style class isolation for use by applications. This is the YARN part of MAPREDUCE-1700 (which was already developed in that JIRA).
[jira] [Updated] (YARN-192) Node update causes NPE in the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-192: Attachment: YARN-192-1.patch Node update causes NPE in the fair scheduler Key: YARN-192 URL: https://issues.apache.org/jira/browse/YARN-192 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-192-1.patch, YARN-192.patch The exception occurs when unreserve is called on an FSSchedulerApp with a NodeId that it does not know about. The RM seems to have a different idea about what apps are reserved for which node than the scheduler. 2012-10-29 22:30:52,901 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp.unreserve(FSSchedulerApp.java:356) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.unreserve(AppSchedulable.java:214) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:330) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueSchedulable.assignContainer(FSQueueSchedulable.java:161) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:759) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:836) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:329) at java.lang.Thread.run(Thread.java:662) 2012-10-29 22:30:52,903 
INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
[jira] [Commented] (YARN-103) Add a yarn AM - RM client module
[ https://issues.apache.org/jira/browse/YARN-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538364#comment-13538364 ] Bikas Saha commented on YARN-103: - Looks like there are different views on interface usage. How about the following proposal? The next step after this gets committed is to write a version of the AMRMClient that is more advanced and handles things like auto-heartbeat and task-container matching. This will be useful for simple applications that don't need fine-grained control over scheduling. After I am done writing that client, it will be clear whether the interface APIs apply or if it's better to write that class with a different API. Based on that we can either keep the interface and add a factory, or remove the interface. Does this sound like a way forward? Add a yarn AM - RM client module Key: YARN-103 URL: https://issues.apache.org/jira/browse/YARN-103 Project: Hadoop YARN Issue Type: Improvement Reporter: Bikas Saha Assignee: Bikas Saha Attachments: YARN-103.1.patch, YARN-103.2.patch, YARN-103.3.patch, YARN-103.4.patch, YARN-103.4.wrapper.patch, YARN-103.5.patch, YARN-103.6.patch, YARN-103.7.patch Add a basic client wrapper library to the AM RM protocol in order to prevent proliferation of code being duplicated everywhere. Provide helper functions to perform reverse mapping of container requests to RM allocation resource request table format.
[jira] [Commented] (YARN-192) Node update causes NPE in the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13538363#comment-13538363 ] Sandy Ryza commented on YARN-192: - Changed the test so that it fails without the fix. Node update causes NPE in the fair scheduler Key: YARN-192 URL: https://issues.apache.org/jira/browse/YARN-192 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-192-1.patch, YARN-192.patch The exception occurs when unreserve is called on an FSSchedulerApp with a NodeId that it does not know about. The RM seems to have a different idea about what apps are reserved for which node than the scheduler. 2012-10-29 22:30:52,901 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp.unreserve(FSSchedulerApp.java:356) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.unreserve(AppSchedulable.java:214) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:330) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueSchedulable.assignContainer(FSQueueSchedulable.java:161) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:759) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:836) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1) at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:329) at java.lang.Thread.run(Thread.java:662) 2012-10-29 22:30:52,903 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
[jira] [Commented] (YARN-192) Node update causes NPE in the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538369#comment-13538369 ] Hadoop QA commented on YARN-192: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12562128/YARN-192-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/247//console This message is automatically generated. Node update causes NPE in the fair scheduler Key: YARN-192 URL: https://issues.apache.org/jira/browse/YARN-192 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-192-1.patch, YARN-192.patch The exception occurs when unreserve is called on an FSSchedulerApp with a NodeId that it does not know about. The RM seems to have a different idea about what apps are reserved for which node than the scheduler. 
2012-10-29 22:30:52,901 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp.unreserve(FSSchedulerApp.java:356) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.unreserve(AppSchedulable.java:214) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:330) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueSchedulable.assignContainer(FSQueueSchedulable.java:161) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:759) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:836) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:329) at java.lang.Thread.run(Thread.java:662) 2012-10-29 22:30:52,903 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
[jira] [Updated] (YARN-231) Add persistent store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-231: Attachment: YARN-231.2.patch Attaching patch after rebasing on YARN-230. Addressing review comments and adding default store when recovery is enabled. Add persistent store implementation for RMStateStore Key: YARN-231 URL: https://issues.apache.org/jira/browse/YARN-231 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: YARN-231.1.patch, YARN-231.2.patch Add stores that write RM state data to ZooKeeper and FileSystem
[jira] [Updated] (YARN-192) Node update causes NPE in the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-192: Attachment: YARN-192-1.patch Node update causes NPE in the fair scheduler Key: YARN-192 URL: https://issues.apache.org/jira/browse/YARN-192 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-192-1.patch, YARN-192-1.patch, YARN-192.patch The exception occurs when unreserve is called on an FSSchedulerApp with a NodeId that it does not know about. The RM seems to have a different idea about what apps are reserved for which node than the scheduler. 2012-10-29 22:30:52,901 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp.unreserve(FSSchedulerApp.java:356) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.unreserve(AppSchedulable.java:214) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:266) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:330) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueSchedulable.assignContainer(FSQueueSchedulable.java:161) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:759) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:836) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:329) at java.lang.Thread.run(Thread.java:662) 
2012-10-29 22:30:52,903 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
[jira] [Commented] (YARN-231) Add persistent store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538442#comment-13538442 ] Bikas Saha commented on YARN-231: -
{noformat}
Code Warning
IS Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.zkClient; locked 81% of time
IS Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.zkSessionTimeout; locked 50% of time
{noformat}
These are the warnings. I looked at the code and I don't see a synchronization issue. Maybe a different pair of eyes might spot something. If it's clean I will disable the warning for that part of the code. Add persistent store implementation for RMStateStore Key: YARN-231 URL: https://issues.apache.org/jira/browse/YARN-231 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: YARN-231.1.patch, YARN-231.2.patch Add stores that write RM state data to ZooKeeper and FileSystem
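For readers unfamiliar with the FindBugs "IS" (inconsistent synchronization) category quoted above: it is reported when a field is accessed while holding a lock in most places but without one in others, which is what the "locked 81% of time" figure refers to. The sketch below is a hypothetical class, not the `ZKRMStateStore` code from the patch; it only illustrates the access pattern that typically triggers the warning and one common way of making the locking consistent.

```java
// Hypothetical example; the class and field names are illustrative only.
class SessionHolder {
    private Object client;                 // stands in for a client handle
    private int sessionTimeoutMs = 10000;  // arbitrary default for the sketch

    // Most accesses to 'client' are synchronized on 'this' ...
    synchronized void connect() {
        client = new Object();
    }

    synchronized void close() {
        client = null;
    }

    // ... so an unsynchronized reader such as
    //
    //     boolean isConnected() { return client != null; }
    //
    // is the kind of access a tool like FindBugs would flag as inconsistent.
    // Taking the same lock on every access keeps the locking uniform.
    synchronized boolean isConnected() {
        return client != null;
    }

    synchronized int getSessionTimeoutMs() {
        return sessionTimeoutMs;
    }

    synchronized void setSessionTimeoutMs(int timeoutMs) {
        sessionTimeoutMs = timeoutMs;
    }

    public static void main(String[] args) {
        SessionHolder holder = new SessionHolder();
        holder.connect();
        System.out.println(holder.isConnected());
        holder.close();
        System.out.println(holder.isConnected());
    }
}
```

Whether the real warnings point at a genuine race or at a false positive (for example, a field only written during single-threaded initialization) cannot be determined from the comment alone, which is why a second review is requested there.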
[jira] [Commented] (YARN-285) RM should be able to provide a tracking link for apps that have already been purged
[ https://issues.apache.org/jira/browse/YARN-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538446#comment-13538446 ] Hadoop QA commented on YARN-285: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12562146/YARN-285.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/250//console This message is automatically generated. RM should be able to provide a tracking link for apps that have already been purged --- Key: YARN-285 URL: https://issues.apache.org/jira/browse/YARN-285 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Derek Dagit Attachments: YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285.patch, YARN-285.patch As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of applications remembered by the RM. When a user clicks the History link for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links that persist, so that users are not frustrated by broken links.
[jira] [Updated] (YARN-285) RM should be able to provide a tracking link for apps that have already been purged
[ https://issues.apache.org/jira/browse/YARN-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Derek Dagit updated YARN-285: - Attachment: YARN-285-branch-0.23.patch RM should be able to provide a tracking link for apps that have already been purged --- Key: YARN-285 URL: https://issues.apache.org/jira/browse/YARN-285 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Derek Dagit Attachments: YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285.patch, YARN-285.patch, YARN-285.patch As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of applications remembered by the RM. When a user clicks the History link for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links that persist, so that users are not frustrated by broken links.
[jira] [Updated] (YARN-285) RM should be able to provide a tracking link for apps that have already been purged
[ https://issues.apache.org/jira/browse/YARN-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Derek Dagit updated YARN-285: - Attachment: YARN-285.patch Fixes previous set of patches. RM should be able to provide a tracking link for apps that have already been purged --- Key: YARN-285 URL: https://issues.apache.org/jira/browse/YARN-285 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Derek Dagit Attachments: YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285.patch, YARN-285.patch, YARN-285.patch As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of applications remembered by the RM. When a user clicks the History link for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links that persist, so that users are not frustrated by broken links.
[jira] [Commented] (YARN-285) RM should be able to provide a tracking link for apps that have already been purged
[ https://issues.apache.org/jira/browse/YARN-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538503#comment-13538503 ] Hadoop QA commented on YARN-285: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12562168/YARN-285.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/251//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/251//console This message is automatically generated. 
RM should be able to provide a tracking link for apps that have already been purged --- Key: YARN-285 URL: https://issues.apache.org/jira/browse/YARN-285 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Derek Dagit Attachments: YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285.patch, YARN-285.patch, YARN-285.patch As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of applications remembered by the RM. When a user clicks the History link for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links that persist, so that users are not frustrated by broken links.
[jira] [Commented] (YARN-285) RM should be able to provide a tracking link for apps that have already been purged
[ https://issues.apache.org/jira/browse/YARN-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538515#comment-13538515 ] Vinod Kumar Vavilapalli commented on YARN-285: -- Meta comment. We need this only as a temporary solution; this isn't a long-term fix. Can you please file a follow-up to fix the real issue? That may be blocked on ApplicationHistoryServer, but let's file it anyway. Looks good overall. The MR specific plugin at MAPREDUCE-4899 will need to load mapred-site.xml to read the correct history server address, so you may need to override setConf() in that plugin to also load mapred-site.xml. RM should be able to provide a tracking link for apps that have already been purged --- Key: YARN-285 URL: https://issues.apache.org/jira/browse/YARN-285 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Derek Dagit Attachments: YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285.patch, YARN-285.patch, YARN-285.patch As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of applications remembered by the RM. When a user clicks the History link for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links that persist, so that users are not frustrated by broken links.
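The suggestion above, overriding `setConf()` so the plugin also picks up `mapred-site.xml`, can be sketched as follows. To keep the example self-contained and runnable without Hadoop on the classpath, `Conf` and `ConfiguredBase` are simplified stand-ins written for this sketch; they are not Hadoop's `Configuration` and `Configured`, and `MrTrackingPlugin` is a hypothetical name rather than the class from MAPREDUCE-4899.

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch: Conf and ConfiguredBase are simplified stand-ins,
// NOT the real Hadoop classes.
class SetConfDemo {
    // Stand-in for a configuration object: only records added resources.
    static class Conf {
        private final List<String> resources = new ArrayList<>();
        void addResource(String name) { resources.add(name); }
        List<String> getResources() { return resources; }
    }

    // Stand-in for a configurable base class: stores the conf it is given.
    static class ConfiguredBase {
        private Conf conf;
        public void setConf(Conf conf) { this.conf = conf; }
        public Conf getConf() { return conf; }
    }

    // Hypothetical MR-specific plugin: whenever a configuration is set,
    // it also pulls in mapred-site.xml so MR settings become visible.
    static class MrTrackingPlugin extends ConfiguredBase {
        @Override
        public void setConf(Conf conf) {
            if (conf != null) {  // the conf may legitimately be null
                conf.addResource("mapred-site.xml");
            }
            super.setConf(conf);
        }
    }

    public static void main(String[] args) {
        MrTrackingPlugin plugin = new MrTrackingPlugin();
        plugin.setConf(new Conf());
        System.out.println(plugin.getConf().getResources());
    }
}
```

In real Hadoop code the equivalent step would presumably go through `Configuration.addResource(String)` or a similar mechanism; how the actual MAPREDUCE-4899 plugin loads the file is not stated in this thread.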
[jira] [Updated] (YARN-285) RM should be able to provide a tracking link for apps that have already been purged
[ https://issues.apache.org/jira/browse/YARN-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Derek Dagit updated YARN-285: - Attachment: YARN-285.patch Re-adding so as not to break the QA bot build RM should be able to provide a tracking link for apps that have already been purged --- Key: YARN-285 URL: https://issues.apache.org/jira/browse/YARN-285 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Derek Dagit Attachments: YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285.patch, YARN-285.patch, YARN-285.patch, YARN-285.patch As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of applications remembered by the RM. When a user clicks the History link for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links that persist, so that users are not frustrated by broken links.
[jira] [Commented] (YARN-279) Generalize RM token management
[ https://issues.apache.org/jira/browse/YARN-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538573#comment-13538573 ] Karthik Kambatla commented on YARN-279: --- If we are abstracting out token management from RM, would it make sense to push it all the way down to common and use some of the existing token-management code (renewal etc.)? Generalize RM token management -- Key: YARN-279 URL: https://issues.apache.org/jira/browse/YARN-279 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 3.0.0, 2.0.0-alpha, 0.23.5 Reporter: Daryn Sharp Token renewal/cancelation in the RM presents challenges to support arbitrary tokens. The RM's CLASSPATH is currently required to have token renewer classes and all associated classes for the project's client. The logistics of having installs on the RM of all hadoop projects that submit jobs - just to support client connections to renew/cancel tokens - are untenable.
[jira] [Created] (YARN-289) Fair scheduler doesn't satisfy smaller request when bigger one is outstanding
Sandy Ryza created YARN-289: --- Summary: Fair scheduler doesn't satisfy smaller request when bigger one is outstanding Key: YARN-289 URL: https://issues.apache.org/jira/browse/YARN-289 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza An application requests a container with 1024 MB. It then requests a container with 2048 MB. A node shows up with 1024 MB available. Even if the application is the only one running, neither request will be scheduled on it.
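The report describes the symptom rather than the cause. As a sketch of the behaviour the reporter appears to expect, the toy model below walks an application's outstanding requests and picks the first one that fits on the node. The class, the method name `pickRequest`, and the first-fit policy are assumptions made for illustration; none of this is FairScheduler code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the YARN-289 scenario; not FairScheduler code.
class SmallRequestDemo {
    /** Returns the first outstanding request (in MB) that fits, or -1 if none fit. */
    static int pickRequest(List<Integer> outstandingMb, int availableMb) {
        for (int requestMb : outstandingMb) {
            if (requestMb <= availableMb) {
                return requestMb;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The scenario from the report: requests of 1024 MB and 2048 MB,
        // and a node with 1024 MB available.
        List<Integer> outstanding = Arrays.asList(1024, 2048);
        System.out.println(pickRequest(outstanding, 1024));
    }
}
```

Under this model the 1024 MB request is placed on the node while the 2048 MB request stays outstanding, which is the outcome the bug report says does not happen.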
[jira] [Commented] (YARN-285) RM should be able to provide a tracking link for apps that have already been purged
[ https://issues.apache.org/jira/browse/YARN-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538577#comment-13538577 ]

Vinod Kumar Vavilapalli commented on YARN-285:

Can you please file a follow-up ticket to fix it in a saner way for the long term? Thanks.

RM should be able to provide a tracking link for apps that have already been purged

    Key: YARN-285
    URL: https://issues.apache.org/jira/browse/YARN-285
    Project: Hadoop YARN
    Issue Type: Improvement
    Affects Versions: 3.0.0, 0.23.5
    Reporter: Derek Dagit
    Assignee: Derek Dagit
    Fix For: 2.0.3-alpha, 0.23.6
    Attachments: YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285.patch, YARN-285.patch, YARN-285.patch, YARN-285.patch

As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of applications remembered by the RM. When a user clicks the History link for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide tracking links that remain valid so that users are not frustrated by broken links.
[jira] [Commented] (YARN-285) RM should be able to provide a tracking link for apps that have already been purged
[ https://issues.apache.org/jira/browse/YARN-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538581#comment-13538581 ]

Hudson commented on YARN-285:

Integrated in Hadoop-trunk-Commit #3153 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3153/])
YARN-285. Added a temporary plugin interface for RM to be able to redirect to JobHistory server for apps that it no longer tracks. Contributed by Derek Dagit. (Revision 1425210)

Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1425210
Files :
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/TrackingUriPlugin.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/ProxyUriUtils.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestProxyUriUtils.java

RM should be able to provide a tracking link for apps that have already been purged

    Key: YARN-285
    URL: https://issues.apache.org/jira/browse/YARN-285
    Project: Hadoop YARN
    Issue Type: Improvement
    Affects Versions: 3.0.0, 0.23.5
    Reporter: Derek Dagit
    Assignee: Derek Dagit
    Fix For: 2.0.3-alpha, 0.23.6
    Attachments: YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285-branch-0.23.patch, YARN-285.patch, YARN-285.patch, YARN-285.patch, YARN-285.patch

As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of applications remembered by the RM. When a user clicks the History link for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide tracking links that remain valid so that users are not frustrated by broken links.
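The fallback idea described in the commit message (redirect to another server for apps the RM no longer tracks) can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed patch: `TrackingLinkSketch`, `FallbackResolver`, the application ID strings, and the example hosts are all hypothetical, and the real `TrackingUriPlugin` signature is not reproduced here.

```java
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; class, interface, and host names are assumptions.
public class TrackingLinkSketch {

    // Stand-in for a pluggable resolver consulted when an app is unknown.
    public interface FallbackResolver {
        Optional<URI> resolve(String appId);
    }

    private final Map<String, URI> trackedApps = new HashMap<>();
    private final List<FallbackResolver> fallbacks;

    public TrackingLinkSketch(List<FallbackResolver> fallbacks) {
        this.fallbacks = fallbacks;
    }

    public void track(String appId, URI trackingUri) {
        trackedApps.put(appId, trackingUri);
    }

    // Models the truncation of the completed-applications list.
    public void purge(String appId) {
        trackedApps.remove(appId);
    }

    // Prefer the stored tracking link; otherwise ask each fallback in order.
    public Optional<URI> trackingUri(String appId) {
        URI known = trackedApps.get(appId);
        if (known != null) {
            return Optional.of(known);
        }
        for (FallbackResolver fallback : fallbacks) {
            Optional<URI> uri = fallback.resolve(appId);
            if (uri.isPresent()) {
                return uri;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed history-server URL layout, for illustration only.
        FallbackResolver history =
            id -> Optional.of(URI.create("http://history.example/app/" + id));
        TrackingLinkSketch rm = new TrackingLinkSketch(Collections.singletonList(history));

        rm.track("app_1", URI.create("http://am.example/app_1"));
        System.out.println(rm.trackingUri("app_1")); // stored link
        rm.purge("app_1");
        System.out.println(rm.trackingUri("app_1")); // fallback link
    }
}
```

Keeping the fallback behind an interface means the component that serves tracking links does not need to know how a history server lays out its URLs, which matches the "plugin interface" wording of the commit.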
[jira] [Resolved] (YARN-284) YARN capacity scheduler doesn't spread MR tasks evenly on an underutilized cluster
[ https://issues.apache.org/jira/browse/YARN-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved YARN-284.

    Resolution: Implemented

Is this even valid anymore? Closing this, please re-open if necessary.

YARN capacity scheduler doesn't spread MR tasks evenly on an underutilized cluster

    Key: YARN-284
    URL: https://issues.apache.org/jira/browse/YARN-284
    Project: Hadoop YARN
    Issue Type: Improvement
    Components: scheduler
    Affects Versions: 2.0.0-alpha
    Reporter: Todd Lipcon

The fair scheduler in MR1 has the behavior that, if a job is submitted to an under-utilized cluster and the cluster has more open slots than tasks in the job, the tasks are spread evenly throughout the cluster. This improves job latency since more spindles and NICs are utilized to complete the job. In MR2 I see this issue causing significantly longer job runtimes when there is excess capacity in the cluster, especially on reducers, which sometimes end up clumping together on a smaller set of nodes that then become disk/network constrained.
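The "spread evenly" behavior described in the issue can be illustrated with a least-loaded placement loop. This is a minimal sketch, not how the MR1 fair scheduler or the YARN capacity scheduler actually places tasks; `SpreadSketch` and `spreadTasks` are hypothetical names, and nodes are assumed to be identical.

```java
import java.util.Arrays;

// Hypothetical illustration only; not scheduler code from Hadoop.
public class SpreadSketch {

    // Assigns each task to the node that currently holds the fewest tasks
    // (ties go to the lowest index) and returns the per-node task counts.
    public static int[] spreadTasks(int tasks, int nodes) {
        int[] perNode = new int[nodes];
        for (int t = 0; t < tasks; t++) {
            int target = 0;
            for (int n = 1; n < nodes; n++) {
                if (perNode[n] < perNode[target]) {
                    target = n;
                }
            }
            perNode[target]++;
        }
        return perNode;
    }

    public static void main(String[] args) {
        // Four tasks on an eight-node cluster with spare capacity.
        System.out.println(Arrays.toString(spreadTasks(4, 8)));
    }
}
```

With more nodes than tasks, no node receives more than one task in this sketch, which is the opposite of the clumping the report describes for reducers.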
[jira] [Created] (YARN-290) Wrong cluster metrics on RM page
Lohit Vijayarenu created YARN-290:

    Summary: Wrong cluster metrics on RM page
    Key: YARN-290
    URL: https://issues.apache.org/jira/browse/YARN-290
    Project: Hadoop YARN
    Issue Type: Bug
    Components: resourcemanager
    Affects Versions: 2.0.3-alpha
    Reporter: Lohit Vijayarenu
    Priority: Minor

ResourceManager seems to always show a few (1-3) applications in pending state on the ResourceManager webpage under the Cluster metrics tab, even when there are no pending applications. It is very easy to replicate. Start RM, submit one job, and you will see 2 pending applications, which is incorrect.
[jira] [Commented] (YARN-290) Wrong cluster metrics on RM page
[ https://issues.apache.org/jira/browse/YARN-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538610#comment-13538610 ]

Sandy Ryza commented on YARN-290:

Which scheduler(s) is this occurring under?

Wrong cluster metrics on RM page

    Key: YARN-290
    URL: https://issues.apache.org/jira/browse/YARN-290
    Project: Hadoop YARN
    Issue Type: Bug
    Components: resourcemanager
    Affects Versions: 2.0.3-alpha
    Reporter: Lohit Vijayarenu
    Priority: Minor

ResourceManager seems to always show a few (1-3) applications in pending state on the ResourceManager webpage under the Cluster metrics tab, even when there are no pending applications. It is very easy to replicate. Start RM, submit one job, and you will see 2 pending applications, which is incorrect.