[jira] [Updated] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2854: Attachment: YARN-2854.20150313-1.patch Patch with [~zjshen]'s 1st and 2nd comments fixed. The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, YARN-2854.20150313-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360739#comment-14360739 ] Wangda Tan commented on YARN-3243: -- YARN-3204 tracks the findbugs warning, and the test failure is not related to this change. CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues to make sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocated a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example:
{code}
          A (usage=54, max=55)
         /                  \
       A1                    A2
(usage=1, max=55)   (usage=53, max=53)
{code}
Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell its children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, now we have a {{ResourceUsage}} class in each queue, *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to be (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity will be enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
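The min() rule in the proposal is straightforward to express concretely. Below is a minimal, hypothetical sketch (QueueUsage and its fields are illustrative stand-ins, not the real CapacityScheduler classes):
{code}
// Sketch: a parent derives each child's headroom as
// min(parent.headroom, parent.max - parent.used), so every ancestor's
// limit is enforced transitively down the queue hierarchy.
final class QueueUsage {
  long used;      // resources currently used by this queue
  long max;       // configured maximum for this queue
  long headroom;  // limit pushed down by this queue's parent

  void pushHeadroomToChildren(Iterable<QueueUsage> children) {
    long childLimit = Math.min(headroom, max - used);
    for (QueueUsage child : children) {
      child.headroom = childLimit;
    }
  }
}
{code}
In the example above, A would push headroom = min(A.headroom, 55 - 54) = 1 down to A1 and A2, so neither child could allocate a container that pushes A past its max.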
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360669#comment-14360669 ] Tsuyoshi Ozawa commented on YARN-2890: -- [~mitdesai] thank you for updating the patch! +1 for the change itself to make it configurable. {quote} I was trying out the fix for MiniMRYarnCluster where we want to start the timeline service only if the TIMELINE_SERVICE_ENABLED == true. But as per the current implementation of the miniCluster, it takes in a boolean when its instance is created to decide whether to start or not to start the timeline server. {quote} I don't understand the context - why would you like to make the flag off by default? Could you clarify it? IMO, it would be enough to make it configurable. MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360805#comment-14360805 ] zhihai xu commented on YARN-3336: - TestRMWebServices and TestFairSchedulerQueueACLs passed in my local latest build, and both test failures are not related to my patch.
{code}
--- T E S T S ---
Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.871 sec - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

--- T E S T S ---
Running org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
Tests run: 19, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.004 sec - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices

Results :

Tests run: 19, Failures: 0, Errors: 0, Skipped: 0
{code}
The findbugs warnings are also not related to my patch. YARN-3341 is to fix one of the findbugs warnings. FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch, YARN-3336.002.patch, YARN-3336.003.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens = proxyUser.doAs(
      new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser will always create a new Subject.
The calling sequence is FileSystem.get(getConfig()) -> FileSystem.get(getDefaultUri(conf), conf) -> FileSystem.CACHE.get(uri, conf) -> FileSystem.CACHE.getInternal(uri, conf, key) -> FileSystem.CACHE.map.get(key) -> createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals will compare subject by reference.
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
{code}
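Given the cache-key analysis above, one plausible remedy (a hedged sketch; the actual committed fix for YARN-3336 may differ) is to close the per-proxy-user FileSystem once the tokens have been obtained, since FileSystem.close() also evicts the instance from FileSystem.CACHE:
{code}
// Sketch only: obtain delegation tokens for a proxy user without leaking
// a cached FileSystem. Closing the instance removes it from FileSystem.CACHE,
// so repeated calls with fresh proxy users no longer grow the cache.
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(
      user, UserGroupInformation.getLoginUser());
  return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
    @Override
    public Token<?>[] run() throws Exception {
      FileSystem fs = FileSystem.get(getConfig());
      try {
        return fs.addDelegationTokens(
            UserGroupInformation.getLoginUser().getUserName(), credentials);
      } finally {
        fs.close(); // evict this proxy user's entry from the cache
      }
    }
  });
}
{code}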
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360822#comment-14360822 ] Junping Du commented on YARN-3034: -- Hi [~vinodkv], for the DistributedShell patch, we have the assumption that the v1 and v2 services could be running at the same time (also for the TestDistributedShell cases, we test v1 and v2 on the same miniYARN cluster). The AM gets launched with a different version parameter, and then it passes the boolean value of newTimelineService to TimelineClient, which will call the related functions - that is the current flow we have. [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360797#comment-14360797 ] Vinod Kumar Vavilapalli commented on YARN-3034: --- We don't have a plan to directly put any metrics data from the NM into storage yet. So, agreed that this is an issue, but not an immediate one. When we come to it, maybe we will have a yarn.system-metrics-publisher.enabled which is used by both RM and NM and deprecates the current RM flag. +1 for a yarn.timeline-service.version. This is what we should have done for the DistributedShell patch? /cc [~djp]. Maybe for all clients when YARN-2928 is ready to go in? [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
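If the proposed yarn.timeline-service.version knob were added, client-side selection could look roughly like this (sketch only; the key name and float type are assumptions drawn from this discussion, not a shipped constant):
{code}
// Hypothetical: branch on a single version property instead of threading a
// hand-rolled newTimelineService boolean through AM launch parameters.
float tlVersion = conf.getFloat("yarn.timeline-service.version", 1.0f);
boolean useV2TimelineService = tlVersion >= 2.0f;
{code}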
[jira] [Commented] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360823#comment-14360823 ] Li Lu commented on YARN-3034: - Hi [~Naganarasimha], thanks for the clarification. I think this way of organization is fine for now. [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034-20150312-1.patch, YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360680#comment-14360680 ] Hudson commented on YARN-3267: -- FAILURE: Integrated in Hadoop-trunk-Commit #7316 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7316/]) YARN-3267. Timelineserver applies the ACL rules after applying the limit on the number of records (Chang Li via jeagles) (jeagles: rev 8180e676abb2bb500a48b3a0c0809d2a807ab235) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestTimelineDataManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineReader.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMRTimelineEventHandling.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/jobhistory/TestJobHistoryEventHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN-3267.5.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360938#comment-14360938 ] Mit Desai commented on YARN-2890: - bq. The above ctor was removed. If anyone is using MiniMRYARNCluster from 2.4.0 to test their jobs, this will break compatibility. My latest patch no longer removes this constructor. bq. Why use a hardcoded false instead of the DEFAULT field from YarnConfiguration? Makes sense. Thanks. I will update the patch to use the default value, which is already set as false. But I will wait for Zhijie's response before updating the patch. MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360686#comment-14360686 ] Junping Du commented on YARN-3225: -- bq. If we have a constraint that we should issue graceful decommission command from only one RMAdmin CLI then this issue will not be a problem. Can we have this assumption in our first phase (targeting 2.8)? IMO, decommissioning nodes is a very restrictive operation, and we don't expect multiple to happen at the same time on a cluster. We can improve later if we think this is not good enough. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225.patch, YARN-914.patch New CLI (or existing CLI with parameters) should put each node on the decommission list into decommissioning status and track the timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360809#comment-14360809 ] Hitesh Shah commented on YARN-2890: --- 2 issues with the patch:
{code}
public MiniMRYarnCluster(String testName, int noOfNMs, boolean enableAHS)
{code}
The above ctor was removed. If anyone is using MiniMRYARNCluster from 2.4.0 to test their jobs, this will break compatibility.
{code}
conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, false)
{code}
Why use a hardcoded false instead of the DEFAULT field from YarnConfiguration? Also, to add to Tsuyoshi's comment, what is the issue with turning on Timeline in all scenarios? If Timeline is going to be a first-class citizen of YARN going forward, why make it false by default? [~zjshen], comments on this? MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
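On the second point, the conventional pattern is to pair the key with its DEFAULT_* constant, which YarnConfiguration already defines for this key:
{code}
// Tracks any future change to the shipped default instead of pinning false.
boolean timelineEnabled = conf.getBoolean(
    YarnConfiguration.TIMELINE_SERVICE_ENABLED,
    YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
{code}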
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360704#comment-14360704 ] Mit Desai commented on YARN-2890: - That's because not everything is using the timeline server. Turning it off by default will prevent users from accidentally using the timeline server if they do not intend to. Moreover, if someone intends to use the timeline server, they are well aware of it and can turn the flag on. MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
[ https://issues.apache.org/jira/browse/YARN-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360697#comment-14360697 ] Karthik Kambatla commented on YARN-3306: [~cwelch] - thanks for the clarifications. I also spoke to Vinod offline about the goals and likely path of this work. I see the benefits of having a single scheduler with pluggable policies. I feel it might be easier to implement a new scheduler and plug the FS and CS policies into it. However, I understand the iterative approach you propose will get validation from current users along the way. I am a little circumspect about the iterative approach and how we avoid regressions, but remain hopeful the code will convince me it is the right approach. I would like to be involved in the work here; can we work on a branch and merge in as and when appropriate? [Umbrella] Proposing per-queue Policy driven scheduling in YARN --- Key: YARN-3306 URL: https://issues.apache.org/jira/browse/YARN-3306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: PerQueuePolicydrivenschedulinginYARN.pdf Scheduling layout in Apache Hadoop YARN today is very coarse grained. This proposal aims at converting today’s rigid scheduling in YARN to a per-queue policy driven architecture. We propose the creation of a common policy framework and implement a common set of policies that administrators can pick and choose per queue - Make scheduling policies configurable per queue - Initially, we limit ourselves to a new type of scheduling policy that determines the ordering of applications within the leaf queue - In the near future, we will also pursue parent queue level policies and potential algorithm reuse through a separate type of policies that control resource limits per queue, user, application etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3291) DockerContainerExecutor should run as a non-root user inside the container
[ https://issues.apache.org/jira/browse/YARN-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360881#comment-14360881 ] Hadoop QA commented on YARN-3291: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704462/YARN-3291.patch against trunk revision f446669. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6957//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6957//console This message is automatically generated. DockerContainerExecutor should run as a non-root user inside the container -- Key: YARN-3291 URL: https://issues.apache.org/jira/browse/YARN-3291 Project: Hadoop YARN Issue Type: Improvement Reporter: Abin Shahab Assignee: Abin Shahab Attachments: YARN-3291.patch, YARN-3291.patch Currently DockerContainerExecutor runs the container as root (inside the container). Outside the container it runs as yarn. Inside the container it can be run as a user which is not root. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
[ https://issues.apache.org/jira/browse/YARN-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360967#comment-14360967 ] Vinod Kumar Vavilapalli commented on YARN-3306: --- Yup, the discussion above captures the main bits, but to summarize:
How do we avoid fragmentation for this feature itself? - By putting the framework in the common place and using it in specific schedulers one after another.
Is this only for the leaf queue? - No, we start with the leaf queue and demonstrate viability across different existing policies, then move up to the parent queue (which should be easier than the leaf queue) and extend to limits.
Why not a new scheduler? # Getting existing users to validate our changes, and a smoother migration path. # Making sure current behaviors are completely absorbed.
[Umbrella] Proposing per-queue Policy driven scheduling in YARN --- Key: YARN-3306 URL: https://issues.apache.org/jira/browse/YARN-3306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: PerQueuePolicydrivenschedulinginYARN.pdf Scheduling layout in Apache Hadoop YARN today is very coarse grained. This proposal aims at converting today’s rigid scheduling in YARN to a per-queue policy driven architecture. We propose the creation of a common policy framework and implement a common set of policies that administrators can pick and choose per queue - Make scheduling policies configurable per queue - Initially, we limit ourselves to a new type of scheduling policy that determines the ordering of applications within the leaf queue - In the near future, we will also pursue parent queue level policies and potential algorithm reuse through a separate type of policies that control resource limits per queue, user, application etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3291) DockerContainerExecutor should run as a non-root user inside the container
[ https://issues.apache.org/jira/browse/YARN-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-3291: -- Attachment: YARN-3291.patch Removed findbugs warning DockerContainerExecutor should run as a non-root user inside the container -- Key: YARN-3291 URL: https://issues.apache.org/jira/browse/YARN-3291 Project: Hadoop YARN Issue Type: Improvement Reporter: Abin Shahab Assignee: Abin Shahab Attachments: YARN-3291.patch, YARN-3291.patch Currently DockerContainerExecutor runs the container as root (inside the container). Outside the container it runs as yarn. Inside the container it can be run as a user which is not root. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360841#comment-14360841 ] Hadoop QA commented on YARN-2854: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704461/YARN-2854.20150313-1.patch against trunk revision 8180e67. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6956//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6956//console This message is automatically generated. The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, YARN-2854.20150313-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2480) DockerContainerExecutor must support user namespaces
[ https://issues.apache.org/jira/browse/YARN-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361049#comment-14361049 ] Ravindra Kumar Naik commented on YARN-2480: --- Though this support exists in Linux containers (LXC), docker doesn't yet support such mapping. Please have a look at https://github.com/docker/docker/issues/7906 DockerContainerExecutor must support user namespaces Key: YARN-2480 URL: https://issues.apache.org/jira/browse/YARN-2480 Project: Hadoop YARN Issue Type: New Feature Reporter: Abin Shahab Labels: security When DockerContainerExecutor launches a container, the root inside that container has root privileges on the host. This is insecure in a multi-tenant environment. The uid of the container's root user must be mapped to a non-privileged user on the host. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361074#comment-14361074 ] Sangjin Lee commented on YARN-3039: --- I stand corrected [~djp]. For some strange reason I missed the null check in the while loop, which is why I mistakenly thought that every call would end up right in the Thread.sleep(). Thanks for the correction. [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2854: -- Issue Type: Improvement (was: Bug) The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, YARN-2854.20150313-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361035#comment-14361035 ] Zhijie Shen commented on YARN-2890: --- bq. If Timeline is going to be a first class citizen of YARN going forwards, why make it false by default? I think so far it's still not assumed that the timeline service is an always-enabled component, though we'd like to propose that; maybe it will be more persuasive once ATS v2 arrives? And for MiniMRYARNCluster there's a technical issue too. Because of the singleton in Guice, only one webapp can be created per daemon. Enabling the ATS will break the other web test cases around the RM/NM (if I remember correctly, there seem to be such tests). MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361089#comment-14361089 ] Junping Du commented on YARN-3225: -- One additional comment: for RMNodeEventType, DECOMMISSION_WITH_DELAY sounds better?
{code}
RMNodeEventType.java
@@ -24,6 +24,7 @@
   // Source: AdminService
   DECOMMISSION,
+  DECOMMISSION_WITH_TIMEOUT,
{code}
New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225.patch, YARN-914.patch New CLI (or existing CLI with parameters) should put each node on the decommission list into decommissioning status and track the timeout to terminate the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361072#comment-14361072 ] Zhijie Shen commented on YARN-2854: --- Sorry, I missed that sentence. The new patch looks good to me. Will commit it and the image. The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, YARN-2854.20150313-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361084#comment-14361084 ] Hitesh Shah commented on YARN-2890: --- Thanks [~mitdesai]. In the future, it would be good if your patches were versioned to avoid confusion. More questions on the patch:
- testTimelineServiceStartInMiniCluster() - is there a reason why a job is run when timeline is enabled but not when it is disabled?
- should a job run be needed here in the first place, given the name of the test?
- it might be better to move the testing of job runs based on the absence/presence of timeline to a separate test
- testMRTimelineEventHandling, testMapreduceJobTimelineServiceEnabled, testMapreduceJobTimelineServiceEnabled - is there a need to change all of them?
- there does not seem to be a code path that tests timeline being enabled by passing the enableAHS value in the ctor if all these are changed.
MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361101#comment-14361101 ] Hudson commented on YARN-2854: -- FAILURE: Integrated in Hadoop-trunk-Commit #7321 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7321/]) YARN-2854. Updated the documentation of the timeline service and the generic history service. Contributed by Naganarasimha G R. (zjshen: rev 6fdef76cc3e818856ddcc4d385c2899a8e6ba916) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/resources/images/timeline_structure.jpg * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md * hadoop-yarn-project/CHANGES.txt The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, YARN-2854.20150313-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361103#comment-14361103 ] Zhijie Shen commented on YARN-2854: --- [~Naganarasimha], would you please check branch-2? The patch cannot be merged to branch-2 cleanly. See if we need to create a patch for branch-2 only. Thanks! The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, YARN-2854.20150313-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361267#comment-14361267 ] Sangjin Lee commented on YARN-3039: --- [~djp], a couple of quick comments (I'll follow up after reviewing the latest patch more carefully). {quote} We could have an annotation at the class level which is the default publicity and stability for each method. However, each method could have its own annotation to override the class one. In most cases, the class level annotation is more public and stable than individual methods, as it is the first-class contract with end users or other components (or they will have concerns about using it). Take an example: if we need to add a new API which is not stable yet to a protocol class marked with stable, we shouldn't regress the whole class from stable to evolving or unstable, but we can mark the new method as unstable or evolving. Make sense? {quote} Yes, I get the reasoning for annotating individual methods. My concern is more about the *new classes*. Note that we're still evolving even the class names. This might be a fine point, but I feel we should annotate the *new classes* at least as unstable for now in addition to the method annotations. Thoughts? {quote} bq. RMAppImpl.java, Would this be backward compatible from the RM state store perspective? I don't think so. ApplicationDataProto is also a protobuf object, and the new field for aggregatorAddress is optional. {quote} So you mean it will be backward compatible, right? [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
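For readers following the annotation discussion, Hadoop's standard classification annotations support exactly this class-default-plus-method-override convention; a small sketch (the class and method names are made up):
{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Class-level annotations set the default; a method-level annotation
// overrides it for that method only.
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class ExampleProtocol {       // hypothetical class

  public abstract void establishedCall();     // inherits Public/Stable

  @InterfaceStability.Unstable                // new API, not yet stable
  public abstract void newlyAddedCall();
}
{code}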
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361280#comment-14361280 ] Sangjin Lee commented on YARN-3039: --- [~zjshen], [~djp], regarding the idea of having IPC from the NM to the per-app collector, I don't think that will work with the special container use case. The special container for the per-app collector will bind to a port for RPC that will not be determined until the collector binds to it. So it's basically a chicken-and-egg problem: the NM doesn't know the RPC port for the per-app collector in the special container until... the special container tells it. This is not a problem with the current per-node collector container situation. Although it's a little roundabout, I don't see a fundamental problem with having the per-app collector (or the collection of them) sending its location to the NM once it's up. It's actually conceptually simpler, and it should work in all 3 modes (aux service, standalone per-node daemon, and special container). [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
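A tiny sketch of the ordering constraint described above: the collector must bind first (possibly to an ephemeral port), and only then is its address known and reportable. All names here are hypothetical stand-ins, not real YARN APIs:
{code}
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical NM-side callback; not a real YARN interface.
interface NodeManagerClient {
  void reportCollectorAddress(String appId, String address);
}

class CollectorStartup {
  // Bind first (port 0 = OS-assigned ephemeral port); the address exists
  // only after the bind, so reporting to the NM must come second.
  void bindThenReport(NodeManagerClient nm, String appId) throws Exception {
    ServerSocket rpcSocket = new ServerSocket(0);
    String address = InetAddress.getLocalHost().getHostName()
        + ":" + rpcSocket.getLocalPort();
    nm.reportCollectorAddress(appId, address);
  }
}
{code}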
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361300#comment-14361300 ] Jian He commented on YARN-3305: --- looks good overall; why is the unmanagedAM check needed?
{code}
if (!submissionContext.getUnmanagedAM())
{code}
AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But AM-used resource is updated with the actual ResourceRequest made by the user. This results in AM container allocation of more than the Max ApplicationMaster Resource. This is because AM-Used is updated with the actual ResourceRequest made by the user while activating the applications, but during allocation of the container the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361565#comment-14361565 ] Rohith commented on YARN-3305: -- Unmanaged applications need not necessarily send an AM RR (ResourceRequest), because the RM won't allocate a container for the AM and start it. Instead, the RM expects the AM to be launched and to connect to the RM within the AM liveliness period. So for unmanaged applications the RR can be null, which would cause an NPE while normalizing the RR. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But AM-used resource is updated with the actual ResourceRequest made by the user. This results in AM container allocation of more than the Max ApplicationMaster Resource. This is because AM-Used is updated with the actual ResourceRequest made by the user while activating the applications, but during allocation of the container the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
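The normalization at issue is just rounding a request up to the next multiple of the scheduler's minimum allocation; a self-contained sketch of that arithmetic (method name assumed):
{code}
// Round the requested memory up to a multiple of the minimum allocation, so
// AM-used accounting matches what the scheduler will actually hand out.
static int normalizeMemory(int requestedMb, int minAllocMb) {
  if (requestedMb <= 0) {
    return minAllocMb; // never account below the floor
  }
  return ((requestedMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
}
// e.g. normalizeMemory(300, 1024) == 1024: a sub-minimum AM request must be
// accounted at the normalized size, which is the mismatch this JIRA fixes.
{code}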
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361586#comment-14361586 ] Jian He commented on YARN-3273: --- thanks Rohith ! overall looks great !
- synchronization of userResourceLimit - I think we can use the volatile keyword for userResourceLimit and do not need the synchronized keyword
{code}
public Resource getUserResourceLimit() {
  return userResourceLimit;
}

public synchronized void setUserResourceLimit(Resource userResourceLimit) {
  this.userResourceLimit = userResourceLimit;
}
{code}
- SchedulerCommonInfo -> SchedulerInfo - I think it’s fine to store the below info with the Resource type, i.e. “private Resource minAllocResource”; similarly on the UI, we can expose it as minimum allocation resource
{code}
protected long minAllocMemory;
protected long maxAllocMemory;
protected long minAllocVirtualCores;
protected long maxAllocVirtualCores;
{code}
- just for better code readability, we may use a variable to store entry.getValue()
{code}
usersToReturn.add(new UserInfo(entry.getKey(),
    Resources.clone(entry.getValue().getUsed()),
    entry.getValue().getActiveApplications(),
    entry.getValue().getPendingApplications(),
    Resources.clone(entry.getValue().getConsumedAMResources()),
    Resources.clone(entry.getValue().getUserResourceLimit())));
{code}
- the headroom rendering should be inside the webUiType.equals(YarnWebParams.RM_WEB_UI) check.
{code}
// TODO Need to get HeadRoom from scheduler and render it web ui
{code}
bq. I think headroom can be in RMAppAttemptMetric and render only if attempt is running.
sounds good to me. Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
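A sketch of the volatile alternative suggested above (class context omitted): because the setter only publishes a reference, a volatile field is enough for readers to see the latest value without locking:
{code}
private volatile Resource userResourceLimit;

public Resource getUserResourceLimit() {
  return userResourceLimit;
}

public void setUserResourceLimit(Resource userResourceLimit) {
  this.userResourceLimit = userResourceLimit; // volatile write publishes safely
}
{code}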
[jira] [Commented] (YARN-3284) Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command
[ https://issues.apache.org/jira/browse/YARN-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361572#comment-14361572 ] Hadoop QA commented on YARN-3284: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704544/YARN-3284.2.patch against trunk revision 6fdef76. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 13 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1153 javac compiler warnings (more than the trunk's current 1152 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6960//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6960//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6960//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6960//console This message is automatically generated. Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command - Key: YARN-3284 URL: https://issues.apache.org/jira/browse/YARN-3284 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3284.1.patch, YARN-3284.1.patch, YARN-3284.2.patch Currently, we have some extra metrics about the application and current attempt in RM Web UI. We should expose that information through YARN Command, too. 1. Preemption metrics 2. application outstanding resource requests 3. container locality info -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361590#comment-14361590 ] Jian He commented on YARN-3305: --- ah, right. forgot about that. will commit this. thanks ! AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But AM-used resource is updated with the actual ResourceRequest made by the user. This results in AM container allocation of more than the Max ApplicationMaster Resource. This is because AM-Used is updated with the actual ResourceRequest made by the user while activating the applications, but during allocation of the container the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a Fair SchedulerOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.17.patch With support for configuration via the scheduler's config file Implement a Fair SchedulerOrderingPolicy Key: YARN-3319 URL: https://issues.apache.org/jira/browse/YARN-3319 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3319.13.patch, YARN-3319.14.patch, YARN-3319.17.patch Implement a Fair SchedulerOrderingPolicy which prefers to allocate to SchedulerProcesses with least current usage, very similar to the FairScheduler's FairSharePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
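As a rough illustration of the least-current-usage preference this policy implements (SchedulerProcess here is a stand-in interface, not the real abstraction from this patch series):
{code}
import java.util.Comparator;

interface SchedulerProcess {
  long getCurrentUsage(); // e.g. memory currently allocated, in MB
}

class FairOrdering {
  // Allocate to the process with the least current usage first, similar in
  // spirit to the FairScheduler's FairSharePolicy.
  static final Comparator<SchedulerProcess> LEAST_USAGE_FIRST =
      Comparator.comparingLong(SchedulerProcess::getCurrentUsage);
}
{code}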
[jira] [Commented] (YARN-3284) Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command
[ https://issues.apache.org/jira/browse/YARN-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361607#comment-14361607 ] Rohith commented on YARN-3284: -- Hi [~xgong], thanks for working on this Jira. For displaying the application headroom for running applications (one of the points in YARN-3273), it is required to expose an applicationHeadroom field in ApplicationAttemptMetrics.java. Would you mind adding this field in your patch to help retrieve the headroom? Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command - Key: YARN-3284 URL: https://issues.apache.org/jira/browse/YARN-3284 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3284.1.patch, YARN-3284.1.patch, YARN-3284.2.patch Currently, we have some extra metrics about the application and current attempt in RM Web UI. We should expose that information through YARN Command, too. 1. Preemption metrics 2. application outstanding resource requests 3. container locality info -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361338#comment-14361338 ] Hadoop QA commented on YARN-3212: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704518/YARN-3212-v1.patch against trunk revision 6fdef76. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestNodesPage org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6958//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6958//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6958//console This message is automatically generated. RMNode State Transition Update with DECOMMISSIONING state - Key: YARN-3212 URL: https://issues.apache.org/jira/browse/YARN-3212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Junping Du Assignee: Junping Du Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch As proposed in YARN-914, a new state of “DECOMMISSIONING” will be added and can be transitioned from the “running” state, triggered by a new event - “decommissioning”. This new state can transition to the “decommissioned” state on Resource_Update if there are no running apps on this NM, when the NM reconnects after restart, or when it receives a DECOMMISSIONED event (after timeout from the CLI). In addition, it can go back to “running” if the user decides to cancel the previous decommission by calling recommission on the same node. The reaction to other events is similar to the RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3284) Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command
[ https://issues.apache.org/jira/browse/YARN-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3284: Attachment: YARN-3284.2.patch Expose more ApplicationMetrics and ApplicationAttemptMetrics through YARN command - Key: YARN-3284 URL: https://issues.apache.org/jira/browse/YARN-3284 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3284.1.patch, YARN-3284.1.patch, YARN-3284.2.patch Currently, we have some extra metrics about the application and the current attempt in the RM Web UI. We should expose that information through the YARN command, too. 1. Preemption metrics 2. application outstanding resource requests 3. container locality info -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3212: - Attachment: YARN-3212-v1.patch Uploaded the first patch for the core state changes for decommissioning. For RMNodeEventType, I would prefer DECOMMISSION_WITH_DELAY over DECOMMISSION_WITH_TIMEOUT, as in my comments on YARN-3225. I may update this later if those comments are adopted. RMNode State Transition Update with DECOMMISSIONING state - Key: YARN-3212 URL: https://issues.apache.org/jira/browse/YARN-3212 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Junping Du Assignee: Junping Du Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch As proposed in YARN-914, a new state, “DECOMMISSIONING”, will be added; it is entered from the “running” state, triggered by a new event - “decommissioning”. This new state can transition to the “decommissioned” state on Resource_Update if there are no running apps on this NM, or when the NM reconnects after restart, or when it receives a DECOMMISSIONED event (after the timeout from the CLI). In addition, it can go back to “running” if the user decides to cancel the previous decommission by calling recommission on the same node. The reaction to other events is similar to the RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
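Since the description above enumerates the transitions in prose, here is a tiny, self-contained model of them; the event names paraphrase the JIRA text and are not the actual RMNodeEventType constants, and the real implementation lives in RMNodeImpl's state machine:
{code}
public class DecommissioningModel {
  enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }
  enum NodeEvent { GRACEFUL_DECOMMISSION, RESOURCE_UPDATE_NO_APPS,
                   RECONNECTED, DECOMMISSIONED_TIMEOUT, RECOMMISSION }

  static NodeState transition(NodeState state, NodeEvent event) {
    switch (state) {
      case RUNNING:
        return event == NodeEvent.GRACEFUL_DECOMMISSION
            ? NodeState.DECOMMISSIONING : state;
      case DECOMMISSIONING:
        switch (event) {
          case RESOURCE_UPDATE_NO_APPS:  // no running apps left on this NM
          case RECONNECTED:              // NM reconnects after restart
          case DECOMMISSIONED_TIMEOUT:   // DECOMMISSIONED event after CLI timeout
            return NodeState.DECOMMISSIONED;
          case RECOMMISSION:             // user cancels the decommission
            return NodeState.RUNNING;
          default:                       // other events behave as in RUNNING
            return state;
        }
      default:
        return state;
    }
  }

  public static void main(String[] args) {
    NodeState s = transition(NodeState.RUNNING, NodeEvent.GRACEFUL_DECOMMISSION);
    System.out.println(s);                                     // DECOMMISSIONING
    System.out.println(transition(s, NodeEvent.RECOMMISSION)); // RUNNING
  }
}
{code}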
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361409#comment-14361409 ] Vinod Kumar Vavilapalli commented on YARN-1963: --- Assuming integers are supported: - Do we have a range? Otherwise, nothing stops users from setting their priority to INTEGER_MAX and everybody scratching their heads. - If we have a range, which side is up? Is -20 > 20 like unix (which isn't intuitive at all to me), or -20 < 20 (intuitive)? - Either way, it is an implicit decision that needs to be documented and told to users explicitly. Labels convey that without any of that. - What does a negative priority even mean, anyway? - An admin comes and says "I need a new super-high priority"; now your ranges need to be dynamically size-able. I don't see a difference between, say, 10 priorities and 10 labeled priorities, other than that labels are better in the following ways: - They are more *human readable* on the UI and CLIs: "This app has priority 19" doesn't give as much feedback as "This app has HIGH priority". - Even if we don't want them now, you can let admins create new priorities between two existing ones, create a new priority lower than the lowest easily, etc. With integers, you start with 0-10, then adding one more lower than them all takes them into negative priorities' territory, making it all confusing. - Specifying restrictions is very straightforward: for a root.engineering queue, VERY_HIGH can only be used by (u1, u2, g1), HIGH by (u3, u4), and everything else by everyone. The way I see it, we will provide a predefined set of labeled priorities that should work for 80% of the clusters; the remaining can define their own set. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: 0001-YARN-1963-prototype.patch, YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
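To make the comparison concrete, a sketch of the labeled-priority idea under discussion; the label names and ordering here are illustrative assumptions, not a YARN API:
{code}
import java.util.Arrays;
import java.util.List;

public class LabeledPrioritySketch {
  // Order encodes precedence. An admin could insert a new label between two
  // existing ones (or below the lowest) without pushing anyone into
  // negative-number territory, which is the flexibility argued for above.
  static final List<String> ORDERED_LABELS =
      Arrays.asList("LOW", "NORMAL", "HIGH", "VERY_HIGH");

  static int compare(String a, String b) {
    return Integer.compare(ORDERED_LABELS.indexOf(a), ORDERED_LABELS.indexOf(b));
  }

  public static void main(String[] args) {
    System.out.println(compare("HIGH", "LOW") > 0); // true: HIGH outranks LOW
  }
}
{code}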
[jira] [Updated] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3318: -- Attachment: YARN-3318.17.patch With support for configuration via the scheduler's config file Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior --- Key: YARN-3318 URL: https://issues.apache.org/jira/browse/YARN-3318 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3318.13.patch, YARN-3318.14.patch, YARN-3318.17.patch Create the initial framework required for using OrderingPolicies with SchedulerApplicationAttempts and integrate with the CapacityScheduler. This will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.
[ https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361274#comment-14361274 ] Jian He commented on YARN-3243: --- One thing: the approach of subtracting all reserved resources in order to pass the various limit checks and dive down into the sub-queues may cause a lot of dry loops; that can be fixed separately. Patch looks good to me. +1 CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits. - Key: YARN-3243 URL: https://issues.apache.org/jira/browse/YARN-3243 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch, YARN-3243.4.patch, YARN-3243.5.patch Now CapacityScheduler has some issues in making sure ParentQueue always obeys its capacity limits, for example: 1) When allocating a container of a parent queue, it will only check parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max resource limit, as in the following example: {code} A (usage=54, max=55) / \ A1 A2 (usage=1, max=55) (usage=53, max=53) {code} Queue-A2 is able to allocate a container since its usage < max, but if we do that, A's usage can exceed A.max. 2) When doing the continuous reservation check, the parent queue will only tell children "you need to unreserve *some* resource, so that I will be less than my maximum resource", but it will not tell them how much resource needs to be unreserved. This may lead to the parent queue exceeding its configured maximum capacity as well. With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each class; *here is my proposal*: - ParentQueue will set its children's ResourceUsage.headroom, which means the *maximum resource its children can allocate*. - ParentQueue will set its children's headroom to (saying the parent's name is qA): min(qA.headroom, qA.max - qA.used). This will make sure qA's ancestors' capacity is enforced as well (qA.headroom is set by qA's parent). - {{needToUnReserve}} is not necessary; instead, children can get how much resource needs to be unreserved to keep their parent's resource limit. - Moreover, with this, YARN-3026 will make a clear boundary between LeafQueue and FiCaSchedulerApp; headroom will consider user-limit, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
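The headroom rule from the proposal is easy to state in code. A sketch with plain longs standing in for YARN's Resource arithmetic, using the numbers from the example above:
{code}
public class HeadroomSketch {
  // child.headroom = min(parent.headroom, parent.max - parent.used),
  // applied top-down so every ancestor's limit is enforced.
  static long childHeadroom(long parentHeadroom, long parentMax, long parentUsed) {
    return Math.min(parentHeadroom, parentMax - parentUsed);
  }

  public static void main(String[] args) {
    long rootHeadroom = Long.MAX_VALUE;  // assume no tighter ancestor above A
    // A has usage=54, max=55, so a child of A may only allocate 1 more,
    // no matter how loose the child's own max is.
    System.out.println(childHeadroom(rootHeadroom, 55, 54)); // 1
  }
}
{code}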
[jira] [Commented] (YARN-3345) Add non-exclusive node label RMAdmin CLI/API
[ https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361397#comment-14361397 ] Hadoop QA commented on YARN-3345: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704531/YARN-3345.1.patch against trunk revision 6fdef76. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1153 javac compiler warnings (more than the trunk's current 1152 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6959//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6959//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6959//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6959//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6959//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6959//console This message is automatically generated. Add non-exclusive node label RMAdmin CLI/API Key: YARN-3345 URL: https://issues.apache.org/jira/browse/YARN-3345 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3345.1.patch As described in YARN-3214 (see the design doc attached to that JIRA), we need to add the non-exclusive node label RMAdmin API and CLI implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3345) Add non-exclusive node label RMAdmin CLI/API
[ https://issues.apache.org/jira/browse/YARN-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3345: - Attachment: YARN-3345.1.patch Attached ver.1 patch. Add non-exclusive node label RMAdmin CLI/API Key: YARN-3345 URL: https://issues.apache.org/jira/browse/YARN-3345 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3345.1.patch As described in YARN-3214 (see the design doc attached to that JIRA), we need to add the non-exclusive node label RMAdmin API and CLI implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14361353#comment-14361353 ] Junping Du commented on YARN-3039: -- bq. Yes, I get the reasoning for annotating individual methods. My concern is more about the new classes. Note that we're still evolving even the class names. This might be a fine point, but I feel we should annotate the new classes at least as unstable for now in addition to the method annotations. Thoughts? Agree. I think in the v5 patch I tried to mark all interfaces (including some abstract classes; we don't need to mark implementations because they follow the same as their parent class/interface) with either Evolving or Unstable. Please let me know if I missed something there. bq. So you mean it will be backward compatible, right? Yes, that is what I mean. bq. NM doesn't know the RPC port for the per-all collector in the special container until ... the special containers tells it. This is not a problem with the current per-node collector container situation. Makes sense. That's also a good reason to keep the NM as the RPC server and the aggregator(collector)Collection as the client. [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
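The class-level annotations being discussed are the standard Hadoop ones; for illustration (the class name here is made up):
{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Marking a still-evolving class as Private/Unstable signals that both its
// name and its surface may change; individual methods can carry their own
// annotations as well.
@InterfaceAudience.Private
@InterfaceStability.Unstable
public abstract class TimelineAggregatorService {
}
{code}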
[jira] [Updated] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3294: Attachment: apache-yarn-3294.1.patch Uploading patch again to kick off jenkins. Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch, apache-yarn-3294.1.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
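The idea in this issue maps onto log4j 1.x (which the RM uses) fairly directly. A hedged sketch only, with the log file name and wiring being illustrative assumptions rather than the attached patch:
{code}
import java.util.Timer;
import java.util.TimerTask;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public class CapacitySchedulerDebugDump {
  public static void dump(long periodMs) throws Exception {
    final Logger cs = Logger.getLogger(
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity");
    final Level previous = cs.getLevel();          // may be null (inherited)
    final FileAppender appender = new FileAppender(
        new PatternLayout("%d{ISO8601} %p %c: %m%n"),
        "yarn-cs-debug.log");                      // separate dump file
    cs.addAppender(appender);
    cs.setLevel(Level.DEBUG);                      // scheduler only, not the whole RM
    new Timer(true).schedule(new TimerTask() {
      @Override public void run() {                // restore after the window
        cs.setLevel(previous);
        cs.removeAppender(appender);
        appender.close();
      }
    }, periodMs);
  }
}
{code}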
[jira] [Updated] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3336: Attachment: YARN-3336.003.patch FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch, YARN-3336.002.patch, YARN-3336.003.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens =
      proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi:
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority =
      uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals will compare the subject by reference:
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
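One way the leak could be avoided (a sketch only, not necessarily what the attached patch does): release the FileSystem instances cached for the throwaway proxy UGI once the tokens are fetched, using the existing FileSystem.closeAllForUGI API:
{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

public class TokenFetchSketch {
  static Token<?>[] obtainSystemTokensForUser(String user,
      final Configuration conf, final Credentials credentials) throws Exception {
    final UserGroupInformation proxyUser = UserGroupInformation
        .createProxyUser(user, UserGroupInformation.getLoginUser());
    try {
      return proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(conf).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
    } finally {
      // Evict the CACHE entries keyed on the one-off proxy UGI so they can
      // be garbage collected instead of accumulating once per call.
      FileSystem.closeAllForUGI(proxyUser);
    }
  }
}
{code}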
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360032#comment-14360032 ] Naganarasimha G R commented on YARN-3326: - Hi [~vvasudev], consider the scenario where the user wants all labels and hence does not pass any labels. In that case the URL will be just /nodes, which I feel is not so good. Your thoughts? ReST support for getLabelsToNodes -- Key: YARN-3326 URL: https://issues.apache.org/jira/browse/YARN-3326 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3326.20150310-1.patch REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360007#comment-14360007 ] Varun Vasudev commented on YARN-3326: - How about /nodes?labels=label1,label2 etc? If I understand it right - you want to give a list of labels and get the nodes back for those labels, so /nodes?labels= form should work? ReST support for getLabelsToNodes -- Key: YARN-3326 URL: https://issues.apache.org/jira/browse/YARN-3326 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3326.20150310-1.patch REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360025#comment-14360025 ] zhihai xu commented on YARN-3336: - I uploaded a new patch YARN-3336.003.patch to fix the test failure due to the change in FileSystem. FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch, YARN-3336.002.patch, YARN-3336.003.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens =
      proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi:
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority =
      uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals will compare the subject by reference:
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3263) ContainerManagerImpl#parseCredentials don't rewind the ByteBuffer after credentials.readTokenStorageStream
[ https://issues.apache.org/jira/browse/YARN-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu resolved YARN-3263. - Resolution: Not a Problem This is not an issue. tokens.rewind() is called before credentials.readTokenStorageStream(buf), which has the same effect as rewinding after readTokenStorageStream. Also, no other place accesses the tokens except parseCredentials. ContainerManagerImpl#parseCredentials don't rewind the ByteBuffer after credentials.readTokenStorageStream -- Key: YARN-3263 URL: https://issues.apache.org/jira/browse/YARN-3263 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: zhihai xu Assignee: zhihai xu ContainerManagerImpl#parseCredentials doesn't rewind the ByteBuffer after credentials.readTokenStorageStream, so the next time we access the tokens, we will get an EOFException. The following is the code for parseCredentials in ContainerManagerImpl:
{code}
private Credentials parseCredentials(ContainerLaunchContext launchContext)
    throws IOException {
  Credentials credentials = new Credentials();
  // Parse credentials
  ByteBuffer tokens = launchContext.getTokens();
  if (tokens != null) {
    DataInputByteBuffer buf = new DataInputByteBuffer();
    tokens.rewind();
    buf.reset(tokens);
    credentials.readTokenStorageStream(buf);
    if (LOG.isDebugEnabled()) {
      for (Token<? extends TokenIdentifier> tk : credentials.getAllTokens()) {
        LOG.debug(tk.getService() + " = " + tk.toString());
      }
    }
  }
  // End of parsing credentials
  return credentials;
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
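A self-contained demonstration of why the resolution holds: rewinding before each read resets the position to 0, so the read never starts at end-of-buffer regardless of what a previous consumer did:
{code}
import java.nio.ByteBuffer;

public class RewindDemo {
  public static void main(String[] args) {
    ByteBuffer tokens = ByteBuffer.wrap(new byte[] {1, 2, 3});
    tokens.get();
    tokens.get();                            // simulate a previous consumer
    tokens.rewind();                         // rewind-before-read, as in parseCredentials
    System.out.println(tokens.remaining());  // 3: the full payload is visible again
  }
}
{code}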
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360054#comment-14360054 ] Varun Vasudev commented on YARN-3326: - Will /nodes?labels=* work? ReST support for getLabelsToNodes -- Key: YARN-3326 URL: https://issues.apache.org/jira/browse/YARN-3326 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3326.20150310-1.patch REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3261) rewrite resourcemanager restart doc to remove roadmap bits
[ https://issues.apache.org/jira/browse/YARN-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360129#comment-14360129 ] Gururaj Shetty commented on YARN-3261: -- Hi [~aw]/[~rohithsharma], Kindly review the patch attached. rewrite resourcemanager restart doc to remove roadmap bits --- Key: YARN-3261 URL: https://issues.apache.org/jira/browse/YARN-3261 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3261.01.patch Another mixture of roadmap and instruction manual that seems to be ever present in a lot of the recently written documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3273: - Attachment: 0001-YARN-3273-v2.patch Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360137#comment-14360137 ] Hadoop QA commented on YARN-3294: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704374/apache-yarn-3294.1.patch against trunk revision 387f271. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1177 javac compiler warnings (more than the trunk's current 1152 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.ha.TestActiveStandbyElectorRealZK org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.metrics2.lib.TestMutableMetrics org.apache.hadoop.yarn.server.resourcemanager.TestRMRestTestTests org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6952//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6952//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6952//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6952//console This message is automatically generated. Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch, apache-yarn-3294.1.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler for a fixed period of time(1 min, 5 min or so) in a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360148#comment-14360148 ] Rohith commented on YARN-3273: -- Attached the v2 patch for surfacing scheduler metrics, along with screenshots of the changed UI pages. YARN-3273-am-resource-used-AND-User-limit-v2.PNG shows the following metrics: # A SchedulerMetrics table is added on the front page. This table contains generic scheduler data like schedulerType, schedulerResourceType, and min/max resource allocation. This table can be used in the future to display other common scheduler metrics. # *Used Application Master Resources:* added to each leaf queue's info. # An active-users info table is added per CS#LeafQueue. This displays each user's ResourceLimit, ResourceUsed, AM Resource, AM ResourceUsed and others. Since it is specific to CS, this is added on this page. YARN-3273-application-headroom-v2.PNG: # For headroom, only the display block is added, with empty data. Since headroom is not part of RMAppAttemptMetrics, retrieving this info directly from the scheduler on the fly is tedious; the headroom needs to be stored in either RMApp or RMAttempt state. I think the headroom can be kept in RMAppAttemptMetrics and rendered only if the attempt is running. Any thoughts? Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360155#comment-14360155 ] Varun Vasudev commented on YARN-3326: - In general, REST APIs are supposed to be about resources, and labelsToNodes is not a resource. {quote} getNodeToLabels to /get-node-to-labels replaceLabelsOnNodes to /replace-node-to-labels getClusterNodeLabels to /get-node-labels addToClusterNodeLabels to /add-node-labels removeFromCluserNodeLabels to /remove-node-labels getLabelsOnNode to /nodes/\{nodeId}/get-labels replaceLabelsOnNodes to /replace-node-to-labels ... {quote} are not about resources either, but they're already in, and by adding more APIs of that form we're making things worse. ReST support for getLabelsToNodes -- Key: YARN-3326 URL: https://issues.apache.org/jira/browse/YARN-3326 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3326.20150310-1.patch REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360146#comment-14360146 ] Naganarasimha G R commented on YARN-3326: - Well [~vvasudev], I feel /nodes would be better than having internal logic for \*, but I checked again and saw that /nodes is already used for getNodes. What about /labelsToNodes or /labels-to-nodes, or do you feel that is too long? ReST support for getLabelsToNodes -- Key: YARN-3326 URL: https://issues.apache.org/jira/browse/YARN-3326 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3326.20150310-1.patch REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360171#comment-14360171 ] Naganarasimha G R commented on YARN-3326: - [~vvasudev] How about /label-mappings?label=label1,label2,... ? ReST support for getLabelsToNodes -- Key: YARN-3326 URL: https://issues.apache.org/jira/browse/YARN-3326 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-3326.20150310-1.patch REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
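None of the URL shapes in this thread is committed yet; for concreteness, this is how a client would hit the candidate forms (host, port, and paths are all hypothetical pending the outcome of the discussion):
{code}
import java.net.HttpURLConnection;
import java.net.URL;

public class LabelsToNodesQuery {
  public static void main(String[] args) throws Exception {
    // Candidate forms floated above:
    //   /ws/v1/cluster/nodes?labels=label1,label2
    //   /ws/v1/cluster/label-mappings?labels=label1,label2
    URL url = new URL(
        "http://rm-host:8088/ws/v1/cluster/label-mappings?labels=label1,label2");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    System.out.println(conn.getResponseCode());
  }
}
{code}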
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360140#comment-14360140 ] Hadoop QA commented on YARN-3273: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704390/YARN-3273-application-headroom-v2.PNG against trunk revision 387f271. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6953//console This message is automatically generated. Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3261) rewrite resourcemanager restart doc to remove roadmap bits
[ https://issues.apache.org/jira/browse/YARN-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gururaj Shetty updated YARN-3261: - Attachment: YARN-3261.01.patch rewrite resourcemanager restart doc to remove roadmap bits --- Key: YARN-3261 URL: https://issues.apache.org/jira/browse/YARN-3261 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3261.01.patch Another mixture of roadmap and instruction manual that seems to be ever present in a lot of the recently written documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3273: - Attachment: YARN-3273-application-headroom-v2.PNG YARN-3273-am-resource-used-AND-User-limit-v2.PNG Improve web UI to facilitate scheduling analysis and debugging -- Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, YARN-3273-am-resource-used-AND-User-limit.PNG, YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3336) FileSystem memory leak in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360141#comment-14360141 ] Hadoop QA commented on YARN-3336: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704376/YARN-3336.003.patch against trunk revision 387f271. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6951//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6951//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6951//console This message is automatically generated. FileSystem memory leak in DelegationTokenRenewer Key: YARN-3336 URL: https://issues.apache.org/jira/browse/YARN-3336 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3336.000.patch, YARN-3336.001.patch, YARN-3336.002.patch, YARN-3336.003.patch FileSystem memory leak in DelegationTokenRenewer. Every time DelegationTokenRenewer#obtainSystemTokensForUser is called, a new FileSystem entry will be added to FileSystem#CACHE which will never be garbage collected. This is the implementation of obtainSystemTokensForUser:
{code}
protected Token<?>[] obtainSystemTokensForUser(String user,
    final Credentials credentials) throws IOException, InterruptedException {
  // Get new hdfs tokens on behalf of this user
  UserGroupInformation proxyUser =
      UserGroupInformation.createProxyUser(user,
          UserGroupInformation.getLoginUser());
  Token<?>[] newTokens =
      proxyUser.doAs(new PrivilegedExceptionAction<Token<?>[]>() {
        @Override
        public Token<?>[] run() throws Exception {
          return FileSystem.get(getConfig()).addDelegationTokens(
              UserGroupInformation.getLoginUser().getUserName(), credentials);
        }
      });
  return newTokens;
}
{code}
The memory leak happens when FileSystem.get(getConfig()) is called with a new proxy user, because createProxyUser will always create a new Subject. The calling sequence is FileSystem.get(getConfig()) => FileSystem.get(getDefaultUri(conf), conf) => FileSystem.CACHE.get(uri, conf) => FileSystem.CACHE.getInternal(uri, conf, key) => FileSystem.CACHE.map.get(key) => createFileSystem(uri, conf)
{code}
public static UserGroupInformation createProxyUser(String user,
    UserGroupInformation realUser) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  if (realUser == null) {
    throw new IllegalArgumentException("Null real user");
  }
  Subject subject = new Subject();
  Set<Principal> principals = subject.getPrincipals();
  principals.add(new User(user));
  principals.add(new RealUser(realUser));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(AuthenticationMethod.PROXY);
  return result;
}
{code}
FileSystem#Cache#Key.equals will compare the ugi:
{code}
Key(URI uri, Configuration conf, long unique) throws IOException {
  scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
  authority =
      uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
  this.unique = unique;
  this.ugi = UserGroupInformation.getCurrentUser();
}

public boolean equals(Object obj) {
  if (obj == this) {
    return true;
  }
  if (obj != null && obj instanceof Key) {
    Key that = (Key) obj;
    return isEqual(this.scheme, that.scheme)
        && isEqual(this.authority, that.authority)
        && isEqual(this.ugi, that.ugi)
        && (this.unique == that.unique);
  }
  return false;
}
{code}
UserGroupInformation.equals will compare the subject by reference:
{code}
public boolean equals(Object o) {
  if (o == this) {
    return true;
  } else if (o == null || getClass() != o.getClass()) {
    return false;
  } else {
    return subject == ((UserGroupInformation) o).subject;
  }
}
{code}
So in this case, every time createProxyUser and FileSystem.get(getConfig()) are called, a new FileSystem will be created and a new entry will be added to FileSystem.CACHE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3261) rewrite resourcemanager restart doc to remove roadmap bits
[ https://issues.apache.org/jira/browse/YARN-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360197#comment-14360197 ] Hadoop QA commented on YARN-3261: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704385/YARN-3261.01.patch against trunk revision 387f271. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6954//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6954//console This message is automatically generated. rewrite resourcemanager restart doc to remove roadmap bits --- Key: YARN-3261 URL: https://issues.apache.org/jira/browse/YARN-3261 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3261.01.patch Another mixture of roadmap and instruction manual that seems to be ever present in a lot of the recently written documentation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360227#comment-14360227 ] Rohith commented on YARN-3305: -- Updated the patch, correcting the test failures. The JIRA for the TestCapacitySchedulerNodeLabelUpdate failure is YARN-3343. Kindly review the updated patch. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation. But the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocation exceeding the Max ApplicationMaster Resource. This is because AM-Used is updated with the actual ResourceRequest made by the user while activating the applications, but during allocation of the container the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
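A sketch of the accounting mismatch in the description, with plain integers standing in for Resource and a typical round-up-to-minimum normalization (an assumption; the exact normalization in CS also considers allocation increments):
{code}
public class AmNormalizationSketch {
  // Round the request up to a multiple of the minimum allocation,
  // which is what the scheduler actually hands out.
  static int normalize(int requestedMb, int minAllocMb) {
    return Math.max(minAllocMb,
        ((requestedMb + minAllocMb - 1) / minAllocMb) * minAllocMb);
  }

  public static void main(String[] args) {
    int requested = 100, minAlloc = 1024;
    System.out.println(normalize(requested, minAlloc)); // 1024 really allocated
    // Bug pattern: amUsed += 100 (the raw request) while the cluster hands
    // out 1024, so AM usage is under-counted and more AMs can activate
    // than the Max ApplicationMaster Resource limit should allow.
  }
}
{code}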
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running" applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360250#comment-14360250 ] Hudson commented on YARN-3154: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #865 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/865/]) YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong. (vinodkv: rev 863079bb874ba77918ca1c0741eae10e245995c8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360251#comment-14360251 ] Hudson commented on YARN-3338: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #865 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/865/]) YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie (xgong: rev 06ce1d9a6cd9bec25e2f478b98264caf96a3ea44) * hadoop-project/pom.xml * hadoop-yarn-project/CHANGES.txt Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running" applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360237#comment-14360237 ] Hudson commented on YARN-3154: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #131 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/131/]) YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong. (vinodkv: rev 863079bb874ba77918ca1c0741eae10e245995c8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running" applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360437#comment-14360437 ] Hudson commented on YARN-3154: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #122 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/122/]) YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong. (vinodkv: rev 863079bb874ba77918ca1c0741eae10e245995c8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360415#comment-14360415 ] Mit Desai commented on YARN-2890: - [~hitesh] [~zjshen] Can you guys take a look? MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling the timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
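To illustrate the configuration-driven behavior the issue asks for, a test could opt in via {{YarnConfiguration}} instead of a constructor flag. A minimal sketch, assuming a plain single-argument {{MiniMRYarnCluster}} constructor; the test name is made up:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineMiniClusterSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Request the timeline service via configuration; with this fix the
    // mini cluster would consult this flag when deciding whether to start it.
    conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, true);

    MiniMRYarnCluster cluster = new MiniMRYarnCluster("timeline-test");
    cluster.init(conf);  // MiniMRYarnCluster is a CompositeService
    cluster.start();
    // ... run jobs against the cluster, then cluster.stop() ...
  }
}
{code}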
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360438#comment-14360438 ] Hudson commented on YARN-3338: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #122 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/122/]) YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie (xgong: rev 06ce1d9a6cd9bec25e2f478b98264caf96a3ea44) * hadoop-yarn-project/CHANGES.txt * hadoop-project/pom.xml Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360430#comment-14360430 ] Hudson commented on YARN-3154: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2063 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2063/]) YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong. (vinodkv: rev 863079bb874ba77918ca1c0741eae10e245995c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java
Should not upload partial logs for MR jobs or other short-running applications Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running an MR job and we do not set the log interval properly, its partial logs will be uploaded and then removed from the local filesystem, which is not right. Partial logs should only be uploaded for LRS (long-running service) applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360431#comment-14360431 ] Hudson commented on YARN-3338: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2063 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2063/]) YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie (xgong: rev 06ce1d9a6cd9bec25e2f478b98264caf96a3ea44) * hadoop-yarn-project/CHANGES.txt * hadoop-project/pom.xml Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3305: - Attachment: 0002-YARN-3305.patch AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation, but the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocations exceeding the Max ApplicationMaster Resource limit: AM-used is updated with the user's original ResourceRequest while activating the application, yet during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
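To make the mismatch concrete, here is a minimal sketch of the normalization step the description refers to, using the public {{Resources}}/{{ResourceCalculator}} utilities; the specific sizes are illustrative only:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class NormalizationSketch {
  public static void main(String[] args) {
    ResourceCalculator rc = new DefaultResourceCalculator();
    Resource min = Resource.newInstance(1024, 1);  // minimumAllocation
    Resource max = Resource.newInstance(8192, 8);
    Resource asked = Resource.newInstance(512, 1); // AM asks below minimum

    // What the scheduler actually allocates: the request rounded up to the
    // minimum allocation (and to the increment, here equal to the minimum).
    Resource allocated = Resources.normalize(rc, asked, min, max, min);

    // The mismatch described above: AM-used is charged with `asked` (512 MB)
    // although the container really occupies `allocated` (1024 MB), so the
    // queue can activate more AMs than its AM resource limit should allow.
    System.out.println("asked=" + asked + ", allocated=" + allocated);
  }
}
{code}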
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360238#comment-14360238 ] Hudson commented on YARN-3338: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #131 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/131/]) YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie (xgong: rev 06ce1d9a6cd9bec25e2f478b98264caf96a3ea44) * hadoop-yarn-project/CHANGES.txt * hadoop-project/pom.xml Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360297#comment-14360297 ] Hadoop QA commented on YARN-3305: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704406/0002-YARN-3305.patch against trunk revision 387f271.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRM
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6955//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6955//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6955//console
This message is automatically generated. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation, but the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocations exceeding the Max ApplicationMaster Resource limit: AM-used is updated with the user's original ResourceRequest while activating the application, yet during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360498#comment-14360498 ] Hudson commented on YARN-3154: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #131 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/131/]) YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong. (vinodkv: rev 863079bb874ba77918ca1c0741eae10e245995c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
Should not upload partial logs for MR jobs or other short-running applications Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running an MR job and we do not set the log interval properly, its partial logs will be uploaded and then removed from the local filesystem, which is not right. Partial logs should only be uploaded for LRS (long-running service) applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
Varun Vasudev created YARN-3348: --- Summary: Add a 'yarn top' tool to help understand cluster usage Key: YARN-3348 URL: https://issues.apache.org/jira/browse/YARN-3348 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev It would be helpful to have a 'yarn top' tool that would allow administrators to understand which apps are consuming resources. Ideally the tool would allow you to filter by queue, user, and possibly labels, and show statistics on container allocation across the cluster to find out which apps are consuming the most resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3338) Exclude jline dependency from YARN
[ https://issues.apache.org/jira/browse/YARN-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360364#comment-14360364 ] Hudson commented on YARN-3338: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2081 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2081/]) YARN-3338. Exclude jline dependency from YARN. Contributed by Zhijie (xgong: rev 06ce1d9a6cd9bec25e2f478b98264caf96a3ea44) * hadoop-yarn-project/CHANGES.txt * hadoop-project/pom.xml Exclude jline dependency from YARN -- Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3338.1.patch It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360363#comment-14360363 ] Hudson commented on YARN-3154: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2081 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2081/]) YARN-3154. Added additional APIs in LogAggregationContext to avoid aggregating running logs of application when rolling is enabled. Contributed by Xuan Gong. (vinodkv: rev 863079bb874ba77918ca1c0741eae10e245995c8)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/LogAggregationContext.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LogAggregationContextPBImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
Should not upload partial logs for MR jobs or other short-running applications Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch, YARN-3154.4.patch Currently, if we are running an MR job and we do not set the log interval properly, its partial logs will be uploaded and then removed from the local filesystem, which is not right. Partial logs should only be uploaded for LRS (long-running service) applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360387#comment-14360387 ] Varun Vasudev commented on YARN-3294: - The findbugs errors are unrelated to the patch. The test failures are also unrelated as per my analysis, and I'm unsure about the javac warnings since they seem to come from files I didn't modify. [~jianhe], can you help me out and take a look at the patch? Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period - Key: YARN-3294 URL: https://issues.apache.org/jira/browse/YARN-3294 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, apache-yarn-3294.0.patch, apache-yarn-3294.1.patch It would be nice to have a button on the web UI that would allow dumping of debug logs for just the capacity scheduler, for a fixed period of time (1 min, 5 min, or so), into a separate log file. It would be useful when debugging scheduler behavior without affecting the rest of the ResourceManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360591#comment-14360591 ] Devaraj K commented on YARN-3225: - In my previous comment I was describing a scenario with two RMAdmin CLIs and an increased timeout value: one CLI issues the command with a timeout of, say, x and keeps waiting for that timeout to expire; during this time another CLI issues the command with a higher timeout. If we keep the first CLI (with timeout x) running, it will issue the forceful decommission once x elapses, and the new CLI's higher timeout will not take effect. If we add the constraint that the graceful decommission command may only be issued from one RMAdmin CLI at a time, this will not be a problem. New parameter or CLI for decommissioning node gracefully in RMAdmin CLI --- Key: YARN-3225 URL: https://issues.apache.org/jira/browse/YARN-3225 Project: Hadoop YARN Issue Type: Sub-task Reporter: Junping Du Assignee: Devaraj K Attachments: YARN-3225.patch, YARN-914.patch A new CLI (or an existing CLI with new parameters) should put each node on the decommission list into the decommissioning status and track the timeout, terminating the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3305) AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360586#comment-14360586 ] Rohith commented on YARN-3305: -- The failed test is unrelated to this patch. AM-Used Resource for leafqueue is wrongly populated if AM ResourceRequest is less than minimumAllocation Key: YARN-3305 URL: https://issues.apache.org/jira/browse/YARN-3305 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Attachments: 0001-YARN-3305.patch, 0001-YARN-3305.patch, 0002-YARN-3305.patch For any given ResourceRequest, {{CS#allocate}} normalizes the request to minimumAllocation if the requested memory is less than minimumAllocation, but the AM-used resource is updated with the actual ResourceRequest made by the user. This can result in AM container allocations exceeding the Max ApplicationMaster Resource limit: AM-used is updated with the user's original ResourceRequest while activating the application, yet during container allocation the ResourceRequest is normalized to minimumAllocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360605#comment-14360605 ] Zhijie Shen commented on YARN-2854: --- Hi Naga, thanks for updating the patch. Almost good to me. Some additional comments.
1. The formatting is wrong around the following sentence (on the built web page). It seems to be a problem with the quote marks. Would you please double-check?
{code}
Developers can define what information they want to record for their applications by composing `TimelineEntity and `TimelineEvent` objects, and put the entities and events to the Timeline server via `TimelineClient`. Below is an example:
{code}
2. How about changing "Publishing of per-framework data by applications" to "Publishing of application-specific data"?
3. In "Current Status", shall we also mention that we're rolling out the next-generation timeline service as a scalable solution?
The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)
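For reference, the flow the quoted documentation sentence describes looks roughly like the following. A minimal sketch against the {{TimelineClient}} API; the entity and event type names are hypothetical:
{code}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEvent;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelinePublishSketch {
  public static void main(String[] args) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      // Compose an entity with one event and publish it.
      TimelineEntity entity = new TimelineEntity();
      entity.setEntityType("MY_APP_INFO");    // hypothetical entity type
      entity.setEntityId("entity_1");         // hypothetical entity id
      entity.setStartTime(System.currentTimeMillis());

      TimelineEvent event = new TimelineEvent();
      event.setEventType("MY_EVENT");         // hypothetical event type
      event.setTimestamp(System.currentTimeMillis());
      entity.addEvent(event);

      client.putEntities(entity);             // put to the timeline server
    } finally {
      client.stop();
    }
  }
}
{code}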
[jira] [Commented] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN
[ https://issues.apache.org/jira/browse/YARN-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360652#comment-14360652 ] Craig Welch commented on YARN-3306: --- Thanks for your thoughts, [~kasha]. The immediate proposal is to begin adding new functionality in a fashion that can easily be shared across scheduler implementations and mixed together in a single cluster. The first case is to support container-assignment and preemption orderings in addition to FIFO for applications in the capacity scheduler, and potentially the fair scheduler, using the same code; over time this is expected to expand to cover queue relationships and potentially other behaviors (limits, etc.). The hope is that this lets us iterate toward a state where the various behaviors of the schedulers can be mixed, matched, and shared across implementations, rather than trying to accomplish all of this in one go, and lets us realize the benefit of mixing and matching some of the features earlier, along the way. I suspect that at some point we'll hit a critical mass where enough of the functionality has been extracted into sharable components, and where we've established an understanding of how these compose well; we can then take that as an inflection point and go down the path you are suggesting: introduce a new scheduler to house the policies, complete the picture, and deprecate the others. That's by no means the only possible conclusion, but it seems a good and/or likely one. [Umbrella] Proposing per-queue Policy driven scheduling in YARN --- Key: YARN-3306 URL: https://issues.apache.org/jira/browse/YARN-3306 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: PerQueuePolicydrivenschedulinginYARN.pdf Scheduling layout in Apache Hadoop YARN today is very coarse-grained. This proposal aims at converting today's rigid scheduling in YARN to a per-queue, policy-driven architecture. We propose the creation of a common policy framework and the implementation of a common set of policies that administrators can pick and choose from per queue:
- Make scheduling policies configurable per queue
- Initially, we limit ourselves to a new type of scheduling policy that determines the ordering of applications within the leaf queue
- In the near future, we will also pursue parent-queue-level policies and potential algorithm reuse through a separate type of policies that control resource limits per queue, user, application, etc.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
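To give a feel for the leaf-queue ordering policy the proposal describes, here is a hypothetical sketch of the kind of pluggable interface such a framework might expose. The names are illustrative only, not YARN's actual classes:
{code}
import java.util.Iterator;

// Hypothetical sketch: the policy only decides the order in which a leaf
// queue's applications are offered containers (or considered for
// preemption); all other scheduling logic stays in the scheduler itself.
interface ApplicationOrderingPolicy<T> {
  void addSchedulableEntity(T app);
  void removeSchedulableEntity(T app);
  // Order in which apps are considered for container assignment,
  // e.g. FIFO by submission time or fair-share based.
  Iterator<T> getAssignmentIterator();
  // Order in which apps are considered when preempting resources,
  // typically the reverse of the assignment order.
  Iterator<T> getPreemptionIterator();
}
{code}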
[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360650#comment-14360650 ] Naganarasimha G R commented on YARN-2854: - Thanks [~zjshen] for the review. For the 3rd point, I had mentioned the same in these words: {{In subsequent releases we will be rolling out next generation timeline service which is scalable and reliable}}. So did you want the wording as you suggested, or had you missed what I had mentioned? I will correct the others in a while ... The document about timeline service and generic service needs to be updated --- Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Naganarasimha G R Priority: Critical Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, YARN-2854.20150311-1.patch, timeline_structure.jpg -- This message was sent by Atlassian JIRA (v6.3.4#6332)