[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308817#comment-14308817 ] Hudson commented on YARN-1537: -- FAILURE: Integrated in Hadoop-trunk-Commit #7038 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7038/]) YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java * hadoop-yarn-project/CHANGES.txt TestLocalResourcesTrackerImpl.testLocalResourceCache often failed - Key: YARN-1537 URL: https://issues.apache.org/jira/browse/YARN-1537 Project: Hadoop YARN Issue Type: Test Components: nodemanager Affects Versions: 2.2.0 Reporter: Hong Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1537.1.patch Here is the error log {code} Results : Failed tests: TestLocalResourcesTrackerImpl.testLocalResourceCache:351 Wanted but not invoked: eventHandler.handle( isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent) ); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351) However, there were other interactions with this mock: - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
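The Mockito failure above is the classic pattern of verifying a mock before the AsyncDispatcher thread has delivered the event. Purely for illustration, a minimal fragment of what a race-free verification inside the test body could look like (the committed YARN-1537 patch may instead drain the dispatcher or take a different approach):
{code}
import static org.mockito.Mockito.isA;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.timeout;
import static org.mockito.Mockito.verify;

import org.apache.hadoop.yarn.event.EventHandler;
import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEvent;
import org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent;

@SuppressWarnings("unchecked")
EventHandler<ContainerEvent> eventHandler = mock(EventHandler.class);
// ... register eventHandler with the dispatcher and drive localization ...

// Wait a bounded amount of time for the LOCALIZED event instead of verifying
// immediately, which races with the AsyncDispatcher thread seen above.
verify(eventHandler, timeout(5000)).handle(
    isA(ContainerResourceLocalizedEvent.class));
{code}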
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309650#comment-14309650 ] Jian He commented on YARN-3021: --- bq. the JobClient will request the token from B cluster, but still specify the renewer as the A cluster RM (via the A cluster local config) If this is the case, the assumption here is problematic, why would I request a token from B but let untrusted 3rd party A renew my token in the first place? YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309752#comment-14309752 ] Hudson commented on YARN-2694: -- FAILURE: Integrated in Hadoop-trunk-Commit #7042 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7042/]) YARN-2694. Ensure only single node label specified in ResourceRequest. Contributed by Wangda Tan (jianhe: rev c1957fef29b07fea70938e971b30532a1e131fd0) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY --- Key: YARN-2694 URL: https://issues.apache.org/jira/browse/YARN-2694 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch, YARN-2694-20150202-1.patch, 
YARN-2694-20150203-1.patch, YARN-2694-20150203-2.patch, YARN-2694-20150204-1.patch, YARN-2694-20150205-1.patch, YARN-2694-20150205-2.patch, YARN-2694-20150205-3.patch Currently, node label expression support in the capacity scheduler is only partially complete. A node label expression specified in a ResourceRequest is only respected when it is specified at the ANY level, and a ResourceRequest/host with multiple node labels makes user-limit and similar computations more tricky. We need to temporarily disable these for now; changes include: - AMRMClient - ApplicationMasterService - RMAdminCLI - CommonNodeLabelsManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
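As a rough sketch of the kind of check described above (single label per request, label expression only honored at ANY): the method name, the exception type, and the assumption that "&&" separates labels in an expression are all illustrative, not the committed SchedulerUtils change.
{code}
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Illustrative validation only; the actual patch lives in SchedulerUtils and
// related classes and may use different names and a YARN-specific exception.
static void validateNodeLabelExpression(ResourceRequest req) {
  String expr = req.getNodeLabelExpression();
  if (expr == null || expr.trim().isEmpty()) {
    return;
  }
  if (!ResourceRequest.ANY.equals(req.getResourceName())) {
    throw new IllegalArgumentException(
        "A node label expression can only be specified when resourceName=ANY");
  }
  // Assumes "&&" separates labels within an expression.
  if (expr.trim().split("&&").length > 1) {
    throw new IllegalArgumentException(
        "Only a single node label may be specified in a ResourceRequest");
  }
}
{code}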
[jira] [Updated] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY
[ https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2694: -- Target Version/s: 2.7.0 (was: 2.6.0) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY --- Key: YARN-2694 URL: https://issues.apache.org/jira/browse/YARN-2694 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch, YARN-2694-20150202-1.patch, YARN-2694-20150203-1.patch, YARN-2694-20150203-2.patch, YARN-2694-20150204-1.patch, YARN-2694-20150205-1.patch, YARN-2694-20150205-2.patch, YARN-2694-20150205-3.patch Currently, node label expression support in the capacity scheduler is only partially complete. A node label expression specified in a ResourceRequest is only respected when it is specified at the ANY level, and a ResourceRequest/host with multiple node labels makes user-limit and similar computations more tricky. We need to temporarily disable these for now; changes include: - AMRMClient - ApplicationMasterService - RMAdminCLI - CommonNodeLabelsManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3126) FairScheduler: queue's usedResource is always more than the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309865#comment-14309865 ] Wei Yan commented on YARN-3126: --- [~Xia Hu], I checked the latest trunk version; the problem is still there. Could you rebase the patch against trunk? Normally we fix the problem in trunk rather than in a previously released version. We may also need to get YARN-2083 committed first. Hey [~kasha], do you have time to look at YARN-2083? FairScheduler: queue's usedResource is always more than the maxResource limit - Key: YARN-3126 URL: https://issues.apache.org/jira/browse/YARN-3126 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.3.0 Environment: hadoop2.3.0. fair scheduler. spark 1.1.0. Reporter: Xia Hu Labels: assignContainer, fairscheduler, resources Attachments: resourcelimit.patch When submitting a Spark application (in both spark-on-yarn-cluster and spark-on-yarn-client mode), the queue's usedResources assigned by the FairScheduler can exceed the queue's maxResources limit. Reading the FairScheduler code, I believe this happens because the requested resources are not checked when assigning a container. Here is the detail: 1. Choose a queue. In this step, assignContainerPreCheck checks whether the queue's usedResource is already bigger than its max. 2. Then choose an app in that queue. 3. Then choose a container. And here is the problem: there is no check whether this container would push the queue's resources over its max limit. If a queue's usedResource is 13G and the maxResource limit is 16G, a container asking for 4G may still be assigned successfully. This problem happens regularly with Spark applications, because we can ask for different container resources in different applications. By the way, I have already applied the patch from YARN-2083. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
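A hedged sketch of the missing check described in step 3 above; this is illustrative only, not the attached resourcelimit.patch, and the method name is an assumption.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Before assigning a container, verify the queue would still fit within its
// max share afterwards. A simple component-wise comparison is used for clarity.
static boolean fitsInMaxShare(Resource used, Resource maxShare, Resource demand) {
  Resource afterAssignment = Resources.add(used, demand);
  return afterAssignment.getMemory() <= maxShare.getMemory()
      && afterAssignment.getVirtualCores() <= maxShare.getVirtualCores();
}
{code}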
[jira] [Commented] (YARN-3120) YarnException on Windows + org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local dir nm-local-dir, which was marked as good.
[ https://issues.apache.org/jira/browse/YARN-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309867#comment-14309867 ] vaidhyanathan commented on YARN-3120: - Hi Varun, thanks for responding. I started running the yarn cmd files as administrator and it worked; I also opened the command prompt and ran it in administrator mode. The word count example worked fine the first time, but now I'm facing a different issue. When I run it now with the earlier setup, the job doesn't proceed past this step: '15/02/06 15:38:26 INFO mapreduce.Job: Running job: job_1423255041751_0001', and when I check the console the status is 'Accepted' and the final status is 'Undefined'. YarnException on Windows + org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local dir nm-local-dir, which was marked as good. --- Key: YARN-3120 URL: https://issues.apache.org/jira/browse/YARN-3120 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Environment: Windows 8, Hadoop 2.6.0 Reporter: vaidhyanathan Hi, I tried to follow the instructions in http://wiki.apache.org/hadoop/Hadoop2OnWindows and have set up hadoop-2.6.0.jar on my Windows system. I was able to start everything properly, but when I try to run the wordcount job as given in the above URL, the job fails with the exception below. 15/01/30 12:56:09 INFO localizer.ResourceLocalizationService: Localizer failed org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to setup local dir /tmp/hadoop-haremangala/nm-local-dir, which was marked as good. at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1372) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$900(ResourceLocalizationService.java:137) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1085) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3041) create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309612#comment-14309612 ] Sangjin Lee commented on YARN-3041: --- [~rkanter], [~Naganarasimha], IMO it might make sense to define all YARN system entities as explicit types. It would include flow runs, YARN apps, app attempts, and containers. They have well-defined meaning and relationship, so it seems natural to me? Thoughts? create the ATS entity/event API --- Key: YARN-3041 URL: https://issues.apache.org/jira/browse/YARN-3041 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Attachments: YARN-3041.preliminary.001.patch Per design in YARN-2928, create the ATS entity and events API. Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
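One possible shape for the "explicit types" idea above, purely as a sketch covering the system entities listed in the proposal; the names are assumptions, not the API that was eventually committed.
{code}
// Sketch only: explicit entity types for the YARN system entities mentioned
// above (cluster, user, flow, flow run, YARN app, app attempt, container).
public enum TimelineEntityType {
  YARN_CLUSTER,
  YARN_USER,
  YARN_FLOW,
  YARN_FLOW_RUN,
  YARN_APPLICATION,
  YARN_APPLICATION_ATTEMPT,
  YARN_CONTAINER
}
{code}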
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309703#comment-14309703 ] Yongjun Zhang commented on YARN-3021: - Hi [~vinodkv] and [~jianhe], Thank you so much for the review and comments! I will try to respond to part of your comments here and keep looking into the rest. {quote} RM can simply inspect the incoming renewer specified in the token and skip renewing those tokens if the renewer doesn't match it's own address. This way, we don't need an explicit API in the submission context. {quote} It seems that regardless of this JIRA, we could make the above change, right? Any catch? {quote} Apologies for going back and forth on this one. {quote} I appreciate the insight you provided, and we are trying to figure out the best solution together. All the points you provided are reasonable, so absolutely no need for apologies here. {quote} Irrespective of how we decide to skip tokens, the way the patch is skipping renewal will not work. In secure mode, DelegationTokenRenewer drives the app state machine. So if you skip adding the app itself to DTR, the app will be completely {quote} I did test in a secure environment and it worked. Would you please elaborate? {quote} I think in this case, the renewer specified in the token is the same as the RM. IIUC, the JobClient will request the token from B cluster, but still specify the renewer as the A cluster RM (via the A cluster local config), am I right? {quote} I think that's the case. The problem is that there is no trust between A and B, so COMMON should be the one to renew the token. Thanks. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
[ https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-281. - Resolution: Won't Fix Release Note: I think this may not need since we already have tests in TestSchedulerUitls, it will verify minimum/maximum resource normalization/verification. And SchedulerUtil runs before scheduler can see such resource requests. Resolved it as won't fix. Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits - Key: YARN-281 URL: https://issues.apache.org/jira/browse/YARN-281 Project: Hadoop YARN Issue Type: Test Components: scheduler Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Wangda Tan Labels: test We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test to prevent regressions of any kind on such limits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309909#comment-14309909 ] Chris Douglas commented on YARN-3100: - Agreed; definitely a separate JIRA. As state is copied from the old queues, some of the methods called in {{CSQueueUtils}} throw exceptions, similar to the case you found in {{LeafQueue}}. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
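For context, a rough sketch of what the proposed YarnAuthorizationProvider interface could look like; the method names and signatures below are assumptions for illustration only, not the patches under review here.
{code}
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;

// Illustrative only; the attached YARN-3100 patches define the real API.
public interface YarnAuthorizationProvider {
  /** Initialize the provider (default implementation vs. Ranger/Sentry plug-in). */
  void init(Configuration conf);

  /** Check whether the user may perform the given access on the named entity. */
  boolean checkPermission(String accessType, String entityName,
      UserGroupInformation user);

  /** Install or update the ACLs protecting the named entity. */
  void setPermission(String entityName, Map<String, AccessControlList> acls,
      UserGroupInformation user);
}
{code}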
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309816#comment-14309816 ] Hadoop QA commented on YARN-3144: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697097/YARN-3144.4.patch against trunk revision eaab959. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6537//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6537//console This message is automatically generated. Configuration for making delegation token failures to timeline server not-fatal --- Key: YARN-3144 URL: https://issues.apache.org/jira/browse/YARN-3144 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch, YARN-3144.4.patch Posting events to the timeline server is best-effort. However, getting the delegation tokens from the timeline server will kill the job. This patch adds a configuration to make get delegation token operations best-effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309886#comment-14309886 ] Jason Lowe commented on YARN-3144: -- Committing this. The test failures appear to be unrelated, and they both pass for me locally with the patch applied. Configuration for making delegation token failures to timeline server not-fatal --- Key: YARN-3144 URL: https://issues.apache.org/jira/browse/YARN-3144 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch, YARN-3144.4.patch Posting events to the timeline server is best-effort. However, getting the delegation tokens from the timeline server will kill the job. This patch adds a configuration to make get delegation token operations best-effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309663#comment-14309663 ] Sangjin Lee commented on YARN-1142: --- Some more info on this at https://issues.apache.org/jira/browse/YARN-3087?focusedCommentId=14307614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14307614 MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.7.0 When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when the RM and NMs run in the same process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309920#comment-14309920 ] Jason Lowe commented on YARN-2809: -- +1 lgtm. Will commit this early next week if there are no objections. Implement workaround for linux kernel panic when removing cgroup Key: YARN-2809 URL: https://issues.apache.org/jira/browse/YARN-2809 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: RHEL 6.4 Reporter: Nathan Roberts Assignee: Nathan Roberts Attachments: YARN-2809-v2.patch, YARN-2809-v3.patch, YARN-2809.patch Some older versions of linux have a bug that can cause a kernel panic when the LCE attempts to remove a cgroup. It is a race condition so it's a bit rare but on a few thousand node cluster it can result in a couple of panics per day. This is the commit that likely (haven't verified) fixes the problem in linux: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.yid=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267 Details will be added in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
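The workaround itself is described in the JIRA comments rather than reproduced here. Purely as a hypothetical illustration of the general "don't delete the cgroup immediately" approach, one could retry removal with a short sleep; the actual YARN-2809 patch may work quite differently.
{code}
import java.io.File;

// Hypothetical illustration only: retry removal of an (empty) cgroup directory
// a few times instead of deleting once, to ride out the kernel race above.
static boolean deleteCgroupWithRetries(File cgroupDir, int maxAttempts,
    long sleepMillis) throws InterruptedException {
  for (int attempt = 0; attempt < maxAttempts; attempt++) {
    if (cgroupDir.delete()) {   // rmdir() of an empty cgroup directory
      return true;
    }
    Thread.sleep(sleepMillis);  // give exiting tasks time to drain
  }
  return false;
}
{code}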
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309639#comment-14309639 ] Jian He commented on YARN-3021: --- bq. Explicitly have an external renewer system that has the right permissions to renew these tokens. I think this is the correct long-term solution. RM today happens to be the renewer. But we need a central renewer component so that we can do cross-cluster renewals. bq. RM can simply inspect the incoming renewer specified in the token and skip renewing those tokens if the renewer doesn't match it's own address I think in this case, the renewer specified in the token is the same as the RM. IIUC, the JobClient will request the token from B cluster, but still specify the renewer as the A cluster RM (via the A cluster local config), am I right? YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
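A minimal sketch of the "inspect the renewer and skip" idea quoted above, assuming illustrative method names; this is not the attached patch.
{code}
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier;

// Skip renewal (but still hand the token to containers) when the renewer
// recorded inside the token is not this RM. Names are illustrative.
static boolean shouldSkipRenewal(Token<?> token, String rmPrincipal)
    throws IOException {
  TokenIdentifier id = token.decodeIdentifier();
  if (!(id instanceof AbstractDelegationTokenIdentifier)) {
    return false; // not a delegation token; leave existing handling alone
  }
  Text renewer = ((AbstractDelegationTokenIdentifier) id).getRenewer();
  return renewer == null || !renewer.toString().equals(rmPrincipal);
}
{code}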
[jira] [Created] (YARN-3153) Capacity Scheduler max AM resource percentage is mis-used as ratio
Wangda Tan created YARN-3153: Summary: Capacity Scheduler max AM resource percentage is mis-used as ratio Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical In existing Capacity Scheduler, it can limit max applications running within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but actually, it is used as ratio, in implementation, it assumes input will be \[0,1\]. So now user can specify it up to 100, which makes AM can use 100x of queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309973#comment-14309973 ] Xuan Gong commented on YARN-3154: - We can add a parameter to LogAggregationContext indicating whether this app is an LRS app. Based on this flag, the NM can decide whether it needs to upload the partial logs for this app. Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310023#comment-14310023 ] Jian He commented on YARN-3153: --- As the [doc|http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html] already explicitly mentions specified as float, to keep it compatible, we may choose to do 1) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio -- Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical In existing Capacity Scheduler, it can limit max applications running within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but actually, it is used as ratio, in implementation, it assumes input will be \[0,1\]. So now user can specify it up to 100, which makes AM can use 100x of queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3100: -- Issue Type: Improvement (was: Bug) Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3041) create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310026#comment-14310026 ] Zhijie Shen commented on YARN-3041: --- bq. IMO it might make sense to define all YARN system entities as explicit types Make sense to me. create the ATS entity/event API --- Key: YARN-3041 URL: https://issues.apache.org/jira/browse/YARN-3041 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Attachments: YARN-3041.preliminary.001.patch Per design in YARN-2928, create the ATS entity and events API. Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1126) Add validation of users input nodes-states options to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310033#comment-14310033 ] Hadoop QA commented on YARN-1126: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697127/YARN-905-addendum.patch against trunk revision 5c79439. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6540//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6540//console This message is automatically generated. Add validation of users input nodes-states options to nodes CLI --- Key: YARN-1126 URL: https://issues.apache.org/jira/browse/YARN-1126 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-905-addendum.patch Follow the discussion in YARN-905. (1) case-insensitive checks for all. (2) validation of users input, exit with non-zero code and print all valid states when user gives an invalid state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
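As a sketch of the validation the issue describes (case-insensitive matching, non-zero exit code and a listing of all valid states on bad input); the method name is illustrative and this is not the attached addendum patch.
{code}
import java.util.Arrays;

import org.apache.hadoop.yarn.api.records.NodeState;

// Illustrative parsing for the nodes CLI state values.
static NodeState parseNodeState(String arg) {
  try {
    // Case-insensitive: "running", "Running" and "RUNNING" are all accepted.
    return NodeState.valueOf(arg.trim().toUpperCase());
  } catch (IllegalArgumentException e) {
    System.err.println("Invalid node state: " + arg
        + ". Valid states are: " + Arrays.toString(NodeState.values()));
    System.exit(-1);   // exit with a non-zero code as described above
    return null;       // unreachable
  }
}
{code}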
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310056#comment-14310056 ] Zhijie Shen commented on YARN-2928: --- bq. A single tez application can run multiple different Hive queries submitted by different users. In this use case, who is the user of the TEZ application? This may affect the data model and the parent-child relationship (cluster-user-flow-flow run-application). bq. Where does the current implementation's otherInfo and primaryFilters fit in? metadata aims to store the same thing as otherInfo, but I didn't want it to be called otherInfo because it is no longer just the info other than primaryFilters. When designing the new schema, I'm looking for a way to have the entity indexed without having to explicitly specify the primaryFilters, which previously caused trouble and bugs when updating an entity. bq. What are the main differences between meta-data and configuration? They could be combined, as both are key-value pairs, but I distinguish them explicitly for better usability. Or is there any special access pattern for config? bq. If there is a hierarchy of objects, will there be support to listen to or retrieve all events for a given tree by providing a root node? We could probably run an ad-hoc query to get the events of all applications of a workflow. bq. What use are events? Will there be a streaming API available to listen to all events based on some search criteria? bq. In certain cases, it might be required to mine a specific job's data by exporting contents out of ATS. They sound like interesting features, but we may not be able to accommodate them within the Hadoop 2.8 timeline. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309974#comment-14309974 ] Wangda Tan commented on YARN-3153: -- We have 3 options basically, 1) Keep the config name (...percentage) and continue use it as ratio, add additional checking for this to make sure it fit in range \[0,1\] 2) Keep the config name. Use it as percentage, this need update yarn-default as well. This will have some impacts on existing deployments if they upgrade. 3) Change the config name to (...ratio), this will be a in-compatible change. Thoughts? [~vinodkv], [~jianhe] Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio -- Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical In existing Capacity Scheduler, it can limit max applications running within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but actually, it is used as ratio, in implementation, it assumes input will be \[0,1\]. So now user can specify it up to 100, which makes AM can use 100x of queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310032#comment-14310032 ] Wangda Tan commented on YARN-3153: -- Thanks for your feedback; I agree we should do 1) first. I don't think deprecating the option and changing its name is graceful enough: users will get confused when they find one option deprecated but the system suggests a very similar one. Will upload a patch for #1 shortly. Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio -- Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical In existing Capacity Scheduler, it can limit max applications running within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but actually, it is used as ratio, in implementation, it assumes input will be \[0,1\]. So now user can specify it up to 100, which makes AM can use 100x of queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
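A minimal sketch of option 1) — keep the existing property name but validate the range; this is illustrative only, and the forthcoming patch may place the check elsewhere.
{code}
import org.apache.hadoop.conf.Configuration;

// Fail fast when the configured value falls outside [0, 1]; 0.1 is the
// documented default. Illustrative only.
static float getValidatedMaxAMResourcePercent(Configuration conf) {
  float value = conf.getFloat(
      "yarn.scheduler.capacity.maximum-am-resource-percent", 0.1f);
  if (value < 0.0f || value > 1.0f) {
    throw new IllegalArgumentException(
        "maximum-am-resource-percent should be in range [0, 1], but is " + value);
  }
  return value;
}
{code}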
[jira] [Created] (YARN-3155) Refactor the exception handling code for TimelineClientImpl's retryOn method
Li Lu created YARN-3155: --- Summary: Refactor the exception handling code for TimelineClientImpl's retryOn method Key: YARN-3155 URL: https://issues.apache.org/jira/browse/YARN-3155 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Minor Since we switched to Java 1.7, the exception handling code for the retryOn method can be merged into one statement block, instead of the current two, to avoid repeated code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
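For illustration, the Java 7 multi-catch pattern the refactoring refers to; doOperation and handleRetry are placeholders, and the concrete exception types inside TimelineClientImpl's retryOn method may differ.
{code}
import java.io.IOException;

// Before (Java 6): try { doOperation(); }
//                  catch (IOException e)      { handleRetry(e); }
//                  catch (RuntimeException e) { handleRetry(e); }
// After (Java 7 multi-catch): one block, no duplicated handling code.
void runWithRetryHandling() {
  try {
    doOperation();
  } catch (IOException | RuntimeException e) {
    handleRetry(e);
  }
}

void doOperation() throws IOException {
  // placeholder for the network call that retryOn wraps
}

void handleRetry(Exception e) {
  // placeholder for the shared retry/backoff handling
}
{code}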
[jira] [Created] (YARN-3156) Allow RM timeline client renewDelegation exceptions to be non-fatal
Jonathan Eagles created YARN-3156: - Summary: Allow RM timeline client renewDelegation exceptions to be non-fatal Key: YARN-3156 URL: https://issues.apache.org/jira/browse/YARN-3156 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles This is a follow-up to YARN-3144. In addition to YarnClientImpl, delegation token renewal may also fail after the client has successfully retrieved the delegation token. This JIRA is to allow the exception generated in TimelineDelegationTokenIdentifier to be non-fatal if the RM has the yarn.timeline-service.client.best-effort flag configured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
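For reference, a hedged example of turning the flag named above on from client-side configuration; the exact semantics are defined by YARN-3144 and this follow-up.
{code}
import org.apache.hadoop.conf.Configuration;

// Treat timeline-service delegation token failures as non-fatal (best effort).
Configuration conf = new Configuration();
conf.setBoolean("yarn.timeline-service.client.best-effort", true);
{code}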
[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3122: Attachment: YARN-3122.prelim.patch Preliminary patch to calculate CPU VCores used for ProcfsBasedProcessTree based on proc/pid/stat values. Remaining work is WindowsProcessTree calculation and unit tests. Tested by using the main method for ProcfsBasedProcessTree Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3122.prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track CPU usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309986#comment-14309986 ] Jason Lowe commented on YARN-3154: -- Note that even LRS apps have issues if they don't do their own log rolling. If I remember correctly, stdout and stderr files are setup by the container executor, and we'll have partial logs uploaded then deleted from the local filesystem, losing any subsequent logs to these files or any other files that aren't explicitly log rolled and filtered via a log aggregation context. IMHO we need to make sure we do _not_ delete anything for a running app _unless_ it has a log aggregation context filter to tell us what is safe to upload and delete. Without that information, we cannot tell if a log file is live and therefore going to be deleted too early. Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
[ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309949#comment-14309949 ] Jason Lowe commented on YARN-3143: -- Thanks for the review, Kihwal! Committing this. RM Apps REST API can return NPE or entries missing id and other fields -- Key: YARN-3143 URL: https://issues.apache.org/jira/browse/YARN-3143 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.2 Reporter: Kendall Thrapp Assignee: Jason Lowe Attachments: YARN-3143.001.patch I'm seeing intermittent null pointer exceptions being returned by the YARN Apps REST API. For example: {code} http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED {code} JSON Response was: {code} {RemoteException:{exception:NullPointerException,javaClassName:java.lang.NullPointerException}} {code} At a glance appears to be only when we query for unfinished apps (i.e. finalStatus=UNDEFINED). Possibly related, when I do get back a list of apps, sometimes one or more of the apps will be missing most of the fields, like id, name, user, etc., and the fields that are present all have zero for the value. For example: {code} {progress:0.0,clusterId:0,applicationTags:,startedTime:0,finishedTime:0,elapsedTime:0,allocatedMB:0,allocatedVCores:0,runningContainers:0,preemptedResourceMB:0,preemptedResourceVCores:0,numNonAMContainerPreempted:0,numAMContainerPreempted:0} {code} Let me know if there's any other information I can provide to help debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other short-running' applications
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310042#comment-14310042 ] Vinod Kumar Vavilapalli commented on YARN-3154: --- Does having two separate notions work? - Today's LogAggregationContext's include/exclude patterns for the app to indicate which log files need to be aggregated explicitly at app finish. This works for regular apps. - A new include/exclude pattern for app to indicate which log files need to be aggregated in a rolling fashion. Should not upload partial logs for MR jobs or other short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Currently, if we are running a MR job, and we do not set the log interval properly, we will have their partial logs uploaded and then removed from the local filesystem which is not right. We only upload the partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
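As background for the two notions above, a hedged sketch of how an application sets today's include/exclude patterns at submission time; the two-argument newInstance shown is believed to be the 2.6 API shape, and the separate "rolling" include/exclude pattern suggested here would be a new field not shown.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.LogAggregationContext;

// Today's notion: patterns applied when logs are aggregated at app finish.
static void configureLogAggregation(ApplicationSubmissionContext submissionContext) {
  LogAggregationContext logCtx =
      LogAggregationContext.newInstance("*.log", "*.tmp");  // include, exclude
  submissionContext.setLogAggregationContext(logCtx);
}
{code}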
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310049#comment-14310049 ] Wangda Tan commented on YARN-3153: -- Good suggestion. I think we can deprecate the percent one, make sure its value is within \[0, 1\], and use a ratio/factor as the new option name. Sounds good? Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio -- Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical In existing Capacity Scheduler, it can limit max applications running within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but actually, it is used as ratio, in implementation, it assumes input will be \[0,1\]. So now user can specify it up to 100, which makes AM can use 100x of queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
[ https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-281: Release Note: (was: I think this may not need since we already have tests in TestSchedulerUitls, it will verify minimum/maximum resource normalization/verification. And SchedulerUtil runs before scheduler can see such resource requests. Resolved it as won't fix.) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits - Key: YARN-281 URL: https://issues.apache.org/jira/browse/YARN-281 Project: Hadoop YARN Issue Type: Test Components: scheduler Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Wangda Tan Labels: test We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test to prevent regressions of any kind on such limits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
[ https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310099#comment-14310099 ] Wangda Tan commented on YARN-281: - I accidentally put my comment into the release note field; I've cleaned up the release note. Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits - Key: YARN-281 URL: https://issues.apache.org/jira/browse/YARN-281 Project: Hadoop YARN Issue Type: Test Components: scheduler Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Wangda Tan Labels: test We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test to prevent regressions of any kind on such limits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
[ https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310100#comment-14310100 ] Wangda Tan commented on YARN-281: - I think this may not be needed, since we already have tests in TestSchedulerUtils that verify minimum/maximum resource normalization/validation, and SchedulerUtils runs before the scheduler can see such resource requests. Resolved it as Won't Fix. Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits - Key: YARN-281 URL: https://issues.apache.org/jira/browse/YARN-281 Project: Hadoop YARN Issue Type: Test Components: scheduler Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Wangda Tan Labels: test We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test to prevent regressions of any kind on such limits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3089) LinuxContainerExecutor does not handle file arguments to deleteAsUser
[ https://issues.apache.org/jira/browse/YARN-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309952#comment-14309952 ] Vinod Kumar Vavilapalli commented on YARN-3089: --- bq. Currently, even we are running a MR job, it will upload the partial logs which does not sound right. And we need to fix it. Wow, this is a huge blocker. We should fix it in 2.6.1. [~xgong], can you please file a ticket and link it here? Tx. LinuxContainerExecutor does not handle file arguments to deleteAsUser - Key: YARN-3089 URL: https://issues.apache.org/jira/browse/YARN-3089 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Jason Lowe Assignee: Eric Payne Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3089.v1.txt, YARN-3089.v2.txt, YARN-3089.v3.txt YARN-2468 added the deletion of individual logs that are aggregated, but this fails to delete log files when the LCE is being used. The LCE native executable assumes the paths being passed are paths and the delete fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2990) FairScheduler's delay-scheduling always waits for node-local and rack-local delays, even for off-rack-only requests
[ https://issues.apache.org/jira/browse/YARN-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310034#comment-14310034 ] Sandy Ryza commented on YARN-2990: -- +1. Sorry for the delay in getting to this. FairScheduler's delay-scheduling always waits for node-local and rack-local delays, even for off-rack-only requests --- Key: YARN-2990 URL: https://issues.apache.org/jira/browse/YARN-2990 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2990-0.patch, yarn-2990-1.patch, yarn-2990-2.patch, yarn-2990-test.patch Looking at the FairScheduler, it appears the node/rack locality delays are used for all requests, even those that are only off-rack. More details in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
[ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310055#comment-14310055 ] Hudson commented on YARN-3143: -- FAILURE: Integrated in Hadoop-trunk-Commit #7045 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7045/]) YARN-3143. RM Apps REST API can return NPE or entries missing id and other fields. Contributed by Jason Lowe (jlowe: rev da2fb2bc46bddf42d79c6d7664cbf0311973709e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java RM Apps REST API can return NPE or entries missing id and other fields -- Key: YARN-3143 URL: https://issues.apache.org/jira/browse/YARN-3143 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.2 Reporter: Kendall Thrapp Assignee: Jason Lowe Fix For: 2.7.0 Attachments: YARN-3143.001.patch I'm seeing intermittent null pointer exceptions being returned by the YARN Apps REST API. For example: {code} http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED {code} JSON Response was: {code} {RemoteException:{exception:NullPointerException,javaClassName:java.lang.NullPointerException}} {code} At a glance appears to be only when we query for unfinished apps (i.e. finalStatus=UNDEFINED). Possibly related, when I do get back a list of apps, sometimes one or more of the apps will be missing most of the fields, like id, name, user, etc., and the fields that are present all have zero for the value. For example: {code} {progress:0.0,clusterId:0,applicationTags:,startedTime:0,finishedTime:0,elapsedTime:0,allocatedMB:0,allocatedVCores:0,runningContainers:0,preemptedResourceMB:0,preemptedResourceVCores:0,numNonAMContainerPreempted:0,numAMContainerPreempted:0} {code} Let me know if there's any other information I can provide to help debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310114#comment-14310114 ] Hitesh Shah commented on YARN-2928: --- bq. In this use case, who is the user of the TEZ application? This may affect the data mode and the parent-child relationship (cluster-user-flow-flow run-application). When you say user, what does it really imply? User a can submit a hive query. A tez application running as user hive can execute the query submitted by user a using a's delegation tokens. With proxy users and potential use of delegation tokens, which user should be used? bq. metadata aims to store the same thing as otherInfo, ... primaryFilters Seems like a good option. What form of search will be supported? In most cases, values will unlikely be primitive types but deep nested structures. Will you support all forms of search on all objects? bq. They sound to be interesting features, .. My point related to events was not about a new interesting feature but to generally understand what use case is meant to be solved by events and how should an application developer use events? bq. We may probably run adhoc query to get the events of all applications of a workflow. How is a workflow defined when an entity has 2 parents? Considering the tez-hive example, do you agree that both a Hive Query and a Tez application are workflows and share some entities? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
[ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310136#comment-14310136 ] Kendall Thrapp commented on YARN-3143: -- Thanks [~jlowe] for debugging and the super quick patch and thanks [~eepayne] and [~kihwal] for reviewing. RM Apps REST API can return NPE or entries missing id and other fields -- Key: YARN-3143 URL: https://issues.apache.org/jira/browse/YARN-3143 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.2 Reporter: Kendall Thrapp Assignee: Jason Lowe Fix For: 2.7.0 Attachments: YARN-3143.001.patch I'm seeing intermittent null pointer exceptions being returned by the YARN Apps REST API. For example: {code} http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED {code} JSON Response was: {code} {RemoteException:{exception:NullPointerException,javaClassName:java.lang.NullPointerException}} {code} At a glance appears to be only when we query for unfinished apps (i.e. finalStatus=UNDEFINED). Possibly related, when I do get back a list of apps, sometimes one or more of the apps will be missing most of the fields, like id, name, user, etc., and the fields that are present all have zero for the value. For example: {code} {progress:0.0,clusterId:0,applicationTags:,startedTime:0,finishedTime:0,elapsedTime:0,allocatedMB:0,allocatedVCores:0,runningContainers:0,preemptedResourceMB:0,preemptedResourceVCores:0,numNonAMContainerPreempted:0,numAMContainerPreempted:0} {code} Let me know if there's any other information I can provide to help debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3122: Attachment: YARN-3122.prelim.patch [ ]# stress -c 3 [1] 1778 [ ]# stress: info: [1778] dispatching hogs: 3 cpu, 0 io, 0 vm, 0 hdd [ ]# top -n 1 -p 1778 -p 1779 -p 1780 -p 1781 | grep stress 1779 root 20 0 6516 192 100 R 99.8 0.0 1:35.90 stress 1780 root 20 0 6516 192 100 R 99.8 0.0 1:36.04 stress 1781 root 20 0 6516 192 100 R 99.8 0.0 1:35.87 stress 1778 root 20 0 6516 556 468 S 0.0 0.0 0:00.00 stress [ ]# java org.apache.hadoop.yarn.util.ProcfsBasedProcessTree 1779 Number of processors 4 Creating ProcfsBasedProcessTree for process 1779 |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 1779 1778 1778 595 (stress) 59492 7 6672384 48 stress -c 3 Get cpu usage -1.0 |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 1779 1778 1778 595 (stress) 59553 7 6672384 48 stress -c 3 Get cpu usage 24.091627 [ ]# java org.apache.hadoop.yarn.util.ProcfsBasedProcessTree 1778 Number of processors 4 Creating ProcfsBasedProcessTree for process 1778 |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 1779 1778 1778 595 (stress) 60692 8 6672384 48 stress -c 3 |- 1781 1778 1778 595 (stress) 60741 6 6672384 48 stress -c 3 |- 1780 1778 1778 595 (stress) 60729 5 6672384 48 stress -c 3 |- 1778 628 1778 595 (stress) 0 0 6672384 139 stress -c 3 Get cpu usage -1.0 |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 1779 1778 1778 595 (stress) 60750 8 6672384 48 stress -c 3 |- 1781 1778 1778 595 (stress) 60801 6 6672384 48 stress -c 3 |- 1780 1778 1778 595 (stress) 60786 5 6672384 48 stress -c 3 |- 1778 628 1778 595 (stress) 0 0 6672384 139 stress -c 3 Get cpu usage 72.553894 Metrics for container's actual CPU usage Key: YARN-3122 URL: https://issues.apache.org/jira/browse/YARN-3122 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3122.prelim.patch, YARN-3122.prelim.patch It would be nice to capture resource usage per container, for a variety of reasons. This JIRA is to track CPU usage. YARN-2965 tracks the resource usage on the node, and the two implementations should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
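For context on the sample output above: the first call prints -1.0 because CPU usage can only be computed from the delta between two snapshots of cumulative CPU time. Below is a minimal sketch of that arithmetic; it is not the ProcfsBasedProcessTree API, just an illustration that assumes usage is reported as a percentage of total machine capacity (so three busy cores on a 4-core box come out near 75, consistent with the 72.55 above).
{code}
// Illustrative sketch only (not the Hadoop ProcfsBasedProcessTree API): estimate a
// process tree's CPU usage from two snapshots of cumulative CPU time, normalized by
// the number of processors so that 100 means "the whole machine is busy".
public class CpuUsageEstimator {
  private final int numProcessors;
  private long lastCpuTimeMs = -1;     // cumulative user+system CPU time at the last sample
  private long lastSampleTimeMs = -1;  // wall-clock time of the last sample

  public CpuUsageEstimator(int numProcessors) {
    this.numProcessors = numProcessors;
  }

  /** Returns the usage percentage since the previous sample, or -1.0 on the first call. */
  public float update(long cumulativeCpuTimeMs, long nowMs) {
    float usage = -1.0f;
    if (lastCpuTimeMs >= 0 && nowMs > lastSampleTimeMs) {
      float coresUsed =
          (cumulativeCpuTimeMs - lastCpuTimeMs) / (float) (nowMs - lastSampleTimeMs);
      usage = coresUsed * 100f / numProcessors;  // e.g. ~3 busy cores of 4 -> ~75
    }
    lastCpuTimeMs = cumulativeCpuTimeMs;
    lastSampleTimeMs = nowMs;
    return usage;
  }
}
{code}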
[jira] [Commented] (YARN-2796) deprecate sbin/*.sh
[ https://issues.apache.org/jira/browse/YARN-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310246#comment-14310246 ] Hadoop QA commented on YARN-2796: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697160/YARN-2796-00.patch against trunk revision da2fb2b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6541//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6541//console This message is automatically generated. deprecate sbin/*.sh --- Key: YARN-2796 URL: https://issues.apache.org/jira/browse/YARN-2796 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer Attachments: YARN-2796-00.patch We should deprecate mark all yarn sbin/*.sh commands (except for start and stop) as deprecated in trunk so that they may be removed in a future release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310265#comment-14310265 ] Hadoop QA commented on YARN-2246: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697162/YARN-2246.patch against trunk revision da2fb2b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6542//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6542//console This message is automatically generated. Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.7.0 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1050) Document the Fair Scheduler REST API
[ https://issues.apache.org/jira/browse/YARN-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310281#comment-14310281 ] Hadoop QA commented on YARN-1050: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656239/YARN-1050-3.patch against trunk revision da2fb2b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6543//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6543//console This message is automatically generated. Document the Fair Scheduler REST API Key: YARN-1050 URL: https://issues.apache.org/jira/browse/YARN-1050 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Sandy Ryza Assignee: Kenji Kikushima Attachments: YARN-1050-2.patch, YARN-1050-3.patch, YARN-1050.patch The documentation should be placed here along with the Capacity Scheduler documentation: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other 'short-running' applications
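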
[ https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310278#comment-14310278 ] Xuan Gong commented on YARN-3154: - bq. Does having two separate notions work? This should work, but it requires API changes. Should not upload partial logs for MR jobs or other 'short-running' applications - Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Currently, if we are running an MR job and we do not set the log interval properly, its partial logs will be uploaded and then removed from the local filesystem, which is not right. We should only upload partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310242#comment-14310242 ] Sangjin Lee commented on YARN-2928: --- [~hitesh], continuing that discussion, {quote} [~vinodkv] Should have probably added more context from the design doc: We assume that the failure semantics of the ATS writer companion is the same as the AM. If the ATS writer companion fails for any reason, we try to bring it back up up to a specified number of times. If the maximum retries are exhausted, we consider it a fatal failure, and fail the application. {quote} Yes, I definitely could add more color to that point. I'm going to update the design doc as there are a number of clarifications made. Hopefully some time next week. In the per-app timeline aggregator (a.k.a. ATS writer companion) model, it is a special container. And we need to be able to allocate both the timeline aggregator and the AM or neither. Also, we do want to be able to co-locate the AM and the aggregator on the same node. Then RM needs to negotiate that combined capacity atomically. In other words, we don't want to have a situation where we were able to allocate ATS but not AM, or vice versa. If AM needs 2 G, and the timeline aggregator needs 1 G, then this pair needs to go to a node on which 3 G can be allocated at that time. In terms of the failure scenarios, we may need to hash out some more details. Since allocation is considered as a pair, it is also natural to consider their failure semantics in the same manner. But a deeper question is, if the AM came up but the timeline aggregator didn't come up (for resource reasons or otherwise), do we consider that an acceptable situation? If the timeline aggregator for that app cannot come up, should that be considered fatal? Or, if apps are running but they're not logging critical lifecycle events, etc. because the timeline aggregator went down, do we consider that situation acceptable? The discussion was that it is probably not acceptable as if it is a common occurrence, it would leave a large hole in the collected timeline data and the overall value of the timeline data goes down significantly. That said, this point is deferred somewhat because initially we're starting out with a per-node aggregator option. The per-node aggregator option somewhat sidesteps (but not completely) this issue. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3100: -- Attachment: (was: YARN-3100.2.patch) Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310260#comment-14310260 ] Zhijie Shen commented on YARN-2246: --- bq. For running applications 'res' needs to be appended to 'trackingUri' because it is trying to load the files like I see. But do you know why we use proxyLink for running app instead of redirect the request? Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.7.0 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived
[ https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310328#comment-14310328 ] Carlo Curino commented on YARN-1039: Tossing some fire back on duration. I read your concerns of applications ability to provide good values, however, I'd rather have the app providing their best duration estimate (and the framework rounding it or bucketing it), than the app providing a coarse grained tag-based version in the first place. Changing cluster configurations and policies might turn what used to be a short task into something not that short, which we want to handle differently and so on. In a sense asking for duration prevent us to rely on what application will judge as short/long etc.. As another example, based on whatever mechanisms for log aggregation we will have in the future, we can change our mind about what are the cut-points for short/long etc.. For example, because a new technique makes it very cheap and we want to provide much more frequent feedback to users. Bottom line, I find duration a rather neutral thing to ask, vs something which is more opinion-based, and corner cases like never-ending services are easily handled with -1 or +inf values. I also agree that there are many other use cases for tags, that emerged in the discussion, which have a clear value and are by no means covered by duration. Add parameter for YARN resource requests to indicate long lived - Key: YARN-1039 URL: https://issues.apache.org/jira/browse/YARN-1039 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.1-beta Reporter: Steve Loughran Assignee: Craig Welch Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch A container request could support a new parameter long-lived. This could be used by a scheduler that would know not to host the service on a transient (cloud: spot priced) node. Schedulers could also decide whether or not to allocate multiple long-lived containers on the same node -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-3021: -- Summary: YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp (was: YARN's delegation-token handling disallows certain trust setups to operate properly) YARN's delegation-token handling disallows certain trust setups to operate properly over DistCp --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310570#comment-14310570 ] Harsh J commented on YARN-3021: --- [~vinodkv], Many thanks for the response here! bq. Though the patch unblocks the jobs in the short term, it seems like long term this is still bad. I agree in that it does not resolve the problem. The goal we're seeking is also short-term, in that of bringing back a behaviour that got allowed on MR1, in MR2 - even though both end up facing the same issue. The longer term approach sounds like the most optimal thing to do for proper resolution, but given some users are getting blocked by this behaviour change I'd like to know if there'll be any objections in adding the current approach as an interim-fix (the doc for the property does/will claim it disables several necessary features of the job), and file subsequent JIRAs for implementing the standalone renewer? bq. Irrespective of how we decide to skip tokens, the way the patch is skipping renewal will not work. In secure mode, DelegationTokenRenewer drives the app state machine. So if you skip adding the app itself to DTR, the app will be completely stuck. In our simple tests the app did run through successfully with such an approach, but there was multiple factors we did not test for (app recovery, task failures, etc. which could be impacted). Would it be better if we added in a morphed DelegationTokenRenewer (which does NOP as part of actual renewal logic), instead of skipping adding in the renewer completely? YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.002.patch, YARN-3021.003.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
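To make the "attempt the renewal but go easy on the failure" idea above concrete, here is a rough sketch of what that could look like at submission time. All names are illustrative assumptions (the configuration key, the scheduleForPeriodicRenewal helper, and the surrounding structure are not the real DelegationTokenRenewer internals).
{code}
// Hypothetical sketch of the proposed behaviour, not the actual RM code: validate the
// token up front, but if renewal fails and a flag says "skip", log and keep going
// instead of failing the application submission. LOG and scheduleForPeriodicRenewal
// are assumed to be in scope.
void validateAndScheduleTokens(Credentials credentials, Configuration conf)
    throws IOException, InterruptedException {
  // Illustrative property name; the real key proposed in the patch may differ.
  boolean skipOnFailure = conf.getBoolean("example.rm.skip-token-renewal-failure", false);
  for (Token<?> token : credentials.getAllTokens()) {
    try {
      token.renew(conf);                   // synchronous validation, as today
      scheduleForPeriodicRenewal(token);   // hypothetical helper for automatic renewal
    } catch (IOException e) {
      if (!skipOnFailure) {
        throw e;                           // old behaviour: bubble up, fail submission
      }
      LOG.warn("Renewal failed for " + token + "; skipping periodic renewal", e);
    }
  }
}
{code}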
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310250#comment-14310250 ] Sangjin Lee commented on YARN-2928: --- [~rajesh.balamohan]: bq. In certain cases, it might be required to mine a specific job's data by exporting contents out of ATS. Would there be any support for an export tool to get data out of ATS? Other than access to the REST endpoint, one might be able to query the backing storage directly. And we're keeping that in mind. But that would depend on the backing storage's capability. For example, for HBase, we could provide phoenix schema on which one can do offline queries pretty efficiently. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310315#comment-14310315 ] Zhijie Shen commented on YARN-3100: --- Thanks for the last patch. It looks good to me. Pending the commit to give Chris some time to feedback. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3154) Should not upload partial logs for MR jobs or other 'short-running' applications
Xuan Gong created YARN-3154: --- Summary: Should not upload partial logs for MR jobs or other 'short-running' applications Key: YARN-3154 URL: https://issues.apache.org/jira/browse/YARN-3154 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Currently, if we are running an MR job and we do not set the log interval properly, its partial logs will be uploaded and then removed from the local filesystem, which is not right. We should only upload partial logs for LRS applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3143) RM Apps REST API can return NPE or entries missing id and other fields
[ https://issues.apache.org/jira/browse/YARN-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310004#comment-14310004 ] Jason Lowe commented on YARN-3143: -- My apologies, I also meant to thank Eric for the original review! RM Apps REST API can return NPE or entries missing id and other fields -- Key: YARN-3143 URL: https://issues.apache.org/jira/browse/YARN-3143 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.5.2 Reporter: Kendall Thrapp Assignee: Jason Lowe Fix For: 2.7.0 Attachments: YARN-3143.001.patch I'm seeing intermittent null pointer exceptions being returned by the YARN Apps REST API. For example: {code} http://{cluster}:{port}/ws/v1/cluster/apps?finalStatus=UNDEFINED {code} JSON Response was: {code} {RemoteException:{exception:NullPointerException,javaClassName:java.lang.NullPointerException}} {code} At a glance appears to be only when we query for unfinished apps (i.e. finalStatus=UNDEFINED). Possibly related, when I do get back a list of apps, sometimes one or more of the apps will be missing most of the fields, like id, name, user, etc., and the fields that are present all have zero for the value. For example: {code} {progress:0.0,clusterId:0,applicationTags:,startedTime:0,finishedTime:0,elapsedTime:0,allocatedMB:0,allocatedVCores:0,runningContainers:0,preemptedResourceMB:0,preemptedResourceVCores:0,numNonAMContainerPreempted:0,numAMContainerPreempted:0} {code} Let me know if there's any other information I can provide to help debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310029#comment-14310029 ] Vinod Kumar Vavilapalli commented on YARN-3153: --- This is a hard one to solve. +1 for option (1) for now. In addition to that, we can choose to deprecate this configuration completely and introduce a new one with the right semantics but with a name change: say yarn.scheduler.capacity.maximum-am-resources-percentage. Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio -- Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical The existing Capacity Scheduler can limit the max applications running within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but it is actually used as a ratio: the implementation assumes the input will be in \[0,1\]. So a user can currently specify it as high as 100, which lets AMs use 100x of the queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
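A minimal sketch of the validate-the-input approach is below (assuming "option (1)" refers to enforcing the documented [0,1] range). The property name comes from the issue text; the 0.1 default and the validation itself are assumptions for illustration, not the committed fix.
{code}
// Illustrative validation sketch, assuming conf is an org.apache.hadoop.conf.Configuration.
static float getMaxAmResourcePercent(Configuration conf) {
  float value = conf.getFloat(
      "yarn.scheduler.capacity.maximum-am-resource-percent", 0.1f);  // default is an assumption
  if (value < 0.0f || value > 1.0f) {
    // Fail fast instead of silently treating e.g. 10 as a 10x multiplier of queue capacity.
    throw new IllegalArgumentException(
        "maximum-am-resource-percent must be in [0, 1], but was " + value);
  }
  return value;
}
{code}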
[jira] [Updated] (YARN-3155) Refactor the exception handling code for TimelineClientImpl's retryOn method
[ https://issues.apache.org/jira/browse/YARN-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3155: Attachment: YARN-3155-020615.patch In this patch I refactored the catch blocks in TimelineClientConnectionRetry's retryOn method. I used Java 1.7's multi-catch syntax to eliminate the repeated exception-handling code for the two exception types. Refactor the exception handling code for TimelineClientImpl's retryOn method Key: YARN-3155 URL: https://issues.apache.org/jira/browse/YARN-3155 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Minor Labels: refactoring Attachments: YARN-3155-020615.patch Since we switched to Java 1.7, the exception handling code for the retryOn method can be merged into one statement block, instead of the current two, to avoid repeated code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
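For readers unfamiliar with the Java 7 feature referenced above, the shape of the refactoring is roughly as follows; the exception types and the handler name are illustrative, not the exact ones handled by the real retryOn method.
{code}
// Before: two catch blocks repeating the same retry/error handling.
try {
  op.run();
} catch (java.net.ConnectException e) {
  handleRetriableException(e);
} catch (java.net.SocketTimeoutException e) {
  handleRetriableException(e);
}

// After (Java 7 multi-catch): one block, no duplicated code.
try {
  op.run();
} catch (java.net.ConnectException | java.net.SocketTimeoutException e) {
  handleRetriableException(e);
}
{code}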
[jira] [Assigned] (YARN-3152) Missing hadoop exclude file fails RMs in HA
[ https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3152: --- Assignee: Naganarasimha G R Missing hadoop exclude file fails RMs in HA --- Key: YARN-3152 URL: https://issues.apache.org/jira/browse/YARN-3152 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: Debian 7 Reporter: Neill Lima Assignee: Naganarasimha G R I have two NNs in HA, they do not fail when the exclude file is not present (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in HA. I didn't create the exclude file at this point as well. I applied the HA RM settings properly and when I started both RMs I started getting this exception: 2015-02-06 12:25:25,326 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=transitionToActiveTARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=All users are allowed 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) ... 4 more Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file or directory) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297) ... 5 more 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 0x44af32566180094 closed 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=1 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error) 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to x.x.x.x/x.x.x.x:2181, initiating session The issue is descriptive enough to resolve the problem - and it has been fixed by creating the exclude file. I just think as of a improvement: - Should RMs ignore the missing file as the NNs did? - Should single RM fail even when the file is not present? Just suggesting this improvement to keep the behavior consistent when working with in HA (both NNs and RMs). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3100: -- Attachment: YARN-3100.2.patch Uploaded a new patch: - rename the DefaultYarnAuthorizer to ConfiguredYarnAuthorizer - Added private/unstable annotations to the newly added classes. - Move setPermissions on the authorizer after queue init/re-init is done. Addressed other comments from Zhijie and Chris too. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch, YARN-3100.2.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
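For readers following along, a rough sketch of what a pluggable authorizer interface along the lines described in this JIRA could look like is below. The method names and signatures are assumptions for illustration, not the committed YARN-3100 API.
{code}
// Illustrative sketch only; the real YarnAuthorizationProvider interface may differ.
public interface YarnAuthorizationProvider {

  /** Initialize the provider (e.g. load external policies for a Ranger/Sentry plug-in). */
  void init(Configuration conf);

  /** Single entry point replacing the per-ACL managers (admin/queue/app/timeline/service). */
  boolean checkPermission(AccessType accessType, PrivilegedEntity target,
      UserGroupInformation user);

  /** Used by the default, configuration-based implementation to install ACLs. */
  void setPermission(PrivilegedEntity target,
      Map<AccessType, AccessControlList> acls, UserGroupInformation user);
}
{code}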
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310286#comment-14310286 ] Hitesh Shah commented on YARN-2928: --- Also, what if an application does not want to write data to ATS or does not care if the data does not reach ATS? Will there now be more flags introduced into application submission to tell the RM whether or not the app needs the ATS service, so as to ensure that the app does not fail? Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1
[ https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310285#comment-14310285 ] Sangjin Lee commented on YARN-2928: --- bq. When you say user, what does it really imply? User a can submit a hive query. A tez application running as user hive can execute the query submitted by user a using a's delegation tokens. With proxy users and potential use of delegation tokens, which user should be used? That's something we haven't fully considered. IMO the user is used for resource attribution (e.g. chargeback) and also for access control. We'll need to sort out this scenario (probably not for the first cut however). bq. What are the main differences between meta-data and configuration? One could argue they are not different. However, from a user's perspective (especially MR jobs), the configuration has a strong meaning. It might be good to call out configuration separately from other metadata. Application Timeline Server (ATS) next gen: phase 1 --- Key: YARN-2928 URL: https://issues.apache.org/jira/browse/YARN-2928 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Sangjin Lee Priority: Critical Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal v1.pdf We have the application timeline server implemented in yarn per YARN-1530 and YARN-321. Although it is a great feature, we have recognized several critical issues and features that need to be addressed. This JIRA proposes the design and implementation changes to address those. This is phase 1 of this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310346#comment-14310346 ] Hadoop QA commented on YARN-2348: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668984/YARN-2348.3.patch against trunk revision da2fb2b. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6546//console This message is automatically generated. ResourceManager web UI should display server-side time instead of UTC time -- Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Assignee: Leitao Guo Attachments: YARN-2348.2.patch, YARN-2348.3.patch, afterpatch.jpg ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
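As a side note on the issue description: rendering server-side time instead of UTC is essentially a formatting choice. A minimal, hedged sketch (not the actual web UI code, which has its own render helpers) is:
{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class ServerLocalTime {
  /** Format an epoch-millisecond timestamp in the server's local time zone instead of UTC. */
  public static String format(long epochMillis) {
    SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy");
    fmt.setTimeZone(TimeZone.getDefault());  // server-side zone, not TimeZone.getTimeZone("UTC")
    return fmt.format(new Date(epochMillis));
  }
}
{code}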
[jira] [Commented] (YARN-3152) Missing hadoop exclude file fails RMs in HA
[ https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310487#comment-14310487 ] Xuan Gong commented on YARN-3152: - In the yarn-site.xml, we do set some value for the yarn.resourcemanager.nodes.exclude-path. Since the file does not exist, we should throw out the exception. When RM starts to transit to active, it automatically calls all the refresh*s. It is by design if any of them fails, we should let RM fail. Missing hadoop exclude file fails RMs in HA --- Key: YARN-3152 URL: https://issues.apache.org/jira/browse/YARN-3152 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: Debian 7 Reporter: Neill Lima Assignee: Naganarasimha G R NI have two NNs in HA, they do not fail when the exclude file is not present (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in HA. I didn't create the exclude file at this point as well. I applied the HA RM settings properly and when I started both RMs I started getting this exception: 2015-02-06 12:25:25,326 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=transitionToActiveTARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=All users are allowed 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) ... 4 more Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file or directory) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297) ... 5 more 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 0x44af32566180094 closed 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=1 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error) 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to x.x.x.x/x.x.x.x:2181, initiating session The issue is descriptive enough to resolve the problem - and it has been fixed by creating the exclude file. 
I just think of this as an improvement: - Should RMs ignore the missing file as the NNs do? - Should a single RM fail even when the file is not present? Just suggesting this improvement to keep the behavior consistent when working in HA (both NNs and RMs). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
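If the suggested improvement were adopted, the lenient behaviour could look roughly like the sketch below: treat a missing exclude file as an empty exclude list rather than failing the transition to active. This is an illustration, not the actual NodesListManager/AdminService code.
{code}
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Collections;
import java.util.List;

public class ExcludeFileReader {
  /** Returns the excluded hosts, or an empty list if the configured file does not exist. */
  public static List<String> readExcludedHosts(String excludePath) throws IOException {
    File excludeFile = new File(excludePath);
    if (!excludeFile.exists()) {
      // Same leniency the NameNodes show: no file simply means no excluded hosts.
      return Collections.emptyList();
    }
    return Files.readAllLines(excludeFile.toPath(), StandardCharsets.UTF_8);
  }
}
{code}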
[jira] [Commented] (YARN-3087) the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309662#comment-14309662 ] Sangjin Lee commented on YARN-3087: --- Thanks for looking into this [~devaraj.k]! Doesn't sound there is a quick resolution then. :( the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Devaraj K This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3144: -- Attachment: YARN-3144.4.patch No problem, [~jlowe]. Uploaded patch to add the exception message. Configuration for making delegation token failures to timeline server not-fatal --- Key: YARN-3144 URL: https://issues.apache.org/jira/browse/YARN-3144 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch, YARN-3144.4.patch Posting events to the timeline server is best-effort. However, a failure to get delegation tokens from the timeline server will kill the job. This patch adds a configuration to make get-delegation-token operations best-effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
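The intent described above can be pictured with a small sketch: when a (hypothetical) flag is enabled, a failure to fetch the timeline delegation token is logged and ignored instead of killing the job. The property name and the helper's shape are assumptions, not the actual patch.
{code}
// Illustrative sketch; LOG is assumed to be in scope and the config key is hypothetical.
static Token<TimelineDelegationTokenIdentifier> getTimelineTokenBestEffort(
    TimelineClient timelineClient, Configuration conf, String renewer)
    throws IOException, YarnException {
  try {
    return timelineClient.getDelegationToken(renewer);
  } catch (IOException | YarnException e) {
    // Hypothetical key; the real property introduced by the patch may be named differently.
    if (conf.getBoolean("example.timeline.delegation-token.best-effort", false)) {
      LOG.warn("Could not fetch timeline delegation token; continuing without it", e);
      return null;
    }
    throw e;
  }
}
{code}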
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309757#comment-14309757 ] Jian He commented on YARN-3100: --- bq. AbstractCSQueue and CSQueueUtils Maybe I missed something, I think these two are mostly fine. As we create the new queue hierarchy first and then update the old queues. If certain methods fail in these two classes, the new queue creation will fail upfront and so will not update the old queue. Anyway, we can address this separately. Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to have YARN acl model pluggable so as to integrate other authorization tool such as Apache Ranger, Sentry. Currently, we have - admin ACL - queue ACL - application ACL - time line domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. Current implementation will be the default implementation. Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger, Sentry to do authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3151) On Failover tracking url wrong in application cli for KILLED application
[ https://issues.apache.org/jira/browse/YARN-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-3151: Assignee: Rohith On Failover tracking url wrong in application cli for KILLED application Key: YARN-3151 URL: https://issues.apache.org/jira/browse/YARN-3151 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Affects Versions: 2.6.0 Environment: 2 RM HA Reporter: Bibin A Chundatt Assignee: Rohith Priority: Minor Run an application and kill the same after starting Check {color:red} ./yarn application -list -appStates KILLED {color} (empty line) {quote} Application-Id Tracking-URL application_1423219262738_0001 http://IP:PORT/cluster/app/application_1423219262738_0001 {quote} Shutdown the active RM1 Check the same command {color:red} ./yarn application -list -appStates KILLED {color} after RM2 is active {quote} Application-Id Tracking-URL application_1423219262738_0001 null {quote} Tracking url for application is shown as null Expected : Same url before failover should be shown ApplicationReport.getOriginalTrackingUrl() is null after failover org.apache.hadoop.yarn.client.cli.ApplicationCLI listApplications(Set<String> appTypes, EnumSet<YarnApplicationState> appStates) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308947#comment-14308947 ] Hudson commented on YARN-1537: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/96/]) YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java TestLocalResourcesTrackerImpl.testLocalResourceCache often failed - Key: YARN-1537 URL: https://issues.apache.org/jira/browse/YARN-1537 Project: Hadoop YARN Issue Type: Test Components: nodemanager Affects Versions: 2.2.0 Reporter: Hong Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1537.1.patch Here is the error log {code} Results : Failed tests: TestLocalResourcesTrackerImpl.testLocalResourceCache:351 Wanted but not invoked: eventHandler.handle( isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent) ); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351) However, there were other interactions with this mock: - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308949#comment-14308949 ] Hudson commented on YARN-3101: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/96/]) YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt In Fair Scheduler, fix canceling of reservations for exceeding max share Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308953#comment-14308953 ] Hudson commented on YARN-1904: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/96/]) YARN-1904. Ensure exceptions thrown in ClientRMService ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java Uniform the NotFound messages from ClientRMService and ApplicationHistoryClientService -- Key: YARN-1904 URL: https://issues.apache.org/jira/browse/YARN-1904 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.7.0 Attachments: YARN-1904.1.patch It's good to make ClientRMService and ApplicationHistoryClientService throw NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3151) On Failover tracking url wrong in application cli for KILLED application
Bibin A Chundatt created YARN-3151: -- Summary: On Failover tracking url wrong in application cli for KILLED application Key: YARN-3151 URL: https://issues.apache.org/jira/browse/YARN-3151 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Affects Versions: 2.6.0 Environment: 2 RM HA Reporter: Bibin A Chundatt Priority: Minor Run an application and kill the same after starting Check {color:red} ./yarn application -list -appStates KILLED {color} (empty line) {quote} Application-Id Tracking-URL application_1423219262738_0001 http://IP:PORT/cluster/app/application_1423219262738_0001 {quote} Shutdown the active RM1 Check the same command {color:red} ./yarn application -list -appStates KILLED {color} after RM2 is active {quote} Application-Id Tracking-URL application_1423219262738_0001 null {quote} Tracking url for application is shown as null Expected : Same url before failover should be shown ApplicationReport.getOriginalTrackingUrl() is null after failover org.apache.hadoop.yarn.client.cli.ApplicationCLI listApplications(Set<String> appTypes, EnumSet<YarnApplicationState> appStates) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3145) ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308952#comment-14308952 ] Hudson commented on YARN-3145: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #96 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/96/]) YARN-3145. Fixed ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo. Contributed by Tsuyoshi OZAWA (jianhe: rev 4641196fe02af5cab3d56a9f3c78875c495dbe03) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/CHANGES.txt ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo Key: YARN-3145 URL: https://issues.apache.org/jira/browse/YARN-3145 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-3145.001.patch, YARN-3145.002.patch {code} ava.util.ConcurrentModificationException(java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:348) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueUserAclInfo(CapacityScheduler.java:850) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:844) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:250) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:335) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
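The stack trace above is the classic fail-fast iterator failure: one thread iterates the child-queue TreeMap while another mutates it. The sketch below shows one common remedy, iterating over a snapshot taken under a lock; it is illustrative only, with made-up names, and is not the YARN-3145 patch.
{code:java}
// Minimal sketch of iterating a snapshot instead of the live map; not the
// actual YARN-3145 patch. A TreeMap's iterators are fail-fast, so walking the
// shared map while another thread mutates it throws
// ConcurrentModificationException.
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public final class ChildQueueSnapshot {

  private final TreeMap<String, List<String>> childQueueAcls = new TreeMap<>();

  public synchronized void setAcls(String queueName, List<String> acls) {
    childQueueAcls.put(queueName, new ArrayList<>(acls));
  }

  /** Copies the values under the lock, then iterates the copy lock-free, so
   *  concurrent setAcls() calls cannot invalidate a live iterator. */
  public List<String> getQueueUserAclInfo() {
    final List<List<String>> snapshot;
    synchronized (this) {
      snapshot = new ArrayList<>(childQueueAcls.values()); // snapshot taken safely
    }
    List<String> result = new ArrayList<>();
    for (List<String> acls : snapshot) {
      result.addAll(acls);
    }
    return result;
  }
}
{code}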
[jira] [Commented] (YARN-3145) ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309013#comment-14309013 ] Hudson commented on YARN-3145: -- FAILURE: Integrated in Hadoop-Yarn-trunk #830 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/830/]) YARN-3145. Fixed ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo. Contributed by Tsuyoshi OZAWA (jianhe: rev 4641196fe02af5cab3d56a9f3c78875c495dbe03) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo Key: YARN-3145 URL: https://issues.apache.org/jira/browse/YARN-3145 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-3145.001.patch, YARN-3145.002.patch {code} ava.util.ConcurrentModificationException(java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:348) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueUserAclInfo(CapacityScheduler.java:850) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:844) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:250) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:335) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309018#comment-14309018 ] Hudson commented on YARN-3149: -- FAILURE: Integrated in Hadoop-Yarn-trunk #830 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/830/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java Typo in message for invalid application id -- Key: YARN-3149 URL: https://issues.apache.org/jira/browse/YARN-3149 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3100) Make YARN authorization pluggable
[ https://issues.apache.org/jira/browse/YARN-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308987#comment-14308987 ] Chris Douglas commented on YARN-3100: - Looking through {{AbstractCSQueue}} and {{CSQueueUtils}}, it looks like there are many misconfigurations that leave queues in an inconsistent state... Make YARN authorization pluggable - Key: YARN-3100 URL: https://issues.apache.org/jira/browse/YARN-3100 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-3100.1.patch, YARN-3100.2.patch The goal is to make the YARN ACL model pluggable so that other authorization tools, such as Apache Ranger and Sentry, can be integrated. Currently, we have - admin ACL - queue ACL - application ACL - timeline domain ACL - service ACL The proposal is to create a YarnAuthorizationProvider interface. The current implementation will serve as the default implementation. A Ranger or Sentry plug-in can implement this interface. Benefit: - Unify the code base. With the default implementation, we can get rid of each specific ACL manager such as AdminAclManager, ApplicationACLsManager, QueueAclsManager etc. - Enable Ranger and Sentry to perform authorization for YARN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
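To make the proposal concrete, here is a hedged sketch of the shape such a plug-in point could take. The interface and method names below are hypothetical illustrations of the idea, not the actual YarnAuthorizationProvider API.
{code:java}
// Hypothetical sketch of a pluggable YARN authorizer; names are illustrative
// and do not reflect the actual YarnAuthorizationProvider interface.
import java.security.Principal;
import java.util.Map;

public interface AuthorizationPlugin {

  /** Called at RM start-up so the plug-in (e.g. a Ranger or Sentry adapter)
   *  can load its own policy store instead of YARN's built-in ACLs. */
  void init(Map<String, String> configuration);

  /** Decide whether the user may perform an action (e.g. submit application,
   *  administer queue) on an entity (queue, application, admin service,
   *  timeline domain). */
  boolean checkPermission(Principal user, String action, String entity);

  /** Push ACL updates (e.g. on queue refresh) down to the plug-in. */
  void setPermissions(String entity, Map<String, String> acls);
}
{code}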
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309014#comment-14309014 ] Hudson commented on YARN-1904: -- FAILURE: Integrated in Hadoop-Yarn-trunk #830 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/830/]) YARN-1904. Ensure exceptions thrown in ClientRMService ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java * hadoop-yarn-project/CHANGES.txt Uniform the NotFound messages from ClientRMService and ApplicationHistoryClientService -- Key: YARN-1904 URL: https://issues.apache.org/jira/browse/YARN-1904 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.7.0 Attachments: YARN-1904.1.patch It's good to make ClientRMService and ApplicationHistoryClientService throw NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309007#comment-14309007 ] Hudson commented on YARN-1582: -- FAILURE: Integrated in Hadoop-Yarn-trunk #830 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/830/]) YARN-1582. Capacity Scheduler: add a maximum-allocation-mb setting per queue. Contributed by Thomas Graves (jlowe: rev 69c8a7f45be5c0aa6787b07f328d74f1e2ba5628) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java Capacity Scheduler: add a maximum-allocation-mb setting per queue -- Key: YARN-1582 URL: https://issues.apache.org/jira/browse/YARN-1582 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 2.7.0 Attachments: YARN-1582-branch-0.23.patch, YARN-1582.002.patch, YARN-1582.003.patch We want to allow certain queues to use larger container sizes while limiting other queues to smaller container sizes. Setting it per queue will help prevent abuse, help limit the impact of reservations, and allow changes in the maximum container size to be rolled out more easily. One reason this is needed is more application types are becoming available on yarn and certain applications require more memory to run efficiently. While we want to allow for that we don't want other applications to abuse that and start requesting bigger containers then what they really need. 
Note that we could base this on application type instead, but that might not be accurate either: for example, you might want to allow certain MapReduce users to use larger containers while limiting other MapReduce users to smaller ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
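As a rough illustration of the precedence such a setting implies (a queue-level limit where one is configured, the cluster-wide maximum otherwise), here is a hedged sketch. The class and method names are made up and this is not the YARN-1582 implementation.
{code:java}
// Illustrative sketch, not the YARN-1582 patch: resolve the effective maximum
// container size for a queue and validate a request against it. Names are
// hypothetical stand-ins for the Capacity Scheduler's own configuration code.
public final class MaxAllocationResolver {

  private final long clusterMaxAllocationMb;

  public MaxAllocationResolver(long clusterMaxAllocationMb) {
    this.clusterMaxAllocationMb = clusterMaxAllocationMb;
  }

  /**
   * Returns the effective maximum container size for a queue: the queue-level
   * limit when one is configured (never above the cluster maximum), otherwise
   * the cluster-wide default.
   */
  public long effectiveMaxAllocationMb(Long queueMaxAllocationMb) {
    if (queueMaxAllocationMb != null && queueMaxAllocationMb > 0) {
      return Math.min(queueMaxAllocationMb, clusterMaxAllocationMb);
    }
    return clusterMaxAllocationMb;
  }

  /** A request above the effective maximum for its queue would be rejected. */
  public boolean isRequestAllowed(long requestedMb, Long queueMaxAllocationMb) {
    return requestedMb <= effectiveMaxAllocationMb(queueMaxAllocationMb);
  }
}
{code}
This is what lets one queue accept large containers while another stays capped at a smaller size.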
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309195#comment-14309195 ] Hudson commented on YARN-1582: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-1582. Capacity Scheduler: add a maximum-allocation-mb setting per queue. Contributed by Thomas Graves (jlowe: rev 69c8a7f45be5c0aa6787b07f328d74f1e2ba5628) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java Capacity Scheduler: add a maximum-allocation-mb setting per queue -- Key: YARN-1582 URL: https://issues.apache.org/jira/browse/YARN-1582 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 2.7.0 Attachments: YARN-1582-branch-0.23.patch, YARN-1582.002.patch, YARN-1582.003.patch We want to allow certain queues to use larger container sizes while limiting other queues to smaller container sizes. Setting it per queue will help prevent abuse, help limit the impact of reservations, and allow changes in the maximum container size to be rolled out more easily. One reason this is needed is more application types are becoming available on yarn and certain applications require more memory to run efficiently. While we want to allow for that we don't want other applications to abuse that and start requesting bigger containers then what they really need. 
Note that we could base this on application type instead, but that might not be accurate either: for example, you might want to allow certain MapReduce users to use larger containers while limiting other MapReduce users to smaller ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309206#comment-14309206 ] Hudson commented on YARN-3149: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java Typo in message for invalid application id -- Key: YARN-3149 URL: https://issues.apache.org/jira/browse/YARN-3149 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309180#comment-14309180 ] Hudson commented on YARN-3149: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java * hadoop-yarn-project/CHANGES.txt Typo in message for invalid application id -- Key: YARN-3149 URL: https://issues.apache.org/jira/browse/YARN-3149 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309170#comment-14309170 ] Hudson commented on YARN-1582: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-1582. Capacity Scheduler: add a maximum-allocation-mb setting per queue. Contributed by Thomas Graves (jlowe: rev 69c8a7f45be5c0aa6787b07f328d74f1e2ba5628) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java Capacity Scheduler: add a maximum-allocation-mb setting per queue -- Key: YARN-1582 URL: https://issues.apache.org/jira/browse/YARN-1582 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 2.7.0 Attachments: YARN-1582-branch-0.23.patch, YARN-1582.002.patch, YARN-1582.003.patch We want to allow certain queues to use larger container sizes while limiting other queues to smaller container sizes. Setting it per queue will help prevent abuse, help limit the impact of reservations, and allow changes in the maximum container size to be rolled out more easily. One reason this is needed is more application types are becoming available on yarn and certain applications require more memory to run efficiently. While we want to allow for that we don't want other applications to abuse that and start requesting bigger containers then what they really need. 
Note that we could base this on application type instead, but that might not be accurate either: for example, you might want to allow certain MapReduce users to use larger containers while limiting other MapReduce users to smaller ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309173#comment-14309173 ] Hudson commented on YARN-3101: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java In Fair Scheduler, fix canceling of reservations for exceeding max share Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309177#comment-14309177 ] Hudson commented on YARN-1904: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-1904. Ensure exceptions thrown in ClientRMService ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java * hadoop-yarn-project/CHANGES.txt Uniform the NotFound messages from ClientRMService and ApplicationHistoryClientService -- Key: YARN-1904 URL: https://issues.apache.org/jira/browse/YARN-1904 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.7.0 Attachments: YARN-1904.1.patch It's good to make ClientRMService and ApplicationHistoryClientService throw NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309171#comment-14309171 ] Hudson commented on YARN-1537: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #93 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/93/]) YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java * hadoop-yarn-project/CHANGES.txt TestLocalResourcesTrackerImpl.testLocalResourceCache often failed - Key: YARN-1537 URL: https://issues.apache.org/jira/browse/YARN-1537 Project: Hadoop YARN Issue Type: Test Components: nodemanager Affects Versions: 2.2.0 Reporter: Hong Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1537.1.patch Here is the error log {code} Results : Failed tests: TestLocalResourcesTrackerImpl.testLocalResourceCache:351 Wanted but not invoked: eventHandler.handle( isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent) ); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351) However, there were other interactions with this mock: - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309203#comment-14309203 ] Hudson commented on YARN-1904: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-1904. Ensure exceptions thrown in ClientRMService ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/CHANGES.txt Uniform the NotFound messages from ClientRMService and ApplicationHistoryClientService -- Key: YARN-1904 URL: https://issues.apache.org/jira/browse/YARN-1904 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.7.0 Attachments: YARN-1904.1.patch It's good to make ClientRMService and ApplicationHistoryClientService throw NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309196#comment-14309196 ] Hudson commented on YARN-1537: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java * hadoop-yarn-project/CHANGES.txt TestLocalResourcesTrackerImpl.testLocalResourceCache often failed - Key: YARN-1537 URL: https://issues.apache.org/jira/browse/YARN-1537 Project: Hadoop YARN Issue Type: Test Components: nodemanager Affects Versions: 2.2.0 Reporter: Hong Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1537.1.patch Here is the error log {code} Results : Failed tests: TestLocalResourcesTrackerImpl.testLocalResourceCache:351 Wanted but not invoked: eventHandler.handle( isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent) ); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351) However, there were other interactions with this mock: - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309199#comment-14309199 ] Hudson commented on YARN-3101: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2028 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2028/]) YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java In Fair Scheduler, fix canceling of reservations for exceeding max share Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309344#comment-14309344 ] Jason Lowe commented on YARN-3144: -- Thanks for updating the patch. Comments: * The added test now no longer mocks the TimelineClient as it did before? The test requires the timeline client to throw to work properly, and we could accidentally connect to a timeline server. * Nit: Does timelineServicesBestEffort need to be visible anymore? * Nit: Reading the doc string for the property in yarn-default.xml implies it should be true to make timeline operations fatal. Configuration for making delegation token failures to timeline server not-fatal --- Key: YARN-3144 URL: https://issues.apache.org/jira/browse/YARN-3144 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3144.1.patch, YARN-3144.2.patch Posting events to the timeline server is best-effort. However, getting the delegation tokens from the timeline server will kill the job. This patch adds a configuration to make get delegation token operations best-effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
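To illustrate the behaviour under discussion, here is a minimal sketch, assuming a boolean best-effort flag: a token-fetch failure is logged and swallowed when the flag is on and rethrown when it is off. The class, flag and method names are hypothetical and this is not the YARN-3144 patch.
{code:java}
// Hedged sketch of "best effort" delegation-token fetching; not the YARN-3144
// patch. The flag and the Callable stand in for the real configuration key and
// timeline client call.
import java.io.IOException;
import java.util.concurrent.Callable;

public final class BestEffortTokenFetcher {

  private final boolean timelineServiceBestEffort;

  public BestEffortTokenFetcher(boolean timelineServiceBestEffort) {
    this.timelineServiceBestEffort = timelineServiceBestEffort;
  }

  /** Runs the token fetch; failures are fatal only when best-effort is off. */
  public <T> T fetch(Callable<T> getDelegationToken) throws IOException {
    try {
      return getDelegationToken.call();
    } catch (Exception e) {
      if (timelineServiceBestEffort) {
        System.err.println("Failed to get timeline delegation token, "
            + "continuing without it: " + e);
        return null; // job submission proceeds without the token
      }
      throw new IOException("Failed to get timeline delegation token", e);
    }
  }
}
{code}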
[jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309343#comment-14309343 ] Hadoop QA commented on YARN-2809: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697032/YARN-2809-v2.patch against trunk revision 1425e3d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6535//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6535//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6535//console This message is automatically generated. Implement workaround for linux kernel panic when removing cgroup Key: YARN-2809 URL: https://issues.apache.org/jira/browse/YARN-2809 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: RHEL 6.4 Reporter: Nathan Roberts Assignee: Nathan Roberts Attachments: YARN-2809-v2.patch, YARN-2809.patch Some older versions of linux have a bug that can cause a kernel panic when the LCE attempts to remove a cgroup. It is a race condition so it's a bit rare but on a few thousand node cluster it can result in a couple of panics per day. This is the commit that likely (haven't verified) fixes the problem in linux: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.yid=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267 Details will be added in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated YARN-2809: - Attachment: YARN-2809-v2.patch upmerge to latest trunk Implement workaround for linux kernel panic when removing cgroup Key: YARN-2809 URL: https://issues.apache.org/jira/browse/YARN-2809 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: RHEL 6.4 Reporter: Nathan Roberts Assignee: Nathan Roberts Attachments: YARN-2809-v2.patch, YARN-2809.patch Some older versions of linux have a bug that can cause a kernel panic when the LCE attempts to remove a cgroup. It is a race condition so it's a bit rare but on a few thousand node cluster it can result in a couple of panics per day. This is the commit that likely (haven't verified) fixes the problem in linux: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.yid=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267 Details will be added in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
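For readers unfamiliar with the kind of workaround being discussed, the sketch below shows the general pattern of retrying the cgroup directory removal for a bounded time rather than deleting it once. It is illustrative only, with made-up names and timeouts, and is not the YARN-2809 patch.
{code:java}
// Hedged sketch of a bounded-retry cgroup removal; not the actual YARN-2809
// patch. Paths and timeouts are illustrative.
import java.io.File;

public final class CgroupDeleter {

  /** Tries to remove the (empty) cgroup directory, retrying for up to
   *  timeoutMs, since rmdir can transiently fail (or, on buggy kernels,
   *  race with task exit) right after the last task leaves the cgroup. */
  public static boolean deleteCgroup(String cgroupPath, long timeoutMs)
      throws InterruptedException {
    File dir = new File(cgroupPath);
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (!dir.exists() || dir.delete()) { // delete() removes only empty dirs
        return true;
      }
      Thread.sleep(20);                    // brief pause before retrying
    }
    return false;                          // caller logs the failure and moves on
  }
}
{code}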
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309262#comment-14309262 ] Hudson commented on YARN-3149: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #97 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/97/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java Typo in message for invalid application id -- Key: YARN-3149 URL: https://issues.apache.org/jira/browse/YARN-3149 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3145) ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309258#comment-14309258 ] Hudson commented on YARN-3145: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #97 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/97/]) YARN-3145. Fixed ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo. Contributed by Tsuyoshi OZAWA (jianhe: rev 4641196fe02af5cab3d56a9f3c78875c495dbe03) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo Key: YARN-3145 URL: https://issues.apache.org/jira/browse/YARN-3145 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-3145.001.patch, YARN-3145.002.patch {code} ava.util.ConcurrentModificationException(java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:348) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueUserAclInfo(CapacityScheduler.java:850) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:844) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:250) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:335) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309390#comment-14309390 ] Devaraj K commented on YARN-2246: - [~jlowe], [~zjshen] Thanks for your inputs. [~jlowe], I have started working on this, will provide patch today. Thanks Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 3.0.0, 0.23.11, 2.5.0 Reporter: Devaraj K Assignee: Devaraj K Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
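Assuming the doubled path comes from naively concatenating a base URL with a job path the URL already ends with (an assumption about the failure mode, not a statement of the actual root cause or fix), a small guard like the sketch below avoids the duplication. The class and method names are hypothetical.
{code:java}
// Hypothetical guard against appending a job-history path that the base URL
// already ends with. This is an assumption about the failure mode, not the
// actual YARN-2246 fix.
public final class HistoryUrlJoiner {

  /** Joins base and jobPath, skipping the append when base already ends with it. */
  public static String join(String base, String jobPath) {
    String trimmed = base.endsWith("/")
        ? base.substring(0, base.length() - 1)
        : base;
    return trimmed.endsWith(jobPath) ? trimmed : trimmed + jobPath;
  }
}
{code}
For example, joining a base URL that already ends in /jobhistory/job/job_1332435449546_0001 with that same path would return the base unchanged instead of the doubled URL shown above.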
[jira] [Commented] (YARN-3144) Configuration for making delegation token failures to timeline server not-fatal
[ https://issues.apache.org/jira/browse/YARN-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309454#comment-14309454 ] Hadoop QA commented on YARN-3144: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12697051/YARN-3144.3.patch against trunk revision 1425e3d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6536//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6536//console This message is automatically generated. Configuration for making delegation token failures to timeline server not-fatal --- Key: YARN-3144 URL: https://issues.apache.org/jira/browse/YARN-3144 Project: Hadoop YARN Issue Type: Improvement Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-3144.1.patch, YARN-3144.2.patch, YARN-3144.3.patch Posting events to the timeline server is best-effort. However, getting the delegation tokens from the timeline server will kill the job. This patch adds a configuration to make get delegation token operations best-effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309304#comment-14309304 ] Hudson commented on YARN-3101: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt In Fair Scheduler, fix canceling of reservations for exceeding max share Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309301#comment-14309301 ] Hudson commented on YARN-1582: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-1582. Capacity Scheduler: add a maximum-allocation-mb setting per queue. Contributed by Thomas Graves (jlowe: rev 69c8a7f45be5c0aa6787b07f328d74f1e2ba5628) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java Capacity Scheduler: add a maximum-allocation-mb setting per queue -- Key: YARN-1582 URL: https://issues.apache.org/jira/browse/YARN-1582 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 3.0.0, 0.23.10, 2.2.0 Reporter: Thomas Graves Assignee: Thomas Graves Fix For: 2.7.0 Attachments: YARN-1582-branch-0.23.patch, YARN-1582.002.patch, YARN-1582.003.patch We want to allow certain queues to use larger container sizes while limiting other queues to smaller container sizes. Setting it per queue will help prevent abuse, help limit the impact of reservations, and allow changes in the maximum container size to be rolled out more easily. One reason this is needed is more application types are becoming available on yarn and certain applications require more memory to run efficiently. While we want to allow for that we don't want other applications to abuse that and start requesting bigger containers then what they really need. 
Note that we could base this on application type instead, but that might not be accurate either: for example, you might want to allow certain MapReduce users to use larger containers while limiting other MapReduce users to smaller ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3149) Typo in message for invalid application id
[ https://issues.apache.org/jira/browse/YARN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309311#comment-14309311 ] Hudson commented on YARN-3149: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-3149. Fix typo in message for invalid application id. Contributed (xgong: rev b77ff37686e01b7497d3869fbc62789a5b123c0a) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java Typo in message for invalid application id -- Key: YARN-3149 URL: https://issues.apache.org/jira/browse/YARN-3149 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Trivial Fix For: 2.7.0 Attachments: YARN-3149.patch, YARN-3149.patch, screenshot-1.png Message in console wrong when application id format wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3145) ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo
[ https://issues.apache.org/jira/browse/YARN-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309307#comment-14309307 ] Hudson commented on YARN-3145: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-3145. Fixed ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo. Contributed by Tsuyoshi OZAWA (jianhe: rev 4641196fe02af5cab3d56a9f3c78875c495dbe03) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java ConcurrentModificationException on CapacityScheduler ParentQueue#getQueueUserAclInfo Key: YARN-3145 URL: https://issues.apache.org/jira/browse/YARN-3145 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Tsuyoshi OZAWA Fix For: 2.7.0 Attachments: YARN-3145.001.patch, YARN-3145.002.patch {code} ava.util.ConcurrentModificationException(java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getQueueUserAclInfo(ParentQueue.java:348) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueUserAclInfo(CapacityScheduler.java:850) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:844) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueUserAcls(ApplicationClientProtocolPBServiceImpl.java:250) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:335) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1537) TestLocalResourcesTrackerImpl.testLocalResourceCache often failed
[ https://issues.apache.org/jira/browse/YARN-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309302#comment-14309302 ] Hudson commented on YARN-1537: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2047 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2047/]) YARN-1537. Fix race condition in TestLocalResourcesTrackerImpl.testLocalResourceCache. Contributed by Xuan Gong. (acmurthy: rev 02f154a0016b7321bbe5b09f2da44a9b33797c36) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java * hadoop-yarn-project/CHANGES.txt TestLocalResourcesTrackerImpl.testLocalResourceCache often failed - Key: YARN-1537 URL: https://issues.apache.org/jira/browse/YARN-1537 Project: Hadoop YARN Issue Type: Test Components: nodemanager Affects Versions: 2.2.0 Reporter: Hong Shen Assignee: Xuan Gong Fix For: 2.7.0 Attachments: YARN-1537.1.patch Here is the error log {code} Results : Failed tests: TestLocalResourcesTrackerImpl.testLocalResourceCache:351 Wanted but not invoked: eventHandler.handle( isA(org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerResourceLocalizedEvent) ); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl.testLocalResourceCache(TestLocalResourcesTrackerImpl.java:351) However, there were other interactions with this mock: - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) - at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1904) Uniform the XXXXNotFound messages from ClientRMService and ApplicationHistoryClientService
[ https://issues.apache.org/jira/browse/YARN-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309259#comment-14309259 ] Hudson commented on YARN-1904: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #97 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/97/]) YARN-1904. Ensure exceptions thrown in ClientRMService ApplicationHistoryClientService are uniform when application-attempt is not found. Contributed by Zhijie Shen. (acmurthy: rev 18b2507edaac991e3ed68d2f27eb96f6882137b9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryClientService.java * hadoop-yarn-project/CHANGES.txt Uniform the NotFound messages from ClientRMService and ApplicationHistoryClientService -- Key: YARN-1904 URL: https://issues.apache.org/jira/browse/YARN-1904 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.7.0 Attachments: YARN-1904.1.patch It's good to make ClientRMService and ApplicationHistoryClientService throw NotFoundException with similar messages -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3101) In Fair Scheduler, fix canceling of reservations for exceeding max share
[ https://issues.apache.org/jira/browse/YARN-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309255#comment-14309255 ] Hudson commented on YARN-3101: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #97 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/97/]) YARN-3101. In Fair Scheduler, fix canceling of reservations for exceeding max share (Anubhav Dhoot via Sandy Ryza) (sandy: rev b6466deac6d5d6344f693144290b46e2bef83a02) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt In Fair Scheduler, fix canceling of reservations for exceeding max share Key: YARN-3101 URL: https://issues.apache.org/jira/browse/YARN-3101 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.7.0 Attachments: YARN-3101-Siqi.v1.patch, YARN-3101-Siqi.v2.patch, YARN-3101.001.patch, YARN-3101.002.patch, YARN-3101.003.patch, YARN-3101.003.patch, YARN-3101.004.patch, YARN-3101.004.patch YARN-2811 added fitInMaxShare to validate reservations on a queue, but did not count it during its calculations. It also had the condition reversed so the test was still passing because both cancelled each other. -- This message was sent by Atlassian JIRA (v6.3.4#6332)