[jira] [Created] (YARN-3175) Consolidate the ResourceManager documentation into one
Allen Wittenauer created YARN-3175: -- Summary: Consolidate the ResourceManager documentation into one Key: YARN-3175 URL: https://issues.apache.org/jira/browse/YARN-3175 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer We really don't need a different document for every individual RM feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users
[ https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316609#comment-14316609 ] Sangjin Lee commented on YARN-2423: --- I'm not sure how realistic it will be not to impact callers of the timeline client between the current ATS and the next gen if that's what you're asking. As such, work we may do on this JIRA would not have a direct implication on what happens on the next gen. How important/critical is it to support this in the 2.7 timeframe? I think the current consensus is not to work on major feature additions on ATS. But I'm not sure how major this would be. Separately, it would be desirable to support Java APIs for reads in the next gen too. TimelineClient should wrap all GET APIs to facilitate Java users Key: YARN-2423 URL: https://issues.apache.org/jira/browse/YARN-2423 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter Attachments: YARN-2423.004.patch, YARN-2423.005.patch, YARN-2423.006.patch, YARN-2423.007.patch, YARN-2423.patch, YARN-2423.patch, YARN-2423.patch TimelineClient provides the Java method to put timeline entities. It's also good to wrap over all GET APIs (both entity and domain), and deserialize the json response into Java POJO objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
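The GET wrappers discussed above would essentially issue REST calls against the timeline web service and map the JSON responses onto Java objects. A minimal sketch of that idea follows, assuming Jackson for deserialization; the class name, method name, and REST path here are illustrative, not the actual TimelineClient API.

{code:java}
// Sketch only: wraps a timeline GET call and deserializes the JSON body into a
// caller-supplied POJO class. Names are hypothetical, not the real TimelineClient API.
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

import com.fasterxml.jackson.databind.ObjectMapper;

public class TimelineGetClientSketch {

  private final String baseUrl; // e.g. "http://ats-host:8188/ws/v1/timeline" (assumed layout)
  private final ObjectMapper mapper = new ObjectMapper();

  public TimelineGetClientSketch(String baseUrl) {
    this.baseUrl = baseUrl;
  }

  /** GET /{entityType} and map the JSON body onto the given POJO class. */
  public <T> T getEntities(String entityType, Class<T> responseClass) throws Exception {
    URL url = new URL(baseUrl + "/" + entityType);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    try (InputStream in = conn.getInputStream()) {
      return mapper.readValue(in, responseClass); // JSON -> Java object
    } finally {
      conn.disconnect();
    }
  }
}
{code}

A caller would pass an entity type plus a POJO class mirroring the timeline entity JSON, which is the kind of convenience this JIRA asks TimelineClient to provide for Java users.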
[jira] [Updated] (YARN-3124) Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label
[ https://issues.apache.org/jira/browse/YARN-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3124: - Attachment: YARN-3124.4.patch Fixed the findbugs warning; attached the ver.4 patch. Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label Key: YARN-3124 URL: https://issues.apache.org/jira/browse/YARN-3124 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3124.1.patch, YARN-3124.2.patch, YARN-3124.3.patch, YARN-3124.4.patch After YARN-3098, capacities-by-label (including used-capacity/maximum-capacity/absolute-maximum-capacity, etc.) should be tracked in QueueCapacities. This patch aims to make all capacities-by-label in CS queues tracked by QueueCapacities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
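As a rough illustration of what tracking every capacity type per node label in a single structure looks like (instead of separate fields scattered through LeafQueue/ParentQueue), here is a minimal sketch; it is not the actual QueueCapacities class, and the field and method names are assumptions.

{code:java}
// Sketch only: one per-label record holding all capacity kinds for a queue.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LabelCapacitiesSketch {

  // All capacity kinds for one label (used, maximum, absolute-maximum, ...).
  static class Capacities {
    float capacity;
    float usedCapacity;
    float maximumCapacity;
    float absoluteMaximumCapacity;
  }

  private final Map<String, Capacities> byLabel = new ConcurrentHashMap<>();

  Capacities get(String label) {
    return byLabel.computeIfAbsent(label, l -> new Capacities());
  }

  void setUsedCapacity(String label, float value) {
    get(label).usedCapacity = value;
  }

  float getUsedCapacity(String label) {
    return get(label).usedCapacity;
  }
}
{code}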
[jira] [Commented] (YARN-3041) create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316597#comment-14316597 ] Sangjin Lee commented on YARN-3041: --- Thanks for the clarification Robert. create the ATS entity/event API --- Key: YARN-3041 URL: https://issues.apache.org/jira/browse/YARN-3041 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Attachments: YARN-3041.preliminary.001.patch Per design in YARN-2928, create the ATS entity and events API. Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3160) Non-atomic operation on nodeUpdateQueue in RMNodeImpl
[ https://issues.apache.org/jira/browse/YARN-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316353#comment-14316353 ] Hudson commented on YARN-3160: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2052 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2052/]) YARN-3160. Fix non-atomic operation on nodeUpdateQueue in RMNodeImpl. (Contributed by Chengbing Liu) (junping_du: rev c541a374d88ffed6ee71b0e5d556939ccd2c5159) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/CHANGES.txt Non-atomic operation on nodeUpdateQueue in RMNodeImpl - Key: YARN-3160 URL: https://issues.apache.org/jira/browse/YARN-3160 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Fix For: 2.7.0 Attachments: YARN-3160.2.patch, YARN-3160.patch {code:title=RMNodeImpl.java|borderStyle=solid} while(nodeUpdateQueue.peek() != null){ latestContainerInfoList.add(nodeUpdateQueue.poll()); } {code} The above code brings potential risk of adding null value to {{latestContainerInfoList}}. Since {{ConcurrentLinkedQueue}} implements a wait-free algorithm, we can directly poll the queue, before checking whether the value is null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
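The fix the description points at is to poll first and then null-check the returned element, so a concurrent consumer cannot cause a null to be added. A minimal sketch of that pattern, with the element type simplified to Object:

{code:java}
// Sketch of the poll-then-check pattern; not the committed patch.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

class NodeUpdatePollSketch {
  private final ConcurrentLinkedQueue<Object> nodeUpdateQueue = new ConcurrentLinkedQueue<>();

  List<Object> drainUpdates() {
    List<Object> latestContainerInfoList = new ArrayList<>();
    // Between a peek() and a later poll() another thread could empty the queue,
    // making poll() return null; polling first avoids adding that null.
    Object update;
    while ((update = nodeUpdateQueue.poll()) != null) {
      latestContainerInfoList.add(update);
    }
    return latestContainerInfoList;
  }
}
{code}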
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316352#comment-14316352 ] Hudson commented on YARN-2246: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2052 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2052/]) YARN-2246. Made the proxy tracking URL always be http(s)://proxy addr:port/proxy/appId to avoid duplicate sections. Contributed by Devaraj K. (zjshen: rev d5855c0e46404cfc1b5a63e59015e68ba668f0ea) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/CHANGES.txt Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.7.0 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246-3.patch, YARN-2246-4.patch, YARN-2246.2.patch, YARN-2246.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316360#comment-14316360 ] Hudson commented on YARN-2809: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2052 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2052/]) YARN-2809. Implement workaround for linux kernel panic when removing cgroup. Contributed by Nathan Roberts (jlowe: rev 3f5431a22fcef7e3eb9aceeefe324e5b7ac84049) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java * hadoop-yarn-project/CHANGES.txt Implement workaround for linux kernel panic when removing cgroup Key: YARN-2809 URL: https://issues.apache.org/jira/browse/YARN-2809 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: RHEL 6.4 Reporter: Nathan Roberts Assignee: Nathan Roberts Fix For: 2.7.0 Attachments: YARN-2809-v2.patch, YARN-2809-v3.patch, YARN-2809.patch Some older versions of linux have a bug that can cause a kernel panic when the LCE attempts to remove a cgroup. It is a race condition so it's a bit rare but on a few thousand node cluster it can result in a couple of panics per day. This is the commit that likely (haven't verified) fixes the problem in linux: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.yid=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267 Details will be added in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3090) DeletionService can silently ignore deletion task failures
[ https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316357#comment-14316357 ] Hudson commented on YARN-3090: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2052 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2052/]) YARN-3090. DeletionService can silently ignore deletion task failures. Contributed by Varun Saxena (jlowe: rev 4eb5f7fa32bab1b9ce3fb58eca51e2cd2e194cd5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java * hadoop-yarn-project/CHANGES.txt DeletionService can silently ignore deletion task failures -- Key: YARN-3090 URL: https://issues.apache.org/jira/browse/YARN-3090 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3090.001.patch, YARN-3090.002.patch, YARN-3090.003.patch, YARN-3090.04.patch If a non-I/O exception occurs while the DeletionService is executing a deletion task then it will be silently ignored. The exception bubbles up to the thread workers of the ScheduledThreadPoolExecutor which simply attaches the throwable to the Future that was returned when the task was scheduled. However the thread pool is used as a fire-and-forget pool, so nothing ever looks at the Future and therefore the exception is never logged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
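One straightforward way to keep such failures from vanishing is to catch and log inside the scheduled task itself, so nothing is left parked unread on the Future. The sketch below illustrates the idea; it is not the actual DeletionService code, and the SLF4J logger is an assumption.

{code:java}
// Sketch only: a wrapper that logs any Throwable thrown by a fire-and-forget task.
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class LoggingDeletionTaskSketch implements Runnable {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingDeletionTaskSketch.class);

  private final Runnable delegate;

  LoggingDeletionTaskSketch(Runnable delegate) {
    this.delegate = delegate;
  }

  @Override
  public void run() {
    try {
      delegate.run();
    } catch (Throwable t) {
      // Without this, a non-I/O exception is only attached to the unread Future.
      LOG.error("Deletion task failed", t);
    }
  }

  static void schedule(ScheduledThreadPoolExecutor pool, Runnable task, long delaySeconds) {
    pool.schedule(new LoggingDeletionTaskSketch(task), delaySeconds, TimeUnit.SECONDS);
  }
}
{code}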
[jira] [Updated] (YARN-3173) start-yarn.sh script isn't aware of how many RMs need to be started.
[ https://issues.apache.org/jira/browse/YARN-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3173: --- Component/s: scripts start-yarn.sh script isn't aware of how many RMs need to be started. - Key: YARN-3173 URL: https://issues.apache.org/jira/browse/YARN-3173 Project: Hadoop YARN Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: BOB Priority: Minor When more than one RM is configured, for example in an HA cluster, starting the YARN cluster with the start-yarn.sh script brings up only one resourcemanager, on the node where start-yarn.sh is executed. I think YARN should detect how many RMs have been configured and start them all in the start-yarn.sh script. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2654) Revisit all shared cache config parameters to ensure quality names
[ https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316587#comment-14316587 ] Chris Trezzo commented on YARN-2654: I am willing to change the parameters to .http.address, but it seems as though a large number of parameters in the YARN code have chosen to use webapp.address. I will stay consistent with this, unless there is strong opposition. Any other comments on config parameter naming? Otherwise I will close this as resolved. [~vinodkv] [~sjlee0] [~kasha] Revisit all shared cache config parameters to ensure quality names -- Key: YARN-2654 URL: https://issues.apache.org/jira/browse/YARN-2654 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Priority: Blocker Attachments: shared_cache_config_parameters.txt Revisit all the shared cache config parameters in YarnConfiguration and yarn-default.xml to ensure quality names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3074) Nodemanager dies when localizer runner tries to write to a full disk
[ https://issues.apache.org/jira/browse/YARN-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316505#comment-14316505 ] Hudson commented on YARN-3074: -- FAILURE: Integrated in Hadoop-trunk-Commit #7073 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7073/]) YARN-3074. Nodemanager dies when localizer runner tries to write to a full disk. Contributed by Varun Saxena (jlowe: rev b379972ab39551d4b57436a54c0098a63742c7e1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java Nodemanager dies when localizer runner tries to write to a full disk Key: YARN-3074 URL: https://issues.apache.org/jira/browse/YARN-3074 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3074.001.patch, YARN-3074.002.patch, YARN-3074.03.patch When a LocalizerRunner tries to write to a full disk it can bring down the nodemanager process. Instead of failing the whole process we should fail only the container and make a best attempt to keep going. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
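The gist of the fix described above is to confine an I/O failure such as a full disk to the container being localized instead of letting it escape the LocalizerRunner and take down the NodeManager. A hypothetical sketch (the helper name failContainerLocalization is invented for illustration):

{code:java}
// Sketch only: catch the write failure and fail just this container's localization.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LocalizerRunnerSketch {

  void writeCredentials(String containerId, Path tokenFile, byte[] tokens) {
    try {
      Files.write(tokenFile, tokens); // can throw IOException when the disk is full
    } catch (IOException e) {
      // Fail only this container; the NodeManager process keeps running.
      failContainerLocalization(containerId, e);
    }
  }

  void failContainerLocalization(String containerId, IOException cause) {
    // A real implementation would dispatch a resource-failed event for the container;
    // here the failure is merely recorded.
    System.err.println("Localization failed for " + containerId + ": " + cause);
  }
}
{code}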
[jira] [Commented] (YARN-3074) Nodemanager dies when localizer runner tries to write to a full disk
[ https://issues.apache.org/jira/browse/YARN-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316449#comment-14316449 ] Jason Lowe commented on YARN-3074: -- +1 lgtm as well. Committing this. Nodemanager dies when localizer runner tries to write to a full disk Key: YARN-3074 URL: https://issues.apache.org/jira/browse/YARN-3074 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Attachments: YARN-3074.001.patch, YARN-3074.002.patch, YARN-3074.03.patch When a LocalizerRunner tries to write to a full disk it can bring down the nodemanager process. Instead of failing the whole process we should fail only the container and make a best attempt to keep going. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3174) Consolidate the NodeManager documentation into one
Allen Wittenauer created YARN-3174: -- Summary: Consolidate the NodeManager documentation into one Key: YARN-3174 URL: https://issues.apache.org/jira/browse/YARN-3174 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer We really don't need a different document for every individual nodemanager feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316606#comment-14316606 ] Junping Du commented on YARN-914: - bq. I do agree with Vinod that there should minimally be an easy way, CLI or otherwise, for outside scripts driving the decommission to either force it or wait for it to complete. If waiting, there also needs to be a way to either have the wait have a timeout which will force after that point or another method with which to easily kill the containers still on that node. Makes sense. It sounds like most of us here agree to go with the second approach proposed by Ming and refined by Vinod. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du Attachments: Gracefully Decommission of NodeManager (v1).pdf When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3170) YARN architecture document needs updating
[ https://issues.apache.org/jira/browse/YARN-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316562#comment-14316562 ] Brahma Reddy Battula commented on YARN-3170: Hello [~aw], *I want to update it as follows:* change {quote} Apache Hadoop NextGen MapReduce (YARN) {quote} to "Apache Hadoop YARN"; remove the line {quote} MapReduce has undergone a complete overhaul in hadoop-0.23 and we now have, what we call, MapReduce 2.0 (MRv2) or YARN. {quote} and in {quote} The fundamental idea of MRv2 {quote} say YARN instead of MRv2. *{color:blue}Please give your inputs{color}* YARN architecture document needs updating - Key: YARN-3170 URL: https://issues.apache.org/jira/browse/YARN-3170 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula The marketing paragraph at the top, NextGen MapReduce, etc. are all marketing rather than actual descriptions. It also needs some general updates, especially given it reads as though 0.23 was just released yesterday. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3169) drop the useless yarn overview document
[ https://issues.apache.org/jira/browse/YARN-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316566#comment-14316566 ] Brahma Reddy Battula commented on YARN-3169: Hi [~aw], sorry, I could not find this document in trunk. Can you please point me to its location? Thanks. drop the useless yarn overview document --- Key: YARN-3169 URL: https://issues.apache.org/jira/browse/YARN-3169 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula It's pretty superfluous given there is a site index on the left. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3177) Fix the order of the parameters in YarnConfiguration
[ https://issues.apache.org/jira/browse/YARN-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-3177: --- Attachment: YARN-3177.patch Fix the order of the parameters in YarnConfiguration Key: YARN-3177 URL: https://issues.apache.org/jira/browse/YARN-3177 Project: Hadoop YARN Issue Type: Improvement Reporter: Brahma Reddy Battula Priority: Minor Attachments: YARN-3177.patch *1. Keep the process principal and keytab in one place (the NM and RM entries are not placed in order)* {code} public static final String RM_AM_MAX_ATTEMPTS = RM_PREFIX + "am.max-attempts"; public static final int DEFAULT_RM_AM_MAX_ATTEMPTS = 2; /** The keytab for the resource manager.*/ public static final String RM_KEYTAB = RM_PREFIX + "keytab"; /**The kerberos principal to be used for spnego filter for RM.*/ public static final String RM_WEBAPP_SPNEGO_USER_NAME_KEY = RM_PREFIX + "webapp.spnego-principal"; /**The kerberos keytab to be used for spnego filter for RM.*/ public static final String RM_WEBAPP_SPNEGO_KEYTAB_FILE_KEY = RM_PREFIX + "webapp.spnego-keytab-file"; {code} *2. RM webapp address and port are not in order* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3164) rmadmin command usage prints incorrect command name
[ https://issues.apache.org/jira/browse/YARN-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316741#comment-14316741 ] Xuan Gong commented on YARN-3164: - bq. Have tested the same manually as of now and is working fine . Thanks for testing this manually. Could you add the unit test for this ? Maybe add a unit test in TestRMAdminCLI ? rmadmin command usage prints incorrect command name --- Key: YARN-3164 URL: https://issues.apache.org/jira/browse/YARN-3164 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: YARN-3164.1.patch /hadoop/bin{color:red} ./yarn rmadmin -transitionToActive {color} transitionToActive: incorrect number of arguments Usage:{color:red} HAAdmin {color} [-transitionToActive serviceId [--forceactive]] {color:red} ./yarn HAAdmin {color} Error: Could not find or load main class HAAdmin Expected it should be rmadmin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3177) Fix the order of the parameters in YarnConfiguration
[ https://issues.apache.org/jira/browse/YARN-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316749#comment-14316749 ] Hadoop QA commented on YARN-3177: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12698143/YARN-3177.patch against trunk revision e42fc1a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6597//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6597//console This message is automatically generated. Fix the order of the parameters in YarnConfiguration Key: YARN-3177 URL: https://issues.apache.org/jira/browse/YARN-3177 Project: Hadoop YARN Issue Type: Improvement Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Attachments: YARN-3177.patch *1. keep Process principal and keytab one place..( NM and RM are not placed in order)* {code} public static final String RM_AM_MAX_ATTEMPTS = RM_PREFIX + am.max-attempts; public static final int DEFAULT_RM_AM_MAX_ATTEMPTS = 2; /** The keytab for the resource manager.*/ public static final String RM_KEYTAB = RM_PREFIX + keytab; /**The kerberos principal to be used for spnego filter for RM.*/ public static final String RM_WEBAPP_SPNEGO_USER_NAME_KEY = RM_PREFIX + webapp.spnego-principal; /**The kerberos keytab to be used for spnego filter for RM.*/ public static final String RM_WEBAPP_SPNEGO_KEYTAB_FILE_KEY = RM_PREFIX + webapp.spnego-keytab-file; {code} *2.RM webapp adress and port are not in order* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3151) On Failover tracking url wrong in application cli for KILLED application
[ https://issues.apache.org/jira/browse/YARN-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316766#comment-14316766 ] Xuan Gong commented on YARN-3151: - bq. - diagnostics.contains(applicationAttempt.getWebProxyBase())); +diagnostics.contains(applicationAttempt.getTrackingUrl())); Any reason why we need to change the test code ? On Failover tracking url wrong in application cli for KILLED application Key: YARN-3151 URL: https://issues.apache.org/jira/browse/YARN-3151 Project: Hadoop YARN Issue Type: Bug Components: client, resourcemanager Affects Versions: 2.6.0 Environment: 2 RM HA Reporter: Bibin A Chundatt Assignee: Rohith Priority: Minor Attachments: 0001-YARN-3151.patch, 0002-YARN-3151.patch, 0002-YARN-3151.patch Run an application and kill it after it starts. Check {color:red} ./yarn application -list -appStates KILLED {color} (empty line) {quote} Application-Id Tracking-URL application_1423219262738_0001 http://IP:PORT/cluster/app/application_1423219262738_0001 {quote} Shut down the active RM1 and check the same command {color:red} ./yarn application -list -appStates KILLED {color} after RM2 is active {quote} Application-Id Tracking-URL application_1423219262738_0001 null {quote} The tracking URL for the application is shown as null. Expected: the same URL as before failover should be shown. ApplicationReport.getOriginalTrackingUrl() is null after failover in org.apache.hadoop.yarn.client.cli.ApplicationCLI listApplications(Set<String> appTypes, EnumSet<YarnApplicationState> appStates) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
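For reference, the field in question can also be checked from the client API; a minimal sketch, assuming the YarnClient#getApplications(EnumSet) overload that the CLI's state filter relies on:

{code:java}
// Sketch only: list KILLED applications and print their original tracking URLs.
import java.util.EnumSet;

import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListKilledAppsSketch {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      for (ApplicationReport report :
          client.getApplications(EnumSet.of(YarnApplicationState.KILLED))) {
        // After failover this reportedly prints "null" instead of the original URL.
        System.out.println(report.getApplicationId() + "\t"
            + report.getOriginalTrackingUrl());
      }
    } finally {
      client.stop();
    }
  }
}
{code}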
[jira] [Commented] (YARN-3124) Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label
[ https://issues.apache.org/jira/browse/YARN-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316637#comment-14316637 ] Hadoop QA commented on YARN-3124: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12698119/YARN-3124.4.patch against trunk revision b94c111. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6595//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6595//console This message is automatically generated. Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label Key: YARN-3124 URL: https://issues.apache.org/jira/browse/YARN-3124 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3124.1.patch, YARN-3124.2.patch, YARN-3124.3.patch, YARN-3124.4.patch After YARN-3098, capacities-by-label (include used-capacity/maximum-capacity/absolute-maximum-capacity, etc.) should be tracked in QueueCapacities. This patch is targeting to make capacities-by-label in CS Queues are all tracked by QueueCapacities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316636#comment-14316636 ] Devaraj K commented on YARN-3044: - [~Naganarasimha], would you mind if I take it up, if you haven't started working on this already? implement RM writing app lifecycle events to ATS Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316848#comment-14316848 ] Naganarasimha G R commented on YARN-3044: - Hi [~devaraj.k], thanks for showing interest in this JIRA, but it continues the work from YARN-3034 and I would like to keep working on them. implement RM writing app lifecycle events to ATS Key: YARN-3044 URL: https://issues.apache.org/jira/browse/YARN-3044 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-914: Attachment: Gracefully Decommission of NodeManager (v2).pdf Update proposal to reflect what we discussed above. Some key updates: - Change the whole architecture to keep Decommission_In_Progress dark from NM side but only within RM side. - Move tracking of timeout out of core of YARN to new CLI - Keep track on persistent of RMNode state (with tracking with YARN-2567) - Remove new configurations of enable and timeout, as both seems unnecessary for now - Break down work items Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du Attachments: Gracefully Decommission of NodeManager (v1).pdf, Gracefully Decommission of NodeManager (v2).pdf When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent
Siqi Li created YARN-3176: - Summary: In Fair Scheduler, child queue should inherit maxApp from its parent Key: YARN-3176 URL: https://issues.apache.org/jira/browse/YARN-3176 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent
[ https://issues.apache.org/jira/browse/YARN-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li reassigned YARN-3176: - Assignee: Siqi Li In Fair Scheduler, child queue should inherit maxApp from its parent Key: YARN-3176 URL: https://issues.apache.org/jira/browse/YARN-3176 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3177) Fix the order of the parameters in YarnConfiguration
Brahma Reddy Battula created YARN-3177: -- Summary: Fix the order of the parameters in YarnConfiguration Key: YARN-3177 URL: https://issues.apache.org/jira/browse/YARN-3177 Project: Hadoop YARN Issue Type: Improvement Reporter: Brahma Reddy Battula Priority: Minor *1. Keep the process principal and keytab in one place (the NM and RM entries are not placed in order)* {code} public static final String RM_AM_MAX_ATTEMPTS = RM_PREFIX + "am.max-attempts"; public static final int DEFAULT_RM_AM_MAX_ATTEMPTS = 2; /** The keytab for the resource manager.*/ public static final String RM_KEYTAB = RM_PREFIX + "keytab"; /**The kerberos principal to be used for spnego filter for RM.*/ public static final String RM_WEBAPP_SPNEGO_USER_NAME_KEY = RM_PREFIX + "webapp.spnego-principal"; /**The kerberos keytab to be used for spnego filter for RM.*/ public static final String RM_WEBAPP_SPNEGO_KEYTAB_FILE_KEY = RM_PREFIX + "webapp.spnego-keytab-file"; {code} *2. RM webapp address and port are not in order* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
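To illustrate the kind of reordering being proposed, here is a sketch using the constant names quoted in the description; the grouping is an assumption about the intended layout, not the committed patch.

{code:java}
// Sketch only: keytab/principal constants grouped together, AM retry settings kept separate.
public final class YarnConfigurationOrderSketch {
  private static final String RM_PREFIX = "yarn.resourcemanager.";

  /** The keytab for the resource manager. */
  public static final String RM_KEYTAB = RM_PREFIX + "keytab";
  /** The kerberos principal to be used for the spnego filter for the RM. */
  public static final String RM_WEBAPP_SPNEGO_USER_NAME_KEY =
      RM_PREFIX + "webapp.spnego-principal";
  /** The kerberos keytab to be used for the spnego filter for the RM. */
  public static final String RM_WEBAPP_SPNEGO_KEYTAB_FILE_KEY =
      RM_PREFIX + "webapp.spnego-keytab-file";

  /** AM retry settings, kept in their own group rather than splitting the security keys. */
  public static final String RM_AM_MAX_ATTEMPTS = RM_PREFIX + "am.max-attempts";
  public static final int DEFAULT_RM_AM_MAX_ATTEMPTS = 2;
}
{code}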
[jira] [Updated] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent
[ https://issues.apache.org/jira/browse/YARN-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3176: -- Description: if the child queue does not have a maxRunningApp limit, it will use the queueMaxAppsDefault. This behavior is not quite right, since queueMaxAppsDefault is normally a small number, whereas some parent queues do have maxRunningApp set to be more than the default In Fair Scheduler, child queue should inherit maxApp from its parent Key: YARN-3176 URL: https://issues.apache.org/jira/browse/YARN-3176 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-3176.v1.patch if the child queue does not have a maxRunningApp limit, it will use the queueMaxAppsDefault. This behavior is not quite right, since queueMaxAppsDefault is normally a small number, whereas some parent queues do have maxRunningApp set to be more than the default -- This message was sent by Atlassian JIRA (v6.3.4#6332)
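The proposed fallback order can be shown with a small sketch; the method and parameter names below are hypothetical rather than the FairScheduler configuration API: a child queue without its own maxRunningApps would use its parent's limit before falling back to queueMaxAppsDefault.

{code:java}
// Sketch only: resolve a queue's max running apps with parent inheritance.
class MaxAppsInheritanceSketch {
  static int resolveMaxApps(Integer childMaxApps, Integer parentMaxApps, int queueMaxAppsDefault) {
    if (childMaxApps != null) {
      return childMaxApps;      // explicit per-queue setting wins
    }
    if (parentMaxApps != null) {
      return parentMaxApps;     // inherit the parent queue's limit
    }
    return queueMaxAppsDefault; // last resort: the global default
  }
}
{code}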
[jira] [Updated] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent
[ https://issues.apache.org/jira/browse/YARN-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3176: -- Attachment: YARN-3176.v1.patch In Fair Scheduler, child queue should inherit maxApp from its parent Key: YARN-3176 URL: https://issues.apache.org/jira/browse/YARN-3176 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-3176.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3176) In Fair Scheduler, child queue should inherit maxApp from its parent
[ https://issues.apache.org/jira/browse/YARN-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316797#comment-14316797 ] Hadoop QA commented on YARN-3176: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12698137/YARN-3176.v1.patch against trunk revision 22441ab. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6596//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6596//console This message is automatically generated. In Fair Scheduler, child queue should inherit maxApp from its parent Key: YARN-3176 URL: https://issues.apache.org/jira/browse/YARN-3176 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-3176.v1.patch if the child queue does not have a maxRunningApp limit, it will use the queueMaxAppsDefault. This behavior is not quite right, since queueMaxAppsDefault is normally a small number, whereas some parent queues do have maxRunningApp set to be more than the default -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3164) rmadmin command usage prints incorrect command name
[ https://issues.apache.org/jira/browse/YARN-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316851#comment-14316851 ] Allen Wittenauer commented on YARN-3164: Adding a unit test for this is a waste of time, IMO. I'm much more curious why we are overriding the method rather than just changing the text directly. Does anything else actually even use the method that's being overridden? (In general, having this method even exist seems like a strong case of over-engineering, which is pretty prevalent throughout Hadoop.) rmadmin command usage prints incorrect command name --- Key: YARN-3164 URL: https://issues.apache.org/jira/browse/YARN-3164 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: YARN-3164.1.patch /hadoop/bin{color:red} ./yarn rmadmin -transitionToActive {color} transitionToActive: incorrect number of arguments Usage:{color:red} HAAdmin {color} [-transitionToActive serviceId [--forceactive]] {color:red} ./yarn HAAdmin {color} Error: Could not find or load main class HAAdmin Expected it should be rmadmin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316414#comment-14316414 ] Varun Saxena commented on YARN-2902: Yes... sorry, but I have been busy for the last couple of weeks. Will update by this weekend. Killing a container that is localizing can orphan resources in the DOWNLOADING state Key: YARN-2902 URL: https://issues.apache.org/jira/browse/YARN-2902 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2902.002.patch, YARN-2902.patch If a container is in the process of localizing when it is stopped/killed then resources are left in the DOWNLOADING state. If no other container comes along and requests these resources they linger around with no reference counts but aren't cleaned up during normal cache cleanup scans since it will never delete resources in the DOWNLOADING state even if their reference count is zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
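One possible shape of the cleanup change implied by the description, sketched with an invented resource record rather than the actual LocalResourcesTracker types: let the cache-cleanup pass also remove DOWNLOADING entries whose reference count is zero.

{code:java}
// Sketch only; not the patch attached to this JIRA.
import java.util.Iterator;
import java.util.Map;

class CacheCleanupSketch {

  enum ResourceState { INIT, DOWNLOADING, LOCALIZED, FAILED }

  static class LocalResourceRecord {
    ResourceState state;
    int refCount;

    LocalResourceRecord(ResourceState state, int refCount) {
      this.state = state;
      this.refCount = refCount;
    }
  }

  static void cleanup(Map<String, LocalResourceRecord> cache) {
    for (Iterator<Map.Entry<String, LocalResourceRecord>> it = cache.entrySet().iterator();
        it.hasNext();) {
      LocalResourceRecord r = it.next().getValue();
      boolean unreferenced = r.refCount == 0;
      boolean removable = r.state == ResourceState.LOCALIZED
          || r.state == ResourceState.DOWNLOADING; // the case the description says is skipped today
      if (unreferenced && removable) {
        it.remove(); // a real cleanup would also delete the partially downloaded files on disk
      }
    }
  }
}
{code}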
[jira] [Commented] (YARN-913) Umbrella: Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316261#comment-14316261 ] Hudson commented on YARN-913: - FAILURE: Integrated in Hadoop-trunk-Commit #7070 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7070/]) YARN-2683. [YARN-913] registry config options: document and move to core-default. (stevel) (stevel: rev c3da2db48fd18c41096fe5d6d4650978fb31ae24) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-configuration.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/yarn-registry.md * hadoop-common-project/hadoop-common/src/main/resources/core-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/index.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/using-the-yarn-service-registry.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-security.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt Umbrella: Add a way to register long-lived services in a YARN cluster - Key: YARN-913 URL: https://issues.apache.org/jira/browse/YARN-913 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Affects Versions: 2.5.0, 2.4.1 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, YARN-913-019.patch, YARN-913-020.patch, YARN-913-021.patch, yarnregistry.pdf, yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla In a YARN cluster you can't predict where services will come up -or on what ports. The services need to work those things out as they come up and then publish them somewhere. Applications need to be able to find the service instance they are to bond to -and not any others in the cluster. Some kind of service registry -in the RM, in ZK, could do this. If the RM held the write access to the ZK nodes, it would be more secure than having apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3090) DeletionService can silently ignore deletion task failures
[ https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316274#comment-14316274 ] Hudson commented on YARN-3090: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2033 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2033/]) YARN-3090. DeletionService can silently ignore deletion task failures. Contributed by Varun Saxena (jlowe: rev 4eb5f7fa32bab1b9ce3fb58eca51e2cd2e194cd5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java * hadoop-yarn-project/CHANGES.txt DeletionService can silently ignore deletion task failures -- Key: YARN-3090 URL: https://issues.apache.org/jira/browse/YARN-3090 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3090.001.patch, YARN-3090.002.patch, YARN-3090.003.patch, YARN-3090.04.patch If a non-I/O exception occurs while the DeletionService is executing a deletion task then it will be silently ignored. The exception bubbles up to the thread workers of the ScheduledThreadPoolExecutor which simply attaches the throwable to the Future that was returned when the task was scheduled. However the thread pool is used as a fire-and-forget pool, so nothing ever looks at the Future and therefore the exception is never logged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316269#comment-14316269 ] Hudson commented on YARN-2246: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2033 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2033/]) YARN-2246. Made the proxy tracking URL always be http(s)://proxy addr:port/proxy/appId to avoid duplicate sections. Contributed by Devaraj K. (zjshen: rev d5855c0e46404cfc1b5a63e59015e68ba668f0ea) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.7.0 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246-3.patch, YARN-2246-4.patch, YARN-2246.2.patch, YARN-2246.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316277#comment-14316277 ] Hudson commented on YARN-2809: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2033 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2033/]) YARN-2809. Implement workaround for linux kernel panic when removing cgroup. Contributed by Nathan Roberts (jlowe: rev 3f5431a22fcef7e3eb9aceeefe324e5b7ac84049) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java Implement workaround for linux kernel panic when removing cgroup Key: YARN-2809 URL: https://issues.apache.org/jira/browse/YARN-2809 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: RHEL 6.4 Reporter: Nathan Roberts Assignee: Nathan Roberts Fix For: 2.7.0 Attachments: YARN-2809-v2.patch, YARN-2809-v3.patch, YARN-2809.patch Some older versions of linux have a bug that can cause a kernel panic when the LCE attempts to remove a cgroup. It is a race condition so it's a bit rare but on a few thousand node cluster it can result in a couple of panics per day. This is the commit that likely (haven't verified) fixes the problem in linux: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.yid=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267 Details will be added in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-913) Umbrella: Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316253#comment-14316253 ] Hudson commented on YARN-913: - FAILURE: Integrated in Hadoop-trunk-Commit #7069 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7069/]) YARN-2616 [YARN-913] Add CLI client to the registry to list, view and manipulate entries. (Akshay Radia via stevel) (stevel: rev 362565cf5a8cbc1e7e66847649c29666d79f6938) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/cli/TestRegistryCli.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/java/org/apache/hadoop/registry/cli/RegistryCli.java * hadoop-yarn-project/CHANGES.txt Umbrella: Add a way to register long-lived services in a YARN cluster - Key: YARN-913 URL: https://issues.apache.org/jira/browse/YARN-913 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Affects Versions: 2.5.0, 2.4.1 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, YARN-913-019.patch, YARN-913-020.patch, YARN-913-021.patch, yarnregistry.pdf, yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla In a YARN cluster you can't predict where services will come up -or on what ports. The services need to work those things out as they come up and then publish them somewhere. Applications need to be able to find the service instance they are to bond to -and not any others in the cluster. Some kind of service registry -in the RM, in ZK, could do this. If the RM held the write access to the ZK nodes, it would be more secure than having apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2616) Add CLI client to the registry to list, view and manipulate entries
[ https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316252#comment-14316252 ] Hudson commented on YARN-2616: -- FAILURE: Integrated in Hadoop-trunk-Commit #7069 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7069/]) YARN-2616 [YARN-913] Add CLI client to the registry to list, view and manipulate entries. (Akshay Radia via stevel) (stevel: rev 362565cf5a8cbc1e7e66847649c29666d79f6938) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/cli/TestRegistryCli.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/java/org/apache/hadoop/registry/cli/RegistryCli.java * hadoop-yarn-project/CHANGES.txt Add CLI client to the registry to list, view and manipulate entries --- Key: YARN-2616 URL: https://issues.apache.org/jira/browse/YARN-2616 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Akshay Radia Fix For: 2.7.0 Attachments: YARN-2616-003.patch, YARN-2616-008.patch, YARN-2616-008.patch, yarn-2616-v1.patch, yarn-2616-v2.patch, yarn-2616-v4.patch, yarn-2616-v5.patch, yarn-2616-v6.patch, yarn-2616-v7.patch registry needs a CLI interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3160) Non-atomic operation on nodeUpdateQueue in RMNodeImpl
[ https://issues.apache.org/jira/browse/YARN-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316270#comment-14316270 ] Hudson commented on YARN-3160: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2033 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2033/]) YARN-3160. Fix non-atomic operation on nodeUpdateQueue in RMNodeImpl. (Contributed by Chengbing Liu) (junping_du: rev c541a374d88ffed6ee71b0e5d556939ccd2c5159) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java Non-atomic operation on nodeUpdateQueue in RMNodeImpl - Key: YARN-3160 URL: https://issues.apache.org/jira/browse/YARN-3160 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Fix For: 2.7.0 Attachments: YARN-3160.2.patch, YARN-3160.patch {code:title=RMNodeImpl.java|borderStyle=solid} while(nodeUpdateQueue.peek() != null){ latestContainerInfoList.add(nodeUpdateQueue.poll()); } {code} The above code brings potential risk of adding null value to {{latestContainerInfoList}}. Since {{ConcurrentLinkedQueue}} implements a wait-free algorithm, we can directly poll the queue, before checking whether the value is null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2616) Add CLI client to the registry to list/view entries
[ https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316232#comment-14316232 ] Steve Loughran commented on YARN-2616: -- +1 applying to branch-2+ Add CLI client to the registry to list/view entries --- Key: YARN-2616 URL: https://issues.apache.org/jira/browse/YARN-2616 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Akshay Radia Attachments: YARN-2616-003.patch, YARN-2616-008.patch, YARN-2616-008.patch, yarn-2616-v1.patch, yarn-2616-v2.patch, yarn-2616-v4.patch, yarn-2616-v5.patch, yarn-2616-v6.patch, yarn-2616-v7.patch registry needs a CLI interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3173) start-yarn.sh script isn't aware of how many RMs need to be started.
[ https://issues.apache.org/jira/browse/YARN-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316372#comment-14316372 ] Allen Wittenauer commented on YARN-3173: This actually can be fixed if start-yarn uses some of the same tricks that start-dfs uses. However, it's a lot easier with HADOOP-11565 committed. start-yarn.sh script isn't aware of how many RMs need to be started. - Key: YARN-3173 URL: https://issues.apache.org/jira/browse/YARN-3173 Project: Hadoop YARN Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: BOB Priority: Minor When more than one RM is configured, for example in an HA cluster, starting the YARN cluster with the start-yarn.sh script brings up only one resourcemanager, on the node where start-yarn.sh is executed. I think YARN should detect how many RMs have been configured and start them all in the start-yarn.sh script. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2683) registry config options: document and move to core-default
[ https://issues.apache.org/jira/browse/YARN-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316260#comment-14316260 ] Hudson commented on YARN-2683: -- FAILURE: Integrated in Hadoop-trunk-Commit #7070 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7070/]) YARN-2683. [YARN-913] registry config options: document and move to core-default. (stevel) (stevel: rev c3da2db48fd18c41096fe5d6d4650978fb31ae24) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-configuration.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/yarn-registry.md * hadoop-common-project/hadoop-common/src/main/resources/core-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/index.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/using-the-yarn-service-registry.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/registry/registry-security.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt registry config options: document and move to core-default -- Key: YARN-2683 URL: https://issues.apache.org/jira/browse/YARN-2683 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Fix For: 2.7.0 Attachments: HADOOP-10530-005.patch, YARN-2683-001.patch, YARN-2683-002.patch, YARN-2683-003.patch, YARN-2683-006.patch Original Estimate: 1h Time Spent: 1h Remaining Estimate: 0.5h Add to {{yarn-site}} a page on registry configuration parameters -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3152) Missing hadoop exclude file fails RMs in HA
[ https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316294#comment-14316294 ] Neill Lima commented on YARN-3152: -- [~vinodkv] -- It fails in both RMs indeed. What was 'unexpected' is that a single RM didn't fail because of the missing exclude file. Is the exclude file so relevant to the RMs but not so much to the NNs? Because the behavior (NNs vs RMs) is very different. I lean towards the NNs' behavior. Missing hadoop exclude file fails RMs in HA --- Key: YARN-3152 URL: https://issues.apache.org/jira/browse/YARN-3152 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: Debian 7 Reporter: Neill Lima Assignee: Naganarasimha G R I have two NNs in HA; they do not fail when the exclude file is not present (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in HA. I didn't create the exclude file at this point either. I applied the HA RM settings properly and when I started both RMs I started getting this exception: 2015-02-06 12:25:25,326 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=transitionToActiveTARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=All users are allowed 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) ... 4 more Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file or directory) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297) ... 5 more 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 0x44af32566180094 closed 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=1 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error) 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to x.x.x.x/x.x.x.x:2181, initiating session The issue is descriptive enough to resolve the problem - and it has been fixed by creating the exclude file. 
I just think as of a improvement: - Should RMs ignore the missing file as the NNs did? - Should single RM fail even when the file is not present? Just suggesting this improvement to keep the behavior consistent when working with in HA (both NNs and RMs). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2616) Add CLI client to the registry to list, view and manipulate entries
[ https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2616: - Summary: Add CLI client to the registry to list, view and manipulate entries (was: Add CLI client to the registry to list/view entries) Add CLI client to the registry to list, view and manipulate entries --- Key: YARN-2616 URL: https://issues.apache.org/jira/browse/YARN-2616 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Akshay Radia Attachments: YARN-2616-003.patch, YARN-2616-008.patch, YARN-2616-008.patch, yarn-2616-v1.patch, yarn-2616-v2.patch, yarn-2616-v4.patch, yarn-2616-v5.patch, yarn-2616-v6.patch, yarn-2616-v7.patch registry needs a CLI interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2809) Implement workaround for linux kernel panic when removing cgroup
[ https://issues.apache.org/jira/browse/YARN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316331#comment-14316331 ] Hudson commented on YARN-2809: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #102 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/102/]) YARN-2809. Implement workaround for linux kernel panic when removing cgroup. Contributed by Nathan Roberts (jlowe: rev 3f5431a22fcef7e3eb9aceeefe324e5b7ac84049) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/util/TestCgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/CHANGES.txt Implement workaround for linux kernel panic when removing cgroup Key: YARN-2809 URL: https://issues.apache.org/jira/browse/YARN-2809 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: RHEL 6.4 Reporter: Nathan Roberts Assignee: Nathan Roberts Fix For: 2.7.0 Attachments: YARN-2809-v2.patch, YARN-2809-v3.patch, YARN-2809.patch Some older versions of linux have a bug that can cause a kernel panic when the LCE attempts to remove a cgroup. It is a race condition so it's a bit rare but on a few thousand node cluster it can result in a couple of panics per day. This is the commit that likely (haven't verified) fixes the problem in linux: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-2.6.39.yid=068c5cc5ac7414a8e9eb7856b4bf3cc4d4744267 Details will be added in comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2246) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/YARN-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316323#comment-14316323 ] Hudson commented on YARN-2246: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #102 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/102/]) YARN-2246. Made the proxy tracking URL always be http(s)://proxy addr:port/proxy/appId to avoid duplicate sections. Contributed by Devaraj K. (zjshen: rev d5855c0e46404cfc1b5a63e59015e68ba668f0ea) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java Job History Link in RM UI is redirecting to the URL which contains Job Id twice --- Key: YARN-2246 URL: https://issues.apache.org/jira/browse/YARN-2246 Project: Hadoop YARN Issue Type: Bug Components: webapp Reporter: Devaraj K Assignee: Devaraj K Fix For: 2.7.0 Attachments: MAPREDUCE-4064-1.patch, MAPREDUCE-4064.patch, YARN-2246-3.patch, YARN-2246-4.patch, YARN-2246.2.patch, YARN-2246.patch {code:xml} http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3160) Non-atomic operation on nodeUpdateQueue in RMNodeImpl
[ https://issues.apache.org/jira/browse/YARN-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316324#comment-14316324 ] Hudson commented on YARN-3160: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #102 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/102/]) YARN-3160. Fix non-atomic operation on nodeUpdateQueue in RMNodeImpl. (Contributed by Chengbing Liu) (junping_du: rev c541a374d88ffed6ee71b0e5d556939ccd2c5159) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/CHANGES.txt Non-atomic operation on nodeUpdateQueue in RMNodeImpl - Key: YARN-3160 URL: https://issues.apache.org/jira/browse/YARN-3160 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Chengbing Liu Assignee: Chengbing Liu Fix For: 2.7.0 Attachments: YARN-3160.2.patch, YARN-3160.patch {code:title=RMNodeImpl.java|borderStyle=solid} while(nodeUpdateQueue.peek() != null){ latestContainerInfoList.add(nodeUpdateQueue.poll()); } {code} The above code brings potential risk of adding null value to {{latestContainerInfoList}}. Since {{ConcurrentLinkedQueue}} implements a wait-free algorithm, we can directly poll the queue, before checking whether the value is null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
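As a minimal, self-contained sketch of the poll-first pattern the reporter suggests above (the element type and class name here are illustrative stand-ins, not the actual RMNodeImpl code where UpdatedContainerInfo is used):
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch only: "String" stands in for UpdatedContainerInfo and the
// queue for RMNodeImpl's nodeUpdateQueue.
public class DrainQueueExample {
  public static void main(String[] args) {
    ConcurrentLinkedQueue<String> nodeUpdateQueue = new ConcurrentLinkedQueue<>();
    nodeUpdateQueue.add("container-update-1");
    nodeUpdateQueue.add("container-update-2");

    List<String> latestContainerInfoList = new ArrayList<>();
    // poll() first and check the result: if another thread drains the queue
    // between a peek() and a poll(), poll() returns null and nothing is added,
    // whereas the peek()-then-poll() version could add that null to the list.
    String update;
    while ((update = nodeUpdateQueue.poll()) != null) {
      latestContainerInfoList.add(update);
    }
    System.out.println(latestContainerInfoList);
  }
}
{code}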
[jira] [Commented] (YARN-3090) DeletionService can silently ignore deletion task failures
[ https://issues.apache.org/jira/browse/YARN-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316328#comment-14316328 ] Hudson commented on YARN-3090: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #102 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/102/]) YARN-3090. DeletionService can silently ignore deletion task failures. Contributed by Varun Saxena (jlowe: rev 4eb5f7fa32bab1b9ce3fb58eca51e2cd2e194cd5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java * hadoop-yarn-project/CHANGES.txt DeletionService can silently ignore deletion task failures -- Key: YARN-3090 URL: https://issues.apache.org/jira/browse/YARN-3090 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-3090.001.patch, YARN-3090.002.patch, YARN-3090.003.patch, YARN-3090.04.patch If a non-I/O exception occurs while the DeletionService is executing a deletion task then it will be silently ignored. The exception bubbles up to the thread workers of the ScheduledThreadPoolExecutor which simply attaches the throwable to the Future that was returned when the task was scheduled. However the thread pool is used as a fire-and-forget pool, so nothing ever looks at the Future and therefore the exception is never logged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
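As a quick illustration of the failure mode described above (not the DeletionService code itself), the sketch below submits a failing task to a ScheduledThreadPoolExecutor in fire-and-forget fashion; because nobody ever calls get() on the returned Future, the exception is never surfaced. Logging inside the task body, or inspecting the Future, is the general kind of remedy such a fix would apply.
{code:java}
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Minimal demonstration: the thrown exception is captured in the Future that
// submit() returns, and since that Future is discarded, nothing is ever logged.
public class SwallowedExceptionDemo {
  public static void main(String[] args) throws InterruptedException {
    ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
    Runnable deletionTask = () -> {
      throw new IllegalStateException("deletion failed"); // silently swallowed
    };
    pool.submit(deletionTask); // fire-and-forget: the returned Future is ignored
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
    System.out.println("executor finished with no trace of the failure");
  }
}
{code}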
[jira] [Commented] (YARN-3171) Sort by application id doesn't work in ATS web ui
[ https://issues.apache.org/jira/browse/YARN-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316961#comment-14316961 ] Jian He commented on YARN-3171: --- sorry, YARN-2163 is committed already, this is probably a different problem, forget what I said. Sort by application id doesn't work in ATS web ui - Key: YARN-3171 URL: https://issues.apache.org/jira/browse/YARN-3171 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Jeff Zhang Assignee: Naganarasimha G R Priority: Minor Attachments: ats_webui.png The order doesn't change when I click the column header -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3152) Missing hadoop exclude file fails RMs in HA
[ https://issues.apache.org/jira/browse/YARN-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316887#comment-14316887 ] Naganarasimha G R commented on YARN-3152: - [~vinodkv], [~xgong], [~neillfontes], [~rohithsharma] From the discussions so far, what I can conclude is: as per the design, if the required files are not there we need to fail fast, i.e. in a non-HA cluster we should throw an exception and the RM should fail to start. In the HA case, the transition to active should fail and none of the services should be active on failure. That is what we need to achieve as part of this JIRA. Please let me know if this approach is fine or needs more discussion. [~neillfontes], I hope you got to test with the steps I mentioned in my earlier [comment|https://issues.apache.org/jira/browse/YARN-3152?focusedCommentId=14313875page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14313875]. It seems you were able to see the same behavior I mentioned in step 2, but I wanted to know more about step 3, where I see the active services oscillating between the 2 RM servers. Is it the same behavior I mentioned, or am I missing something? Missing hadoop exclude file fails RMs in HA --- Key: YARN-3152 URL: https://issues.apache.org/jira/browse/YARN-3152 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Environment: Debian 7 Reporter: Neill Lima Assignee: Naganarasimha G R I have two NNs in HA; they do not fail when the exclude file is not present (hadoop-2.6.0/etc/hadoop/exclude). I had one RM and I wanted to make two in HA. I didn't create the exclude file at this point either. I applied the HA RM settings properly and when I started both RMs I started getting this exception: 2015-02-06 12:25:25,326 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=transitionToActiveTARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=All users are allowed 2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) ... 4 more Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file or directory) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297) ... 
5 more 2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session 2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 0x44af32566180094 closed 2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=1 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error) 2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to x.x.x.x/x.x.x.x:2181, initiating session The issue is descriptive enough to resolve the problem - and it has been fixed by creating the exclude file. I just think of it as an improvement: - Should RMs ignore the missing file as the NNs did? - Should a single RM fail even when the file is not present? Just suggesting this improvement to keep the behavior consistent when working in HA (both NNs and RMs). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3171) Sort by application id doesn't work in ATS web ui
[ https://issues.apache.org/jira/browse/YARN-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316960#comment-14316960 ] Jian He commented on YARN-3171: --- [~Naganarasimha], thanks for working on this ! YARN-2163 is probably related to this too. Sort by application id doesn't work in ATS web ui - Key: YARN-3171 URL: https://issues.apache.org/jira/browse/YARN-3171 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Jeff Zhang Assignee: Naganarasimha G R Priority: Minor Attachments: ats_webui.png The order doesn't change when I click the column header -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316980#comment-14316980 ] Jason Lowe commented on YARN-914: - Thanks for updating the doc, Junping. Additional comments: Nit: How about DECOMMISSIONING instead of DECOMMISSION_IN_PROGRESS? The design says when a node starts decommissioning we will remove its resources from the cluster, but that's not really the case, correct? We should remove its available (not total) resources from the cluster then continue to remove available resources as containers complete on that node. Failing to do so will result in weird metrics like more resources running on the cluster than the cluster says it has, etc. Are we only going to support graceful decommission via updates to the include/exclude files and refresh? Not needed for the initial cut, but thinking of a couple of use-cases and curious what others thought: * Would be convenient to have an rmadmin command that does this in one step, especially for a single node. Arguably if we are persisting cluster nodes in the state store we can migrate the list there, and the include/exclude lists simply become convenient ways to batch-update the cluster state. * Will NMs be able to request a graceful decommission via their health check script? There have been some cases in the past where it would have been nice for the NM to request a ramp-down on containers but not instantly kill all of them with an UNHEALTHY report. As for the UI changes, my initial thought is that decommissioning nodes should still show up in the active nodes list since they are still running containers. A separate decommissioning tab to filter for those nodes would be nice, although I suppose users can also just use the jquery table to sort/search for nodes in that state from the active nodes list if it's too crowded to add yet another node state tab (or maybe get rid of some effectively dead tabs like the reboot state tab). For the NM restart open question, this should no longer be an issue now that the NM is unaware of graceful decommission. All the RM needs to do is ensure that a node that is rejoining the cluster when the RM thought it was already part of it retains its previous running/decommissioning state. That way if an NM is decommissioning before the restart it will continue to decommission after it restarts. For the AM dealing with being notified of decommissioning, again I think this should just be treated like a strict preemption for the short term. IMHO all the AM needs to know is that the RM is planning on taking away those containers, and what the AM should do about it is similar whether the reason for removal is preemption or decommissioning. Back to the long running services delaying decommissioning concern, does YARN even know the difference between a long-running container and a normal container? If it doesn't, how is it supposed to know a container is not going to complete anytime soon? Even a normal container could run for many hours. It seems to me the first thing we would need before worrying about this scenario is the ability for YARN to know/predict the expected runtime of containers. There's still an open question about tracking the timeout RM side instead of NM side. Sounds like the NM side is not going to be pursued at this point, and we're going with no built-in timeout support in YARN for the short-term. 
Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du Attachments: Gracefully Decommission of NodeManager (v1).pdf, Gracefully Decommission of NodeManager (v2).pdf When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Further more, for finished map tasks, if their map output are not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
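To make the resource-accounting point in the comment above concrete, here is a rough, purely illustrative sketch (none of these names are real RM classes): when a node enters DECOMMISSIONING, only its unallocated capacity leaves the cluster total, and the remainder is released gradually as its running containers complete.
{code:java}
// Illustrative arithmetic only; the real accounting lives in the RM's cluster
// metrics and scheduler node-tracking code.
public class DecommissioningAccountingSketch {
  private long clusterMemoryMb = 100 * 1024; // pretend cluster total

  void onNodeDecommissioning(long nodeTotalMb, long nodeUsedMb) {
    // Remove only the unallocated share up front; containers still running on
    // the node keep their resources counted until they finish.
    clusterMemoryMb -= (nodeTotalMb - nodeUsedMb);
  }

  void onContainerFinished(long containerMb) {
    // Release each container's share as it completes on the decommissioning node.
    clusterMemoryMb -= containerMb;
  }

  public static void main(String[] args) {
    DecommissioningAccountingSketch metrics = new DecommissioningAccountingSketch();
    metrics.onNodeDecommissioning(8 * 1024, 6 * 1024); // 2 GB unallocated leaves now
    metrics.onContainerFinished(2 * 1024);
    metrics.onContainerFinished(4 * 1024);
    System.out.println("cluster memory now " + metrics.clusterMemoryMb + " MB");
  }
}
{code}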
[jira] [Commented] (YARN-3167) implement the core functionality of the base aggregator service
[ https://issues.apache.org/jira/browse/YARN-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317032#comment-14317032 ] Vrushali C commented on YARN-3167: -- Related: Attached on YARN-3031 a sequence diagram that reflects the interactions for writing between the AM, the base aggregator service, the timeline service writer api and backend store. https://issues.apache.org/jira/secure/attachment/12698191/Sequence_diagram_write_interaction.png implement the core functionality of the base aggregator service --- Key: YARN-3167 URL: https://issues.apache.org/jira/browse/YARN-3167 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee The basic skeleton of the timeline aggregator has been set up by YARN-3030. We need to implement the core functionality of the base aggregator service. The key things include - handling the requests from clients (sync or async) - buffering data - handling the aggregation logic - invoking the storage API -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3178) Clean up stack trace on the client when RM fails over
Arpit Gupta created YARN-3178: - Summary: Clean up stack trace on the client when RM fails over Key: YARN-3178 URL: https://issues.apache.org/jira/browse/YARN-3178 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Arpit Gupta Priority: Minor When the client fails over it spits out a stack trace. It will be good to clean that up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3031) create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3031: - Attachment: Sequence_diagram_write_interaction.png Attaching a sequence diagram that reflects the interactions for writing between the AM, the base aggregator service, the timeline service writer api and backend store. create backing storage write interface for ATS writers -- Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.png, YARN-3031.01.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3031) create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3031: - Attachment: YARN-3031.01.patch create backing storage write interface for ATS writers -- Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: YARN-3031.01.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14316981#comment-14316981 ] Bartosz Ługowski commented on YARN-1621: Thanks for the comments, [~vinodkv] and [~Naganarasimha]. [~vinodkv], I can't rename it to list-containers, because the options parser disallows the '-' char. [~Naganarasimha], I think it is a good idea to move it to yarn container -list, so we can easily add additional filters in only one place. Add CLI to list rows of task attempt ID, container ID, host of container, state of container -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Assignee: Bartosz Ługowski Fix For: 2.7.0 Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch As more applications are moved to YARN, we need a generic CLI to list rows of task attempt ID, container ID, host of container, state of container. Today if a YARN application running in a container hangs, there is no way to find out more info because a user does not know where each attempt is running. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers. {code:title=proposed yarn cli} $ yarn application -list-containers -applicationId appId [-containerState state of container] where containerState is an optional filter to list containers in the given state only. container state can be running/succeeded/killed/failed/all. A user can specify more than one container state at once e.g. KILLED,FAILED. task attempt ID container ID host of container state of container {code} CLI should work with running and completed applications. If a container runs many task attempts, all attempts should be shown. That will likely be the case for Tez container-reuse applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3178) Clean up stack trace on the client when RM fails over
[ https://issues.apache.org/jira/browse/YARN-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317034#comment-14317034 ] Arpit Gupta commented on YARN-3178: --- Here is an example stack trace {code} 14/02/25 17:40:23 WARN retry.RetryInvocationHandler: Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport. Not retrying because the invoked method is not idempotent, and unable to determine whether it was invoked java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: host/ip; destination host is: host:8032; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy11.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:142) at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:268) at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:294) at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:152) at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:319) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419) at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:531) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311) at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:599) at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1344) at org.apache.hadoop.mapred.JobClient$NetworkedJob.monitorAndPrintJob(JobClient.java:407) at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:855) at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:1018) at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:135) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: java.io.EOFException at 
java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1050) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:945) 14/02/25 17:40:23 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 14/02/25 17:40:23 INFO mapreduce.Job: map 14% reduce 0% 14/02/25 17:40:24 WARN retry.RetryInvocationHandler: Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport. Not retrying because the invoked method is not idempotent, and unable to determine whether it was invoked java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: host/ip; destination host is: host:8032; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at
[jira] [Updated] (YARN-2764) counters.LimitExceededException shouldn't abort AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated YARN-2764: - Labels: counters (was: ) counters.LimitExceededException shouldn't abort AsyncDispatcher --- Key: YARN-2764 URL: https://issues.apache.org/jira/browse/YARN-2764 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Ted Yu Labels: counters I saw the following in container log: {code} 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attemptattempt_1414221548789_0023_r_03_0 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1414221548789_0023_r_03 Task Transitioned from RUNNING to SUCCEEDED 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 24 2014-10-25 10:28:55,053 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1414221548789_0023Job Transitioned from RUNNING to COMMITTING 2014-10-25 10:28:55,054 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT 2014-10-25 10:28:55,177 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120 at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:101) at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:106) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:203) at org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:348) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1754) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1737) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1718) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:1089) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2049) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2045) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289) at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-10-25 10:28:55,185 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. {code} Counter limit was exceeded when JobFinishedEvent was created. Better handling of LimitExceededException should be provided so that AsyncDispatcher can continue functioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
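A hedged sketch of the "keep functioning" behavior requested above; this is not the actual AsyncDispatcher code, just a generic dispatch loop showing how a per-event RuntimeException such as LimitExceededException could be logged and skipped rather than treated as fatal for the whole dispatcher thread.
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Generic illustration only; names do not correspond to the real YARN classes.
public class TolerantDispatcherSketch {
  interface Event { String type(); }
  interface EventHandler { void handle(Event event); }

  private final BlockingQueue<Event> eventQueue = new LinkedBlockingQueue<>();
  private final EventHandler handler;
  private volatile boolean stopped = false;

  TolerantDispatcherSketch(EventHandler handler) { this.handler = handler; }

  void post(Event event) { eventQueue.add(event); }

  void stop() { stopped = true; }

  void dispatchLoop() throws InterruptedException {
    while (!stopped) {
      Event event = eventQueue.take();
      try {
        handler.handle(event);
      } catch (RuntimeException e) {
        // e.g. counters.LimitExceededException while building JobFinishedEvent:
        // log the failure and move on instead of exiting the dispatcher thread.
        System.err.println("Error dispatching " + event.type() + ": " + e);
      }
    }
  }
}
{code}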
[jira] [Updated] (YARN-3031) create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3031: - Attachment: (was: YARN-3031.01.patch) create backing storage write interface for ATS writers -- Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3178) Clean up stack trace on the client when RM fails over
[ https://issues.apache.org/jira/browse/YARN-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3178: -- Assignee: Varun Saxena Clean up stack trace on the client when RM fails over - Key: YARN-3178 URL: https://issues.apache.org/jira/browse/YARN-3178 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Arpit Gupta Assignee: Varun Saxena Priority: Minor When the client fails over it spits out a stack trace. It will be good to clean that up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3036) [Storage implementation] Create standalone HBase backing storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3036: -- Summary: [Storage implementation] Create standalone HBase backing storage implementation for ATS writes (was: create standalone HBase backing storage implementation for ATS writes) [Storage implementation] Create standalone HBase backing storage implementation for ATS writes -- Key: YARN-3036 URL: https://issues.apache.org/jira/browse/YARN-3036 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Per design in YARN-2928, create a (default) standalone HBase backing storage implementation for ATS writes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3040) [Data Model] Implement client-side API for handling flows
[ https://issues.apache.org/jira/browse/YARN-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3040: -- Summary: [Data Model] Implement client-side API for handling flows (was: [Data ] Implement client-side API for handling flows) [Data Model] Implement client-side API for handling flows - Key: YARN-3040 URL: https://issues.apache.org/jira/browse/YARN-3040 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Per design in YARN-2928, implement client-side API for handling *flows*. Frameworks should be able to define and pass in all attributes of flows and flow runs to YARN, and they should be passed into ATS writers. YARN tags were discussed as a way to handle this piece of information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3042) [Data Model] Create ATS metrics API
[ https://issues.apache.org/jira/browse/YARN-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3042: -- Summary: [Data Model] Create ATS metrics API (was: create ATS metrics API) [Data Model] Create ATS metrics API --- Key: YARN-3042 URL: https://issues.apache.org/jira/browse/YARN-3042 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Siddharth Wagle Per design in YARN-2928, create the ATS metrics API and integrate it into the entities. The concept may be based on the existing hadoop metrics, but we want to make sure we have something that would satisfy all ATS use cases. It also needs to capture whether a metric should be aggregated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3047) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3047: -- Summary: [Data Serving] Set up ATS reader with basic request serving structure and lifecycle (was: set up ATS reader with basic request serving structure and lifecycle) [Data Serving] Set up ATS reader with basic request serving structure and lifecycle --- Key: YARN-3047 URL: https://issues.apache.org/jira/browse/YARN-3047 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Attachments: YARN-3047.001.patch Per design in YARN-2938, set up the ATS reader as a service and implement the basic structure as a service. It includes lifecycle management, request serving, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3048) [Data Serving] Handle how to set up and start/stop ATS reader instances
[ https://issues.apache.org/jira/browse/YARN-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3048: -- Summary: [Data Serving] Handle how to set up and start/stop ATS reader instances (was: handle how to set up and start/stop ATS reader instances) [Data Serving] Handle how to set up and start/stop ATS reader instances --- Key: YARN-3048 URL: https://issues.apache.org/jira/browse/YARN-3048 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Per design in YARN-2928, come up with a way to set up and start/stop ATS reader instances. This should allow setting up multiple instances and managing user traffic to those instances. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3051: -- Summary: [Storage abstraction] Create backing storage read interface for ATS readers (was: create backing storage read interface for ATS readers) [Storage abstraction] Create backing storage read interface for ATS readers --- Key: YARN-3051 URL: https://issues.apache.org/jira/browse/YARN-3051 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Per design in YARN-2928, create backing storage read interface that can be implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3181) FairScheduler: Fix up outdated findbugs issues
[ https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317180#comment-14317180 ] Robert Kanter commented on YARN-3181: - LGTM +1 pending Jenkins. Cleaning out findbugs is always a good thing. FairScheduler: Fix up outdated findbugs issues -- Key: YARN-3181 URL: https://issues.apache.org/jira/browse/YARN-3181 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-3181-1.patch In FairScheduler, we have excluded some findbugs-reported errors. Some of them aren't applicable anymore, and there are a few that can be easily fixed without needing an exclusion. It would be nice to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3050) [Data Serving] Implement new flow-based ATS queries in the new ATS design
[ https://issues.apache.org/jira/browse/YARN-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3050: -- Summary: [Data Serving] Implement new flow-based ATS queries in the new ATS design (was: implement new flow-based ATS queries in the new ATS design) [Data Serving] Implement new flow-based ATS queries in the new ATS design - Key: YARN-3050 URL: https://issues.apache.org/jira/browse/YARN-3050 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Flow based queries.docx Implement new flow-based ATS queries in the new ATS design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3052) [Data Serving] Provide a very simple POC html ATS UI
[ https://issues.apache.org/jira/browse/YARN-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3052: -- Summary: [Data Serving] Provide a very simple POC html ATS UI (was: provide a very simple POC html ATS UI) [Data Serving] Provide a very simple POC html ATS UI Key: YARN-3052 URL: https://issues.apache.org/jira/browse/YARN-3052 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee As part of accomplishing a minimum viable product, we want to be able to show some UI in html (however crude it is). This subtask calls for creating a barebones UI to do that. This should be replaced later with a better-designed and implemented proper UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3049) [Compatibility] Implement existing ATS queries in the new ATS design
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3049: -- Summary: [Compatibility] Implement existing ATS queries in the new ATS design (was: implement existing ATS queries in the new ATS design) [Compatibility] Implement existing ATS queries in the new ATS design --- Key: YARN-3049 URL: https://issues.apache.org/jira/browse/YARN-3049 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3150) [Documentation] Documenting the timeline service v2
[ https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3150: -- Summary: [Documentation] Documenting the timeline service v2 (was: Documenting the timeline service v2) [Documentation] Documenting the timeline service v2 --- Key: YARN-3150 URL: https://issues.apache.org/jira/browse/YARN-3150 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Let's make sure we will have a document to describe what's new in TS v2, the APIs, the client libs and so on. We should do better around documentation in v2 than v1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3166) [Source organization] Decide detailed package structures for timeline service v2 components
[ https://issues.apache.org/jira/browse/YARN-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3166: -- Summary: [Source organization] Decide detailed package structures for timeline service v2 components (was: Decide detailed package structures for timeline service v2 components) [Source organization] Decide detailed package structures for timeline service v2 components --- Key: YARN-3166 URL: https://issues.apache.org/jira/browse/YARN-3166 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu Open this JIRA to track all discussions on detailed package structures for timeline services v2. This JIRA is for discussion only. For our current timeline service v2 design, aggregator (previously called writer) implementation is in hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.aggregator}} In YARN-2928's design, the next gen ATS reader is also a server. Maybe we want to put reader related implementations into hadoop-yarn-server's: {{org.apache.hadoop.yarn.server.timelineservice.reader}} Both readers and aggregators will expose features that may be used by YARN and other 3rd party components, such as aggregator/reader APIs. For those features, maybe we would like to expose their interfaces to hadoop-yarn-common's {{org.apache.hadoop.yarn.timelineservice}}? Let's use this JIRA as a centralized place to track all related discussions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3171) Sort by application id doesn't work in ATS web ui
[ https://issues.apache.org/jira/browse/YARN-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317252#comment-14317252 ] Zhijie Shen commented on YARN-3171: --- Does YARN-2766 not fix the problem? Sort by application id doesn't work in ATS web ui - Key: YARN-3171 URL: https://issues.apache.org/jira/browse/YARN-3171 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Jeff Zhang Assignee: Naganarasimha G R Priority: Minor Attachments: ats_webui.png The order doesn't change when I click the column header -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3031) create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317122#comment-14317122 ] Hadoop QA commented on YARN-3031: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12698194/YARN-3031.01.patch against trunk revision 50625e6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6598//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6598//console This message is automatically generated. create backing storage write interface for ATS writers -- Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.png, YARN-3031.01.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3180) container-executor gets SEGV for default banned user
[ https://issues.apache.org/jira/browse/YARN-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317142#comment-14317142 ] Olaf Flebbe commented on YARN-3180: --- Seems like it. The fix to the logic is the same. The test is different: I test the exact environment, while the other test covers much more than simply the check_user() call. The patch does not apply on git trunk. Please apply either. container-executor gets SEGV for default banned user Key: YARN-3180 URL: https://issues.apache.org/jira/browse/YARN-3180 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.1, 2.6.1 Reporter: Olaf Flebbe Attachments: 0001-YARN-3180-container-executor-gets-SEGV-for-default-b.patch container-executor dumps core if container-executor.cfg * Does not contain a banned.users statement, getting the default in effect * The banned user id is above min.user.id * The user is contained in the default banned.user and yes, this did happen to me. Patch and test appended (relative to git trunk) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3124) Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label
[ https://issues.apache.org/jira/browse/YARN-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3124: - Attachment: YARN-3124.5.patch Thanks for the comments, [~jianhe]. Attached new patch (ver.5) Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label Key: YARN-3124 URL: https://issues.apache.org/jira/browse/YARN-3124 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3124.1.patch, YARN-3124.2.patch, YARN-3124.3.patch, YARN-3124.4.patch, YARN-3124.5.patch After YARN-3098, capacities-by-label (including used-capacity/maximum-capacity/absolute-maximum-capacity, etc.) should be tracked in QueueCapacities. This patch is targeting to make capacities-by-label in CS Queues all be tracked by QueueCapacities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3031: -- Summary: [Storage abstraction] Create backing storage write interface for ATS writers (was: create backing storage write interface for ATS writers) [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.png, YARN-3031.01.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3033) [Aggregator wireup] Implement NM starting the ATS writer companion
[ https://issues.apache.org/jira/browse/YARN-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3033: -- Summary: [Aggregator wireup] Implement NM starting the ATS writer companion (was: implement NM starting the ATS writer companion) [Aggregator wireup] Implement NM starting the ATS writer companion -- Key: YARN-3033 URL: https://issues.apache.org/jira/browse/YARN-3033 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Per design in YARN-2928, implement node managers starting the ATS writer companion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3035) [Storage implementation] Create a test-only backing storage implementation for ATS writes
[ https://issues.apache.org/jira/browse/YARN-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3035: -- Summary: [Storage implementation] Create a test-only backing storage implementation for ATS writes (was: create a test-only backing storage implementation for ATS writes) [Storage implementation] Create a test-only backing storage implementation for ATS writes - Key: YARN-3035 URL: https://issues.apache.org/jira/browse/YARN-3035 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Per design in YARN-2928, create a test-only bare bone backing storage implementation for ATS writes. We could consider something like a no-op or in-memory storage strictly for development and testing purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3043) [Data Model] Create ATS configuration, metadata, etc. as part of entities
[ https://issues.apache.org/jira/browse/YARN-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3043: -- Summary: [Data Model] Create ATS configuration, metadata, etc. as part of entities (was: create ATS configuration, metadata, etc. as part of entities) [Data Model] Create ATS configuration, metadata, etc. as part of entities - Key: YARN-3043 URL: https://issues.apache.org/jira/browse/YARN-3043 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Per design in YARN-2928, create APIs for configuration, metadata, etc. and integrate them into entities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3087: -- Summary: [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager (was: the REST server (web server) for per-node aggregator does not work if it runs inside node manager) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Devaraj K This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3115) [Aggregator wireup] Work-preserving restarting of per-node aggregator
[ https://issues.apache.org/jira/browse/YARN-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3115: -- Summary: [Aggregator wireup] Work-preserving restarting of per-node aggregator (was: Work-preserving restarting of per-node aggregator) [Aggregator wireup] Work-preserving restarting of per-node aggregator - Key: YARN-3115 URL: https://issues.apache.org/jira/browse/YARN-3115 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen YARN-3030 makes the per-node aggregator run as an aux service of the NM. It holds the state of the per-app aggregators corresponding to the AM containers running on this NM. When the NM is restarted in work-preserving mode, this per-node aggregator state needs to be carried over across the restart as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
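As a loose sketch of the state that would need to survive a work-preserving restart (illustrative only; a real implementation would more likely use the NM state store than a flat file): record which applications currently have per-app aggregators so the bindings can be re-created after the NM comes back.
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper: persist the set of application ids that have per-app
 * aggregators on this node, and read it back after a work-preserving restart.
 */
public class AggregatorRecoveryState {
  private final Path stateFile;

  public AggregatorRecoveryState(String recoveryDir) {
    this.stateFile = Paths.get(recoveryDir, "per-app-aggregators");
  }

  /** Save the current app-id set, one id per line. */
  public void save(Set<String> appIds) throws IOException {
    Files.write(stateFile, appIds, StandardCharsets.UTF_8);
  }

  /** Load the app-id set on restart; empty if nothing was persisted. */
  public Set<String> load() throws IOException {
    if (!Files.exists(stateFile)) {
      return new HashSet<>();
    }
    return new HashSet<>(Files.readAllLines(stateFile, StandardCharsets.UTF_8));
  }
}
{code}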
[jira] [Updated] (YARN-3053) [Security] Review and implement for property security in ATS v.2
[ https://issues.apache.org/jira/browse/YARN-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3053: -- Summary: [Security] Review and implement for property security in ATS v.2 (was: review and implement for property security in ATS v.2) [Security] Review and implement for property security in ATS v.2 Key: YARN-3053 URL: https://issues.apache.org/jira/browse/YARN-3053 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Per design in YARN-2928, we want to evaluate and review the system for security, and ensure proper security in the system. This includes proper authentication, token management, access control, and any other relevant security aspects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3153) Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio
[ https://issues.apache.org/jira/browse/YARN-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317255#comment-14317255 ] Wangda Tan commented on YARN-3153: -- Since most of the capacity-related configs range over \[0, 100\], maximum-am-resource-percent should be part of the capacity settings, like queue capacity and queue maximum-capacity. So I propose the following configs. Global configuration: {{yarn.scheduler.capacity.maximum-am-capacity-per-queue}}, default 10 (10%). Queue configuration: {{yarn.scheduler.capacity.queue-path.maximum-am-capacity}}. And to avoid confusion, we should deprecate {{yarn.scheduler.capacity.maximum-am-resource-percent}} and {{yarn.scheduler.capacity.queue-path.maximum-am-resource-percent}}. In addition, maximum-am-capacity is inheritable per queue: when an admin sets a max-am value on a parent queue, a leaf queue inherits it unless it sets its own. Sounds like a plan? Capacity Scheduler max AM resource limit for queues is defined as percentage but used as ratio -- Key: YARN-3153 URL: https://issues.apache.org/jira/browse/YARN-3153 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical The existing Capacity Scheduler can limit the maximum number of applications running within a queue. The config is yarn.scheduler.capacity.maximum-am-resource-percent, but it is actually used as a ratio: the implementation assumes the input is in \[0,1\]. So a user can currently specify it up to 100, which lets AMs use 100x of the queue capacity. We should fix that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
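For context on the bug itself, the sketch below shows the mismatch: the property name says "percent" but the scheduler consumes the value as a ratio in [0, 1]. The normalizing helper is purely hypothetical and is not the fix proposed above.
{code}
/**
 * Hypothetical helper illustrating the percent-vs-ratio mismatch: a configured
 * value of 100 is meant as 100% but, read as a ratio, means 100x the queue
 * capacity. This normalizes either convention into a ratio and clamps it.
 */
public final class AmLimitNormalizer {
  private AmLimitNormalizer() {}

  public static float toRatio(float configured) {
    // Treat values above 1 as percentages (e.g. 10 -> 0.10) and clamp to [0, 1].
    float ratio = configured > 1.0f ? configured / 100.0f : configured;
    return Math.max(0.0f, Math.min(1.0f, ratio));
  }

  public static void main(String[] args) {
    System.out.println(toRatio(0.1f)); // 0.1
    System.out.println(toRatio(10f));  // 0.1
    System.out.println(toRatio(150f)); // 1.0 (clamped)
  }
}
{code}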
[jira] [Updated] (YARN-3183) Some classes define hashcode() but not equals()
[ https://issues.apache.org/jira/browse/YARN-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-3183: Attachment: YARN-3183.patch The patch adds {{equals}} methods that use the same variable that was used in {{hashCode}}. It also removes the unnecessary {{equals}} method. Some classes define hashcode() but not equals() --- Key: YARN-3183 URL: https://issues.apache.org/jira/browse/YARN-3183 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Priority: Minor Attachments: YARN-3183.patch These files all define {{hashCode}}, but don't define {{equals}}: {noformat} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/WritingApplicationAttemptFinishEvent.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/WritingApplicationAttemptStartEvent.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/WritingApplicationFinishEvent.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/WritingApplicationStartEvent.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/WritingContainerFinishEvent.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/WritingContainerStartEvent.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/AppAttemptFinishedEvent.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/AppAttemptRegisteredEvent.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationFinishedEvent.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerCreatedEvent.java hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ContainerFinishedEvent.java {noformat} This one unnecessarily defines {{equals}}: {noformat} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceRetentionSet.java {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
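The pattern being fixed is the standard equals/hashCode contract: equals should compare exactly the state that hashCode hashes. A minimal illustration (class and field names are hypothetical, not taken from the patch):
{code}
/**
 * Illustration only: equals() compares the same field that hashCode() hashes,
 * so equal objects always produce equal hash codes.
 */
public class ExampleEvent {
  private final long timestamp;

  public ExampleEvent(long timestamp) {
    this.timestamp = timestamp;
  }

  @Override
  public int hashCode() {
    return Long.hashCode(timestamp);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof ExampleEvent)) {
      return false;
    }
    return timestamp == ((ExampleEvent) o).timestamp;
  }
}
{code}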
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317048#comment-14317048 ] Wangda Tan commented on YARN-2495: -- Hi [~cwelch], Thanks for jumping in and providing your thoughts, and really sorry for the late response. I think your biggest concern is about DECENTRALIZED_CONFIGURATION_ENABLED, so let me explain my thinking :) IMHO, mixing decentralized and centralized configuration is dangerous and will cause non-deterministic results. You may think about merging them together, such as having some labels set by the admin using RMAdminCLI and others set by the NM. But I can give you an example showing it is still non-deterministic even if we have +/- for the ResourceTracker protocol: - Assume a node has labels x,y (reported +x,+y) - RMAdmin removes y from the node (-y) - The NM fails, then restarts, and reports it has x,y (+x, +y). What should the labels on the node be? I also don't like adding too many switches in configuration, but it seems a good way to support both with deterministic behavior. As for your other suggestions, - Name changes is-are, - Make RegisterNodeManagerRequest consistent with NodeHeartbeatRequest I agree with both. One more suggestion (as suggested by [~vinodkv]): when there's anything wrong with a node label reported from the NM, we should fail the NM (ask it to shut down and give it a proper diagnostic message). This is because if the NM reports a label that gets rejected, even if the RM tells the NM about it, the NM cannot handle it properly except by printing some error messages (we don't have smarter logic now), which will lead to problems in debugging (an NM reported some label to the RM but the scheduler failed to allocate containers on that NM). To avoid this, a simple way is to shut down the NM so the admin can take a look at what happened. Thoughts? Wangda Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow the admin to specify labels on each NM. This covers: - The user can set labels on each NM (by setting yarn-site.xml (YARN-2923) or using the script suggested by [~aw] (YARN-2729)) - The NM will send labels to the RM via the ResourceTracker API - The RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3181) FairScheduler: Fix up outdated findbugs issues
Karthik Kambatla created YARN-3181: -- Summary: FairScheduler: Fix up outdated findbugs issues Key: YARN-3181 URL: https://issues.apache.org/jira/browse/YARN-3181 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla In FairScheduler, we have excluded some findbugs-reported errors. Some of them aren't applicable anymore, and there are a few that can be easily fixed without needing an exclusion. It would be nice to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2079) Recover NonAggregatingLogHandler state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2079: - Attachment: YARN-2079.002.patch Rebased patch for trunk. [~djp] could you take a look? It would be nice to get this into 2.7. Recover NonAggregatingLogHandler state upon nodemanager restart --- Key: YARN-2079 URL: https://issues.apache.org/jira/browse/YARN-2079 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2079.002.patch, YARN-2079.patch The state of NonAggregatingLogHandler needs to be persisted so logs are properly deleted across a nodemanager restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
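As a rough sketch of what "recovering log handler state" amounts to (illustrative only, not the approach in the attached patch): persist which applications still have log deletions pending and when, and re-schedule those deletions after the nodemanager restarts.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: track pending log deletions (appId -> absolute deletion
 * time) so they can be re-scheduled after an NM restart instead of being lost.
 */
public class LogDeletionRecoverySketch {
  private final ScheduledExecutorService sched = Executors.newScheduledThreadPool(1);

  /** appId -> time (ms since epoch) at which the app's local logs should be deleted. */
  private final Map<String, Long> pendingDeletions = new ConcurrentHashMap<>();

  public void scheduleDeletion(String appId, long deleteAtMillis) {
    pendingDeletions.put(appId, deleteAtMillis); // would also be written to the NM state store
    schedule(appId, deleteAtMillis);
  }

  /** Called after restart with state read back from persistent storage. */
  public void recover(Map<String, Long> persisted) {
    pendingDeletions.putAll(persisted);
    for (Map.Entry<String, Long> e : persisted.entrySet()) {
      schedule(e.getKey(), e.getValue());
    }
  }

  private void schedule(final String appId, long deleteAtMillis) {
    long delay = Math.max(0, deleteAtMillis - System.currentTimeMillis());
    sched.schedule(() -> {
      pendingDeletions.remove(appId); // would also be removed from the state store
      deleteLogs(appId);
    }, delay, TimeUnit.MILLISECONDS);
  }

  private void deleteLogs(String appId) {
    System.out.println("deleting local logs for " + appId);
  }
}
{code}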
[jira] [Updated] (YARN-2079) Recover NonAggregatingLogHandler state upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2079: - Target Version/s: 2.7.0 (was: 2.6.0) Recover NonAggregatingLogHandler state upon nodemanager restart --- Key: YARN-2079 URL: https://issues.apache.org/jira/browse/YARN-2079 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2079.002.patch, YARN-2079.patch The state of NonAggregatingLogHandler needs to be persisted so logs are properly deleted across a nodemanager restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3181) FairScheduler: Fix up outdated findbugs issues
[ https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3181: --- Attachment: yarn-3181-1.patch FairScheduler: Fix up outdated findbugs issues -- Key: YARN-3181 URL: https://issues.apache.org/jira/browse/YARN-3181 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-3181-1.patch In FairScheduler, we have excluded some findbugs-reported errors. Some of them aren't applicable anymore, and there are a few that can be easily fixed without needing an exclusion. It would be nice to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3038) [Aggregator wireup] Handle ATS writer failure scenarios
[ https://issues.apache.org/jira/browse/YARN-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3038: -- Summary: [Aggregator wireup] Handle ATS writer failure scenarios (was: handle ATS writer failure scenarios) [Aggregator wireup] Handle ATS writer failure scenarios --- Key: YARN-3038 URL: https://issues.apache.org/jira/browse/YARN-3038 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Per design in YARN-2928, consider various ATS writer failure scenarios, and implement proper handling. For example, ATS writers may fail and exit due to OOM. It should be retried a certain number of times in that case. We also need to tie fatal ATS writer failures (after exhausting all retries) to the application failure, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
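A minimal sketch of the bounded-retry behaviour described above; the retry count, backoff, and interface are illustrative, not the ones this JIRA will choose.
{code}
import java.io.IOException;

/**
 * Sketch: retry a failed ATS write a bounded number of times with simple
 * backoff, then surface a fatal failure the caller can tie to app failure.
 */
public final class RetryingWrite {
  private RetryingWrite() {}

  /** Hypothetical write operation that may fail transiently. */
  interface Write {
    void run() throws IOException;
  }

  public static void withRetries(Write write, int maxAttempts, long backoffMs)
      throws IOException, InterruptedException {
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        write.run();
        return;                   // success
      } catch (IOException e) {
        last = e;                 // transient failure: back off, then retry
        Thread.sleep(backoffMs * attempt);
      }
    }
    throw new IOException("ATS write failed after " + maxAttempts + " attempts", last);
  }
}
{code}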
[jira] [Updated] (YARN-3125) [Event producers] Change distributed shell to use new timeline service
[ https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3125: -- Summary: [Event producers] Change distributed shell to use new timeline service (was: Change distributed shell to use new timeline service) [Event producers] Change distributed shell to use new timeline service -- Key: YARN-3125 URL: https://issues.apache.org/jira/browse/YARN-3125 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen We can start with changing distributed shell to use new timeline service once the framework is completed, in which way we can quickly verify the next gen is working fine end-to-end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3134: -- Summary: [Storage implementation] Exploiting the option of using Phoenix to access HBase backend (was: Exploiting the option of using Phoenix to access HBase backend) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Quoting the introduction from the Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simplify how our implementation reads/writes data from/to HBase, and make it easy to build indexes and compose complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
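Since Phoenix is exposed as a JDBC driver, reads and writes from the timeline storage layer would look like ordinary SQL. A small sketch follows; the table and column names are hypothetical and the table is assumed to already exist.
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/**
 * Sketch of SQL-over-HBase access through the Phoenix JDBC driver.
 * "localhost" stands in for the ZooKeeper quorum of the HBase cluster.
 */
public class PhoenixAccessExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
      // Phoenix writes use UPSERT semantics.
      try (PreparedStatement upsert = conn.prepareStatement(
          "UPSERT INTO timeline_entity (entity_id, created_time) VALUES (?, ?)")) {
        upsert.setString(1, "app_0001");
        upsert.setLong(2, System.currentTimeMillis());
        upsert.executeUpdate();
      }
      conn.commit(); // Phoenix connections are not auto-commit by default

      try (PreparedStatement query = conn.prepareStatement(
          "SELECT entity_id, created_time FROM timeline_entity WHERE entity_id = ?")) {
        query.setString(1, "app_0001");
        try (ResultSet rs = query.executeQuery()) {
          while (rs.next()) {
            System.out.println(rs.getString(1) + " " + rs.getLong(2));
          }
        }
      }
    }
  }
}
{code}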
[jira] [Updated] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-3168: --- Assignee: Gururaj Shetty Convert site documentation from apt to markdown --- Key: YARN-3168 URL: https://issues.apache.org/jira/browse/YARN-3168 Project: Hadoop YARN Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Gururaj Shetty Attachments: YARN-3168-00.patch YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3179) Update use of Iterator to Iterable
Ray Chiang created YARN-3179: Summary: Update use of Iterator to Iterable Key: YARN-3179 URL: https://issues.apache.org/jira/browse/YARN-3179 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Found these using the IntelliJ Findbugs-IDEA plugin, which uses findbugs3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
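A small illustration of the kind of change involved (names are hypothetical): returning Iterable instead of Iterator lets callers use the enhanced for loop and iterate more than once.
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Illustration only: prefer exposing Iterable over Iterator in getters. */
public class ContainerList {
  private final List<String> containerIds = new ArrayList<>();

  // Before: callers had to drive the Iterator manually.
  public Iterator<String> getContainerIterator() {
    return containerIds.iterator();
  }

  // After: callers can write "for (String id : list.getContainers())".
  public Iterable<String> getContainers() {
    return containerIds;
  }
}
{code}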
[jira] [Commented] (YARN-3124) Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label
[ https://issues.apache.org/jira/browse/YARN-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317126#comment-14317126 ] Jian He commented on YARN-3124: --- remove unused QueueCapacities#reinitialize AbstractCSQueue#setupCapacities - setupConfiguredCapacities {{so we shouldn't do this for reservation queue }}, mind clarifying more? minor format {code} super.setupQueueConfigs(clusterResource); StringBuilder aclsString = new StringBuilder(); public synchronized void reinitialize( CSQueue newlyParsedQueue, Resource clusterResource) throws IOException { {code} Capacity Scheduler LeafQueue/ParentQueue should use QueueCapacities to track capacities-by-label Key: YARN-3124 URL: https://issues.apache.org/jira/browse/YARN-3124 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3124.1.patch, YARN-3124.2.patch, YARN-3124.3.patch, YARN-3124.4.patch After YARN-3098, capacities-by-label (include used-capacity/maximum-capacity/absolute-maximum-capacity, etc.) should be tracked in QueueCapacities. This patch is targeting to make capacities-by-label in CS Queues are all tracked by QueueCapacities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3034) [Aggregator wireup] Implement RM starting its ATS writer
[ https://issues.apache.org/jira/browse/YARN-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3034: -- Summary: [Aggregator wireup] Implement RM starting its ATS writer (was: implement RM starting its ATS writer) [Aggregator wireup] Implement RM starting its ATS writer Key: YARN-3034 URL: https://issues.apache.org/jira/browse/YARN-3034 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3034.20150205-1.patch Per design in YARN-2928, implement resource managers starting their own ATS writers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3164) rmadmin command usage prints incorrect command name
[ https://issues.apache.org/jira/browse/YARN-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14317162#comment-14317162 ] Wangda Tan commented on YARN-3164: -- In addition, [~bibinchundatt], could you replace tabs in your patch to spaces? rmadmin command usage prints incorrect command name --- Key: YARN-3164 URL: https://issues.apache.org/jira/browse/YARN-3164 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: YARN-3164.1.patch /hadoop/bin{color:red} ./yarn rmadmin -transitionToActive {color} transitionToActive: incorrect number of arguments Usage:{color:red} HAAdmin {color} [-transitionToActive serviceId [--forceactive]] {color:red} ./yarn HAAdmin {color} Error: Could not find or load main class HAAdmin Expected it should be rmadmin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3041) [Data Model] create the ATS entity/event API
[ https://issues.apache.org/jira/browse/YARN-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3041: -- Summary: [Data Model] create the ATS entity/event API (was: [API] create the ATS entity/event API) [Data Model] create the ATS entity/event API Key: YARN-3041 URL: https://issues.apache.org/jira/browse/YARN-3041 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Robert Kanter Attachments: YARN-3041.preliminary.001.patch Per design in YARN-2928, create the ATS entity and events API. Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, flow, flow run, YARN app, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)