[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555691#comment-14555691 ] Karthik Kambatla commented on YARN-3655: I would like to take a look at the patch as well. FairScheduler: potential livelock due to maxAMShare limitation and container reservation - Key: YARN-3655 URL: https://issues.apache.org/jira/browse/YARN-3655 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3655.000.patch, YARN-3655.001.patch, YARN-3655.002.patch FairScheduler: potential livelock due to maxAMShare limitation and container reservation. If a node is reserved by an application, all the other applications don't have any chance to assign a new container on this node, unless the application which reserves the node assigns a new container on this node or releases the reserved container on this node. The problem is that if an application tries to call assignReservedContainer and fails to get a new container due to the maxAMShare limitation, it will block all other applications from using the nodes it reserves. If all the other running applications can't release their AM containers because they are blocked by these reserved containers, a livelock can happen. The following is the code at FSAppAttempt#assignContainer which can cause this potential livelock. {code} // Check the AM resource usage for the leaf queue if (!isAmRunning() && !getUnmanagedAM()) { List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests(); if (ask.isEmpty() || !getQueue().canRunAppAM( ask.get(0).getCapability())) { if (LOG.isDebugEnabled()) { LOG.debug("Skipping allocation because maxAMShare limit would " + "be exceeded"); } return Resources.none(); } } {code} To fix this issue, we can unreserve the node if we can't allocate the AM container on the node due to the maxAMShare limitation and the node is reserved by the application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
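A minimal sketch of the proposed unreserve fix, in the style of the snippet above. The reservation helpers (node.getReservedContainer(), getReservedPriority(), unreserve(...)) are assumptions about the FSAppAttempt/FSSchedulerNode internals, not the committed patch:

{code}
// Sketch only: when the maxAMShare check blocks an AM allocation, also
// release any reservation this attempt holds on the node, so that other
// applications can allocate there and the livelock cannot form.
if (ask.isEmpty() || !getQueue().canRunAppAM(ask.get(0).getCapability())) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Skipping allocation because maxAMShare limit would " +
        "be exceeded");
  }
  RMContainer reserved = node.getReservedContainer(); // assumed helper
  if (reserved != null && reserved.getApplicationAttemptId()
      .equals(getApplicationAttemptId())) {
    unreserve(reserved.getReservedPriority(), node); // assumed helper
  }
  return Resources.none();
}
{code}

Under these assumptions the fix targets exactly what the description identifies: it is the lingering reservation, not the failed allocation attempt itself, that starves the other applications.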
[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams
[ https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555810#comment-14555810 ] Lars Francke commented on YARN-3594: I don't think an additional test is needed as it should already be covered by existing tests. WintuilsProcessStubExecutor.startStreamReader leaks streams --- Key: YARN-3594 URL: https://issues.apache.org/jira/browse/YARN-3594 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Lars Francke Priority: Trivial Labels: newbie Attachments: YARN-3594.1.patch While looking at the file, my IDE highlights that the thread runnables started by {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their streams as they exit. A Java 7 try-with-resources would trivially fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
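For context, a minimal sketch of the Java 7 try-with-resources shape being suggested; the stream type, logger, and line handling here are assumptions for illustration, not the actual patch:

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class StreamReaderSketch {
  private static final Log LOG = LogFactory.getLog(StreamReaderSketch.class);

  // Sketch: the reader thread owns the stream and closes it on exit via
  // try-with-resources, covering the normal and exception paths alike.
  Thread startStreamReader(final InputStream stream) {
    return new Thread(new Runnable() {
      @Override
      public void run() {
        try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(stream, StandardCharsets.UTF_8))) {
          String line;
          while ((line = reader.readLine()) != null) {
            LOG.info(line); // assumed: lines are simply logged
          }
        } catch (IOException e) {
          LOG.warn("Error reading stream", e);
        }
      }
    });
  }
}
{code}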
[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555819#comment-14555819 ] Lavkesh Lahngir commented on YARN-3591: --- Hmm.. Got your point. Is the DirectoryCollection class a good place to add newErrorDirs and newRepairedDirs? So finally this is my understanding; please correct me if I am wrong. Def: newErrorDirs - dirs which turned bad from localdirs or fulldirs. newRepairedDirs - dirs which turned good from errorDirs. After calling checkLocalizedResources() with localdirs and fulldirs, we can call {code}cleanUpLocalDir(lfs, del, localDir);{code} on newRepairedDirs. We will put newErrorDirs into the state store so that when the NM restarts it can do a cleanup. Also, we need to remove them from the state store if they become repaired. Resource Localisation on a bad disk causes subsequent containers failure - Key: YARN-3591 URL: https://issues.apache.org/jira/browse/YARN-3591 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch It happens when a resource is localised on a disk and, after localising, that disk has gone bad. The NM keeps paths for localised resources in memory. At the time of a resource request, isResourcePresent(rsrc) will be called, which calls file.exists() on the localised path. In some cases when the disk has gone bad, inodes are still cached and file.exists() returns true. But at the time of reading, the file will not open. Note: file.exists() actually calls stat64 natively, which returns true because it was able to find inode information from the OS. A proposal is to call file.list() on the parent path of the resource, which will call open() natively. If the disk is good it should return an array of paths with length at least 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
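A minimal sketch of the proposed presence check; the method shape and plain java.io.File usage are assumptions for illustration, not the actual patch:

{code}
import java.io.File;

class LocalResourceCheckSketch {
  // Sketch: listing the parent directory forces a native open()/readdir,
  // which fails on a bad disk even when a stale inode cache would make a
  // plain exists() return true.
  boolean isResourcePresent(File localizedPath) {
    File parent = localizedPath.getParentFile();
    // File.list() returns null on I/O error (or if parent is not a
    // directory); a healthy directory holding the resource returns at
    // least one entry.
    String[] entries = (parent == null) ? null : parent.list();
    if (entries == null || entries.length == 0) {
      return false;
    }
    return localizedPath.exists();
  }
}
{code}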
[jira] [Updated] (YARN-3707) RM Web UI queue filter doesn't work
[ https://issues.apache.org/jira/browse/YARN-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3707: - Attachment: YARN-3707.1.patch Attached ver.1 patch. RM Web UI queue filter doesn't work --- Key: YARN-3707 URL: https://issues.apache.org/jira/browse/YARN-3707 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.8.0 Reporter: Wangda Tan Priority: Blocker Attachments: YARN-3707.1.patch It cannot filter queues under root; it looks like YARN-3362 caused this issue. It changed the .q field so that the queue filter cannot get the correct queue name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556926#comment-14556926 ] Karthik Kambatla commented on YARN-2942: Thanks for your persistence through the multiple versions of this design, Robert. I think we have an actionable plan now; thanks, Jason and Vinod, for your inputs on the JIRA and offline. Aggregated Log Files should be combined --- Key: YARN-2942 URL: https://issues.apache.org/jira/browse/YARN-2942 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: CombinedAggregatedLogsProposal_v3.pdf, CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, CompactedAggregatedLogsProposal_v1.pdf, CompactedAggregatedLogsProposal_v2.pdf, ConcatableAggregatedLogsProposal_v4.pdf, ConcatableAggregatedLogsProposal_v5.pdf, ConcatableAggregatedLogsProposal_v8.pdf, YARN-2942-preliminary.001.patch, YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, YARN-2942.003.patch Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place. Currently, there is a separate log file for each Node Manager. This can be a problem for HDFS if you have a cluster with many nodes as you’ll slowly start accumulating many (possibly small) files per YARN application. The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time. We should improve this by compacting the per-node aggregated log files into one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3705) forcemanual transition of RM active/standby state in automatic-failover mode should change elector state
[ https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556946#comment-14556946 ] Karthik Kambatla commented on YARN-3705: Yarn follows HDFS behavior here. Force-manual is discouraged and messes with automatic failover. I understand the proposed improvement would help with this, but would like to make sure the behavior is consistent across HDFS and Yarn. forcemanual transition of RM active/standby state in automatic-failover mode should change elector state Key: YARN-3705 URL: https://issues.apache.org/jira/browse/YARN-3705 Project: Hadoop YARN Issue Type: Sub-task Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Executing {{rmadmin -transitionToActive --forcemanual}} and {{rmadmin -transitionToStandby --forcemanual}} in automatic-failover.enabled mode changes the active/standby state of the ResourceManager while keeping the state of the ActiveStandbyElector. It should make the elector quit and rejoin; otherwise, forcemanual transitions should not be allowed in automatic-failover mode, in order to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
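For reference, a minimal sketch of what "make the elector quit and rejoin" could look like with the common ActiveStandbyElector API; the wiring into the RM's forcemanual path and the appData payload are assumptions:

{code}
// Sketch: on a forcemanual transition, resynchronize the embedded elector
// with the RM's new HA state by leaving and re-entering the election.
elector.quitElection(false);   // leave the election without fencing
elector.joinElection(appData); // rejoin; ZooKeeper decides who goes active
{code}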
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556899#comment-14556899 ] Hadoop QA commented on YARN-3632: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | javac | 7m 32s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 28s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 49s | The applied patch generated 1 new checkstyle issues (total was 237, now 238). | | {color:red}-1{color} | whitespace | 0m 2s | The patch has 6 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 14s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 60m 26s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 96m 39s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734921/YARN-3632.7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f346383 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/8061/artifact/patchprocess/diffJavacWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8061/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8061/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8061/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8061/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8061/console | This message was automatically generated. Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. 
Some ordering policies may also need to reorder when demand changes if that is part of the ordering comparison; this needs to be made available (and used by the fairorderingpolicy when sizebasedweight is true) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2942: Attachment: ConcatableAggregatedLogsProposal_v8.pdf Uploaded v8 doc based on the latest discussions. Aggregated Log Files should be combined --- Key: YARN-2942 URL: https://issues.apache.org/jira/browse/YARN-2942 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: CombinedAggregatedLogsProposal_v3.pdf, CombinedAggregatedLogsProposal_v6.pdf, CombinedAggregatedLogsProposal_v7.pdf, CompactedAggregatedLogsProposal_v1.pdf, CompactedAggregatedLogsProposal_v2.pdf, ConcatableAggregatedLogsProposal_v4.pdf, ConcatableAggregatedLogsProposal_v5.pdf, ConcatableAggregatedLogsProposal_v8.pdf, YARN-2942-preliminary.001.patch, YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, YARN-2942.003.patch Turning on log aggregation allows users to easily store container logs in HDFS and subsequently view them in the YARN web UIs from a central place. Currently, there is a separate log file for each Node Manager. This can be a problem for HDFS if you have a cluster with many nodes as you’ll slowly start accumulating many (possibly small) files per YARN application. The current “solution” for this problem is to configure YARN (actually the JHS) to automatically delete these files after some amount of time. We should improve this by compacting the per-node aggregated log files into one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3707) RM Web UI queue filter doesn't work
[ https://issues.apache.org/jira/browse/YARN-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556927#comment-14556927 ] Hadoop QA commented on YARN-3707: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 41s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 23s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 50m 1s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 86m 6s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734932/YARN-3707.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f346383 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8062/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8062/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8062/console | This message was automatically generated. RM Web UI queue filter doesn't work --- Key: YARN-3707 URL: https://issues.apache.org/jira/browse/YARN-3707 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.8.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3707.1.patch It cannot filter queue under root, it looks like YARN-3362 causes this issue. It changed .q field so that queue filter cannot get correct queue name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556868#comment-14556868 ] MENG DING commented on YARN-1197: - So to summarize the current dilemma: Situation: - A container resource increase request has been granted, and a token has been issued to the AM, and - The increase action has not been fulfilled, and the token has not expired yet Problem: - The AM can initiate a container resource decrease action to the NM, and the NM will fulfill it and notify the RM, and then - Before the token expires, the AM can still initiate a container resource increase action to the NM with the token, and the NM will fulfill it and notify the RM Proposed solution: - When the RM receives a container decrease message from the NM, it will first check if there is an outstanding container increase action (by checking the ContainerResourceIncreaseExpirer) - If the answer is no, the RM will go ahead and update its internal resource bookkeeping and reduce the container resource allocation for this container. - If the answer is yes, the RM will skip the resource reduction in this cycle, keep the resource decrease message in its newlyDecreasedContainers data structure, and check again in the next NM-RM heartbeat cycle. - If, in the next heartbeat, a resource increase message for the same container comes, the previous resource decrease message will be dropped. Not sure if there is a better solution to this problem. Let me know if this makes sense or not. Thanks, Meng Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes the resource allocated to a container is fixed during its lifetime. When users want to change the resource of an allocated container, the only way is to release it and allocate a new container with the expected size. Allowing run-time changes to the resources of an allocated container will give us better control of resource usage on the application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
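A minimal sketch of the proposed RM-side decision logic; every name here (increaseExpirer, the deferred list, the scheduler call) is an assumption for illustration, not YARN API:

{code}
// Sketch: defer a decrease while a granted increase is still outstanding,
// re-checking on each NM-RM heartbeat; a later increase confirmation for
// the same container drops the held decrease instead.
for (Container decreased : newlyDecreasedContainers) {
  if (increaseExpirer.isPending(decreased.getId())) {
    deferredDecreases.add(decreased); // outstanding grant: check next cycle
  } else {
    scheduler.decreaseContainerResource(decreased); // safe to apply now
  }
}
{code}

Under these assumptions the decrease is never lost, only delayed until the conflicting grant either completes or expires.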
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556914#comment-14556914 ] Wangda Tan commented on YARN-3632: -- bq. No, I want to avoid any possible interleaving of locks between the application and the queue, getting the ordering policy locks the queue briefly and this should not happen inside an application lock. Makes sense to me; I think both are fine. bq. The demand is being updated for that queue, I think the naming is clear enough. I'm OK with this. Any other comments? [~jianhe]. Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. Some ordering policies may also need to reorder when demand changes if that is part of the ordering comparison; this needs to be made available (and used by the fairorderingpolicy when sizebasedweight is true) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2238) filtering on UI sticks even if I move away from the page
[ https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2238: -- Attachment: YARN-2238.patch Uploaded a patch to fix the filter problem. Since the app table is actually shared between the app page and the scheduler page, the main idea is to not preserve the filter state of the data table. filtering on UI sticks even if I move away from the page Key: YARN-2238 URL: https://issues.apache.org/jira/browse/YARN-2238 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.4.0 Reporter: Sangjin Lee Labels: usability Attachments: YARN-2238.patch, filtered.png The main data table in many web pages (RM, AM, etc.) seems to show an unexpected filtering behavior. If I filter the table by typing something in the key or value field (or I suspect any search field), the data table gets filtered. The example I used is the job configuration page for a MR job. That is expected. However, when I move away from that page and visit any other web page of the same type (e.g. a job configuration page), the page is rendered with the filtering! That is unexpected. What's even stranger is that it does not render the filtering term. As a result, I have a page that's mysteriously filtered but doesn't tell me what it's filtering on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556803#comment-14556803 ] MENG DING commented on YARN-1197: - Well, I think I spoke too soon :-) The example I gave above is not entirely correct: 1. A container is currently using 6G 2. AM asks RM to increase it to 8G 3. RM grants the increase request, allocates 8G to the container, and issues a token to AM. It starts a timer and remembers the original resource allocation before the increase as 6G. 4. AM, instead of initiating the resource increase to NM, requests a resource decrease to NM to decrease it to 4G 5. The decrease is successful and RM gets the notification, and updates the container resource to 4G 6. Before the token expires, the AM requests the resource increase to NM 7. The increase is successful and RM gets the notification, and updates the container resource back to 8G Steps 6 and 7 should not be allowed because the RM has already reduced the container resource to 4G, which effectively invalidated the previously granted increase request (8G), even though the token has not yet expired. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: mapreduce-project.patch.ver.1, tools-project.patch.ver.1, yarn-1197-scheduler-v1.pdf, yarn-1197-v2.pdf, yarn-1197-v3.pdf, yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, yarn-server-resourcemanager.patch.ver.1 The current YARN resource management logic assumes the resource allocated to a container is fixed during its lifetime. When users want to change the resource of an allocated container, the only way is to release it and allocate a new container with the expected size. Allowing run-time changes to the resources of an allocated container will give us better control of resource usage on the application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556814#comment-14556814 ] Craig Welch commented on YARN-3632: --- bq. {code} if (application.updateResourceRequests(ask)) { } {code} No, I want to avoid any possible interleaving of locks between the application and the queue, getting the ordering policy locks the queue briefly and this should not happen inside an application lock. bq. {code} updateDemandForQueue {code} The demand is being updated for that queue, I think the naming is clear enough. Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. Some ordering policies may also need to reorder when demand changes if that is part of the ordering comparison; this needs to be made available (and used by the fairorderingpolicy when sizebasedweight is true) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3701) Isolating the error of generating a single app report when getting all apps from generic history service
[ https://issues.apache.org/jira/browse/YARN-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556877#comment-14556877 ] Hudson commented on YARN-3701: -- FAILURE: Integrated in Hadoop-trunk-Commit #7896 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7896/]) YARN-3701. Isolating the error of generating a single app report when (xgong: rev 455b3acf0e82b214e06bd7b538968252945cd3c4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java Isolating the error of generating a single app report when getting all apps from generic history service Key: YARN-3701 URL: https://issues.apache.org/jira/browse/YARN-3701 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.1 Attachments: YARN-3701.1.patch Nowadays, if some error occurs when generating a single app report while getting the application list from the generic history service, it will throw the exception. Therefore, even if just 1 out of 100 apps has something wrong, the whole app list is screwed. The worst impact is making the default page (app list) of the GHS web UI crash, while the REST API /applicationhistory/apps will also break. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3701) Isolating the error of generating a single app report when getting all apps from generic history service
[ https://issues.apache.org/jira/browse/YARN-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556701#comment-14556701 ] Xuan Gong commented on YARN-3701: - Makes sense. For getAttempts and getContainers, probably, instead of throwing the exception, we could create a blank page and say the attempt/container does not exist. We could do it separately. The patch for this ticket is good enough. +1. Will commit shortly. Isolating the error of generating a single app report when getting all apps from generic history service Key: YARN-3701 URL: https://issues.apache.org/jira/browse/YARN-3701 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3701.1.patch Nowadays, if some error occurs when generating a single app report while getting the application list from the generic history service, it will throw the exception. Therefore, even if just 1 out of 100 apps has something wrong, the whole app list is screwed. The worst impact is making the default page (app list) of the GHS web UI crash, while the REST API /applicationhistory/apps will also break. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556767#comment-14556767 ] Craig Welch commented on YARN-3632: --- bq. 1) ... done bq. 2) ... done bq. 3) ... done Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. Some ordering policies may also need to reorder when demand changes if that is part of the ordering comparison; this needs to be made available (and used by the fairorderingpolicy when sizebasedweight is true) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556807#comment-14556807 ] Wangda Tan commented on YARN-3632: -- Thanks for the update, [~cwelch]. The only comment from my side is that you can still simplify the CapacityScheduler changes a little bit, in {code} if (application.updateResourceRequests(ask)) { } {code} You can simply get the queue and call demandUpdated within the if {...} block; you don't need to save the allocation as well as the queue object outside of the synchronized block, correct? And the name {{updateDemandForQueue}} seems like a boolean; renaming it to leafQueue should be clear enough. Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. Some ordering policies may also need to reorder when demand changes if that is part of the ordering comparison; this needs to be made available (and used by the fairorderingpolicy when sizebasedweight is true) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3700) ATS Web Performance issue at load time when large number of jobs
[ https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556791#comment-14556791 ] Zhijie Shen commented on YARN-3700: --- [~xgong], thanks for the patch. Some comments below: 1. Actually, not only the webapp will be affected by this config; the REST API and application history protocol will be too. Can we rename it to yarn.timeline-service.generic-application-history.max-applications? {code} /** Defines how many applications can be loaded into timeline service web ui.*/ public static final String TIMELINE_SERVICE_WEBAPP_MAX_APPS = TIMELINE_SERVICE_PREFIX + "webapp.max-applications"; {code} 2. In TestApplicationHistoryClientService, would you please validate that the applications are retrieved in descending order according to submission time (if I remember it correctly). 3. I'm thinking whether it would be good to support overriding the default config by passing a query param in the URL, such as ?max-applications=100. Thoughts? ATS Web Performance issue at load time when large number of jobs Key: YARN-3700 URL: https://issues.apache.org/jira/browse/YARN-3700 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3700.1.patch Currently, we will load all the apps when we try to load the yarn timelineservice web page. If we have a large number of jobs, it will be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
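A minimal sketch of comment 3's idea, where a request query parameter overrides the configured cap; the key name follows comment 1's suggested rename, while the default value and method shape are assumptions for illustration:

{code}
import javax.servlet.http.HttpServletRequest;
import org.apache.hadoop.conf.Configuration;

class AppLimitSketch {
  // Sketch: resolve the app-list cap, letting ?max-applications=N override
  // the configured default for a single request.
  static long resolveMaxApps(Configuration conf, HttpServletRequest req) {
    long maxApps = conf.getLong(
        "yarn.timeline-service.generic-application-history.max-applications",
        10000L); // assumed default value
    String override = req.getParameter("max-applications");
    return (override != null) ? Long.parseLong(override) : maxApps;
  }
}
{code}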
[jira] [Updated] (YARN-3707) RM Web UI queue filter doesn't work
[ https://issues.apache.org/jira/browse/YARN-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3707: -- Assignee: Wangda Tan RM Web UI queue filter doesn't work --- Key: YARN-3707 URL: https://issues.apache.org/jira/browse/YARN-3707 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.8.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3707.1.patch It cannot filter queues under root; it looks like YARN-3362 caused this issue. It changed the .q field so that the queue filter cannot get the correct queue name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3701) Isolating the error of generating a single app report when getting all apps from generic history service
[ https://issues.apache.org/jira/browse/YARN-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556872#comment-14556872 ] Xuan Gong commented on YARN-3701: - Committed into trunk/branch-2/branch-2.7. Thanks, Zhijie. Isolating the error of generating a single app report when getting all apps from generic history service Key: YARN-3701 URL: https://issues.apache.org/jira/browse/YARN-3701 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-3701.1.patch Nowadays, if some error occurs when generating a single app report while getting the application list from the generic history service, it will throw the exception. Therefore, even if just 1 out of 100 apps has something wrong, the whole app list is screwed. The worst impact is making the default page (app list) of the GHS web UI crash, while the REST API /applicationhistory/apps will also break. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3632: - Target Version/s: 2.8.0 Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. Some ordering policies may also need to reorder when demand changes if that is part of the ordering comparison; this needs to be made available (and used by the fairorderingpolicy when sizebasedweight is true) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3707) RM Web UI queue filter doesn't work
[ https://issues.apache.org/jira/browse/YARN-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556938#comment-14556938 ] Wangda Tan commented on YARN-3707: -- The failed test is tracked by https://issues.apache.org/jira/browse/YARN-2871 RM Web UI queue filter doesn't work --- Key: YARN-3707 URL: https://issues.apache.org/jira/browse/YARN-3707 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.8.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3707.1.patch It cannot filter queues under root; it looks like YARN-3362 caused this issue. It changed the .q field so that the queue filter cannot get the correct queue name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3707) RM Web UI queue filter doesn't work
[ https://issues.apache.org/jira/browse/YARN-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556956#comment-14556956 ] Jian He commented on YARN-3707: --- looks good, +1 RM Web UI queue filter doesn't work --- Key: YARN-3707 URL: https://issues.apache.org/jira/browse/YARN-3707 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.8.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Attachments: YARN-3707.1.patch It cannot filter queues under root; it looks like YARN-3362 caused this issue. It changed the .q field so that the queue filter cannot get the correct queue name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2238) filtering on UI sticks even if I move away from the page
[ https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2238: -- Attachment: YARN-2238.patch filtering on UI sticks even if I move away from the page Key: YARN-2238 URL: https://issues.apache.org/jira/browse/YARN-2238 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Jian He Labels: usability Attachments: YARN-2238.patch, filtered.png The main data table in many web pages (RM, AM, etc.) seems to show an unexpected filtering behavior. If I filter the table by typing something in the key or value field (or I suspect any search field), the data table gets filtered. The example I used is the job configuration page for a MR job. That is expected. However, when I move away from that page and visit any other web page of the same type (e.g. a job configuration page), the page is rendered with the filtering! That is unexpected. What's even stranger is that it does not render the filtering term. As a result, I have a page that's mysteriously filtered but doesn't tell me what it's filtering on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2238) filtering on UI sticks even if I move away from the page
[ https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2238: -- Attachment: (was: YARN-2238.patch) filtering on UI sticks even if I move away from the page Key: YARN-2238 URL: https://issues.apache.org/jira/browse/YARN-2238 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Jian He Labels: usability Attachments: YARN-2238.patch, filtered.png The main data table in many web pages (RM, AM, etc.) seems to show an unexpected filtering behavior. If I filter the table by typing something in the key or value field (or I suspect any search field), the data table gets filtered. The example I used is the job configuration page for a MR job. That is expected. However, when I move away from that page and visit any other web page of the same type (e.g. a job configuration page), the page is rendered with the filtering! That is unexpected. What's even stranger is that it does not render the filtering term. As a result, I have a page that's mysteriously filtered but doesn't tell me what it's filtering on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3676) Disregard 'assignMultiple' directive while scheduling apps with NODE_LOCAL resource requests
[ https://issues.apache.org/jira/browse/YARN-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-3676: -- Attachment: YARN-3676.5.patch Disregard 'assignMultiple' directive while scheduling apps with NODE_LOCAL resource requests Key: YARN-3676 URL: https://issues.apache.org/jira/browse/YARN-3676 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Arun Suresh Assignee: Arun Suresh Attachments: YARN-3676.1.patch, YARN-3676.2.patch, YARN-3676.3.patch, YARN-3676.4.patch, YARN-3676.5.patch AssignMultiple is generally set to false to prevent overloading a Node (e.g., new NMs that have just joined). A possible scheduling optimization would be to disregard this directive for apps whose allowed locality is NODE_LOCAL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location
[ https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned YARN-314: - Assignee: Karthik Kambatla Schedulers should allow resource requests of different sizes at the same priority and location -- Key: YARN-314 URL: https://issues.apache.org/jira/browse/YARN-314 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Karthik Kambatla Attachments: yarn-314-prelim.patch Currently, resource requests for the same container and locality are expected to all be the same size. While it doesn't look like it's needed for apps currently, and can be circumvented by specifying different priorities if absolutely necessary, it seems to me that the ability to request containers with different resource requirements at the same priority level should be there for the future and for completeness' sake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557004#comment-14557004 ] Jian He commented on YARN-3632: --- looks good to me too. Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when it is allocated to or a container is recovered from the application. Some ordering policies may also need to reorder when demand changes if that is part of the ordering comparison; this needs to be made available (and used by the fairorderingpolicy when sizebasedweight is true) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3700) ATS Web Performance issue at load time when large number of jobs
[ https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557007#comment-14557007 ] Xuan Gong commented on YARN-3700: - Uploaded a new patch to address all the latest comments. ATS Web Performance issue at load time when large number of jobs Key: YARN-3700 URL: https://issues.apache.org/jira/browse/YARN-3700 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3700.1.patch, YARN-3700.2.patch Currently, we will load all the apps when we try to load the yarn timelineservice web page. If we have a large number of jobs, it will be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557005#comment-14557005 ] Karthik Kambatla commented on YARN-3655: Oh, and I found it hard to understand the test. Can we add some documentation to clarify what the test is doing? We should essentially test the following: # Container gets reserved when not over maxAMShare # Container doesn't get reserved when over maxAMShare # If the maxAMShare were to go down due to fairshare going down, container gets unreserved. FairScheduler: potential livelock due to maxAMShare limitation and container reservation - Key: YARN-3655 URL: https://issues.apache.org/jira/browse/YARN-3655 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3655.000.patch, YARN-3655.001.patch, YARN-3655.002.patch FairScheduler: potential livelock due to maxAMShare limitation and container reservation. If a node is reserved by an application, all the other applications don't have any chance to assign a new container on this node, unless the application which reserves the node assigns a new container on this node or releases the reserved container on this node. The problem is that if an application tries to call assignReservedContainer and fails to get a new container due to the maxAMShare limitation, it will block all other applications from using the nodes it reserves. If all the other running applications can't release their AM containers because they are blocked by these reserved containers, a livelock can happen. The following is the code at FSAppAttempt#assignContainer which can cause this potential livelock. {code} // Check the AM resource usage for the leaf queue if (!isAmRunning() && !getUnmanagedAM()) { List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests(); if (ask.isEmpty() || !getQueue().canRunAppAM( ask.get(0).getCapability())) { if (LOG.isDebugEnabled()) { LOG.debug("Skipping allocation because maxAMShare limit would " + "be exceeded"); } return Resources.none(); } } {code} To fix this issue, we can unreserve the node if we can't allocate the AM container on the node due to the maxAMShare limitation and the node is reserved by the application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3700) ATS Web Performance issue at load time when large number of jobs
[ https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557052#comment-14557052 ] Hadoop QA commented on YARN-3700: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 43s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 29s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 3m 2s | Tests failed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | | | 47m 25s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryManagerOnTimelineStore | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734964/YARN-3700.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 446d515 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8064/artifact/patchprocess/whitespace.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8064/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8064/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8064/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8064/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8064/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8064/console | This message was automatically generated. 
ATS Web Performance issue at load time when large number of jobs Key: YARN-3700 URL: https://issues.apache.org/jira/browse/YARN-3700 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3700.1.patch, YARN-3700.2.patch Currently, we will load all the apps when we try to load the yarn timelineservice web page. If we have a large number of jobs, it will be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2238) filtering on UI sticks even if I move away from the page
[ https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557022#comment-14557022 ] Xuan Gong commented on YARN-2238: - Tested locally; the patch works fine. +1 LGTM. Will commit later if no objection. [~Naganarasimha] [~sjlee0] Could you take a look at this patch, too? filtering on UI sticks even if I move away from the page Key: YARN-2238 URL: https://issues.apache.org/jira/browse/YARN-2238 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Jian He Labels: usability Attachments: YARN-2238.patch, filtered.png The main data table in many web pages (RM, AM, etc.) seems to show an unexpected filtering behavior. If I filter the table by typing something in the key or value field (or I suspect any search field), the data table gets filtered. The example I used is the job configuration page for a MR job. That is expected. However, when I move away from that page and visit any other web page of the same type (e.g. a job configuration page), the page is rendered with the filtering! That is unexpected. What's even stranger is that it does not render the filtering term. As a result, I have a page that's mysteriously filtered but doesn't tell me what it's filtering on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2238) filtering on UI sticks even if I move away from the page
[ https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2238: -- Target Version/s: 2.7.1 (was: 2.8.0) filtering on UI sticks even if I move away from the page Key: YARN-2238 URL: https://issues.apache.org/jira/browse/YARN-2238 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Jian He Labels: usability Attachments: YARN-2238.patch, filtered.png The main data table in many web pages (RM, AM, etc.) seems to show an unexpected filtering behavior. If I filter the table by typing something in the key or value field (or I suspect any search field), the data table gets filtered. The example I used is the job configuration page for a MR job. That is expected. However, when I move away from that page and visit any other web page of the same type (e.g. a job configuration page), the page is rendered with the filtering! That is unexpected. What's even stranger is that it does not render the filtering term. As a result, I have a page that's mysteriously filtered but doesn't tell me what it's filtering on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3700) ATS Web Performance issue at load time when large number of jobs
[ https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557096#comment-14557096 ] Hadoop QA commented on YARN-3700: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 44s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 2s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 21s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 3m 9s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | | | 47m 13s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734975/YARN-3700.2.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 446d515 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8065/artifact/patchprocess/whitespace.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8065/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8065/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8065/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8065/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8065/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8065/console | This message was automatically generated. 
ATS Web Performance issue at load time when large number of jobs Key: YARN-3700 URL: https://issues.apache.org/jira/browse/YARN-3700 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3700.1.patch, YARN-3700.2.1.patch, YARN-3700.2.patch Currently, we will load all the apps when we try to load the yarn timelineservice web page. If we have a large number of jobs, it will be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
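As context, a hedged illustration of the general direction for such a fix: bound the number of entities the page requests, for example via the timeline v1 REST API's limit parameter. The host, port, and constant below are hypothetical, not taken from the attached patches:
{code}
// Sketch only: request a bounded number of application entities instead of
// loading every app when rendering the page.
long appsLimit = 500;  // hypothetical page-size bound
URL url = new URL("http://ats-host:8188/ws/v1/timeline/YARN_APPLICATION?limit="
    + appsLimit);
{code}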
[jira] [Commented] (YARN-2238) filtering on UI sticks even if I move away from the page
[ https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557011#comment-14557011 ] Hadoop QA commented on YARN-2238: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 41s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 52s | The applied patch generated 1 new checkstyle issues (total was 18, now 18). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 23s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | | | 38m 33s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734957/YARN-2238.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 446d515 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8063/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8063/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8063/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8063/console | This message was automatically generated. filtering on UI sticks even if I move away from the page Key: YARN-2238 URL: https://issues.apache.org/jira/browse/YARN-2238 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Jian He Labels: usability Attachments: YARN-2238.patch, filtered.png The main data table in many web pages (RM, AM, etc.) seems to show an unexpected filtering behavior. If I filter the table by typing something in the key or value field (or I suspect any search field), the data table gets filtered. The example I used is the job configuration page for a MR job. That is expected. However, when I move away from that page and visit any other web page of the same type (e.g. a job configuration page), the page is rendered with the filtering! That is unexpected. What's even stranger is that it does not render the filtering term. As a result, I have a page that's mysteriously filtered but doesn't tell me what it's filtering on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3705) forcemanual transition of RM active/standby state in automatic-failover mode should change elector state
[ https://issues.apache.org/jira/browse/YARN-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557067#comment-14557067 ] Masatake Iwasaki commented on YARN-3705: Thanks for the comment, [~kasha]. I expected that ZKFailoverController (running as an external process of the NameNode) detects the status change of the NameNode and calls quitElection in the HDFS case. I will check the behaviour in HDFS again. forcemanual transition of RM active/standby state in automatic-failover mode should change elector state Key: YARN-3705 URL: https://issues.apache.org/jira/browse/YARN-3705 Project: Hadoop YARN Issue Type: Sub-task Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Executing {{rmadmin -transitionToActive --forcemanual}} and {{rmadmin -transitionToStandby --forcemanual}} in automatic-failover.enabled mode changes the active/standby state of the ResourceManager while keeping the state of the ActiveStandbyElector. It should make the elector quit and rejoin; otherwise, forcemanual transition should not be allowed in automatic-failover mode, in order to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
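A hedged sketch of the behaviour being asked for, using ActiveStandbyElector's real quitElection(boolean)/joinElection(byte[]) methods; the surrounding condition and variable names (forcemanual, autoFailoverEnabled, targetState, localActiveNodeInfo) are hypothetical:
{code}
// Sketch only: on a forced manual transition with automatic failover enabled,
// make the embedded elector leave the election (and rejoin when moving to
// standby) so the elector's view matches the RM's actual state.
if (forcemanual && autoFailoverEnabled) {
  elector.quitElection(false);                   // stop participating
  if (targetState == HAServiceState.STANDBY) {
    elector.joinElection(localActiveNodeInfo);   // rejoin as a candidate
  }
}
{code}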
[jira] [Updated] (YARN-3700) ATS Web Performance issue at load time when large number of jobs
[ https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3700: Attachment: YARN-3700.2.1.patch Fix the test case failure. ATS Web Performance issue at load time when large number of jobs Key: YARN-3700 URL: https://issues.apache.org/jira/browse/YARN-3700 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3700.1.patch, YARN-3700.2.1.patch, YARN-3700.2.patch Currently, we will load all the apps when we try to load the yarn timelineservice web page. If we have a large number of jobs, it will be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3655) FairScheduler: potential livelock due to maxAMShare limitation and container reservation
[ https://issues.apache.org/jira/browse/YARN-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557003#comment-14557003 ] Karthik Kambatla commented on YARN-3655: Comments on the patch:
# okToUnreserve
## It was a little hard to wrap my head around. Can we negate it and call it {{isValidReservation(FSSchedulerNode)}}?
## Can we get rid of the if-else and have a simple {{return hasContainerForNode && fitsInMaxShare && !isOverAMShareLimit}}?
# Add an {{if (isValidReservation)}} check in {{FSAppAttempt#reserve}} so all the reservation logic stays in one place?
# In {{FSAppAttempt#assignContainer(node, request, nodeType, reserved)}},
## We can get rid of the fitsInMaxShare check immediately preceding the call to {{reserve}}.
## Given the {{if (fitsIn(capability, available))}} block ends in a return, we don't need to put the continuation in an else.
# While adding this check in {{FSAppAttempt#assignContainer(node)}} might work in practice, it somehow feels out of place. Also, assignReservedContainer could also lead to a reservation?
# Instead of calling {{okToUnreserve}}/{{!isValidReservation}} in {{FairScheduler#attemptScheduling}}, we should likely add it as the first check in {{FSAppAttempt#assignReservedContainer}}.
# Looks like assign-multiple is broken with reserved-containers. The while-loop for assign-multiple should look at both reserved and un-reserved containers assigned. Can we file a follow-up JIRA to fix this?
FairScheduler: potential livelock due to maxAMShare limitation and container reservation - Key: YARN-3655 URL: https://issues.apache.org/jira/browse/YARN-3655 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3655.000.patch, YARN-3655.001.patch, YARN-3655.002.patch FairScheduler: potential livelock due to maxAMShare limitation and container reservation. If a node is reserved by an application, all the other applications don't have any chance to assign a new container on this node, unless the application which reserves the node assigns a new container on this node or releases the reserved container on this node. The problem is that if an application calls assignReservedContainer and fails to get a new container due to the maxAMShare limitation, it will block all other applications from using the nodes it reserves. If all other running applications can't release their AM containers because they are blocked by these reserved containers, a livelock can happen. The following is the code at FSAppAttempt#assignContainer which can cause this potential livelock.
{code}
// Check the AM resource usage for the leaf queue
if (!isAmRunning() && !getUnmanagedAM()) {
  List<ResourceRequest> ask = appSchedulingInfo.getAllResourceRequests();
  if (ask.isEmpty() || !getQueue().canRunAppAM(
      ask.get(0).getCapability())) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Skipping allocation because maxAMShare limit would "
          + "be exceeded");
    }
    return Resources.none();
  }
}
{code}
To fix this issue, we can unreserve the node if we can't allocate the AM container on the node due to the maxAMShare limitation and the node is reserved by the application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
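For illustration, a minimal sketch of the negated helper suggested in the first comment, assuming hasContainerForNode, fitsInMaxShare, and isOverAMShareLimit exist as boolean helpers visible to FSAppAttempt; this shows the shape of the suggestion, not the committed patch:
{code}
// Sketch only: the negation of okToUnreserve. A reservation on this node is
// worth keeping only if the app still has a request for the node, the request
// fits under the queue's maxShare, and the AM-share limit allows it.
private boolean isValidReservation(FSSchedulerNode node) {
  return hasContainerForNode(node)   // hypothetical: app still wants this node
      && fitsInMaxShare(node)        // hypothetical: within the queue's maxShare
      && !isOverAMShareLimit();      // hypothetical: maxAMShare not exceeded
}
{code}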
[jira] [Updated] (YARN-3700) ATS Web Performance issue at load time when large number of jobs
[ https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3700: Attachment: YARN-3700.2.patch ATS Web Performance issue at load time when large number of jobs Key: YARN-3700 URL: https://issues.apache.org/jira/browse/YARN-3700 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-3700.1.patch, YARN-3700.2.patch Currently, we will load all the apps when we try to load the yarn timelineservice web page. If we have a large number of jobs, it will be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3632: -- Attachment: YARN-3632.7.patch Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when a container is allocated to, or recovered from, the application. Some ordering policies may also need to reorder when demand changes, if that is part of the ordering comparison; this needs to be made available (and used by the FairOrderingPolicy when sizeBasedWeight is true). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3676) Disregard 'assignMultiple' directive while scheduling apps with NODE_LOCAL resource requests
[ https://issues.apache.org/jira/browse/YARN-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556858#comment-14556858 ] Hadoop QA commented on YARN-3676: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 34s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 45s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 17s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 49m 56s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 86m 8s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12734915/YARN-3676.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f346383 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8060/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8060/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8060/console | This message was automatically generated. Disregard 'assignMultiple' directive while scheduling apps with NODE_LOCAL resource requests Key: YARN-3676 URL: https://issues.apache.org/jira/browse/YARN-3676 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Arun Suresh Assignee: Arun Suresh Attachments: YARN-3676.1.patch, YARN-3676.2.patch, YARN-3676.3.patch, YARN-3676.4.patch, YARN-3676.5.patch AssignMultiple is generally set to false to prevent overloading a Node (e.g., new NMs that have just joined). A possible scheduling optimization would be to disregard this directive for apps whose allowed locality is NODE_LOCAL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
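As an illustration of the optimization described above, a hedged sketch of the relaxed assign-multiple condition; the accessor shown is a stand-in for however FSAppAttempt exposes the app's current allowed locality, not the attached patch:
{code}
// Sketch only: keep assigning on this node heartbeat when assignMultiple is
// set, or when the app is still restricted to NODE_LOCAL placement.
boolean continueAssigning = assignMultiple
    || app.getAllowedLocalityLevel(priority) == NodeType.NODE_LOCAL;  // hypothetical accessor
{code}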
[jira] [Commented] (YARN-3632) Ordering policy should be allowed to reorder an application when demand changes
[ https://issues.apache.org/jira/browse/YARN-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556922#comment-14556922 ] Craig Welch commented on YARN-3632: --- BTW, the whitespace and checkstyle issues look to be unimportant, the javac warning is unrelated, and TestNodeLabelContainerAllocation passes fine for me with the patch, so it is also unrelated. Ordering policy should be allowed to reorder an application when demand changes --- Key: YARN-3632 URL: https://issues.apache.org/jira/browse/YARN-3632 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-3632.0.patch, YARN-3632.1.patch, YARN-3632.3.patch, YARN-3632.4.patch, YARN-3632.5.patch, YARN-3632.6.patch, YARN-3632.7.patch At present, ordering policies have the option to have an application re-ordered (for allocation and preemption) when a container is allocated to, or recovered from, the application. Some ordering policies may also need to reorder when demand changes, if that is part of the ordering comparison; this needs to be made available (and used by the FairOrderingPolicy when sizeBasedWeight is true). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
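As background for reorder-on-demand-change, a hedged sketch of the usual idiom for TreeSet-backed ordering policies; schedulableEntities and the method shape are illustrative, not necessarily the patch's API:
{code}
// Sketch only: a comparator-relevant field (demand) must change while the
// entity is out of the backing TreeSet, or the set's ordering invariant breaks.
void demandUpdated(SchedulableEntity e, Resource newDemand) {
  schedulableEntities.remove(e);  // remove under the old comparator view
  setDemand(e, newDemand);        // hypothetical update of the compared field
  schedulableEntities.add(e);     // re-insert so it lands in its new position
}
{code}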
[jira] [Assigned] (YARN-2238) filtering on UI sticks even if I move away from the page
[ https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-2238: - Assignee: Jian He filtering on UI sticks even if I move away from the page Key: YARN-2238 URL: https://issues.apache.org/jira/browse/YARN-2238 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Jian He Labels: usability Attachments: YARN-2238.patch, filtered.png The main data table in many web pages (RM, AM, etc.) seems to show an unexpected filtering behavior. If I filter the table by typing something in the key or value field (or I suspect any search field), the data table gets filtered. The example I used is the job configuration page for a MR job. That is expected. However, when I move away from that page and visit any other web page of the same type (e.g. a job configuration page), the page is rendered with the filtering! That is unexpected. What's even stranger is that it does not render the filtering term. As a result, I have a page that's mysteriously filtered but doesn't tell me what it's filtering on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3707) RM Web UI queue filter doesn't work
[ https://issues.apache.org/jira/browse/YARN-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556963#comment-14556963 ] Hudson commented on YARN-3707: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7897 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7897/]) YARN-3707. RM Web UI queue filter doesn't work. Contributed by Wangda Tan (jianhe: rev 446d51591e6e99cc60a85c4b9fbac379a8caa49d) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java RM Web UI queue filter doesn't work --- Key: YARN-3707 URL: https://issues.apache.org/jira/browse/YARN-3707 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.8.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Blocker Fix For: 2.8.0 Attachments: YARN-3707.1.patch It cannot filter queues under root; it looks like YARN-3362 caused this issue. It changed the .q field so that the queue filter cannot get the correct queue name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3706) Generalize native HBase writer for additional tables
[ https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557149#comment-14557149 ] Joep Rottinghuis commented on YARN-3706: [~vrushalic] if we convert column names to lower case (to ensure that they are lower, in the face of potential later changes), shouldn't we also convert _all_ column names to lower case and cleanse them of separators? Generalize native HBase writer for additional tables Key: YARN-3706 URL: https://issues.apache.org/jira/browse/YARN-3706 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Joep Rottinghuis Assignee: Joep Rottinghuis Priority: Minor When reviewing YARN-3411 we noticed that we could change the class hierarchy a little in order to accommodate additional tables easily. In order to get ready for benchmark testing we left the original layout in place, as performance would not be impacted by the code hierarchy. Here is a separate jira to address the hierarchy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3706) Generalize native HBase writer for additional tables
[ https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557155#comment-14557155 ] Vrushali C commented on YARN-3706: -- Hmm, yes, that would be a good thing to do. It would be a one-way conversion though, which should be fine but is something we need to be aware of while generating the response. We don't do that in hraven for metrics and config, and I think similarly in the YARN-3411 patch we are not doing that for config and metrics. Generalize native HBase writer for additional tables Key: YARN-3706 URL: https://issues.apache.org/jira/browse/YARN-3706 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Joep Rottinghuis Assignee: Joep Rottinghuis Priority: Minor When reviewing YARN-3411 we noticed that we could change the class hierarchy a little in order to accommodate additional tables easily. In order to get ready for benchmark testing we left the original layout in place, as performance would not be impacted by the code hierarchy. Here is a separate jira to address the hierarchy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
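A minimal sketch of the conversion being discussed; the SEPARATOR constant and helper are illustrative stand-ins for whatever the timeline tables actually use:
{code}
// Sketch only: normalize a column qualifier before it becomes an HBase column
// name. Lower-casing is one-way, so readers must expect lower-case names too.
// SEPARATOR is a stand-in for whatever separator the tables actually use.
private static final String SEPARATOR = "!";

static byte[] sanitizeColumnName(String qualifier) {
  String cleansed = qualifier.toLowerCase().replace(SEPARATOR, "");
  return cleansed.getBytes(java.nio.charset.StandardCharsets.UTF_8);
}
{code}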
[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams
[ https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555964#comment-14555964 ] Junping Du commented on YARN-3594: -- +1. Patch LGTM. Committing this in. WintuilsProcessStubExecutor.startStreamReader leaks streams --- Key: YARN-3594 URL: https://issues.apache.org/jira/browse/YARN-3594 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Lars Francke Priority: Trivial Labels: newbie Attachments: YARN-3594.1.patch while looking at the file, my IDE highlights that the thread runnables started by {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their streams as they exit. a java7 try-with-resources would trivially fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
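For illustration, a hedged sketch of the java7 try-with-resources shape such a fix takes; the stream and logger names are stand-ins rather than the exact code in WindowsSecureContainerExecutor (imports from java.io and java.nio.charset elided):
{code}
// Sketch only: the runnable owns its reader, so try-with-resources guarantees
// the underlying stream is closed when the thread exits, even on exceptions.
Thread streamReader = new Thread(new Runnable() {
  @Override
  public void run() {
    try (BufferedReader lines = new BufferedReader(
        new InputStreamReader(stream, Charset.defaultCharset()))) {
      String line;
      while ((line = lines.readLine()) != null) {
        LOG.debug(line);  // stand-in for whatever the reader does per line
      }
    } catch (IOException e) {
      LOG.debug("Stream reader exiting", e);
    }
  }
});
{code}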
[jira] [Updated] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Priyank Rastogi updated YARN-3543: -- Labels: (was: BB2015-05-TBR) ApplicationReport should be able to tell whether the Application is AM managed or not. --- Key: YARN-3543 URL: https://issues.apache.org/jira/browse/YARN-3543 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Spandan Dutta Assignee: Rohith Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG Currently we can know whether the application submitted by the user is AM managed from the applicationSubmissionContext. This can only be done at the time when the user submits the job. We should have access to this info from the ApplicationReport as well, so that we can check whether an app is AM managed or not at any time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2832) Wrong Check Logic of NodeHealthCheckerService Causes Latent Errors
[ https://issues.apache.org/jira/browse/YARN-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K resolved YARN-2832. - Resolution: Duplicate It is fixed as part of YARN-3375, closing as duplicate. Wrong Check Logic of NodeHealthCheckerService Causes Latent Errors -- Key: YARN-2832 URL: https://issues.apache.org/jira/browse/YARN-2832 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1, 2.5.1 Environment: Any environment Reporter: Tianyin Xu Attachments: health.check.service.1.patch NodeManager allows users to specify the health checker script that will be invoked by the health-checker service via the configuration parameter, _yarn.nodemanager.health-checker.script.path_. During the _serviceInit()_ of the health-check service, NM checks whether the parameter is set correctly using _shouldRun()_, as follows,
{code:title=/* NodeHealthCheckerService.java */|borderStyle=solid}
protected void serviceInit(Configuration conf) throws Exception {
  if (NodeHealthScriptRunner.shouldRun(conf)) {
    nodeHealthScriptRunner = new NodeHealthScriptRunner();
    addService(nodeHealthScriptRunner);
  }
  addService(dirsHandler);
  super.serviceInit(conf);
}
{code}
The problem is that if the parameter is misconfigured (e.g., permission problem, wrong path), NM does not print any log message to inform users, which could cause latent errors or mysterious problems (e.g., why does my script not work?). I see the checking and printing logic is put in the _serviceStart()_ function in _NodeHealthScriptRunner.java_ (see the following code snippet). However, the logic is wrong: for an incorrect parameter that does not pass the shouldRun check, _serviceStart()_ would never be called, because the _NodeHealthScriptRunner_ instance never gets the chance to be created (see the code snippet above).
{code:title=/* NodeHealthScriptRunner.java */|borderStyle=solid}
protected void serviceStart() throws Exception {
  // if health script path is not configured don't start the thread.
  if (!shouldRun(conf)) {
    LOG.info("Not starting node health monitor");
    return;
  }
  ...
}
{code}
Basically, I think the checking and printing logic should be put in serviceInit() in NodeHealthCheckerService instead of serviceStart() in NodeHealthScriptRunner. See the attachment for the simple patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
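A hedged sketch of what the reporter proposes, moving the warning into NodeHealthCheckerService#serviceInit(); the message text is illustrative and the attached patch may differ:
{code}
// Sketch only: warn at init time when the configured health script will not
// run, instead of relying on serviceStart() of a runner that is never created.
protected void serviceInit(Configuration conf) throws Exception {
  if (NodeHealthScriptRunner.shouldRun(conf)) {
    nodeHealthScriptRunner = new NodeHealthScriptRunner();
    addService(nodeHealthScriptRunner);
  } else {
    LOG.warn("Not running the node health script; check "
        + "yarn.nodemanager.health-checker.script.path (unset, missing, "
        + "or not executable).");
  }
  addService(dirsHandler);
  super.serviceInit(conf);
}
{code}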
[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555996#comment-14555996 ] Lavkesh Lahngir commented on YARN-3591: --- For adding newErrorDirs, do we have to create a new protobuf message and implement methods for storing and loading in all statestores? Resource Localisation on a bad disk causes subsequent containers failure - Key: YARN-3591 URL: https://issues.apache.org/jira/browse/YARN-3591 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch It happens when a resource is localised on a disk and, after localisation, that disk goes bad. NM keeps the paths of localised resources in memory. At the time of a resource request, isResourcePresent(rsrc) will be called, which calls file.exists() on the localised path. In some cases when the disk has gone bad, inodes are still cached and file.exists() returns true. But at the time of reading, the file will not open. Note: file.exists() actually calls stat64 natively, which returns true because it was able to find inode information from the OS. A proposal is to call file.list() on the parent path of the resource, which will call open() natively. If the disk is good it should return an array of paths with length at least 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3591) Resource Localisation on a bad disk causes subsequent containers failure
[ https://issues.apache.org/jira/browse/YARN-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555961#comment-14555961 ] Lavkesh Lahngir commented on YARN-3591: --- Typo: it should be cleanUpLocalDir(lfs, del, newRepairedDirs); Resource Localisation on a bad disk causes subsequent containers failure - Key: YARN-3591 URL: https://issues.apache.org/jira/browse/YARN-3591 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Lavkesh Lahngir Assignee: Lavkesh Lahngir Attachments: 0001-YARN-3591.1.patch, 0001-YARN-3591.patch, YARN-3591.2.patch, YARN-3591.3.patch, YARN-3591.4.patch It happens when a resource is localised on a disk and, after localisation, that disk goes bad. NM keeps the paths of localised resources in memory. At the time of a resource request, isResourcePresent(rsrc) will be called, which calls file.exists() on the localised path. In some cases when the disk has gone bad, inodes are still cached and file.exists() returns true. But at the time of reading, the file will not open. Note: file.exists() actually calls stat64 natively, which returns true because it was able to find inode information from the OS. A proposal is to call file.list() on the parent path of the resource, which will call open() natively. If the disk is good it should return an array of paths with length at least 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
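For illustration, a minimal sketch of the check the description proposes, under its stated assumption that listing the parent directory triggers a native open() and therefore fails on a bad disk; the helper shape is hypothetical, not the attached patch:
{code}
// Sketch only: file.exists() can succeed from cached inode data even on a bad
// disk, while parent.list() performs an open() and fails there instead.
private boolean isResourcePresent(File localizedPath) {
  File parent = localizedPath.getParentFile();
  String[] children = (parent == null) ? null : parent.list();
  // A healthy disk returns at least the localized resource itself.
  return children != null && children.length >= 1;
}
{code}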
[jira] [Created] (YARN-3704) Container Launch fails with exitcode 127 with DefaultContainerExecutor
Devaraj K created YARN-3704: --- Summary: Container Launch fails with exitcode 127 with DefaultContainerExecutor Key: YARN-3704 URL: https://issues.apache.org/jira/browse/YARN-3704 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0, 3.0.0 Reporter: Devaraj K Priority: Minor Please find below the NM log from when the issue occurs.
{code:xml}
2015-05-22 08:08:53,165 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1432208816246_0930_01_37 is : 127
2015-05-22 08:08:53,166 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1432208816246_0930_01_37 and exit code: 127
ExitCodeException exitCode=127:
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
	at org.apache.hadoop.util.Shell.run(Shell.java:456)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2015-05-22 08:08:53,179 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2015-05-22 08:08:53,179 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1432208816246_0930_01_37
2015-05-22 08:08:53,179 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 127
2015-05-22 08:08:53,179 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=127:
2015-05-22 08:08:53,179 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
2015-05-22 08:08:53,179 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:456)
2015-05-22 08:08:53,179 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
2015-05-22 08:08:53,179 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
2015-05-22 08:08:53,179 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2015-05-22 08:08:53,180 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2015-05-22 08:08:53,180 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.FutureTask.run(FutureTask.java:262)
2015-05-22 08:08:53,180 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2015-05-22 08:08:53,180 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2015-05-22 08:08:53,180 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.lang.Thread.run(Thread.java:745)
2015-05-22 08:08:53,180 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 127
2015-05-22 08:08:53,180 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1432208816246_0930_01_37 transitioned from RUNNING to EXITED_WITH_FAILURE
2015-05-22 08:08:53,180 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1432208816246_0930_01_37
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556037#comment-14556037 ] Junping Du commented on YARN-41: Thanks [~devaraj.k] for updating the patch! Just finished my review. Two major comments:
{code}
+// the isStopped check is for avoiding multiple unregistrations.
+if (this.registeredWithRM && !this.isStopped && !isNMUnderSupervision()) {
+  unRegisterNM();
+}
{code}
According to the discussion above, I think we need to check yarn.nodemanager.recovery.enabled as well. Even if NM is under supervision, if we disable NM recovery we should continue to unregister the NM with the RM.
{code}
 .addTransition(NodeState.RUNNING, NodeState.DECOMMISSIONED,
-    RMNodeEventType.DECOMMISSION,
+    EnumSet.of(RMNodeEventType.DECOMMISSION, RMNodeEventType.SHUTDOWN),
{code}
So, the node after shutdown will become DECOMMISSIONED as the final state? I think we don't expect these nodes to show up in the DECOMMISSIONED list, do we? Maybe we should have a new NodeState such as SHUTDOWN for this case. This could make the changes incompatible, at least for behaviour and UI. We may need to mark this JIRA as incompatible and document these changes somewhere when the patch is done. Some minor comments: Add tests for the new PB objects UnRegisterNodeManagerRequestPBImpl and UnRegisterNodeManagerResponsePBImpl to TestYarnServerApiClasses.java.
{code}
} catch (YarnException e) {
  throw new ServiceException(e);
} catch (IOException e) {
  throw new ServiceException(e);
}
{code}
Better to replace with
{code}
} catch (YarnException | IOException e) {
  throw new ServiceException(e);
}
{code}
{code}
+  @Test
+  public void testUnRegisterNodeManager() throws Exception {
+    UnRegisterNodeManagerRequest request = recordFactory
+        .newRecordInstance(UnRegisterNodeManagerRequest.class);
+    assertNotNull(client.unRegisterNodeManager(request));
+
+    ResourceTrackerTestImpl.exception = true;
+    try {
+      client.unRegisterNodeManager(request);
+      fail("there should be YarnException");
+    } catch (YarnException e) {
+      assertTrue(e.getMessage().startsWith("testMessage"));
+    } finally {
+      ResourceTrackerTestImpl.exception = false;
+    }
+  }
{code}
If another exception gets thrown here with a wrong message, the test would still pass, wouldn't it? Better to catch all exceptions and check that it is a YarnException.
{code}
+  private void unRegisterNM() {
+    RecordFactory recordFactory = RecordFactoryPBImpl.get();
+    UnRegisterNodeManagerRequest request = recordFactory
+        .newRecordInstance(UnRegisterNodeManagerRequest.class);
+    request.setNodeId(this.nodeId);
+    try {
+      resourceTracker.unRegisterNodeManager(request);
+      LOG.info("Successfully Unregistered the Node with ResourceManager");
+    } catch (Exception e) {
+      LOG.warn("Unregistration of Node failed.", e);
+    }
+  }
{code}
Putting the nodeId in the log could help in troubleshooting; also add the missing period in the first log message. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41.patch Instead of waiting for the NM expiry, the RM should remove and handle the NM which is shut down gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams
[ https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556000#comment-14556000 ] Hudson commented on YARN-3594: -- FAILURE: Integrated in Hadoop-trunk-Commit #7892 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7892/]) YARN-3594. WintuilsProcessStubExecutor.startStreamReader leaks streams. Contributed by Lars Francke. (junping_du: rev 132d909d4a6509af9e63e24cbb719be10006b6cd) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java WintuilsProcessStubExecutor.startStreamReader leaks streams --- Key: YARN-3594 URL: https://issues.apache.org/jira/browse/YARN-3594 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Lars Francke Priority: Trivial Labels: newbie Fix For: 2.8.0 Attachments: YARN-3594.1.patch while looking at the file, my IDE highlights that the thread runnables started by {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their streams as they exit. a java7 try-with-resources would trivially fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3703) Container Launch fails with exitcode 2 with DefaultContainerExecutor
Devaraj K created YARN-3703: --- Summary: Container Launch fails with exitcode 2 with DefaultContainerExecutor Key: YARN-3703 URL: https://issues.apache.org/jira/browse/YARN-3703 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0, 3.0.0 Reporter: Devaraj K Priority: Minor Please find below the NM log from when the issue occurs.
{code:xml}
2015-05-21 20:14:53,907 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1432208816246_0225_01_34 is : 2
2015-05-21 20:14:53,908 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1432208816246_0225_01_34 and exit code: 2
ExitCodeException exitCode=2:
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
	at org.apache.hadoop.util.Shell.run(Shell.java:456)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2015-05-21 20:14:53,910 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2015-05-21 20:14:53,910 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1432208816246_0225_01_34
2015-05-21 20:14:53,910 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 2
2015-05-21 20:14:53,910 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=2:
2015-05-21 20:14:53,910 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
2015-05-21 20:14:53,910 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:456)
2015-05-21 20:14:53,910 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
2015-05-21 20:14:53,910 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
2015-05-21 20:14:53,910 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2015-05-21 20:14:53,910 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2015-05-21 20:14:53,910 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.FutureTask.run(FutureTask.java:262)
2015-05-21 20:14:53,910 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2015-05-21 20:14:53,910 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2015-05-21 20:14:53,910 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.lang.Thread.run(Thread.java:745)
2015-05-21 20:14:53,910 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 2
2015-05-21 20:14:53,911 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1432208816246_0225_01_34 transitioned from RUNNING to EXITED_WITH_FAILURE
2015-05-21 20:14:53,911 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1432208816246_0225_01_34
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3477) TimelineClientImpl swallows exceptions
[ https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3477: - Labels: (was: BB2015-05-TBR) TimelineClientImpl swallows exceptions -- Key: YARN-3477 URL: https://issues.apache.org/jira/browse/YARN-3477 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0, 2.7.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-3477-001.patch, YARN-3477-002.patch If the timeline client fails more than the retry count, the original exception is not thrown. Instead some runtime exception is raised saying that retries have run out.
# The failing exception should be rethrown, ideally via NetUtils.wrapException, to include the URL of the failing endpoint
# Otherwise, the raised RTE should (a) state that URL and (b) set the original fault as the inner cause
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
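For illustration, a hedged sketch of option (b); the retry loop and names (operateWithRetries, resURI, maxRetries) are stand-ins, not the actual TimelineClientImpl code:
{code}
// Sketch only: when retries run out, surface the failing URL and keep the
// original exception as the inner cause instead of swallowing it.
try {
  return operateWithRetries(resURI);  // hypothetical retry loop
} catch (IOException e) {
  throw new RuntimeException("Failed to reach timeline server at " + resURI
      + " after " + maxRetries + " retries", e);
}
{code}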
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556098#comment-14556098 ] Devaraj K commented on YARN-41: --- Thanks a lot [~djp] for the review and the comments. bq. So, the node after shutdown will become DECOMMISSIONED as the final state? I think we don't expect these nodes to show up in the DECOMMISSIONED list, do we? Maybe we should have a new NodeState such as SHUTDOWN for this case. {code:xml}Shouldn't be counting shut-down nodes in LOST. Adding a new state is perhaps an over-kill, DECOMMISSIONED is the closest I can think of.{code} Initially I got the above comment from Vinod about the node state. I feel we can proceed here with the DECOMMISSIONED state as Vinod suggested, and it will not create any compatibility issue. Adding a new state and the UI change can be done as per your suggestions in a follow-up jira, and can be implemented for trunk without causing a compatibility issue in 2.x versions. Please post your comments on this. bq. If another exception gets thrown here with a wrong message, the test would still pass, wouldn't it? Better to catch all exceptions and check that it is a YarnException. I don't see any problem with this test. If any other exception gets thrown here, that exception would be thrown directly from the test, and the test fails with that exception anyway. Please correct me if I am missing anything here. The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41.patch Instead of waiting for the NM expiry, the RM should remove and handle the NM which is shut down gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1391) Lost node list should be identify by NodeId
[ https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1391: - Labels: (was: BB2015-05-TBR) Lost node list should be identify by NodeId --- Key: YARN-1391 URL: https://issues.apache.org/jira/browse/YARN-1391 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-1391.v1.patch, YARN-1391.v2.patch In the case of multiple node managers on a single machine, each of them should be identified by NodeId, which is more unique than just the host name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1426) YARN Components need to unregister their beans upon shutdown
[ https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1426: - Labels: (was: BB2015-05-TBR) YARN Components need to unregister their beans upon shutdown Key: YARN-1426 URL: https://issues.apache.org/jira/browse/YARN-1426 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 2.3.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-1426.2.patch, YARN-1426.patch, YARN-1426.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.
[ https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556170#comment-14556170 ] Junping Du commented on YARN-41: bq. Initially I got the above comment from Vinod about the node state. I feel we can proceed here with the DECOMMISSIONED state as Vinod suggested, and it will not create any compatibility issue. Sorry, I must have missed that comment before. OK. I think we are talking about UI and behaviour compatibility rather than API compatibility. The former we can tolerate within major releases, but the latter we cannot. Neither way breaks API compatibility, as NodeState is marked UNSTABLE and we were just adding a DECOMMISSIONING state. Isn't it? The compatibility issue I mentioned above is still there as a behaviour change: users (and management tools) could be confused that, after this patch, a node will show up in the decommissioned list after a normal shutdown. For a long time, decommissioning has meant that we want to retire some nodes and so will reject the registration of these nodes when they are restarted (unless we explicitly remove these nodes from the list), but this is not the case after this patch. I agree with Vinod that LOST is pretty far away here, but DECOMMISSIONED is also not so ideal, I think. Thoughts? The RM should handle the graceful shutdown of the NM. - Key: YARN-41 URL: https://issues.apache.org/jira/browse/YARN-41 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Reporter: Ravi Teja Ch N V Assignee: Devaraj K Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41.patch Instead of waiting for the NM expiry, the RM should remove and handle the NM which is shut down gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556095#comment-14556095 ] Hudson commented on YARN-3675: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #204 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/204/]) YARN-3675. FairScheduler: RM quits when node removal races with continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 4513761869c732cf2f462763043067ebf8749df7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: RM quits when node removal races with continousscheduling on the same node - Key: YARN-3675 URL: https://issues.apache.org/jira/browse/YARN-3675 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Fix For: 2.7.1 Attachments: YARN-3675.001.patch, YARN-3675.002.patch, YARN-3675.003.patch With continuous scheduling, scheduling can be done on a node that has just been removed, causing errors like the one below.
{noformat}
12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Error in handling event type APP_ATTEMPT_REMOVED to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
	at java.lang.Thread.run(Thread.java:745)
12:28:53.783 AM INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye..
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
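A hedged sketch of the guard such a race calls for on the continuous-scheduling path; the lookup and list names follow FairScheduler's general shape but are stand-ins, not the committed fix:
{code}
// Sketch only: re-check that the node is still tracked by the scheduler
// before acting on it, since node removal can race with this thread.
for (NodeId nodeId : nodeIdList) {
  FSSchedulerNode node = nodes.get(nodeId);  // hypothetical tracked-nodes map
  if (node == null) {
    continue;  // the node was removed after the id list snapshot was taken
  }
  attemptScheduling(node);
}
{code}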
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556093#comment-14556093 ] Hudson commented on YARN-3646: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #204 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/204/]) YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Assignee: Raju Bairishetti Fix For: 2.7.1 Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER retry policy. The Yarn client retries infinitely in case of exceptions from the RM, as it is using the FOREVER retry policy. The problem is that it retries for all kinds of exceptions (like ApplicationNotFoundException), even though they are not connection failures. Due to this my application is not progressing further. *The Yarn client should not retry infinitely in case of non-connection failures.* We have written a simple yarn-client which tries to get an application report for an invalid or older appId. The ResourceManager throws an ApplicationNotFoundException as this is an invalid or older appId. But because of the FOREVER retry policy, the client keeps retrying to get the application report and the ResourceManager keeps throwing ApplicationNotFoundException.
{code}
private void testYarnClientRetryPolicy() throws Exception {
  YarnConfiguration conf = new YarnConfiguration();
  conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
  YarnClient yarnClient = YarnClient.createYarnClient();
  yarnClient.init(conf);
  yarnClient.start();
  ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645);
  ApplicationReport report = yarnClient.getApplicationReport(appId);
}
{code}
*RM logs:*
{noformat}
15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM.
	at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
	at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
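For illustration, a hedged sketch of one standard way to carve application-level errors out of a FOREVER policy using the existing org.apache.hadoop.io.retry.RetryPolicies helpers (java.util imports elided); whether the committed fix takes exactly this route is not shown here:
{code}
// Sketch only: retry connection-level failures forever, but fail fast on
// application-level errors such as ApplicationNotFoundException.
Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicy =
    new HashMap<Class<? extends Exception>, RetryPolicy>();
exceptionToPolicy.put(ApplicationNotFoundException.class,
    RetryPolicies.TRY_ONCE_THEN_FAIL);
RetryPolicy policy = RetryPolicies.retryByException(
    RetryPolicies.RETRY_FOREVER, exceptionToPolicy);
{code}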
[jira] [Commented] (YARN-1126) Add validation of users input nodes-states options to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556156#comment-14556156 ] Hadoop QA commented on YARN-1126: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12697127/YARN-905-addendum.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 55ed655 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8056/console | This message was automatically generated. Add validation of users input nodes-states options to nodes CLI --- Key: YARN-1126 URL: https://issues.apache.org/jira/browse/YARN-1126 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-905-addendum.patch Follow the discussion in YARN-905. (1) case-insensitive checks for all. (2) validation of the user's input: exit with a non-zero code and print all valid states when the user gives an invalid state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
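For illustration, a minimal sketch of the requested validation, assuming the CLI maps the option onto the public NodeState enum; the helper name and messages are hypothetical:
{code}
// Sketch only: case-insensitive parsing of a node state, with a non-zero exit
// that lists all valid states on bad input.
private static NodeState parseNodeState(String arg) {
  try {
    return NodeState.valueOf(arg.trim().toUpperCase());
  } catch (IllegalArgumentException e) {
    System.err.println("Invalid node state: " + arg + ". Valid states: "
        + java.util.Arrays.toString(NodeState.values()));
    System.exit(-1);
    return null;  // unreachable
  }
}
{code}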
[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams
[ https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556083#comment-14556083 ] Hudson commented on YARN-3594: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #204 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/204/]) YARN-3594. WintuilsProcessStubExecutor.startStreamReader leaks streams. Contributed by Lars Francke. (junping_du: rev 132d909d4a6509af9e63e24cbb719be10006b6cd) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-yarn-project/CHANGES.txt WintuilsProcessStubExecutor.startStreamReader leaks streams --- Key: YARN-3594 URL: https://issues.apache.org/jira/browse/YARN-3594 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Lars Francke Priority: Trivial Labels: newbie Fix For: 2.8.0 Attachments: YARN-3594.1.patch while looking at the file, my IDE highlights that the thread runnables started by {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their streams as they exit. a java7 try-with-resources would trivially fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
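For readers unfamiliar with the shape of the fix, a minimal sketch of a stream-reading runnable that cannot leak its reader; the class name and output handling here are illustrative, not the actual WindowsSecureContainerExecutor code:
{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

/** Sketch: Java 7 try-with-resources closes the stream on thread exit. */
public class StreamReaderRunnable implements Runnable {
  private final InputStream stream;

  public StreamReaderRunnable(InputStream stream) {
    this.stream = stream;
  }

  @Override
  public void run() {
    // The reader (and the wrapped stream) is closed automatically when
    // this block exits, normally or via an exception.
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(stream))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line); // the real code logs the process output
      }
    } catch (IOException e) {
      // Nothing left to clean up: try-with-resources already closed it.
    }
  }
}
{code}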
[jira] [Commented] (YARN-3694) Fix dead link for TimelineServer REST API
[ https://issues.apache.org/jira/browse/YARN-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556094#comment-14556094 ] Hudson commented on YARN-3694: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #204 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/204/]) YARN-3694. Fix dead link for TimelineServer REST API. Contributed by Jagadesh Kiran N. (aajisaka: rev a5def580879428bc7af3c030ef33554e0519f072) * hadoop-yarn-project/CHANGES.txt * hadoop-project/src/site/site.xml Fix dead link for TimelineServer REST API - Key: YARN-3694 URL: https://issues.apache.org/jira/browse/YARN-3694 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Jagadesh Kiran N Priority: Minor Labels: newbie Fix For: 2.7.1 Attachments: YARN-3694.patch There is a dead link in the index. {code:title=hadoop-project/src/site/site.xml} item name=Timeline Server href=TimelineServer.html#Timeline_Server_REST_API_v1/ {code} should be fixed as {code} item name=Timeline Server href=hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1/ {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3684) Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information.
[ https://issues.apache.org/jira/browse/YARN-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556092#comment-14556092 ] Hudson commented on YARN-3684: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #204 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/204/]) YARN-3684. Changed ContainerExecutor's primary lifecycle methods to use a more extensible mechanism of context objects. Contributed by Sidharta Seethana. (vinodkv: rev 53fafcf061616516c24e2e2007a66a93d23d3e25) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerSignalContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerLivenessContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/LocalizerStartContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerReacquisitionContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerStartContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/DeletionAsUserContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDeletionService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java *
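The context-object idiom named in the commit replaces long parameter lists on executor lifecycle methods with a single immutable object built through a builder, so new fields can be added without breaking every ContainerExecutor subclass. A minimal sketch with illustrative fields, not the committed API:
{code}
/** Sketch of a lifecycle context object; field names are illustrative. */
public final class StartContextSketch {
  private final String user;
  private final String containerId;

  private StartContextSketch(Builder builder) {
    this.user = builder.user;
    this.containerId = builder.containerId;
  }

  public String getUser() {
    return user;
  }

  public String getContainerId() {
    return containerId;
  }

  public static final class Builder {
    private String user;
    private String containerId;

    public Builder setUser(String user) {
      this.user = user;
      return this;
    }

    public Builder setContainerId(String containerId) {
      this.containerId = containerId;
      return this;
    }

    // Adding a field later means one new setter here, not a changed
    // method signature on every executor implementation.
    public StartContextSketch build() {
      return new StartContextSketch(this);
    }
  }
}
{code}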
[jira] [Commented] (YARN-3684) Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information.
[ https://issues.apache.org/jira/browse/YARN-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556078#comment-14556078 ] Hudson commented on YARN-3684: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #935 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/935/]) YARN-3684. Changed ContainerExecutor's primary lifecycle methods to use a more extensible mechanism of context objects. Contributed by Sidharta Seethana. (vinodkv: rev 53fafcf061616516c24e2e2007a66a93d23d3e25) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerStartContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerReacquisitionContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerSignalContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/LocalizerStartContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDeletionService.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/DeletionAsUserContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerLivenessContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java *
[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556081#comment-14556081 ] Hudson commented on YARN-3675: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #935 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/935/]) YARN-3675. FairScheduler: RM quits when node removal races with continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 4513761869c732cf2f462763043067ebf8749df7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java FairScheduler: RM quits when node removal races with continousscheduling on the same node - Key: YARN-3675 URL: https://issues.apache.org/jira/browse/YARN-3675 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Fix For: 2.7.1 Attachments: YARN-3675.001.patch, YARN-3675.002.patch, YARN-3675.003.patch With continuous scheduling, scheduling can be done on a node thats just removed causing errors like below. {noformat} 12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Error in handling event type APP_ATTEMPT_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) at java.lang.Thread.run(Thread.java:745) 12:28:53.783 AMINFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
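The race above boils down to unreserving against a node that has already been removed. A hedged sketch of the defensive shape, in the same fragment style as the snippets elsewhere in this digest; the accessor and helper names are assumed, and this is not the committed patch:
{code}
// Inside the scheduler's completed-container path, sketch only: the node
// backing the reservation may have been removed by a concurrent
// NODE_REMOVED event, so look it up again and null-check before
// unreserving.
FSSchedulerNode node = getFSSchedulerNode(container.getNodeId()); // assumed helper
if (node == null) {
  LOG.info("Skipping unreserve for " + container.getContainerId()
      + ": node was removed concurrently by continuous scheduling");
  return;
}
application.unreserve(container.getReservedPriority(), node);
{code}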
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556079#comment-14556079 ] Hudson commented on YARN-3646: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #935 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/935/]) YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Assignee: Raju Bairishetti Fix For: 2.7.1 Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER retry policy. The Yarn client retries infinitely on exceptions from the RM because it is using the FOREVER retry policy. The problem is that it retries on all kinds of exceptions (like ApplicationNotFoundException), even when the failure is not a connection failure. Because of this, my application does not make progress. *The Yarn client should not retry infinitely on non-connection failures.* We have written a simple yarn-client which tries to get an application report for an invalid or old appId. The ResourceManager throws an ApplicationNotFoundException because the appId is invalid or old. But because of the FOREVER retry policy, the client keeps retrying to fetch the application report, and the ResourceManager keeps throwing ApplicationNotFoundException. {code} private void testYarnClientRetryPolicy() throws Exception{ YarnConfiguration conf = new YarnConfiguration(); conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1); YarnClient yarnClient = YarnClient.createYarnClient(); yarnClient.init(conf); yarnClient.start(); ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645); ApplicationReport report = yarnClient.getApplicationReport(appId); } {code} *RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams
[ https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556069#comment-14556069 ] Hudson commented on YARN-3594: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #935 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/935/]) YARN-3594. WintuilsProcessStubExecutor.startStreamReader leaks streams. Contributed by Lars Francke. (junping_du: rev 132d909d4a6509af9e63e24cbb719be10006b6cd) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java WintuilsProcessStubExecutor.startStreamReader leaks streams --- Key: YARN-3594 URL: https://issues.apache.org/jira/browse/YARN-3594 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Lars Francke Priority: Trivial Labels: newbie Fix For: 2.8.0 Attachments: YARN-3594.1.patch while looking at the file, my IDE highlights that the thread runnables started by {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their streams as they exit. a java7 try-with-resources would trivially fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3694) Fix dead link for TimelineServer REST API
[ https://issues.apache.org/jira/browse/YARN-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556080#comment-14556080 ] Hudson commented on YARN-3694: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #935 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/935/]) YARN-3694. Fix dead link for TimelineServer REST API. Contributed by Jagadesh Kiran N. (aajisaka: rev a5def580879428bc7af3c030ef33554e0519f072) * hadoop-project/src/site/site.xml * hadoop-yarn-project/CHANGES.txt Fix dead link for TimelineServer REST API - Key: YARN-3694 URL: https://issues.apache.org/jira/browse/YARN-3694 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Jagadesh Kiran N Priority: Minor Labels: newbie Fix For: 2.7.1 Attachments: YARN-3694.patch There is a dead link in the index. {code:title=hadoop-project/src/site/site.xml} item name=Timeline Server href=TimelineServer.html#Timeline_Server_REST_API_v1/ {code} should be fixed as {code} item name=Timeline Server href=hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1/ {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3705) forcemanual transition of RM active/standby state in automatic-failover mode should change elector state
Masatake Iwasaki created YARN-3705: -- Summary: forcemanual transition of RM active/standby state in automatic-failover mode should change elector state Key: YARN-3705 URL: https://issues.apache.org/jira/browse/YARN-3705 Project: Hadoop YARN Issue Type: Improvement Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Executing {{rmadmin -transitionToActive --forcemanual}} and {{rmadmin -transitionToStandby --forcemanual}} in automatic-failover.enabled mode changes the active/standby state of the ResourceManager while leaving the state of the ActiveStandbyElector unchanged. The transition should make the elector quit and rejoin the election; otherwise, forcemanual transitions should not be allowed in automatic-failover mode, in order to avoid confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
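ActiveStandbyElector (org.apache.hadoop.ha) already exposes quitElection() and joinElection(), so the proposed behaviour could look roughly like the following; the handler class and its wiring into the RM are entirely hypothetical:
{code}
import org.apache.hadoop.ha.ActiveStandbyElector;

/** Hypothetical sketch: keep the elector in sync on --forcemanual. */
public class ForceManualElectorSync {
  private final ActiveStandbyElector elector;
  private final byte[] localActiveNodeData;

  public ForceManualElectorSync(ActiveStandbyElector elector,
      byte[] localActiveNodeData) {
    this.elector = elector;
    this.localActiveNodeData = localActiveNodeData;
  }

  public void onForceManualTransition() {
    elector.quitElection(false);               // leave the election cleanly
    elector.joinElection(localActiveNodeData); // rejoin so automatic
                                               // failover keeps working
  }
}
{code}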
[jira] [Commented] (YARN-1126) Add validation of users input nodes-states options to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556118#comment-14556118 ] Junping Du commented on YARN-1126: -- The patch has been uploaded for a long time. Re-kicking the Jenkins test to see if it still applies. Add validation of users input nodes-states options to nodes CLI --- Key: YARN-1126 URL: https://issues.apache.org/jira/browse/YARN-1126 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-905-addendum.patch Follow the discussion in YARN-905. (1) case-insensitive checks for all. (2) validation of users input, exit with non-zero code and print all valid states when user gives an invalid state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1126) Add validation of users input nodes-states options to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1126: - Labels: (was: BB2015-05-TBR) Add validation of users input nodes-states options to nodes CLI --- Key: YARN-1126 URL: https://issues.apache.org/jira/browse/YARN-1126 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-905-addendum.patch Follow the discussion in YARN-905. (1) case-insensitive checks for all. (2) validation of users input, exit with non-zero code and print all valid states when user gives an invalid state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3644) Node manager shuts down if unable to connect with RM
[ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raju Bairishetti reassigned YARN-3644: -- Assignee: Raju Bairishetti Node manager shuts down if unable to connect with RM Key: YARN-3644 URL: https://issues.apache.org/jira/browse/YARN-3644 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Srikanth Sundarrajan Assignee: Raju Bairishetti When the NM is unable to connect to the RM, it shuts itself down. {code} } catch (ConnectException e) { //catch and throw the exception if tried MAX wait time to connect RM dispatcher.getEventHandler().handle( new NodeManagerEvent(NodeManagerEventType.SHUTDOWN)); throw new YarnRuntimeException(e); {code} In large clusters, if the RM is down for maintenance for a longer period, all the NMs shut themselves down, requiring additional work to bring them back up. Setting yarn.resourcemanager.connect.wait-ms to -1 has other side effects: non-connection failures are retried infinitely by all YarnClients (via RMProxy). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
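The knob in question is the same one the YARN-3646 reproduction earlier in this digest sets; shown in isolation for clarity, using the real YarnConfiguration constant:
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ForeverRetrySetup {
  public static YarnConfiguration build() {
    YarnConfiguration conf = new YarnConfiguration();
    // -1 means retry "forever": NMs ride out long RM maintenance windows,
    // but every YarnClient built on RMProxy then also retries
    // non-connection failures indefinitely, the side effect noted above.
    conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1);
    return conf;
  }
}
{code}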
[jira] [Updated] (YARN-1515) Provide ContainerManagementProtocol#signalContainer processing a batch of signals
[ https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1515: - Target Version/s: (was: 2.6.0) Provide ContainerManagementProtocol#signalContainer processing a batch of signals -- Key: YARN-1515 URL: https://issues.apache.org/jira/browse/YARN-1515 Project: Hadoop YARN Issue Type: Sub-task Components: api, nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, YARN-1515.v03.patch, YARN-1515.v04.patch, YARN-1515.v05.patch, YARN-1515.v06.patch, YARN-1515.v07.patch, YARN-1515.v08.patch This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for timed-out task attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556334#comment-14556334 ] Hudson commented on YARN-3675: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2151 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2151/]) YARN-3675. FairScheduler: RM quits when node removal races with continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 4513761869c732cf2f462763043067ebf8749df7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt FairScheduler: RM quits when node removal races with continousscheduling on the same node - Key: YARN-3675 URL: https://issues.apache.org/jira/browse/YARN-3675 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Fix For: 2.7.1 Attachments: YARN-3675.001.patch, YARN-3675.002.patch, YARN-3675.003.patch With continuous scheduling, scheduling can be done on a node thats just removed causing errors like below. {noformat} 12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Error in handling event type APP_ATTEMPT_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) at java.lang.Thread.run(Thread.java:745) 12:28:53.783 AMINFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1426) YARN Components need to unregister their beans upon shutdown
[ https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556183#comment-14556183 ] Junping Du commented on YARN-1426: -- Thanks [~jeagles] for delivering the patch! The patch LGTM overall. Just one minor issue: {code} - new RMNMInfo(rmContext, scheduler); + rmNMInfo = new RMNMInfo(rmContext, scheduler); + rmNMInfo.registerMBean(); {code} I think we may prefer to move rmNMInfo.registerMBean(); to serviceStart() instead of serviceInit(). Theoretically, serviceStop() should undo what we did in serviceStart(), shouldn't it? YARN Components need to unregister their beans upon shutdown Key: YARN-1426 URL: https://issues.apache.org/jira/browse/YARN-1426 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.0.0, 2.3.0 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-1426.2.patch, YARN-1426.patch, YARN-1426.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
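A minimal sketch of the pairing suggested in the comment: register the bean in serviceStart() and unregister it in serviceStop(), so stop undoes exactly what start did. MBeans.register/unregister are the real org.apache.hadoop.metrics2.util.MBeans helpers; the surrounding class shape is illustrative:
{code}
import javax.management.ObjectName;

import org.apache.hadoop.metrics2.util.MBeans;

/** Sketch: symmetric MBean registration across the service lifecycle. */
public class RMNMInfoLifecycleSketch {
  private ObjectName mbeanName;

  protected void serviceStart() throws Exception {
    // Registered in start, not init, so stop can mirror it exactly.
    mbeanName = MBeans.register("ResourceManager", "RMNMInfo", this);
  }

  protected void serviceStop() throws Exception {
    if (mbeanName != null) {
      MBeans.unregister(mbeanName);
      mbeanName = null;
    }
  }
}
{code}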
[jira] [Updated] (YARN-1743) Decorate event transitions and the event-types with their behaviour
[ https://issues.apache.org/jira/browse/YARN-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1743: - Labels: documentation (was: BB2015-05-TBR documentation) Decorate event transitions and the event-types with their behaviour --- Key: YARN-1743 URL: https://issues.apache.org/jira/browse/YARN-1743 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Jeff Zhang Labels: documentation Attachments: NodeManager.gv, NodeManager.pdf, YARN-1743-2.patch, YARN-1743-3.patch, YARN-1743.patch Helps to annotate the transitions with (start-state, end-state) pair and the events with (source, destination) pair. Not just readability, we may also use them to generate the event diagrams across components. Not a blocker for 0.23, but let's see. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
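None of the attached patches is quoted in this digest, so purely as an illustration of decorating a transition with its (start-state, end-state) pair, a hypothetical annotation along the lines the JIRA describes:
{code}
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Hypothetical: tag a transition class with its endpoints so event
 * diagrams can be generated from the code. Nothing like this exists
 * in the codebase; it only illustrates the idea.
 */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
public @interface TransitionInfo {
  String startState();

  String endState();

  String[] events() default {};
}
{code}
A transition would then carry, for example, {{@TransitionInfo(startState = "NEW", endState = "LOCALIZING", events = { "INIT_CONTAINER" })}}, and a small reflection pass could emit graphs like the NodeManager.gv attached to this issue.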
[jira] [Commented] (YARN-3706) Generalize native HBase writer for additional tables
[ https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556277#comment-14556277 ] Joep Rottinghuis commented on YARN-3706: [~sjlee0] I've created this JIRA and will upload a partial initial patch with the layout of the code. I can't seem to assign this JIRA to myself. Could you please do so for me? cc [~vrushalic] Generalize native HBase writer for additional tables Key: YARN-3706 URL: https://issues.apache.org/jira/browse/YARN-3706 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Joep Rottinghuis Priority: Minor When reviewing YARN-3411 we noticed that we could change the class hierarchy a little in order to accommodate additional tables easily. In order to get ready for benchmark testing we left the original layout in place, as performance would not be impacted by the code hierarchy. Here is a separate JIRA to address the hierarchy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
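A minimal sketch of the kind of hierarchy change being proposed: shared behaviour moves into an abstract base class so each additional HBase table only supplies its own name and column families. Names here are hypothetical, not the eventual YARN-3706 code:
{code}
/** Sketch: common table plumbing lives in the base class. */
public abstract class BaseTableSketch {
  private final String tableName;

  protected BaseTableSketch(String tableName) {
    this.tableName = tableName;
  }

  public final String getTableName() {
    return tableName;
  }

  /** Each concrete table declares only its own column families. */
  public abstract String[] getColumnFamilies();
}

class EntityTableSketch extends BaseTableSketch {
  EntityTableSketch() {
    super("timelineservice.entity"); // hypothetical table name
  }

  @Override
  public String[] getColumnFamilies() {
    return new String[] { "info", "config", "metrics" }; // hypothetical
  }
}
{code}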
[jira] [Updated] (YARN-1515) Provide ContainerManagementProtocol#signalContainer processing a batch of signals
[ https://issues.apache.org/jira/browse/YARN-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1515: - Labels: (was: BB2015-05-TBR) Provide ContainerManagementProtocol#signalContainer processing a batch of signals -- Key: YARN-1515 URL: https://issues.apache.org/jira/browse/YARN-1515 Project: Hadoop YARN Issue Type: Sub-task Components: api, nodemanager Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: YARN-1515.v01.patch, YARN-1515.v02.patch, YARN-1515.v03.patch, YARN-1515.v04.patch, YARN-1515.v05.patch, YARN-1515.v06.patch, YARN-1515.v07.patch, YARN-1515.v08.patch This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for timed-out task attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3694) Fix dead link for TimelineServer REST API
[ https://issues.apache.org/jira/browse/YARN-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556200#comment-14556200 ] Hudson commented on YARN-3694: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2133 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2133/]) YARN-3694. Fix dead link for TimelineServer REST API. Contributed by Jagadesh Kiran N. (aajisaka: rev a5def580879428bc7af3c030ef33554e0519f072) * hadoop-project/src/site/site.xml * hadoop-yarn-project/CHANGES.txt Fix dead link for TimelineServer REST API - Key: YARN-3694 URL: https://issues.apache.org/jira/browse/YARN-3694 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Jagadesh Kiran N Priority: Minor Labels: newbie Fix For: 2.7.1 Attachments: YARN-3694.patch There is a dead link in the index. {code:title=hadoop-project/src/site/site.xml} item name=Timeline Server href=TimelineServer.html#Timeline_Server_REST_API_v1/ {code} should be fixed as {code} item name=Timeline Server href=hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1/ {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams
[ https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556188#comment-14556188 ] Hudson commented on YARN-3594: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2133 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2133/]) YARN-3594. WintuilsProcessStubExecutor.startStreamReader leaks streams. Contributed by Lars Francke. (junping_du: rev 132d909d4a6509af9e63e24cbb719be10006b6cd) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-yarn-project/CHANGES.txt WintuilsProcessStubExecutor.startStreamReader leaks streams --- Key: YARN-3594 URL: https://issues.apache.org/jira/browse/YARN-3594 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Lars Francke Priority: Trivial Labels: newbie Fix For: 2.8.0 Attachments: YARN-3594.1.patch while looking at the file, my IDE highlights that the thread runnables started by {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their streams as they exit. a java7 try-with-resources would trivially fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556226#comment-14556226 ] Hudson commented on YARN-3646: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #193 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/193/]) YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/CHANGES.txt Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Assignee: Raju Bairishetti Fix For: 2.7.1 Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER retry policy. The Yarn client retries infinitely on exceptions from the RM because it is using the FOREVER retry policy. The problem is that it retries on all kinds of exceptions (like ApplicationNotFoundException), even when the failure is not a connection failure. Because of this, my application does not make progress. *The Yarn client should not retry infinitely on non-connection failures.* We have written a simple yarn-client which tries to get an application report for an invalid or old appId. The ResourceManager throws an ApplicationNotFoundException because the appId is invalid or old. But because of the FOREVER retry policy, the client keeps retrying to fetch the application report, and the ResourceManager keeps throwing ApplicationNotFoundException. {code} private void testYarnClientRetryPolicy() throws Exception{ YarnConfiguration conf = new YarnConfiguration(); conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1); YarnClient yarnClient = YarnClient.createYarnClient(); yarnClient.init(conf); yarnClient.start(); ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645); ApplicationReport report = yarnClient.getApplicationReport(appId); } {code} *RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams
[ https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556215#comment-14556215 ] Hudson commented on YARN-3594: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #193 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/193/]) YARN-3594. WintuilsProcessStubExecutor.startStreamReader leaks streams. Contributed by Lars Francke. (junping_du: rev 132d909d4a6509af9e63e24cbb719be10006b6cd) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-yarn-project/CHANGES.txt WintuilsProcessStubExecutor.startStreamReader leaks streams --- Key: YARN-3594 URL: https://issues.apache.org/jira/browse/YARN-3594 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Lars Francke Priority: Trivial Labels: newbie Fix For: 2.8.0 Attachments: YARN-3594.1.patch while looking at the file, my IDE highlights that the thread runnables started by {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their streams as they exit. a java7 try-with-resources would trivially fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continousscheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556228#comment-14556228 ] Hudson commented on YARN-3675: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #193 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/193/]) YARN-3675. FairScheduler: RM quits when node removal races with continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 4513761869c732cf2f462763043067ebf8749df7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/CHANGES.txt FairScheduler: RM quits when node removal races with continousscheduling on the same node - Key: YARN-3675 URL: https://issues.apache.org/jira/browse/YARN-3675 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Fix For: 2.7.1 Attachments: YARN-3675.001.patch, YARN-3675.002.patch, YARN-3675.003.patch With continuous scheduling, scheduling can be done on a node thats just removed causing errors like below. {noformat} 12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Error in handling event type APP_ATTEMPT_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) at java.lang.Thread.run(Thread.java:745) 12:28:53.783 AMINFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3684) Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information.
[ https://issues.apache.org/jira/browse/YARN-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556330#comment-14556330 ] Hudson commented on YARN-3684: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2151 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2151/]) YARN-3684. Changed ContainerExecutor's primary lifecycle methods to use a more extensible mechanism of context objects. Contributed by Sidharta Seethana. (vinodkv: rev 53fafcf061616516c24e2e2007a66a93d23d3e25) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerSignalContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerReacquisitionContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDeletionService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerStartContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerLivenessContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/DeletionAsUserContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java *
[jira] [Commented] (YARN-3646) Applications are getting stuck some times in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556332#comment-14556332 ] Hudson commented on YARN-3646: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2151 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2151/]) YARN-3646. Applications are getting stuck some times in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java Applications are getting stuck some times in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Assignee: Raju Bairishetti Fix For: 2.7.1 Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER retry policy. The Yarn client retries infinitely on exceptions from the RM because it is using the FOREVER retry policy. The problem is that it retries on all kinds of exceptions (like ApplicationNotFoundException), even when the failure is not a connection failure. Because of this, my application does not make progress. *The Yarn client should not retry infinitely on non-connection failures.* We have written a simple yarn-client which tries to get an application report for an invalid or old appId. The ResourceManager throws an ApplicationNotFoundException because the appId is invalid or old. But because of the FOREVER retry policy, the client keeps retrying to fetch the application report, and the ResourceManager keeps throwing ApplicationNotFoundException. {code} private void testYarnClientRetryPolicy() throws Exception{ YarnConfiguration conf = new YarnConfiguration(); conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1); YarnClient yarnClient = YarnClient.createYarnClient(); yarnClient.init(conf); yarnClient.start(); ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645); ApplicationReport report = yarnClient.getApplicationReport(appId); } {code} *RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3594) WintuilsProcessStubExecutor.startStreamReader leaks streams
[ https://issues.apache.org/jira/browse/YARN-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556321#comment-14556321 ] Hudson commented on YARN-3594: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2151 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2151/]) YARN-3594. WintuilsProcessStubExecutor.startStreamReader leaks streams. Contributed by Lars Francke. (junping_du: rev 132d909d4a6509af9e63e24cbb719be10006b6cd) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-yarn-project/CHANGES.txt WintuilsProcessStubExecutor.startStreamReader leaks streams --- Key: YARN-3594 URL: https://issues.apache.org/jira/browse/YARN-3594 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Lars Francke Priority: Trivial Labels: newbie Fix For: 2.8.0 Attachments: YARN-3594.1.patch while looking at the file, my IDE highlights that the thread runnables started by {{WintuilsProcessStubExecutor.startStreamReader()}} don't close their streams as they exit. a java7 try-with-resources would trivially fix this -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3694) Fix dead link for TimelineServer REST API
[ https://issues.apache.org/jira/browse/YARN-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556333#comment-14556333 ] Hudson commented on YARN-3694: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2151 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2151/]) YARN-3694. Fix dead link for TimelineServer REST API. Contributed by Jagadesh Kiran N. (aajisaka: rev a5def580879428bc7af3c030ef33554e0519f072) * hadoop-project/src/site/site.xml * hadoop-yarn-project/CHANGES.txt Fix dead link for TimelineServer REST API - Key: YARN-3694 URL: https://issues.apache.org/jira/browse/YARN-3694 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Akira AJISAKA Assignee: Jagadesh Kiran N Priority: Minor Labels: newbie Fix For: 2.7.1 Attachments: YARN-3694.patch There is a dead link in the index. {code:title=hadoop-project/src/site/site.xml} item name=Timeline Server href=TimelineServer.html#Timeline_Server_REST_API_v1/ {code} should be fixed as {code} item name=Timeline Server href=hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1/ {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1462: - Labels: (was: BB2015-05-TBR) AHS API and other AHS changes to handle tags for completed MR jobs -- Key: YARN-1462 URL: https://issues.apache.org/jira/browse/YARN-1462 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong Attachments: YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3684) Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information.
[ https://issues.apache.org/jira/browse/YARN-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556197#comment-14556197 ] Hudson commented on YARN-3684: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2133 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2133/]) YARN-3684. Changed ContainerExecutor's primary lifecycle methods to use a more extensible mechanism of context objects. Contributed by Sidharta Seethana. (vinodkv: rev 53fafcf061616516c24e2e2007a66a93d23d3e25) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerReacquisitionContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/DeletionAsUserContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerSignalContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerStartContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerLivenessContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/LocalizerStartContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java *
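For context on the change above: instead of lifecycle methods that take ever-growing parameter lists, ContainerExecutor's primary methods now accept context objects. The sketch below is illustrative only; the builder and setter names are assumptions, not the committed API.
{code}
// A minimal sketch of the context-object pattern, with assumed names.
// Adding a new piece of launch information later means adding a field to the
// context class rather than changing every executor's method signature.
ContainerStartContext startCtx = new ContainerStartContext.Builder()
    .setContainer(container)       // assumed setter
    .setUser(user)                 // assumed setter
    .setAppId(appId)               // assumed setter
    .setContainerWorkDir(workDir)  // assumed setter
    .build();
exec.launchContainer(startCtx);
{code}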
[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continuous scheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556201#comment-14556201 ] Hudson commented on YARN-3675: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2133 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2133/]) YARN-3675. FairScheduler: RM quits when node removal races with continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 4513761869c732cf2f462763043067ebf8749df7) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: RM quits when node removal races with continuous scheduling on the same node - Key: YARN-3675 URL: https://issues.apache.org/jira/browse/YARN-3675 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Fix For: 2.7.1 Attachments: YARN-3675.001.patch, YARN-3675.002.patch, YARN-3675.003.patch With continuous scheduling, scheduling can be done on a node that's just been removed, causing errors like the one below. {noformat} 12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Error in handling event type APP_ATTEMPT_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) at java.lang.Thread.run(Thread.java:745) 12:28:53.783 AM INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
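The race: the continuous-scheduling thread can pick up a node after a NODE_REMOVED event has already torn it down, so the later unreserve call dereferences state that no longer exists. A minimal sketch of the kind of guard the fix adds (method and field names are assumed, not taken from the patch):
{code}
// Hypothetical defensive check in FairScheduler's continuous-scheduling path:
// skip any node that is no longer in the scheduler's node map.
private void attemptScheduling(FSSchedulerNode node) {
  if (node == null || getFSSchedulerNode(node.getNodeID()) == null) {
    LOG.debug("Skipping scheduling on removed node " + node);
    return; // node was removed concurrently; nothing to schedule
  }
  // ... proceed with reservation / container assignment on the node ...
}
{code}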
[jira] [Commented] (YARN-3646) Applications are getting stuck sometimes in case of retry policy forever
[ https://issues.apache.org/jira/browse/YARN-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556199#comment-14556199 ] Hudson commented on YARN-3646: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2133 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2133/]) YARN-3646. Applications are getting stuck sometimes in case of retry (devaraj: rev 0305316d6932e6f1a05021354d77b6934e57e171) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java Applications are getting stuck sometimes in case of retry policy forever - Key: YARN-3646 URL: https://issues.apache.org/jira/browse/YARN-3646 Project: Hadoop YARN Issue Type: Bug Components: client Reporter: Raju Bairishetti Assignee: Raju Bairishetti Fix For: 2.7.1 Attachments: YARN-3646.001.patch, YARN-3646.002.patch, YARN-3646.patch We have set *yarn.resourcemanager.connect.wait-ms* to -1 to use the FOREVER retry policy, so the Yarn client retries infinitely on exceptions from the RM. The problem is that it retries for all kinds of exceptions (like ApplicationNotFoundException), even when the failure is not a connection failure. Because of this, my application does not progress. *The Yarn client should not retry infinitely on non-connection failures.* We have written a simple yarn client that tries to get an application report for an invalid or old appId. The ResourceManager throws an ApplicationNotFoundException since the appId is invalid or old, but because of the FOREVER retry policy the client keeps retrying to get the application report, and the ResourceManager keeps throwing ApplicationNotFoundException. {code} private void testYarnClientRetryPolicy() throws Exception { YarnConfiguration conf = new YarnConfiguration(); conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS, -1); YarnClient yarnClient = YarnClient.createYarnClient(); yarnClient.init(conf); yarnClient.start(); ApplicationId appId = ApplicationId.newInstance(1430126768987L, 10645); ApplicationReport report = yarnClient.getApplicationReport(appId); } {code} *RM logs:* {noformat} 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 21 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875162 Retry#0 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1430126768987_10645' doesn't exist in RM. 
at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) 15/05/14 16:33:24 INFO ipc.Server: IPC Server handler 47 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 10.14.120.231:61621 Call#875163 Retry#0 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
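One plausible shape of the fix (a sketch only; the actual wiring in RMProxy may differ): keep retrying forever on connection-level failures, but map application-level exceptions to an immediate failure using org.apache.hadoop.io.retry.RetryPolicies.
{code}
// Sketch: exceptions listed in the map get their own policy; anything else
// falls through to the default. RetryPolicies.retryByException,
// TRY_ONCE_THEN_FAIL, and RETRY_FOREVER are real Hadoop retry APIs, but the
// choice of mapped exceptions here is illustrative.
Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicy = new HashMap<>();
exceptionToPolicy.put(ApplicationNotFoundException.class,
    RetryPolicies.TRY_ONCE_THEN_FAIL);  // surface app-level errors immediately
RetryPolicy policy = RetryPolicies.retryByException(
    RetryPolicies.RETRY_FOREVER,        // keep retrying connection failures
    exceptionToPolicy);
{code}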
[jira] [Updated] (YARN-1526) change owner before setting setgid
[ https://issues.apache.org/jira/browse/YARN-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1526: - Labels: (was: BB2015-05-TBR) change owner before setting setgid --- Key: YARN-1526 URL: https://issues.apache.org/jira/browse/YARN-1526 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Environment: BSD Reporter: Radim Kolar Attachments: create-order-chown.txt, create-order.txt If the nodemanager work directory has the "copy group from parent" flag set (as is often the case on /tmp), chmod fails to set the gid bit because the owner's group does not match the caller's group. The fix is to chown first, then chmod. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
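In other words, the ordering matters: fix ownership first, so that when the setgid bit is applied the directory's group already matches. A rough illustration in shell-out style (not the actual NM code; the path, user, and mode below are made-up examples):
{code}
// Illustrative only: create, chown, then chmod with the setgid bit.
// The real change reorders the NM's directory-creation logic.
private void setupWorkDir() throws IOException {
  File workDir = new File("/tmp/nm-local-dir");                 // assumed path
  workDir.mkdirs();
  Shell.execCommand("chown", "yarn:hadoop", workDir.getPath()); // 1: ownership first
  Shell.execCommand("chmod", "2750", workDir.getPath());        // 2: leading 2 = setgid bit
}
{code}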
[jira] [Commented] (YARN-3684) Change ContainerExecutor's primary lifecycle methods to use a more extensible mechanism for passing information.
[ https://issues.apache.org/jira/browse/YARN-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556224#comment-14556224 ] Hudson commented on YARN-3684: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #193 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/193/]) YARN-3684. Changed ContainerExecutor's primary lifecycle methods to use a more extensible mechanism of context objects. Contributed by Sidharta Seethana. (vinodkv: rev 53fafcf061616516c24e2e2007a66a93d23d3e25) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerLivenessContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/LocalizerStartContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DockerContainerExecutor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerReacquisitionContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerStartContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/ContainerSignalContext.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDeletionService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDockerContainerExecutorWithMocks.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/executor/DeletionAsUserContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java *
[jira] [Commented] (YARN-3543) ApplicationReport should be able to tell whether the Application is AM managed or not.
[ https://issues.apache.org/jira/browse/YARN-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556239#comment-14556239 ] Rohith commented on YARN-3543: -- [~aw] Could you help me understand and resolve a build issue? What I observe is that the patch contains changes to many files spanning multiple projects. When the test cases are triggered, the build ignores the applied patch and picks up the existing class files, which causes the compilation failure and other issues. But if I apply the patch and build locally, it succeeds. ApplicationReport should be able to tell whether the Application is AM managed or not. --- Key: YARN-3543 URL: https://issues.apache.org/jira/browse/YARN-3543 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Spandan Dutta Assignee: Rohith Attachments: 0001-YARN-3543.patch, 0001-YARN-3543.patch, 0002-YARN-3543.patch, 0002-YARN-3543.patch, 0003-YARN-3543.patch, 0004-YARN-3543.patch, 0004-YARN-3543.patch, YARN-3543-AH.PNG, YARN-3543-RM.PNG Currently we can tell whether an application submitted by the user is AM managed from the ApplicationSubmissionContext, but only at the time the user submits the job. We should expose this information from the ApplicationReport as well, so that we can check whether an app is AM managed at any time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
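A sketch of the intended client-side usage once the flag is exposed (the accessor name below is hypothetical; the committed API may differ):
{code}
// Hypothetical check; isUnmanagedApp() is an assumed accessor name.
ApplicationReport report = yarnClient.getApplicationReport(appId);
if (report.isUnmanagedApp()) {
  // the AM runs outside YARN's control (unmanaged AM)
} else {
  // the AM container is launched and managed by YARN
}
{code}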
[jira] [Created] (YARN-3706) Generalize native HBase writer for additional tables
Joep Rottinghuis created YARN-3706: -- Summary: Generalize native HBase writer for additional tables Key: YARN-3706 URL: https://issues.apache.org/jira/browse/YARN-3706 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Joep Rottinghuis Priority: Minor When reviewing YARN-3411 we noticed that we could change the class hierarchy a little in order to accommodate additional tables easily. In order to get ready for benchmark testing we left the original layout in place, as performance would not be impacted by the code hierarchy. Here is a separate jira to address the hierarchy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
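A rough sketch of what such a generalization could look like (class and method names are assumed, not the committed design): per-table specifics hang off a common base so that a new table only supplies its own name and schema.
{code}
// Hypothetical base class; TableName, Admin, HTableDescriptor, and
// HColumnDescriptor are the standard HBase client types of this era.
public abstract class BaseTable<T extends BaseTable<T>> {
  /** Each concrete table supplies its (possibly configurable) name. */
  protected abstract TableName getTableName(Configuration conf);

  /** Each concrete table creates itself with its own column families. */
  public abstract void createTable(Admin admin, Configuration conf)
      throws IOException;
}

// Adding a new table is then a small subclass:
public class EntityTable extends BaseTable<EntityTable> {
  @Override
  protected TableName getTableName(Configuration conf) {
    return TableName.valueOf("timelineservice.entity"); // assumed name
  }

  @Override
  public void createTable(Admin admin, Configuration conf) throws IOException {
    HTableDescriptor desc = new HTableDescriptor(getTableName(conf));
    desc.addFamily(new HColumnDescriptor("i")); // assumed "info" family
    admin.createTable(desc);
  }
}
{code}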
[jira] [Commented] (YARN-3675) FairScheduler: RM quits when node removal races with continuous scheduling on the same node
[ https://issues.apache.org/jira/browse/YARN-3675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556299#comment-14556299 ] Hudson commented on YARN-3675: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #203 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/203/]) YARN-3675. FairScheduler: RM quits when node removal races with continuous-scheduling on the same node. (Anubhav Dhoot via kasha) (kasha: rev 4513761869c732cf2f462763043067ebf8749df7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java FairScheduler: RM quits when node removal races with continuous scheduling on the same node - Key: YARN-3675 URL: https://issues.apache.org/jira/browse/YARN-3675 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Fix For: 2.7.1 Attachments: YARN-3675.001.patch, YARN-3675.002.patch, YARN-3675.003.patch With continuous scheduling, scheduling can be done on a node that's just been removed, causing errors like the one below. {noformat} 12:28:53.782 AM FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Error in handling event type APP_ATTEMPT_REMOVED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.unreserve(FSAppAttempt.java:469) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:815) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplicationAttempt(FairScheduler.java:763) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1217) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:111) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684) at java.lang.Thread.run(Thread.java:745) 12:28:53.783 AM INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager Exiting, bbye.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)