[jira] [Commented] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API
[ https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032481#comment-15032481 ] Wangda Tan commented on YARN-4292: -- Thanks [~sunilg] for updating the patch, patch generally looks good, the only suggestion is renaming containersPhysicalMemoryMB to aggregated-. The name is longer but also more clear :) > ResourceUtilization should be a part of NodeInfo REST API > - > > Key: YARN-4292 > URL: https://issues.apache.org/jira/browse/YARN-4292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: 0001-YARN-4292.patch, 0002-YARN-4292.patch, > 0003-YARN-4292.patch, 0004-YARN-4292.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032514#comment-15032514 ] Hadoop QA commented on YARN-3223: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s {color} | {color:red} Patch generated 1 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 168, now 167). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 42s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 52s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 144m 30s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_85 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12774860/YARN-3223-v3.patch | | JIRA
[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent
[ https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032561#comment-15032561 ] Carlo Curino commented on YARN-4358: Thanks [~asuresh], I just uploaded a version of the patch that should address a bunch of the checkstyle/findbugs issues, etc. > Improve relationship between SharingPolicy and ReservationAgent > --- > > Key: YARN-4358 > URL: https://issues.apache.org/jira/browse/YARN-4358 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.8.0 > > Attachments: YARN-4358.2.patch, YARN-4358.3.patch, YARN-4358.patch > > > At the moment an agent places based on available resources, but has no > visibility to extra constraints imposed by the SharingPolicy. While not all > constraints are easily represented some (e.g., max-instantaneous resources) > are easily represented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4401) A failed app recovery should not prevent the RM from starting
Daniel Templeton created YARN-4401: -- Summary: A failed app recovery should not prevent the RM from starting Key: YARN-4401 URL: https://issues.apache.org/jira/browse/YARN-4401 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.7.1 Reporter: Daniel Templeton Assignee: Daniel Templeton Priority: Critical There are many different reasons why an app recovery could fail with an exception, causing the RM start to be aborted. If that happens the RM will fail to start. Presumably, the reason the RM is trying to do a recovery is that it's the standby trying to fill in for the active. Failing to come up defeats the purpose of the HA configuration. Instead of preventing the RM from starting, a failed app recovery should log an error and skip the application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
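A minimal sketch of the proposed behaviour — the loop shape and the {{recoverApplication}} helper below are illustrative assumptions, not the actual RMAppManager code:

{code:title=Illustrative recovery sketch|borderStyle=solid}
// Sketch only: skip an application whose recovery throws, rather than letting
// the exception abort RM start-up. recoverApplication() and appStates are
// illustrative placeholders, not the actual RMAppManager names.
for (ApplicationStateData appState : appStates) {
  try {
    recoverApplication(appState);
  } catch (Exception e) {
    LOG.error("Failed to recover application " + appState
        + "; skipping it and continuing with the remaining applications", e);
  }
}
{code}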
[jira] [Updated] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent
[ https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-4358: --- Attachment: YARN-4358.3.patch > Improve relationship between SharingPolicy and ReservationAgent > --- > > Key: YARN-4358 > URL: https://issues.apache.org/jira/browse/YARN-4358 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.8.0 > > Attachments: YARN-4358.2.patch, YARN-4358.3.patch, YARN-4358.patch > > > At the moment an agent places based on available resources, but has no > visibility to extra constraints imposed by the SharingPolicy. While not all > constraints are easily represented some (e.g., max-instantaneous resources) > are easily represented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-1856: Attachment: YARN-1856.001.patch Uploaded a patch that adds support for cgroups based memory monitoring. I found that the default setting for swappiness results in a significant change in behaviour compared to the existing pmem monitor. I've added a configuration to let admins set the swappiness value, with the default being 0. > cgroups based memory monitoring for containers > -- > > Key: YARN-1856 > URL: https://issues.apache.org/jira/browse/YARN-1856 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.3.0 >Reporter: Karthik Kambatla >Assignee: Varun Vasudev > Attachments: YARN-1856.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
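As a rough illustration of the swappiness handling described above — the property name, default, and surrounding variables are assumptions made for the sketch, not necessarily what the patch uses:

{code:title=Sketch of configurable cgroups swappiness|borderStyle=solid}
// Hypothetical property name and default (0), read from the NM configuration.
int swappiness = conf.getInt(
    "yarn.nodemanager.resource.memory.cgroups.swappiness", 0);

// Write the value into the container's memory cgroup, e.g.
// /sys/fs/cgroup/memory/hadoop-yarn/<container-id>/memory.swappiness.
// cgroupMemoryPath and containerIdStr are assumed to be provided by the caller.
java.nio.file.Path swapFile =
    java.nio.file.Paths.get(cgroupMemoryPath, containerIdStr, "memory.swappiness");
java.nio.file.Files.write(swapFile,
    String.valueOf(swappiness).getBytes(java.nio.charset.StandardCharsets.UTF_8));
{code}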
[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler
[ https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3980: - Issue Type: Sub-task (was: Bug) Parent: YARN-1011 > Plumb resource-utilization info in node heartbeat through to the scheduler > -- > > Key: YARN-3980 > URL: https://issues.apache.org/jira/browse/YARN-3980 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Affects Versions: 2.7.1 >Reporter: Karthik Kambatla >Assignee: Inigo Goiri > Fix For: 2.8.0 > > Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, > YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, > YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, > YARN-3980-v8.patch, YARN-3980-v9.patch > > > YARN-1012 and YARN-3534 collect resource utilization information for all > containers and the node respectively and send it to the RM on node heartbeat. > We should plumb it through to the scheduler so the scheduler can make use of > it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4397) if this addAll() function`s params is fault? @NodeListManager#getUnusableNodes()
[ https://issues.apache.org/jira/browse/YARN-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032083#comment-15032083 ] Daniel Templeton commented on YARN-4397: I'm not sure I understand. This function adds the contents of {{unusableRMNodesConcurrentSet}} to {{unUsableNodes}} and then returns the size of {{unusableRMNodesConcurrentSet}}. {{unusableRMNodesConcurrentSet}} isn't modified, so the order doesn't matter. > if this addAll() function`s params is fault? > @NodeListManager#getUnusableNodes() > > > Key: YARN-4397 > URL: https://issues.apache.org/jira/browse/YARN-4397 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.6.0 >Reporter: Feng Yuan > Fix For: 2.8.0 > > > code in NodeListManager#144L: > /** >* Provides the currently unusable nodes. Copies it into provided > collection. >* @param unUsableNodes >* Collection to which the unusable nodes are added >* @return number of unusable nodes added >*/ > public int getUnusableNodes(Collection unUsableNodes) { > unUsableNodes.addAll(unusableRMNodesConcurrentSet); > return unusableRMNodesConcurrentSet.size(); > } > unUsableNodes and unusableRMNodesConcurrentSet's sequence is wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
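To illustrate the point with plain {{java.util}} collections — {{addAll}} copies from its argument into the collection it is called on, and leaves the argument untouched:

{code:title=AddAllDemo.java|borderStyle=solid}
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class AddAllDemo {
  public static void main(String[] args) {
    // Stands in for unusableRMNodesConcurrentSet.
    Set<String> unusable = new HashSet<>();
    unusable.add("node-1");
    unusable.add("node-2");

    // Stands in for the caller-supplied unUsableNodes collection.
    Collection<String> unUsableNodes = new ArrayList<>();
    unUsableNodes.add("node-0");          // anything already there is kept
    unUsableNodes.addAll(unusable);       // copy FROM the set INTO the caller's collection

    System.out.println(unUsableNodes.size()); // 3 -- the caller received the copies
    System.out.println(unusable.size());      // 2 -- the source set is unchanged
  }
}
{code}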
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032073#comment-15032073 ] Sunil G commented on YARN-3226: --- Thanks [~djp] I will work on fixing these items in next patch. I will also wait for comments from [~xgong]. Thank You. > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031919#comment-15031919 ] Hadoop QA commented on YARN-4399: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 35s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 34s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 34s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 35s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 35s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 37s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 36s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 37s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 36s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 21m 56s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12774830/YARN-4399.001.patch | | JIRA Issue | YARN-4399 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 84d5598b7a38 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3
[jira] [Updated] (YARN-2014) Performance: AM scaleability is 10% slower in 2.4 compared to 0.23.9
[ https://issues.apache.org/jira/browse/YARN-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2014: - Target Version/s: 2.9.0 (was: 2.6.3, 2.7.3) > Performance: AM scaleability is 10% slower in 2.4 compared to 0.23.9 > > > Key: YARN-2014 > URL: https://issues.apache.org/jira/browse/YARN-2014 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: patrick white >Assignee: Jason Lowe > > Performance comparison benchmarks from 2.x against 0.23 shows AM scalability > benchmark's runtime is approximately 10% slower in 2.4.0. The trend is > consistent across later releases in both lines, latest release numbers are: > 2.4.0.0 runtime 255.6 seconds (avg 5 passes) > 0.23.9.12 runtime 230.4 seconds (avg 5 passes) > Diff: -9.9% > AM Scalability test is essentially a sleep job that measures time to launch > and complete a large number of mappers. > The diff is consistent and has been reproduced in both a larger (350 node, > 100,000 mappers) perf environment, as well as a small (10 node, 2,900 > mappers) demo cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031914#comment-15031914 ] Junping Du commented on YARN-3223: -- bq. RMNodeImpl does not know directly the amount of usedResource in order to trigger an RMNodeResourceUpdateEvent. I can use rmNode.context.getScheduler().(rmNode.getNodeID()).getUsedResource(), but I'm not sure if adding that dependency on scheduler is okay. This is a reasonable concern. I would also prefer that rmNode not talk to schedulerNode directly. Instead, I would prefer that YarnScheduler, which knows the schedulerNode's usedResource, trigger the RMNodeResourceUpdateEvent. Concretely, there are two scenarios in which to trigger the resource update event when a node is decommissioning: 1. DecommissioningNodeTransition happens on RMNode: we can let RMNode send a new scheduler event carrying only the RMNode info (no resource info needed), perhaps called something like DecommissioningNodeResourceUpdateSchedulerEvent; the scheduler, in handling this event, will create an RMNodeResourceUpdateEvent with the SchedulerNode's usedResource instead. 2. Every time a container finishes on a decommissioning node: we can also send an RMNodeResourceUpdateEvent from YarnScheduler (Fifo/Fair/Capacity) within completedContainer(), just after the SchedulerNode's usedResource gets updated. > Resource update during NM graceful decommission > --- > > Key: YARN-3223 > URL: https://issues.apache.org/jira/browse/YARN-3223 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, resourcemanager >Affects Versions: 2.7.1 >Reporter: Junping Du >Assignee: Brook Zhou > Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, > YARN-3223-v2.patch > > > During NM graceful decommission, we should handle resource update properly, > include: make RMNode keep track of old resource for possible rollback, keep > available resource to 0 and used resource get updated when > container finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032243#comment-15032243 ] Sangjin Lee commented on YARN-3862: --- Thanks Allen for clarifying the issue. It's great that there are a lot of improvements made to the build in trunk, but this does pose a pretty interesting challenge for branch development. In short, it's just not realistic for a branch development team to rebase with trunk on a constant basis. We do want to stay as close to the trunk as possible, but it is definitely an overhead: it takes a full day or two of downtime to resolve issues. And in the latest rebase, we're still smarting from the git branch lockdown issue. As you suggested, a compromise here may be cherry-picking only those build changes from trunk. However, it's not without issues. The question is whether the build changes can always be isolated cleanly. For example, I'm not comfortable cherry-picking all of the pom.xml changes. There can be pom changes that are not related to the build fixes and also those that are part of much bigger changes. If we had to cherry-pick them, I'd rather do a full rebase. Also, I hope this still preserves our ability to do a simple "git rebase trunk" when we do a regular rebase. I see the following 3 commits to dev-support since our last rebase: {noformat} commit 0ca8df716a1bb8e7f894914fb0d740a1d14df8e3 Author: Haohui Mai Date: Thu Nov 12 10:17:41 2015 -0800 HADOOP-12562. Make hadoop dockerfile usable by Yetus. Contributed by Allen Wittenauer. commit 123b3db743a86aa18e46ec44a08f7b2e7c7f6350 Author: Tsuyoshi Ozawa Date: Mon Oct 26 23:17:45 2015 +0900 HADOOP-12513. Dockerfile lacks initial 'apt-get update'. Contributed by Akihiro Suda. commit 39581e3be2aaeb1eeb7fb98b6bdecd8d4e3c7269 Author: Vinayakumar B Date: Tue Oct 13 19:00:08 2015 +0530 HDFS-9139. Enable parallel JUnit tests for HDFS Pre-commit (Contributed by Chris Nauroth and Vinayakumar B) {noformat} And I think we can cherry-pick HADOOP-12513 and HADOOP-12562. Does that sound right? Do we need more? > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brook Zhou updated YARN-3223: - Attachment: YARN-3223-v3.patch > Resource update during NM graceful decommission > --- > > Key: YARN-3223 > URL: https://issues.apache.org/jira/browse/YARN-3223 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, resourcemanager >Affects Versions: 2.7.1 >Reporter: Junping Du >Assignee: Brook Zhou > Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, > YARN-3223-v2.patch, YARN-3223-v3.patch > > > During NM graceful decommission, we should handle resource update properly, > include: make RMNode keep track of old resource for possible rollback, keep > available resource to 0 and used resource get updated when > container finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4400) AsyncDispatcher.waitForDrained should be final
Daniel Templeton created YARN-4400: -- Summary: AsyncDispatcher.waitForDrained should be final Key: YARN-4400 URL: https://issues.apache.org/jira/browse/YARN-4400 Project: Hadoop YARN Issue Type: Improvement Components: yarn Affects Versions: 2.7.1 Reporter: Daniel Templeton Assignee: Daniel Templeton Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4400) AsyncDispatcher.waitForDrained should be final
[ https://issues.apache.org/jira/browse/YARN-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-4400: --- Attachment: YARN-4400.001.patch > AsyncDispatcher.waitForDrained should be final > -- > > Key: YARN-4400 > URL: https://issues.apache.org/jira/browse/YARN-4400 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Trivial > Attachments: YARN-4400.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brook Zhou updated YARN-3223: - Attachment: 0001-YARN-3223-resource-update.patch Since completedContainer is often called multiple times like from nodeUpdate(), I moved the trigger of RMNodeResourceUpdateEvent directly into nodeUpdate() when a node is decommissioning. If this is ok, I will add similar code to Fifo/Fair schedulers. > Resource update during NM graceful decommission > --- > > Key: YARN-3223 > URL: https://issues.apache.org/jira/browse/YARN-3223 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, resourcemanager >Affects Versions: 2.7.1 >Reporter: Junping Du >Assignee: Brook Zhou > Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, > YARN-3223-v2.patch > > > During NM graceful decommission, we should handle resource update properly, > include: make RMNode keep track of old resource for possible rollback, keep > available resource to 0 and used resource get updated when > container finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
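A rough sketch of what that {{nodeUpdate()}} change could look like in CapacityScheduler — the decommissioning check and event wiring below are assumptions based on this comment, not the attached patch:

{code:title=Sketch inside nodeUpdate()|borderStyle=solid}
// Sketch only: while a node is decommissioning, shrink its total resource down
// to what is currently in use, so the scheduler stops placing new containers on it.
if (rmNode.getState() == NodeState.DECOMMISSIONING) {
  this.rmContext.getDispatcher().getEventHandler().handle(
      new RMNodeResourceUpdateEvent(rmNode.getNodeID(),
          ResourceOption.newInstance(
              getSchedulerNode(rmNode.getNodeID()).getUsedResource(),
              RMNode.OVER_COMMIT_TIMEOUT_MILLIS_DEFAULT)));
}
{code}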
[jira] [Updated] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brook Zhou updated YARN-3223: - Attachment: (was: 0001-YARN-3223-resource-update.patch) > Resource update during NM graceful decommission > --- > > Key: YARN-3223 > URL: https://issues.apache.org/jira/browse/YARN-3223 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, resourcemanager >Affects Versions: 2.7.1 >Reporter: Junping Du >Assignee: Brook Zhou > Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, > YARN-3223-v2.patch > > > During NM graceful decommission, we should handle resource update properly, > include: make RMNode keep track of old resource for possible rollback, keep > available resource to 0 and used resource get updated when > container finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4398) Yarn recover functionality causes the cluster running slowly and the cluster usage rate is far below 100
[ https://issues.apache.org/jira/browse/YARN-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032151#comment-15032151 ] Daniel Templeton commented on YARN-4398: The {{AsyncDispatcher.GenericEventHandler.handle()}} method is MT safe. The {{AsyncDispatcher.getEventHandler()}} is the unsafe call, and it's only unsafe because of the lazy initialization. Prior to YARN-1121, it was returning a new object every time, which was thread safe. I see two obvious options: revert the YARN-1121 optimization in the {{AsyncDispatcher.getEventHandler()}} method or do eager initialization into a final member variable. Either way, the calls become MT-safe, letting you just drop the synchronization. > Yarn recover functionality causes the cluster running slowly and the cluster > usage rate is far below 100 > > > Key: YARN-4398 > URL: https://issues.apache.org/jira/browse/YARN-4398 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: NING DING > Attachments: YARN-4398.2.patch > > > In my hadoop cluster, the resourceManager recover functionality is enabled > with FileSystemRMStateStore. > I found this cause the yarn cluster running slowly and cluster usage rate is > just 50 even there are many pending Apps. > The scenario is below. > In thread A, the RMAppImpl$RMAppNewlySavingTransition is calling > storeNewApplication method defined in RMStateStore. This storeNewApplication > method is synchronized. > {code:title=RMAppImpl.java|borderStyle=solid} > private static final class RMAppNewlySavingTransition extends > RMAppTransition { > @Override > public void transition(RMAppImpl app, RMAppEvent event) { > // If recovery is enabled then store the application information in a > // non-blocking call so make sure that RM has stored the information > // needed to restart the AM after RM restart without further client > // communication > LOG.info("Storing application with id " + app.applicationId); > app.rmContext.getStateStore().storeNewApplication(app); > } > } > {code} > {code:title=RMStateStore.java|borderStyle=solid} > public synchronized void storeNewApplication(RMApp app) { > ApplicationSubmissionContext context = app > > .getApplicationSubmissionContext(); > assert context instanceof ApplicationSubmissionContextPBImpl; > ApplicationStateData appState = > ApplicationStateData.newInstance( > app.getSubmitTime(), app.getStartTime(), context, app.getUser()); > dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState)); > } > {code} > In thread B, the FileSystemRMStateStore is calling > storeApplicationStateInternal method. It's also synchronized. > This storeApplicationStateInternal method saves an ApplicationStateData into > HDFS and it normally costs 90~300 milliseconds in my hadoop cluster. > {code:title=FileSystemRMStateStore.java|borderStyle=solid} > public synchronized void storeApplicationStateInternal(ApplicationId appId, > ApplicationStateData appStateDataPB) throws Exception { > Path appDirPath = getAppDir(rmAppRoot, appId); > mkdirsWithRetries(appDirPath); > Path nodeCreatePath = getNodePath(appDirPath, appId.toString()); > LOG.info("Storing info for app: " + appId + " at: " + nodeCreatePath); > byte[] appStateData = appStateDataPB.getProto().toByteArray(); > try { > // currently throw all exceptions. 
May need to respond differently for > HA > // based on whether we have lost the right to write to FS > writeFileWithRetries(nodeCreatePath, appStateData, true); > } catch (Exception e) { > LOG.info("Error storing info for app: " + appId, e); > throw e; > } > } > {code} > Think thread B firstly comes into > FileSystemRMStateStore.storeApplicationStateInternal method, then thread A > will be blocked for a while because of synchronization. In ResourceManager > there is only one RMStateStore instance. In my cluster it's > FileSystemRMStateStore type. > Debug the RMAppNewlySavingTransition.transition method, the thread stack > shows it's called form AsyncDispatcher.dispatch method. This method code is > as below. > {code:title=AsyncDispatcher.java|borderStyle=solid} > protected void dispatch(Event event) { > //all events go thru this loop > if (LOG.isDebugEnabled()) { > LOG.debug("Dispatching the event " + event.getClass().getName() + "." > + event.toString()); > } > Class type = event.getType().getDeclaringClass(); > try{ > EventHandler handler = eventDispatchers.get(type); > if(handler != null) { >
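Regarding the two options mentioned in the comment above, a minimal sketch of the eager-initialization option in {{AsyncDispatcher}} (illustrative only, not the committed fix):

{code:title=Sketch of eager handler initialization|borderStyle=solid}
// Sketch only: build the handler once, up front, so getEventHandler() becomes a
// plain read of a final field and needs no lazy-init check or synchronization.
private final EventHandler handlerInstance = new GenericEventHandler();

public EventHandler getEventHandler() {
  return handlerInstance;
}
{code}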
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032146#comment-15032146 ] Varun Saxena commented on YARN-3862: I guess porting HADOOP-12562 changes to this branch would be enough for this issue. [~aw], any specific files/folders or umbrella JIRAs we should be tracking for changes? > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032195#comment-15032195 ] Allen Wittenauer commented on YARN-3862: All of the pom.xml files and dev-support minimally. > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032154#comment-15032154 ] Varun Saxena commented on YARN-3862: I guess tracking changes in build related files in dev-support folder should be enough. That might be easier than rebasing, if changes will happen frequently. > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4400) AsyncDispatcher.waitForDrained should be final
[ https://issues.apache.org/jira/browse/YARN-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032300#comment-15032300 ] Hadoop QA commented on YARN-4400: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 17s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 27s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 26m 12s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12774854/YARN-4400.001.patch | | JIRA Issue | YARN-4400 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 8bee22303cec 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032339#comment-15032339 ] Sangjin Lee commented on YARN-3862: --- Unless anyone objects, I'm going to cherry-pick those 2 commits to unblock this. > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032364#comment-15032364 ] Sangjin Lee commented on YARN-3862: --- Committed HADOOP-12562 and kicked off the jenkins build. It turns out the last rebase was on 11/9 so HADOOP-12562 is the only build change we missed. > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032337#comment-15032337 ] Wangda Tan commented on YARN-3946: -- [~Naganarasimha], thanks for the update, some comments: 1) RMAppImpl: When the app goes to a final state (FINISHED/KILLED, etc.), should we simply set AMLaunchDiagnostics to null? 2) SchedulerApplicationAttempt: Why do we need two separate methods: updateDiagnosticsIfNotRunning/updateDiagnostics? They're a little confusing to me; I think AM launch diagnostics should be updated only if the AM container is not running. If that makes sense to you, I suggest renaming/merging them into updateAMContainerDiagnostics. 3) Do you think it is better to rename AMState.PENDING to inactivated? I think "PENDING" could mean "accepted-but-not-activated" to end users (assuming users don't have enough background knowledge about the scheduler). 4) Instead of setting AMLaunchDiagnostics to null when RMAppAttempt enters the Scheduled state, do you think it is better to do that in the RUNNING and FINAL_SAVING states? An unmanaged AM could skip the SCHEDULED state. 5) It would also be very useful if you can update the AM launch diagnostics when RMAppAttempt goes to the LAUNCHED state; sometimes the AM container is allocated and sent to the NM, but is not successfully launched/registered to the RM. Currently we don't know if this happens because YarnApplicationState doesn't have a "launched" state. [~jianhe], could you take a look at this patch as well? > Allow fetching exact reason as to why a submitted app is in ACCEPTED state in > CS > > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, > YARN-3946.v1.004.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/ memory requirement not being met, or AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
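For point 2, a possible shape of the merged method — the field and method names below are assumptions based on this review comment, not the actual YARN-3946 patch:

{code:title=Sketch of a merged updateAMContainerDiagnostics|borderStyle=solid}
// Sketch only: record AM launch diagnostics solely while the AM container is
// not yet running; once it is running, the launch-phase message stops changing.
public synchronized void updateAMContainerDiagnostics(AMState newState,
    String diagnostics) {
  if (!amContainerRunning) {   // assumed flag tracking the AM container state
    this.amState = newState;
    this.amLaunchDiagnostics = diagnostics;
  }
}
{code}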
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032406#comment-15032406 ] Hadoop QA commented on YARN-3862: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 0s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 40s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 19s {color} | {color:red} hadoop-yarn-server-timelineservice in feature-YARN-2928 failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s {color} | {color:red} Patch generated 30 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice (total was 98, now 111). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 20s {color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 20s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-jdk1.7.0_85 with JDK v1.7.0_85 generated 14 new issues (was 0, now 14). {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 45s {color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 46s {color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 26m 24s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker |
[jira] [Updated] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Po updated YARN-4340: -- Attachment: YARN-4340.v3.patch > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system on which reservation exists by "time-range, > reservation-id, username". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032975#comment-15032975 ] Hadoop QA commented on YARN-4340: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} | {color:red} YARN-4340 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12774927/YARN-4340.v3.patch | | JIRA Issue | YARN-4340 | | Powered by | Apache Yetus http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9818/console | This message was automatically generated. > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system on which reservation exists by "time-range, > reservation-id, username". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4398) Yarn recover functionality causes the cluster running slowly and the cluster usage rate is far below 100
[ https://issues.apache.org/jira/browse/YARN-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032647#comment-15032647 ] Jian He commented on YARN-4398: --- [~iceberg565], thanks for looking into this. analysis makes sense to me. I think we can just remove the synchronized keyword ? bq. the AsyncDispatcher.getEventHandler() is the unsafe call Suppose the call is unsafe, in the worst case when contention happens, separate new objects will return to each caller instead of one, which is equivalent to new object every time as before ? > Yarn recover functionality causes the cluster running slowly and the cluster > usage rate is far below 100 > > > Key: YARN-4398 > URL: https://issues.apache.org/jira/browse/YARN-4398 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: NING DING > Attachments: YARN-4398.2.patch > > > In my hadoop cluster, the resourceManager recover functionality is enabled > with FileSystemRMStateStore. > I found this cause the yarn cluster running slowly and cluster usage rate is > just 50 even there are many pending Apps. > The scenario is below. > In thread A, the RMAppImpl$RMAppNewlySavingTransition is calling > storeNewApplication method defined in RMStateStore. This storeNewApplication > method is synchronized. > {code:title=RMAppImpl.java|borderStyle=solid} > private static final class RMAppNewlySavingTransition extends > RMAppTransition { > @Override > public void transition(RMAppImpl app, RMAppEvent event) { > // If recovery is enabled then store the application information in a > // non-blocking call so make sure that RM has stored the information > // needed to restart the AM after RM restart without further client > // communication > LOG.info("Storing application with id " + app.applicationId); > app.rmContext.getStateStore().storeNewApplication(app); > } > } > {code} > {code:title=RMStateStore.java|borderStyle=solid} > public synchronized void storeNewApplication(RMApp app) { > ApplicationSubmissionContext context = app > > .getApplicationSubmissionContext(); > assert context instanceof ApplicationSubmissionContextPBImpl; > ApplicationStateData appState = > ApplicationStateData.newInstance( > app.getSubmitTime(), app.getStartTime(), context, app.getUser()); > dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState)); > } > {code} > In thread B, the FileSystemRMStateStore is calling > storeApplicationStateInternal method. It's also synchronized. > This storeApplicationStateInternal method saves an ApplicationStateData into > HDFS and it normally costs 90~300 milliseconds in my hadoop cluster. > {code:title=FileSystemRMStateStore.java|borderStyle=solid} > public synchronized void storeApplicationStateInternal(ApplicationId appId, > ApplicationStateData appStateDataPB) throws Exception { > Path appDirPath = getAppDir(rmAppRoot, appId); > mkdirsWithRetries(appDirPath); > Path nodeCreatePath = getNodePath(appDirPath, appId.toString()); > LOG.info("Storing info for app: " + appId + " at: " + nodeCreatePath); > byte[] appStateData = appStateDataPB.getProto().toByteArray(); > try { > // currently throw all exceptions. 
May need to respond differently for > HA > // based on whether we have lost the right to write to FS > writeFileWithRetries(nodeCreatePath, appStateData, true); > } catch (Exception e) { > LOG.info("Error storing info for app: " + appId, e); > throw e; > } > } > {code} > Think thread B firstly comes into > FileSystemRMStateStore.storeApplicationStateInternal method, then thread A > will be blocked for a while because of synchronization. In ResourceManager > there is only one RMStateStore instance. In my cluster it's > FileSystemRMStateStore type. > Debug the RMAppNewlySavingTransition.transition method, the thread stack > shows it's called form AsyncDispatcher.dispatch method. This method code is > as below. > {code:title=AsyncDispatcher.java|borderStyle=solid} > protected void dispatch(Event event) { > //all events go thru this loop > if (LOG.isDebugEnabled()) { > LOG.debug("Dispatching the event " + event.getClass().getName() + "." > + event.toString()); > } > Class type = event.getType().getDeclaringClass(); > try{ > EventHandler handler = eventDispatchers.get(type); > if(handler != null) { > handler.handle(event); > } else { > throw new Exception("No handler for registered for " + type); > } > } catch (Throwable t) { > //TODO Maybe log the
[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032818#comment-15032818 ] Sangjin Lee commented on YARN-4238: --- Sorry it took me a while to get to this. The changes look good mostly. It would be good to address the checkstyle issues and others as much as we can as they are basically the tech debt we have been accumulating. Thanks! [~varun_saxena] said: {quote} Now coming to what if client does not send it. This would be an issue if entities have to be returned sorted by created time or filtering on the basis of created time range has to be done. This can be explicitly stated for clients that if you do not report created time then we cannot guarantee order while fetching multiple entities. This will make it simple from an implementation viewpoint. If not, maybe we can cache it and check if entity has gone in to the backed or not and based on that, set created time. But in this case, issue is what if daemon(having the writer) goes down. Maybe we can store this info in a state store. But do we need to do that ? {quote} I think it is reasonable to say that the clients are required to set creation time and modification time, or they will not be present in the data and things like sort will not work correctly on those records. What do you think? If we do want to handle the case of missing creation time or modification time (that's an if), I think a co-processor might be the only reliable option for this (I spoke with [~vrushalic] on this). We do want to avoid having to read back data to do it. Some use cases for the co-processor: - filling in missing creation time stamp if it is missing - setting the modification timestamp only if the new value is later than the old value - setting the creation timestamp only if it the new value is older than the old value But running co-processors would add overhead and maintenance issues, so it would be good if we can avoid it... I'd like to hear what others think. > createdTime and modifiedTime is not reported while publishing entities to > ATSv2 > --- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4238-YARN-2928.01.patch, > YARN-4238-feature-YARN-2928.02.patch > > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
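The three merge rules listed in the comment above can be summarised as a small piece of arithmetic. The sketch below only illustrates the intended semantics; the names are made up and this is not a proposal for the actual coprocessor implementation:
{code:title=TimestampMergeSketch.java|borderStyle=solid}
// Illustrative helper only. A non-positive value stands for "missing".
final class TimestampMergeSketch {

  // Creation time: fill it in if missing, otherwise keep the older value.
  static long mergeCreatedTime(long existing, long incoming) {
    if (existing <= 0) {
      return incoming;
    }
    if (incoming <= 0) {
      return existing;
    }
    return Math.min(existing, incoming);
  }

  // Modification time: keep the newer value.
  static long mergeModifiedTime(long existing, long incoming) {
    return Math.max(existing, incoming);
  }

  private TimestampMergeSketch() { }
}
{code}
Whether this runs in a coprocessor or is enforced purely by requiring clients to set the timestamps is exactly the trade-off discussed above.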
[jira] [Commented] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032978#comment-15032978 ] stefanlee commented on YARN-4022: - Hi forrestchen, my configuration in the cluster is similar to yours. When I delete a queue in fair-scheduler.xml and submit an application to the deleted queue, the application runs in the 'root.default' queue instead; when I then submit to a non-existent queue, it still runs in 'root.default' with no exception. Why? My hadoop version is 2.4.0. > queue not remove from webpage(/cluster/scheduler) when delete queue in > xxx-scheduler.xml > > > Key: YARN-4022 > URL: https://issues.apache.org/jira/browse/YARN-4022 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: forrestchen > Labels: scheduler > Attachments: YARN-4022.001.patch, YARN-4022.002.patch, > YARN-4022.003.patch, YARN-4022.004.patch > > > When I delete an existing queue by modify the xxx-schedule.xml, I can still > see the queue information block in webpage(/cluster/scheduler) though the > 'Min Resources' items all become to zero and have no item of 'Max Running > Applications'. > I can still submit an application to the deleted queue and the application > will run using 'root.default' queue instead, but submit to an un-exist queue > will cause an exception. > My expectation is the deleted queue will not displayed in webpage and submit > application to the deleted queue will act just like the queue doesn't exist. > PS: There's no application running in the queue I delete. > Some related config in yarn-site.xml: > {code} > > yarn.scheduler.fair.user-as-default-queue > false > > > yarn.scheduler.fair.allow-undeclared-pools > false > > {code} > a related question is here: > http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4348: - Attachment: YARN-4348-branch-2.7.004.patch Adding missing {{continue}} statement after calling {{syncInternal}} in the following block: {code} if (shouldRetryWithNewConnection(ke.code()) && retry < numRetries) { LOG.info("Retrying operation on ZK with new Connection. " + "Retry no. " + retry); Thread.sleep(zkRetryInterval); createConnection(); syncInternal(ke.getPath()); continue; } {code} > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
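To make the effect of the added statement concrete, here is a stripped-down retry loop of the same general shape (names are illustrative, not the actual ZKRMStateStore code). Without the {{continue}}, a successful reconnect still falls through to the rethrow and the operation is never retried:
{code:title=RetryLoopSketch.java|borderStyle=solid}
// Illustrative only: a generic ZooKeeper-style retry loop.
final class RetryLoopSketch {

  interface ZkOp<T> { T run() throws Exception; }

  static <T> T runWithRetries(ZkOp<T> op, int numRetries, long retryIntervalMs)
      throws Exception {
    int retry = 0;
    while (true) {
      try {
        return op.run();
      } catch (Exception e) {
        if (retry++ < numRetries) {
          Thread.sleep(retryIntervalMs);
          // reconnect / resync would happen here
          continue;  // without this line, control falls through to the
                     // rethrow below even though the retry path was taken
        }
        throw e;
      }
    }
  }

  private RetryLoopSketch() { }
}
{code}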
[jira] [Commented] (YARN-4398) Yarn recover functionality causes the cluster running slowly and the cluster usage rate is far below 100
[ https://issues.apache.org/jira/browse/YARN-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032979#comment-15032979 ] NING DING commented on YARN-4398: - Thanks for all your comments. I prefer to do eager initialization of handlerInstance in AsyncDispatcher, then remove the synchronized modifier in RMStateStore. Please see my new patch. > Yarn recover functionality causes the cluster running slowly and the cluster > usage rate is far below 100 > > > Key: YARN-4398 > URL: https://issues.apache.org/jira/browse/YARN-4398 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: NING DING > Attachments: YARN-4398.2.patch > > > In my hadoop cluster, the resourceManager recover functionality is enabled > with FileSystemRMStateStore. > I found this cause the yarn cluster running slowly and cluster usage rate is > just 50 even there are many pending Apps. > The scenario is below. > In thread A, the RMAppImpl$RMAppNewlySavingTransition is calling > storeNewApplication method defined in RMStateStore. This storeNewApplication > method is synchronized. > {code:title=RMAppImpl.java|borderStyle=solid} > private static final class RMAppNewlySavingTransition extends > RMAppTransition { > @Override > public void transition(RMAppImpl app, RMAppEvent event) { > // If recovery is enabled then store the application information in a > // non-blocking call so make sure that RM has stored the information > // needed to restart the AM after RM restart without further client > // communication > LOG.info("Storing application with id " + app.applicationId); > app.rmContext.getStateStore().storeNewApplication(app); > } > } > {code} > {code:title=RMStateStore.java|borderStyle=solid} > public synchronized void storeNewApplication(RMApp app) { > ApplicationSubmissionContext context = app > > .getApplicationSubmissionContext(); > assert context instanceof ApplicationSubmissionContextPBImpl; > ApplicationStateData appState = > ApplicationStateData.newInstance( > app.getSubmitTime(), app.getStartTime(), context, app.getUser()); > dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState)); > } > {code} > In thread B, the FileSystemRMStateStore is calling > storeApplicationStateInternal method. It's also synchronized. > This storeApplicationStateInternal method saves an ApplicationStateData into > HDFS and it normally costs 90~300 milliseconds in my hadoop cluster. > {code:title=FileSystemRMStateStore.java|borderStyle=solid} > public synchronized void storeApplicationStateInternal(ApplicationId appId, > ApplicationStateData appStateDataPB) throws Exception { > Path appDirPath = getAppDir(rmAppRoot, appId); > mkdirsWithRetries(appDirPath); > Path nodeCreatePath = getNodePath(appDirPath, appId.toString()); > LOG.info("Storing info for app: " + appId + " at: " + nodeCreatePath); > byte[] appStateData = appStateDataPB.getProto().toByteArray(); > try { > // currently throw all exceptions. May need to respond differently for > HA > // based on whether we have lost the right to write to FS > writeFileWithRetries(nodeCreatePath, appStateData, true); > } catch (Exception e) { > LOG.info("Error storing info for app: " + appId, e); > throw e; > } > } > {code} > Think thread B firstly comes into > FileSystemRMStateStore.storeApplicationStateInternal method, then thread A > will be blocked for a while because of synchronization. In ResourceManager > there is only one RMStateStore instance. In my cluster it's > FileSystemRMStateStore type. 
> Debug the RMAppNewlySavingTransition.transition method, the thread stack > shows it's called form AsyncDispatcher.dispatch method. This method code is > as below. > {code:title=AsyncDispatcher.java|borderStyle=solid} > protected void dispatch(Event event) { > //all events go thru this loop > if (LOG.isDebugEnabled()) { > LOG.debug("Dispatching the event " + event.getClass().getName() + "." > + event.toString()); > } > Class type = event.getType().getDeclaringClass(); > try{ > EventHandler handler = eventDispatchers.get(type); > if(handler != null) { > handler.handle(event); > } else { > throw new Exception("No handler for registered for " + type); > } > } catch (Throwable t) { > //TODO Maybe log the state of the queue > LOG.fatal("Error in dispatcher thread", t); > // If serviceStop is called, we should exit this thread gracefully. > if (exitOnDispatchException > &&
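A minimal sketch of the eager-initialization change NING DING proposes above, assuming the field is named handlerInstance as in the comment; the types below are simplified stand-ins, not the actual AsyncDispatcher code:
{code:title=EagerDispatcherSketch.java|borderStyle=solid}
// Illustrative only -- simplified stand-ins for
// org.apache.hadoop.yarn.event.Event, EventHandler and AsyncDispatcher.
interface Event { }
interface EventHandler { void handle(Event event); }

class EagerDispatcherSketch {
  // Before: handlerInstance was created lazily inside getEventHandler(),
  // which is only safe if every caller synchronizes around the call.
  // After: create it once, eagerly, so getEventHandler() is a plain read of
  // a final field and callers such as RMStateStore no longer need the
  // synchronized modifier just to guard this lazy initialization.
  private final EventHandler handlerInstance = new EventHandler() {
    @Override
    public void handle(Event event) {
      // hand the event to the dispatcher thread's queue (omitted here)
    }
  };

  public EventHandler getEventHandler() {
    return handlerInstance;
  }
}
{code}
With the field created eagerly and final, getEventHandler() is just a read, so the benign race discussed earlier in this thread goes away entirely.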
[jira] [Comment Edited] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032845#comment-15032845 ] Tsuyoshi Ozawa edited comment on YARN-4348 at 12/1/15 3:18 AM: --- [~jianhe] good catch. Adding missing {{continue}} statement after calling {{syncInternal}} in the following block in v4 patch. was (Author: ozawa): Adding missing {{continue}} statement after calling {{syncInternal}} in the following block: {code} if (shouldRetryWithNewConnection(ke.code()) && retry < numRetries) { LOG.info("Retrying operation on ZK with new Connection. " + "Retry no. " + retry); Thread.sleep(zkRetryInterval); createConnection(); syncInternal(ke.getPath()); continue; } {code} > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033002#comment-15033002 ] Tsuyoshi Ozawa commented on YARN-4348: -- Jenkins still fail. Opened YETUS-217 to track the problem. Kicking Jenkins on local. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033002#comment-15033002 ] Tsuyoshi Ozawa edited comment on YARN-4348 at 12/1/15 3:22 AM: --- Jenkins still fail. Opened YETUS-217 to track the problem. Kicking test-patch.sh on local. was (Author: ozawa): Jenkins still fail. Opened YETUS-217 to track the problem. Kicking Jenkins on local. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4398) Yarn recover functionality causes the cluster running slowly and the cluster usage rate is far below 100
[ https://issues.apache.org/jira/browse/YARN-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] NING DING updated YARN-4398: Attachment: YARN-4398.3.patch > Yarn recover functionality causes the cluster running slowly and the cluster > usage rate is far below 100 > > > Key: YARN-4398 > URL: https://issues.apache.org/jira/browse/YARN-4398 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: NING DING > Attachments: YARN-4398.2.patch, YARN-4398.3.patch > > > In my hadoop cluster, the resourceManager recover functionality is enabled > with FileSystemRMStateStore. > I found this cause the yarn cluster running slowly and cluster usage rate is > just 50 even there are many pending Apps. > The scenario is below. > In thread A, the RMAppImpl$RMAppNewlySavingTransition is calling > storeNewApplication method defined in RMStateStore. This storeNewApplication > method is synchronized. > {code:title=RMAppImpl.java|borderStyle=solid} > private static final class RMAppNewlySavingTransition extends > RMAppTransition { > @Override > public void transition(RMAppImpl app, RMAppEvent event) { > // If recovery is enabled then store the application information in a > // non-blocking call so make sure that RM has stored the information > // needed to restart the AM after RM restart without further client > // communication > LOG.info("Storing application with id " + app.applicationId); > app.rmContext.getStateStore().storeNewApplication(app); > } > } > {code} > {code:title=RMStateStore.java|borderStyle=solid} > public synchronized void storeNewApplication(RMApp app) { > ApplicationSubmissionContext context = app > > .getApplicationSubmissionContext(); > assert context instanceof ApplicationSubmissionContextPBImpl; > ApplicationStateData appState = > ApplicationStateData.newInstance( > app.getSubmitTime(), app.getStartTime(), context, app.getUser()); > dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState)); > } > {code} > In thread B, the FileSystemRMStateStore is calling > storeApplicationStateInternal method. It's also synchronized. > This storeApplicationStateInternal method saves an ApplicationStateData into > HDFS and it normally costs 90~300 milliseconds in my hadoop cluster. > {code:title=FileSystemRMStateStore.java|borderStyle=solid} > public synchronized void storeApplicationStateInternal(ApplicationId appId, > ApplicationStateData appStateDataPB) throws Exception { > Path appDirPath = getAppDir(rmAppRoot, appId); > mkdirsWithRetries(appDirPath); > Path nodeCreatePath = getNodePath(appDirPath, appId.toString()); > LOG.info("Storing info for app: " + appId + " at: " + nodeCreatePath); > byte[] appStateData = appStateDataPB.getProto().toByteArray(); > try { > // currently throw all exceptions. May need to respond differently for > HA > // based on whether we have lost the right to write to FS > writeFileWithRetries(nodeCreatePath, appStateData, true); > } catch (Exception e) { > LOG.info("Error storing info for app: " + appId, e); > throw e; > } > } > {code} > Think thread B firstly comes into > FileSystemRMStateStore.storeApplicationStateInternal method, then thread A > will be blocked for a while because of synchronization. In ResourceManager > there is only one RMStateStore instance. In my cluster it's > FileSystemRMStateStore type. > Debug the RMAppNewlySavingTransition.transition method, the thread stack > shows it's called form AsyncDispatcher.dispatch method. This method code is > as below. 
> {code:title=AsyncDispatcher.java|borderStyle=solid} > protected void dispatch(Event event) { > //all events go thru this loop > if (LOG.isDebugEnabled()) { > LOG.debug("Dispatching the event " + event.getClass().getName() + "." > + event.toString()); > } > Class type = event.getType().getDeclaringClass(); > try{ > EventHandler handler = eventDispatchers.get(type); > if(handler != null) { > handler.handle(event); > } else { > throw new Exception("No handler for registered for " + type); > } > } catch (Throwable t) { > //TODO Maybe log the state of the queue > LOG.fatal("Error in dispatcher thread", t); > // If serviceStop is called, we should exit this thread gracefully. > if (exitOnDispatchException > && (ShutdownHookManager.get().isShutdownInProgress()) == false > && stopped == false) { > Thread shutDownThread = new Thread(createShutDownThread()); >
[jira] [Updated] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Po updated YARN-4340: -- Attachment: YARN-4340.v4.patch > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch, YARN-4340.v4.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system on which reservation exists by "time-range, > reservation-id, username". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4022) queue not remove from webpage(/cluster/scheduler) when delete queue in xxx-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033006#comment-15033006 ] Hadoop QA commented on YARN-4022: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} | {color:red} YARN-4022 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12754857/YARN-4022.004.patch | | JIRA Issue | YARN-4022 | | Powered by | Apache Yetus http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9819/console | This message was automatically generated. > queue not remove from webpage(/cluster/scheduler) when delete queue in > xxx-scheduler.xml > > > Key: YARN-4022 > URL: https://issues.apache.org/jira/browse/YARN-4022 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: forrestchen > Labels: scheduler > Attachments: YARN-4022.001.patch, YARN-4022.002.patch, > YARN-4022.003.patch, YARN-4022.004.patch > > > When I delete an existing queue by modify the xxx-schedule.xml, I can still > see the queue information block in webpage(/cluster/scheduler) though the > 'Min Resources' items all become to zero and have no item of 'Max Running > Applications'. > I can still submit an application to the deleted queue and the application > will run using 'root.default' queue instead, but submit to an un-exist queue > will cause an exception. > My expectation is the deleted queue will not displayed in webpage and submit > application to the deleted queue will act just like the queue doesn't exist. > PS: There's no application running in the queue I delete. > Some related config in yarn-site.xml: > {code} > > yarn.scheduler.fair.user-as-default-queue > false > > > yarn.scheduler.fair.allow-undeclared-pools > false > > {code} > a related question is here: > http://stackoverflow.com/questions/26488564/hadoop-yarn-why-the-queue-cannot-be-deleted-after-i-revise-my-fair-scheduler-xm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3566) YARN Scheduler Web UI not properly sorting through Application ID or Progress bar
[ https://issues.apache.org/jira/browse/YARN-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-3566. -- Resolution: Duplicate Fixed by YARN-3406. Added YARN-3171 as a related ticket. > YARN Scheduler Web UI not properly sorting through Application ID or Progress > bar > - > > Key: YARN-3566 > URL: https://issues.apache.org/jira/browse/YARN-3566 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 2.5.0 >Reporter: Anthony Rojas > Attachments: Screen Shot 2015-04-30 at 1.23.56 PM.png > > > Noticed that the progress bar web UI component of the RM WebUI Cluster > scheduler is not sorting at all, whereas the as the RM web UI main view is > sortable. > The actual web URL that has the broken fields: > http://resource_manager.company.com:8088/cluster/scheduler > This URL however does have functional fields: > http://resource_manager.company.com:8088/cluster/apps > I'll attach a screenshot that shows which specific fields within the Web UI > table that aren't sorting when clicked on. > Clicking either the Progress Bar column or the Application ID column from > /cluster/scheduler did not trigger any changes at all; Shouldn't it have > sorted through ascending or descending of the jobs based on Application ID or > through the actual progress from the Progress bar? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032672#comment-15032672 ] Sangjin Lee commented on YARN-3862: --- The latest patch looks good to me. Can you please see if you can address the checkstyle and javadoc issues flagged in the jenkins run? Just one other nit: (ApplicationColumnPrefix.java) - l.177: unnecessary space change > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
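As a rough illustration of the prefix-match option from the issue description: the sketch below merely trims an in-memory map, whereas the actual patch appears to work at the HBase column-prefix level (e.g. ApplicationColumnPrefix); all names here are made up.
{code:title=PrefixFilterSketch.java|borderStyle=solid}
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: keep just the config (or metric) keys that start with
// one of the requested prefixes, instead of returning everything.
final class PrefixFilterSketch {

  static Map<String, String> filterByPrefix(Map<String, String> all,
      Collection<String> prefixes) {
    Map<String, String> matched = new HashMap<String, String>();
    for (Map.Entry<String, String> entry : all.entrySet()) {
      for (String prefix : prefixes) {
        if (entry.getKey().startsWith(prefix)) {
          matched.put(entry.getKey(), entry.getValue());
          break;
        }
      }
    }
    return matched;
  }

  private PrefixFilterSketch() { }
}
{code}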
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032834#comment-15032834 ] Jian He commented on YARN-4348: --- [~ozawa], should below call continue in the end ? {code} if (shouldRetryWithNewConnection(ke.code()) && retry < numRetries) { LOG.info("Retrying operation on ZK with new Connection. " + "Retry no. " + retry); Thread.sleep(zkRetryInterval); createConnection(); syncInternal(ke.getPath()); } {code} > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348.001.patch, YARN-4348.001.patch, > log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3769) Consider user limit when calculating total pending resource for preemption policy in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032709#comment-15032709 ] Sangjin Lee commented on YARN-3769: --- Thanks! cc [~djp] > Consider user limit when calculating total pending resource for preemption > policy in Capacity Scheduler > --- > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 2.7.3 > > Attachments: YARN-3769-branch-2.002.patch, > YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch, > YARN-3769-branch-2.7.005.patch, YARN-3769-branch-2.7.006.patch, > YARN-3769-branch-2.7.007.patch, YARN-3769.001.branch-2.7.patch, > YARN-3769.001.branch-2.8.patch, YARN-3769.003.patch, YARN-3769.004.patch, > YARN-3769.005.patch > > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4398) Yarn recover functionality causes the cluster running slowly and the cluster usage rate is far below 100
[ https://issues.apache.org/jira/browse/YARN-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032714#comment-15032714 ] Daniel Templeton commented on YARN-4398: [~jianhe], you are correct, but that approach just smells bad to me. It's behavior that someone will be confused by later. It would be better to do something intentional than something that accidentally works for a non-obvious reason. > Yarn recover functionality causes the cluster running slowly and the cluster > usage rate is far below 100 > > > Key: YARN-4398 > URL: https://issues.apache.org/jira/browse/YARN-4398 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: NING DING > Attachments: YARN-4398.2.patch > > > In my hadoop cluster, the resourceManager recover functionality is enabled > with FileSystemRMStateStore. > I found this cause the yarn cluster running slowly and cluster usage rate is > just 50 even there are many pending Apps. > The scenario is below. > In thread A, the RMAppImpl$RMAppNewlySavingTransition is calling > storeNewApplication method defined in RMStateStore. This storeNewApplication > method is synchronized. > {code:title=RMAppImpl.java|borderStyle=solid} > private static final class RMAppNewlySavingTransition extends > RMAppTransition { > @Override > public void transition(RMAppImpl app, RMAppEvent event) { > // If recovery is enabled then store the application information in a > // non-blocking call so make sure that RM has stored the information > // needed to restart the AM after RM restart without further client > // communication > LOG.info("Storing application with id " + app.applicationId); > app.rmContext.getStateStore().storeNewApplication(app); > } > } > {code} > {code:title=RMStateStore.java|borderStyle=solid} > public synchronized void storeNewApplication(RMApp app) { > ApplicationSubmissionContext context = app > > .getApplicationSubmissionContext(); > assert context instanceof ApplicationSubmissionContextPBImpl; > ApplicationStateData appState = > ApplicationStateData.newInstance( > app.getSubmitTime(), app.getStartTime(), context, app.getUser()); > dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState)); > } > {code} > In thread B, the FileSystemRMStateStore is calling > storeApplicationStateInternal method. It's also synchronized. > This storeApplicationStateInternal method saves an ApplicationStateData into > HDFS and it normally costs 90~300 milliseconds in my hadoop cluster. > {code:title=FileSystemRMStateStore.java|borderStyle=solid} > public synchronized void storeApplicationStateInternal(ApplicationId appId, > ApplicationStateData appStateDataPB) throws Exception { > Path appDirPath = getAppDir(rmAppRoot, appId); > mkdirsWithRetries(appDirPath); > Path nodeCreatePath = getNodePath(appDirPath, appId.toString()); > LOG.info("Storing info for app: " + appId + " at: " + nodeCreatePath); > byte[] appStateData = appStateDataPB.getProto().toByteArray(); > try { > // currently throw all exceptions. May need to respond differently for > HA > // based on whether we have lost the right to write to FS > writeFileWithRetries(nodeCreatePath, appStateData, true); > } catch (Exception e) { > LOG.info("Error storing info for app: " + appId, e); > throw e; > } > } > {code} > Think thread B firstly comes into > FileSystemRMStateStore.storeApplicationStateInternal method, then thread A > will be blocked for a while because of synchronization. In ResourceManager > there is only one RMStateStore instance. 
In my cluster it's > FileSystemRMStateStore type. > Debug the RMAppNewlySavingTransition.transition method, the thread stack > shows it's called form AsyncDispatcher.dispatch method. This method code is > as below. > {code:title=AsyncDispatcher.java|borderStyle=solid} > protected void dispatch(Event event) { > //all events go thru this loop > if (LOG.isDebugEnabled()) { > LOG.debug("Dispatching the event " + event.getClass().getName() + "." > + event.toString()); > } > Class type = event.getType().getDeclaringClass(); > try{ > EventHandler handler = eventDispatchers.get(type); > if(handler != null) { > handler.handle(event); > } else { > throw new Exception("No handler for registered for " + type); > } > } catch (Throwable t) { > //TODO Maybe log the state of the queue > LOG.fatal("Error in dispatcher thread", t); > // If serviceStop is called, we should exit this
[jira] [Commented] (YARN-3769) Consider user limit when calculating total pending resource for preemption policy in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032692#comment-15032692 ] Eric Payne commented on YARN-3769: -- [~sjlee0], Backport is in progress. Manual tests on 3-node cluster work well, but running into problems backporting the unit tests. > Consider user limit when calculating total pending resource for preemption > policy in Capacity Scheduler > --- > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 2.7.3 > > Attachments: YARN-3769-branch-2.002.patch, > YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch, > YARN-3769-branch-2.7.005.patch, YARN-3769-branch-2.7.006.patch, > YARN-3769-branch-2.7.007.patch, YARN-3769.001.branch-2.7.patch, > YARN-3769.001.branch-2.8.patch, YARN-3769.003.patch, YARN-3769.004.patch, > YARN-3769.005.patch > > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4358) Improve relationship between SharingPolicy and ReservationAgent
[ https://issues.apache.org/jira/browse/YARN-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032772#comment-15032772 ] Hadoop QA commented on YARN-4358: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} Patch generated 8 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 64, now 66). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 58s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_85 with JDK v1.7.0_85 generated 4 new issues (was 2, now 6). {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 51s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 10s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 140m 44s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK
[jira] [Reopened] (YARN-1974) add args for DistributedShell to specify a set of nodes on which the tasks run
[ https://issues.apache.org/jira/browse/YARN-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reopened YARN-1974: - Actually, it is useful to have this feature in DS. [~zhiguohong] Do you have the cycle to rebase the patch ? > add args for DistributedShell to specify a set of nodes on which the tasks run > -- > > Key: YARN-1974 > URL: https://issues.apache.org/jira/browse/YARN-1974 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications/distributed-shell >Affects Versions: 2.7.0 >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Attachments: YARN-1974.patch > > > It's very useful to execute a script on a specific set of machines for both > testing and maintenance purpose. > The args "--nodes" and "--relax_locality" are added to DistributedShell. > Together with an unit test using miniCluster. > It's also tested on our real cluster with Fair scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033077#comment-15033077 ] Sangjin Lee commented on YARN-3862: --- We might have to live with the ones about 7 parameters. Others, we should be able to fix, right? > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033059#comment-15033059 ] Varun Saxena commented on YARN-3862: Ok... the javadoc issues are in fact related. Many of the checkstyle issues are not related, though, and can't be fixed either (7 params in a method). > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4401) A failed app recovery should not prevent the RM from starting
[ https://issues.apache.org/jira/browse/YARN-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033040#comment-15033040 ] Rohith Sharma K S commented on YARN-4401: - In an ideal case, app recovery should not fail. If it does fail, the fix should address the cause of the failure. Do you have any specific scenario in mind that causes recovery to fail? I am open to being convinced :-) > A failed app recovery should not prevent the RM from starting > - > > Key: YARN-4401 > URL: https://issues.apache.org/jira/browse/YARN-4401 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > > There are many different reasons why an app recovery could fail with an > exception, causing the RM start to be aborted. If that happens the RM will > fail to start. Presumably, the reason the RM is trying to do a recovery is > that it's the standby trying to fill in for the active. Failing to come up > defeats the purpose of the HA configuration. Instead of preventing the RM > from starting, a failed app recovery should log an error and skip the > application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
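A minimal sketch of the behaviour the issue asks for, with made-up types and method names rather than the real RM recovery code: a failure for one stored application is logged and skipped instead of aborting startup.
{code:title=RecoverySkipSketch.java|borderStyle=solid}
import java.util.Map;
import java.util.logging.Logger;

// Illustrative only: AppState and recoverApplication() stand in for the real
// RM recovery types; the point is the per-application try/catch.
final class RecoverySkipSketch {
  private static final Logger LOG = Logger.getLogger("RecoverySkipSketch");

  interface AppState { }

  static void recoverAll(Map<String, AppState> storedApps) {
    for (Map.Entry<String, AppState> entry : storedApps.entrySet()) {
      try {
        recoverApplication(entry.getValue());
      } catch (Exception e) {
        // Log and move on so one corrupt entry cannot keep the RM down.
        LOG.severe("Failed to recover application " + entry.getKey()
            + ", skipping it: " + e);
      }
    }
  }

  private static void recoverApplication(AppState state) {
    // real recovery work would go here
  }

  private RecoverySkipSketch() { }
}
{code}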
[jira] [Comment Edited] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032845#comment-15032845 ] Tsuyoshi Ozawa edited comment on YARN-4348 at 12/1/15 5:44 AM: --- [~jianhe] good catch. Adding missing {{continue}} statement after calling {{syncInternal}} in v4 patch. was (Author: ozawa): [~jianhe] good catch. Adding missing {{continue}} statement after calling {{syncInternal}} in the following block in v4 patch. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4309) Add debug information to application logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4309: Attachment: YARN-4309.003.patch Uploaded a new patch that adds a section on broken symlinks to the directory info. > Add debug information to application logs when a container fails > > > Key: YARN-4309 > URL: https://issues.apache.org/jira/browse/YARN-4309 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4309.001.patch, YARN-4309.002.patch, > YARN-4309.003.patch > > > Sometimes when a container fails, it can be pretty hard to figure out why it > failed. > My proposal is that if a container fails, we collect information about the > container local dir and dump it into the container log dir. Ideally, I'd like > to tar up the directory entirely, but I'm not sure of the security and space > implications of such a approach. At the very least, we can list all the files > in the container local dir, and dump the contents of launch_container.sh(into > the container log dir). > When log aggregation occurs, all this information will automatically get > collected and make debugging such failures much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
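Roughly the kind of debug dump the description proposes, sketched with java.nio; the file names and layout are assumptions, not what the attached patch actually writes:
{code:title=ContainerDebugDumpSketch.java|borderStyle=solid}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: list the container's local dir into the log dir and
// keep a copy of launch_container.sh so it survives log aggregation.
final class ContainerDebugDumpSketch {

  static void dumpDebugInfo(Path containerLocalDir, Path containerLogDir)
      throws IOException {
    List<String> listing;
    try (Stream<Path> files = Files.walk(containerLocalDir)) {
      listing = files.map(Path::toString).collect(Collectors.toList());
    }
    Files.write(containerLogDir.resolve("directory.info"), listing,
        StandardCharsets.UTF_8);

    Path launchScript = containerLocalDir.resolve("launch_container.sh");
    if (Files.exists(launchScript)) {
      Files.copy(launchScript,
          containerLogDir.resolve("launch_container.sh.copy"),
          StandardCopyOption.REPLACE_EXISTING);
    }
  }

  private ContainerDebugDumpSketch() { }
}
{code}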
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033235#comment-15033235 ] Varun Saxena commented on YARN-3862: Yes, will fix whatever can be. > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4328) Findbugs warning in resourcemanager in branch-2.7 and branch-2.6
[ https://issues.apache.org/jira/browse/YARN-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-4328: Attachment: YARN-4328.branch-2.7.00.patch Attaching a patch for branch-2.7. > Findbugs warning in resourcemanager in branch-2.7 and branch-2.6 > > > Key: YARN-4328 > URL: https://issues.apache.org/jira/browse/YARN-4328 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1, 2.6.2 >Reporter: Varun Saxena >Assignee: Akira AJISAKA >Priority: Minor > Attachments: YARN-4328.branch-2.7.00.patch > > > This issue exists in both branch-2.7 and branch-2.6 > {noformat} > classname='org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKSyncOperationCallback'> > category='PERFORMANCE' message='Should > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKSyncOperationCallback > be a _static_ inner class?' lineNumber='118'/> > {noformat} > Below issue exists only in branch-2.6 > {noformat} > classname='org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt'> > category='MT_CORRECTNESS' message='Inconsistent synchronization of > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.queue; > locked 57% of time' lineNumber='261'/> > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
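For context on the "Should ... be a _static_ inner class?" warning quoted above, this is the general shape of the fix (a generic illustration, not the ZKRMStateStore code): an inner class that never touches the enclosing instance can be declared static so it stops carrying an implicit reference to the outer object.
{code:title=StaticInnerClassSketch.java|borderStyle=solid}
// Illustrative only, not the ZKRMStateStore code.
class OuterSketch {
  // Before: "class Callback { ... }" -- a non-static inner class keeps an
  // implicit reference to the OuterSketch instance even if it never uses it.
  // After: declaring it static removes that hidden reference, which is what
  // the findbugs warning asks for.
  static class Callback {
    void processResult(int returnCode, String path) {
      // handle the asynchronous callback result here
    }
  }
}
{code}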
[jira] [Commented] (YARN-3417) AM to be able to exit with a request saying "restart me with these (possibly updated) resource requirements"
[ https://issues.apache.org/jira/browse/YARN-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033244#comment-15033244 ] Varun Saxena commented on YARN-3417: Steve, when is this required ? In 2.9.0 or we are targetting it for trunk ? Most of my bandwidth will be taken up by ATSv2 work. So if it is required immediately then probably Kuhu can take it up. I would not want to block it because of lack of bandwidth. Let me know. > AM to be able to exit with a request saying "restart me with these (possibly > updated) resource requirements" > > > Key: YARN-3417 > URL: https://issues.apache.org/jira/browse/YARN-3417 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: Steve Loughran >Assignee: Varun Saxena >Priority: Minor > > If an AM wants to reconfigure itself or restart with new resources, there's > no way to do this without the active participation of a client. > It can call System.exit and rely on YARN to restart it -but that counts as a > failure and may lose the entire app. furthermore, that doesn't allow the AM > to resize itself. > A simple exit-code to be interpreted as restart-without-failure could handle > the first case; an explicit call to indicate restart, including potentially > new resource/label requirements, could be more reliabile, and certainly more > flexible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4328) Findbugs warning in resourcemanager in branch-2.7 and branch-2.6
[ https://issues.apache.org/jira/browse/YARN-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned YARN-4328: --- Assignee: Akira AJISAKA > Findbugs warning in resourcemanager in branch-2.7 and branch-2.6 > > > Key: YARN-4328 > URL: https://issues.apache.org/jira/browse/YARN-4328 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Akira AJISAKA >Priority: Minor > > This issue exists in both branch-2.7 and branch-2.6 > {noformat} > classname='org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKSyncOperationCallback'> > category='PERFORMANCE' message='Should > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKSyncOperationCallback > be a _static_ inner class?' lineNumber='118'/> > {noformat} > Below issue exists only in branch-2.6 > {noformat} > classname='org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt'> > category='MT_CORRECTNESS' message='Inconsistent synchronization of > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.queue; > locked 57% of time' lineNumber='261'/> > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism
[ https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031629#comment-15031629 ] Hadoop QA commented on YARN-3542: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 32s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 37s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 34s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 52s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 5s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 5s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 33s {color} | {color:red} Patch generated 5 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 244, now 241). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 25s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 34s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 52s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 63m 13s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12774788/YARN-3542.002.patch | | JIRA Issue | YARN-3542 | | Optional Tests | asflicense compile
[jira] [Commented] (YARN-1856) cgroups based memory monitoring for containers
[ https://issues.apache.org/jira/browse/YARN-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031677#comment-15031677 ] Hadoop QA commented on YARN-1856: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 30s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 35s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 23s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 31s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 35s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 35s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 33s {color} | {color:red} Patch generated 8 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 238, now 245). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 33s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 36s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 46s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 62m 1s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12774801/YARN-1856.001.patch | | JIRA Issue | YARN-1856 | | Optional Tests | asflicense compile
[jira] [Updated] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers
[ https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-2885: -- Summary: Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers (was: LocalRM: distributed scheduling decisions for queueable containers) > Create AMRMProxy request interceptor for distributed scheduling decisions for > queueable containers > -- > > Key: YARN-2885 > URL: https://issues.apache.org/jira/browse/YARN-2885 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > > We propose to add a Local ResourceManager (LocalRM) to the NM in order to > support distributed scheduling decisions. > Architecturally we leverage the RMProxy, introduced in YARN-2884. > The LocalRM makes distributed decisions for queueable container requests. > Guaranteed-start requests are still handled by the central RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism
[ https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3542: Attachment: YARN-3542.002.patch Uploaded a new patch to address test failures and checkstyle issues. > Re-factor support for CPU as a resource using the new ResourceHandler > mechanism > --- > > Key: YARN-3542 > URL: https://issues.apache.org/jira/browse/YARN-3542 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Varun Vasudev >Priority: Critical > Attachments: YARN-3542.001.patch, YARN-3542.002.patch > > > In YARN-3443 , a new ResourceHandler mechanism was added which enabled easier > addition of new resource types in the nodemanager (this was used for network > as a resource - See YARN-2140 ). We should refactor the existing CPU > implementation ( LinuxContainerExecutor/CgroupsLCEResourcesHandler ) using > the new ResourceHandler mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
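To make the refactoring concrete, the sketch below shows what a cgroups-based CPU handler behind a pluggable resource-handler mechanism can look like. The {{SimpleResourceHandler}} interface and its method names are simplified stand-ins assumed for illustration; they are not the actual ResourceHandler API added in YARN-3443.
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Simplified, hypothetical stand-in for a pluggable resource handler. */
interface SimpleResourceHandler {
  /** Called before a container launches; sets up its resource isolation. */
  void preStart(String containerId, int cpuShares) throws IOException;

  /** Called after the container finishes; cleans up its cgroup. */
  void postComplete(String containerId) throws IOException;
}

/** CPU handler that writes the container's weight into a cpu cgroup. */
class CpuCgroupHandler implements SimpleResourceHandler {

  private final Path cgroupRoot;

  CpuCgroupHandler(String cgroupRoot) {
    this.cgroupRoot = Paths.get(cgroupRoot);
  }

  @Override
  public void preStart(String containerId, int cpuShares) throws IOException {
    // Create a per-container cgroup and write its relative CPU weight.
    Path containerGroup = cgroupRoot.resolve(containerId);
    Files.createDirectories(containerGroup);
    Files.write(containerGroup.resolve("cpu.shares"),
        Integer.toString(cpuShares).getBytes(StandardCharsets.UTF_8));
  }

  @Override
  public void postComplete(String containerId) throws IOException {
    // Remove the (now empty) cgroup directory for the finished container.
    Files.deleteIfExists(cgroupRoot.resolve(containerId));
  }
}
{code}
The point of the refactor, per the description, is that once CPU sits behind the same small handler abstraction as other resource types, new resources (such as network in YARN-2140) can plug in without modifying the existing LinuxContainerExecutor/CgroupsLCEResourcesHandler path.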
[jira] [Updated] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers
[ https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-2885: -- Attachment: YARN-2885-yarn-2877.001.patch Uploading a preliminary version of the patch. # This depends on YARN-2882. Will upload a combined patch shortly. # This patch introduces a Distributed Scheduling Request Interceptor using the AMRMProxy framework introduced in YARN-2884. The interceptor is set at the head of the pipeline if Distributed Scheduling is enabled. # An AM requiring Distributed Scheduling can mark certain resource request asks with Execution Type QUEUEABLE. This is currently used just as a means of partitioning the asks. The actual queueing at the target NM will be tackled as part of YARN-2883. # Will be adding more test cases shortly. > Create AMRMProxy request interceptor for distributed scheduling decisions for > queueable containers > -- > > Key: YARN-2885 > URL: https://issues.apache.org/jira/browse/YARN-2885 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2885-yarn-2877.001.patch > > > We propose to add a Local ResourceManager (LocalRM) to the NM in order to > support distributed scheduling decisions. > Architecturally we leverage the RMProxy, introduced in YARN-2884. > The LocalRM makes distributed decisions for queueable container requests. > Guaranteed-start requests are still handled by the central RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
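Point 3 above (marking asks with Execution Type QUEUEABLE purely to partition them) can be sketched as follows. The types here are illustrative stand-ins, not the real AMRMProxy interceptor interfaces or the classes in the attached patch.
{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the partitioning step: split incoming asks by
 * execution type so QUEUEABLE asks can be handled by the local scheduler
 * while the rest are forwarded, unchanged, to the central RM.
 */
class DistributedSchedulingSketch {

  enum ExecutionType { GUARANTEED, QUEUEABLE }

  static class Ask {
    final ExecutionType type;
    final int numContainers;

    Ask(ExecutionType type, int numContainers) {
      this.type = type;
      this.numContainers = numContainers;
    }
  }

  /** Asks the local scheduler should place directly on NMs. */
  final List<Ask> queueable = new ArrayList<>();
  /** Asks to forward, unchanged, to the central RM. */
  final List<Ask> guaranteed = new ArrayList<>();

  void partition(List<Ask> incoming) {
    for (Ask ask : incoming) {
      if (ask.type == ExecutionType.QUEUEABLE) {
        queueable.add(ask);
      } else {
        guaranteed.add(ask);
      }
    }
  }
}
{code}
The actual queueing of the QUEUEABLE asks on the target NM is out of scope here, matching the comment's note that it is tackled in YARN-2883.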
[jira] [Created] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
Lin Yiqun created YARN-4399: --- Summary: FairScheduler allocated container should resetSchedulingOpportunities count of its priority Key: YARN-4399 URL: https://issues.apache.org/jira/browse/YARN-4399 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.7.1 Reporter: Lin Yiqun Assignee: Lin Yiqun There is a bug in how the FairScheduler allocates containers when the locality configs are set. Whenever it attempts to assign a container, it invokes {{FSAppAttempt#addSchedulingOpportunity}}, whether or not the assignment succeeds. If yarn.scheduler.fair.locality.threshold.node and yarn.scheduler.fair.locality.threshold.rack are configured, the schedulingOpportunity value influences container locality: when one container is assigned successfully, the schedulingOpportunity count for its priority is increased, and it is increased again for the next container. This can degrade the allowed locality for that priority and cause the next container to be scheduled as a rack-local request. So I think that when the FairScheduler allocates a container, if the previous container at that priority was assigned, the schedulingOpportunity count for that priority should be reset to 0 so that it does not influence container allocation in the next iteration; this will improve container locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
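The proposal in YARN-4399 above amounts to resetting the per-priority scheduling-opportunity counter after a successful assignment, so that only missed opportunities push a request toward rack or off-switch locality. The sketch below is a hypothetical, self-contained illustration of that behaviour, not the real FSAppAttempt code or the attached patch.
{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-priority scheduling-opportunity counter. The counter
 * grows each time the scheduler looks at a priority on some node, but is
 * reset when a container at that priority is actually assigned, so
 * successful assignments do not degrade the allowed locality of the next
 * request at the same priority.
 */
class SchedulingOpportunities {

  private final Map<Integer, Integer> opportunities = new HashMap<>();

  /** Called every time the scheduler considers this priority on a node. */
  void addSchedulingOpportunity(int priority) {
    opportunities.merge(priority, 1, Integer::sum);
  }

  /**
   * Called when a container at this priority has been assigned; this is the
   * reset proposed in this JIRA.
   */
  void containerAssigned(int priority) {
    opportunities.put(priority, 0);
  }

  int getSchedulingOpportunities(int priority) {
    return opportunities.getOrDefault(priority, 0);
  }
}
{code}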
[jira] [Updated] (YARN-4399) FairScheduler allocated container should resetSchedulingOpportunities count of its priority
[ https://issues.apache.org/jira/browse/YARN-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated YARN-4399: Attachment: YARN-4399.001.patch > FairScheduler allocated container should resetSchedulingOpportunities count > of its priority > --- > > Key: YARN-4399 > URL: https://issues.apache.org/jira/browse/YARN-4399 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4399.001.patch > > > There is a bug in how the FairScheduler allocates containers when the > locality configs are set. Whenever it attempts to assign a container, it > invokes {{FSAppAttempt#addSchedulingOpportunity}}, whether or not the > assignment succeeds. If yarn.scheduler.fair.locality.threshold.node and > yarn.scheduler.fair.locality.threshold.rack are configured, the > schedulingOpportunity value influences container locality: when one container > is assigned successfully, the schedulingOpportunity count for its priority is > increased, and it is increased again for the next container. This can degrade > the allowed locality for that priority and cause the next container to be > scheduled as a rack-local request. So I think that when the FairScheduler > allocates a container, if the previous container at that priority was > assigned, the schedulingOpportunity count for that priority should be reset > to 0 so that it does not influence container allocation in the next iteration; > this will improve container locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)